ACM TechNews
Google Fellows Reveal Parallel Processing Model
InfoWorld (01/09/08) Snyder, Jason
Google Fellows Jeff Dean and Sanjay Ghemawat published a paper in the January issue of Communications of the ACM that details the programming model Google uses to process more than 20 petabytes of data every day on commodity-based clusters. The method, known as MapReduce, lets users express a computation as a map function and a reduce function, which the runtime system automatically parallelizes across large clusters while handling machine failures and making efficient use of the network and disks. The model abstracts parallelization, fault tolerance, data distribution, and load balancing into a library. More than 10,000 programs have been implemented at Google using MapReduce, which can also parallelize computations across multiple cores on a single machine. MapReduce has been used for large-scale graph processing, text processing, data mining, machine learning, statistical machine translation, and other algorithms. Computations are submitted to a scheduler that maps tasks to available machines. Dean and Ghemawat write that the most significant use of MapReduce has been rewriting the indexing system used in Google search. The paper, "MapReduce: Simplified Data Processing on Large Clusters," is available at
http://portal.acm.org/citation.cfm?id=1327492
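To make the map/reduce split described above concrete, here is a minimal single-process sketch in Python that counts words. It is an illustration only, not Google's library: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and the sketch assumes everything fits in memory. The scheduling across machines, fault tolerance, and data distribution that the real runtime provides are omitted entirely.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> int:
    """Reduce step: merge all values emitted for one key into a total."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Toy stand-in for the runtime: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        # Collect every intermediate value under its intermediate key.
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Call the reduce function once per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical input: (document id, document text) pairs.
docs = [("a", "the quick fox"), ("b", "the lazy dog the end")]
counts = run_mapreduce(docs, map_words, reduce_counts)
```

In this sketch the user supplies only `map_words` and `reduce_counts`; the grouping and the order of execution live in `run_mapreduce`. That separation is what would let a real runtime run the same two functions on many machines without changes to user code.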
© Copyright 2008 Information, Inc. This service may be reproduced for internal distribution.