[Solved]: Why does the map function in MapReduce take two parameters?

Problem Detail: 

I am playing around with MapReduce and wanted to ask why the map function takes two inputs (key, value)? Why not just pass along the value?

Specifically, if you look at the word count example on the Wikipedia page, you will see that the map function is:

    function map(String name, String document):
      // name: document name
      // document: document contents
      for each word w in document:
        emit (w, 1)

However, the function never does anything with the parameter "name". Why even pass it in?

Asked By : user1357015

Answered By : Raphael

Let me quote the documentation (emphasis mine):

Map: You write a map function that runs once for each input value. It returns a collection of name-value pairs which are passed on to the next stage. If there are many, many input values, then this function runs many, many times. The framework divides up the input so that subsets are handled in parallel on multiple instances of your application. A typical map function could count things that occur in each input value that matches some filter.

So there are two immediate uses I can imagine:

  • Use the "chunk" name in the returned names. For example, you might want to distinguish bible_author from fiftyshadesofgrey_author in the result.
  • Identify "subprocesses". If anything goes wrong (and something will go wrong) an error message like "Error processing bible: Illegal character at position 666" is obviously only useful if you include the document name.

And, of course, you may also want to use the name in the actual computation. A general API should cover that case.
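To make the two uses above concrete, here is a minimal Python sketch of a word-count map step that does use its key. It is not tied to any real MapReduce framework: the names map_word_count, reduce_counts and run are hypothetical, and the "letters only" validity check is an assumption chosen purely to trigger an error message.

```python
from collections import defaultdict
from typing import Iterator


def map_word_count(name: str, document: str) -> Iterator[tuple[str, int]]:
    """Hypothetical map step: emits (key, 1) pairs tagged with the document name."""
    for position, word in enumerate(document.split()):
        if not word.isalpha():
            # Assumed validity rule (letters only). Including the name is what
            # makes the error message useful when many documents are processed.
            raise ValueError(
                f"Error processing {name}: illegal token {word!r} at position {position}"
            )
        # Prefix the emitted key with the document name so counts from
        # different documents stay distinguishable after the reduce step.
        yield (f"{name}_{word.lower()}", 1)


def reduce_counts(pairs) -> dict[str, int]:
    """Hypothetical reduce step: sums the counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run(documents: dict[str, str]) -> dict[str, int]:
    """Sequential stand-in for the framework: map every document, then reduce."""
    pairs = []
    for name, document in documents.items():
        pairs.extend(map_word_count(name, document))
    return reduce_counts(pairs)
```

Calling run({"a": "the cat the", "b": "the dog"}) keeps the counts for "the" separate per document (a_the and b_the), which a map function that only received the value could not do.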

Question Source : http://cs.stackexchange.com/questions/14295
