I am playing around with MapReduce and wanted to ask why the map function takes two inputs (key, value). Why not just pass along the value?
Specifically, if you look at the word count example on the Wikipedia page, you will see that the map function is:
    function map(String name, String document):
        // name: document name
        // document: document contents
        for each word w in document:
            emit (w, 1)
However, the function never does anything with the parameter "name". Why even pass it in?
Asked By : user1357015
Answered By : Raphael
Let me quote the documentation (emphasis mine):
Map: You write a map function that runs once for each input value. It returns a collection of name-value pairs which are passed on to the next stage. If there are many, many input values, then this function runs many, many times. The framework divides up the input so that subsets are handled in parallel on multiple instances of your application. A typical map function could count things that occur in each input value that matches some filter.
So there are two immediate uses I can imagine:
- Use the "chunk" name in the returned names. For example, you might want to distinguish bible_author from fiftyshadesofgrey_author in the result.
- Identify "subprocesses". If anything goes wrong (and something will go wrong), an error message like "Error processing bible: Illegal character at position 666" is obviously only useful if you include the document name.
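Both uses can be sketched in a few lines of Python. This is only an illustration, not any framework's actual API: the function name `map_tagged_words`, the use of a generator in place of `emit`, and the rule that anything other than letters, digits, and whitespace counts as an illegal character are all assumptions made up for the example.

```python
from typing import Iterator, Tuple


def map_tagged_words(name: str, document: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function that actually uses its key (the document name)."""
    # Second use: the name makes the error message actionable.
    # Assumption: only alphanumerics and whitespace are legal here.
    for position, char in enumerate(document):
        if not (char.isalnum() or char.isspace()):
            raise ValueError(
                f"Error processing {name}: illegal character at position {position}"
            )

    # First use: prefix every emitted key with the chunk name, so that
    # results from different documents stay distinguishable downstream.
    for word in document.split():
        yield (f"{name}_{word}", 1)
```

Without the `name` parameter, neither the prefixed keys nor the error message could say which input they belong to.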
And, of course, you may also want to use the name in the actual computation. A general API should cover that case.
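One computation that needs the name is an inverted index, which maps each word to the documents that contain it. The sketch below simulates the map and grouping steps in a single process; `map_inverted_index`, `build_index`, and the sample file names are hypothetical and do not come from any particular MapReduce framework.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


def map_inverted_index(name: str, document: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical map function: emit (word, document name) once per distinct word."""
    for word in set(document.split()):
        yield (word, name)


def build_index(documents: Dict[str, str]) -> Dict[str, List[str]]:
    """Run the map step over all documents and group the names by word."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, document in documents.items():
        for word, doc_name in map_inverted_index(name, document):
            index[word].add(doc_name)
    # Sort the names so the result does not depend on set ordering.
    return {word: sorted(names) for word, names in index.items()}
```

Here the emitted value is the key the function received, so a map function that only saw the document contents could not produce this output.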
Question Source : http://cs.stackexchange.com/questions/14295