In an example of the application of MapReduce provided by University of Utah, it says that Map() function emits <"hello", 1> every time it sees hello where the reduce function counts the number of instances "hello" occurs
My question is if this is the case, why isn't reduce doing <"hello", {1,1,1,1,1,1,1,1,1,1,1,1...}>, where each 1 is an instance Map() emits a key,value pair? In the example it wrote something like <"hello", (3,5,2,7)>, what does it mean?
Also, why do you need MapReduce to do this? I can just use an linked list on my computer.
Thanks
Asked By : Beached Whale
Answered By : Sean Easter
[...] it says that Map() function emits <"hello", 1> every time it sees hello where the reduce function counts the number of instances "hello" occurs
Not quite: It appears the mapper reads each file, counts the number of times a word appears, and outputs a single (word, count)
pair per file, rather than per occurrence of the word. The reduce step then sums these. ("hello", 1)
indicates that "hello" appeared once in a given file, ("hello", 3)
indicates three appearances in a file, etc.
In the example for the reduce step, it appears four files were mapped, and that "hello" appeared 3 times in the first, 5 in the second, etc.
Also, why do you need MapReduce to do this?
Via wiki MapReduce is "for processing parallelizable problems across huge datasets using a large number of computers[.]" Meaning, if your task is to count the number of times "hello" appears in four small documents, you likely don't need MapReduce. But if your task is to count the appearances of all words that appear in a large set of documents, then the only way to accomplish this is a practically useful time may require distributing across multiple processors.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/33483
0 comments:
Post a Comment
Let us know your responses and feedback