[Solved]: Hadoop MapReduce Word Counting Example

Problem Detail: 

In an example of MapReduce provided by the University of Utah, it says that the Map() function emits <"hello", 1> every time it sees "hello", and the Reduce() function counts the number of times "hello" occurs.

My question is: if this is the case, why doesn't Reduce() receive <"hello", {1,1,1,1,1,1,1,1,1,1,1,1...}>, where each 1 corresponds to one key-value pair emitted by Map()? In the example it shows something like <"hello", (3,5,2,7)>. What does that mean?
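
The scheme the question describes can be simulated in a few lines of Python. This is a hypothetical single-process sketch, not the Utah example's code or Hadoop's API; `map_fn`, `shuffle`, and `reduce_fn` are illustrative names. With one (word, 1) pair per occurrence, the grouped input to Reduce() is indeed a list of 1s:

```python
from collections import defaultdict

def map_fn(document):
    """Emit one (word, 1) pair for every occurrence of every word."""
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_fn(key, values):
    """Sum all values emitted for one key."""
    return (key, sum(values))

documents = ["hello world hello", "hello again"]
pairs = [pair for doc in documents for pair in map_fn(doc)]
grouped = shuffle(pairs)
print(grouped["hello"])                      # [1, 1, 1]
print(reduce_fn("hello", grouped["hello"]))  # ('hello', 3)
```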

Also, why do you need MapReduce to do this? I could just use a linked list on my computer.

Thanks

Asked By : Beached Whale

Answered By : Sean Easter

[...] it says that Map() function emits <"hello", 1> every time it sees hello where the reduce function counts the number of instances "hello" occurs

Not quite: It appears the mapper reads each file, counts the number of times each word appears, and outputs a single (word, count) pair per distinct word per file, rather than one pair per occurrence of the word. The reduce step then sums these. ("hello", 1) indicates that "hello" appeared once in a given file, ("hello", 3) indicates three appearances in a file, etc.

In the example for the reduce step, it appears four files were mapped, and that "hello" appeared 3 times in the first, 5 in the second, etc.
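
The per-file reading can be sketched the same way. This is again a hypothetical single-process simulation, not the Utah example's actual code; the four documents are made up so that "hello" occurs 3, 5, 2, and 7 times, matching the numbers in the question. The mapper counts words locally with `collections.Counter` and emits one pair per distinct word per file, so Reduce() receives (3, 5, 2, 7) and sums it to 17:

```python
from collections import Counter, defaultdict

def map_per_file(document):
    """Count words within one file, then emit one (word, count) pair per distinct word."""
    for word, count in Counter(document.split()).items():
        yield (word, count)

def shuffle(pairs):
    """Group emitted values by key, preserving the order in which files were mapped."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_fn(key, values):
    """Sum the per-file counts for one word."""
    return (key, sum(values))

# Made-up input: "hello" appears 3, 5, 2 and 7 times in four files.
documents = [
    "hello " * 3 + "world",
    "hello " * 5,
    "hello " * 2 + "world world",
    "hello " * 7,
]

pairs = [pair for doc in documents for pair in map_per_file(doc)]
grouped = shuffle(pairs)
print(grouped["hello"])                      # [3, 5, 2, 7]
print(reduce_fn("hello", grouped["hello"]))  # ('hello', 17)
```

Counting inside the mapper means fewer pairs travel to the reducers, while the final sums are the same as with the one-pair-per-occurrence scheme. Hadoop supports a similar optimization through an optional combiner.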

Also, why do you need MapReduce to do this?

Per Wikipedia, MapReduce is "for processing parallelizable problems across huge datasets using a large number of computers[.]" That means that if your task is to count the number of times "hello" appears in four small documents, you likely don't need MapReduce. But if your task is to count the appearances of every word in a large set of documents, then accomplishing it in a practically useful time may require distributing the work across multiple processors.
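
To make the trade-off concrete, here is a hypothetical sketch that splits the input into shards (standing in for machines), maps each shard independently, routes words to reducers by a hash, and sums. Everything runs in one Python process; `num_mappers`, `num_reducers`, and the CRC32 partitioner are illustrative assumptions, not Hadoop's implementation. The result matches a plain single-machine `Counter`:

```python
from collections import Counter, defaultdict
from zlib import crc32

def map_shard(shard):
    """Work done by one mapper: count the words in its own subset of the documents."""
    counts = Counter()
    for document in shard:
        counts.update(document.split())
    return counts

def partition(word, num_reducers):
    """Deterministically choose which reducer owns a word (CRC32 is an illustrative choice)."""
    return crc32(word.encode("utf-8")) % num_reducers

def run(documents, num_mappers=3, num_reducers=2):
    # Split the input into shards; on a cluster each shard would go to a different machine.
    shards = [documents[i::num_mappers] for i in range(num_mappers)]
    # The shards share no state, so these map calls could run in parallel.
    mapper_outputs = [map_shard(shard) for shard in shards]

    # Shuffle: route every (word, count) pair to the reducer that owns the word.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for word, count in output.items():
            reducer_inputs[partition(word, num_reducers)][word].append(count)

    # Reduce: each reducer sums the partial counts for the words it owns.
    totals = {}
    for groups in reducer_inputs:
        for word, counts in groups.items():
            totals[word] = sum(counts)
    return totals

documents = ["hello world", "hello hello mapreduce", "world of words", "hello"]
single_machine = dict(Counter(word for doc in documents for word in doc.split()))
print(run(documents) == single_machine)  # True
```

For four short strings the one-line `Counter` is clearly the better tool, which is the answer's point; the sharded structure only pays off when each shard is large enough to be worth sending to a separate machine.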

Best Answer from Computer Science Stack Exchange

Question Source : http://cs.stackexchange.com/questions/33483