World's most popular travel blog for travel bloggers.

[Solved]: Efficiently count frequency of n-grams at start of words

, , No Comments
Problem Detail: 

I have a text file with all possible 5-grams (26^5 = 11.881.376) organized in rows as:

aaaaa
aaaab
aaaac
aaaad
.....

and I have a txt file (organized in rows) with all English words. I have to find how much time every 5-gram is at the beginning of a word. I'm using this code:

def frequency(five-grams, dict):     result = {}      for gram in five-grams:         freq = 0         for word in dict:             if word.startswith(gram):                 freq = freq + 1          result[gram] = freq      return result 

However this method takes too long (after hours the algorithm is still running). Is there a more efficient and faster method?

Asked By : Zirko88

Answered By : Tom van der Zanden

There is a 11.881.376 times faster method. Instead of for every 5-gram looping over the entire dictionary, loop over the dictionary once, and for every word determine with what 5-gram it starts and then increment the counter for that particular 5-gram.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/37277

3.2K people like this

 Download Related Notes/Documents

0 comments:

Post a Comment

Let us know your responses and feedback