[Solved]: Efficiently count frequency of n-grams at start of words

posted by , on 8:09:00 AM , No Comments

Problem Detail:

I have a text file with all possible 5-grams (26^5 = 11.881.376) organized in rows as:

aaaaa
aaaab
aaaac
aaaad
.....

and I have a txt file (organized in rows) with all English words. I have to find how much time every 5-gram is at the beginning of a word. I'm using this code:

def frequency(five-grams, dict):     result = {}      for gram in five-grams:         freq = 0         for word in dict:             if word.startswith(gram):                 freq = freq + 1          result[gram] = freq      return result

However this method takes too long (after hours the algorithm is still running). Is there a more efficient and faster method?

Asked By : Zirko88

Answered By : Tom van der Zanden

There is a 11.881.376 times faster method. Instead of for every 5-gram looping over the entire dictionary, loop over the dictionary once, and for every word determine with what 5-gram it starts and then increment the counter for that particular 5-gram.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/37277

3.2K people like this

Download Related Notes/Documents

MCA IGNOU GROUP

[Solved]: Efficiently count frequency of n-grams at start of words

Asked By : Zirko88

Answered By : Tom van der Zanden

Best Answer from StackOverflow

Akshay Mishra

0 comments:

Post a Comment

Search By Course Code

Sponsored Links

Blog Archive

Popular Posts

Label Cloud

adsbygoogle

Ad Blocker

[Solved]: Efficiently count frequency of n-grams at start of words

Asked By : Zirko88

Answered By : Tom van der Zanden

Best Answer from StackOverflow

Akshay Mishra

0 comments:

Post a Comment

Search By Course Code

Sponsored Links

Blog Archive

Popular Posts

Label Cloud

adsbygoogle

Ad Blocker

Click to Download Now