
Trigrams, Bigrams and Ngrams in Python for Text Analysis


Creating trigrams in Python is simple: zip a sequence with copies of itself shifted by one and by two positions.

# zip returns an iterator in Python 3, so wrap it in list() to see the tuples
trigrams = lambda a: list(zip(a, a[1:], a[2:]))
trigrams(('a', 'b', 'c', 'd', 'e', 'f'))
# => [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e'), ('d', 'e', 'f')]

You can generalize this to n-grams of any length:

# Zip together n copies of the sequence, each shifted one position further
ngrams = lambda a, n: list(zip(*[a[i:] for i in range(n)]))
bigrams = ngrams(('a', 'b', 'c', 'd', 'e', 'f'), 2)
# => [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f')]
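The slicing approach needs a sequence that supports a[i:], so it will not work on a generator or a file being read line by line. A minimal sketch of a lazy variant follows; iter_ngrams is a hypothetical helper name, not something from the original post, and it relies only on itertools.tee and itertools.islice from the standard library.

```python
from itertools import islice, tee

def iter_ngrams(iterable, n):
    """Return a lazy iterator of n-grams over any iterable (hypothetical helper)."""
    # Make n independent iterators over the same input.
    iterators = tee(iterable, n)
    # Advance the i-th iterator by i items so the copies are staggered.
    shifted = [islice(it, i, None) for i, it in enumerate(iterators)]
    # zip stops as soon as the most-shifted iterator runs out.
    return zip(*shifted)

words = (w for w in "a b c d e f".split())  # a generator, which cannot be sliced
print(list(iter_ngrams(words, 3)))
# => [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e'), ('d', 'e', 'f')]
```

If the input is shorter than n, the result is simply empty, which matches the behavior of the slicing version.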

When analyzing text, it's useful to see how often terms are used together. Split the text into lowercase words, then build bigrams from them:

txt = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis lorem ipsum aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint lorem ipsum occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'.lower().split()
print(list(ngrams(txt, 2)))

You can use a Counter from the collections module to see the most common bigrams:

from collections import Counter
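Counter(ngrams(txt, 2)).most_common(5)
[(('lorem', 'ipsum'), 3),
 (('consequat.', 'duis'), 1),
 (('in', 'voluptate'), 1),
 (('consectetur', 'adipisicing'), 1),
 (('ipsum', 'dolor'), 1)]
# Only ('lorem', 'ipsum') repeats. Every other bigram occurs once, so which of
# them fill the remaining four slots depends on the Python version.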
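Because the text was split on whitespace only, punctuation stays attached to tokens such as 'consequat.', which can split the counts for what is really the same word. One possible way to handle this, sketched under the assumption that plain ASCII letters are all you care about, is to tokenize with a regular expression before counting. The names tokenize and ngram_counts, and the shortened sample sentence, are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of ASCII letters, dropping punctuation.
    return re.findall(r"[a-z]+", text.lower())

def ngram_counts(tokens, n):
    # Same zip-of-shifted-slices idea as ngrams, fed straight into a Counter.
    return Counter(zip(*[tokens[i:] for i in range(n)]))

# Shortened sample text, invented for this sketch.
sample = "Lorem ipsum dolor sit amet. Duis lorem ipsum aute irure. Sint lorem ipsum occaecat."
tokens = tokenize(sample)

print(ngram_counts(tokens, 2).most_common(1))
# => [(('lorem', 'ipsum'), 3)]
print(ngram_counts(tokens, 3).most_common(3))
```

Passing n=3 gives trigram counts from the same helper. In this sample every trigram occurs once, so the order of that second result carries no meaning.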
Tagged w/ #python #text analysis #ngrams #trigrams #bigrams #functional programming