- The Python Standard Library, especially `str` methods and the `string` module, is powerful for text processing. Start there.
- regex - Extends the Standard Library's `re` module while remaining backwards-compatible.
- chardet - Detects character encodings.
- ftfy - Take in bad Unicode and output good Unicode. Seriously automagical.
- polyglot - Helpful for multilingual preprocessing.
- fuzzywuzzy - Fuzzy string matching like a boss.
- enchant - Spell checking.
- inflect - Convert numbers to words, switch between singular/plural, and generate ordinals.
- nltk - Hard pass. Too academic, too slow.
- scikit-learn - Handles basic text processing and modeling. Easy to combine text-based features with other features.
- TextBlob - A great package for common NLP tasks. Consistent OOP-style API.
- spaCy - Industrial-strength NLP, including fast syntactic parsing.
- textacy - Higher-level NLP built on top of spaCy.
- gensim - A nice API for all kinds of topic modeling and word2vec.
- pattern - Text mining at its finest. Handles normalizing numbers, comparatives, and superlatives.
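To illustrate the "start with the Standard Library" advice, here is a minimal sketch of basic text cleanup using only built-in `str` methods, the `string` module, and `re`. The helper names `tokenize` and `find_numbers` are hypothetical, the `list[str]` annotations assume Python 3.9+, and the approach strips only ASCII punctuation (`string.punctuation`), so it is not a substitute for a real tokenizer like the ones in spaCy or TextBlob.

```python
import re
import string


def tokenize(text: str) -> list[str]:
    """Hypothetical helper: lowercase, drop ASCII punctuation, split on whitespace."""
    # str.maketrans with a third argument builds a table that deletes
    # every character listed in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    # str.split() with no arguments splits on runs of whitespace and
    # ignores leading/trailing whitespace.
    return text.lower().translate(table).split()


def find_numbers(text: str) -> list[str]:
    """Hypothetical helper: return every run of digits using the stdlib re module."""
    return re.findall(r"\d+", text)


if __name__ == "__main__":
    sample = "  Hello, World!  It's 2024... NLP is FUN.  "
    print(tokenize(sample))      # ['hello', 'world', 'its', '2024', 'nlp', 'is', 'fun']
    print(find_numbers(sample))  # ['2024']
    # string.capwords capitalizes the first letter of each word.
    print(string.capwords("text processing in python"))  # Text Processing In Python
```

Deleting punctuation outright is a blunt choice: it turns "It's" into "its" and merges hyphenated words. When that matters, reach for `regex`, spaCy, or TextBlob from the list above.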