A Hacker's Guide to Python string and Natural Language Processing (NLP) packages

Extraction

Python's Standard Library, especially the str methods and the string module, is powerful for text processing. Start there.
regex - Extends Python's Standard Library re module while being backwards-compatible.
chardet - Finds character encoding.
ftfy - Takes in bad Unicode and outputs good Unicode. Seriously automagical.
polyglot - Helpful for multilingual preprocessing.
fuzzywuzzy - Fuzzy string matching like a boss.
enchant - Spell checking.
inflect - Convert numbers to words, switch between singular/plural, and generate ordinals.
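
As a quick illustration of how far the Standard Library alone can go, here is a minimal sketch of a tokenizer built only from str methods and the string module. The function name `tokenize` and the sample sentence are made up for this example, and it only strips ASCII punctuation, so treat it as a starting point rather than a complete solution.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# (string.punctuation does not cover Unicode punctuation such as curly quotes.)
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()


tokens = tokenize("Hello, World! Isn't text-processing fun?")
print(tokens)  # ['hello', 'world', 'isnt', 'textprocessing', 'fun']
```

Note that removing punctuation also glues "text-processing" into one token. When that matters, reach for regex or one of the packages above.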

nltk - Hard pass. Too academic, too slow.
scikit-learn - Handles basic text processing and modeling. Easy to combine text-based features with other features.
TextBlob - A great package for common NLP tasks. Consistent OOP-style API.
spaCy - Industrial-strength NLP, including very good transformer support and named entity recognition (NER).
- textacy - Higher level NLP built on top of spaCy.
Hugging Face - Collections of datasets and pretrained models.
gensim - A nice API for all kinds of topic modeling and word2vec.
pattern - Text mining at its finest. Handles normalizing numbers, comparatives, and superlatives.
jellyfish - Approximate & phonetic string matching.
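
To give a feel for what "approximate string matching" means in practice, here is a pure-Python sketch of Levenshtein (edit) distance, one of the standard measures behind such tools. This is not jellyfish's or fuzzywuzzy's implementation. The function name `levenshtein` is chosen for this example, and the dedicated packages will be faster and offer more metrics.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible

    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A small distance relative to the string lengths suggests a likely typo or variant spelling, which is the basic idea behind fuzzy matching.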