Funani Ndou KUDANDOU

Witty-Kitty / alp_data_prep.py
Created February 18, 2019 11:54
Text pre-processing
import nltk
from nltk.tokenize import word_tokenize
from nltk.text import Text
# read in text data (UTF-8 is assumed; the context manager closes the file)
with open("crawl-for-parallel-corpora/DataSet/luganda.txt", "r", encoding="utf-8") as file:
    raw = file.read()
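# tokenize into words and punctuation
# (needs NLTK's Punkt tokenizer data, e.g. nltk.download("punkt"); newer NLTK releases may ask for "punkt_tab")
tokens = word_tokenize(raw)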
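
A possible next pre-processing step after tokenization, sketched with the standard library only. This is not part of the original script: `normalize_tokens` is a hypothetical helper, and `sample_tokens` is placeholder data standing in for the `tokens` list produced above. It lowercases tokens, drops those with no alphabetic character (punctuation, bare numbers), and counts word frequencies.

```python
from collections import Counter


def normalize_tokens(tokens):
    """Lowercase tokens and drop any that contain no alphabetic character.

    Hypothetical helper for illustration; not part of the original script.
    """
    return [t.lower() for t in tokens if any(ch.isalpha() for ch in t)]


# Placeholder tokens standing in for the output of word_tokenize(raw)
sample_tokens = ["Hello", ",", "hello", "world", "!", "2019"]

words = normalize_tokens(sample_tokens)  # punctuation and "2019" are dropped
freq = Counter(words)                    # word -> number of occurrences

print(freq.most_common(2))
```

Whether lowercasing and dropping numeric tokens is appropriate depends on the downstream task; for a parallel corpus, case and numbers may carry alignment information worth keeping.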