These are two solutions for a topic extraction task. The sample data is loaded into a variable by the script. I’ve included running times for both solutions, so we could have precise information about the cost that each one takes, in addition to their results. According to (Pazienza et al. 2005)
, two trends on textual information can be identified: one based on linguistic and syntactical information, another based on statistical analysis of frequency patterns (which usually consider text as a bags-of-words). Whilst the first approach is a purely syntactic one, the second one aims to imcorporate information about syntatic categories into the analysis (hence a hybrid approach)
1 – Set-up used:
*Ubuntu 11.04 Natty AMD64
*Python 2.7.3
*python re library
*python nltk 2.0 library and the required NumPy and PyYaml (For NLP tas