Last active
July 22, 2020 12:57
Testing the quality of Spark NLP's language detection, compared with FastText
// full code is here: https://github.com/alexott/spark-playground/tree/master/spark-nlp
Results of the evaluation against the dataset linked from the following blog post:
http://alexott.blogspot.com/2017/10/evaluating-fasttexts-models-for.html
+--------+-----+-------+-------------------+
|src_lang|count|correct|          precision|
+--------+-----+-------+-------------------+
|      bg|  203|    146| 0.7192118226600985|
|      de|  236|    150|  0.635593220338983|
|      el|  199|    165| 0.8291457286432161|
|      en|  249|     99|0.39759036144578314|
|      es|  255|     45|0.17647058823529413|
|      fi|  199|    176| 0.8844221105527639|
|      fr|  205|    140| 0.6829268292682927|
|      hr|  175|    130| 0.7428571428571429|
|      hu|  197|    170| 0.8629441624365483|
|      it|  206|    144| 0.6990291262135923|
|      no|  183|    142| 0.7759562841530054|
|      pl|  196|    158| 0.8061224489795918|
|      pt|  206|    156| 0.7572815533980582|
|      ro|  193|    105| 0.5440414507772021|
|      ru|  446|    373| 0.8363228699551569|
|      sk|  195|    164|  0.841025641025641|
|      sv|  202|    201|  0.995049504950495|
|      tr|  195|    152| 0.7794871794871795|
|      uk|  197|    171|  0.868020304568528|
+--------+-----+-------+-------------------+
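
The columns above appear to be related by `precision = correct / count` for each source language (for example, 146 / 203 ≈ 0.7192 for `bg`). The following is a minimal Python sketch of that aggregation, not the Spark code from the linked repository. It assumes each evaluated text is reduced to a pair of its true language and the detected language; the function name `per_language_precision` and the toy `sample` data are hypothetical and used only for illustration.

```python
from collections import Counter


def per_language_precision(rows):
    """Aggregate (src_lang, detected_lang) pairs into per-language stats.

    Returns a dict mapping src_lang -> (count, correct, precision),
    where precision is correct / count, as in the table above.
    """
    total = Counter()
    correct = Counter()
    for src_lang, detected_lang in rows:
        total[src_lang] += 1
        if detected_lang == src_lang:
            correct[src_lang] += 1
    return {
        lang: (total[lang], correct[lang], correct[lang] / total[lang])
        for lang in sorted(total)
    }


# Hypothetical toy input, NOT taken from the evaluation dataset.
sample = [
    ("sv", "sv"), ("sv", "sv"),
    ("es", "pt"), ("es", "es"), ("es", "it"), ("es", "ca"),
]

for lang, (count, ok, precision) in per_language_precision(sample).items():
    print(f"{lang} count={count} correct={ok} precision={precision:.4f}")
```

Because the ratio is taken over the texts of each source language, the value reported as `precision` is the share of texts in that language that were detected correctly.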