Last active
September 5, 2018 12:38
You can find the initial distribution compiled from the PGA CSV file in this gist https://gist.github.com/warenlg/44bd576637ee161929a3f7e1a88554f5
However, you'll see that the numbers don't match, for two reasons:
- The step that preprocesses all of PGA into parquet files misses some of them.
- At the time it was run, the preprocess command from src-d/ml did not include `lang` in the output parquet files, so I had to filter by file extension, and that way I missed a lot of files.
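
To illustrate why filtering by extension loses files, here is a minimal sketch. It is not the code from the notebook: the `EXT_TO_LANG` map, the helper names, and the sample paths are all hypothetical. Files with no extension, or with an extension missing from the map, end up without a language.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-language map; NOT the one used in the notebook.
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".java": "Java", ".go": "Go"}


def lang_from_extension(path):
    """Guess the language from the file extension; return None if unknown."""
    return EXT_TO_LANG.get(PurePosixPath(path).suffix.lower())


def language_distribution(paths):
    """Count files per guessed language; unmatched files are counted under None."""
    return Counter(lang_from_extension(p) for p in paths)


# Made-up sample paths, for illustration only.
paths = ["src/app.js", "lib/util.py", "Makefile", "bin/run", "web/index.jsx", "Main.java"]
dist = language_distribution(paths)
```

In this sample, `Makefile`, `bin/run` and `web/index.jsx` get no language, which is the kind of loss described above.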
I put the shareable link to the CSV file at the beginning of the notebook. Here it is as well, just in case: https://drive.google.com/open?id=1es02UUFUWlR9k4hswCSQCAsSOqjma06y
Hi, thanks for the analysis!
Is it possible to add the distribution of repositories based on their JS file count?