Idea: Read each repository on GitHub (and other hosts) that contains Go code. Maybe limit this to repositories with a go.mod file, maybe not. You can't get this information from the godoc.org API, because a package's imports are only updated when someone visits the importing package's page; if nobody does, the recorded imports never change. (You can verify this by checking cases you know of manually and reloading to watch the counter go up.)
Use go list ./... to list the import paths of all the packages in the repository, and find the import paths of all packages each one depends on.
Build a matrix depends-on(x ipath) : [ipath], mapping each import path (ipath) to the import paths it depends on.
Maybe include version numbers, if they're available (e.g., from a go.mod file).
Invert the matrix to get depended-on-by(x ipath) : [ipath]
Now you can query, for any package, which packages depend on it.
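Building and inverting the matrix can be sketched with plain maps. This is an illustration, not the repodeps implementation; the function name invert and the example.com paths are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// invert turns depends-on (ipath -> packages it imports) into
// depended-on-by (ipath -> packages that import it).
// Each result list is sorted and free of duplicates.
func invert(dependsOn map[string][]string) map[string][]string {
	seen := make(map[string]map[string]bool)
	for pkg, deps := range dependsOn {
		for _, d := range deps {
			if seen[d] == nil {
				seen[d] = make(map[string]bool)
			}
			seen[d][pkg] = true
		}
	}
	dependedOnBy := make(map[string][]string, len(seen))
	for dep, users := range seen {
		list := make([]string, 0, len(users))
		for u := range users {
			list = append(list, u)
		}
		sort.Strings(list)
		dependedOnBy[dep] = list
	}
	return dependedOnBy
}

func main() {
	dependsOn := map[string][]string{
		"example.com/app":  {"example.com/lib", "fmt"},
		"example.com/tool": {"example.com/lib"},
		"example.com/lib":  {"strings"},
	}
	rdeps := invert(dependsOn)
	// Query: which packages depend on example.com/lib?
	fmt.Println(rdeps["example.com/lib"]) // [example.com/app example.com/tool]
}
```

Sets are used while inverting so that duplicate entries in a dependency list produce a single reverse edge.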
As a side effect we could also record for each repository the import paths of all the Go packages defined in that repository.
Problem steps:
- List the repositories to examine.
- Fetch each repository (maybe shallowly?)
- Scan the repository for Go packages and record their deps.
- Merge, dedup, invert.
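For the merge-and-dedup step, one option (an assumption, not something these notes settle) is to union the per-repository maps, since the same import path can show up in several repositories, e.g. forks or copied code. merge is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// merge unions the depends-on maps produced by scanning each repository.
// The same import path can be reported by several repositories, so
// dependency lists are unioned, deduplicated, and sorted. Packages with
// no dependencies are kept with an empty list.
func merge(perRepo []map[string][]string) map[string][]string {
	sets := make(map[string]map[string]bool)
	for _, m := range perRepo {
		for pkg, deps := range m {
			if sets[pkg] == nil {
				sets[pkg] = make(map[string]bool)
			}
			for _, d := range deps {
				sets[pkg][d] = true
			}
		}
	}
	merged := make(map[string][]string, len(sets))
	for pkg, set := range sets {
		deps := make([]string, 0, len(set))
		for d := range set {
			deps = append(deps, d)
		}
		sort.Strings(deps)
		merged[pkg] = deps
	}
	return merged
}

func main() {
	repoA := map[string][]string{"example.com/lib": {"fmt", "strings"}}
	repoB := map[string][]string{
		"example.com/lib": {"strings", "os"}, // e.g. a fork with a newer copy
		"example.com/cmd": {"example.com/lib"},
	}
	merged := merge([]map[string][]string{repoA, repoB})
	fmt.Println(merged["example.com/lib"]) // [fmt os strings]
}
```

This deliberately loses which repository reported each package; keying the map by (repository, import path) instead would keep the ipath-to-repository mapping mentioned further down.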
How to list the repos:
- https://github.com/golang/gddo/wiki/API
  curl -L https://api.godoc.org/packages | tee capture.json
  jq -r '.results[].path' capture.json
- Per the documentation, "This API returns all packages, including packages with errors, vendored packages, internal packages and more."
Resolving vanity URLs:
- HTTP GET <url>?go-get=1, following redirects if necessary until you get a 200.
- Parse <meta name="go-import" content="<import-path> <vcs> <fetch-url>">
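A rough Go sketch of the resolution steps. The regular expression is a simplification that assumes name comes directly before content and that both are double-quoted; a real tool would parse the HTML properly. The names resolve, match and parseGoImports are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"regexp"
	"strings"
)

// goImport is one parsed <meta name="go-import" content="..."> tag.
type goImport struct {
	Prefix, VCS, RepoURL string
}

// metaRE is a simplification: it assumes name= comes right before content=
// and that both use double quotes.
var metaRE = regexp.MustCompile(`<meta\s+name="go-import"\s+content="([^"]*)"`)

// parseGoImports returns every go-import tag found in an HTML page.
// Fields beyond the first three are ignored.
func parseGoImports(body string) []goImport {
	var out []goImport
	for _, m := range metaRE.FindAllStringSubmatch(body, -1) {
		f := strings.Fields(m[1])
		if len(f) < 3 {
			continue // malformed content attribute
		}
		out = append(out, goImport{Prefix: f[0], VCS: f[1], RepoURL: f[2]})
	}
	return out
}

// match picks the tag whose prefix is importPath itself or a parent path of it.
func match(tags []goImport, importPath string) (goImport, bool) {
	for _, t := range tags {
		if importPath == t.Prefix || strings.HasPrefix(importPath, t.Prefix+"/") {
			return t, true
		}
	}
	return goImport{}, false
}

// resolve fetches https://<importPath>?go-get=1. http.Get follows
// redirects itself (up to 10), so a 200 here is the final page.
func resolve(importPath string) (goImport, error) {
	resp, err := http.Get("https://" + importPath + "?go-get=1")
	if err != nil {
		return goImport{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return goImport{}, fmt.Errorf("%s: %s", importPath, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return goImport{}, err
	}
	t, ok := match(parseGoImports(string(body)), importPath)
	if !ok {
		return goImport{}, fmt.Errorf("%s: no matching go-import tag", importPath)
	}
	return t, nil
}

func main() {
	for _, path := range os.Args[1:] {
		t, err := resolve(path)
		if err != nil {
			log.Print(err)
			continue
		}
		fmt.Println(path, t.VCS, t.RepoURL)
	}
}
```

The prefix check in match requires a "/" boundary, so a tag for example.org/pkg does not claim example.org/pkgx.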
Tools: https://github.com/creachadair/repodeps. It currently does not use module information at all.
Per Alex: ~42K repositories on GitHub have go.mod files in the root (not in a vendor directory, for example).
- Use the module files to record which versions of each import path are being used.
- Include file content digests in the index, so that dependencies can be matched against file contents during/near a parse.
- Plugin model for other languages: Run in the root of a repo, do whatever you have to do, spit out dependencies in this format. Should work for Python, maybe others (though ipath format will vary; we might not care).
- Preserve mapping from ipath to repository during indexing ("which repositories provide this package?")
- PyPI: https://warehouse.readthedocs.io/api-reference/#
- Adoption: How many external dependencies are there on packages in my org? This one package?
- Migration: How many depend on Old instead of New, and how does it change over time?
- Breakdown of transitive dependencies by org: Which non-standard packages are "crucial" to the health of the ecosystem?
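The adoption question can be answered directly from the inverted depended-on-by matrix. A sketch, assuming an org is identified by an import-path prefix such as the hypothetical github.com/myorg; adoption and inOrg are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// inOrg reports whether ipath lies under org, an import-path prefix such as
// "github.com/myorg" (hypothetical). The "/" check stops "github.com/myorgx"
// from matching.
func inOrg(ipath, org string) bool {
	return ipath == org || strings.HasPrefix(ipath, org+"/")
}

// adoption takes a depended-on-by matrix and returns, for each package in
// org, the number of distinct packages outside org that depend on it, plus
// the number of distinct outside packages that depend on anything in org.
func adoption(dependedOnBy map[string][]string, org string) (map[string]int, int) {
	perPkg := make(map[string]int)
	all := make(map[string]bool)
	for pkg, users := range dependedOnBy {
		if !inOrg(pkg, org) {
			continue
		}
		external := make(map[string]bool)
		for _, u := range users {
			if !inOrg(u, org) {
				external[u] = true
				all[u] = true
			}
		}
		perPkg[pkg] = len(external)
	}
	return perPkg, len(all)
}

func main() {
	dependedOnBy := map[string][]string{
		"github.com/myorg/lib":  {"github.com/other/app", "github.com/myorg/tool"},
		"github.com/myorg/util": {"github.com/other/app", "github.com/third/svc"},
		"github.com/else/pkg":   {"github.com/myorg/lib"},
	}
	perPkg, total := adoption(dependedOnBy, "github.com/myorg")
	fmt.Println(perPkg["github.com/myorg/lib"], perPkg["github.com/myorg/util"], total) // 1 2 2
}
```

perPkg answers "this one package?"; total counts distinct outside importers across the whole org, so a package that imports several of the org's packages is counted once.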
To be precise, right now a GitHub search for filename:go extension:mod path:/ returns 42287 results.