Last active September 9, 2024 23:53
GitHub GraphQL - Get files in a repository

Get GitHub Files

Get the metadata and content of all files in a given GitHub repo using the GraphQL API

You might want to get a tree summary of files in a repo without downloading the repo, or maybe you want to look up the contents of a file again without downloading the whole repo.

The approach here is to query data from GitHub using the GitHub V4 GraphQL API.

About the query


In the sample GQL file in this gist, I included some useful attributes about files in a GitHub repo. The query can be modified to work with any repo you have read access to.

  • name
    • File or path name.
  • mode
    • Usually 16384 (octal 040000, a directory) or 33188 (octal 100644, a regular non-executable file).
  • type
    • blob for text or binary files.
    • tree for a directory path.
  • text
    • This is the content of your file. For larger files, this field will of course make your JSON response very long.
    • From the schema: "UTF8 text data or null if the Blob is binary".
    • Includes \n for line breaks in text. Note your code might have "\n" in strings too.
  • isBinary
    • Useful if you want to separate file types, or to avoid counting lines in a binary file.
    • Binary might be images or compiled files.
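
The mode values are easier to read in octal, which is how Git itself writes them. Here is a minimal Python sketch using only the standard `stat` module. The helper name `describe_mode` is made up for illustration and is not part of the gist.

```python
import stat


def describe_mode(mode: int) -> str:
    """Describe a tree entry mode that the API returned as a decimal integer."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISREG(mode):
        kind = "file"
    else:
        kind = "other"  # e.g. symlinks or submodules
    # Git prints modes as six octal digits.
    return f"{oct(mode)[2:].zfill(6)} {kind}"


print(describe_mode(16384))
print(describe_mode(33188))
```

An executable file would typically show up as 100755 instead of 100644.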


  • Unfortunately I could not find summary values for number of files or a count of the number of lines, so you have to work those out yourself.
  • Regarding the expression value for object:
    • See expression or GitObject in the Object reference docs.

      "A Git revision expression suitable for rev-parse".

    • Choose a commit reference and add a colon e.g. "HEAD:". You can use master or a commit ID instead of HEAD.
    • You will only get objects at the repo root though, unless you use a nested query or choose a path, e.g. "master:docs/".
    • You can also use a nested query to get multiple levels down, as in the second GQL file below. But I can't see a way to nest this recursively, and a Fragment doesn't let you nest it in itself.
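
Since the query itself cannot recurse, one option is to recurse on the client and run one query per directory, changing the path in the expression each time. The sketch below is only an outline of that idea. `fetch_entries` is a hypothetical callable you would write yourself (it runs the query for a given expression and returns the `entries` list), and here it is faked with a dictionary so the example runs offline.

```python
def walk(fetch_entries, rev="HEAD", path=""):
    """Yield the path of every blob, issuing one query per directory.

    fetch_entries is a hypothetical callable: it takes an expression such as
    "HEAD:docs/" and returns that tree's entries as a list of dicts.
    """
    for entry in fetch_entries(f"{rev}:{path}"):
        child = path + entry["name"]
        if entry["type"] == "tree":
            # Descend by asking for the subdirectory's own entries.
            yield from walk(fetch_entries, rev, child + "/")
        elif entry["type"] == "blob":
            yield child


# Fake responses keyed by expression, standing in for real API calls.
FAKE = {
    "HEAD:": [{"name": "README.md", "type": "blob"}, {"name": "docs", "type": "tree"}],
    "HEAD:docs/": [{"name": "index.md", "type": "blob"}],
}
print(list(walk(FAKE.__getitem__)))
```

The trade-off is one request per directory, so a deep repo costs more requests than a single nested query would.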

How to use the query


Try the query out in the explorer.

  1. Go to the explorer and sign in - V4 explorer
  2. Paste the GQL query from get_github_files.gql into the main pane.
  3. Paste the sample JSON from sample_params.json into the query variables pane.
  4. Press the play/arrow button to run.


Use curl, or a library in Python, Ruby, etc.

Here is a generic example from the GitHub docs. As written it will fail, though, because the auth token is only a placeholder. You must generate and pass an auth token for GraphQL, whereas the REST API lets you make requests without one (within limits).

$ curl \
  -d '{ "query": "query { viewer { login } }" }' \
  -H "Authorization: bearer token" \
  https://api.github.com/graphql
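
The same request can be made from Python with just the standard library. This is a sketch, not the only way to do it. The function names `build_request` and `run` are my own, and the token is something you have to supply yourself.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"


def build_request(query: str, variables: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the GraphQL endpoint."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run(query: str, variables: dict, token: str) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, variables, token)) as resp:
        return json.load(resp)
```

You would call `run` with the contents of get_github_files.gql, the dictionary from sample_params.json, and your own token.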

Sample output

After executing get_github_files.gql.

Simplified JSON output

"entries": [
  {
    "name": ".gitignore",
    "type": "blob",
    "object": {
      "byteSize": 32,
      "text": "node_modules/\npackage-lock.json\n"
    }
  },
  {
    "name": ".vscode",
    "type": "tree",
    "object": {}
  },
  {
    "name": "",
    "type": "blob",
    "object": {
      "byteSize": 1520,
      "text": "..."
    }
  }
]

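As there are no summary fields for file or line counts, they can be worked out from the entries in a response like the one above. A minimal Python sketch, assuming each entry has `type` and `object` keys as in the sample. The name `summarise` is made up for this example.

```python
def summarise(entries: list) -> dict:
    """Count files, directories and text lines in a list of tree entries."""
    files = [e for e in entries if e["type"] == "blob"]
    # "text" is null (None) for binary blobs, so those are skipped below.
    texts = [e["object"].get("text") for e in files]
    return {
        "files": len(files),
        "dirs": sum(1 for e in entries if e["type"] == "tree"),
        "lines": sum(len(t.splitlines()) for t in texts if t is not None),
    }


sample = [
    {"name": ".gitignore", "type": "blob",
     "object": {"byteSize": 32, "text": "node_modules/\npackage-lock.json\n"}},
    {"name": ".vscode", "type": "tree", "object": {}},
]
print(summarise(sample))
```

This only covers the directory level that the entries came from, so the counts are per level unless you combine several responses.
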

  • Thanks to this gist by @johndevs, for getting me going with using the Tree and Blob structure.
  • See Intro to GraphQL on the GraphQL website.
  • GitHub GraphQL in my Dev Resources.
get_github_files.gql

query RepoFiles($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: "HEAD:") {
      ... on Tree {
        entries {
          name type mode
          object {
            ... on Blob { byteSize text isBinary }
          }
        }
      }
    }
  }
}

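get_github_files_nested.gql

query RepoFiles($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: "HEAD:") {
      # Top-level.
      ... on Tree {
        entries {
          name type mode
          object {
            ... on Blob { byteSize text isBinary }
            # One level down.
            ... on Tree {
              entries {
                name type mode
                object {
                  ... on Blob { byteSize text isBinary }
                }
              }
            }
          }
        }
      }
    }
  }
}
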
sample_params.json

{ "owner": "MichaelCurrin", "name": "python-twitter-guide" }

Copy link

sorokinvj commented Dec 15, 2020

I figured out a way to query a specific directory: you just change "master:" to "master:path/to/dir".

Copy link

Thanks for the tip. I've added a note to the md file for this.

Copy link

I added a second GQL file for nested objects

Copy link

jxmot commented Dec 17, 2020

I'm getting errors running the get_github_files_nested.gql query.

Copy link

Sorry, I fixed it up so it runs.

Copy link

I'm curious to see if anyone had implemented a multi level query somehow.

Copy link

@MichaelCurrin is it possible to paginate this request? Getting timeout when trying to fetch thousands of files. Can't seem to get first/last/offset to work as per normal GQL queries

Copy link

@jmcallister-msft sorry I can't see how.

object accepts two params: expression, as in object(expression: "HEAD:"), and oid. There is nothing like first or last.

Copy link

Came up with a workaround for this. Solution was to use GraphQL aliasing and manually batch the requests. So the flow is:

  1. Fetch all file names with a query similar to get_github_files.gql
  2. For each group of n files, fetch the file content directly but in one request with aliasing. The query looks something like this:
query RepoFiles {
  repository(owner: "username", name: "repository") {
    file1: object(expression: "HEAD:test/filename1.txt") {
      ... on Blob { text }
    }
    file2: object(expression: "HEAD:test/filename2.txt") {
      ... on Blob { text }
    }
  }
}

The GitHub API will treat step 2 as only 1-2 request "points" toward the rate limit (not a request point per file)
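
The aliases do not have to be typed by hand. Here is a rough Python sketch that generates the aliased query from a list of paths and splits the list into groups of n. The function names are made up for illustration, and using `json.dumps` to quote the strings is my own shortcut that should be fine for ordinary file paths.

```python
import json


def build_alias_query(owner: str, name: str, paths: list, rev: str = "HEAD") -> str:
    """Return one query that fetches the text of each path under its own alias."""
    fields = []
    for i, path in enumerate(paths, start=1):
        expression = json.dumps(f"{rev}:{path}")  # quoted and escaped string
        fields.append(
            f"    file{i}: object(expression: {expression}) {{ ... on Blob {{ text }} }}"
        )
    return (
        "query RepoFiles {\n"
        f"  repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
        + "\n".join(fields)
        + "\n  }\n}"
    )


def batches(items: list, size: int):
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


paths = ["test/filename1.txt", "test/filename2.txt"]
for group in batches(paths, 50):
    print(build_alias_query("username", "repository", group))
```

The response then has one key per alias (file1, file2, ...), so keep the path list around to map aliases back to file names.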

Copy link

ahmafi commented Jan 15, 2022

Is there any way to get all the files in a repository with 1 request without nesting?

Copy link

MichaelCurrin commented Jan 16, 2022

This looks like a job for


I have a repo of over 1000 files.

$ git ls-files | wc -l

And I found this endpoint which gives blobs (files) and trees (directory paths)

Its response includes index 1362, without paging.
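
The endpoint is not named here, so the following rests on an assumption: that it is the REST "Get a tree" call with the recursive option, whose JSON response has a `tree` list (each item with `path` and `type`) and a `truncated` flag. Under that assumption, picking out the file paths could look like this Python sketch. The sample response is hand-written, not real output.

```python
def blob_paths(tree_response: dict) -> list:
    """Return the paths of all blobs in a recursive tree listing.

    Assumes the response shape of the REST "Get a tree" endpoint.
    """
    if tree_response.get("truncated"):
        # The listing is incomplete, so do not silently return a partial result.
        raise ValueError("tree listing was truncated")
    return [item["path"] for item in tree_response["tree"] if item["type"] == "blob"]


# Hand-written sample in the assumed shape.
sample_response = {
    "truncated": False,
    "tree": [
        {"path": "README.md", "type": "blob"},
        {"path": "docs", "type": "tree"},
        {"path": "docs/index.md", "type": "blob"},
    ],
}
print(blob_paths(sample_response))
```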

Copy link

MichaelCurrin commented Jan 16, 2022

I found an option for GQL for getting paths recursively.

But it is limited to a finite number of nested steps. So you have to add more lines to the query to get deeper paths, at the risk of not having enough depth in the query.

So that is close to the content of my gist anyway.

  repository(owner: "MyLogin", name: "MyRepo") {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 1 until: "2019-10-08T00:00:00") {
            nodes {
              tree {
                entries {
                  object {
                    ... on Tree {
                      entries {
                          ...on Tree{
                                ...on Tree{

Copy link

Is there any way to create multiple aliases dynamically?

Copy link

I wrote a little snippet to generate a query for fetching all files, that is generated based on the output of the recursive tree fetch call from github:

Copy link

I want to get a specific file on root dir:

Copy link

@claudiu-cristea I got this to work based on answer by @jmcallister-msft above

query RepoFiles {
  repository(owner: "MichaelCurrin", name: "python-twitter-guide") {
    object(expression: "") {
      ... on Blob {

Copy link

@MichaelCurrin great, that works. Now I want to filter to only show repositories having a certain file

Copy link

@claudiu-cristea you could repeat the snippet for a bunch of repos you enter by hand. Maybe the output will be empty for the repos where it does not exist - see what happens. If the repo doesn't have that file, then take the repo out of the query. I don't know how dynamic you want your solution to be or what kind of files you are looking for and if the file would ever get deleted/moved/renamed.

You could also do something fancier by iterating over a list of repos and checking if the file exists, but I don't see that being so useful.
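
For what it's worth, iterating over a list of repos could be sketched in Python as below, with one alias per repository in a single query. This assumes that `object` comes back as null when the path does not exist, which is my understanding but worth checking against a real response. The function names and the repo list are made up.

```python
import json


def build_exists_query(repos: list, path: str, rev: str = "HEAD") -> str:
    """Build one query that looks up the same path in several repos."""
    expression = json.dumps(f"{rev}:{path}")
    parts = []
    for i, (owner, name) in enumerate(repos):
        parts.append(
            f"  repo{i}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            f"    nameWithOwner\n"
            f"    object(expression: {expression}) {{ ... on Blob {{ byteSize }} }}\n"
            f"  }}"
        )
    return "query {\n" + "\n".join(parts) + "\n}"


def repos_with_file(data: dict) -> list:
    """Keep only repos whose aliased result contains a non-empty object."""
    return [r["nameWithOwner"] for r in data.values() if r and r.get("object")]


# Hand-written stand-in for the "data" part of a response.
fake_data = {
    "repo0": {"nameWithOwner": "a/b", "object": {"byteSize": 10}},
    "repo1": {"nameWithOwner": "c/d", "object": None},
}
print(repos_with_file(fake_data))
```

A path that points at a directory gives an empty object under the Blob fragment, so it is filtered out as well.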

Copy link

prpanto commented Feb 27, 2024

Look at this too here

Copy link

@prpanto thanks, that looks close to what I commented above except master instead of HEAD. And for their case they check and as separate objects.
