Panel App has a RESTful API that allows programmatic access to its data. The best place to start exploring the API is the documentation at https://panelapp.genomicsengland.co.uk/api/docs/, which uses Swagger. Swagger is an open-source software framework that helps developers design, build, and document RESTful web services.
The documentation is split into two parts:
- The first section describes the API endpoints for the different queries.
- The second section describes the Models returned by each endpoint, i.e. how the returned JSON is formatted.
The cool thing about Swagger documentation is that it is interactive. Select the endpoint you want to investigate, click the "Try it out" button, enter the query you want to test, and then click "execute". It will return the following:
- Curl - The Curl command to query the endpoint from the command line
- Request URL - The URL for the endpoint, for example https://panelapp.genomicsengland.co.uk/api/v1/panels/?page=1
- Response Body - The data you requested in JSON format
- Response Headers - Various metadata about the request
Only GET endpoints are available to the user via the Panel App API: you can read data but not amend it. The basic structure of all queries is:
- GET /items: This endpoint fetches a list of all 'items'. The 'GET' method is used to read information.
- GET /items/{id}: This endpoint fetches a specific 'item' based on its unique ID.
Here 'items' would be a resource such as panels, genes, or regions.
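The two URL shapes above can be sketched in Python. This is only an illustration: the helper names list_url and detail_url are hypothetical, the base URL is copied from the example URLs in this document, and the ID used is a placeholder. Check the Swagger documentation for the exact path and identifier each endpoint expects.

```python
# Base URL and version prefix, as used in the example URLs in this document.
BASE_URL = "https://panelapp.genomicsengland.co.uk/api/v1"


def list_url(items: str) -> str:
    """Build the 'GET /items' URL, which lists the records of a resource."""
    return f"{BASE_URL}/{items}/"


def detail_url(items: str, item_id) -> str:
    """Build the 'GET /items/{id}' URL for a single record (hypothetical helper)."""
    return f"{BASE_URL}/{items}/{item_id}/"


print(list_url("panels"))
print(detail_url("panels", 1))  # the ID 1 is a placeholder, not a known panel
```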
Let's look at an example endpoint which returns data on the panels in Panel App:
https://panelapp.genomicsengland.co.uk/api/v1/panels/?page=1
RESTful APIs have human-readable endpoints which are built in a logical way:
- https://panelapp.genomicsengland.co.uk/api/ - The base URL to access the API.
- /v1/ - It is best practice to version APIs; this ensures future changes to the API do not break legacy code.
- /panels/ - The endpoint for the data you would like to access.
- ?page=1 - The page of the returned data. Pagination in APIs is a technique for handling large sets of data by dividing the data into smaller, manageable chunks, or "pages".
To return all data from a paginated API, you'll typically have to write a loop that makes a request for each page and combines the results. If you do not do this then you may only be accessing the first "chunk" of data, rather than the complete data set.
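As a rough sketch of such a loop, the generator below keeps requesting pages until there is no next page. It assumes each page is a JSON object with a "results" list and a "next" URL that is null on the last page, which is the shape the other examples in this document rely on. The names fetch_json and iter_results are hypothetical, and the fetch function is passed in as a parameter so the loop can be tried without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch one page and decode its JSON body (standard library only)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_results(first_url, fetch=fetch_json):
    """Yield every record from a paginated endpoint by following 'next'."""
    url = first_url
    while url:  # 'next' is null (None in Python) on the last page
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")


# Example call against the live API (not run here):
# for gene in iter_results("https://panelapp.genomicsengland.co.uk/api/v1/genes/?page=1"):
#     print(gene["entity_name"])
```

Because the function is a generator, records are produced page by page, so the caller can stop early without downloading the whole data set.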
Models describe the nested structure of the JSON file, the keys used to identify values, and the data type of the values. This information is useful when you need to serialize or parse the JSON file.
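One way to use a Model in practice is to mirror it as a small Python class. The sketch below is an assumption-laden illustration: the class GeneEntry is hypothetical, the sample record is invented, and only three fields are included (entity_name, confidence_level and the nested panel name, all treated as strings). Consult the Models section of the Swagger documentation for the full, authoritative field list and types.

```python
import json
from dataclasses import dataclass


@dataclass
class GeneEntry:
    """A trimmed-down, hypothetical view of one gene record."""

    entity_name: str
    confidence_level: str
    panel_name: str

    @classmethod
    def from_json(cls, record: dict) -> "GeneEntry":
        # The panel name is nested one level down, under the "panel" key.
        return cls(
            entity_name=record["entity_name"],
            confidence_level=record["confidence_level"],
            panel_name=record["panel"]["name"],
        )


# Invented record, reduced to the three fields above; real records carry more keys.
sample = json.loads(
    '{"entity_name": "GENE1", "confidence_level": "High", "panel": {"name": "Example panel"}}'
)
entry = GeneEntry.from_json(sample)
print(entry)
```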
The response headers return various metadata about the request. HTTP status codes, which are returned at the start of every HTTP response, provide a standardized way for servers to inform clients about the status of their request. The standard codes to expect when using this API are:
- 200 OK (request successful)
- 400 Bad Request
- 404 Not Found
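A client can branch on these three codes before it tries to parse the body. The function below is a minimal sketch; the name check_status, the choice of exception types and the wording of the messages are all illustrative assumptions rather than anything defined by the API.

```python
from http import HTTPStatus


def check_status(code: int) -> None:
    """Return quietly for 200; otherwise raise an error naming the problem."""
    if code == HTTPStatus.OK:
        return
    if code == HTTPStatus.BAD_REQUEST:
        raise ValueError("400 Bad Request: check the query parameters")
    if code == HTTPStatus.NOT_FOUND:
        raise LookupError("404 Not Found: check the endpoint path and ID")
    raise RuntimeError(f"Unexpected HTTP status: {code}")


check_status(200)  # passes silently
```

With the requests library, the same number is available as response.status_code, and response.raise_for_status() raises an exception for any 4xx or 5xx code.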
All data is returned as a JSON object. It is likely you will need to parse this JSON object to obtain the data you want.
If you are parsing JSON on the command line, ensure you have jq installed. If not, install it:
sudo apt-get update
sudo apt-get install jq
jq is a utility for filtering JSON objects; the manual can be found at https://stedolan.github.io/jq/manual/
We can use curl to access the endpoint:
curl -X GET "https://panelapp.genomicsengland.co.uk/api/v1/genes/"
We will use jq to filter the JSON object returned from the genes endpoint. The command jq "." passes all the data through unchanged, but adds syntax colouring to make the output more readable:
curl -X GET "https://panelapp.genomicsengland.co.uk/api/v1/genes/" | jq "."
In the following examples, we'll use the jq tool to filter and transform data from the API endpoint. NOTE that these queries only retrieve the first page of the data:
- Extract gene names and confidence levels for each gene:
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.results[] | .entity_name + ": " + .confidence_level'
- Extract gene names and associated panel names:
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.results[] | .entity_name + ": " + .panel.name'
- Filter genes with a specific confidence level ("High" in this example):
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.results[] | select(.confidence_level == "High") | .entity_name'
- Extract the total count of genes returned by the API:
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.count'
jq is a very powerful tool and can perform complex transformations and queries on JSON data.
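For comparison, the four jq filters above can be expressed in Python once a page has been decoded into a dictionary. The page below is invented for illustration and only contains the keys those filters touch; the gene names, panel names and confidence values are placeholders, so check the Models section for the values the API actually returns.

```python
# Hypothetical page with the same keys the jq filters rely on.
page = {
    "count": 2,
    "next": None,
    "results": [
        {"entity_name": "GENE1", "confidence_level": "High", "panel": {"name": "Panel A"}},
        {"entity_name": "GENE2", "confidence_level": "Low", "panel": {"name": "Panel B"}},
    ],
}

# jq: .results[] | .entity_name + ": " + .confidence_level
name_and_confidence = [
    f'{gene["entity_name"]}: {gene["confidence_level"]}' for gene in page["results"]
]

# jq: .results[] | .entity_name + ": " + .panel.name
name_and_panel = [
    f'{gene["entity_name"]}: {gene["panel"]["name"]}' for gene in page["results"]
]

# jq: .results[] | select(.confidence_level == "High") | .entity_name
high_confidence = [
    gene["entity_name"] for gene in page["results"] if gene["confidence_level"] == "High"
]

# jq: .count
total = page["count"]

print(name_and_confidence, name_and_panel, high_confidence, total)
```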
To download paginated data you will need to use a while loop to iterate through the pages. This can be easier in Python, where you can parse the data into a dataframe or database and add error-handling code. One way to do it in bash would be something like the approach below:
url="https://panelapp.genomicsengland.co.uk/api/v1/genes/?page=1"
# jq -r prints the string "null" when there is no next page
while [[ "$url" != "null" ]]; do
    response=$(curl -s "$url")
    echo "$response" | jq -r '.results[].entity_name'
    url=$(echo "$response" | jq -r '.next')
done
Below are two simple Python functions that return all the data from two different endpoints: one that uses pagination, and one that does not.
import requests
import pandas as pd


# Example function using a loop to load all pages for paginated endpoints
def get_panel_app_list():
    """
    Queries the Panel App API to return details on all signed off Panels
    :return: Pandas dataframe, Columns: id, hash_id, name, disease_group, disease_sub_group, status, version, version_created, relevant_disorders, types, stats.number_of_genes, stats.number_of_strs, stats.number_of_regions
    :rtype: pandas dataframe
    """
    server = "https://panelapp.genomicsengland.co.uk"
    ext = "/api/v1/panels/signedoff/"
    headers = {"Content-Type": "application/json"}
    r = requests.get(server + ext, headers=headers)
    # Raise an informative HTTPError if a bad request is returned
    r.raise_for_status()
    page = r.json()
    # Flatten each record; nested keys become dotted column names,
    # e.g. stats.number_of_genes
    frames = [pd.json_normalize(page, record_path=["results"])]
    # Iterate over the remaining pages of data
    while page["next"] is not None:
        r = requests.get(page["next"], headers=headers)
        r.raise_for_status()
        page = r.json()
        frames.append(pd.json_normalize(page, record_path=["results"]))
    # DataFrame.append was removed in pandas 2.0, so combine pages with pd.concat
    return pd.concat(frames, ignore_index=True)
# Example function not using pagination
def get_panel_app_genes(panel_id, panel_version, genome_build):
    """
    Queries the Panel App API to return the gene symbols in one version of a panel
    :return: list of gene symbols
    :rtype: list
    """
    # NOTE: genome_build is accepted but is not used by this query
    server = "https://panelapp.genomicsengland.co.uk"
    ext = f"/api/v1/panels/{panel_id}/genes/?version={panel_version}"
    print(f"{server}{ext}")
    r = requests.get(server + ext, headers={"Content-Type": "application/json"})
    # Raise an informative HTTPError if a bad request is returned;
    # raise_for_status() does nothing when the request succeeded
    r.raise_for_status()
    decoded = r.json()
    gene_list = []
    for entry in decoded.get("results", []):
        gene_list.append(entry.get("gene_data", {}).get("gene_symbol"))
    return gene_list