@wbazant
Created August 1, 2018 13:36
cross-references

Cross-references pipeline

The pipeline connects our gene IDs with identifiers from other resources. Desired result:

WBGene00001135
Probable vesicular glutamate transporter eat-4 [Source:UniProtKB/Swiss-Prot;Acc:P34644]

| External database | Database identifier |
| --- | --- |
| UniGene | Cel.19624 |
| EntrezGene | Probable vesicular glutamate transporter eat-4 |

Pipeline overview

  • Download the sources from other resources
  • Parse the source files and pick relevant parts
  • Map to our genes
  • Extract results
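
The four steps above can be sketched as a chain of small functions. This is only an illustration of the shape of the pipeline, not the actual implementation: `run_pipeline`, `fake_download`, `parse_tsv` and `map_records` are hypothetical names, and the tab-separated input is an assumed stand-in for a real downloaded source file.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# (our gene ID, external database name, external identifier)
Xref = Tuple[str, str, str]


def run_pipeline(
    download: Callable[[], str],
    parse: Callable[[str], Iterable[Dict[str, str]]],
    map_to_genes: Callable[[Iterable[Dict[str, str]]], Iterable[Xref]],
) -> List[Xref]:
    """Run the stages in order and return de-duplicated, sorted results."""
    raw = download()               # 1. download the source
    records = parse(raw)           # 2. parse and pick the relevant parts
    xrefs = map_to_genes(records)  # 3. map to our genes
    return sorted(set(xrefs))      # 4. extract results


# Hypothetical stand-ins for one source, so the sketch runs without network access.
def fake_download() -> str:
    return "WBGene00001135\tUniGene\tCel.19624\n"


def parse_tsv(raw: str) -> Iterator[Dict[str, str]]:
    for line in raw.splitlines():
        gene_id, database, identifier = line.split("\t")
        yield {"gene": gene_id, "database": database, "id": identifier}


def map_records(records: Iterable[Dict[str, str]]) -> Iterator[Xref]:
    for record in records:
        yield (record["gene"], record["database"], record["id"])


results = run_pipeline(fake_download, parse_tsv, map_records)
```

Passing the stages in as functions keeps each source's download and parsing logic separate from the shared mapping and extraction steps.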

Kinds of references

Direct (through IDs)

E.g. RefSeq

LOCUS NP_499023 576 aa linear INV 05-JUN-2018  
DEFINITION Probable vesicular glutamate transporter eat-4 [Caenorhabditis  
elegans].  
ACCESSION NP_499023  
...  
[CDS](https://www.ncbi.nlm.nih.gov/nuccore/NM_066622.4?from=1&to=1731) 1..576  
...  
/db_xref="WormBase:[WBGene00001135](https://www.wormbase.org/search/gene/WBGene00001135)"  
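
A direct reference like the one above can be picked out of the record text with two regular expressions. The following is a minimal sketch, assuming the record arrives as plain flat-file text with the `ACCESSION` line and `/db_xref` qualifiers as shown; `direct_xrefs` is a hypothetical helper name, not part of any existing library.

```python
import re
from typing import List, Tuple

# The accession is the first token after the ACCESSION keyword.
ACCESSION_RE = re.compile(r"^\s*ACCESSION\s+(\S+)", re.MULTILINE)
# Qualifiers look like /db_xref="Database:identifier".
DB_XREF_RE = re.compile(r'/db_xref="([^:"]+):([^"]+)"')


def direct_xrefs(record_text: str, source_db: str = "WormBase") -> List[Tuple[str, str]]:
    """Return (our gene ID, external accession) pairs found in one record."""
    accession = ACCESSION_RE.search(record_text)
    if accession is None:
        return []
    return [
        (identifier, accession.group(1))
        for database, identifier in DB_XREF_RE.findall(record_text)
        if database == source_db
    ]


record = '''LOCUS NP_499023 576 aa linear INV 05-JUN-2018
ACCESSION NP_499023
CDS 1..576
/db_xref="WormBase:WBGene00001135"
'''
pairs = direct_xrefs(record)
```

Qualifiers that point at other databases are ignored here, because only the `WormBase` ones identify our genes directly.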

Checksum

E.g. UniParc, RNAcentral

URS0000000001 6bba097c8c39ed9a0fdf02273ee1c79a  
URS0000000002 1fe2f0e3c3a2d6d708ac98e9bfb1d7a8  
URS0000000003 7bb11d0572ff6bb42427ce74450ba564  
...  
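
Matching by checksum could look roughly like this. The sketch assumes that the 32-character hex values are MD5 digests of the sequence, and that sequences are normalised by removing whitespace and upper-casing before hashing; both are assumptions and should be checked against each resource's documentation. All function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def sequence_checksum(sequence: str) -> str:
    """MD5 hex digest of a sequence, ignoring whitespace and letter case (assumed rule)."""
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def parse_checksum_table(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'identifier checksum' lines into a checksum -> identifier lookup."""
    table: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            external_id, checksum = parts
            table[checksum] = external_id
    return table


def match_by_checksum(our_sequences: Dict[str, str], table: Dict[str, str]) -> Dict[str, str]:
    """Map our IDs to external IDs whose checksum equals that of our sequence."""
    matches: Dict[str, str] = {}
    for our_id, sequence in our_sequences.items():
        checksum = sequence_checksum(sequence)
        if checksum in table:
            matches[our_id] = table[checksum]
    return matches
```

A checksum lookup only finds exactly identical sequences, but it avoids downloading or comparing the sequences themselves.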

Sequence match

E.g. UniProt

>sp|P34644  
MSSWNEAWDRGKQMVGEPLAKMTAAAASATGAAPPQQMQEEGNENPMQMHSNKVLQVMEQ  
TWIGKCRKRWLLAILANMGFMISFGIRCNFGAAKTHMYKNYTDPYGKVHMHEFNWTIDEL  
...  
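
One simple form of sequence matching is exact identity between an external protein sequence and one of ours. The sketch below assumes FASTA input with headers like `>sp|P34644`, where the accession is the second `|`-separated field; a real pipeline may well use alignment instead of exact matching, which this example does not attempt. `read_fasta` and `match_by_sequence` are hypothetical names.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into an accession -> sequence dictionary."""
    records: Dict[str, str] = {}
    accession = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                records[accession] = "".join(chunks)
            header = line[1:]
            fields = header.split("|")
            # ">sp|P34644" -> "P34644"; headers without "|" are used whole.
            accession = (fields[1] if len(fields) > 1 else header).strip()
            chunks = []
        elif accession is not None:
            chunks.append(line)
    if accession is not None:
        records[accession] = "".join(chunks)
    return records


def match_by_sequence(ours: Dict[str, str], external: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each of our IDs to the external accessions with an identical sequence."""
    by_sequence: Dict[str, List[str]] = {}
    for accession, sequence in external.items():
        by_sequence.setdefault(sequence.upper(), []).append(accession)
    return {
        our_id: sorted(by_sequence[sequence.upper()])
        for our_id, sequence in ours.items()
        if sequence.upper() in by_sequence
    }
```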

Dependent / transitive

E.g. RefSeq records also mention WikiGene IDs, which can be attached to a gene once the RefSeq record itself is mapped.
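
Transitive references can be derived by composing two lookups: our gene to a directly mapped record, then that record to the identifiers it mentions. This is a sketch under the assumption that both mappings are already available as dictionaries; `transitive_xrefs` is a hypothetical name and `wikigene-placeholder` is a made-up value used only for illustration.

```python
from typing import Dict, Iterable, Set


def transitive_xrefs(
    direct: Dict[str, Iterable[str]],
    dependent: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Follow gene -> accession -> dependent identifier links."""
    result: Dict[str, Set[str]] = {}
    for gene_id, accessions in direct.items():
        for accession in accessions:
            for dependent_id in dependent.get(accession, ()):
                result.setdefault(gene_id, set()).add(dependent_id)
    return result


# Gene -> RefSeq accession comes from the direct mapping step.
direct = {"WBGene00001135": ["NP_499023"]}
# RefSeq accession -> identifiers mentioned in that record (placeholder value).
dependent = {"NP_499023": ["wikigene-placeholder"]}
derived = transitive_xrefs(direct, dependent)
```

Genes whose direct records mention no dependent identifiers are simply absent from the result.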
