@wbazant
Created August 1, 2018 13:34
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Lab meeting 24.07.2018</title>
<link rel="stylesheet" href="https://stackedit.io/style.css" />
</head>
<body class="stackedit">
<div class="stackedit__html">
<h2 id="cross-references-pipeline">Cross-references pipeline</h2>
<p>The pipeline connects our gene IDs with identifiers in other resources. The desired result for one gene:</p>
<p><strong>WBGene00001135</strong><br>
<em>Probable vesicular glutamate transporter eat-4 [Source:UniProtKB/Swiss-Prot;Acc:<a href="http://www.uniprot.org/uniprot/P34644">P34644</a>]</em></p>
<table>
<thead>
<tr>
<th>External database</th>
<th>Database identifier</th>
</tr>
</thead>
<tbody>
<tr>
<td>UniGene</td>
<td>Cel.19624</td>
</tr>
<tr>
<td>EntrezGene</td>
<td>Probable vesicular glutamate transporter eat-4</td>
</tr>
</tbody>
</table><h3 id="pipeline-overview">Pipeline overview</h3>
<ul>
<li>Download the source files from the external resources</li>
<li>Parse the source files and keep the relevant parts</li>
<li>Map the parsed entries to our genes</li>
<li>Extract the results</li>
</ul>
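<p>The four steps above can be sketched as a chain of small functions. This is a minimal illustration, not the pipeline's actual code. The download step is skipped, and the tab-separated input format and the names <code>Xref</code>, <code>parse_source</code>, <code>map_to_genes</code> and <code>extract</code> are hypothetical. The example row reuses the UniGene ID from the table above.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass(frozen=True)
class Xref:
    """One cross-reference: an identifier in an external database."""
    dbname: str
    primary_id: str


def parse_source(dbname, lines):
    """Step 2: keep only the relevant parts of a downloaded file.

    Assumes a hypothetical tab-separated format: external ID, then our gene ID.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        external_id, gene_id = line.split("\t")[:2]
        pairs.append((Xref(dbname, external_id), gene_id))
    return pairs


def map_to_genes(pairs, our_genes):
    """Step 3: keep only the pairs that point at one of our genes."""
    mapped = {}
    for xref, gene_id in pairs:
        if gene_id in our_genes:
            mapped.setdefault(gene_id, set()).add(xref)
    return mapped


def extract(mapped):
    """Step 4: flatten to sorted (gene ID, database, identifier) rows."""
    return sorted(
        (gene_id, xref.dbname, xref.primary_id)
        for gene_id, xrefs in mapped.items()
        for xref in xrefs
    )


# Step 1 (download) is skipped: pretend this file was already fetched.
downloaded = ["# hypothetical UniGene mapping file", "Cel.19624\tWBGene00001135"]
rows = extract(map_to_genes(parse_source("UniGene", downloaded), {"WBGene00001135"}))
print(rows)  # [('WBGene00001135', 'UniGene', 'Cel.19624')]
</code></pre>
<p>Keeping each step a separate function means a new resource only needs its own parser. Mapping and extraction stay the same.</p>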
<h3 id="kinds-of-references">Kinds of references</h3>
<h4 id="direct-through-ids">Direct (through IDs)</h4>
<p>E.g. RefSeq: the external record names our gene ID itself, in a <code>/db_xref</code> qualifier:</p>
<pre><code>LOCUS NP_499023 576 aa linear INV 05-JUN-2018
DEFINITION Probable vesicular glutamate transporter eat-4 [Caenorhabditis
elegans].
ACCESSION NP_499023
...
CDS 1..576
...
/db_xref="WormBase:WBGene00001135"
</code></pre>
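<p>For a direct reference, mapping amounts to reading our ID out of the external record. Below is a minimal sketch, assuming GenPept flat files like the excerpt above, in which each record has an <code>ACCESSION</code> line and ends with a <code>//</code> line. The function name <code>wormbase_xrefs</code> is hypothetical.</p>
<pre><code class="language-python">import re

# Matches qualifiers such as /db_xref="WormBase:WBGene00001135"
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def wormbase_xrefs(genpept_text):
    """Map each RefSeq accession to the WormBase gene IDs its record names.

    Assumes GenPept flat-file layout: an ACCESSION line per record,
    records terminated by a line containing only '//'.
    """
    result = {}
    accession = None
    for line in genpept_text.splitlines():
        if line.startswith("ACCESSION"):
            accession = line.split()[1]
        elif line.strip() == "//":
            accession = None  # end of the current record
        elif accession:
            match = DB_XREF.search(line)
            # Other db_xrefs (e.g. taxon) are ignored; only WormBase IDs are kept
            if match and match.group(1) == "WormBase":
                result.setdefault(accession, set()).add(match.group(2))
    return result


record = """LOCUS NP_499023 576 aa linear INV 05-JUN-2018
ACCESSION NP_499023
CDS 1..576
/db_xref="taxon:6239"
/db_xref="WormBase:WBGene00001135"
//
"""
print(wormbase_xrefs(record))  # {'NP_499023': {'WBGene00001135'}}
</code></pre>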
<h4 id="checksum">Checksum</h4>
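<p>E.g. UniParc, RNAcentral: each sequence is listed with a checksum (here a 32-character MD5 hex digest). A sequence of ours with the same checksum can therefore be treated as identical to the listed one:</p>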
<pre><code>URS0000000001 6bba097c8c39ed9a0fdf02273ee1c79a
URS0000000002 1fe2f0e3c3a2d6d708ac98e9bfb1d7a8
URS0000000003 7bb11d0572ff6bb42427ce74450ba564
...
</code></pre>
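<p>A checksum match needs no alignment. Each of our sequences is hashed the same way, and the digest is looked up in the downloaded list. The sketch below assumes MD5 digests, matching the 32 hex characters in the excerpt above, and assumes that sequences are upper-cased before hashing. The real normalisation must match whatever the resource applied. The function names and the <code>URS_HYPOTHETICAL</code> ID are made up for illustration.</p>
<pre><code class="language-python">import hashlib


def md5_of(sequence):
    """MD5 hex digest of a sequence after upper-casing it.

    Upper-casing is an assumption: the normalisation (case, U vs T)
    must match whatever the resource applied before hashing.
    """
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def load_checksums(lines):
    """Turn 'ID checksum' lines into a checksum -> ID lookup."""
    lookup = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:  # skips blank lines and '...' elisions
            external_id, checksum = fields
            lookup[checksum.lower()] = external_id
    return lookup


def match_by_checksum(our_sequences, lookup):
    """Return {our ID: external ID} for every sequence whose checksum is listed."""
    matches = {}
    for our_id, seq in our_sequences.items():
        external_id = lookup.get(md5_of(seq))
        if external_id is not None:
            matches[our_id] = external_id
    return matches


# Hypothetical data: the listed checksum is computed here so the example checks itself.
ours = {"transcript_a": "acgtacgt", "transcript_b": "GGGCCC"}
listing = ["URS_HYPOTHETICAL " + md5_of("ACGTACGT"), "..."]
print(match_by_checksum(ours, load_checksums(listing)))
# {'transcript_a': 'URS_HYPOTHETICAL'}
</code></pre>
<p>Because the lookup is a dictionary keyed by digest, matching is a single hash per sequence, however large the downloaded list is.</p>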
<h4 id="sequence-match">Sequence match</h4>
<p>E.g. UniProt: entries are mapped by comparing their sequences with ours:</p>
<pre><code>&gt;sp|P34644
MSSWNEAWDRGKQMVGEPLAKMTAAAASATGAAPPQQMQEEGNENPMQMHSNKVLQVMEQ
TWIGKCRKRWLLAILANMGFMISFGIRCNFGAAKTHMYKNYTDPYGKVHMHEFNWTIDEL
...
</code></pre>
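<p>A sequence match scores an external sequence against our proteins and keeps the best hit above a threshold. The sketch below uses Python's <code>difflib.SequenceMatcher</code> as a crude stand-in for a real aligner, with an arbitrary 0.9 threshold. The protein IDs are hypothetical. It only illustrates the idea and would be far too slow to run over whole proteomes.</p>
<pre><code class="language-python">from difflib import SequenceMatcher


def parse_fasta(text):
    """Return {first word of header: sequence} from FASTA text."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None and line:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def best_match(query, candidates, min_similarity=0.9):
    """Return (candidate ID, similarity) for the most similar candidate, or None.

    SequenceMatcher.ratio() is a crude stand-in for a real alignment score,
    and min_similarity is an arbitrary, hypothetical threshold.
    """
    best_id, best_score = None, 0.0
    for cand_id, cand_seq in candidates.items():
        # autojunk=False: otherwise frequent residues in long sequences are ignored
        score = SequenceMatcher(None, query, cand_seq, autojunk=False).ratio()
        if score > best_score:
            best_id, best_score = cand_id, score
    if best_id is None or min_similarity > best_score:
        return None
    return best_id, best_score


uniprot = parse_fasta(""">sp|P34644
MSSWNEAWDRGKQMVGEPLAKMTAAAASATGAAPPQQMQEEGNENPMQMHSNKVLQVMEQ
TWIGKCRKRWLLAILANMGFMISFGIRCNFGAAKTHMYKNYTDPYGKVHMHEFNWTIDEL
""")
query = uniprot["sp|P34644"]
# Hypothetical proteins of ours: a copy with one substitution, and an unrelated one.
our_proteins = {
    "our_protein_1": query[:10] + "A" + query[11:],
    "our_protein_2": "GGGGGGGGGG",
}
hit = best_match(query, our_proteins)
print(hit)  # our_protein_1 wins with a similarity of about 0.99
</code></pre>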
<h4 id="dependent--transitive">Dependent / transitive</h4>
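<p>E.g. RefSeq records also mention WikiGene IDs. Once a RefSeq record is mapped to one of our genes, its WikiGene ID can be attached to the same gene.</p>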
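<p>Propagating dependent references is a graph walk that starts from the xrefs already mapped to each gene. Below is a minimal sketch with hypothetical names. It follows chains, so the result is transitive. The WikiGene ID is a placeholder, not a real identifier.</p>
<pre><code class="language-python">def add_dependent_xrefs(primary, dependents):
    """Attach dependent xrefs to genes through the xrefs already mapped to them.

    primary:    {gene ID: set of (database, ID)} from direct, checksum or sequence mapping
    dependents: {(database, ID): set of (database, ID)} mentioned inside that external record
    Chains (A mentions B, B mentions C) are followed, so the result is transitive.
    """
    result = {}
    for gene_id, xrefs in primary.items():
        seen = set(xrefs)
        stack = list(xrefs)
        while stack:
            xref = stack.pop()
            for dep in dependents.get(xref, ()):
                if dep not in seen:  # also guards against cycles
                    seen.add(dep)
                    stack.append(dep)
        result[gene_id] = seen
    return result


primary = {"WBGene00001135": {("RefSeq", "NP_499023")}}
# Hypothetical: the WikiGene ID below is a placeholder, not a real identifier.
dependents = {("RefSeq", "NP_499023"): {("WikiGene", "HYPOTHETICAL_WIKIGENE_ID")}}
result = add_dependent_xrefs(primary, dependents)
print(result)
</code></pre>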
</div>
</body>
</html>