@MattBlissett
Last active September 25, 2015 08:18
Graph Gist of Taxonomic Name Units
= Generating Global Checklists from Taxonomic Name Units
:neo4j-version: 2.2.0
:author: Matthew Blissett, Donald Hobern
:description: Example Taxonomic Name Units that result from subsequent taxonomic acts
:tags: domain:life-science, use-case:taxonomy
== Notation
* Single letters represent Linnaean names at various ranks
* Capital letters mean that the name is accepted within a treatment as the name for a taxon
* Lowercase letters mean that the name is considered a synonym
* An accepted name may be followed by its lowercase synonyms, forming a “word”
* The same letter, regardless of case, always represents the same Linnaean name wherever it is used
* Pn (_n_ is an integer) represents an authority (publication) with nomenclatural acts and/or taxonomic treatments
In other words _A_, _Bc_, _Def_ represents the following:
* Species _A_
* Species _B_ with synonym _C_
* Species _D_ with synonyms _E_ and _F_
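The notation can also be read mechanically. The following is a minimal Python sketch, not part of the graph itself; the function name `parse_treatment` is hypothetical. It expands each “word” into (name, accepted name) pairs, with the accepted name pointing at itself, which mirrors the `ACCEPTED_AS` relationships used later.

```python
def parse_treatment(treatment):
    """Turn a treatment such as 'A, Bc, Def' into (name, accepted_name) pairs."""
    pairs = []
    for word in treatment.replace(" ", "").split(","):
        accepted = word[0]
        if not accepted.isupper():
            raise ValueError(f"word {word!r} must start with an accepted (capital) name")
        # The accepted name points at itself.
        pairs.append((accepted, accepted))
        for synonym in word[1:]:
            # Lowercase letters are synonyms; record them under their
            # Linnaean name, which is the same letter in uppercase.
            pairs.append((synonym.upper(), accepted))
    return pairs


print(parse_treatment("A, Bc, Def"))
```

Under this reading, _Bc_ yields two pairs: _B_ accepted as _B_, and _C_ accepted as _B_.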
== Problem
Imagine that the world's entire taxonomic literature had never proceeded beyond the following
(each bullet represents one publication, in chronological order):
* P1 – _A, B, C, D, E_
* P2 – _Cd, E, F, G_
* P3 – _Af, Ceg, D, H, I_
* P4 – _Befg, I, J, K_
* P5 – _A, B, Cd, E, Gj, H, I, K_
What possible global checklists of names might exist?
(The initial set-up of the data is hidden; reveal it with the `+`.)
//setup
//hide
[source,cypher]
----
// Create names (nodes) from P1
CREATE (`A`:name {name:'A',author:'P1'}),
(`B`:name {name:'B',author:'P1'}),
(`C`:name {name:'C',author:'P1'}),
(`D`:name {name:'D',author:'P1'}),
(`E`:name {name:'E',author:'P1'})
// Create accepted-name relationships from P1
CREATE (`A`)-[:ACCEPTED_AS {author:'P1'}]->(`A`),
(`B`)-[:ACCEPTED_AS {author:'P1'}]->(`B`),
(`C`)-[:ACCEPTED_AS {author:'P1'}]->(`C`),
(`D`)-[:ACCEPTED_AS {author:'P1'}]->(`D`),
(`E`)-[:ACCEPTED_AS {author:'P1'}]->(`E`)
// Create names (nodes) from P2
CREATE (`F`:name {name:'F',author:'P2'}),
(`G`:name {name:'G',author:'P2'})
// Create accepted-name relationships from P2
CREATE (`C`)-[:ACCEPTED_AS {author:'P2'}]->(`C`),
(`D`)-[:ACCEPTED_AS {author:'P2'}]->(`C`),
(`E`)-[:ACCEPTED_AS {author:'P2'}]->(`E`),
(`F`)-[:ACCEPTED_AS {author:'P2'}]->(`F`),
(`G`)-[:ACCEPTED_AS {author:'P2'}]->(`G`)
// Create names (nodes) from P3
CREATE (`H`:name {name:'H',author:'P3'}),
(`I`:name {name:'I',author:'P3'})
// Create accepted-name relationships from P3
CREATE (`A`)-[:ACCEPTED_AS {author:'P3'}]->(`A`),
(`F`)-[:ACCEPTED_AS {author:'P3'}]->(`A`),
(`C`)-[:ACCEPTED_AS {author:'P3'}]->(`C`),
(`E`)-[:ACCEPTED_AS {author:'P3'}]->(`C`),
(`G`)-[:ACCEPTED_AS {author:'P3'}]->(`C`),
(`D`)-[:ACCEPTED_AS {author:'P3'}]->(`D`),
(`H`)-[:ACCEPTED_AS {author:'P3'}]->(`H`),
(`I`)-[:ACCEPTED_AS {author:'P3'}]->(`I`)
// Create names (nodes) from P4
CREATE (`J`:name {name:'J',author:'P4'}),
(`K`:name {name:'K',author:'P4'})
// Create accepted-name relationships from P4
CREATE (`B`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
(`E`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
(`F`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
(`G`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
(`I`)-[:ACCEPTED_AS {author:'P4'}]->(`I`),
(`J`)-[:ACCEPTED_AS {author:'P4'}]->(`J`),
(`K`)-[:ACCEPTED_AS {author:'P4'}]->(`K`)
// No new names (nodes) from P5
// Create accepted-name relationships from P5
CREATE (`A`)-[:ACCEPTED_AS {author:'P5'}]->(`A`),
(`B`)-[:ACCEPTED_AS {author:'P5'}]->(`B`),
(`C`)-[:ACCEPTED_AS {author:'P5'}]->(`C`),
(`D`)-[:ACCEPTED_AS {author:'P5'}]->(`C`),
(`E`)-[:ACCEPTED_AS {author:'P5'}]->(`E`),
(`G`)-[:ACCEPTED_AS {author:'P5'}]->(`G`),
(`J`)-[:ACCEPTED_AS {author:'P5'}]->(`G`),
(`H`)-[:ACCEPTED_AS {author:'P5'}]->(`H`),
(`I`)-[:ACCEPTED_AS {author:'P5'}]->(`I`),
(`K`)-[:ACCEPTED_AS {author:'P5'}]->(`K`)
RETURN *
----
_Unfortunately, multiple relationships between the same nodes are not visible._
//graph_result
== Results
=== How many authorities are there?
[source,cypher]
----
MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
RETURN COUNT(DISTINCT a.author) AS Count_of_Authorities
----
//table
=== How many names are there?
[source,cypher]
----
MATCH (x:name)
RETURN COUNT(x) AS Count_of_Names
----
//table
=== How many _taxonomic name units_ (TNUs) are there?
[source,cypher]
----
MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
RETURN COUNT(a) AS Count_of_TNUs
----
//table
=== How many taxon concepts _sensu_ authority?
[source,cypher]
----
MATCH (x:name)-[a:ACCEPTED_AS]->(x:name)
RETURN COUNT(a) AS Count_of_Taxon_Concepts_sensu_Authority
----
//table
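As a cross-check on the counting queries, here is a minimal Python sketch that derives the same four counts directly from the publication list in the Problem section. It runs outside Neo4j, and the names `TREATMENTS` and `tnus` are hypothetical. It assumes one TNU per letter in each treatment, and one taxon concept _sensu_ authority per accepted (capital) letter.

```python
# The five publications from the Problem section, in the document's notation.
TREATMENTS = {
    "P1": "A, B, C, D, E",
    "P2": "Cd, E, F, G",
    "P3": "Af, Ceg, D, H, I",
    "P4": "Befg, I, J, K",
    "P5": "A, B, Cd, E, Gj, H, I, K",
}


def tnus(treatments):
    """Yield (name, accepted_name, authority), one triple per ACCEPTED_AS edge."""
    for authority, treatment in treatments.items():
        for word in treatment.replace(" ", "").split(","):
            for letter in word:
                yield letter.upper(), word[0], authority


edges = list(tnus(TREATMENTS))
counts = {
    "authorities": len({authority for _, _, authority in edges}),
    "names": len({name for name, _, _ in edges}),
    "tnus": len(edges),
    # A self-referencing edge is a name accepted for a taxon by that authority.
    "concepts_sensu_authority": sum(1 for name, acc, _ in edges if name == acc),
}
print(counts)
```

For this data set the sketch reports 5 authorities, 11 names, 35 TNUs and 26 taxon concepts _sensu_ authority.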
=== How many taxon concepts by synonymy?
Explanation of query: Start by finding accepted taxa, and all synonyms (if any). Group those synonyms into “taxon concepts” according to authority, then group authorities by distinct sets of taxon concepts.
[source,cypher]
----
MATCH (acc:name)-[accrel:ACCEPTED_AS]->(acc:name)
OPTIONAL MATCH (syn:name)-[synrel:ACCEPTED_AS]->(acc:name)
WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, synrel.author AS author, acc.name AS Accepted_name
RETURN Taxon_concept, COLLECT(author) AS Authorities, Accepted_name
ORDER BY Accepted_name, LENGTH(Taxon_concept)
----
//table
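The grouping described above can be sketched in plain Python. This illustrates the idea and is not a translation of the Cypher; `TREATMENTS` and `concepts_by_synonymy` are hypothetical names. Each “word” is treated as one taxon concept, identified by its accepted name plus the sorted set of names it contains, and the authorities that circumscribe the taxon in exactly that way are collected.

```python
from collections import defaultdict

# The five publications from the Problem section, in the document's notation.
TREATMENTS = {
    "P1": "A, B, C, D, E",
    "P2": "Cd, E, F, G",
    "P3": "Af, Ceg, D, H, I",
    "P4": "Befg, I, J, K",
    "P5": "A, B, Cd, E, Gj, H, I, K",
}


def concepts_by_synonymy(treatments):
    """Map (accepted_name, member_names) to the authorities sharing that concept."""
    grouped = defaultdict(list)
    for authority, treatment in treatments.items():
        for word in treatment.replace(" ", "").split(","):
            # Sort the members so that the same set of names always
            # produces the same key, whatever order it was written in.
            members = tuple(sorted(word.upper()))
            grouped[(word[0], members)].append(authority)
    return dict(grouped)


concepts = concepts_by_synonymy(TREATMENTS)
ordered = sorted(concepts.items(), key=lambda item: (item[0][0], len(item[0][1])))
for (accepted, members), authorities in ordered:
    print(accepted, members, authorities)
```

With this data, _C_ appears as three distinct concepts: _C_ alone (P1), _C_ with _D_ (P2 and P5), and _C_ with _E_ and _G_ (P3).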
== What are our possible “catalogues of life”?
Assuming we rely heavily on P5, what are the possible complete taxonomies?
[source,cypher]
----
MATCH (syn:name)-[synrel:ACCEPTED_AS { author:"P5" }]->(acc:name)
WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, acc.name AS Accepted_name
MATCH (syn2:name)-[synrel2:ACCEPTED_AS { author:"P5" }]->(acc2:name)
WITH COLLECT(DISTINCT syn2) AS allNames, COLLECT(DISTINCT Taxon_concept) AS tc, COLLECT(Accepted_name) AS an
MATCH (x:name)-[r]->(y:name)
WHERE NONE (a IN allNames
WHERE x = a)
RETURN DISTINCT x.name AS Unplaced_name, y.name AS is_a_synonym_of, r.author AS according_to, tc AS Taxa_according_to_P5
----
//table
What if, instead, we start with P4?
[source,cypher]
----
MATCH (syn:name)-[synrel:ACCEPTED_AS { author:"P4" }]->(acc:name)
WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, acc.name AS Accepted_name
MATCH (syn2:name)-[synrel2:ACCEPTED_AS { author:"P4" }]->(acc2:name)
WITH COLLECT(DISTINCT syn2) AS allNames, COLLECT(DISTINCT Taxon_concept) AS tc, COLLECT(Accepted_name) AS an
MATCH (x:name)-[r]->(y:name)
WHERE NONE (a IN allNames
WHERE x = a)
WITH x.name AS Unplaced_name, y.name AS is_a_synonym_of, COLLECT(DISTINCT r.author) AS according_tos, tc AS Taxa_according_to_P4
RETURN DISTINCT Unplaced_name, is_a_synonym_of, according_tos, Taxa_according_to_P4
ORDER BY Unplaced_name
----
//table
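Both queries follow the same pattern: take one authority as the base, then list every name that the base does not treat, together with what the other authorities say about it. A minimal Python sketch of that pattern follows; `TREATMENTS`, `tnus` and `unplaced` are hypothetical names, and the data is re-declared so the block is self-contained.

```python
# The five publications from the Problem section, in the document's notation.
TREATMENTS = {
    "P1": "A, B, C, D, E",
    "P2": "Cd, E, F, G",
    "P3": "Af, Ceg, D, H, I",
    "P4": "Befg, I, J, K",
    "P5": "A, B, Cd, E, Gj, H, I, K",
}


def tnus(treatments):
    """Yield (name, accepted_name, authority), one triple per ACCEPTED_AS edge."""
    for authority, treatment in treatments.items():
        for word in treatment.replace(" ", "").split(","):
            for letter in word:
                yield letter.upper(), word[0], authority


def unplaced(treatments, base):
    """Map (unplaced_name, accepted_name) to the authorities holding that opinion."""
    edges = list(tnus(treatments))
    # Names the base authority already places, as accepted names or synonyms.
    placed = {name for name, _, authority in edges if authority == base}
    opinions = {}
    for name, accepted, authority in edges:
        if name not in placed:
            opinions.setdefault((name, accepted), []).append(authority)
    return opinions


for base in ("P5", "P4"):
    print(base, unplaced(TREATMENTS, base))
```

Starting from P5, only _F_ is unplaced, and the remaining authorities disagree about it: P2 accepts it, P3 places it under _A_, and P4 places it under _B_. Starting from P4, the unplaced names are _A_, _C_, _D_ and _H_.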