Separation of scientific names

Question

A scientific name usually consists of three pieces of information: genus, species epithet, and author. A simple example would be the following:

Acanthus ilicifolius L.

Genus: Acanthus
Species epithet: ilicifolius
Author: L.

Easy. However, things get more complicated when we deal with hybrids, subspecies / varieties / forms, multiple authors and other inconsistencies. In these cases, the species name may look like this:

cf. Andrographis paniculata (Burm.f.) Wall. ex Nees

cf.: the species was not determined with 100% certainty
Genus: Andrographis
Species epithet: paniculata
Author: (Burm.f.) Wall. ex Nees

or that:

Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f

Genus: Ipomoea
Species epithet: pes-caprae
Author: (L.) DC.
Subspecies epithet: brasiliensis
Subspecies author: (L.) Ooststr.f

I am trying to find a reliable way to deconstruct such names. I could write some hacky code with lots of if/else branches, but I'm looking for something more elegant (and reliable). I was thinking of some kind of parser that processes the name the way a calculator parses a mathematical expression. Unfortunately, I am not the most experienced programmer and have never written a real parser, and I don't know whether that approach even makes sense here, since scientific names vary so much. What do you think is the best way to solve this problem? The preferred language is R, or possibly Julia if it is better suited to the task.

+6

r parsing bioinformatics julia-lang

Hav0k Jan 21 '15 at 16:25

1 answer

jbaums · Accepted Answer · 2015-01-21T16:35:28+0000

You're in luck (sort of). GBIF has a name parser, and the taxize package wraps its API with the gbif_parse function.

    library(taxize)
    gbif_parse(c('Acanthus ilicifolius L.',
                 'cf. Andrographis paniculata (Burm.f.) Wall. ex Nees',
                 'Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f'))

    #                                                   scientificname       type genusorabove specificepithet
    # 1                                        Acanthus ilicifolius L. WELLFORMED     Acanthus     ilicifolius
    # 2            cf. Andrographis paniculata (Burm.f.) Wall. ex Nees   INFORMAL Andrographis      paniculata
    # 3 Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f    SCINAME      Ipomoea      pes-caprae
    #   authorsparsed    authorship                   canonicalname                canonicalnamewithmarker
    # 1          TRUE            L.            Acanthus ilicifolius                   Acanthus ilicifolius
    # 2          TRUE Wall. ex Nees         Andrographis paniculata                Andrographis paniculata
    # 3          TRUE     Ooststr.f Ipomoea pes-caprae brasiliensis Ipomoea pes-caprae subsp. brasiliensis
    #                                   canonicalnamecomplete bracketauthorship infraspecificepithet rankmarker
    # 1                               Acanthus ilicifolius L.              <NA>                 <NA>       <NA>
    # 2      Andrographis paniculata (Burm. f.) Wall. ex Nees          Burm. f.                 <NA>       <NA>
    # 3 Ipomoea pes-caprae subsp. brasiliensis (L.) Ooststr.f                L.         brasiliensis     subsp.

See ?gbif_parse for more details. You can also find the GBIF parser on GitHub.
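If you only need the fields listed in the question, you could trim the parser output down to them. The following is a minimal sketch, not part of taxize: `extract_name_parts` and its output column names are made up for illustration, and the input data frame is a hand-typed stand-in that copies a few columns from the gbif_parse output shown above so that the example runs without a network call.

```r
# Hypothetical helper (not part of taxize): keep only the fields the
# question asks about, using the column names from the gbif_parse output.
extract_name_parts <- function(parsed) {
  wanted <- c("genusorabove", "specificepithet", "bracketauthorship",
              "authorship", "rankmarker", "infraspecificepithet")
  # Assumption: a column may be absent from some results; fill it with NA.
  for (col in setdiff(wanted, names(parsed))) {
    parsed[[col]] <- NA_character_
  }
  out <- parsed[, wanted, drop = FALSE]
  names(out) <- c("genus", "species_epithet", "bracket_author",
                  "author", "rank", "infraspecific_epithet")
  out
}

# Hand-typed stand-in for rows 1 and 3 of the gbif_parse() output.
parsed <- data.frame(
  genusorabove         = c("Acanthus", "Ipomoea"),
  specificepithet      = c("ilicifolius", "pes-caprae"),
  authorship           = c("L.", "Ooststr.f"),
  bracketauthorship    = c(NA, "L."),
  infraspecificepithet = c(NA, "brasiliensis"),
  rankmarker           = c(NA, "subsp."),
  stringsAsFactors     = FALSE
)

parts <- extract_name_parts(parsed)
print(parts)
```

One thing to watch for: in the subspecies example above, the authorship columns hold the author of the infraspecific name (Ooststr.f, with L. in brackets), and the species-level author (L.) DC. does not appear in any column of the output. If you need that, you would have to recover it from the original string yourself.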

taxize also wraps the EOL API; see ?gni_parse.
