Splitting scientific names into their parts

A scientific name usually consists of three pieces of information: genus, species epithet, and author. A simple example would be the following:

Acanthus ilicifolius L.

  • Genus: Acanthus
  • Species epithet: ilicifolius
  • Author: L.

Easy. However, things get more complicated when we deal with hybrids, subspecies / varieties / forms, multiple authors, and other inconsistencies. In these cases, a species name may look like this:

cf. Andrographis paniculata (Burm.f.) Wall. ex Nees

  • cf.: the species was not identified with 100% certainty
  • Genus: Andrographis
  • Species epithet: paniculata
  • Author: (Burm.f.) Wall. ex Nees

or like this:

Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f

  • Genus: Ipomoea
  • Species epithet: pes-caprae
  • Author: (L.) DC.
  • Subspecies epithet: brasiliensis
  • Subspecies author: (L.) Ooststr.f

I am trying to find a reliable way to deconstruct such names. I could write some hacky code using lots of if/else, but I'm looking for something more elegant (and reliable). I was thinking of some kind of parser that parses the name the way a calculator parses a mathematical expression. Unfortunately, I am not the most sophisticated programmer, I have never written a real parser before, and I do not know whether this even makes sense here, since scientific names come in quite a lot of variations. What do you think is the best way to solve this problem? The preferred language is R, possibly also Julia if it is better suited to the task.

1 answer

You are in luck (sort of). GBIF has a name parser, and the taxize package wraps its API with the gbif_parse function.

    library(taxize)

    gbif_parse(c(
      'Acanthus ilicifolius L.',
      'cf. Andrographis paniculata (Burm.f.) Wall. ex Nees',
      'Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f'
    ))
    # Output (wide data frame, shown here in column blocks):
    #
    #   scientificname                                                  type
    # 1 Acanthus ilicifolius L.                                         WELLFORMED
    # 2 cf. Andrographis paniculata (Burm.f.) Wall. ex Nees             INFORMAL
    # 3 Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f  SCINAME
    #
    #   genusorabove  specificepithet  authorsparsed  authorship
    # 1 Acanthus      ilicifolius      TRUE           L.
    # 2 Andrographis  paniculata       TRUE           Wall. ex Nees
    # 3 Ipomoea       pes-caprae       TRUE           Ooststr.f
    #
    #   canonicalname                    canonicalnamewithmarker
    # 1 Acanthus ilicifolius             Acanthus ilicifolius
    # 2 Andrographis paniculata          Andrographis paniculata
    # 3 Ipomoea pes-caprae brasiliensis  Ipomoea pes-caprae subsp. brasiliensis
    #
    #   canonicalnamecomplete
    # 1 Acanthus ilicifolius L.
    # 2 Andrographis paniculata (Burm. f.) Wall. ex Nees
    # 3 Ipomoea pes-caprae subsp. brasiliensis (L.) Ooststr.f
    #
    #   bracketauthorship  infraspecificepithet  rankmarker
    # 1 <NA>               <NA>                  <NA>
    # 2 Burm. f.           <NA>                  <NA>
    # 3 L.                 brasiliensis          subsp.

See ?gbif_parse for more details. You can also find the GBIF parser on GitHub.

taxize also uses the EOL API for name parsing; see ?gni_parse.
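Both of these functions call a remote service. If you only need a rough offline split for names shaped like the three examples in the question, a single regular expression in base R can get you part of the way. The following is a minimal sketch, not how GBIF's parser works: parse_name and its field names are hypothetical, and it only handles an optional "cf."/"aff." qualifier and the "subsp."/"var." rank markers. Hybrids and other variants would need more rules, or a real parser.

```r
# Hypothetical helper (not part of taxize): split ONE name with a single regex.
# Assumption: names follow "[cf.] Genus epithet [author] [subsp./var. epithet [author]]".
parse_name <- function(x) {
  pattern <- paste0(
    "^(?:(cf\\.|aff\\.)\\s+)?",    # optional open-nomenclature qualifier
    "([A-Z][a-z]+)",                # genus
    "\\s+([a-z][a-z-]+)",           # species epithet (may contain a hyphen)
    "(?:\\s+(.*?))?",               # species author, matched lazily
    "(?:\\s+(subsp\\.|var\\.)",     # optional rank marker ...
    "\\s+([a-z][a-z-]+)",           # ... infraspecific epithet ...
    "(?:\\s+(.*))?)?$"              # ... and its author
  )
  fields <- c("qualifier", "genus", "epithet", "author",
              "rank", "infra_epithet", "infra_author")

  m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
  if (length(m) == 0) {
    # No match: return all-NA fields rather than guessing
    return(setNames(rep(NA_character_, length(fields)), fields))
  }
  parts <- m[-1]            # drop the full match, keep the capture groups
  parts[parts == ""] <- NA  # unmatched optional groups come back as ""
  setNames(parts, fields)
}

sci_names <- c(
  "Acanthus ilicifolius L.",
  "cf. Andrographis paniculata (Burm.f.) Wall. ex Nees",
  "Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f"
)

# One row per name, one column per field
parsed <- t(sapply(sci_names, parse_name, USE.NAMES = FALSE))
print(parsed)
```

The author group is matched lazily so that it stops before a rank marker when one is present and otherwise runs to the end of the string. Names the pattern does not recognise come back as all NA, which makes it easy to hand those off to gbif_parse instead.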


Source: https://habr.com/ru/post/981355/
