Post Provided by Andrew C. Martin
The Global Pollen Project is an online, freely available tool and data source developed to help people identify pollen and disseminate palynological resources. Palynology – the study of pollen grains and other spores – is used across many fields of study including modern and fossil vegetation dynamics, forensic sciences, pollination, and beekeeping. To help make pollen identification quicker and more transparent, we developed the Global Pollen Project (GPP) – an open, peer-reviewed database of global pollen morphology, where content and expertise are crowdsourced from across the world. Our approach to developing this tool was open: open code, open data, open access. It connects to other data services, including the Global Biodiversity Information Facility and Neotoma Palaeoecology Database, to provide occurrence data for each taxon alongside pollen images and metadata.
The framework enables the exchange of information from two main sources:
- Existing reference material (which is digitised using common standards for the database). Pollen and spore reference sets are commonly stored in cabinets of glass slides. The GPP introduced tools to digitise these resources into images and associated metadata, integrating them into the Master Reference Collection and making them globally available.
- Individual pollen grain images of unknown taxonomic identity that are submitted by users. The tool lets you upload images of unknown grains, and share your identifications of grains that others have uploaded. The framework analyses all of the identifications given to each grain, and assigns a taxonomic identity once a critical certainty threshold is reached.
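The post doesn’t spell out how the certainty threshold is calculated, so here is a minimal sketch of the general idea, assuming a simple agreement rule: each identification is a hypothetical (family, genus, species) tuple, and a grain is identified to the most specific rank at which a fixed fraction of all identifications agree. The threshold values and function name are illustrative assumptions, not the GPP’s actual algorithm.

```python
from collections import Counter


def consensus_identity(identifications, threshold=0.7, min_count=3):
    """Return the most specific agreed taxon prefix, or None.

    identifications: list of (family, genus, species) tuples; genus and
    species may be None when a user only identified to a higher rank.
    threshold and min_count are hypothetical values for illustration.
    """
    if len(identifications) < min_count:
        return None  # too few identifications to trust any consensus
    agreed = None
    for depth in (1, 2, 3):  # family, then genus, then species
        prefixes = [ident[:depth] for ident in identifications
                    if None not in ident[:depth]]
        if not prefixes:
            break
        prefix, votes = Counter(prefixes).most_common(1)[0]
        # Divide by ALL identifications, so users who stopped at a higher
        # rank count as not agreeing at this rank (a conservative choice).
        if votes / len(identifications) < threshold:
            break
        agreed = prefix
    return agreed
```

For example, three users suggesting *Quercus* and one suggesting *Fagus* would give a family-level and genus-level consensus (3/4 = 0.75 clears the 0.7 threshold), but no species-level identity unless most users also agreed on the species.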
We fuse the data from these two sources into a common, up-to-date botanical taxonomy so that all of the information can be displayed in a common Master Reference Collection. You can read more about these functions in my previous blog post.
Since the Global Pollen Project (GPP) was featured in the July 2017 issue of Methods in Ecology and Evolution, we’ve been hard at work to improve the rigour, speed, and presentation of our tools and data. An enhanced framework was created for this purpose, and launched in October 2017.
Image Focusing and Calibration
Pollen grains are complex 3D structures: their identification often requires viewing them at multiple focus levels to see all the required traits. In the GPP, we included focusable images, which were made up of a stack of five images on a common frame of reference. This allowed you to see the 3D structure by focusing through a single image. Since publication, we’ve removed the previous cap of five frames, so focusable images can now have any number of frames. With more frames, you can see pollen grains in greater detail.
Pollen traits are used to perform identification and create identification keys. Accurate measurement at the micrometre scale is crucial. All static and focusable images now have mandatory calibration, which means we can include a scale-bar on every image. Image calibration occurs using one of two methods: fixed calibration or floating calibration.
Floating calibrations involve ‘click-click’ calibration of a line of known distance (see figure): these are useful for single-level images, which may have been taken ad-hoc. Fixed calibrations are more targeted at reference material. If you’re digitising pollen images, you can set up each of your microscopes and fixed cameras so that, for each magnification, there’s an associated calibration. The calibrations can then be quickly applied when digitising material from reference slides.
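Both calibration methods come down to the same quantity: how many micrometres one pixel represents. The sketch below shows the arithmetic under that assumption. A floating calibration divides the known distance by the pixel length of the clicked line, while a fixed calibration is looked up per microscope and magnification. The function names, dictionary keys, and example values are hypothetical, not real instrument data or the GPP’s code.

```python
import math


def floating_calibration(p1, p2, known_distance_um):
    """Micrometres per pixel from two clicked (x, y) points spanning a known distance."""
    pixels = math.dist(p1, p2)  # Euclidean length of the clicked line
    if pixels == 0:
        raise ValueError("calibration points must differ")
    return known_distance_um / pixels


# Hypothetical fixed calibrations: one value per (microscope, magnification).
# These numbers are placeholders for illustration only.
FIXED_CALIBRATIONS = {
    ("microscope-A", 40): 0.16,
    ("microscope-A", 100): 0.064,
}


def fixed_calibration(microscope, magnification):
    """Look up the stored micrometres-per-pixel value for a camera set-up."""
    return FIXED_CALIBRATIONS[(microscope, magnification)]


def pixels_to_um(pixels, um_per_pixel):
    """Convert a length measured in pixels to micrometres."""
    return pixels * um_per_pixel


def scale_bar_pixels(um_per_pixel, bar_length_um=10.0):
    """Pixel length of a scale bar representing bar_length_um micrometres."""
    return bar_length_um / um_per_pixel
```

Once every image carries a micrometres-per-pixel value, a grain measured as 250 pixels across at 0.1 µm per pixel is 25 µm in diameter, and a 10 µm scale bar is 100 pixels long.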
Identification Methods: Field, Herbaria, Botanic Gardens
The GPP’s original identification model made it clear what material had been collected directly from identified plant material (‘direct’), and what material had a taxonomic identity inferred from its morphological characteristics (‘morphological’). This original model has been expanded to record more detail about ‘direct’ identification methods, increasing the taxonomic transparency and traceability of reference material.
The GPP now uses institution codes from the Index Herbariorum and Botanic Gardens Conservation International, alongside internal barcodes. This lets you trace reference material back to the original plant. For material that has been identified in the field without a herbarium identification, we record the names of both the person who identified the plant and the person who sampled it. You can see examples of living collection, herbarium, and field-identification material in the collection.
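One way to picture the expanded ‘direct’ identification model is as a small set of record types, one per method, each holding the details needed to trace a slide back to its plant. This is a hypothetical sketch of such a structure, not the GPP’s actual schema; the class names, field names, and the placeholder codes in the example are all assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class HerbariumSpecimen:
    institution_code: str  # Index Herbariorum code of the holding herbarium
    barcode: str           # the herbarium's own specimen barcode


@dataclass(frozen=True)
class LivingCollection:
    institution_code: str  # BGCI code of the botanic garden
    accession_id: str      # the garden's internal identifier for the plant


@dataclass(frozen=True)
class FieldIdentification:
    identified_by: str     # person who identified the plant in the field
    sampled_by: str        # person who sampled the plant


DirectMethod = Union[HerbariumSpecimen, LivingCollection, FieldIdentification]


@dataclass(frozen=True)
class ReferenceSlide:
    taxon: str
    method: DirectMethod


def provenance(slide: ReferenceSlide) -> str:
    """Describe how a slide's taxonomic identity traces back to a plant."""
    m = slide.method
    if isinstance(m, HerbariumSpecimen):
        return f"{slide.taxon}: herbarium {m.institution_code}, barcode {m.barcode}"
    if isinstance(m, LivingCollection):
        return f"{slide.taxon}: garden {m.institution_code}, accession {m.accession_id}"
    return (f"{slide.taxon}: field identification by {m.identified_by}, "
            f"sampled by {m.sampled_by}")
```

Keeping each method as its own type means a field identification can’t accidentally be saved without naming both the identifier and the sampler.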
Using the enhanced identification model, we’ve been able to better constrain the taxonomy of much of the currently held digitised reference material. Some reference collections didn’t include taxonomic authorship information with their slides, but using associated herbarium barcodes and botanic garden IDs, we’ve now traced most of this material back to its original identification. Most material identified by experts in the field, though, didn’t have any form of name authorship, so it couldn’t be taxonomically constrained for inclusion in the Global Pollen Project’s Master Reference Collection. If you’re making glass slides and want your work to be useful in the long term, it’s important to record the full taxonomic name, including its authorship, with each slide.
Additional Information in the Master Reference Collection
We’ve added two components to each taxon page in the Master Reference Collection to help you interpret palynological data. First, our datasets are now connected to the Encyclopaedia of Life. We use this connection to generate information cards for every family, genus, and species (where the necessary details are available), alongside the existing modern and historic occurrence data (from GBIF and NeotomaDB respectively).
Second, we’ve included an indication of taxonomic completeness. Our master collection usually only represents a subset of the taxonomic diversity within the currently displayed taxon. Knowing how much of that diversity is in the collection is useful when judging how representative the observed morphological characteristics are of the whole taxon.
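The post doesn’t say exactly how completeness is computed. A simple measure, assumed here for illustration, is the fraction of a taxon’s accepted species that have reference material in the collection; the accepted-species checklist it compares against is itself an assumption about the data source.

```python
def taxonomic_completeness(collection_species, accepted_species):
    """Fraction of a taxon's accepted species that have reference material.

    collection_species: species names held in the reference collection.
    accepted_species: accepted species names for the taxon (a hypothetical
    checklist from whichever taxonomic backbone is in use).
    Names in the collection that are not accepted (e.g. unresolved
    synonyms) are not counted.
    """
    accepted = set(accepted_species)
    if not accepted:
        return 0.0
    return len(accepted & set(collection_species)) / len(accepted)
```

Under this measure, a genus with four accepted species, two of which are held in the collection, would show 50% completeness.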
Digitisation Tools and Reference Collections
We’ve launched a preview of our new online digitisation tools and new reference collection model. Reference collections are now incremental and versioned: they can be digitised progressively, and they follow a publication model similar to the one used for code packages on repositories such as CRAN. First, slides can be recorded in the GPP without full digitisation, so curators can share with others the material that is available in physical form, and then digitise their collections progressively over time. Second, versioning lets any publication refer to a fixed, unchanging version of a reference collection, and adds a curation step whenever a new version is published.
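To make the incremental, versioned model concrete, here is a minimal sketch. It assumes slides are first recorded with no images, images are attached as they are digitised, and publishing freezes the current state as a numbered, immutable version once a curator approves it. All class and method names are hypothetical illustrations, not the GPP’s API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Slide:
    slide_id: str
    taxon: str
    images: Tuple[str, ...] = ()  # empty until the slide is digitised

    @property
    def digitised(self) -> bool:
        return len(self.images) > 0


@dataclass
class ReferenceCollection:
    name: str
    draft: Dict[str, Slide] = field(default_factory=dict)
    versions: List[Tuple[Slide, ...]] = field(default_factory=list)

    def record_slide(self, slide_id: str, taxon: str) -> None:
        """Register a physical slide before any images exist."""
        self.draft[slide_id] = Slide(slide_id, taxon)

    def digitise(self, slide_id: str, *image_paths: str) -> None:
        """Attach images by replacing the slide, so published versions stay unchanged."""
        old = self.draft[slide_id]
        self.draft[slide_id] = Slide(old.slide_id, old.taxon, old.images + image_paths)

    def publish(self, approved_by_curator: bool) -> int:
        """Freeze the draft as a new version; returns its number (starting at 1)."""
        if not approved_by_curator:
            raise PermissionError("new versions must pass curation")
        self.versions.append(tuple(self.draft.values()))
        return len(self.versions)

    def version(self, number: int) -> Tuple[Slide, ...]:
        """Return a published version, e.g. the one cited in a paper."""
        return self.versions[number - 1]
```

Because slides are immutable and each version is a tuple snapshot, a paper citing version 1 keeps pointing at exactly the same material even after later slides are digitised.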
The positive reception to our paper encouraged us to future-proof the GPP’s framework. At its core, the GPP now hosts a temporal, event-driven data model. This tracks changes to taxonomy and data through time. The value of this approach should become more and more apparent as taxonomic changes occur. We now have a rigorous core data model to expand from.
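A temporal, event-driven model stores changes as a time-stamped log and reconstructs state by replaying that log. The sketch below shows the general technique, assuming three hypothetical event kinds (“named”, “renamed”, “retired”) and placeholder taxon names. It is an illustration of event sourcing, not the GPP’s actual data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class TaxonomyEvent:
    on: date
    kind: str      # hypothetical kinds: "named", "renamed" or "retired"
    taxon_id: str
    name: str = ""


def names_as_of(events: List[TaxonomyEvent], when: date) -> Dict[str, str]:
    """Replay events up to and including `when` to rebuild the taxonomy of that date."""
    state: Dict[str, str] = {}
    for event in sorted(events, key=lambda e: e.on):
        if event.on > when:
            break  # later events did not exist yet on that date
        if event.kind == "retired":
            state.pop(event.taxon_id, None)
        else:  # "named" or "renamed" both set the current name
            state[event.taxon_id] = event.name
    return state
```

Because nothing is overwritten, the same log answers both “what is this taxon called now?” and “what was it called when this slide was digitised?”.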
An early goal of the GPP was to include simple traits – size, shape, wall thickness, etc. – to enable the use of the GPP as a dynamic, morphological key. Such tagged data may also be useful for machine learning approaches, and several research institutions are now interested in using such a dataset for artificial intelligence projects. The foundations discussed above will allow us to progress with this effort.
More and more people are visiting and actively using the tool every month. We now have over 700 monthly users, some of whom stay on the site for hours at a time. The master reference collection contains 1,599 species, spanning 25% of global plant families; the latest reference material to be included is from Mexico, the Galapagos Islands, and Mongolia. Our goal is to get as close to 100% coverage as possible, through collaboration and good data practices.
To find out more about the Global Pollen Project, read our Open Access Methods in Ecology and Evolution article ‘The Global Pollen Project: a new tool for pollen identification and the dissemination of physical reference collections’.
This article was highly commended for the 2017 Robert May Early Career Researcher Prize. You can find out more about other papers that were considered for the prize here.