Post provided by Ute Bradter, Mari Jönsson and Tord Snäll
Opportunistically collected species observation data, or citizen science data, are increasingly available. Importantly, they’re also becoming available for regions of the world and species for which few other data are available, and they may be able to fill a data gap.
In Sweden, over 60 million citizen science observations have been collected, an impressive number given that Sweden has a population of about 10 million people and that the Swedish Species Observation System, Artportalen, was created in 2000. For bird-watchers (or plant, fungi, or other animal enthusiasts), this is a good website to bookmark. It will give you a bit of help in finding species and, as a bonus, has a lot of pretty pictures of interesting species. Given the amount of data citizen science can provide in areas with few other data, it’s important to evaluate whether these data can be used reliably to answer questions in applied ecology or conservation.
Potential Problems with Opportunistically Collected Citizen Science Data
Opportunistically collected observations have some potential problems. As there’s no systematic sampling protocol that observers have to follow, they can report observations from wherever they want. This means that observations from popular or easily accessible places, such as near towns and cities, tend to be overrepresented, while there are fewer observations from areas off the beaten track. So, the recorded observations are geographically biased. They’re the result of both where a species occurs and where reporters like to go, and it’s not always easy to separate the two.
Citizen science data also often contain only observations of species presences. Absences, the locations where a species was not found, are frequently not recorded. Globally, a large amount of such presence-only data is collected by the Global Biodiversity Information Facility.
The lack of absences limits the modelling methods that can be used to analyse the data. Logistic regression, for example, a method familiar to many as one of the first modelling techniques introduced in statistics courses, requires both presence and absence data. Logistic regression is also more robust to geographical bias than other methods, such as MaxEnt, a popular method for presence-only data. The lack of absences therefore limits the applications of citizen science data, including habitat suitability models, which can be affected by geographical bias.
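To make the role of absences concrete, here is a minimal sketch of a logistic regression fitted by gradient ascent in plain Python. The covariate `forest_cover`, the data values, the learning rate and the number of iterations are all made up for illustration and are not taken from the article. In practice you would use an established statistics package rather than this hand-rolled fit.

```python
import math

def fit_logistic(x, y, learning_rate=0.5, n_iter=5000):
    """Fit P(presence) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        # Gradient of the mean log-likelihood with respect to b0 and b1
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += (yi - p) / n
            g1 += (yi - p) * xi / n
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1

def predict(b0, b1, x):
    """Predicted probability of presence at covariate value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical data: proportion of forest cover at nine sites,
# with 1 = species recorded and 0 = absence (observed or inferred).
forest_cover = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
presence = [0, 0, 1, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(forest_cover, presence)
low, high = predict(b0, b1, 0.1), predict(b0, b1, 0.9)
print(f"suitability at low cover: {low:.2f}, at high cover: {high:.2f}")
```

Without the zeros in `presence`, the model would have nothing to contrast the presences against, which is why presence-only data cannot be analysed this way directly.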
Habitat suitability models have an important role in ecology and conservation. They facilitate spatial conservation planning and prioritization, monitoring of invasive species, understanding of species-habitat requirements, climate change studies and much more. But, even if absences are not recorded, perhaps we can add some retrospectively with a bit of extra information?
Enhancing Presence-Only Citizen Science Data by Inferring Absences
Even in systematic surveys, observers typically only write down the species they did find. The absences are created once observations have been entered on a checklist of all species known to occur in a given region or country. For species that were not found, we can create absence information because we know that the observer:
- Searched the survey area
- Could identify all the species in the region or country
- Would have recorded the species if found
For opportunistically collected presence-only observations, we can usually tell, at least with some confidence, whether a site was searched from the number of observations reported at that site. However, we usually don’t know whether a reporter had good identification skills for a given species, or whether the reporter would have reported the species if it had been found.
Some reporters consistently report certain species, often species that are less common. If we can identify reporters that have good identification skills for a focal species and that consistently report the species when found, then we can use their observations to infer absences for the focal species.
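The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(reporter, site, date, species)`, the reporter names, the example records and the `min_records` threshold (used here as a rough signal that a site was actually searched) are all hypothetical and are not the rules used in the article.

```python
from collections import defaultdict

def infer_absences(observations, qualified_reporters, focal_species, min_records=3):
    """Return (presences, inferred_absences) as sets of (site, date) visits.

    Only visits by reporters known to identify and always report the focal
    species are used. A visit without the focal species counts as an inferred
    absence only if at least `min_records` species were reported (assumption).
    """
    visits = defaultdict(set)  # (reporter, site, date) -> species reported
    for reporter, site, date, species in observations:
        if reporter in qualified_reporters:
            visits[(reporter, site, date)].add(species)

    presences, absences = set(), set()
    for (_reporter, site, date), species_seen in visits.items():
        if focal_species in species_seen:
            presences.add((site, date))
        elif len(species_seen) >= min_records:
            absences.add((site, date))
    # A presence by any qualified reporter overrides an inferred absence
    # for the same site and date.
    absences -= presences
    return presences, absences

# Hypothetical records: (reporter, site, date, species)
observations = [
    ("anna", "site_A", "2015-05-01", "Siberian jay"),
    ("anna", "site_A", "2015-05-01", "Willow tit"),
    ("anna", "site_B", "2015-05-02", "Willow tit"),
    ("anna", "site_B", "2015-05-02", "Crested tit"),
    ("anna", "site_B", "2015-05-02", "Goldcrest"),
    ("bo", "site_C", "2015-05-03", "Willow tit"),
    ("bo", "site_C", "2015-05-03", "Crested tit"),
    ("bo", "site_C", "2015-05-03", "Goldcrest"),
    ("anna", "site_D", "2015-05-04", "Goldcrest"),
]

presences, absences = infer_absences(observations, {"anna"}, "Siberian jay")
print(presences)
print(absences)
```

In this toy example, site_C yields no absence because its reporter is not in the qualified set, and site_D yields none because a single record is too little evidence that the site was searched.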
A Case Study with the Siberian Jay
In our article ‘Can opportunistically collected Citizen Science data fill a data gap for habitat suitability models of less common species?’, we tested if we can infer such absences for the Siberian jay (Perisoreus infaustus). This boreal forest specialist has been impacted by modern forest management and was, until recently, red-listed in Sweden.
We identified the reporters in Artportalen who had contributed the most locations for eight forest bird species, including the Siberian jay. Then, we emailed them a questionnaire, asking whether they could identify the focal species by sight and sound and whether they always report the species when found. Of those who answered, 63% said yes to both questions. These 38 reporters had contributed a staggering 2 million observations to Artportalen during our 14-year study period. No wonder that we actually had to phone some of the most active reporters, as they were too busy bird-watching to answer our email questionnaire. One even answered our questions over the phone while out bird-watching!
Evaluating Habitat Suitability Models
With this information, we enhanced the Siberian jay presence-only data with inferred absences. Next, we evaluated habitat suitability models for the Siberian jay built from opportunistically collected data against independently and systematically collected data from the Swedish Breeding Bird Survey. We evaluated logistic regression with inferred absences and several other methods: two versions of MaxEnt, a site-occupancy-detection model and a presence-only, presence-absence multispecies model (a slight variation on how it was originally proposed).
All methods produced nationwide habitat suitability maps that were very similar to the habitat suitability map produced using the systematically collected observations. At finer geographical scales, logistic regression with inferred absences was closer to the results from the systematic survey than the other methods. The species-habitat relationships found with logistic regression also matched those found from systematically collected data, and prior expectations based on what we know about the species’ ecology, more closely than those from the other methods.
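Comparing model predictions with an independent survey, as described above, can be illustrated with a simple discrimination measure. The sketch below computes the area under the ROC curve (AUC) from scratch. Using AUC here is an assumption made for illustration, and the predicted values and survey outcomes are invented. The article itself may rely on different evaluation measures.

```python
def auc(scores, labels):
    """Probability that a randomly chosen presence site receives a higher
    predicted suitability than a randomly chosen absence site (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical predicted suitability at five independently surveyed sites,
# and whether the species was found there in the systematic survey.
predicted = [0.9, 0.8, 0.3, 0.6, 0.2]
survey = [1, 1, 0, 0, 1]

score = auc(predicted, survey)
print(f"AUC against independent survey: {score:.3f}")
```

A value of 0.5 means the model ranks survey sites no better than chance, while values closer to 1 mean that sites where the species was found tend to receive higher predicted suitability.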
Why Are Inferred Absences Relevant for Presence-Only Citizen Science Data?
Using observations from very active reporters, we were able to produce high-quality inferred absences for the Siberian jay that minimized false absences. This gave us habitat suitability models similar to those from a systematic survey. There aren’t many reporters who are this active, but they tend to contribute a lot of observations to citizen science projects. It’s likely that high-quality absences can be inferred for other less common species, too. This has been shown for a fungus, a species in a group that is considerably less popular with reporters than birds.
Importantly, inferred absences are not dependent on the existence of a species checklist. They can be obtained even in poorly studied regions without full knowledge of which species live there. They’re also not dependent on observers being able to identify all species in a group. So, they can be obtained for species in groups for which only very few highly skilled observers can identify all species. You just need enough reporters who can identify the focal species.
Inferred absences may be key to creating reliable habitat suitability models for the many species and regions in the world where systematically collected data are not (or are only sparsely) available.
To find out more, read the full Methods in Ecology and Evolution article ‘Can opportunistically collected Citizen Science data fill a data gap for habitat suitability models of less common species?’