OBIS QC Flags

Introduction

In the normal process of uploading data into OBIS, the regional nodes perform a set of quality control checks on the data. These checks are defined in the following article:

Vandepitte, L., Bosch, S., Tyberghein, L., Waumans, F., Vanhoorne, B., Hernandez, F., … Mees, J. (2015). Fishing for data and sorting the catch: assessing the data quality, completeness and fitness for use of data in marine biogeographic databases. Database: The Journal of Biological Databases and Curation, 2015. https://doi.org/10.1093/database/bau125

The regional nodes work with the data providers to fix any quality issues before committing the final upload. The OBIS central database also performs quality control checks on the uploaded data, to ensure that all records meet the OBIS standards. However, a record with a quality issue is normally not deleted from the database but flagged. This is more common with older records, uploaded before OBIS was structured as a network of regional nodes.

As a result, you as a user may end up with a set of records that have QC issues, such as a missing date or an incomplete taxonomy. To help identify these issues, OBIS flags each record, and when you retrieve occurrences the results of the QC checks are summarized in the qc variable.
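The text above does not say how the qc variable is encoded. The sketch below assumes, as in the legacy OBIS API, that qc is an integer bit mask in which bit n - 1 is set when check n passed; please verify this against the robis documentation for your version. The helper qc_passed_flags() is hypothetical and not part of robis.

```r
## ASSUMPTION: qc is an integer bit mask where bit (n - 1) is set
## when QC check n passed. Flag numbers run from 1 to 30.
## qc_passed_flags() is a hypothetical helper, not a robis function.
qc_passed_flags <- function(qc) {
  bits <- bitwShiftL(1L, 0:29)            # one bit per flag number 1..30
  which(bitwAnd(as.integer(qc), bits) != 0L)
}

qc_passed_flags(7)   # bits 0, 1 and 2 set: flags 1, 2 and 3 passed
```

The function decodes a single qc value; for a whole column you could apply it with lapply().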

The QC Flags

In OBIS, the QC flags cover the following aspects:

  • OBIS data format (1, 10, 17)
  • Taxonomy (2:3)
  • Geography (4:6, 18:19)
  • Completeness (7, 11:16)
  • Species outliers (21:28)
  • Dataset outliers (29:30)

At the moment there are 27 active QC flags, numbered from 1 to 30; flags 8, 9 and 20 are disabled.
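As a quick consistency check on the numbering, the grouping can be written as an R list. Because flags 8, 9 and 20 are disabled, the eight species outlier checks occupy flags 21 to 28. The names active_flags and qc_categories and the short category labels are illustrative, not part of robis.

```r
## Flag numbers run from 1 to 30; flags 8, 9 and 20 are disabled
disabled_flags <- c(8, 9, 20)
active_flags <- setdiff(1:30, disabled_flags)

## Illustrative grouping of the active flags by aspect
qc_categories <- list(
  format       = c(1, 10, 17),
  taxonomy     = 2:3,
  geography    = c(4:6, 18:19),
  completeness = c(7, 11:16),
  species      = 21:28,   # species outliers
  dataset      = 29:30    # dataset outliers
)

## Every active flag belongs to exactly one category
all_grouped <- unlist(qc_categories, use.names = FALSE)
stopifnot(length(active_flags) == 27,
          setequal(all_grouped, active_flags),
          anyDuplicated(all_grouped) == 0)
```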

  1. OBIS data format: are the required fields from the OBIS Schema completed?
  2. Taxonomy: is the taxon name matched to WoRMS?
  3. Taxonomy: is the taxon level genus or lower?
  4. Geography (lat/lon): are the latitude/longitude values different from zero?
  5. Geography (lat/lon): are the latitude/longitude values within their possible boundaries (world coordinates)?
  6. Geography (lat/lon): are the coordinates situated in the sea or along the coastline (20 km buffer)?
  7. Completeness (date/time): is the sampling year (start/end) completed and valid?
  10. OBIS data format: is the ‘Basis of Record’ documented, and is an existing OBIS code used?
  11. Completeness (date/time): is the sampling date (year/month/day; start/end) valid?
  12. Completeness (date/time): if a start and end date are given, is the start before the end?
  13. Completeness (date/time): if a sampling time is given, is it valid and is the time zone completed?
  14. Completeness (presence/abundance/biomass): is the value of the field ‘ObservedIndividualCount’ empty or >0?
  15. Completeness (presence/abundance/biomass): is the value of the field ‘Observedweight’ empty or >0?
  16. Completeness (presence/abundance/biomass): is the field ‘SampleSize’ completed if the field ‘ObservedIndividualCount’ is >0?
  17. OBIS data format: is the value of the field ‘Sex’ empty, or is an existing OBIS code used?
  18. Geography (depth): is the minimum depth <= the maximum depth?
  19. Geography (depth): is the sampling depth possible when compared with the GEBCO depth map (incl. margin)?
  21. Species outliers (environment/depth): is the observation within six MADs of the median depth of this taxon?
  22. Species outliers (environment/depth): is the observation within three IQRs of the first and third quartile depth of this taxon?
  23. Species outliers (environment/SSS): is the observation within six MADs of the median sea surface salinity (SSS) of this taxon?
  24. Species outliers (environment/SSS): is the observation within three IQRs of the first and third quartile sea surface salinity (SSS) of this taxon?
  25. Species outliers (environment/SST): is the observation within six MADs of the median sea surface temperature (SST) of this taxon?
  26. Species outliers (environment/SST): is the observation within three IQRs of the first and third quartile sea surface temperature (SST) of this taxon?
  27. Species outliers (geography): is the observation within six MADs of the median distance to the geographic centroid of this taxon?
  28. Species outliers (geography): is the observation within three IQRs of the first and third quartile distance to the geographic centroid of this taxon?
  29. Dataset outliers (geography): is the observation within six MADs of the median distance to the geographic centroid of this dataset?
  30. Dataset outliers (geography): is the observation within three IQRs of the first and third quartile distance to the geographic centroid of this dataset?
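Each outlier question above applies one of two robust rules to a per-record value (depth, SSS, SST or distance to a centroid). A minimal sketch of both rules in base R follows. The helper names are hypothetical, the IQR rule is read as "inside [Q1 - 3·IQR, Q3 + 3·IQR]", and using the raw MAD (constant = 1) rather than R's default scaled MAD (constant = 1.4826) is an assumption; OBIS may have computed it differently.

```r
## Hypothetical helpers sketching the two outlier rules.

## MAD rule: is x within k MADs of the median of `values`?
## ASSUMPTION: raw MAD (constant = 1), not R's default scaled MAD.
within_mad <- function(x, values, k = 6) {
  med <- median(values, na.rm = TRUE)
  m <- mad(values, center = med, constant = 1, na.rm = TRUE)
  abs(x - med) <= k * m
}

## IQR rule: is x within k IQRs below Q1 or above Q3 of `values`?
within_iqr <- function(x, values, k = 3) {
  q <- quantile(values, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x >= q[1] - k * iqr & x <= q[2] + k * iqr
}

## Toy depths for one taxon (not real OBIS data)
depths <- c(10, 12, 11, 13, 12, 11, 10, 14, 12, 500)
within_mad(c(14, 500), depths)   # TRUE FALSE
within_iqr(c(14, 500), depths)   # TRUE FALSE
```

With these toy values the median is 12 and the raw MAD is 1, so the 500 m record lies far outside both limits while 14 m passes.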

With robis, there are two ways to deal with the QC flags:

  1. Filter the records in the query (with checklist() or occurrence())
  2. Filter the records after the query (by manipulating the qc variable in the resulting data frame)
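For the second approach, a minimal sketch of filtering a data frame on its qc column is shown below. It assumes, as in the legacy OBIS API, that qc is an integer bit mask where bit n - 1 is set when check n passed; passes_qc() is a hypothetical helper and the data frame is a toy stand-in for an occurrence() result.

```r
## ASSUMPTION: qc is an integer bit mask in which bit (n - 1) is set
## when QC check n passed. passes_qc() is hypothetical, not in robis.
passes_qc <- function(qc, flags) {
  ## combine the requested flags into one mask, e.g. flags 1 and 2 -> 3
  mask <- Reduce(bitwOr, bitwShiftL(1L, as.integer(flags) - 1L))
  ## a record passes only if every requested bit is set
  ok <- bitwAnd(as.integer(qc), mask) == mask
  ## records with a missing qc value are treated as failing
  ok & !is.na(ok)
}

## Toy data frame standing in for a query result (not real OBIS data)
occ <- data.frame(id = 1:3, qc = c(3L, 1L, 0L))
occ[passes_qc(occ$qc, c(1, 2)), ]   # keeps only the record with id 1
```

Treating a missing qc as a failure keeps the subset free of all-NA rows; drop that line if you would rather inspect such records.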

Let’s look at each one in more detail.

Filter your search with QC flags

First, let’s load the required packages:

library(robis)
## It is also good practice to set up your working directory at the beginning.
## Uncomment the setwd line and insert the path of your directory
## setwd("your_working_directory here")

Both checklist() and occurrence() accept a vector of QC flag numbers as an argument.
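Before passing a vector of flag numbers to a query, it can help to drop numbers that cannot match any check. The sketch below removes duplicates, out-of-range values and the disabled flags 8, 9 and 20; clean_qc_flags() is a hypothetical helper, not part of robis, and it does not call checklist() or occurrence() itself.

```r
## Hypothetical helper (not part of robis): tidy a vector of QC flag
## numbers before passing it to checklist() or occurrence().
clean_qc_flags <- function(flags) {
  flags <- sort(unique(as.integer(flags)))
  disabled <- c(8L, 9L, 20L)
  ## anything outside 1..30 or currently disabled is dropped
  bad <- flags[flags < 1L | flags > 30L | flags %in% disabled]
  if (length(bad) > 0) {
    warning("Dropping disabled or unknown QC flags: ",
            paste(bad, collapse = ", "))
  }
  setdiff(flags, bad)
}

## Returns c(2L, 3L, 21L) with a warning about flag 8
clean_qc_flags(c(2, 3, 8, 21))
```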

In this example, we’re interested in the area around the Cabo Verde Islands in the eastern Atlantic.

## (you can obtain the WKT string using the OBIS map application)
wkt <- "POLYGON ((-26.93848 18.18761, -20.47852 18.14585, -20.61035 13.96605, -27.07031 14.00870, -26.93848 18.18761))"
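Before running a query it can be worth checking the polygon itself: a valid WKT ring repeats its first vertex at the end, and its bounding box should cover the intended area. The sketch below uses a hypothetical toy parser, wkt_ring(), that only handles the simple single-ring "POLYGON ((x y, ...))" form; packages such as sf can parse WKT fully (for example with st_as_sfc()).

```r
## Hypothetical toy parser for a single-ring WKT POLYGON (base R only).
## Returns a matrix of vertices with columns lon and lat.
wkt_ring <- function(wkt) {
  inner <- sub("^\\s*POLYGON\\s*\\(\\((.*)\\)\\)\\s*$", "\\1", wkt)
  vertices <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
  xy <- do.call(rbind, lapply(strsplit(vertices, "\\s+"), as.numeric))
  colnames(xy) <- c("lon", "lat")
  xy
}

wkt <- "POLYGON ((-26.93848 18.18761, -20.47852 18.14585, -20.61035 13.96605, -27.07031 14.00870, -26.93848 18.18761))"
ring <- wkt_ring(wkt)

## A closed ring repeats its first vertex at the end
all(ring[1, ] == ring[nrow(ring), ])
## Bounding box of the study area (rows: min, max)
apply(ring, 2, range)
```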