Leader: Eric Orenstein (SIO-UCSD)
Scientists are increasingly relying on digital imaging technology to study marine organisms. These tools promise new insight into ecosystem function by sampling densely in space and time. However, the sheer volume of data that imaging systems produce makes it challenging to draw conclusions from them and to build long-term monitoring programs around them.
This tutorial will use a labeled plankton data set drawn from the Scripps Plankton Camera System to illustrate a variety of supervised image classification methods. The materials will cover basic image manipulation, feature extraction, and classification with margin-based, ensemble, and neural network methods. All coding examples and exercises will be presented in Python. These techniques are broadly applicable to image data of all kinds, from the macro- to the microscale. Participants are encouraged to bring their own images for experimentation.
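As a rough illustration of the feature extraction and margin-based classification steps named above, the following minimal sketch trains a linear classifier on hand-crafted shape features. It is not the tutorial's material: the synthetic ellipse images stand in for real plankton images, and the names `synthetic_image`, `extract_features`, and `train_linear_svm`, the three features, and every parameter value are assumptions made for this example. Only NumPy is used, and the margin classifier is a plain hinge-loss model fitted by sub-gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def synthetic_image(elongated, size=32):
    """Hypothetical stand-in for a plankton image: a bright ellipse on a noisy dark background."""
    yy, xx = np.mgrid[:size, :size] - size / 2.0
    a = rng.uniform(10, 13) if elongated else rng.uniform(5, 8)        # long semi-axis
    b = rng.uniform(2, 4) if elongated else a * rng.uniform(0.8, 1.0)  # short semi-axis
    theta = rng.uniform(0, np.pi)                                      # random orientation
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    yr = -xx * np.sin(theta) + yy * np.cos(theta)
    mask = (xr / a) ** 2 + (yr / b) ** 2 <= 1.0
    return mask.astype(float) + rng.normal(0.0, 0.1, (size, size))


def extract_features(image, threshold=0.5):
    """Return [area fraction, elongation, mean foreground intensity] for one image."""
    fg = image > threshold                        # crude foreground segmentation
    ys, xs = np.nonzero(fg)
    cov = np.cov(np.stack([xs, ys]))              # spread of the foreground pixel coordinates
    evals = np.linalg.eigvalsh(cov)               # eigenvalues in ascending order
    elongation = np.sqrt(evals[1] / max(evals[0], 1e-9))
    return np.array([fg.mean(), elongation, image[fg].mean()])


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch sub-gradient descent; y in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1              # samples inside or beyond the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Build a small labeled data set: +1 = elongated, -1 = round (both classes are made up).
labels = np.array([1, -1] * 100)
X = np.array([extract_features(synthetic_image(lab == 1)) for lab in labels])

# Hold out 50 images for testing and standardise features with training statistics only.
order = rng.permutation(len(labels))
train, test = order[:150], order[150:]
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
X_std = (X - mu) / sigma

w, b = train_linear_svm(X_std[train], labels[train])
predictions = np.sign(X_std[test] @ w + b)
accuracy = (predictions == labels[test]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice a library such as scikit-learn would usually replace the hand-written classifier; the point here is only that features are computed per image, standardised, and then passed to a classifier that is evaluated on images it has not seen.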
Leaders: Tristan Cordier (University of Geneva), Anders Lanzén (AZTI-Tecnalia)
High-throughput amplicon sequencing of environmental DNA (eDNA metabarcoding) is a molecular technique that profiles the composition and diversity of biological communities, including both known and unknown “species”, directly from environmental samples. It provides biologists with an unprecedented amount of ecologically meaningful data. These tools have recently been tested in an environmental monitoring context, in which the bioindicator values of the species present in an environment indicate its Ecological Quality Status (EQS), i.e. its level of disturbance, usually caused by anthropogenic pressures. Those studies showed that anthropogenic impact can be clearly detected from metabarcoding data. However, the taxonomic identification of many sequences remains challenging, which hampers the transition of existing monitoring tools and practices into the genomics era. Recent studies have shown that Supervised Machine Learning (SML) algorithms can overcome this challenge in many cases, yielding robust models that predict EQS from metabarcoding data regardless of taxonomic affiliation.
This tutorial will first provide a brief overview of general bioinformatics practices for processing metabarcoding data to obtain an exploitable contingency matrix (“OTU table”). We will then focus on the training and testing (cross-validation) of SML models to predict the EQS of samples in a monitoring context. Data collected along a pollution gradient in coastal environments in Norway will be provided and analysed, and participants may also explore their own data. The tutorial will be carried out using command line tools and R, and therefore requires basic familiarity with GNU/Linux (bash) as well as with basic statistics commands and data manipulation (“programming”) in R.
Leaders: Danelle Cline (MBARI), John Ryan (MBARI)
Passive acoustic sensing in the ocean provides a wealth of information about the presence and activity of marine life, as well as about anthropogenic noise that can harm it. This mode of sensing generates very large and complex data sets, and ML is proving to be highly effective in this research domain. This tutorial will cover ML fundamentals, optimum decimation filtering, spectrogram enhancement methods, and classification using convolutional neural networks (CNN). We will focus on end-to-end analysis methods, including detection and classification, for one type of sound source: low-frequency whale calls. Some basic experience in Python programming in Jupyter notebooks would be beneficial. Data will be provided for the tutorial. If you wish to bring your own data for experimenting outside of class time, please contact us to learn the required dataset organization.
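To give a feel for the decimation and spectrogram steps mentioned above, here is a minimal sketch using only NumPy. It is an illustration under stated assumptions, not the tutorial's method: the 2000 Hz sample rate, the 50 Hz tone standing in for a low-frequency call, the decimation factor, the simple windowed-sinc filter (not an optimum design), and the helper names `lowpass_fir`, `decimate`, and `spectrogram` are all chosen for this example.

```python
import numpy as np


def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR filter; cutoff is a fraction of the sample rate (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    taps = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)
    return taps / taps.sum()                      # normalise to unity gain at DC


def decimate(signal, factor, num_taps=101):
    """Low-pass filter to limit aliasing, then keep every `factor`-th sample."""
    taps = lowpass_fir(num_taps, cutoff=0.4 / factor)  # a little below the new Nyquist rate
    filtered = np.convolve(signal, taps, mode="same")
    return filtered[::factor]


def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Short-time power spectrum from Hann-windowed, half-overlapping frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, power.T                  # power has shape (n_freqs, n_frames)


# Synthetic "recording": a 50 Hz tone buried in white noise (all values are assumptions).
fs = 2000                                         # assumed original sample rate in Hz
t = np.arange(10 * fs) / fs                       # 10 seconds of samples
rng = np.random.default_rng(0)
recording = np.sin(2 * np.pi * 50.0 * t) + 0.5 * rng.normal(size=t.size)

factor = 8                                        # 2000 Hz -> 250 Hz
decimated = decimate(recording, factor)
freqs, times, power = spectrogram(decimated, fs / factor)
power_db = 10.0 * np.log10(power + 1e-12)         # log scale, as is usual for display

peak_hz = freqs[power.mean(axis=1).argmax()]      # strongest frequency, averaged over time
print(f"strongest frequency: {peak_hz:.1f} Hz")
```

Decimating first keeps the spectrogram small and concentrates its frequency resolution on the low band of interest. Library routines such as `scipy.signal.decimate` and `scipy.signal.spectrogram` offer the same two steps with more careful filter design.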