PANGAEA.
Data Publisher for Earth & Environmental Science

Schürholz, Daniel; Chennu, Arjun (2022): Dense and taxonomically detailed habitat maps of coral reef benthos machine-generated from underwater hyperspectral transects in Curaçao [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.946315

Always quote the citation above when using the data.


Abstract:
This dataset contains 248 benthic habitat maps that were created from 31 underwater hyperspectral images captured with the HyperDiver device at 8 reef sites along the western coastline of Curaçao (see https://doi.org/10.3390/data5010019 for information on the acquisition of the transects). Each transect was mapped with 8 combinations of two semantic labelspaces (detailed and reefgroups), two machine learning classifiers (patched and segmented), and two spectral signals (radiance and reflectance), giving 31 x 8 = 248 maps. Maps in the detailed labelspace have each pixel assigned to one of 43 labels, which are taxonomic labels at family, genus and species levels for biotic components of the reef (corals, sponges, macroalgae, etc.), as well as substrate labels (sediment, cyanobacterial mats, turf algae) and survey material labels (transect tape, reference board, etc.). The maps in the reefgroups labelspace cluster the labels of the detailed labelspace into 11 classes that describe reef functional groups (e.g. corals, sponges, algae). All habitat maps were produced with high accuracy (Fbeta 87%) by two different machine learning methods: a random forest ensemble classifier (segmented method) and a deep learning neural network classifier (patched method). The maps are further divided by the signal type from the hyperspectral image that was used, either radiance or reflectance (the latter was calculated with a reference board located at the beginning and end of each transect). These benthic habitat maps can be used to obtain accurate descriptions of the benthic community and habitat structure of coral reef sites in Curaçao.
The dataset also contains: an assessment of the accuracy and data efficiency of the machine learning methods, a consistency assessment of the mapped regions, a comparison of habitat metrics (class coverage, biodiversity indices, composition and configuration) between habitat maps produced by each method, and an effort-vs-error analysis of sparse sampling techniques on the densely classified maps.
Keyword(s):
Biodiversity; Classification; Coral Reef; Habitat Mapping; hyperspectral imaging; machine learning; Taxonomy; underwater
Supplement to:
Schürholz, Daniel; Chennu, Arjun (2022): Digitizing the coral reef: Machine learning of underwater spectral images enables dense taxonomic mapping of benthic habitats. Methods in Ecology and Evolution, 2041-210X.14029, https://doi.org/10.1111/2041-210X.14029
Related to:
Chennu, Arjun; Färber, Paul; De'ath, Glenn; de Beer, Dirk; Fabricius, Katharina Elisabeth (2017): A diver-operated hyperspectral imaging and topographic surveying system for automated mapping of benthic habitats. Scientific Reports, 7(1), 19, https://doi.org/10.1038/s41598-017-07337-y
Chennu, Arjun; Rashid, Ahmad Rafiuddin; den Haan, Joost; de Beer, Dirk (2020): Taxonomically annotated underwater hyperspectral and color images of coral reef transects from Curaçao. PANGAEA, https://doi.org/10.1594/PANGAEA.911300
Rashid, Ahmad Rafiuddin; Chennu, Arjun (2020): A Trillion Coral Reef Colors: Deeply Annotated Underwater Hyperspectral Images for Automated Classification and Habitat Mapping. Data, 5(1), 19, https://doi.org/10.3390/data5010019
Funding:
Horizon 2020 (H2020), grant/award no. 813360: 4D_REEF
Coverage:
Median Latitude: 12.147903 * Median Longitude: -68.964690 * South-bound Latitude: 12.042249 * West-bound Longitude: -69.158931 * North-bound Latitude: 12.375344 * East-bound Longitude: -68.745104
Event(s):
Carmabi * Latitude: 12.122331 * Longitude: -68.969234 * Location: Curacao * Method/Device: Hyperspectral imaging
East_Point * Latitude: 12.042249 * Longitude: -68.745104 * Location: Curacao * Method/Device: Hyperspectral imaging
Habitat * Latitude: 12.197850 * Longitude: -69.079558 * Location: Curacao * Method/Device: Hyperspectral imaging
Comment:
The files for the habitat maps are in the form: habitat_maps_dataset/habitat_maps/transects/transect_<num>/habitat_map_<labelspace>_<spectrum>_<method>.<ext>
where the parameters can have the following values:
num: 005, 006, 019, 024, 026, 028, 031, 043, 044, 046, 054, 080, 081, 082, 084, 085, 086, 090, 091, 095, 097, 102, 107, 114, 118, 125, 129, 130, 132, 134, 141
labelspace: detailed, reefgroups
spectrum: radiance, reflectance
method: patched, segmented
ext: nc or jpg
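The naming pattern above can be expanded programmatically. The following is a minimal sketch, assuming the archive is extracted so that the path prefix given above is preserved; the function name habitat_map_paths and its arguments are illustrative and not part of the dataset.

```python
from itertools import product

# Parameter values copied from the listing above.
TRANSECTS = [
    "005", "006", "019", "024", "026", "028", "031", "043", "044", "046",
    "054", "080", "081", "082", "084", "085", "086", "090", "091", "095",
    "097", "102", "107", "114", "118", "125", "129", "130", "132", "134",
    "141",
]
LABELSPACES = ["detailed", "reefgroups"]
SPECTRA = ["radiance", "reflectance"]
METHODS = ["patched", "segmented"]


def habitat_map_paths(ext="nc", root="habitat_maps_dataset/habitat_maps/transects"):
    """Return every habitat map path for one file extension ('nc' or 'jpg').

    Hypothetical helper: `root` assumes the extracted archive keeps the
    directory layout described in the comment section.
    """
    if ext not in ("nc", "jpg"):
        raise ValueError("ext must be 'nc' or 'jpg'")
    return [
        f"{root}/transect_{num}/habitat_map_{labelspace}_{spectrum}_{method}.{ext}"
        for num, labelspace, spectrum, method in product(
            TRANSECTS, LABELSPACES, SPECTRA, METHODS
        )
    ]


paths = habitat_map_paths("nc")
print(len(paths))  # 31 transects x 8 combinations = 248
print(paths[0])
```

The count of 248 matches the number of habitat maps stated in the abstract.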
For each transect, the following files are available:
habitat_map_<labelspace>_<spectrum>_<method>.nc in netCDF4 format: contains the habitat map data for the given combination of semantic labelspace, signal type (or spectrum) and machine learning method. The map data contains a 2D (Y, X) data array, classmap, that holds one integer class code per pixel. To decode the class codes into class labels, a lookup table for each labelspace is provided in the attributes 'label' and 'label_id' of the data array of each class map.
habitat_map_<labelspace>_<spectrum>_<method>.jpg: an image file that visualizes the habitat map with a corresponding color for each class.
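Decoding the class codes could look roughly like the sketch below. It assumes that 'label_id' and 'label' are parallel sequences (code i in 'label_id' corresponds to name i in 'label') and that classmap is readable as a variable via the third-party xarray library; neither detail is confirmed by the description above. The helper names and the toy labels are made up for illustration.

```python
def build_lookup(label_ids, labels):
    """Map integer class codes to label names (assumes parallel sequences)."""
    return {int(code): str(name) for code, name in zip(label_ids, labels)}


def decode_classmap(classmap, lookup):
    """Replace every integer code in a 2D (Y, X) map by its label name."""
    return [[lookup[int(code)] for code in row] for row in classmap]


def load_classmap(path):
    """Read one habitat map file; requires xarray and a netCDF backend.

    Assumption: 'classmap' is a data variable carrying the 'label' and
    'label_id' attributes. This function is not exercised below.
    """
    import xarray as xr

    dataset = xr.open_dataset(path)
    classmap = dataset["classmap"]
    lookup = build_lookup(classmap.attrs["label_id"], classmap.attrs["label"])
    return classmap.values, lookup


# Toy example with made-up codes and labels (not the dataset's lookup table).
toy_map = [[0, 0, 1], [2, 1, 1]]
toy_lookup = build_lookup([0, 1, 2], ["sediment", "coral", "sponge"])
decoded = decode_classmap(toy_map, toy_lookup)
print(decoded[1])  # ['sponge', 'coral', 'coral']
```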
Parameter(s):
# | Name | Short Name | Unit | Principal Investigator | Method/Device | Comment
1 | Binary Object | Binary | | Schürholz, Daniel | |
2 | Binary Object (Media Type) | Binary (Type) | | Schürholz, Daniel | |
3 | Binary Object (File Size) | Binary (Size) | Bytes | Schürholz, Daniel | |
4 | File content | Content | | Schürholz, Daniel | |
Status:
Curation Level: Enhanced curation (CurationLevelC)
Size:
12 data points

Data



Binary | Binary (Type) | Binary (Size) [Bytes] | Content
effort_vs_error.zip | application/zip | 94.4 kBytes | Effort-vs-error analysis of sparse sampling techniques on the densely classified maps. One file contains the values of the habitat metrics (Shannon index, Gini-Simpson index and class coverages) for the dense maps from each machine learning method and labelspace combination. The second file contains the 95% quantile error, relative to the metric values in the first file, for the same metrics calculated from random sparse sampling experiments with different numbers of points. Both files are saved in TAB-separated csv format.
region_consistency.zip | application/zip | 816 Bytes | Contains the consistency calculations of predicted regions in the habitat maps vs. those same regions in the manual annotations set. Different combinations of method, labelspace, spectrum, post-processing (none, or Dense Conditional Random Fields) and transect sets (learning and validation) were used. The file is saved in TAB-separated csv format.
habitat_metrics.zip | application/zip | 11.5 kBytes | One file contains the values of the habitat metrics (Shannon index, Gini-Simpson index and class coverages) for the dense maps from each method and labelspace combination. A second file contains a compositional comparison (using Bray-Curtis similarity) across methods of the habitat maps in each labelspace. A third file contains a compositional comparison across labelspaces of habitat maps created by the same method. The last file contains a configurational analysis (using Jaccard score) of the habitat maps across labelspaces of the same method. The files are saved in TAB-separated csv format.
ml_classifiers_assessment.zip | application/zip | 20.1 kBytes | This folder provides confusion matrices of the machine learning classifiers on a test set of transect regions for each combination of labelspace and spectral signal type. It also provides per-class performance metrics and summary metrics for the same combinations of method, labelspace and spectral signal type. All files in this folder are saved in TAB-separated csv format.
ml_data_experiments.zip | application/zip | 4.8 kBytes | This folder provides two files. One is a spectral band ablation study, in which the performance of the machine learning classifiers was compared when subsets of spectral bands were used. The second file contains performance data from an experiment in which increasing sizes of training data (measured in unique pixels) were used to train the classifiers under constant computational resources. Both files in this folder are saved in TAB-separated csv format.
habitat_maps.zip | application/zip | 250.5 MBytes | This folder contains the final benthic habitat maps created from different combinations of machine learning method, labelspace and spectral signal type (or spectrum). It also contains the definition of each labelspace in an image and a YAML file.
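For orientation, the habitat metrics named above (class coverage, Shannon index, Gini-Simpson index, Bray-Curtis similarity) can be computed from a class map with the textbook formulas sketched below. This is not the authors' code: the natural logarithm, the inclusion of all classes in the coverage, and the function names are assumptions, and the dataset's own files may follow different conventions.

```python
import math
from collections import Counter


def class_coverage(classmap):
    """Fraction of pixels per class code in a 2D (Y, X) map."""
    counts = Counter(code for row in classmap for code in row)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def shannon_index(coverage):
    """Shannon index H = -sum(p * ln p), natural log assumed."""
    return -sum(p * math.log(p) for p in coverage.values() if p > 0)


def gini_simpson_index(coverage):
    """Gini-Simpson index 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in coverage.values())


def bray_curtis_similarity(cov_a, cov_b):
    """1 - Bray-Curtis dissimilarity between two coverage dictionaries."""
    classes = set(cov_a) | set(cov_b)
    diff = sum(abs(cov_a.get(c, 0.0) - cov_b.get(c, 0.0)) for c in classes)
    total = sum(cov_a.get(c, 0.0) + cov_b.get(c, 0.0) for c in classes)
    return 1.0 - diff / total


# Toy maps with made-up class codes, standing in for two methods' outputs.
map_a = [[0, 0, 1, 1], [0, 0, 2, 2]]
map_b = [[0, 0, 0, 1], [0, 0, 2, 2]]
cov_a, cov_b = class_coverage(map_a), class_coverage(map_b)

print(cov_a)                                 # {0: 0.5, 1: 0.25, 2: 0.25}
print(gini_simpson_index(cov_a))             # 0.625
print(bray_curtis_similarity(cov_a, cov_b))  # 0.875
```

The toy fractions are exact in binary floating point, which is why the printed values come out without rounding noise; real maps will generally need a tolerance when comparing results.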