PANGAEA.
Data Publisher for Earth & Environmental Science

Tait, Alexander M; Brumby, Steven P; Hyde, Samantha Brooks; Mazzariello, Joseph; Corcoran, Melanie (2021): Dynamic World training dataset for global land use and land cover categorization of satellite imagery [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.933475

Always quote the citation above when using the data!

Abstract:
The Dynamic World Training Data is a dataset of over 5 billion pixels of human-labeled ESA Sentinel-2 satellite imagery, distributed over 24,000 tiles collected from all over the world. The dataset is designed to train and validate automated land use and land cover mapping algorithms. The 10 m resolution, 5.1 km-by-5.1 km tiles are densely labeled using a ten-category classification schema indicating general land use and land cover categories. The dataset was created between 2019-08-01 and 2020-02-28, using satellite imagery observations from 2019, with approximately 10% of observations extending back to 2017 in very cloudy regions of the world. This dataset is a component of the National Geographic Society - Google - World Resources Institute Dynamic World project.
The dataset consists of two file types: GeoTIFF files containing the markup that human labelers provided for 510x510 pixel, 10 m resolution satellite image tiles, and Excel (.xlsx) tables of metadata and class statistics for those GeoTIFF files. The data is organized into three main folders. One folder contains training data labeled by a team of 25 expert human labelers recruited by National Geographic Society specifically for this project. A second folder contains training data labeled by a larger group of commissioned labelers provided by a commercial crowd-labeler service. The data in these two folders is organized by hemisphere and biome number from the RESOLVE Ecoregions2017 biomes categories (https://ecoregions2017.appspot.com/). A third folder contains a validation dataset. This is a holdout set of training data for assessing model accuracy; none of it is intended to be used in the formulation of the model. Each validation tile was independently labeled by three experts. The validation set contains two versions: the individual markup from each expert labeler, and the image composites of the individual markups.
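The tile dimensions quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the dataset: the function names are hypothetical, and the tile count is the rounded figure from the abstract, so the result is only an approximate upper bound on the number of pixels (it includes pixels left unmarked by the labelers).

```python
# Figures taken from the abstract; TILE_COUNT is a rounded value.
PIXELS_PER_SIDE = 510   # each tile is 510x510 pixels
RESOLUTION_M = 10       # ground resolution in metres per pixel
TILE_COUNT = 24_000     # approximate number of tiles


def tile_side_km(pixels_per_side: int, resolution_m: int) -> float:
    """Ground length of one tile edge in kilometres."""
    return pixels_per_side * resolution_m / 1000


def max_pixel_count(tile_count: int, pixels_per_side: int) -> int:
    """Total pixels across all tiles, counting unmarked pixels too."""
    return tile_count * pixels_per_side ** 2


print(tile_side_km(PIXELS_PER_SIDE, RESOLUTION_M))   # 5.1
print(max_pixel_count(TILE_COUNT, PIXELS_PER_SIDE))  # 6242400000
```

The edge length matches the stated 5.1 km, and the upper bound of roughly 6.2 billion pixels is consistent with "over 5 billion" labeled pixels once unmarked areas are excluded.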
Each GeoTIFF file encodes information on the location of landscape feature classes as determined by a given labeler. Classes were labeled by visual examination of true color (RGB) composites of Sentinel-2 MultiSpectral Level-2A scenes. The Tier 1 class values used in this phase of the project are as follows: 0 No data (left unmarked), 1 Water, 2 Trees, 3 Grass, 4 Flooded Vegetation, 5 Crops, 6 Scrub, 7 Built Area, 8 Bare Ground, 9 Snow/Ice, 10 Cloud. This dataset does not include the original Sentinel-2 imagery tiles, but metadata on the exact image ID and date is provided. The original Sentinel-2 imagery was obtained via Google Earth Engine.
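As an illustration of how the Tier 1 class values might be used, the sketch below maps the values listed above to their names and computes per-class fractions for a tile. It assumes the class values are stored directly as pixel values in each GeoTIFF and that the tile has already been read into a 2D grid of integers by a raster library (not shown). The names `DW_CLASSES` and `class_fractions` and the toy grid are hypothetical.

```python
from collections import Counter

# Tier 1 class values as listed in the abstract.
DW_CLASSES = {
    0: "No data", 1: "Water", 2: "Trees", 3: "Grass",
    4: "Flooded Vegetation", 5: "Crops", 6: "Scrub",
    7: "Built Area", 8: "Bare Ground", 9: "Snow/Ice", 10: "Cloud",
}


def class_fractions(label_grid):
    """Return the share of labeled pixels per class name.

    Pixels with value 0 (left unmarked) are excluded from the total.
    """
    counts = Counter(value for row in label_grid for value in row)
    unknown = set(counts) - set(DW_CLASSES)
    if unknown:
        raise ValueError(f"unexpected class values: {sorted(unknown)}")
    labeled = sum(n for value, n in counts.items() if value != 0)
    if labeled == 0:
        return {}
    return {
        DW_CLASSES[value]: n / labeled
        for value, n in sorted(counts.items())
        if value != 0
    }


# Toy 3x3 grid standing in for a real 510x510 tile.
toy_tile = [
    [1, 1, 2],
    [0, 2, 7],
    [1, 5, 5],
]
print(class_fractions(toy_tile))
# {'Water': 0.375, 'Trees': 0.25, 'Crops': 0.25, 'Built Area': 0.125}
```

Excluding class 0 from the denominator is a design choice for this sketch; including it would instead report how much of each tile was left unmarked.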
This data is available under a Creative Commons BY-4.0 license and requires the following attribution: This dataset is produced for the Dynamic World Project by National Geographic Society in partnership with Google and the World Resources Institute. Development of the Dynamic World training data was funded in part by the Gordon and Betty Moore Foundation.
Keyword(s):
land use and land cover; satellite image analysis
Coverage:
Median Latitude: 12.671000 * Median Longitude: -177.328000 * South-bound Latitude: -55.508000 * West-bound Longitude: 179.626000 * North-bound Latitude: 80.850000 * East-bound Longitude: -174.282000
Date/Time Start: 2017-03-28T00:00:00 * Date/Time End: 2019-12-12T00:00:00
Event(s):
LandUseCover_2017_2019 * Latitude Start: -55.508000 * Longitude Start: -174.282000 * Latitude End: 80.850000 * Longitude End: 179.626000 * Date/Time Start: 2017-03-28T00:00:00 * Date/Time End: 2019-12-12T00:00:00 * Method/Device: Satellite imagery (SATI)
Parameter(s):
Name | Short Name | Unit | Principal Investigator | Method/Device | Comment
File content | Content | | Tait, Alexander M | |
Binary Object (File Size) | Binary (Size) | Bytes | Tait, Alexander M | |
Binary Object | Binary | | Tait, Alexander M | |
Status:
Curation Level: Basic curation (CurationLevelB)
Size:
10 data points

Data

All files referred to in the data matrix can be downloaded in one go as ZIP or TAR. Be careful: this download can be very large! To protect our systems from misuse, we require users to sign up for a user account before downloading.


Content | Binary (Size) [Bytes] | Binary
Description | 5.4 kBytes | README.txt
Metadata for the main training dataset | 4.4 MBytes | v1_dw_tile_metadata_for_public_release.xlsx
Experts - training data labeled by expert labelers | 47.8 MBytes | Experts_tiles.zip
Non_expert - training data labeled by commissioned labelers from Labelbox | 214.7 MBytes | Non_expert_tiles.zip
validation_set - the holdout set of training data for assessing model accuracy | 29 MBytes | validation_set_tiles.zip