PANGAEA.
Data Publisher for Earth & Environmental Science

van Geffen, Femke; Brieger, Frederic; Pestryakova, Luidmila A; Zakharov, Evgenii S; Herzschuh, Ulrike; Kruse, Stefan (2021): SiDroForest: Synthetic Siberian Larch Tree Crown Dataset of 10.000 instances in the Microsoft's Common Objects in Context dataset (coco) format [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.932795

Always quote the citation above when using the data!

Abstract:
This synthetic Siberian larch tree crown dataset was created for upscaling and machine learning purposes as part of the SiDroForest (Siberia Drone Forest Inventory) project. The SiDroForest data collection (https://www.pangaea.de/?q=keyword%3A%22SiDroForest%22) consists of vegetation plots surveyed in Siberia during a two-month fieldwork expedition in 2018 by the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Germany. During fieldwork, fifty-six 50 × 50 m vegetation plots were covered by Unmanned Aerial Vehicle (UAV) flights, and Red Green Blue (RGB) and Red Green Near Infrared (RGNIR) photographs were taken with a consumer-grade DJI Phantom 4 quadcopter. The synthetic dataset provided here contains larch (Larix gmelinii (Rupr.) Rupr. and Larix cajanderi Mayr.) tree crowns extracted from RGB images taken by the UAV's onboard camera over five selected vegetation plots from this expedition, placed on top of cropped and resized background images from the same RGB flights.
The extracted tree crowns were rotated, rescaled and repositioned across the background images, yielding a diverse synthetic dataset of 10,000 training images and 2,000 validation images for training complex neural networks.
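To make the rotate/rescale/reposition step concrete, here is a minimal sketch of how placements could be planned. It is not the cocosynth implementation: the 448 × 448 canvas comes from the description, but the scale range of 0.5 to 1.0, the uniformly random rotation angle and the function names `rotated_bbox_size` and `plan_placements` are hypothetical. The sketch only computes where each crown would go and the axis-aligned bounding box of the transformed crown; it does not paste any pixels.

```python
import math
import random

CANVAS_W, CANVAS_H = 448, 448  # output size stated in the dataset description


def rotated_bbox_size(w: float, h: float, angle_deg: float) -> tuple[float, float]:
    """Width and height of the axis-aligned box enclosing a w x h rectangle rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = abs(math.cos(a)), abs(math.sin(a))
    return w * cos_a + h * sin_a, w * sin_a + h * cos_a


def plan_placements(crown_sizes, rng: random.Random, max_crowns: int = 3,
                    scale_range=(0.5, 1.0)):
    """Hypothetical plan: choose 1..max_crowns crowns, then a random scale,
    rotation and position for each so that its bounding box fits on the canvas."""
    n = rng.randint(1, min(max_crowns, len(crown_sizes)))
    plan = []
    for w, h in rng.sample(crown_sizes, n):
        scale = rng.uniform(*scale_range)
        angle = rng.uniform(0.0, 360.0)
        bw, bh = rotated_bbox_size(w * scale, h * scale, angle)
        # Shrink further if the rotated crown would not fit on the canvas.
        fit = min(1.0, CANVAS_W / bw, CANVAS_H / bh)
        scale, bw, bh = scale * fit, bw * fit, bh * fit
        # Random top-left corner that keeps the whole box inside the canvas.
        x = rng.uniform(0.0, CANVAS_W - bw)
        y = rng.uniform(0.0, CANVAS_H - bh)
        plan.append({"scale": scale, "angle": angle, "bbox": [x, y, bw, bh]})
    return plan


if __name__ == "__main__":
    rng = random.Random(0)
    crowns = [(120, 110), (90, 95), (200, 180)]  # hypothetical cut-out sizes in pixels
    for placement in plan_placements(crowns, rng):
        print(placement)
```

Keeping every box fully inside the canvas is a simplification chosen for this sketch; a real generator may also allow crowns to extend past the image edge.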
In addition, the data is saved in the Microsoft Common Objects in Context (COCO) format (Lin et al., 2014) and can easily be loaded as a dataset for networks such as Mask R-CNN, U-Net or Faster R-CNN. These are neural networks for instance segmentation, semantic segmentation and object detection, respectively, and they have been used increasingly for forest monitoring in recent years.
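COCO annotation files are plain JSON with top-level `images`, `annotations` and `categories` lists, so they can be inspected without any deep learning framework. The following minimal sketch indexes such a file with the Python standard library. The inline record, its file name and the category name `larch` are hypothetical placeholders rather than values taken from this dataset's files. For real training pipelines, the `COCO` class in `pycocotools` provides similar indexing.

```python
import json
from collections import defaultdict


def index_coco(coco: dict):
    """Map image ids to image records, category ids to names, and group annotations by image."""
    images = {img["id"]: img for img in coco["images"]}
    categories = {cat["id"]: cat["name"] for cat in coco["categories"]}
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)
    return images, categories, anns_by_image


# Hypothetical minimal COCO document. A real annotation file would be read with:
#     with open(path) as fh:
#         coco = json.load(fh)
example = json.loads("""
{
  "images": [{"id": 1, "file_name": "00000001.jpg", "width": 448, "height": 448}],
  "categories": [{"id": 1, "name": "larch", "supercategory": "tree"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "iscrowd": 0,
     "bbox": [10.0, 20.0, 50.0, 40.0], "area": 1500.0,
     "segmentation": [[10, 20, 60, 20, 60, 60, 10, 60]]}
  ]
}
""")

images, categories, anns_by_image = index_coco(example)
for image_id, anns in anns_by_image.items():
    names = [categories[a["category_id"]] for a in anns]
    print(images[image_id]["file_name"], names)
```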
The images included in this dataset are from the field plots EN18062 (62.17° N 127.81° E), EN18068 (63.07° N 117.98° E), EN18074 (62.22° N 117.02° E), EN18078 (61.57° N 114.29° E) and EN18083 (59.97° N 113° E), located in Central Yakutia, Siberia. These sites were selected based on their vegetation content, their spectral (color) differences, the UAV flight angles and the clarity of the UAV images, which were taken with automatic shutter and white balance (Brieger et al., 2019). From each site, 35 images were selected in order of acquisition to serve as backgrounds for the dataset, starting after the first fifteen images of the flight, which were excluded because they often show the research team.
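The background selection rule can be sketched as a simple slice over each flight's images. This assumes that sorting the file names reproduces the order of acquisition (typical for sequentially numbered camera files, but an assumption here) and that exactly the first fifteen images are skipped; the function name `select_backgrounds` and the `DJI_0001.JPG`-style names in the usage example are hypothetical.

```python
def select_backgrounds(flight_images, skip=15, count=35):
    """Return `count` images in acquisition order, skipping the first `skip`.

    Assumes file names sort in acquisition order.
    """
    ordered = sorted(flight_images)
    selected = ordered[skip:skip + count]
    if len(selected) < count:
        raise ValueError(
            f"flight has only {len(ordered)} images; need at least {skip + count}"
        )
    return selected


if __name__ == "__main__":
    # Hypothetical flight with 100 sequentially numbered images.
    flight = [f"DJI_{i:04d}.JPG" for i in range(1, 101)]
    chosen = select_backgrounds(flight)
    print(chosen[0], chosen[-1], len(chosen))
```

Applied to five sites, this yields 5 × 35 = 175 background images.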
The 117 tree crowns were cut out manually in the GIMP image editor to ensure that all of them were Larix trees. About 15% of the cut-out crowns lie at the margin of the source image and are therefore only partially visible; they were included so that the algorithm does not rely on a full tree crown in order to detect a tree.
As backgrounds for the extracted tree crowns, 35 raw UAV images were selected for each of the five sites based on their content; UAV images in which the research team is visible were excluded. The five sites were chosen for their spectral diversity and vegetation content. The raw UAV images were cropped to 640 × 480 pixels at a resolution of 72 dpi and later rescaled to 448 × 448 pixels during dataset creation, giving 175 cropped backgrounds in total (5 sites × 35 images).
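The crop and rescale sizes above can be checked with a little arithmetic. The sketch below assumes a centred crop (the description does not say where the 640 × 480 window was taken) and a hypothetical 4000 × 3000 source image; the helper names are also hypothetical. The returned box uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so a real pipeline could follow it with `Image.resize((448, 448))`.

```python
CROP_W, CROP_H = 640, 480  # crop size stated in the description
OUT_W, OUT_H = 448, 448    # size after rescaling during dataset creation


def center_crop_box(img_w: int, img_h: int, crop_w: int = CROP_W, crop_h: int = CROP_H):
    """(left, upper, right, lower) box of a centred crop; centring is an assumption."""
    if img_w < crop_w or img_h < crop_h:
        raise ValueError("image is smaller than the crop")
    left = (img_w - crop_w) // 2
    upper = (img_h - crop_h) // 2
    return left, upper, left + crop_w, upper + crop_h


def rescale_factors(src_w: int = CROP_W, src_h: int = CROP_H,
                    dst_w: int = OUT_W, dst_h: int = OUT_H):
    """Horizontal and vertical scale factors; they differ when the aspect ratio changes."""
    return dst_w / src_w, dst_h / src_h


if __name__ == "__main__":
    box = center_crop_box(4000, 3000)  # hypothetical source image size
    sx, sy = rescale_factors()
    print(box, sx, sy)
```

If a 4:3 crop is resized directly to a square, the two factors differ (0.7 horizontally versus about 0.93 vertically), so crowns and backgrounds would be stretched unevenly; whether the pipeline resizes directly or crops again is not stated in the description.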
The synthetic images and their corresponding annotations and masks were created using the cocosynth Python software by Adam Kelly (2019), which is open source and available on GitHub: https://github.com/akTwelve/cocosynth.
The software rescales and transforms the tree crowns before placing up to three of them on each of the provided backgrounds. It also creates matching masks, which instance segmentation and object detection algorithms use to learn the shape and location of each synthetic crown, and it generates COCO annotation files containing each crown's name and category label. This format can be loaded by a variety of neural network frameworks for training.
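To show how a mask relates to its COCO annotation, here is a minimal sketch that derives the bounding box and area from a binary mask given as rows of 0/1 values. It is an illustration, not cocosynth's code: the function name `mask_to_annotation` is hypothetical, and the `segmentation` field (polygons or run-length encoding in real COCO files) is omitted for brevity.

```python
def mask_to_annotation(mask, ann_id, image_id, category_id):
    """Build a COCO-style annotation (bbox, area) from a binary mask given as rows of 0/1."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not xs:
        raise ValueError("empty mask")
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    # COCO "area" is the number of mask pixels, not the bounding-box area.
    area = sum(1 for row in mask for value in row if value)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "iscrowd": 0,
        "bbox": [x0, y0, x1 - x0 + 1, y1 - y0 + 1],  # COCO order: [x, y, width, height]
        "area": area,
    }


if __name__ == "__main__":
    crown_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
    ]
    print(mask_to_annotation(crown_mask, ann_id=1, image_id=1, category_id=1))
```

Storing the area as a pixel count rather than the box area matches COCO's convention, which evaluation tools use to group objects by size.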
Keyword(s):
Larch; machine learning; Siberian Arctic; SiDroForest; synthetic data
Supplement to:
van Geffen, Femke; Heim, Birgit; Brieger, Frederic; Geng, Rongwei; Shevtsova, Iuliia; Schulte, Luise; Stuenzi, Simone Maria; Bernhardt, Nadine; Troeva, Elena I; Pestryakova, Luidmila A; Zakharov, Evgenii S; Pflug, Bringfried; Herzschuh, Ulrike; Kruse, Stefan (2022): SiDroForest: a comprehensive forest inventory of Siberian boreal forest investigations including drone-based point clouds, individually labeled trees, synthetically generated tree crowns, and Sentinel-2 labeled image patches. Earth System Science Data, 14(11), 4967-4994, https://doi.org/10.5194/essd-14-4967-2022
Related to:
Brieger, Frederic; Herzschuh, Ulrike; Pestryakova, Luidmila A; Bookhagen, Bodo; Zakharov, Evgenii S; Kruse, Stefan (2019): Advances in the Derivation of Northeast Siberian Forest Metrics Using High-Resolution UAV-Based Photogrammetric Point Clouds. Remote Sensing, 11(12), 1447, https://doi.org/10.3390/rs11121447
Kelly, A (2019): Complete Guide to Creating COCO Datasets. GitHub repository, https://github.com/akTwelve/cocosynth
Kruse, Stefan; Bolshiyanov, Dimitry Yu; Grigoriev, Mikhail N; Morgenstern, Anne; Pestryakova, Luidmila A; Tsibizov, Leonid; Udke, Annegret (2019): Russian-German Cooperation: Expeditions to Siberia in 2018. Berichte zur Polar- und Meeresforschung = Reports on Polar and Marine Research, 734, 257 pp, https://doi.org/10.2312/BzPM_0734_2019
Lin, Tsung-Yi; et al. (2014): Microsoft COCO: Common Objects in Context. In: Fleet, D; Pajdla, T; Schiele, B; Tuytelaars, T (eds): Computer Vision – ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham, 740-755, https://doi.org/10.1007/978-3-319-10602-1_48
Coverage:
Median Latitude: 61.800000 * Median Longitude: 118.020000 * South-bound Latitude: 59.970000 * West-bound Longitude: 113.000000 * North-bound Latitude: 63.070000 * East-bound Longitude: 127.810000
Date/Time Start: 2018-07-25T00:00:00 * Date/Time End: 2018-08-21T00:00:00
Event(s):
EN18062-2 * Latitude: 62.170000 * Longitude: 127.810000 * Date/Time Start: 2018-07-25T00:00:00 * Date/Time End: 2018-08-21T00:00:00 * Location: Siberia * Campaign: RU-Land_2018_Yakutia (Chukotka 2018) * Basis: AWI Arctic Land Expedition * Method/Device: Unmanned aerial vehicle (UAV)
EN18068-2 * Latitude: 63.070000 * Longitude: 117.980000 * Date/Time Start: 2018-07-25T00:00:00 * Date/Time End: 2018-08-21T00:00:00 * Location: Siberia * Campaign: RU-Land_2018_Yakutia (Chukotka 2018) * Basis: AWI Arctic Land Expedition * Method/Device: Unmanned aerial vehicle (UAV)
EN18074-2 * Latitude: 62.220000 * Longitude: 117.020000 * Date/Time Start: 2018-07-25T00:00:00 * Date/Time End: 2018-08-21T00:00:00 * Location: Siberia * Campaign: RU-Land_2018_Yakutia (Chukotka 2018) * Basis: AWI Arctic Land Expedition * Method/Device: Unmanned aerial vehicle (UAV)
Parameter(s):
1 * Name: Binary Object * Short Name: Binary * Principal Investigator: van Geffen, Femke
2 * Name: Binary Object (File Size) * Short Name: Binary (Size) * Unit: Bytes * Principal Investigator: van Geffen, Femke
3 * Name: Binary Object (MD5 Hash) * Short Name: Binary (Hash) * Principal Investigator: van Geffen, Femke
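Because each binary object is listed together with its file size in bytes and its MD5 hash, a downloaded file can be checked for integrity. The following minimal sketch uses only the Python standard library; the function names are hypothetical, and the expected size and hash must be taken from the dataset's own File Size and MD5 Hash values, which are not reproduced here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MiB chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size, expected_md5):
    """True if the file matches the listed size in bytes and MD5 hash (case-insensitive)."""
    p = Path(path)
    return p.stat().st_size == expected_size and md5_of_file(p) == expected_md5.lower()
```

Comparing the size first is cheap and catches truncated downloads before the whole file is hashed.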
Status:
Curation Level: Basic curation (CurationLevelB)
Size:
3 data points
