PANGAEA.
Data Publisher for Earth & Environmental Science

Schoening, Timm; Schütt, Andrea (2017): Simulated Hierarchical Benchmark Dataset to assess dendro-classification methods (hierarchical classification) [dataset]. GEOMAR - Helmholtz Centre for Ocean Research Kiel, PANGAEA, https://doi.org/10.1594/PANGAEA.884173

Always quote citation above when using data!

Published: 2017-12-18
DOI registered: 2018-01-15

Abstract:
A hierarchically ordered distribution of 3D points was created with MATLAB. It contains 120,000 data points in five hierarchical levels, with one to four child nodes per parent. Data values on the three axes range between 0 and 1. The structure can be seen in the attached figure. Different distributions of data points are implemented in each hierarchical level, which allows classifiers to be tested under various conditions. The most common distribution in the dataset is a simple Gaussian point cloud. Other sampled distributions are a spherical distribution (a sphere in 3D) and a circular (donut) distribution along different axes. XOR distributions are implemented in different patterns, e.g. four batches with crossed classes or eight batches with two or four classes. The most complex data distribution is the springroll, in which the data points are intertwined with one another. To create indistinguishable cases, in which a classifier is expected to perform poorly, some data points are randomly intermixed with another class.
The .csv file contains four columns: label | x-coordinate | y-coordinate | z-coordinate
The label for each sample provides all hierarchical information needed. Each label is composed of five digits, one for each hierarchical level. As an example:
Sample '11421':
Hierarchical level 1: class 1
Hierarchical level 2: class 1
Hierarchical level 3: class 4
Hierarchical level 4: class 2
Hierarchical level 5: class 1
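The label scheme above can be decoded with a few lines of Python. The following is a minimal sketch, not part of the dataset: the function names (split_label, label_at_level, read_samples) are hypothetical, and it assumes a comma-delimited file without a header row, which should be checked against the downloaded file.

```python
import csv
import io


def split_label(label):
    """Split a five-digit label such as '11421' into one class per level."""
    label = str(label).strip()
    if len(label) != 5 or not (label.isascii() and label.isdigit()):
        raise ValueError(f"expected a five-digit label, got {label!r}")
    return tuple(int(digit) for digit in label)


def label_at_level(label, level):
    """Return the label prefix that identifies the node at a level (1 to 5)."""
    if not 1 <= level <= 5:
        raise ValueError("level must be between 1 and 5")
    split_label(label)  # validates the label format
    return str(label).strip()[:level]


def read_samples(handle, delimiter=","):
    """Yield (levels, (x, y, z)) per row.

    Assumption: comma-delimited rows and no header line.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        label, x, y, z = row[:4]
        yield split_label(label), (float(x), float(y), float(z))


# Illustrative row with made-up coordinates, not taken from the dataset.
example = io.StringIO("11421,0.25,0.5,0.75\n")
for levels, point in read_samples(example):
    print(levels, point)
```

Truncating a label with label_at_level gives the class of a sample at a coarser level of the hierarchy, so a classifier can be evaluated level by level.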
Size:
1.2 MBytes
