Bernard, Jürgen; Ruppert, Tobias; Scherer, Maximilian; Schreck, Tobias; Kohlhammer, Jörn (2012): Reference list of 265 sources used for the discovery of relationships between data clusters and metadata properties. PANGAEA, https://doi.org/10.1594/PANGAEA.785666
Always quote above citation when using data!
Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and their relationships. Often, the clusters found must be understood in the context of the categorical, numerical or textual metadata associated with the data elements. While often not part of the clustering process, such metadata play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow analysts to relate (or loosely speaking: correlate) clusters with metadata or other properties of the underlying cluster data. Manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises. A fully interactive search for potentially useful or interesting cluster-to-metadata relationships can be a cumbersome and lengthy process. To remedy this problem, we propose a novel approach for guiding users in discovering interesting relationships between clusters and associated metadata. Its goal is to guide the analyst through the potentially huge search space. Our work focuses on metadata of categorical type, which can be summarized for a cluster in the form of a histogram. We start from a given visual cluster representation and compute measures of interestingness defined on the distribution of metadata categories within the clusters. These measures are used to automatically score and rank the clusters by potential interestingness with regard to the distribution of categorical metadata. Identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing an encompassing, yet extensible, set of interestingness scores for categorical metadata, which can also be extended to numerical metadata.
Appropriate visual representations are provided for showing the visual correlations as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are automatically identified, supporting the analyst in discovering interesting and visually understandable relationships.
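The scoring step described in the abstract (summarize categorical metadata per cluster as a histogram, score each cluster, then rank) can be illustrated with a minimal sketch. The abstract does not specify which interestingness measures the system implements, so the Kullback-Leibler divergence from the global category distribution used here is only an assumed stand-in, and all function names and the toy cluster data are hypothetical.

```python
from collections import Counter
from math import log2


def category_histogram(labels):
    """Return the relative frequency of each category in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def kl_divergence(cluster_hist, global_hist):
    """KL divergence (in bits) of a cluster histogram from the global one.

    Assumed stand-in for an interestingness score: 0 when the cluster mirrors
    the global distribution, larger the more it deviates.
    """
    return sum(
        p * log2(p / global_hist[cat])
        for cat, p in cluster_hist.items()
        if p > 0
    )


def rank_clusters(clusters):
    """Score every non-empty cluster and return (cluster_id, score) pairs,
    highest score first. `clusters` maps a cluster id to its category labels."""
    all_labels = [lab for labels in clusters.values() for lab in labels]
    global_hist = category_histogram(all_labels)
    scores = {
        cid: kl_divergence(category_histogram(labels), global_hist)
        for cid, labels in clusters.items()
        if labels
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: surface-type metadata for members of two clusters.
toy_clusters = {
    "cluster_A": ["tundra", "tundra", "tundra", "tundra"],
    "cluster_B": ["tundra", "grass", "tundra", "grass"],
}

for cluster_id, score in rank_clusters(toy_clusters):
    print(f"{cluster_id}: {score:.3f}")
```

A cluster dominated by a single category that is less dominant globally scores higher than a cluster whose category mix resembles the overall data set, which is the kind of ranking a highlighting step could then use.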
Bernard, Jürgen; Ruppert, Tobias; Scherer, Maximilian; Schreck, Tobias; Kohlhammer, Jörn (2012): Guided discovery of interesting relationships between time series clusters and metadata properties. Special Track on Theory and Applications of Visual Analytics, i-KNOW 2012 conference proceedings, https://doi.org/10.1145/2362456.2362485
Median Latitude: 15.289839 * Median Longitude: 3.897982 * South-bound Latitude: -89.983000 * West-bound Longitude: -156.607000 * North-bound Latitude: 78.925000 * East-bound Longitude: 167.731000
Date/Time Start: 1992-01-01T00:00:00 * Date/Time End: 2017-12-31T00:00:00
Minimum Elevation: 0.0 m * Maximum Elevation: 2800.0 m
BAR (Barrow) * Latitude: 71.323000 * Longitude: -156.607000 * Date/Time: 1992-01-01T00:00:00 * Elevation: 8.0 m * Location: Alaska, USA * Campaign: WCRP/GEWEX * Method/Device: Monitoring station (MONS) * Comment: BSRN station no: 22; Surface type: tundra; Topography type: flat, rural; Station scientist: Sara Morris (Sara.Morris@noaa.gov)
BER (Bermuda) * Latitude: 32.267000 * Longitude: -64.667000 * Date/Time: 1992-01-01T00:00:00 * Elevation: 8.0 m * Location: Bermuda * Campaign: WCRP/GEWEX * Method/Device: Monitoring station (MONS) * Comment: BSRN station no: 24; Surface type: water, ocean; Topography type: flat, rural; Horizon: doi:10.1594/PANGAEA.669510; Station scientist: Sara Morris (Sara.Morris@noaa.gov)
BOU (Boulder) * Latitude: 40.050000 * Longitude: -105.007000 * Date/Time Start: 1992-01-01T00:00:00 * Date/Time End: 2016-06-30T00:00:00 * Elevation: 1577.0 m * Location: Colorado, United States of America * Campaign: WCRP/GEWEX * Method/Device: Monitoring station (MONS) * Comment: BSRN station no: 23; Surface type: grass; Topography type: flat, rural; Station scientist: David Longenecker (David.U.Longenecker@noaa.gov) ** Station closed in July 2016 **
The dataset contains 265 links to child datasets from the BSRN. Any user who accepts the BSRN data release guidelines (http://bsrn.awi.de/data/conditions-of-data-release) may request an account from Gert König-Langlo (Gert.Koenig-Langlo@awi.de) to download these datasets.
1060 data points