Hierarchical Classification of Complex Data

Classification of multivariate data is practiced in many fields and is fairly well established. Despite years of work and thousands of published manuscripts, rather little is available on several aspects that are key to its use in forensics and provenance analysis. The goal of this research is to develop a systematic methodology for classifying complex data, such as those observed in watersheds, forensics, images, and other fields where many closely related classes are present. As the number of distinct classes increases, success rates from traditional classification usually decrease significantly, especially when the classification is attempted globally. An alternative approach is needed to simplify the task.

We demonstrated in earlier work that a tree-structured hierarchical classification model, with locally defined classes, could be used to distinguish among 18 watershed classes with good accuracy (Chen and Brown, 2014). The class hierarchy in that work was found by a long series of separate clustering and modeling steps, and neither the mechanism for training the hierarchy nor the process for assessing it was studied in detail. It is therefore unclear what can be gained by using similar hierarchies, how these hierarchies should be trained for best performance, or which methodology for generating hierarchies yields the strongest classification performance. Most of the work reported so far focuses on applications and does not consider these important, general issues in modeling. This research is intended to address these aspects of hierarchical modeling.
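To make the idea of a tree-structured classifier with locally defined classes concrete, here is a minimal sketch in Python. It is not the model of Chen and Brown (2014): the nearest-centroid rule is a simple stand-in for whatever local model is trained at each node, and the `Node` class, the toy data, and the two-level hierarchy are all hypothetical, chosen only for illustration.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


class Node:
    """One node of a class hierarchy (hypothetical helper).

    A leaf carries a class label. An internal node carries child nodes
    and a local model that only has to choose among those children.
    """

    def __init__(self, label=None, children=None):
        self.label = label
        self.children = children or []
        self.centroids = []  # one centroid per child, filled in by fit()

    def leaves(self):
        """Set of class labels found at or below this node."""
        if not self.children:
            return {self.label}
        return set().union(*(child.leaves() for child in self.children))

    def fit(self, X, y):
        """Train a local nearest-centroid model at every internal node."""
        self.centroids = []
        for child in self.children:
            labels = child.leaves()
            # Keep only the samples whose class lies under this child.
            sub = [(x, lab) for x, lab in zip(X, y) if lab in labels]
            sub_X = [x for x, _ in sub]
            sub_y = [lab for _, lab in sub]
            self.centroids.append(centroid(sub_X))
            child.fit(sub_X, sub_y)
        return self

    def predict(self, x):
        """Walk from this node to a leaf, making one local decision per level."""
        if not self.children:
            return self.label
        nearest = min(range(len(self.children)),
                      key=lambda k: dist(x, self.centroids[k]))
        return self.children[nearest].predict(x)


# Toy data: four classes in two well-separated groups (assumed for illustration).
X = [[0.0, 0.0], [0.2, 0.1], [0.0, 3.0], [0.1, 3.2],
     [10.0, 0.0], [10.2, 0.1], [10.0, 3.0], [10.1, 3.2]]
y = ["A", "A", "B", "B", "C", "C", "D", "D"]

# Assumed hierarchy: root -> {A, B} and {C, D}.
tree = Node(children=[
    Node(children=[Node("A"), Node("B")]),
    Node(children=[Node("C"), Node("D")]),
]).fit(X, y)

print(tree.predict([0.1, 2.9]))
print(tree.predict([9.8, 0.2]))
```

Each internal node separates only its own children, so no single model has to distinguish all classes at once. In this sketch the hierarchy is supplied by hand; how such a hierarchy should be generated and trained is exactly the open question described above.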

[Figure: Tree-based classifier for 18 regions of the US]

[Figure: US sites, showing the 18 regions]

We are now studying the methodology for creation and use of the hierarchy to optimize classifier performance and to obtain an estimate of uncertainty. We are also investigating how a hierarchy can be generated from a dataset and whether the details of the hierarchy matter in the performance of the classifier.

L. Chen, S.D. Brown, Use of a tree-structured hierarchical model for estimation of location and uncertainty in multivariate spatial data, J. Chemometrics 28 (2014) 523–538. doi:10.1002/cem.2611.

©2016 University of Delaware

Site Created by S.D. Brown       

Last revision: 23 August 2016

 
