Research

Chemical Data Science – Thriving near the limit of detection

Boring problems are easy to solve and don’t need chemometrics. The fun problems arise when you are pushing the performance limits of your instrumentation and fighting to extract any chemical information from the noise. A new algorithm or an intelligent tweak to how the data is collected can make a huge difference in a project’s success. Whether the task is classification, calibration, or signature extraction, we are collaborating on applications where optimizing the synergy between data science and instrumentation is essential.

We are particularly interested in exploring the best ways to handle large amounts of noisy data of the type that might be collected by hand-held sensors or rapid hyperspectral imaging. The paradigm for rapidly collecting and analyzing large amounts of noisy data is markedly different from the traditional analytical chemistry paradigm of slowly collecting and analyzing a few targeted samples with exquisite signal-to-noise ratio and resolution, and the data analysis tools available for the two paradigms differ accordingly. Fast, noisy measurements leading to big data may well be the future of analytical chemistry. We are building the tools to get there.

Quantifying Uncertainty – Assigning Confidence Intervals

Part and parcel of analyzing noisy data is assigning statistically valid confidence intervals to the extracted information. We are adapting two new methods for determining confidence intervals to the different applications in our research portfolio.

First, we are investigating how correlated measurement errors propagate through models, in order to place limits of discrimination (analogous to limits of detection) on the ability to differentiate among two or more target classes. Classical statistical methods typically assume that measurement errors are independent and identically distributed (i.i.d.); for example, that the random errors at successive wavelengths in a spectrum, or at successive time points in a chromatogram, are uncorrelated. However, residual baselines, wavelength-axis instability, and many other physical effects can lead to correlated errors. With this method, we can determine which samples cannot be classified at a given confidence level.
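The method itself is not spelled out here, so what follows is only a minimal Python sketch under stated assumptions, not our implementation. It assumes a linear two-class score b·x compared against a threshold, and it models correlated noise with a hypothetical AR(1)-style covariance, Σ_ij = σ²ρ^|i-j|, as a stand-in for baseline or wavelength-axis effects. Propagating that covariance through the model gives the score variance bᵀΣb; under the i.i.d. assumption (ρ = 0) this collapses to σ²bᵀb, which can badly understate the uncertainty. Samples whose score lies within z standard deviations of the threshold are flagged as not classifiable at that confidence level. The helper names `ar1_covariance`, `score_std`, and `ambiguous_samples`, and all numbers in the example, are illustrative only.

```python
import numpy as np
from statistics import NormalDist


def ar1_covariance(n_channels: int, sigma: float, rho: float) -> np.ndarray:
    """Hypothetical noise covariance: Sigma[i, j] = sigma**2 * rho**|i - j|.

    rho > 0 correlates errors on neighboring channels; rho = 0 is the i.i.d. case.
    """
    idx = np.arange(n_channels)
    return sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :])


def score_std(b: np.ndarray, cov: np.ndarray) -> float:
    """Standard deviation of the linear score b @ x propagated from noise covariance cov."""
    return float(np.sqrt(b @ cov @ b))


def ambiguous_samples(X: np.ndarray, b: np.ndarray, threshold: float,
                      cov: np.ndarray, confidence: float = 0.95) -> np.ndarray:
    """True for samples whose score is too close to the threshold to be
    assigned to either class at the requested two-sided confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    scores = X @ b - threshold
    return np.abs(scores) < z * score_std(b, cov)


# Synthetic example: 50 channels, a smooth (Gaussian-shaped) weight vector,
# and four samples constructed so that their scores are exactly c.
n = 50
b = np.exp(-0.5 * ((np.arange(n) - 25) / 5.0) ** 2)
c = np.array([-1.0, 0.0, 0.1, 1.0])
X = np.outer(c, b) / (b @ b)  # X @ b == c

iid = ar1_covariance(n, sigma=0.01, rho=0.0)
corr = ar1_covariance(n, sigma=0.01, rho=0.9)

print("score std, i.i.d.:     ", score_std(b, iid))
print("score std, correlated: ", score_std(b, corr))
print("ambiguous, i.i.d.:     ", ambiguous_samples(X, b, 0.0, iid))
print("ambiguous, correlated: ", ambiguous_samples(X, b, 0.0, corr))
```

With these synthetic numbers, the sample whose score is 0.1 is classifiable when the noise is assumed i.i.d., but it falls inside the zone of indecision once the neighbor-to-neighbor correlation is propagated through the model.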

We are also exploring conformal prediction as a tool both to place confidence limits on model predictions and to better assess and compare model performance. Conformal prediction quantifies uncertainty in machine learning predictions by constructing prediction sets or intervals that, provided the data are exchangeable, are guaranteed to contain the true outcome at a specified confidence level. These prediction intervals can vary in size with local model efficacy. The primary advantages of conformal prediction are its valid coverage guarantees, its freedom from assumptions about the underlying data distribution, and its ability to identify regions where the model ‘struggles’, relatively speaking.
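As a minimal sketch of split (inductive) conformal prediction for regression, the code below uses NumPy, a synthetic linear dataset, and a least-squares line as a hypothetical stand-in for a chemometric model; it is not the group's implementation. The model is fitted on a training split, and the absolute residuals on a separate calibration split set the interval half-width through a finite-sample-corrected quantile, which gives at least 1 - alpha marginal coverage for exchangeable data. The optional `scale_*` arguments implement the normalized variant, in which residuals are divided by a per-sample difficulty estimate so that interval width can vary with local model performance; how that estimate is obtained is left as an assumption. The function names are illustrative.

```python
import numpy as np


def conformal_quantile(scores: np.ndarray, alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the required order statistic
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])


def split_conformal_interval(y_cal, pred_cal, pred_test, alpha=0.1,
                             scale_cal=None, scale_test=None):
    """Split conformal prediction intervals for a regression model.

    pred_cal and pred_test come from a model fitted on separate training data.
    Optional scale_* arrays hold a per-sample difficulty estimate; dividing the
    residuals by it lets the interval width vary with local model performance.
    """
    if scale_cal is None:
        scale_cal = np.ones_like(pred_cal)
    if scale_test is None:
        scale_test = np.ones_like(pred_test)
    scores = np.abs(y_cal - pred_cal) / scale_cal  # nonconformity scores
    q = conformal_quantile(scores, alpha)
    return pred_test - q * scale_test, pred_test + q * scale_test


# Synthetic data split into training, calibration, and test sets.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 3000)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, x.size)
train, cal, test = np.split(np.arange(x.size), [1000, 1500])

coef = np.polyfit(x[train], y[train], deg=1)  # stand-in model
lo, hi = split_conformal_interval(y[cal], np.polyval(coef, x[cal]),
                                  np.polyval(coef, x[test]), alpha=0.1)

coverage = np.mean((y[test] >= lo) & (y[test] <= hi))
print("empirical coverage:", coverage)
print("mean interval width:", np.mean(hi - lo))
```

Comparing the mean interval width of two models at the same nominal coverage is one simple way to use conformal intervals for model comparison.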

Forensics: Timber Identity and Provenance

In an ongoing collaboration with the US Forest Service and the US Geological Survey, we are developing methods and models to determine the species and origin of timber with hand-held Laser-Induced Breakdown Spectroscopy (LIBS) instrumentation. Applications include combatting both illegal trade in endangered species (e.g., rosewood) and illegal harvesting, by determining the area in which domestic or foreign timber was harvested. We are working to extend the methodology to similar applications, such as the provenance of illicit drugs and other botanicals.

Art Conservation / Cultural Heritage

Collaborations with researchers in art conservation and cultural heritage provide a wealth of fun applications on which to test our chemical data science methods and hone our spectroscopic skills. With Prof. Jocelyn Alcantara-Garcia, we are using portable Vis/NIR spectroscopy to identify and sort fiber blends and to study the chemistry of ancient dyeing practices; we hope these two projects will lead to greener manufacturing practices. With Prof. Roxanne Radpour, we are developing new hyperspectral image analysis techniques to identify pigments in layered paint mixtures; this project should lead to better preservation of historic artifacts. There is also no shortage of projects and collaborations with researchers at the local Winterthur Museum, Garden, and Library (https://www.winterthur.org/), an internationally recognized leader in cultural heritage.

Process Analytical Chemistry (PAC)

The research group’s vision is rooted in process analytical chemistry: the development of measurements and models to understand and guide dynamic chemical processes. We are currently investigating Excitation Emission Matrix (EEM) fluorescence spectroscopy as a means of process analysis for vaccine production in yeast and E. coli bioreactors, in collaboration with Dr. Joe Smith at Merck. We have also collaborated with Merck on monitoring the degradation of virus-like particles (VLPs) with atomic force microscopy and on developing better confidence intervals for classification models in biopharma applications.

Future Project: Colored Glasses

We have an extensive collection of hundreds of different colored glasses, which I want to analyze with the hand-held LIBS and XRF instruments. Preliminary analyses with both instruments indicate that this would make a great dataset for testing chemometric methods for supervised, unsupervised, and semi-supervised classification.

Archived Project: Peroxide Value of Edible Oils

Peroxide Value (PV) is a chemical measure of rancidity in edible oils that can detect oxidation before a rancid taste or smell develops. We had an interesting collaboration with researchers at Lawrence Livermore National Laboratory (LLNL) to develop chemometric methods to determine PV as the oils aged and to identify adulterants in the oils. From LLNL’s perspective, the edible oils were a proxy for lubricants and other organic components that might be found in a complex system prone to degradation.

Archived Project: Shocked Mineral Polymorphs

Working with geologists at UD, we were the first group to find TiO2-II, a high-pressure, low-temperature polymorph of titanium dioxide, in nature. Titanium dioxide occurs in many polymorphic forms, with rutile and anatase being the most common stable polymorphs. TiO2-II is a rare form, hypothesized to be a geomarker for asteroid impact events. It would not normally form on Earth: deep enough in the Earth’s crust to reach sufficient pressure, the temperature would be too high. However, these conditions could be met at the impact of a large meteorite. Using hyperspectral Raman imaging and multivariate curve resolution (MCR), we discovered TiO2-II in the ejecta layers of four different Precambrian asteroid impacts.