Prof. Steven Brown
Office Hours: Th, F 10-11
Course meets: Fall 2019 11:00-12:15 T, Th 215 Willard Hall
Introduction to Multivariate Statistical Analysis in Chemometrics, by K. Varmuza and P. Filzmoser, CRC Press 2009. ISBN 978-1-4200-5947-2. (You can get this text new or used, from the UD Bookstore or elsewhere.)
There will also be required readings from handouts and papers from the literature available for download at the Chem 623 site on Canvas.
You will need to be registered for the course as a student or as a listener to gain access to the Canvas site.
Other Required Items:
1. Ready access to a suitable laptop computer where you can install and use R data analysis software. This can be a group computer or your own computer. You will need access to the internet from this computer.
2. R computational software installed and functional on your computer. You can get R at no cost from The Comprehensive R Archive Network (CRAN; http://cran.r-project.org). Although you can do all that is necessary using base R, it is much easier if you have an integrated development environment (IDE) for editing and plotting on top of R. The one that most R users select is RStudio (http://www.rstudio.com). This is an open-source, free software package, and it runs on macOS, Linux/Unix and Windows.
3. Software for use in the R computational environment. This semester, we will use several public-domain R packages. There are packages for chemometrics from Filzmoser and Wehrens, as well as some others. These packages, as well as thousands of others, can be found at the CRAN site and installed into the R programming environment using R or RStudio. All are public-domain, open-source software packages available at no charge.
Materials to be used during the course include: simulations, short video clips and analysis of some datasets using software provided and software available from the internet. You may need to download data sets and public-domain software packages and to check some work against published results available on the internet. You will be asked to submit your work in digital form (as a PDF file) via Canvas. In general, long lists of program output will not be acceptable as an answer to homework or exam questions in this course; you will need to become proficient at plotting and summarizing results of analyses to assemble results in the form of a report. Results from an analysis must be integrated into your work, and may not be submitted as separate files.
Note: If you already have software for, and know, another modern data analysis language, you may use it in this course, though I do not recommend doing so. Work in MATLAB (for example) for homework and exams, while not optimal, is acceptable, as is Python. You will need to cheerfully accept any limitations imposed by your choice and port my R code, and you must find suitable replacement chemometrics and data analysis packages, a task that will be mostly non-trivial in MATLAB but possibly achievable in Python. A commercial chemometrics package is available for Excel, but Excel/Visual Basic is NOT an acceptable language for work in this course.
Full lecture notes will be available in PDF format online, through Canvas. The instructor will also provide PDF or hard copies of supporting material.
You are expected to attend all lectures and demonstrations and to read material that I provide. I am unable to provide transcripts for all demonstrations and video clips shown in the course, but you will have source code in R for much of the material presented.
Please note that the instructor retains copyright on all materials associated with the course, except where noted or where copyright is held by others, and you will need to get written permission from me to distribute course materials.
Student feedback on instruction:
I will ask for student feedback at midterm for course/instructor improvement purposes. There will also be an end-of-term student evaluation with a supplement to our departmental student evaluation form. I welcome student comments and constructive criticism at any time.
Chemistry 623 is a graduate-level overview course in the analysis of data generated from instrumentation used in chemistry, biochemistry and related fields. The emphasis is on the understanding and practical application of chemometric methods. The course is intended for graduate students or for advanced majors in chemistry, biochemistry or chemical engineering who need to analyze data obtained from such instruments. This course presumes some knowledge of basic statistics and some prior exposure to simple computer programming. Brief reviews of concepts supply background, as does a discussion of pre-processing of chemical signals to improve signal quality. The course’s main focus is on the systematic evaluation of high-dimensional data through multivariate calibration and classification of multivariate chemical responses.
Course Requirements and Policies
This course is an introduction to computational statistical analysis of data from chemical instrumentation. Chemometrics involves math, so you will need to become comfortable with probability, statistical tests, matrix algebra, projections and regression. A small amount of R programming is also expected.
You will learn a mixture of theory and practice, and will be asked to apply the theory in working computer code. Almost all of the code that you will need will be made available to you, but you will need to know how to find a suitable package, to incorporate it, to make your data available to it, and to use it effectively to do some of the work required. This skill will enable you to make effective use of the large code base available on CRAN and other R repositories, such as Bioconductor.
Each homework set will involve some theory, some computation and some critical evaluation of results.
The instructor does not plan any absences. If weather causes any missed classes, these will be made up if possible. Make-up lecture dates and times will be announced.
The instructor takes scientific integrity very seriously. You are encouraged to become familiar with the University’s Policy of Academic Honesty, found in the UD Student Guide to University Policies. More on the whole issue of academic integrity can be found here. Policies delineated in the Guide apply to this course. While homework sets for Chem 623 can be done in collaboration with others enrolled in the course, all work on the out-of-class examinations and projects must be done entirely independently. By turning in work to the instructor of this course, you acknowledge being made aware of the academic honesty policy and affirm your adherence to the letter and spirit of the policy.
Homework deadlines are posted, and you are expected to meet them. If you have a problem and cannot make a deadline, please let me know. I may be able to allow some extra time for a once-only problem. Repeated late work will be penalized. Work missed for a valid reason – a documented illness or family emergency, or conference/job-related travel (but only if you advise me by e-mail in advance of the travel), etc. – can be made up without penalty.
Grading, Evaluation Policies and Procedures:
The course will be marked on the basis of your performance on homework, on a 5-day take-home final exam and on a project. The grade given will be determined on the basis of the total number of points earned.
The distribution of points is as follows:
Homework (4 sets, each worth 25 pts): 100 pts
Final Exam (12/5-10/2019): 100 pts
Project (Report Due 12/5/2019): 100 pts
TOTAL: 300 pts
≥ 240 pts A
210-239 pts B
150-209 pts C
120-149 pts D
< 120 pts F
Plus and minus adjustments may be added at the discretion of the instructor. The average grade earned by previous students in this course has been B+, but the nature of this course pre-selects for motivated, prepared students.
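For illustration only, the mapping from point totals to base letter grades can be sketched in a few lines. This sketch is written in Python, one of the alternative languages noted above as acceptable. The function name `letter_grade` and the sample scores are hypothetical, a total of exactly 240 pts is assumed to earn an A, and the discretionary plus and minus adjustments are not modeled.

```python
def letter_grade(points):
    """Map a course point total (0-300) to a base letter grade."""
    if not 0 <= points <= 300:
        raise ValueError("point total must be between 0 and 300")
    # Cutoffs follow the grading scale; 240 is assumed to earn an A.
    for cutoff, letter in [(240, "A"), (210, "B"), (150, "C"), (120, "D")]:
        if points >= cutoff:
            return letter
    return "F"


# Hypothetical scores: four homework sets (25 pts each), final exam, project.
homework = [22, 25, 20, 23]
final_exam = 85
project = 90

total = sum(homework) + final_exam + project
print(total, letter_grade(total))
```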
Tentative Schedule for Lectures/Demos/Class Exercises:
All class meetings are scheduled for 11:00-12:15 T, Th in 215 Willard Hall.
The schedule of topics given below is approximate and may vary to reflect scheduling changes and student needs.
Week      Topics to be Covered
8/29/19   Overview of Chemometrics, Introduction to R
9/5/19    Estimation, Prediction, Confidence Intervals and Statistical Testing
9/12/19   Exploratory Data Analysis-1
9/19/19   Exploratory Data Analysis-1 (continued)
9/26/19   Exploratory Data Analysis-2
10/3/19   Methods for Modeling Multivariate Data
10/10/19  Chemical Calibration-1: Curve Fitting Multivariate Data
10/17/19  Chemical Calibration-2: Classical and Inverse Analysis
10/24/19  Chemical Calibration-3: Soft Calibration
10/31/19  Chemical Calibration-4: Preprocessing
11/6/19   Chemical Calibration-5: Multiway Methods
11/13/19  Classification Methods-1: KNN and Discriminants
11/20/19  Classification Methods-2: Unsupervised Methods
11/27/19  Classification Methods-3: Advanced Methods
12/5/19   Self-Modeling Methods in Evolving Systems
General course information
This course presumes some knowledge of chemical instrumentation at the level of Chem 437 and Chem 438.
Students should also have had an exposure to basic statistics as covered in an elementary statistics course or in Chem 120/220. Prior experience with some scientific programming is not required but will be helpful.
This course covers an introduction to analysis of data, with some emphasis on data from chemical instrumentation.
A brief review of basic statistics and probability is given. Regression methods are introduced to model the sources of variance, and approaches are covered to develop, evaluate and improve regression models. Soft modeling is developed as a way to deal with bias created from model-data mismatches, and soft modeling-based projections are shown to be effective for visual examination of multivariate data such as spectra. Methods are then presented to relate multivariate data to group membership and to external properties. Prediction of group membership is demonstrated, and prediction of external properties is discussed in some detail. A brief introduction to signal processing is provided. As a final topic, the discovery of the underlying chemical signatures of the pure components that make up a mixed response is discussed, and methods for systematic discovery of those components are presented.
Students completing this course should be able to read, understand, and critically evaluate literature making use of basic techniques used in computational multivariate statistics and chemometrics. They should also be able to perform basic chemometric data analysis by using existing R code or by making small modifications to R code provided by others.
Completion of this course will provide the student a foundation for research in measurement-oriented chemistry or computational data analysis. It also prepares students for more advanced work in computational modeling or chemometrics and for applying chemometric methods in other research projects.
This course meets Departmental Objectives 1, 2, 4, 5, 6, 9.
Last Updated: 19 June 2019
Copyright © 2013-2019 University of Delaware