This series of columns has been running for a long time. Long-time readers will recall that it has even changed its name since
its inception. The original name was "Statistics in Spectroscopy." This was a multiple pun, as it referred to the science of Statistics in the journal Spectroscopy and the science of Statistics in the science of Spectroscopy as well as statistics (the subject of the science of Statistics) in the journal Spectroscopy. [See our third column ever (1) for a discussion of the double meaning of the word "Statistics." The same discussion
is found in the book based upon those first 38 columns (2).]
Our goal then, as now, was to bring the study of chemometrics and the study of statistics closer together. While there are
isolated points of light, it seems that many people who study chemometrics have no interest in and do not appreciate the statistical
background upon which many of our chemometric techniques are based, nor do they appreciate the usefulness of the techniques
that we could learn from that discipline. Worse, there are some who actively denigrate and oppose the use of statistical concepts
and techniques in the chemometric analysis of data. The first group can, perhaps, claim unfamiliarity (ignorance?) with statistical
concepts. It is difficult, however, to find excuses for the second group.
Nevertheless, at their fundamental core, there is a deep and close connection between the two disciplines. How could
it be otherwise? Chemometric concepts and techniques are based upon principles that were formulated by mathematicians hundreds
of years ago, even before the label "statistics" was applied to the subfield of mathematics that deals with the behavior and
effect of random numbers on data. Nevertheless, recognition of statistics as a distinct subdiscipline of mathematics also
goes back a long way, certainly long before the term "chemometrics" was coined to describe a subfield of that subfield.
Before we discuss the relationship between these two disciplines, it is, perhaps, useful to consider what they are. We have
already defined "statistics" as ". . . the study of the properties of random numbers . . ." (3). A definition of "chemometrics" is a little trickier to come by. The term was originally coined by Kowalski, but currently,
many chemometricians use the definition by Massart (4). On the other hand, one compilation presents nine different definitions
for "chemometrics" (5,6) (including "what chemometricians do," a definition that apparently was suggested only half humorously).
But our goal here is not to get into the argument over the definition of the term, so for our current purposes, it is convenient
to consider a somewhat simplified definition of "chemometrics" as meaning "multivariate methods of data analysis applied to
data of chemical interest."
This definition is convenient because it allows us to then jump directly to what is arguably the simplest chemometric technique
in use, and consider that as the prototype for all chemometric methods; that technique is multiple regression analysis. Written
out in matrix notation, multiple regression analysis takes the form of a relatively simple matrix equation:
$$B = (A^{T}A)^{-1}A^{T}C$$
where B represents the vector of coefficients, A represents the matrix of independent variables, and C represents the vector of dependent variables.
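To make the roles of A, B, and C concrete, here is a minimal numerical sketch. It assumes the matrix equation is the usual ordinary least-squares solution (the normal equations). The data are synthetic, and the names `true_B`, `B_normal`, and `B_lstsq` are ours, introduced only for illustration. No intercept term is included, which is a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, for illustration only: 20 samples and 3 independent
# variables (for example, absorbances at three wavelengths).
A = rng.normal(size=(20, 3))            # matrix of independent variables
true_B = np.array([1.5, -2.0, 0.5])     # hypothetical "true" coefficients
C = A @ true_B + rng.normal(scale=0.01, size=20)  # dependent variable + noise

# Normal-equations form: B = (A^T A)^{-1} A^T C.
# Solving the linear system avoids forming the inverse explicitly.
B_normal = np.linalg.solve(A.T @ A, A.T @ C)

# The same least-squares fit from NumPy's dedicated solver, which is
# generally the more numerically stable route.
B_lstsq, *_ = np.linalg.lstsq(A, C, rcond=None)

print("normal equations:", B_normal)
print("lstsq:           ", B_lstsq)
```

Both routes should give the same coefficient vector to within rounding. The estimates differ slightly from `true_B` only because of the random noise added to C, which is exactly the kind of effect that statistics is designed to describe.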