ca(a, nf=7)
| a | data matrix to be decomposed, the rows representing observations and the columns variables. |
| nf | number of factors or axes to be sought; default 7. |
| rproj | projections of row points on the factors. |
| cproj | projections of column points on the factors. |
| evals | eigenvalues associated with the new factors. These provide figures of merit for the "inertia explained" by the factors. They are usually quoted in terms of percentage of the total, or in terms of cumulative percentage of the total. |
| evecs | definition of the factors in terms of the original variables. The first column is the linear combination of columns of a defining the first factor, etc. |
| rcntr | contributions of observations to the factors. The contributions are mass times projection (on the factor) squared. Since contributions take account of the mass, they more accurately indicate influential observations for the interpretation of the factor, compared to the projections alone. |
| ccntr | contributions of variables to the factors. See above remark concerning row contributions. |
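The relationship between masses, projections, eigenvalues and contributions can be illustrated with a minimal base R sketch. It does not call ca; it assumes the usual singular value decomposition formulation of correspondence analysis, and the table N and all variable names are made up for illustration. Whether ca rescales its contributions (for example by the eigenvalue) is not stated here, so the sketch uses the plain "mass times squared projection" definition.

```r
# Hypothetical 3 x 3 table of counts (not taken from any documented data set).
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)

P     <- N / sum(N)        # correspondence matrix
rmass <- rowSums(P)        # row masses
cmass <- colSums(P)        # column masses

# Standardized residuals from the independence model, then their SVD.
S   <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
dec <- svd(S)

nf    <- 2                               # a 3 x 3 table has at most 2 non-trivial factors
evals <- dec$d[1:nf]^2                   # eigenvalues (inertia per factor)
rproj <- diag(1 / sqrt(rmass)) %*% dec$u[, 1:nf] %*% diag(dec$d[1:nf])
cproj <- diag(1 / sqrt(cmass)) %*% dec$v[, 1:nf] %*% diag(dec$d[1:nf])

# Eigenvalues quoted as percentage and cumulative percentage of the total.
pct    <- 100 * evals / sum(evals)
cumpct <- cumsum(pct)

# Contributions: mass times squared projection on each factor.
rcntr <- rmass * rproj^2
ccntr <- cmass * cproj^2

# Under these definitions the contributions on a factor sum to its eigenvalue.
colSums(rcntr)
evals
```

A row with a large projection but a small mass can therefore contribute less to a factor than a heavier row lying closer to the origin, which is why the contributions are the safer guide to influential observations.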
profiles
.
The question of coding of input data is an important one. For instance,
in a matrix of scores, one might wish to adjoin extra columns to the input
matrix such that both the initial score, and the maximum score minus it,
are included in the observation's set of values. Note that this has the
effect that all row masses are equal. Hence the variables alone are
differentially weighted. This is known as doubling the observations.
In the case of binary data, such coding is known as complete disjunctive form.
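The effect of doubling on the row masses can be seen in a small base R sketch. The score matrix, its labels and the maximum score of 5 are assumptions made up for illustration.

```r
# Hypothetical scores: 4 observations on 3 items, each scored from 0 to 5.
scores <- matrix(c(5, 3, 0,
                   2, 4, 1,
                   0, 0, 5,
                   3, 3, 3), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("obs", 1:4), c("q1", "q2", "q3")))
max.score <- 5

# Adjoin "maximum score minus score" columns to the original scores.
neg <- max.score - scores
colnames(neg) <- paste0(colnames(scores), ".neg")
doubled <- cbind(scores, neg)

# Every row now totals (number of items) * (maximum score) = 15,
# so all row masses are equal.
rowSums(doubled)
row.mass <- rowSums(doubled) / sum(doubled)
row.mass
```

With 0/1 data and a maximum of 1, the same construction yields one indicator column for each of the two categories of every variable.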
Other forms of input data for which correspondence analysis can be used include frequencies, or contingency-type data. In this case, the mass-weighted sum of the squared chi-squared distances of all (row or column) points from the origin equals the familiar chi-squared statistic divided by the grand total of the table. Hence the graphical output of correspondence analysis allows assessment of departure from a null hypothesis of no dependence of rows and columns.
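The link with the chi-squared statistic can be checked numerically. The following base R sketch uses a made-up contingency table and made-up variable names; it weights each squared chi-squared distance by the row mass, and the resulting total, multiplied by the grand total of the table, is compared with the statistic returned by chisq.test.

```r
# Hypothetical 3 x 3 contingency table.
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)
n     <- sum(N)
P     <- N / n
rmass <- rowSums(P)
cmass <- colSums(P)

# Row profiles; the origin of the factor space is the average profile (cmass).
profiles <- P / rmass
dev      <- sweep(profiles, 2, cmass)             # profile minus average profile
d2       <- rowSums(sweep(dev^2, 2, cmass, "/"))  # squared chi-squared distances

# Mass-weighted total of the squared distances (the total inertia).
inertia <- sum(rmass * d2)

# Pearson chi-squared statistic for the same table.
chisq <- unname(chisq.test(N, correct = FALSE)$statistic)

all.equal(n * inertia, chisq)
```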
Supplementary rows or columns are projected into the factor space after
carrying out a correspondence analysis. That is to say, such row or
column profiles are assumed to have zero mass, and their projections are
found under that assumption. Functions supplr and supplc may
be used for this purpose. Supplementary rows or columns are typically either
of a different nature from the basic data analyzed (e.g. sex in the context
of a questionnaire), or rows or columns which, one suspects, would
unduly influence the definition of the factors.
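As an illustration of projecting a zero-mass row, the base R sketch below applies the standard transition formula: a row profile is multiplied by the column coordinates rescaled to unit inertia. The helper project.suppl.rows, the table N and the supplementary row are hypothetical; this is not the code of supplr, whose arguments are not described here.

```r
# Hypothetical active table and its correspondence analysis via SVD.
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)
P     <- N / sum(N)
rmass <- rowSums(P)
cmass <- colSums(P)
S     <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
dec   <- svd(S)
nf    <- 2

rproj <- diag(1 / sqrt(rmass)) %*% dec$u[, 1:nf] %*% diag(dec$d[1:nf])
cstd  <- diag(1 / sqrt(cmass)) %*% dec$v[, 1:nf]   # columns rescaled to unit inertia

# Hypothetical helper: project rows (as counts) without letting them
# influence the factors, i.e. as profiles of zero mass.
project.suppl.rows <- function(x, cstd) {
  prof <- x / rowSums(x)   # row profiles
  prof %*% cstd
}

# A supplementary row that took no part in the analysis.
suppl <- matrix(c(8, 4, 2), nrow = 1)
project.suppl.rows(suppl, cstd)

# Sanity check: re-projecting the active rows reproduces their projections.
all.equal(project.suppl.rows(N, cstd), rproj)
```

Because only the profile enters the formula, a supplementary row that is a multiple of an active row lands on exactly the same point as that row.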
M.J. Greenacre, Theory and Applications of Correspondence Analysis, Academic Press, New York, 1984.
L. Lebart, A. Morineau and K.M. Warwick, Multivariate Descriptive Statistical Analysis, Wiley, New York, 1984.
S. Nishisato, Analysis of Categorical Data: Dual Scaling and Its Applications, University of Toronto Press, Toronto, 1980.
(An extensive annotated bibliography is to be found in Greenacre.)
supplr, supplc. Initial data coding: flou, logique. Other related functions: pca, prcomp, cancor, sammon, cmdscale. Plotting tool: plaxes.
# correspondence analysis of the breakfast cereal data,
# in complete disjunctive form:
bfpos <- t(cereal.attitude)
bfneg <- max(bfpos) - bfpos
bfposneg <- cbind(bfpos, bfneg)
corr <- ca(bfposneg)
# plot of first and second factors
plot(corr$rproj[,1], corr$rproj[,2], type="n")
# label the points with the row names of the input matrix
text(corr$rproj[,1], corr$rproj[,2], labels=dimnames(bfposneg)[[1]])
# Place additional axes through x=0 and y=0:
plaxes(corr$rproj[,1], corr$rproj[,2])
# check of row contributions
corr$rcntr
#
# Fuzzy coding of input variables, `a', `b', `c':
a.fuzz <- flou(a)
b.fuzz <- flou(b)
c.fuzz <- flou(c)
newdata <- cbind(a.fuzz, b.fuzz, c.fuzz)
ca.newdata <- ca(newdata)