mclustDA {mclust}    R Documentation

MclustDA discriminant analysis.

Description

Trains a MclustDA discriminant analysis model on labeled training data and classifies test data in a single call.

Usage

mclustDA(train, test, pro=NULL, G=NULL, modelNames=NULL, prior=NULL, 
         control=emControl(), initialization=NULL, 
         warn=FALSE, verbose=FALSE, ...)

Arguments

train

A list with two named components: data giving the data and labels giving the class labels for the observations in the data.

test

A list with two named components: data giving the data and labels giving the class labels for the observations in the data. The labels are used only to compute the error rate in the print method and can be set to NULL if unknown. The default is to test the training data.
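As a minimal sketch of the expected input shape, the two lists might be built as follows from the built-in iris data. The odd/even split and the variable names are illustrative assumptions; only the component names data and labels come from the descriptions above.

```r
## Illustrative split (an assumption): odd rows for training, even rows for testing.
odd  <- seq(from = 1, to = nrow(iris), by = 2)
even <- seq(from = 2, to = nrow(iris), by = 2)

## Training list: both components are supplied.
train <- list(data = iris[odd, -5], labels = iris[odd, 5])

## Test list with unknown labels: the labels component is set to NULL explicitly.
test <- list(data = iris[even, -5], labels = NULL)

## Quick sanity checks on the structure.
names(train)
length(train$labels) == nrow(train$data)
is.null(test$labels)
```

Building the even indices with seq(from = 2, ...) rather than odd + 1 avoids an out-of-range index when the data have an odd number of rows.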

pro

Optional prior probabilities for each class in the training data.
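One plausible way to obtain such probabilities is to use the class proportions observed in the training labels. This is a sketch in base R only; the assumption that the entries of pro should follow the order of the class levels is not confirmed by this page.

```r
## Training labels taken from the odd rows of iris (illustrative choice).
labels <- iris$Species[seq(from = 1, to = nrow(iris), by = 2)]

## Class counts, converted to proportions that sum to 1.
counts <- table(labels)
pro <- as.vector(counts) / sum(counts)
names(pro) <- names(counts)   # names kept only for readability

pro
```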

G

An integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. The default is G=1:9.

modelNames

A vector of character strings indicating the models to be fitted in the EM phase of clustering. The help file for mclustModelNames describes the available models. The default is c("E", "V") for univariate data and mclustOptions()$emModelNames for multivariate data.

prior

The default assumes no prior, but this argument allows specification of a conjugate prior on the means and variances through the function priorControl.

control

A list of control parameters for EM. The defaults are set by the call emControl().

initialization

A list containing zero or more of the following components:

  • hcPairs A matrix of merge pairs for hierarchical clustering such as produced by function hc. The default is to compute a hierarchical clustering tree by applying function hc with modelName = "E" to univariate data and modelName = "VVV" to multivariate data or a subset as indicated by the subset argument. The hierarchical clustering results are used as starting values for EM.

  • subset A logical or numeric vector specifying a subset of the data to be used in the initial hierarchical clustering phase.

warn

A logical value indicating whether or not certain warnings (usually related to singularity) should be issued when estimation fails. The default is to suppress these warnings.

verbose

A logical value indicating whether to print a message while the function is in the training phase, which may take some time to complete.

...

Catches unused arguments in indirect or list calls via do.call.

Details

mclustDA combines functions mclustDAtrain and mclustDAtest and their summaries. This is suitable when all test data are available in advance, so that the training model is only used once.

Value

A list with the following components:

test

A list describing the test data, with the following components:

  • classification The classification of the test data for this instance of mclustDA.

  • uncertainty The uncertainty of each classification; values near 0 indicate the most certain classifications and larger values the least certain.

  • labels The test labels (if any) from the input.
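When test labels are supplied, the error rate can be checked by hand from the classification and labels components. The sketch below uses made-up vectors in place of a real mclustDA result, so the values are purely illustrative; the classError function listed under See Also addresses the same question.

```r
## Hypothetical stand-ins for the classification and labels components
## of the test list; these values are invented for illustration.
classification <- c(1, 1, 2, 3, 3, 2)
labels         <- c(1, 1, 2, 3, 2, 2)

## Indices of misclassified observations and the overall error rate.
misclassified <- which(classification != labels)
errorRate     <- mean(classification != labels)

misclassified
errorRate
```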

training

A list describing the training data, with the following components:

  • classification The classification of the training data for this instance of mclustDA.

  • z A matrix whose [i,k]th entry is the probability that observation i in the training data belongs to the kth class.

  • labels The training labels from the input.
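To make the relationship between z and the classification concrete, the sketch below assigns each observation to the class with the largest probability and takes 1 minus that maximum as its uncertainty. The z matrix is invented for illustration, and treating the uncertainty as 1 - max(z) is an assumption based on the usual mclust convention rather than something stated on this page.

```r
## Hypothetical membership probabilities: 3 observations (rows), 3 classes (columns).
z <- rbind(c(0.90, 0.05, 0.05),
           c(0.20, 0.70, 0.10),
           c(0.34, 0.33, 0.33))

## Each row should sum to 1.
stopifnot(all(abs(rowSums(z) - 1) < 1e-12))

## Assign each observation to its most probable class.
classification <- apply(z, 1, which.max)

## Assumed uncertainty measure: 1 minus the largest probability in each row.
uncertainty <- 1 - apply(z, 1, max)

classification
uncertainty
```

Under this convention the third observation, whose probabilities are nearly equal, receives the largest uncertainty.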

summary

A data frame summarizing the mclustDA results including the mixture models and numbers of components for the training classes.

References

C. Fraley and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631.

C. Fraley and A. E. Raftery (2006). MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering, Technical Report no. 504, Department of Statistics, University of Washington.

See Also

plot.mclustDA, mclustDAtrain, mclustDAtest, classError

Examples

n <- 250 ## create artificial data
set.seed(1)
triModal <- c(rnorm(n,-5), rnorm(n,0), rnorm(n,5))
triClass <- c(rep(1,n), rep(2,n), rep(3,n))

odd <- seq(from = 1, to = length(triModal), by = 2)
even <- odd + 1
triMclustDA <- mclustDA(train = list(data = triModal[odd], labels = triClass[odd]),
                        test = list(data = triModal[even], labels = triClass[even]),
                        verbose = TRUE)

names(triMclustDA)
## Not run: 
  plot(triMclustDA, trainData = triModal[odd], testData = triModal[even])

## End(Not run)

odd <- seq(from = 1, to = nrow(cross), by = 2)
even <- odd + 1
crossMclustDA <- mclustDA(train = list(data = cross[odd,-1], labels = cross[odd,1]),
                          test = list(data = cross[even,-1], labels = cross[even,1]),
                          verbose = TRUE)

## Not run: 
  plot(crossMclustDA, trainData = cross[odd,-1], testData = cross[even,-1])

## End(Not run)

odd <- seq(from = 1, to = nrow(iris), by = 2)
even <- odd + 1
irisMclustDA <- mclustDA(train = list(data = iris[odd,-5], labels = iris[odd,5]),
                         test = list(data = iris[even,-5], labels = iris[even,5]),
                         verbose = TRUE)

## Not run: 
  plot(irisMclustDA, trainData = iris[odd,-5], testData = iris[even,-5])

## End(Not run)


[Package mclust version 3.4.11 Index]