fanny(x, k, diss = F, metric = "euclidean", stand = F)
x
|
data matrix or dataframe, or dissimilarity matrix, depending on the
value of the diss argument.
In case of a matrix or dataframe, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed.
In case of a dissimilarity matrix,
|
k
|
integer, the number of clusters.
It is required that 0 < k < n/2 where n is the number of observations.
|
diss
|
logical flag: if TRUE, then x will be considered as a dissimilarity
matrix. If FALSE, then x will be considered as a matrix of observations
by variables.
|
metric
|
character string specifying the metric to be used for calculating
dissimilarities between observations.
The currently available options are "euclidean" and "manhattan".
Euclidean distances are root sum-of-squares of differences, and
manhattan distances are the sum of absolute differences.
If x is already a dissimilarity matrix, then this argument will
be ignored.
|
stand
|
logical flag: if TRUE, then the measurements in x are standardized before
calculating the dissimilarities. Measurements are standardized for each
variable (column), by subtracting the variable's mean value and dividing by
the variable's mean absolute deviation.
If x is already a dissimilarity matrix, then this argument
will be ignored.
|
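As a rough illustration of the metric and stand arguments, the sketch below reproduces the two distance definitions and the mean-absolute-deviation standardization by hand in base R. The helper names standardize_mad, euclid and manhat are hypothetical and not part of the library. The code only mirrors the argument descriptions above, assumes there are no missing values, and is not taken from the fanny implementation.

```r
# Hypothetical helpers mirroring the descriptions of `metric` and `stand`;
# they are not part of the library and assume no missing values.

# Standardize each column: subtract the column mean, then divide by the
# column's mean absolute deviation.
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x))
  mean_abs_dev <- colMeans(abs(centered))
  sweep(centered, 2, mean_abs_dev, "/")
}

# Euclidean distance: root sum-of-squares of differences.
euclid <- function(a, b) sqrt(sum((a - b)^2))

# Manhattan distance: sum of absolute differences.
manhat <- function(a, b) sum(abs(a - b))

x <- rbind(c(1, 10), c(4, 14), c(7, 30))
euclid(x[1, ], x[2, ])   # 5
manhat(x[1, ], x[2, ])   # 7

z <- standardize_mad(x)
colMeans(z)              # approximately 0 in each column
colMeans(abs(z))         # 1 in each column
```

After standardization every column has mean zero and mean absolute deviation one, so variables measured on very different scales contribute comparably to the dissimilarities.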
fanny stems from chapter 4 of Kaufman and Rousseeuw (1990). Compared to other fuzzy clustering methods, fanny has the following features: (a) it also accepts a dissimilarity matrix; (b) it is more robust to the spherical cluster assumption; (c) it provides a novel graphical display, the silhouette plot (see plot.partition).
Fanny aims to minimize the objective function
SUM_v (SUM_(i,j) u(i,v)^2 u(j,v)^2 d(i,j)) / (2 SUM_j u(j,v)^2)
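where n is the number of observations, k is the number of clusters, u(i,v) is the membership of observation i in cluster v, and d(i,j) is the dissimilarity between observations i and j. The outer sum runs over the k clusters and the inner sums over the n observations.
An object of class "fanny" representing the clustering.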
See fanny.object for details.
pam, clara, and fanny require that the number of clusters be given by the user. Hierarchical methods like agnes, diana, and mona construct a hierarchy of clusterings, with the number of clusters ranging from one to the number of observations.
Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997). Integrating Robust Clustering Techniques in S-PLUS, Computational Statistics and Data Analysis, 26, 17-37.
fanny.object, daisy, partition.object, plot.partition, dist.
# generate 25 objects, divided into two clusters,
# and 3 objects lying between those clusters.
x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
           cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)),
           cbind(rnorm(3,3.5,0.5), rnorm(3,3.5,0.5)))
fannyx <- fanny(x, 2)
fannyx
summary(fannyx)
plot(fannyx)
fanny(daisy(x, metric = "manhattan"), 2, diss = T)
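The objective function stated earlier can also be evaluated by hand, which may help when comparing candidate membership matrices. The sketch below is a minimal base R version; fanny_objective is a hypothetical helper, not a library function, and it assumes the inner sum runs over all ordered pairs (i, j) of observations.

```r
# Hypothetical helper: evaluates
#   SUM_v (SUM_(i,j) u(i,v)^2 u(j,v)^2 d(i,j)) / (2 SUM_j u(j,v)^2)
# for a membership matrix u (n rows, k columns) and a symmetric
# n x n dissimilarity matrix d. Not part of the library.
fanny_objective <- function(u, d) {
  u2 <- u^2
  total <- 0
  for (v in seq_len(ncol(u))) {
    w <- u2[, v]
    # sum over all ordered pairs (i, j) of u(i,v)^2 * u(j,v)^2 * d(i,j)
    numerator <- sum(outer(w, w) * d)
    total <- total + numerator / (2 * sum(w))
  }
  total
}

# Four points on a line forming two tight pairs.
pts <- c(0, 1, 10, 11)
d <- as.matrix(dist(pts))

# Memberships that match the two pairs (each row sums to 1).
u_good <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
# Memberships that mix the pairs.
u_bad <- cbind(c(1, 0, 1, 0), c(0, 1, 0, 1))

fanny_objective(u_good, d)   # 1
fanny_objective(u_bad, d)    # 10
```

The grouping that respects the two pairs gives the smaller value, which is the direction in which the objective is minimized. Crisp 0/1 memberships are used here only to keep the arithmetic easy to check; the same function accepts fuzzy memberships.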