2
* This file is licensed to You under the "Simplified BSD License".
3
* You may not use this software except in compliance with the License.
4
* You may obtain a copy of the License at
6
* http://www.opensource.org/licenses/bsd-license.php
8
* See the COPYRIGHT file distributed with this work for information
9
* regarding copyright ownership.
11
package ch.usi.inf.sape.hac.agglomeration;
15
* The "centroid" or "Unweighted Pair-Group Method using Centroids (UPGMC)"
16
* method is a geometric approach that links the centroids of clusters.
18
* Each cluster is represented by its centroid.
19
* The distance between two clusters is calculated as the distance between their centriods.
20
* This method does not distort the cluster space.
21
* [The data analysis handbook. By Ildiko E. Frank, Roberto Todeschini]
23
* Can produce a dendrogram that is not monotonic
24
* (it can have so called inversions, which are hard to interpret).
25
* This occurs when the distance from the union of two clusters, r and s,
26
* to a third cluster is less than the distance between r and s.
28
* Used only for Euclidean distance!
30
* The general form of the Lance-Williams matrix-update formula:
31
* d[(i,j),k] = ai*d[i,k] + aj*d[j,k] + b*d[i,j] + g*|d[i,k]-d[j,k]|
33
* For the "centroid" method:
36
* b = -ci*cj/((ci+cj)*(ci+cj))
40
* d[(i,j),k] = ci/(ci+cj)*d[i,k] + cj/(ci+cj)*d[j,k] - ci*cj/((ci+cj)*(ci+cj))*d[i,j]
41
* = ( ci*d[i,k] + cj*d[j,k] - ci*cj/(ci+cj)*d[i,j] ) / (ci+cj)
43
* @author Matthias.Hauswirth@usi.ch
45
public final class CentroidLinkage implements AgglomerationMethod {
47
public double computeDissimilarity(final double dik, final double djk, final double dij, final int ci, final int cj, final int ck) {
48
return (ci*dik+cj*djk-ci*cj*dij/(ci+cj))/(ci+cj);
51
public String toString() {