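A metric, $d(x, y)$, is a function defined over an entire domain which captures the human notion of distance. In our case, the domain is the set of all possible data instances. A metric function must conform to the following five axioms for all $x$, $y$ and $z$ in the domain.
1. non-negativity, $d(x, y) \geq 0$.
2. symmetry, $d(x, y) = d(y, x)$.
3. identity, $d(x, x) = 0$.
4. definiteness, $d(x, y) = 0$ if and only if $x = y$.
5. triangle inequality, $d(x, z) \leq d(x, y) + d(y, z)$.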
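The axioms above can be checked empirically on a finite sample of points. The following Python sketch is illustrative only: the helper name `violated_axioms`, the tolerance `tol` and the sample points are assumptions made for this example. Passing the check on a sample does not prove that a function is a metric; it can only expose violations.

```python
from itertools import product


def violated_axioms(d, points, tol=1e-12):
    """Return the names of the axioms that d violates on the sample points."""
    violated = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            violated.add("non-negativity")
        if abs(d(x, y) - d(y, x)) > tol:
            violated.add("symmetry")
        if x == y and abs(d(x, y)) > tol:
            violated.add("identity")
        if x != y and abs(d(x, y)) <= tol:
            violated.add("definiteness")
    for x, y, z in product(points, repeat=3):
        # The direct distance must not exceed the distance via a third point.
        if d(x, z) > d(x, y) + d(y, z) + tol:
            violated.add("triangle inequality")
    return violated


# Hypothetical sample: three points on the real line.
sample = [0.0, 1.0, 2.0]


def absolute_difference(x, y):
    return abs(x - y)


print(violated_axioms(absolute_difference, sample))  # prints set()
```

The check is exhaustive over the sample, so its cost grows with the cube of the number of points; it is intended for small sanity checks rather than large data sets.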
These constraints are not counter-intuitive. The first and second constraints define a metric as a scalar and not a directional vector quantity. The third property reflects the axiomatic belief that the distance between two objects must be zero when they are identical. The fourth property strengthens the previous one by requiring that the distance be zero only when the two objects are identical, and for no other pairs. The final constraint states that the direct distance between two points can never exceed the distance travelled via any third point. The ubiquitous Euclidean distance over the multidimensional space of real numbers, $\mathbb{R}^n$,
$$d_E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},$$
where the points take the form $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, satisfies these constraints and is therefore a metric. A variant of the Euclidean, the Squared Euclidean, is also used in clustering algorithms, as it gives greater weight than the standard Euclidean distance to objects which are further apart:
$$d_{E^2}(x, y) = \sum_{i=1}^{n} (x_i - y_i)^2.$$
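As a rough illustration of these two measures, the Python sketch below implements both for points given as equal-length sequences of numbers. The function names and the sample points are assumptions made for this example. With three collinear points, doubling the Euclidean distance quadruples the Squared Euclidean value, which is how the squared form gives more weight to distant objects; the same points also show that the squared form can exceed the sum of the two intermediate values.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean measure: the sum of squared component differences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample: three collinear points in the plane.
a, b, c = (0.0, 0.0), (3.0, 4.0), (6.0, 8.0)

print(euclidean(a, b), euclidean(a, c))                  # 5.0 10.0
print(squared_euclidean(a, b), squared_euclidean(a, c))  # 25.0 100.0

# The squared measure from a to c exceeds the sum via b (100.0 > 50.0).
print(squared_euclidean(a, c) > squared_euclidean(a, b) + squared_euclidean(b, c))  # True
```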
Distance measures which only conform to the first three of the above constraints are also in common use. These measures are not metrics and so do not conform to all of the above notions of distance behavior. Despite this, they are useful in certain problem domains where they capture the essential difference between data instances, or where they are computationally simple to calculate. We briefly describe some commonly used measures.
Common Measures
The Manhattan, or City Block, metric simply sums the absolute differences between the components of the vectors being compared:
$$d_M(x, y) = \sum_{i=1}^{n} |x_i - y_i|,$$
where the points take the form $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$.
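A minimal Python sketch of the Manhattan measure follows. The function name `manhattan`, the length check and the sample points are assumptions made for this example rather than anything prescribed by the text.

```python
def manhattan(x, y):
    """Sum of absolute component differences between two sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of components")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Hypothetical sample points in the plane: |1 - 4| + |2 - 6| = 3 + 4.
print(manhattan((1.0, 2.0), (4.0, 6.0)))  # 7.0
```

The explicit length check is a design choice: `zip` would otherwise silently truncate the longer point and return a misleading value.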