Multidimensional scaling (MDS) is a dimensionality reduction technique for
exploratory data analysis. Its primary effect is to "squash" data into a
visualizable space. For example, your data may be 7 dimensional, but be
distributed in a clustered way. One can process the data with MDS to produce a
set of two-dimensional coordinates that can be plotted and visually analyzed.
There are essentially two ways to delinate
MDS:
- metric and nonmetric scaling
- classical and distance scaling
Glimmer doesn't handle nonmetric scaling, so I only describe the
classical/distance dichotomy. Classical scaling is a method for
linearly projecting the data into a subspace and is essentially equivalent to principal
components analysis. The projection is optimal in the least-squared error sense and
always exists.
Distance scaling is a little more nuanced than classical scaling. Rather than
finding the least-squares projection, distance scaling seeks to minimize the total
least-squared differences between the distances in the source coordinates of the data and
the low-dimensional "squashed" coordinates.
In many cases this strategy offers qualitative
advantages to classical scaling, but the optimal solution is NP-Hard. Worse,
the objective function (called "stress") is multi-modal, so local minima exist.
Glimmer can be downloaded
here. It is offered as a
compressed visual c++ project. The "cpusources" directory contains the CPU
code written in c++ and the "gpusources" directory contains the GPU shader code
written in GLSL.
You will need to have an need to have a somewhat recent NVIDIA card, with the
latest drivers and the latest version (as of
7/14/08) of
Visual C++. The README has instructions on using glimmer. The
EXAMPLE
download contains sample datasets and command lines including DOS scripts
to reproduce the figures in the paper.