A MODERN APPROACH TO TIME-SERIES ANALYSIS
Typical time-series techniques usually apply algorithms, be it SVMs, logistic regressions, decision trees etc, after transforming away temporal dynamics through feature engineering. By creating features, we often remove any underlying temporal information, resulting in a loss of predictive power.
Dynamic time warping (DTW) is a useful distance-like similarity measure that allows comparisons of two time-series sequences with varying lengths and speeds. Simple examples include detection of people 'walking' via wearable devices, arrhythmia in ECG, and speech recognition. Notice that whilst DTW is a distance-like quantity, since the triangle inequality doesn't hold, it is not a metric [We thank an avid reader for pointing this out to us].
The measure distinguishes the underlying pattern rather than looking for an exact match in the raw time-series. As its name suggestions, the usual Euclidean distance in problems is replaced with a dynamically adjusted metric. DTW thus allows us to retain the temporal dynamics by directly modeling the time-series.
Figure 1 shows what happens when we compare two time-series, symbolized as a red wave and a blue wave. The top image shows Euclidean matching, which is 1-1. The similarity score between these waves measured by a Euclidean metric would be poor, even though the rough shape of the waves (i.e. the peaks, troughs and plateaus) are similar.
The bottom image shows a similarity score that would be given by Dynamic Time Warping. You can see that the rough shapes of the waves would match up, giving a high similarity score.
In general, events that have similar shapes but different magnitudes, lengths and especially phases can prevent a machine from correctly identifying sequences as similar events using traditional distance metrics. DTW allows us to get around this issue by asynchronously mapping curves together.
DYNAMIC TIME WARPING
To do this, DTW checks all possible paths (subject to certain constraints) from the bottom left to the top right, computing the equivalent of a similarity score between the waves for each. The one with the largest similarity is kept.
The constraints are imposed to limit the otherwise exponential search required and are what you would expect. The most important ones are:
- Monotonicity - path does not go backwards in time
- Continuity - path must be contiguous
- Boundary conditions - path covers full sequence
- Warping windows - path does not wander too far from diagonal
- Slop constraint : path cannot be too steep or too shallow
Using these simple constraints, we are able to model time-series much more effectively. The University of Riverside recently developed a library to do this (UCR Suite - http://www.cs.ucr.edu/~eamonn/UCRsuite.html), which efficiently applies DTW to numerous applications.
A simple example that nicely shows the power of DTW, was trying to look for template matches to Premature Ventricular Contractions (PVC) in a 23 hour ECG recording with over 20,000,000 data points.
Figure 4, shows the template PVC wave that we are looking for and Figure 5, the entire dataset from which we are looking for a match in.
The search took 34.4 seconds (585,000 subsequences per second), with the closest match shown in Figure 5.
A Harvard cardiologist confirmed that this match is indeed an example of PVC.
DTW offers a modified distance metric, giving greater flexibility to the model at the cost of higher time and memory complexity. As such, we are also more at risk of over-fitting. We therefore need to carefully consider the bias-variance trade-offs involved and the computational requirements for larger scale projects.
DTW also require template sequences to match against, so it is inherently a supervised algorithm. However, it is possible to do a prescan through our dataset, pick up any interesting signals and find similar ones using DTW.
There are numerous applications for using DTW distances in all manners of time-series analysis; any type of (sub)sequence classification in EEG, ECG, DNA, data in the medical field to finding irregularities and anomalies in internet traffic, seismological data, fitness wearables, and speech/word recognition.
For a more detailed example, see our case study on DTW and k-Nearest Neighbor classification. The case study combines DTW with a k=1 nearest neighbor algorithm that reproduces state-of-the-art algorithm accuracies. There has been some work to use DTW with SVMs etc., but this is currently mostly confined to the world of academic research.