Map Matching in a Programmer's Perspective

Meili uses a Hidden Markov Model (HMM) approach, proposed by Paul Newson and John Krumm in 2009, to solve the map matching problem. The map-matching problem is modelled as follows: given a sequence of GPS measurements (observations in terms of HMM), each measurement has to match one of a set of potential candidate road segments (hidden states in terms of HMM), the problem is to find the most likely sequence of candidates.


In this figure we see a noisy GPS sequence of 4 measurements, from green to red. Each measurement could match one of its candidates nearby (the small markers in cyan) but which is the most likely one? The answer is 0, 4, 9, 11 respectively. That is our most likely sequence of candidates, which constructs the red matched route.

A good explanation about the modelling can be found here and here.

The problem can be formulated as a graph search problem that is familiar to programmers. Here we briefly introduce the problem formulation and then introduce several solutions, followed by improvements and experiments.

Graphical Model

Put the example above in a graphical model, it looks like this:


It is a Directed Acyclic Graph (DAG).

Each measurement has a column of nodes associated. A node (a hidden state in terms of HMM) in the graph represents a candidate, namely a tuple (road segment, offset) that tells a location on a road segment on the map. An edge (u, v) means that node u is able to affect the decision about v. Take the edge (node 9, node 12) as an example: since it takes a much longer way than it looks from node 9 to node 12, node 12 is therefore unlikely to be the match to the last measurement. As you can see in the graph a measurement can be affected only by its previous measurement.

Two probability models are used together to quantify how likely a measurement would be to match a node. The emission probability tells that closer a node stays to its measurement more likely the measurement is to match it, while the transition probability tells that closer the walk distance from u to v is to the measurement distance, the more likely v's measurement is to match it.

# A gaussian distribution
def emission_prob(u):
    c = 1 / (SIGMA_Z * math.sqrt(2 * math.pi))
    return c * math.exp(-great_circle_distance(u.measurement, u)**2)

# A empirical distribution
def transition_prob(u, v):
    c = 1 / BETA
    # Calculating route distance is expensive.
    # We will discuss how to reduce the number of calls to this function later.
    delta = math.abs(route_distance(u, v) - great_circle_distance(u.measurement, v.measurement))
    return c * math.exp(-delta)

A path is a list of edges. The probability of a path is defined as follows:

def path_prob(path):
    assert path
    u, v = path[0]
    joint_prob = emission_prob(u)
    for u, v in path:
        joint_prob *= transition_prob(u, v) * emission_prob(v)
    return joint_prob

The task is to find the most likely sequence over all possible sequences. For convenience's sake, we added two virtual nodes, the source node s and the target node t, and corresponding virtual edges to the graph with all emission and transition probabilities to be 1.0. The task hence becomes: find a path from s to t that maximizes the path probability.


To make the problem clear, a brute-force solution is given first. Given a graph represented by a Adjacency List, a source node and a target node, the procedure maximum_path_prob tries hard to list all paths in between and then selects the optimal one among them.

def maximum_path_prob(adjacency_list, s, t):
    return max((path_prob(path), path)
        for path in all_paths(adjacency_list, s, t),
        key=lambda prob, path: prob)

# Generate all paths from s to t recursively
def all_paths(adjacency_list, s, t):
    if s == t: return [[]]
    paths = []
    for v in adjacency_list[s]:
        for path in all_paths(adjacency_list, v, t):
            paths.append([(s, v)] + path)
    return paths

Viterbi Algorithm is usually used to find the most likely sequence in HMM. In the DAG the Viterbi Algorithm works like Breath-First-Search (BFS) algorithm that searches the target level by level. During the search/expansion, the Viterbi Algorithm also remembers each node's optimal solution (the optimal path from the source and the path probability) and use them to find optimal solutions for next level.

def viterbi_search(adjacency_list, s, t):
    # Initialize joint probability for each node
    joint_prob = {}
    for u in adjacency_list:
        joint_prob[u] = 0
    predecessor = {}
    queue = FIFOQueue()

    joint_prob[s] = emission_prob(s)
    predecessor[s] = None
    while not queue.empty():
        # Extract node u
        u = queue.pop()
        # Guarantee the optimal solution to u is found
        assert joint_prob[u] == maximum_path_prob(adjacency_list, s, u)[0]
        if u == t: break
        for v in adjacency_list[u]:
            # Relaxation
            new_prob = joint_prob[u] * transition_prob(u, v) * emission_prob(v)
            if joint_prob[v] < new_prob:
                joint_prob[v] = new_prob
                predecessor[v] = u
            if v not in queue:

    return joint_prob[t], construct_path(predecessor, s, t)

The Viterbi Algorithm is the one used in the paper to find the optimal path, while another approach is to use Topological Sort in the DAG. Both algorithms have to explore all edges to find the path. However in the map-matching model exploring an edge is expensive. During the exploration of an edge (u, v), it needs to find the shortest path between u and v in the road network for calculating its transition probability. Suppose that, given a sequence of S measurements and each measurement has average T states, the Viterbi Algorithm or the topological sort approach will traverse all edges i.e. do S * T * T times of shortest path calculations. This is not good, especially in dense urban areas where you can easily get a large average T.

To reduce the number of transition probability calculations, we use the Dijkstra's algorithm to avoid extracting those unlikely nodes as many as possible. Like the Viterbi Algorithm, the Dijkstra's algorithm solves the problem in a Dynamic Programming approach. The advantage of the Dijkstra's algorithm is that it extracts nodes in a greedy way, i.e. extracts the most likely node every time. With this strategy, as soon as the target is extracted, the optimal solution is guaranteed to be found and thus the rest of nodes can be safely thrown away.

Before applying the Dijkstra's algorithm we should notice that our objective is to find the most likely path, whereas the Dijkstra's algorithm is designed to solve shortest path problem, i.e. minimizing a certain objective. Therefore we need to transform the maximization problem to a minimization problem in the following way:

With following prerequisites:

T = transition_prob
E = emission_prob
lg = math.log10
assert 0 <= E(u) <= 1.0 and 0 <= T(u, v) <= 1.0

We are maximizing path_prob(path) among a set of paths. Expanding path_prob(path) we get:

# For simplicity we ignored the last node
product(E(u) * T(u, v) for u,v in path)

which is equivalent to maximizing:

lg(product(E(u) * T(u, v) for u,v in path))
# equivalent to
sum(lg(E(u)) + lg(T(u, v)) for u,v in path)
# because lg(a * b) == lg(a) + lg(b)

which is equivalent to minimizing:

-sum(lg(E(u)) + lg(T(u, v)) for u,v in path)
# equivalent to
sum(-lg(E(u)) + -lg(T(u, v)) for u,v in path)

# Must be non-negative
assert 0 <= -lg(E(u)) and 0 <= -lg(T(u, v))

After the transformation, the emission probability of a node turns to the node cost, and the transition probability of an edge turns to the edge cost.

def node_cost(u):
    return -1 * math.log10(emission_prob(u))

def edge_cost(u, v):
    return -1 * math.log10(transition_prob(u, v))

The problem becomes find a path from s to t that minimizes sum(node_cost(u) + edge_cost(u, v) for u, v in path) which can be efficiently solved by the Dijkstra's algorithm. Thanks to the similarity to the Viterbi Algorithm, changing the FIFO queue in the viterbi_search procedure to a priority queue, and the probability calculation to the cost calculation should get you a rough Dijkstra based solution, a greedy, lazy but faster version of the Viterbi Algorithm.

TODO Experiments

TODO comparison between the original Viterbi Algorithm and the Dijkstra Viterbi Algorithm, and comparisons with and without tweaks.