The following is a brief tutorial on how to solve the problem, using popular Python scientific and geospatial tools: Given the track of a race trail and the location of checkpoints, what are the distances between any two successive checkpoints. The example is the Iditarod dog sled race: the trail is approximated as a sequence of route points, or rather, the straight-line segments that connect them -- this is called a linestring. The checkpoints are given separately. It is assumed that both are available as (separate) files in ESRI Shapefile format.

First some libraries that will be needed later.

In [1]:

```
import math
import fiona
from shapely.geometry import shape, LineString, Point
from collections import OrderedDict
```

Let's define some helper functions. The first calculates the 3D Euclidian distance between two points in space, with the points represented as lists or tuples. I'm sure there's something usitable in the `math`

library, but it's not costing us much.

The second function does something quite generally useful. First of all, it's not a function at all, but a generator: when repeatedly called with a sequence variable or iterator as an argument, it successively returns pairs (i, i+1) of elements. For example, calling `pairs([1, 2, 3, 4])`

yields successively:

1, 2

2, 3

3, 4

We will use this function to iterate through pairs of successive checkpoints.

Last, `getnearest()`

is a function that determines the point on a linestring that is closest to a given point external to the line string. Such a function is needed because the checkpoint locations aren't actually **on** the trail linestring (which is only given as a discrete list of route points with imagined straight lines between them). The linestring is here simply a sequence variable or iterator made up of objects that can be compared to the point object via an appropriate distance function. The method is simplistic, but good enough if the line strings aren't too long. The main limitation is that it won't give the correct result if the line doubles back on itself and runs sections of the trail multiple times (think of a loop or an out-and-back). It would not be hard to write a more general function, but for the Iditarod, this one is good enough.

In [2]:

```
def dist(P3D1, P3D2):
sqr = (P3D1[0] - P3D2[0])*(P3D1[0] - P3D2[0])
sqr += (P3D1[1] - P3D2[1])*(P3D1[1] - P3D2[1])
sqr += (P3D1[2] - P3D2[2])*(P3D1[2] - P3D2[2])
return math.sqrt(sqr)
```

In [3]:

```
def pairs(seq):
i = iter(seq)
prev = item = i.next()
for item in i:
yield prev, item
prev = item
```

In [4]:

```
def getnearest(point, line, distancefunc=None):
"""
Get nearest index and point on a line for a given point external to the line
point: some object representing a point in space
line: iterable composed of objects of the same type as point
distancefunc: a function taking two points and returning a number
returns: index, point on line that is nearest to the input point
Not efficient for very long lines, because it searches along the line.
"""
if not distancefunc:
try:
distancefunc = lambda x, y: (x - y) * (x - y)
except:
# if no distance function provided and objects can't be subtracted and multiplied, use constant
distancefunc = lambda x, y: 1
# This would be inefficient for long lines, but ok for simple applications
# making sure we have an iterator; if line is an iterator, this doesn't make a difference
xline = iter(line)
# set nearest to first line element
nearest = next(xline)
nidx = 0
# iterate through the rest of the line
for idx, pt in enumerate(xline):
if distancefunc(pt, point) < distancefunc(nearest, point):
nearest = pt
nidx = idx
return nidx, nearest
```

In order to measure distances, we want to be working in a projection that operates in metres and is reasonably regular. We use Alaska Albers with the NAD83 datum (EPSG:3338) for this purpose. The KML files are converted to ESRI shapefiles and reprojected from Lat/Long (GPS coordinates with WGS 83 datum - they identify themselves as EPSG:6326).

In [5]:

```
data_track = fiona.open('Iditarod_routes_AKAlbers.shp', 'r')
data_checkpoints = fiona.open('Iditarod_chkp_AKAlbers.shp', 'r')
```

This makes the data accessible as Fiona Collection objects. The data_track object, however, contains records, which means the trail is provided in 3 separate lines strings. In this case, it turns out the segments are: Willow to Ophir (common to both the northern or southern Iditarod route); then the northern route Ophir to Kaltag; last the second common bit, Kaltag to Nome.

In [6]:

```
print type(data_track)
print len(data_track)
```

Luckily, the records are all of type LineString, and they are ordered end-to-end, so that we can stitch them together. This is verified by printing the first and last data point from each and checking that a line's end point coincides (or is at least very close to) the next line's starting point. Indeed, they coincide, which simplifies matters. We can simply stitch them together and then convert the result in a shapely linestring shape object.

In [7]:

```
for record in data_track:
print record['geometry']['type']
print record['geometry']['coordinates'][0], record['geometry']['coordinates'][-1]
```

In [8]:

```
fullinecoords = []
for line in data_track:
fullinecoords.extend(line['geometry']['coordinates'])
```

Now we have a single list of coordinates for the Iditarod trail (northern route) that we can convert into a Shapely LineString object.

In [9]:

```
fulline = {
'geometry': {
'coordinates': fullinecoords,
'type': 'LineString'
},
'id': 0,
'properties': OrderedDict([(u'Name', u'Iditarod_northern'), (u'descriptio', None), (u'timestamp', None), (u'begin', None), (u'end', None), (u'altitudeMo', None), (u'tessellate', 1), (u'extrude', -1), (u'visibility', -1)])
}
fulltrack = shape(fulline['geometry'])
fulltrack.geom_type
```

Out[9]:

The first method to calculate distances is the more manual one. It has two steps: Place every checkpoint actually *on* the track, but finding the closest track route point to it, and then summing up distances along the track.

In [10]:

```
chkp_ontrack = {}
for checkpoint in data_checkpoints:
name = checkpoint['properties']['Name']
coords = checkpoint['geometry']['coordinates']
nearestidx, nearestpoint = getnearest(coords, fulltrack.coords, dist)
chkp_ontrack[name] = {}
chkp_ontrack[name]['idx'] = nearestidx
chkp_ontrack[name]['coords'] = nearestpoint
chkp_ontrack = OrderedDict(sorted(chkp_ontrack.items(), key=lambda item: item[1]['idx']))
print chkp_ontrack
```

In [11]:

```
totaldist = 0
for chkpt, nextchkpt in pairs(chkp_ontrack):
idx, nextidx = chkp_ontrack[chkpt]['idx'], chkp_ontrack[nextchkpt]['idx']
partialtrack = LineString(fulltrack.coords[idx:nextidx+1])
totaldist = totaldist + partialtrack.length
print '%s --> %s' % (chkpt, nextchkpt)
print ' Distance in km: %s' % format(partialtrack.length/1000, '.1f')
print ' Distance in miles: %s' % format(partialtrack.length/1600, '.1f')
print
print 'Total distance summed up: %s km -- %s miles' % (format(totaldist/1000, '.1f'), format(totaldist/1600, '.1f'))
print 'Total distance from full track: %s km -- %s miles' % (format(fulltrack.length/1000, '.1f'), format(fulltrack.length/1600, '.1f'))
```

OK, this looks rather reasonable. Unalakleet is misspelled, because it was misspelled in the original KML file. The first leg seems to be quite a bit shorter than the Iditarod Trail Committee says, and looking at the track, there seems to be a rather long straight line at the beginning. We're therefore underestimating the first leg by quite a bit. For the other legs, we will always underestimate them by a few percent, given that we only have less than a route point per mile on the average: 771 route points for 900+ miles.

In [12]:

```
len(fulltrack.coords)
```

Out[12]:

Using the Shapely library, there is a more direct way: the `project`

method of a LineString object, which takes a Point object as an argument, calculates the distance from the start point to a point obtained by projecting the Point onto the LineString. This method is also more precise.

In [13]:

```
waypoints = {}
import itertools
for checkpoint in data_checkpoints:
name = checkpoint['properties']['Name']
chkptshape = Point(checkpoint['geometry']['coordinates'])
a = fulltrack.project(chkptshape)
waypoints[name] = a
waypoints = OrderedDict(sorted(waypoints.items(), key=lambda x: x[1]))
print waypoints
```

In [14]:

```
prev = 0
for pt1, pt2 in pairs(waypoints):
print '%s --> %s' % (pt1, pt2)
print ' Distance in km: %s' % format((waypoints[pt2] - waypoints[pt1])/1000, '.1f')
print ' Distance in miles: %s' % format((waypoints[pt2] - waypoints[pt1])/1600, '.1f')
print
print 'Total distance: %s km -- %s miles' % (format(waypoints[pt2]/1000, '.1f'), format(waypoints[pt2]/1600, '.1f'))
```