Monday, August 25, 2008

Analysing GPS Logs with Awk

This post describes the first two "chop" functions that fit into the partitioning framework outlined in the last post.

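import os

def chopToSpeedHistogram(dest, p):
    # Build a histogram of speeds (km/h) from the NMEA log in dest+".log".
    # sh_escape() is the shell-quoting helper from the previous post.
    # awk keeps $GPVTG sentences with a non-zero speed (field 8 is km/h),
    # rounds each speed to the nearest integer and counts the readings.
    os.system("cat "+sh_escape(dest)+".log"
              + " | awk -F , '{if($1==\"$GPVTG\" && int($8)!=0){count[int($8+0.5)]++}}"
              + " END {for(w in count) printf(\"[%d,%d],\\n\", w, count[w]);}'"
              # sort numerically by speed, skipping the leading "["
              + " | sort -g -k 1.2"
              # write the [speed,count] rows (JSON array elements) to dest.hist
              + " > "+sh_escape(dest)+".hist")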

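def chopToHeadingHistogram(dest, p):
    # Build a histogram of headings (degrees, true) from the NMEA log in dest+".log",
    # ignoring the heading when stopped (speed below 1 km/h).
    # awk rounds field 2 (true track) to the nearest 5 degrees; "% 360" folds
    # headings of 357.5 and above into the 0 bin so north isn't split in two.
    os.system("cat "+sh_escape(dest)+".log"
              + " | awk -F , '{if($1==\"$GPVTG\" && int($8)!=0){count[(5*int($2/5.0+0.5))%360]++;}}"
              + " END {for(w in count) printf(\"[%d,%d],\\n\", w, count[w]);}'"
              # sort numerically by heading, skipping the leading "["
              + " | sort -g -k 1.2"
              # write the [heading,count] rows (JSON array elements) to dest.head
              + " > "+sh_escape(dest)+".head")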

Both functions use awk to build a histogram from the NMEA VTG sentences: speed (in km/h, field 8) for the first and heading (or bearing, in degrees, field 2) for the second. The speed is rounded to the nearest integer and the bearing to the nearest 5 degrees. The data logger records one reading per second, so each count is the number of seconds spent at that speed or bearing.
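To make the binning concrete, here is a minimal pure-Python sketch of the same two histograms, assuming the standard VTG field layout ($GPVTG, true track, T, magnetic track, M, speed in knots, N, speed in km/h, K, then the checksum). The name `vtg_histograms` is hypothetical and not part of the framework. Unlike the awk version, it skips sentences with empty fields, folds 360 into the 0 bin, and it does not validate the checksum.

```python
from collections import Counter

def vtg_histograms(lines):
    """Return (speed_hist, heading_hist) Counters built from NMEA lines.

    Roughly mirrors the awk one-liners: only $GPVTG sentences are used,
    readings whose integer speed is 0 km/h are skipped, speed is rounded to
    the nearest km/h and heading to the nearest 5 degrees (360 folds to 0).
    """
    speed_hist = Counter()
    heading_hist = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if fields[0] != "$GPVTG" or len(fields) < 8:
            continue
        try:
            heading = float(fields[1])  # field 2: true track, degrees
            speed = float(fields[7])    # field 8: speed over ground, km/h
        except ValueError:
            continue                    # empty fields when there is no fix
        if int(speed) == 0:
            continue                    # below 1 km/h is treated as stopped
        speed_hist[int(speed + 0.5)] += 1
        heading_hist[(5 * int(heading / 5 + 0.5)) % 360] += 1
    return speed_hist, heading_hist
```

Each count is one second of riding, so `sum(speed_hist.values())` is the moving time in seconds.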

The histogram is written as the rows of a JSON array, one "[value,count]," pair per line, which can be inserted straight into a web page where the flot library uses them to draw the graphs.
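The awk output still needs enclosing brackets, and its trailing comma is not allowed in strict JSON. A minimal sketch of producing a complete JSON array of [x, y] points (the series format flot plots) from a {value: count} histogram, using Python's standard `json` module; the helper name `histogram_to_json` is hypothetical.

```python
import json

def histogram_to_json(hist):
    """Serialize a {value: count} histogram as a JSON array of
    [value, count] pairs, sorted by value."""
    return json.dumps([[value, count] for value, count in sorted(hist.items())])

print(histogram_to_json({12: 5, 3: 1}))  # [[3, 1], [12, 5]]
```

Sorting in Python replaces the `sort -g -k 1.2` step, since awk's `for (w in count)` visits keys in no particular order.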

Speed Histogram

The average speed and standard deviation (shaded at ±0.5σ) are marked on the graph for two bike rides along the same route. The averages match those recorded by my bike computer pretty closely:

Ride                 Source          Time                                Distance  Average speed
Ride 1 (brown)       GPS log         2 hrs 59 min, minus 41 min stopped  63.4 km   27.7 km/h
                     Bike computer   2 hrs 16 min                        64.01 km  28.00 km/h
Ride 2 (dark green)  GPS log         2 hrs 25 min, minus 13 min stopped  63.4 km   29.0 km/h
                     Bike computer   2 hrs 10 min                        63.85 km  29.30 km/h
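The post doesn't show how the average and standard deviation were calculated. A minimal sketch, assuming they come straight from the speed histogram bins: because each count is a number of one-second readings, the count-weighted mean is a time-weighted average moving speed, roughly the distance divided by the moving time. `histogram_stats` is a hypothetical helper, and it uses the population (not sample) standard deviation.

```python
import math

def histogram_stats(hist):
    """Return (mean, std_dev) of the values in a {value: count} histogram,
    weighting each value by its count."""
    n = sum(hist.values())
    if n == 0:
        raise ValueError("empty histogram")
    mean = sum(v * c for v, c in hist.items()) / n
    var = sum(c * (v - mean) ** 2 for v, c in hist.items()) / n
    return mean, math.sqrt(var)

# The shaded band on the graph covers mean - 0.5*sd to mean + 0.5*sd.
mean, sd = histogram_stats({20: 1, 30: 1})
band = (mean - 0.5 * sd, mean + 0.5 * sd)  # (22.5, 27.5)
```

Rounding speeds into 1 km/h bins blurs the result slightly, but with thousands of readings per ride the effect on the mean is small.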

The two rides went in different directions, the first in the "uphill" direction and the second with a bit of a tail wind. I got a flat tire on the first ride too, hence the extra time spent stopped.

Heading Histogram

Up is north, and the radius represents the time spent heading in each direction (normalized during plotting and "expanded" by taking the square root to show a little more detail).
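The post doesn't say exactly how the polar plot is drawn. One plausible approach, assumed here, is to turn each heading bin into an (x, y) point and plot them as an ordinary line series: normalize each count by the largest one, take the square root as described above, and map compass bearings (clockwise from north) onto screen axes. `polar_points` is a hypothetical name.

```python
import math

def polar_points(heading_hist):
    """Convert a {heading_degrees: count} histogram into (x, y) points for a
    polar plot with north up and east to the right."""
    peak = max(heading_hist.values())
    points = []
    for heading in sorted(heading_hist):
        # Normalize to the busiest heading, then square-root so that
        # rarely used headings still show up on the plot.
        r = math.sqrt(heading_hist[heading] / peak)
        theta = math.radians(heading)
        # Bearings run clockwise from north, so x uses sin and y uses cos.
        points.append((r * math.sin(theta), r * math.cos(theta)))
    return points
```

Appending the first point again at the end would close the outline when it is drawn as a line.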
