Friday, March 28, 2008

flickr set parser for GPSVisualizer

Geekier than normal post today. If you aren't interested in Python programming, flickr's API and GPS display of images in Google Maps or Google Earth, then I'd stop reading now. Here's another picture from Death Valley. See you again tomorrow.

Right. Still here? I've been using GPSVisualizer to combine a GPS tracklog with geotagged images on flickr. Images are entered into GPSVisualizer as a series of CSV values, in a fairly flexible format. The first line provides the layout:
latitude, longitude, name, url, thumbnail, desc
Each line after that is one entry for an image: the location in decimal degrees, followed by text fields holding the title, the link to the photo page on flickr, the thumbnail URL and a description. For example:
36.366953, -117.391867, "shot up car", "http://flickr.com/photos/mcgregorphoto/2355449153/", "http://farm3.static.flickr.com/2323/2355449153_c733f5ee1e.jpg", "a long way from nowhere, burnt out and shot up, just off the access road"
36.442947, -117.435447, "half way point", "http://flickr.com/photos/mcgregorphoto/2356283640/", "http://farm3.static.flickr.com/2388/2356283640_be78036888.jpg", "lunch about half way into the hike"
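Titles and descriptions on flickr can contain commas or double quotes, which would break hand-built lines like the ones above. As a minimal sketch (Python 3, standard library csv module only), rows in this layout could be generated as shown here. The helper name format_rows is made up for illustration, and I'm assuming GPSVisualizer accepts standard CSV quoting, where an embedded double quote is written as two double quotes.

```python
import csv
import io

# Column layout announced on the first line of the file.
FIELDS = ["latitude", "longitude", "name", "url", "thumbnail", "desc"]


def format_rows(photos):
    """Return CSV text: a header line naming the layout, then one line per photo.

    `photos` is a list of dicts keyed by the names in FIELDS (hypothetical
    input shape, chosen for this example).
    """
    out = io.StringIO()
    # QUOTE_MINIMAL only quotes fields that contain commas, quotes or newlines.
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(FIELDS)
    for photo in photos:
        writer.writerow([photo[field] for field in FIELDS])
    return out.getvalue()


if __name__ == "__main__":
    sample = [{
        "latitude": 36.366953,
        "longitude": -117.391867,
        "name": "shot up car",
        "url": "http://flickr.com/photos/mcgregorphoto/2355449153/",
        "thumbnail": "http://farm3.static.flickr.com/2323/2355449153_c733f5ee1e.jpg",
        "desc": "a long way from nowhere, burnt out and shot up, just off the access road",
    }]
    print(format_rows(sample), end="")
```

The csv module takes care of the quoting, so a description with commas in it stays in a single field.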
Initially I generated this file by hand, extracting the EXIF location from each image file using exiftool. I then went through each image, entered a title, found the URL for the image on flickr, extracted the URL for a thumbnail image and added a longer description. It was only 16 images, but it was painful, to say the least, particularly as all of the information was already there in a flickr set.
So I looked up the flickr API, found a python library to access it and wrote the script below in half an hour. Given the URL or set id for a flickr set, it iterates over all of the photos and produces a CSV formatted list that's suitable to load straight into GPSVisualizer. This can then be combined with the original GPS track log to generate a map showing the images along with the path traveled. The output works for both Google Earth and Google Maps, and the script runs pretty quickly.
I'm posting it here in case it proves useful to anyone. To run it, you'll need an up to date python install and the flickrAPI library for python. You'll also need to obtain an API key from flickr and add it to the script in the marked location. Once that's done, the program is run with:
python flickr2gpsv.py -s set_id > results.csv
or, alternatively,
python flickr2gpsv.py -s http://www.flickr.com/set_url/ > results.csv
If you find this useful, please let me know. You can download the script here (probably better than cutting and pasting from below, because the download will keep the correct indentation). The archived version also includes a patch from Brad Crittenden to add unicode support.
# Author : Gordon McGregor
# Contact: http://gordonmcgregor.blogspot.com
#
# License : public domain
#
# Purpose: parses a flickr set to extract information to generate a map overlay, via http://gpsvisualizer.com
#
# Usage: python flickr2gpsv.py -s 72157604221838137
# or
#        python flickr2gpsv.py -s http://flickr.com/photos/mcgregorphoto/sets/72157604221838137/
#
#   If the URL is given, the set_id is automatically extracted

# Typically, you'll want to redirect the output to a file, as errors & comments will appear on stderr (not in the file)
#
# e.g., python flickr2gpsv.py -s 72157604221838137 > output_list.csv
#
#
# only generates entries for photos with geographic information attached
#
# Required libraries:
# the flickrapi python libraries, from http://flickrapi.sourceforge.net/
# installation info here http://flickrapi.sourceforge.net/installation.html
#

import flickrapi
import time

import sys
from optparse import OptionParser
from urlparse import urlparse

# enter your api_key here to connect to flickr
# obtain one from http://www.flickr.com/services/api/keys/apply/
#

api_key = 'your_key_goes_here'

def getURL(sizes, size):
    # return the source URL for the image size with the given label (e.g. 'Small')
    for element in sizes.sizes[0].size:
        if element['label'] == size:
            return element['source']

    raise flickrapi.exceptions.FlickrError("No " + size + " URL found.")

# the main routine
# pass in a flickr set id
# outputs the appropriate data for GPSVisualizer to stdout
def parseSet(set_id):

    cnt = 0

    flickr = flickrapi.FlickrAPI(api_key)

    photoset = flickr.photosets_getPhotos(photoset_id=set_id)

    # iterate over the photos in the set

    print 'latitude, longitude, name, thumbnail, url, desc'
    for photo in photoset.photoset[0].photo:

        try:
            # get the various bits of data

            sizes = flickr.photos_getSizes(photo_id=photo['id'])
            info = flickr.photos_getInfo(photo_id=photo['id'])

            # extract the required fields
            try:
                lat = info.photo[0].location[0]['latitude']
                lon = info.photo[0].location[0]['longitude']
            except AttributeError:
                raise flickrapi.exceptions.FlickrError("No Geographical data found.")

            name = photo['title']
            thumbnail = getURL(sizes, 'Small')
            url = info.photo[0].urls[0].url[0].text
            desc = info.photo[0].description[0].text.strip()  # strip to remove extra newlines

            if len(desc):
                print '%s, %s, "%s", "%s", "%s", "%s"' % (lat, lon, name, thumbnail, url, desc)
            else:
                print '%s, %s, "%s", "%s", "%s",' % (lat, lon, name, thumbnail, url)
            sys.stderr.write('.')
            cnt = cnt + 1

        except flickrapi.exceptions.FlickrError, e:
            # photos without a location or a 'Small' size are reported and skipped
            sys.stderr.write('\n' + photo['title'] + ' : ' + str(e) + ' Skipping.\n')

    return cnt



def main(argv=None):
    if argv is None:
        argv = sys.argv

    usage = "usage: %prog [options]\nPass in a flickr set to produce output suitable for GPSVisualizer's google maps overlay"
    opt_parser = OptionParser(usage=usage)

    opt_parser.add_option('-s', '--set', dest='set', help="flickr set to process (full url or set number)")

    (options, args) = opt_parser.parse_args(argv[1:])

    if options.set is None:
        opt_parser.print_help()
        return 1

    # treat the command line value as a url
    url = urlparse(options.set)

    # even if it is just a set id, the value ends up in field 3 after urlparse (url[2])
    # this removes any trailing /, explodes around any remaining /'s and takes the last value [-1]
    # works for a full URL or simple set id
    set_id = url[2].strip('/').split('/')[-1]

    if not set_id.isdigit():
        sys.stderr.write("set_id " + set_id + " is not a number. This is not expected.\n")
        return 1

    sys.stderr.write("Processing set id " + set_id + "\n")
    start_time = time.time()
    num = parseSet(set_id)
    end_time = time.time()

    total_time = end_time - start_time

    average = 0
    if num:
        average = total_time / num

    results = "\nProcessed %d valid photos in %0.1f seconds (%0.2f seconds/photo). Finished.\n" % (num, total_time, average)
    sys.stderr.write(results)

if __name__ == "__main__":
    sys.exit(main())
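
The set id handling in main() can be tried on its own, without a flickr API key. This is a rough sketch for Python 3, where urlparse lives in urllib.parse (the script above is written for Python 2). The function name extract_set_id is mine and is not part of the script.

```python
from urllib.parse import urlparse


def extract_set_id(value):
    """Return the numeric set id from a bare set id or a full set URL."""
    # For a bare id, urlparse puts the whole string in the path component,
    # so the same code handles both input forms.
    path = urlparse(value).path
    # Drop leading/trailing slashes, split on the rest, keep the last piece.
    set_id = path.strip("/").split("/")[-1]
    if not set_id.isdigit():
        raise ValueError("set id %r is not a number" % set_id)
    return set_id


if __name__ == "__main__":
    for arg in (
        "72157604221838137",
        "http://flickr.com/photos/mcgregorphoto/sets/72157604221838137/",
    ):
        print(extract_set_id(arg))
```

Both calls give the same set id, which is why the -s option can take either form.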

2 comments:

Paul Allan Martin said...

G'day Gordon,
I don't know anything about geotagging images (and I don't think I care to), but I know Shannon would like the photograph!
Congrats on your thoughtful, absorbing and literate blog, which I have only recently discovered.
Paul

Unknown said...

Thanks Paulo. Glad you enjoy it.