Lately I've been playing around in my spare time with the Google Maps API and Rails. It turns out the choice of framework barely matters, as you spend most of your time in JavaScript when you work with this API. Nevertheless I gave a talk on this at the recent Vancouver RubyCamp session. People asked if I'd put my slides up, but since I've rarely found a pack of slides in isolation very useful, and wanted to get the code out that I used, I figured I'd write a post instead.

Some background: It had been a few years since I had built a web application for fun, and had felt it was time again. Last time I did everything by hand with Perl and MySQL. This time I could use Rails (although after yesterday's sessions I'm considering Merb), and try to avoid writing so much of the client side by hand.

I was looking for an application that would have plenty of geographical data, and would also be somewhat interesting. I had thought of pulling garage sale listings out of craigslist and presenting them in a Google Maps mashup, but quickly discovered that my idea of fun isn't maintaining an ever-growing table of regexes intended to extract correct addresses from craigslist.org. Discovering that "2850 4 1/2 St Cloud" is really "2850 4 1/2 St. North, Saint Cloud, Minnesota 56303"[1] isn't that straightforward. When the user misspells their hometown, or even leaves it out because it should be obvious from context, that makes it harder. And no regex is going to extract an address from "behind the Norht Village Mall", typo and all.

The final straw was a recent article on wired.com about a startup called Listpic, which was pulling photos of items for sale off craigslist and displaying them in a friendlier manner. The article describes how one day its founder, Ryan Sit, saw an email arrive from Jim Buckmaster, CEO of craigslist, and hoped it was an offer to purchase. Instead it was a cease-and-desist notice for violating craigslist's Terms of Service.

While I'm very interested in the issues around who actually owns the data people add to Web 2.0 services, that wasn't going to give me opportunities to present the code I discussed at RubyCamp. The part of the article that was interesting was the discussion of a service called Oodle.com, which scrapes legal sources, organizes the information somewhat, and makes it available via XML/RPC. I had a look at the site, and saw that they had a sizable list of garage sales, even for December. Then I saw the foreclosure category, and given the current economic news, figured there'd be way more data there, and found my new app. Let's look at some of the code behind it.

First, I defined my database schema with a simple migration:

class CreateProperties < ActiveRecord::Migration
  def self.up
    create_table :properties do |t|
      t.column :latitude, :float, :null => false
      t.column :longitude, :float, :null => false
      t.column :oodle_id, :string, :limit => 16, :null => false  # buncha digits
      t.column :created_at, :datetime
      t.column :oodle_created_at, :datetime
    end
    add_index :properties, :oodle_id
  end

  def self.down
    drop_table :properties
  end
end

All I need for a GMap application are the latitude and longitude fields. I can use the Oodle ID on an entry to go back to Oodle for more information as I need it. I figured both timestamps would be useful -- the "oodle_created_at" to know how to resolve duplicate listings by taking the most recent, and my own timestamp to help remove aged entries from my database in a sweeper.
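To make the dedup rule concrete, here's a standalone sketch (the dedupe_listings helper is hypothetical, not part of the app): given rows keyed by oodle_id, keep only the one with the latest oodle_created_at.

```ruby
require 'time'

# Hypothetical helper: resolve duplicate listings by keeping only the
# most recent record for each oodle_id. Field names follow the schema above.
def dedupe_listings(rows)
  rows.group_by { |r| r[:oodle_id] }.map do |_, dupes|
    dupes.max_by { |r| r[:oodle_created_at] }
  end
end

listings = [
  { :oodle_id => "1", :oodle_created_at => Time.parse("2007-12-01") },
  { :oodle_id => "1", :oodle_created_at => Time.parse("2007-12-05") },
  { :oodle_id => "2", :oodle_created_at => Time.parse("2007-11-20") },
]
deduped = dedupe_listings(listings)  # two rows survive; id "1" keeps Dec 5
```

In the real app the equivalent work happens in SQL against the properties table, but the rule is the same.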

Next I wrote a tiny Ruby program to get the data and dump it in a text file:

#!/usr/bin/env ruby
require 'xmlrpc/client'

class OodleFC
  @@increment = 25
  @@endpoint = 'http://api.oodle.com/api/'
  @@methodName = 'get'
  @@OODLE_KEY = 'precious'

  def initialize(region='usa')
    @params = {
      'partner_id' => @@OODLE_KEY,
      'category' => 'housing/sale/foreclosure',
      'region' => region,
      'from' => 0,
      'to' => @@increment,
    }
    # new2 takes a full URL; plain new expects separate host/path/port args
    @service = XMLRPC::Client.new2(@@endpoint)
    @rails_id = 1
  end
 
  def process_next_set
    result = @service.call(@@methodName, @params)
    return false if !result['items'] || result['items'].size == 0
    @params['from'] += @@increment
    @params['to'] += @@increment
    result['items'].each do |item|
      # Process each set of items here -- it's a simple hash
      vals = [@rails_id]  # we need to provide our own ID #s
      @rails_id += 1
      vals << item['latitude']
      vals << item['longitude']
      vals << item['id']
      # Process other items...
      puts vals.join("|")
    end  # end result['items'].each
    return true
  end # end function
end

getter = OodleFC.new()
loop do
  break if !getter.process_next_set()
end

If you haven't used XML/RPC before, or it's been six years or so, the Ruby library makes it very straightforward. It takes three lines -- one to import the library, one to create an XMLRPC::Client instance, and then each call to the service with the parameter hash returns an array of items, each of which is a hash.
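Here's a self-contained sketch of that paging pattern, with the XML-RPC service stubbed out by a hypothetical FakeService class so it can run without hitting Oodle; the 'get' method and the 'from'/'to' parameter names mirror the real script.

```ruby
# FakeService stands in for XMLRPC::Client: it answers 'get' calls by
# slicing a pretend inventory of @total items into 'from'..'to' windows.
class FakeService
  def initialize(total)
    @total = total
  end

  def call(method, params)
    from, to = params['from'], params['to']
    items = (from...[to, @total].min).map { |i| { 'id' => i.to_s } }
    { 'items' => items }
  end
end

# The same loop as the real script: advance the window until a page
# comes back empty.
def fetch_all(service, increment = 25)
  params = { 'from' => 0, 'to' => increment }
  all = []
  loop do
    result = service.call('get', params)
    break if !result['items'] || result['items'].empty?
    all.concat(result['items'])
    params['from'] += increment
    params['to'] += increment
  end
  all
end

items = fetch_all(FakeService.new(60))  # pages of 25, 25, 10, then done
```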

My plan here was to write out the data to a simple text file and read it into my MySQL database using the mysqlimport command. Because of this, I was bypassing ActiveRecord, and had to provide explicit values for the ~id~ field. Importing the data was easy:

db $ grab_data.rb > properties.out
db $ mysqlimport --delete --fields-terminated-by='|' --user=realtor --local \
foreclosures_development ./properties.out
foreclosures_development.properties: Records: 5936  Deleted: 0  Skipped: 0 Warnings: 22

I don't know what the warnings were, and couldn't find a way to coax them out of mysqlimport, but it looks like I got everything:

data $ wc -l properties.out
   5936 properties.out

So now there are three quick steps left to get going:

  1. Build a view to hold a map and display the data
  2. Build a controller to retrieve the data
  3. Write the JavaScript that displays the data

Let's look at a simple example of each one in turn. Here's app/views/fc/map.rhtml:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <script type="text/javascript">
  var gkey = "<%= GOOGLE_MAPS_KEY %>";
  </script>
  <script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=<%= GOOGLE_MAPS_KEY %>" type="text/javascript"></script>
  <%= javascript_include_tag 'application' %>
  <%= stylesheet_link_tag 'style' %>
  <title>Find A Foreclosure</title>
</head>
<body>
  <div id="map" style="width: 500px; height: 300px"></div>
</body>
</html>

Here's the controller:

class FcController < ApplicationController

  def fc_in_bounds
    ne = params[:ne].split(',').collect{|e|e.to_f}  
    sw = params[:sw].split(',').collect{|e|e.to_f}    
    # If the NE longitude is less than the SW longitude,
    # the view straddles the 180th meridian (the dateline).
    if ne[1] > sw[1]
      conditions = 'longitude > ? AND longitude < ? AND latitude <= ? AND latitude >= ?'
    else
      conditions = '(longitude >= ? OR longitude < ?) AND latitude <= ? AND latitude >= ?'    
    end
    fcs = Property.find(:all,
                         :conditions => [conditions, sw[1], ne[1], ne[0], sw[0]])
    # Now convert the list of foreclosures into a simple array of hashes
    fcs = fcs.map{|p|
      {
        :latitude => p.latitude.to_f,
        :longitude => p.longitude.to_f,
        :oodle_id => p.oodle_id,
      }
    }
    render :text=>{:result => fcs}.to_json
  end
end

Some of this code came from Apress's "Beginning Google Maps Applications with Rails and Ajax" (http://www.amazon.com/Beginning-Google-Maps-Applications-Rails/dp/159059...), in particular the code above, which shows how to handle data that straddles the international dateline.
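The bounds test can be boiled down to a standalone predicate (my own sketch, not the book's code): a box that wraps the dateline has its NE longitude numerically less than its SW longitude, so the longitude test flips from AND to OR.

```ruby
# Does (lat, lng) fall inside the box defined by north-east and
# south-west corners (each a [lat, lng] pair)? Mirrors the SQL
# conditions in the controller above, including the dateline case.
def in_bounds?(lat, lng, ne, sw)
  lat_ok = lat <= ne[0] && lat >= sw[0]
  lng_ok = if ne[1] > sw[1]       # normal case
             lng > sw[1] && lng < ne[1]
           else                   # box wraps across the dateline
             lng >= sw[1] || lng < ne[1]
           end
  lat_ok && lng_ok
end

in_bounds?(49.0, -123.0, [50.0, -120.0], [48.0, -125.0])  # Vancouver-ish box
in_bounds?(0.0, 175.0, [10.0, -170.0], [-10.0, 170.0])    # box spanning 180°
```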

I should say the book was useful, despite a title only a step or two above a "Dummies" title. The Rails part of the book often suggests it was translated from another framework (for example, there is no talk of testing), and there's a book on the same topic using PHP by the same group of authors[3], but I still found it useful. It was a good use of my time to read through most of the book to get a sense of what I could do, and how to do it. Since then I've put the book aside and use the API reference.

The last piece is the JavaScript file, ~public/javascripts/application.js~:

var centerLatitude = 30.5;
var centerLongitude = -155.5;
var startZoom = 3;
var map;
var do_refresh = true;

function init() {
    if (GBrowserIsCompatible()) {
        map = new GMap2(document.getElementById("map"));
        map.setCenter(new GLatLng(centerLatitude, centerLongitude), startZoom);
        map.addControl(new GLargeMapControl());
        map.addControl(new GScaleControl());
        map.addControl(new GMapTypeControl());
        GEvent.addListener(map,'zoomend',function(oldLevel, newLevel) {
            // zooming requires this: remove the existing points
            map.clearOverlays();
            updateMarkers();
        });
   
        GEvent.addListener(map,'moveend',function() {
            updateMarkers();
        });
        setTimeout(function() { updateMarkers(); }, 1000);
    }
}

function createMarker(gpoint) {
    var marker = new GMarker(gpoint);
    GEvent.addListener(marker, 'click', function() {
        var markerHTML = (gpoint.lat()
                          + ", "
                          + gpoint.lng());
        do_refresh = false;
        marker.openInfoWindowHtml(markerHTML);
        setTimeout(function() {
            do_refresh = true;
        }, 5000);
    });
    return marker;
}

function updateMarkers() {
    if (!do_refresh) return;
    //create the boundary for the data
    var bounds = map.getBounds();
    var southWest = bounds.getSouthWest();
    var northEast = bounds.getNorthEast();
    var url = ('/fc/fc_in_bounds'
               + '?ne=' + northEast.toUrlValue()
               + '&sw=' + southWest.toUrlValue());

    //retrieve the points using Ajax
    var request = GXmlHttp.create();
    request.open('GET', url, true);
    request.onreadystatechange = function() {
         if (request.readyState == 4) {
            if (request.status != 200) {
              GLog.write("status: " + (request.status || "?"));
            } else {
                var data = request.responseText;
                var edata = eval("(" + data + ")");
                //remove the existing points
                map.clearOverlays();
                var points = edata.result;
   
                //create each point from the list
                for (var i = 0; i < points.length; i++) {
                    var gp = new GLatLng(points[i].latitude, points[i].longitude);
                    var marker = createMarker(gp);
                    map.addOverlay(marker);
                }
            }
         }
    }
    request.send(null);
}

window.onload = init;
window.onunload = function() {
  // unloaded = true;
  GUnload();
};

The above code should be familiar to anyone who's built a GMaps API app. I don't have room to teach the basics here, so if they're new to you, pick them up elsewhere (you could do worse than the Apress book I've mentioned) -- you're welcome back afterwards.

That's the core of a Google Maps + Rails app. Several performance problems come up, which I discussed at RubyCamp and will cover here.

By the way, the "do_refresh" variable solves a problem I noticed immediately, but which isn't covered in the standard examples I found. Whenever I clicked on a marker, if the popup information window was initially off-screen, the map would scroll to bring it into position, which would trigger another moveend event, updating the markers. Seems like a bug to me, but until it's fixed, the workaround was easy:

The "do_refresh" variable is set to false whenever I show an info window, and I use a setTimeout to turn it back on after five seconds. We'll see a few more uses of that function coming up.

The next opportunity for performance improvement came up in maps like this one of Los Angeles. It took about 15 seconds to render 737 properties on a quad-core 2.4 GHz machine:

Imagine an investor trying to take advantage of the current climate. She wants to find a reasonable property in southern California, and is stuck in a SigAlert[2] nightmare on the Santa Ana. Good thing she's got her iPhone, but by the time this map finally renders, the great deal could be gone. The map shows some neighborhoods of Los Angeles that are rife with foreclosures, but we can't even see their names because the markers obscure them.

Performance enhancement #1: replace simple markers with clusters.

Now when the server sees there are more than n markers, it can cluster some of them, like so:

class FcController < ApplicationController

  include ApplicationHelper

  def fc_in_bounds
    ne = params[:ne].split(',').collect{|e|e.to_f}  
    sw = params[:sw].split(',').collect{|e|e.to_f}
   
    # If the NE longitude is less than the SW longitude,
    # the view straddles the 180th meridian (the dateline).
    if ne[1] > sw[1]
      conditions = 'longitude > ? AND longitude < ? AND latitude <= ? AND latitude >= ?'
    else
      conditions = '(longitude >= ? OR longitude < ?) AND latitude <= ? AND latitude >= ?'    
    end
    fcs = Property.find(:all,
                         :conditions => [conditions, sw[1], ne[1], ne[0], sw[0]])
    # Convert the foreclosures into an array of simple hashes first;
    # the clustering code below works with these hashes.
    fcs = fcs.map{|p|
      { :title => p.oodle_title,
        :latitude => p.latitude.to_f,
        :longitude => p.longitude.to_f,
        :price => p.price,
        :zipcode => p.zipcode,
        :url => p.url,
        :oodle_id => p.oodle_id,
        :city => p.city || "",
        :state => p.state,
        :type => 'm'
      }
    }
    max_markers = 100
    if fcs.size > max_markers
      fcs2 = cluster_points_by_distance(fcs, max_markers, ne, sw)
    else
      fcs2 = fcs
    end
    render :text=>{:result => fcs2}.to_json
  end
 
  private
  def cluster_points_by_distance(points, max_markers, ne, sw)
    points = cluster_by_distance(points, max_markers, ne, sw)
    # At this point we've got max_markers or fewer points to render.
    # Now, let's go through and determine which cells have multiple markers
    # (which need to be rendered as a cluster), and which cells have a single marker.
    results = []
    points.each do |p|
      if p.is_cluster?
        p = {
          :latitude => p.y,
          :longitude => p.x,
          :members => p.members.map{|m| m.point[:oodle_id]},
          :type => 'c'
        }
        results << p
      else
        results << p.point
      end
    end
    return results
  end

end

The routine cluster_by_distance is implemented in code that lives in app/helpers/application_helper.rb. (It should be in a controller helper, but I left it there.) It's posted standalone as a separate attachment (here). The code points to a Wikipedia article on the algorithm it implements.

The Google Maps book shows how to cluster by grid. I used their code as well, but since I used it as is from their book, I'd rather not repeat it here. You can download the sample source code at http://www.apress.com/book/downloadfile/3565, and find the code in the "chap_seven" directory (I have no idea why they didn't use directory names like "chap_07" that would sort reasonably well).
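I can sketch the general grid idea from scratch, though, without reprinting their code. This is my own simplified version (names like cluster_by_grid are mine, and it assumes the visible box doesn't straddle the dateline): carve the box into an n-by-n grid, bucket points by cell, and collapse any cell with multiple points into one cluster at their average position.

```ruby
# From-scratch sketch of grid clustering (not the book's code).
# points are hashes like those the controller builds; ne and sw are
# [lat, lng] corner pairs. Assumes the box doesn't wrap the dateline.
def cluster_by_grid(points, ne, sw, cells = 10)
  lat_step = (ne[0] - sw[0]) / cells.to_f
  lng_step = (ne[1] - sw[1]) / cells.to_f
  buckets = Hash.new { |h, k| h[k] = [] }
  points.each do |p|
    row = ((p[:latitude]  - sw[0]) / lat_step).floor
    col = ((p[:longitude] - sw[1]) / lng_step).floor
    buckets[[row, col]] << p
  end
  buckets.values.map do |cell|
    if cell.size == 1
      cell.first   # lone point: pass the marker through unchanged
    else           # multiple points: collapse into a cluster record
      { :latitude  => cell.map { |p| p[:latitude]  }.inject(0.0) { |s, v| s + v } / cell.size,
        :longitude => cell.map { |p| p[:longitude] }.inject(0.0) { |s, v| s + v } / cell.size,
        :members   => cell.map { |p| p[:oodle_id] },
        :type      => 'c' }
    end
  end
end

points = [
  { :latitude => 34.0, :longitude => -118.0, :oodle_id => 'a', :type => 'm' },
  { :latitude => 34.1, :longitude => -117.9, :oodle_id => 'b', :type => 'm' },
  { :latitude => 36.0, :longitude => -115.0, :oodle_id => 'c', :type => 'm' },
]
clustered = cluster_by_grid(points, [37.0, -114.0], [33.0, -119.0])
# 'a' and 'b' share a cell and merge into one type 'c' cluster; 'c' stays a marker
```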

The only difference this time is that we're either returning a cluster that contains an array of IDs, or we're returning a simple property (type "m", for marker, which isn't the best name). Now we need to update the JavaScript code to handle this:

var centerLatitude = 30.5;
var centerLongitude = -155.5;
var startZoom = 3;
var map;

//create an icon for the clusters
var iconCluster = new GIcon();
iconCluster.image = "http://googlemapsbook.com/chapter7/icons/cluster.png";
iconCluster.shadow = "http://googlemapsbook.com/chapter7/icons/cluster_shadow.png";
iconCluster.iconSize = new GSize(26, 25);
iconCluster.shadowSize = new GSize(22, 20);
iconCluster.iconAnchor = new GPoint(13, 25);
iconCluster.infoWindowAnchor = new GPoint(13, 1);
iconCluster.infoShadowAnchor = new GPoint(26, 13);

//create an icon for the pins
var iconSingle = new GIcon();
iconSingle.image = "http://googlemapsbook.com/chapter7/icons/single.png";
iconSingle.shadow = "http://googlemapsbook.com/chapter7/icons/single_shadow.png";
iconSingle.iconSize = new GSize(12, 20);
iconSingle.shadowSize = new GSize(22, 20);
iconSingle.iconAnchor = new GPoint(6, 20);
iconSingle.infoWindowAnchor = new GPoint(6, 1);
iconSingle.infoShadowAnchor = new GPoint(13, 13);

// I bought the book, I don't feel guilty using their icons, but wouldn't
// rely on them for a live application.

var iconTypeFromCode = {c:iconCluster, m:iconSingle};

function createMarker(gpoint, appPoint) {
    var type = appPoint['type'];
    var marker = new GMarker(gpoint, iconTypeFromCode[type] || iconSingle, true);
    GEvent.addListener(marker, 'click', function() {
        // same code as above
        // ...
}

// Same code as above

    request.onreadystatechange = function() {
     // ...
     //create each point from the list
     for (var i = 0; i < points.length; i++) {
         var gp = new GLatLng(points[i].latitude, points[i].longitude);
         var marker = createMarker(gp, points[i]);
         map.addOverlay(marker);
     }

Now the map is clearer:

There are other things I'd like to do, like add numbers to the cluster icons, so I can see that the cluster in San Bernardino represents 100 properties, while the cluster near Murrieta in the south might represent only 30. I'd also use color to distinguish the expensive properties from the cheap. Those will have to wait for a later date though. There were still perf problems to deal with.

The first was that sometimes a response would arrive and my JavaScript code would dutifully fill in the map; as soon as it was done, a new response would arrive, so the code would erase all the markers and do it all over again. Here's the sequence of events that was taking place:

  • user nudges the map
  • JS sends an Ajax request A to the server
  • user nudges the map again
  • JS sends an Ajax request B to the server
  • the response for request A arrives, and JavaScript updates the map
  • the response for request B arrives, and JavaScript updates the map

I handled this situation by adding a timestamp on every request, and keeping track of what the latest timestamp was. I'll show the changes to the server first:

  def fc_in_bounds
    # ...
    render :text=>{:requestTag => params[:tag] || "", :result => fcs2}.to_json
  end

Yeah, the server just echoes back the tag parameter. All the work is done in the client:

var request_tag = 0;
// ...

function updateMarkers() {
    if (!do_refresh) return;
    //create the boundary for the data
    var bounds = map.getBounds();
    var southWest = bounds.getSouthWest();
    var northEast = bounds.getNorthEast();
    request_tag = (new Date()).valueOf(); // Global
    var url = ('/fc/fc_in_bounds'
               + '?ne=' + northEast.toUrlValue()
               + '&sw=' + southWest.toUrlValue()
               + '&tag=' + request_tag);  // New

    //retrieve the points using Ajax
    var request = GXmlHttp.create();
    request.open('GET', url, true);
    request.onreadystatechange = function() {
         if (request.readyState == 4) {
            if (request.status != 200) {
              GLog.write("status: " + (request.status || "?"));
            } else {
                var data = request.responseText;
                var edata = eval("(" + data + ")");
                if (edata.requestTag != request_tag) {
                    GLog.write("ignoring old request")
                    return;
                }
                // ... the rest is the same
}

This addition made the client work more smoothly. But I didn't like the way that the server was still happily pulling items out of the database and partitioning them into clusters, only to have all that hard work blithely tossed away. I started wondering if I could avoid doing that as well.

Now the key event handlers on the client side are the zoomend and moveend events, which are supposed to fire once a user has reached the end of a series of zooms or moves. I thought maybe Google was being too optimistic about how little delay indicates the end of an operation, and that I could wait another 200 milliseconds or so. Rather than call the updateMarkers routine immediately, I would use setTimeout to simulate a queue of requests on the client side, with a separate timestamp on each request, so the client could decide when a request was the most recent, and only then fire it.

Once again, not many changes were needed to the code. And once again, I turned to ~setTimeout~:

var pendingRequest = null;
var checkDelay = 300; // wait checkDelay msec before hitting server.

// ...

function updateMarkers(do_now) {
    if (!do_refresh) return;
    if (typeof(do_now) == "undefined")  do_now = false;
    var currRequestTag = (new Date()).valueOf();
    // update the global
    pendingRequest = {tag: currRequestTag, bounds: map.getBounds()};
    if (do_now) {
        finishUpdatingMarkers(currRequestTag);
    } else {
        GLog.write("New pending request: tag " + currRequestTag);
        setTimeout(finishUpdatingMarkers, checkDelay, currRequestTag);
    }
}

function finishUpdatingMarkers(expectedTag) {
    if (pendingRequest.tag != expectedTag) {
        GLog.write("tossing tag " + expectedTag);
        return;
    }
    //create the boundary for the data
    request_tag = expectedTag;  // this is the global!
    var bounds = pendingRequest.bounds;
    var southWest = bounds.getSouthWest();
    var northEast = bounds.getNorthEast();
    var url = ('/fc/fc_in_bounds'
               + '?ne=' + northEast.toUrlValue()
               + '&sw=' + southWest.toUrlValue()
               + '&tag=' + request_tag);
    // rest is the same
    // ...
}

This change split the ~updateMarkers~ routine into two -- the first part starts preparing the request, but only suggests that ~finishUpdatingMarkers~ carry it out. ~finishUpdatingMarkers~ acts as a filter, throwing out any partial requests that it knows will be out of date.

Since I haven't left the development phase of this project yet[4], I've always run both the client and the server on the same machine. I noticed a definite improvement after this step.

The database schema I have suggests some improvements. First, on every query I carry out a calculation on every property to see if it's in bounds. But I know some facts about the geography of the planet: I can partition the map into a grid, assign each cell an arbitrary number based on its latitude and longitude, and track which points fall into each cell at each zoom level. I can also recognize that at some zoom levels, like "1", which shows the whole planet, all my data will be hit, and translate the query into a "select *".

I'd start with this migration.

class CreateTiles < ActiveRecord::Migration
  def self.up
    create_table :tiles do |t|
      t.column :zoom, :integer
      t.column :lat_base, :integer
      t.column :lng_base, :integer
      t.column :property_id, :integer
    end
    add_index :tiles, [:zoom, :lat_base, :lng_base]
  end

  def self.down
    drop_table :tiles
  end
end

I don't have any code on this, so I'll leave it as an exercise. I should mention that all my calculations involving distance use the Euclidean formula we all learned in grade 7, and don't take into account that the Earth is a sphere, let alone an ellipsoid. That works in this application, because all the data is plotted on a Mercator projection, where horizontal distances are exaggerated as they move towards either pole. If you want to show which points are closest more realistically, you'll need to use the correct great-circle formulas.

I also suggested having the client and the server both keep track of how long certain operations take. The client could keep records on how long it takes to render a certain number of points, and constantly suggest to the server the maximum number of points it's prepared to accept.
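That bookkeeping might look like this sketch (the RenderBudget class and its numbers are invented for illustration; the real version would live in JavaScript on the client):

```ruby
# Invented sketch: keep a running average of milliseconds-per-point
# over recent renders, and derive the point budget that should fit in
# a target render time.
class RenderBudget
  def initialize(target_ms = 2000.0)
    @target_ms = target_ms
    @samples = []
  end

  def record(points, ms)
    @samples << ms / points.to_f
    @samples.shift while @samples.size > 10  # keep only recent samples
  end

  # Maximum points the client is prepared to accept, or nil if no data yet.
  def max_points
    return nil if @samples.empty?
    avg = @samples.inject(0.0) { |s, v| s + v } / @samples.size
    (@target_ms / avg).floor
  end
end

budget = RenderBudget.new(2000.0)
budget.record(100, 1000.0)   # 10 ms per point -> budget of 200 points
budget.record(200, 4000.0)   # a slower render drags the budget down
```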

Some server operations take a long time as well. If you're sure there aren't any changes you can make, you could have the client preface one of these requests with a preliminary request on whether this is going to be an expensive operation or not. If the server replies (asynchronously, of course) that it will be, the client could break the request into smaller areas, dividing the map into four parts, for example. Then it would work on each part in a separate request.
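Splitting the box is simple as long as it doesn't straddle the dateline; a sketch (my own helper name, not app code):

```ruby
# Split a bounding box (ne, sw as [lat, lng] pairs) into four quadrants
# so each can be fetched in its own request. Assumes the box does not
# wrap across the dateline.
def split_into_quadrants(ne, sw)
  mid_lat = (ne[0] + sw[0]) / 2.0
  mid_lng = (ne[1] + sw[1]) / 2.0
  [
    { :ne => [mid_lat, mid_lng], :sw => sw },                  # south-west
    { :ne => [mid_lat, ne[1]],   :sw => [sw[0], mid_lng] },    # south-east
    { :ne => [ne[0], mid_lng],   :sw => [mid_lat, sw[1]] },    # north-west
    { :ne => ne,                 :sw => [mid_lat, mid_lng] },  # north-east
  ]
end

quads = split_into_quadrants([40.0, -110.0], [30.0, -120.0])
# four boxes meeting at the midpoint [35.0, -115.0]
```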

If your web application's taking too long, don't give up on it. As I've shown here, there are plenty of approaches you can take to find performance gains, keep your users happy, and, most important, keep your users.

[1] This is a real street, possibly not a real address, but Google Maps was able to resolve it to an actual location. It's been a while since I've been in St. Cloud, and even then I'm not sure if my booster seat was high enough to let me see out the window, but if there's a residence at that location, I hope I didn't compromise anyone's privacy. Please leave them alone if that's the case. They didn't ask to have their address published here.

[2] From "Grey in L.A." by Loudon Wainwright, on this album.

[3] On Amazon right now used copies of the PHP book are at $21.62, retail $23.09. The Rails book sells used for $18.65, against the conveniently same retail price of $23.09. Draw your own conclusions.

[4] And possibly never will, if the U.S. housing market turns around as fast as some commentators suggest it will.