Comparing the Local Onebox with the Maps business directory: an architectural view

This guest article was written by Erek Dyskant, a database analyst who has written a number of apps that interface with Google Maps and Yahoo Local. It provides an interesting look at the inner workings of the Google Maps database.

As I’ve been reading about some of the discrepancies between the Google Maps business directory and the Local Onebox, I thought that I’d shed some light on the likely architectural differences between the two approaches.

Google’s main search is highly distributed, built on the principle that search results must be both fast and resilient to natural disasters. It does not matter much, however, whether the main search results are identical across all of Google’s data centers. If a specific site ranks 5th in one data center but 9th in another, or if one data center has a more recently updated copy of a site than another, that’s not a major concern to Google.

The Google Maps Business Directory, however, has different priorities. Data consistency matters much more because it deals with structured data submitted by trusted data sources, and general-purpose databases are better suited to that task. While I don’t have any information to back this up, I expect that they’re using Oracle or MySQL as the datastore for the Business Directory.

I tracked down as many maps.google.com IP addresses as I could and came up with the following list (note that it’s much shorter than the list for Google Search, which tends to run to several hundred addresses):

209.85.133.104, 209.85.133.147, 209.85.133.99, 64.233.161.104, 64.233.161.147, 64.233.161.99, 66.102.1.103, 66.102.1.102, 66.102.1.147, 66.102.1.99, 72.14.205.99, 72.14.205.103, 72.14.205.104, 72.14.205.147

To get a rough sense of how many physical data centers these addresses are spread across, I pinged all of them and got the following results:

209.85.133.104: icmp_seq=0 ttl=245 time=17.2 ms
209.85.133.147: icmp_seq=0 ttl=245 time=17.5 ms
209.85.133.99: icmp_seq=0 ttl=245 time=17.8 ms

64.233.161.104: icmp_seq=0 ttl=248 time=3.45 ms
64.233.161.147: icmp_seq=0 ttl=248 time=3.50 ms
64.233.161.99: icmp_seq=0 ttl=248 time=3.71 ms
66.102.1.103: icmp_seq=0 ttl=248 time=2.93 ms
66.102.1.104: icmp_seq=0 ttl=248 time=3.91 ms
66.102.1.147: icmp_seq=0 ttl=248 time=3.45 ms
66.102.1.99: icmp_seq=0 ttl=248 time=3.70 ms

72.14.205.99: icmp_seq=0 ttl=249 time=26.0 ms
72.14.205.103: icmp_seq=0 ttl=249 time=25.1 ms
72.14.205.104: icmp_seq=0 ttl=249 time=27.2 ms
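The replies above fall into three groups that share a TTL value, and the round-trip times inside each group are close together. The short Python sketch below makes that grouping explicit. It assumes the condensed one-line-per-reply format shown here (real `ping` output is more verbose), and the names `SAMPLE`, `PING_LINE` and `group_by_ttl` are purely illustrative. TTL alone can't separate two sites that happen to sit the same number of hops away, which is why the round-trip time is reported alongside it.

```python
import re
from collections import defaultdict

# The ping replies from the article, pasted in verbatim.
SAMPLE = """\
209.85.133.104: icmp_seq=0 ttl=245 time=17.2 ms
209.85.133.147: icmp_seq=0 ttl=245 time=17.5 ms
209.85.133.99: icmp_seq=0 ttl=245 time=17.8 ms
64.233.161.104: icmp_seq=0 ttl=248 time=3.45 ms
64.233.161.147: icmp_seq=0 ttl=248 time=3.50 ms
64.233.161.99: icmp_seq=0 ttl=248 time=3.71 ms
66.102.1.103: icmp_seq=0 ttl=248 time=2.93 ms
66.102.1.104: icmp_seq=0 ttl=248 time=3.91 ms
66.102.1.147: icmp_seq=0 ttl=248 time=3.45 ms
66.102.1.99: icmp_seq=0 ttl=248 time=3.70 ms
72.14.205.99: icmp_seq=0 ttl=249 time=26.0 ms
72.14.205.103: icmp_seq=0 ttl=249 time=25.1 ms
72.14.205.104: icmp_seq=0 ttl=249 time=27.2 ms
"""

# Matches one reply line in the condensed format used above.
PING_LINE = re.compile(
    r"^(?P<ip>[\d.]+): icmp_seq=\d+ ttl=(?P<ttl>\d+) time=(?P<ms>[\d.]+) ms$"
)


def group_by_ttl(lines):
    """Return {ttl: [(ip, rtt_ms), ...]}; each TTL is a candidate data center."""
    groups = defaultdict(list)
    for line in lines:
        m = PING_LINE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognized lines
        groups[int(m["ttl"])].append((m["ip"], float(m["ms"])))
    return dict(groups)


for ttl, replies in sorted(group_by_ttl(SAMPLE.splitlines()).items()):
    rtts = [rtt for _, rtt in replies]
    print(f"ttl={ttl}: {len(replies)} addresses, {min(rtts)}-{max(rtts)} ms")
```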

Time is the round-trip time from Superb.net’s data center in Washington, DC. TTL is 255 minus the number of routers between me and Google’s server. Based on the TTL values and ping times, I’ve identified three data centers that Google appears to use for Google Maps and the Business Directory. One of them is within 250 miles of DC, most likely in Ashburn, VA.
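The arithmetic behind those two inferences can be sketched in a few lines of Python. This is a minimal sketch under two stated assumptions: the servers start the TTL at 255 (as the paragraph above assumes), and signals in optical fiber travel roughly 200 km per millisecond, a common rule of thumb of about two thirds of the speed of light. The function names are illustrative. Because real fiber routes are not straight lines and routers add queuing delay, the distance it computes is an upper bound, not an estimate.

```python
INITIAL_TTL = 255         # assumption from the article: servers start the TTL at 255
FIBER_KM_PER_MS = 200.0   # rule of thumb: light in optical fiber covers ~200 km per ms
KM_PER_MILE = 1.609344


def hop_count(observed_ttl, initial_ttl=INITIAL_TTL):
    """Routers traversed, if every router decrements the TTL by exactly one."""
    return initial_ttl - observed_ttl


def max_distance_miles(rtt_ms, km_per_ms=FIBER_KM_PER_MS):
    """Upper bound on the one-way distance to the server.

    Half the round trip is the one-way time; real routes are longer than a
    straight line and add queuing delay, so the true distance is shorter."""
    one_way_ms = rtt_ms / 2
    return one_way_ms * km_per_ms / KM_PER_MILE


# The slowest reply in each TTL group from the ping results above.
for ttl, worst_rtt in [(245, 17.8), (248, 3.91), (249, 27.2)]:
    print(f"ttl={ttl}: {hop_count(ttl)} hops, at most "
          f"{max_distance_miles(worst_rtt):.0f} miles away")
```

For the TTL 248 group, even the slowest reply (3.91 ms) bounds the distance at roughly 243 miles, which is consistent with a data center within 250 miles of DC.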

We’ve established that Google local searches are directed to a smaller subset of Google locations, whereas Google’s main search is a much more distributed process. That leaves the question of where the Local Onebox fits in. Since the Local Onebox is tied into Google’s main search, it follows that Google would want to process Local Onebox results in the same locations where it processes main search results. Thus, the Local Onebox results are probably replicated from the Business Directory listings, but stored at the full set of Google locations in an architecture similar to the one that drives the main search results. In addition, a separate group appears to have administrative control over the Local Onebox results, as some fraudulent entries have been removed from the Onebox while still appearing in the Business Directory.

Ultimately, it makes sense to think of the Local Onebox as a separate database with its own criteria, distinct from the Maps Business Directory: it is most likely driven by Google’s proprietary database rather than an off-the-shelf one, and information is replicated to the Onebox database(s) on a different timetable.
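To make the replication model concrete, here is a toy Python sketch of a master directory and one Onebox copy that refreshes on its own schedule and keeps its own removal list. Every class, method and listing in it is hypothetical and chosen only to illustrate the two behaviors described above; nothing here reflects Google’s actual design or data. It shows how a fraudulent listing can vanish from the Onebox while remaining in the directory, and how a directory edit stays invisible in the Onebox until the next refresh.

```python
import copy


class BusinessDirectory:
    """Hypothetical master store holding the authoritative listings."""

    def __init__(self):
        self.listings = {}  # listing_id -> dict of fields

    def upsert(self, listing_id, **fields):
        self.listings[listing_id] = fields


class OneboxReplica:
    """Hypothetical copy of the directory kept at one search location.

    It refreshes on its own schedule and keeps its own removal list, so it can
    disagree with the master in both directions."""

    def __init__(self, source):
        self.source = source
        self.snapshot = {}
        self.suppressed = set()  # listings removed by the Onebox side only

    def refresh(self):
        # Copy the master as it is right now; later master edits stay
        # invisible here until the next refresh.
        self.snapshot = copy.deepcopy(self.source.listings)

    def remove(self, listing_id):
        # Removal is local to this replica; the master is untouched.
        self.suppressed.add(listing_id)

    def lookup(self, listing_id):
        if listing_id in self.suppressed:
            return None
        return self.snapshot.get(listing_id)


directory = BusinessDirectory()
directory.upsert("pizza-1", name="Joe's Pizza", phone="555-0100")
directory.upsert("spam-1", name="Fake Locksmith", phone="555-0199")

replica = OneboxReplica(directory)
replica.refresh()
replica.remove("spam-1")                                           # Onebox-only removal
directory.upsert("pizza-1", name="Joe's Pizza", phone="555-0111")  # master edit

print(replica.lookup("spam-1"))            # None: gone from the Onebox
print("spam-1" in directory.listings)      # True: still in the directory
print(replica.lookup("pizza-1")["phone"])  # 555-0100 until the next refresh
```

Keeping the removal list outside the snapshot is a deliberate choice in this sketch: a refresh pulls in new directory data without undoing the Onebox side’s removals.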

Edit (Monday 7:25pm): I ran another test comparing the response times of three main Google searches, all sent to the same data center. A search that did not return a Onebox took 505 ms, and a search that did return a Onebox took 650 ms. Running that same Onebox search again took 504 ms. This suggests that Google may look up the Onebox results once from a local business directory NOC and then cache that result for an unknown period of time. That would also explain the inconsistent Onebox results.
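The timing pattern above (a slower first Onebox search, then a repeat at normal speed) is what a cache-aside lookup with an expiry time would produce. The Python sketch below models that hypothesis. The class name, the `fetch` callback and the one-hour TTL are all placeholders of my own: the test above only suggests caching, and the real cache lifetime is unknown. If each search location fills its cache at a different moment, two locations can serve different versions of the same Onebox until their entries expire.

```python
import time


class OneboxCache:
    """Hypothetical cache-aside model of Onebox lookups with a fixed TTL.

    fetch(query) stands in for the slow request to a Business Directory
    location; its result is reused until ttl_seconds have passed."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # query -> (expires_at, result)

    def get(self, query):
        """Return (result, was_cache_hit)."""
        now = self.clock()
        entry = self.entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1], True          # hit: no remote round trip
        result = self.fetch(query)         # miss: slow remote lookup
        self.entries[query] = (now + self.ttl, result)
        return result, False


# Usage with a fake clock and a fake fetch so the behavior is deterministic.
calls = []


def fake_fetch(query):
    calls.append(query)
    return f"onebox results for {query!r}"


fake_now = [0.0]
cache = OneboxCache(fake_fetch, ttl_seconds=3600, clock=lambda: fake_now[0])

first = cache.get("pizza washington dc")   # miss: like the 650 ms search
second = cache.get("pizza washington dc")  # hit: like the 504 ms repeat
fake_now[0] = 4000.0                       # past the placeholder TTL
third = cache.get("pizza washington dc")   # miss again: refetched
```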

Please consider leaving a comment as your input will help me (& everyone else) better understand and learn about local.

9 thoughts on “Comparing the Local Onebox with the Maps business directory: an architectural view”

  1. Thus, the Local Onebox results are probably replicated from the Business Directory listings, but are stored at the full set of Google Locations

    My gosh, it’s fascinating just to read how you went about figuring this out, Erek, let alone reading your conclusions. I’m awed. Thank you for taking the time to explain this, and thanks to Mike for providing a place for you to share this information!

    Miriam

  2. Thanks for the insight. I work as an SEM manager, and everything Google and local seems to be up in smoke… This puts my thoughts a little more at ease; we all seem to be a little confused by the whole situation. Good read!

  3. Erek, fabulous synopsis. This gives me the kick in the you-know-what I needed to arrange and publish the Local Search Ranking Factors contributors’ thoughts on this subject.

  4. Interesting read! I should note, though, that I think Google uses BigTable as its DBMS, on top of its own Google File System (GFS for short).

    Read more about GFS here: http://labs.google.com/papers/gfs.html
    and about BigTable here: http://labs.google.com/papers/bigtable.html

    The BigTable paper actually states that Google Earth data is stored in that DBMS, as is web indexing data. Since the Onebox is part of Universal Search, and thus of web indexing, I would say it uses the same system, but with a different dataset, like you said.
    Let me know what you think!

    Guest posts are interesting!

  5. @Miriam
    Thanks. Too kind.

    @Bethany
    Unfortunately, the local search arena is still much like the Wild West. Everyone knows it’s important, but there aren’t any hard-and-fast rules. It’d really be nice if some simple things were published (the original data source, the times the listing was updated, the last time the data source was refreshed, etc.) and if there were some degree of guaranteed customer service, both to get listed if you’re not and to get fraudulent listings removed.

    @David
    Looking forward to seeing that if you publish it.

  6. @Martin
    Thanks for bringing up BigTable. It really is a fascinating read, and I enjoyed reading the paper again. It’s definitely possible that that’s their data store; however, while the paper describes the Google Earth implementation in detail, it doesn’t mention the Local Business data anywhere.

    It’s well known from job postings and community participation that Google maintains substantial applications that use both Oracle and MySQL, and because of its relational nature, I think that the business directory is an application that’d be well suited to a general purpose database.

    So, essentially, it could be BigTable or it could be a normal RDBMS. Even if the raw data is stored in a regular database, there’s a strong chance that the cached onebox results are stored in a blob in a BigTable row.

  7. @Erek I didn’t think of job postings and community participation as a useful resource for finding out which software they actually use. Clever thinking, and a clear explanation of a possible setup. Thanks!

  8. @martijn
    When I’m researching a company I often glance at the job postings. It gives a good sense of how they work in ways that they won’t otherwise publicize.

  9. Seems Erek knows his stuff. I contacted him as soon as I read this because I would love to host some of his articles on my site as well. I guess I was right when I wrote in my article that the world is heading into an era where computers and databases are becoming increasingly common, given that Google Maps also makes extensive use of a database.
