This guest article is written by Erek Dyskant, a database analyst, who has written a number of apps that interface with Google Maps and Yahoo Local. It provides an interesting look at the inner workings of the Google Maps database.
As I’ve been reading about some of the discrepancies between the Google Maps business directory and the Local Onebox, I thought that I’d shed some light on the likely architectural differences between the two approaches.
Google’s main search is highly distributed, based on the approach that search results need to be both quick and impervious to natural disaster. However, it’s not important that the main search results be identical across all their datacenters. If a specific site is 5th in one datacenter, but 9th in another, or one has a more recently updated version of a site than another, that’s not a major concern to Google.
However, the Google Maps Business Directory has different priorities. Data consistency is much more important, as they're dealing with structured data submitted by trusted data sources, and general-purpose databases are better suited to the task at hand. While I don't have any information to back this up, I expect that they're using Oracle or MySQL as the datastore for the Business Directory.
I tracked down as many maps.google.com IP addresses as I could, and came up with the following list (note that it’s much shorter than Google Search addresses, where the lists tend to be on the order of several hundred):
22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
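The article does not say how this list was gathered. One common approach is to repeat DNS lookups and take the union of the answers, and the sketch below assumes that method. The function names `collect_ipv4` and `collect_over_rounds` are made up for illustration; only `socket.getaddrinfo` is a real standard-library call.

```python
import socket


def collect_ipv4(host, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses reported for host."""
    results = resolver(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in results})


def collect_over_rounds(host, rounds=5, resolver=socket.getaddrinfo):
    """Union the answers of several lookups, since DNS may rotate records."""
    seen = set()
    for _ in range(rounds):
        seen.update(collect_ipv4(host, resolver))
    return sorted(seen)
```

Calling `collect_over_rounds("maps.google.com")` from several networks would give a list comparable to the one above, although a local resolver cache may return the same answer on every round, so the result is a lower bound at best.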
To determine a rough sense of how many actual data centers these were distributed to, I pinged all of these addresses and came up with the following results:
188.8.131.52: icmp_seq=0 ttl=245 time=17.2 ms
184.108.40.206: icmp_seq=0 ttl=245 time=17.5 ms
220.127.116.11: icmp_seq=0 ttl=245 time=17.8 ms
18.104.22.168: icmp_seq=0 ttl=248 time=3.45 ms
22.214.171.124: icmp_seq=0 ttl=248 time=3.50 ms
126.96.36.199: icmp_seq=0 ttl=248 time=3.71 ms
188.8.131.52: icmp_seq=0 ttl=248 time=2.93 ms
184.108.40.206: icmp_seq=0 ttl=248 time=3.91 ms
220.127.116.11: icmp_seq=0 ttl=248 time=3.45 ms
18.104.22.168: icmp_seq=0 ttl=248 time=3.70 ms
22.214.171.124: icmp_seq=0 ttl=249 time=26.0 ms
126.96.36.199: icmp_seq=0 ttl=249 time=25.1 ms
188.8.131.52: icmp_seq=0 ttl=249 time=27.2 ms
Time is the round-trip time from Superb.net's datacenter in Washington, DC. TTL is 255 minus the number of routers between me and Google's server, assuming the server sends its replies with an initial TTL of 255. The responses fall into three groups that share both a TTL and a similar ping time, so I've identified three data centers that Google appears to use for Google Maps and the Business Directory. One of those is located within 250 miles of DC, and most likely in Ashburn, VA.
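The grouping described above can be reproduced with a short script. This is only a sketch of the reasoning, not the procedure actually used: it assumes an initial TTL of 255 and assumes light travels through fiber at roughly 200 km per millisecond, which gives an upper bound on distance rather than a location. The ping lines are copied from the results above, and the function names are invented for the example.

```python
import re
from collections import defaultdict

PING_OUTPUT = """\
188.8.131.52: icmp_seq=0 ttl=245 time=17.2 ms
184.108.40.206: icmp_seq=0 ttl=245 time=17.5 ms
220.127.116.11: icmp_seq=0 ttl=245 time=17.8 ms
18.104.22.168: icmp_seq=0 ttl=248 time=3.45 ms
22.214.171.124: icmp_seq=0 ttl=248 time=3.50 ms
126.96.36.199: icmp_seq=0 ttl=248 time=3.71 ms
188.8.131.52: icmp_seq=0 ttl=248 time=2.93 ms
184.108.40.206: icmp_seq=0 ttl=248 time=3.91 ms
220.127.116.11: icmp_seq=0 ttl=248 time=3.45 ms
18.104.22.168: icmp_seq=0 ttl=248 time=3.70 ms
22.214.171.124: icmp_seq=0 ttl=249 time=26.0 ms
126.96.36.199: icmp_seq=0 ttl=249 time=25.1 ms
188.8.131.52: icmp_seq=0 ttl=249 time=27.2 ms
"""

LINE_RE = re.compile(r"ttl=(\d+) time=([\d.]+) ms")

ASSUMED_INITIAL_TTL = 255   # assumption: replies start with a TTL of 255
FIBER_KM_PER_MS = 200.0     # assumption: roughly 2/3 the speed of light
KM_PER_MILE = 1.609344


def group_by_ttl(text):
    """Map each observed TTL to the list of round-trip times seen with it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            groups[int(match.group(1))].append(float(match.group(2)))
    return dict(groups)


def max_distance_miles(rtt_ms):
    """Upper bound on one-way distance implied by a round-trip time."""
    one_way_ms = rtt_ms / 2
    return one_way_ms * FIBER_KM_PER_MS / KM_PER_MILE


groups = group_by_ttl(PING_OUTPUT)
for ttl, rtts in sorted(groups.items()):
    hops = ASSUMED_INITIAL_TTL - ttl
    bound = max_distance_miles(min(rtts))
    print(f"ttl={ttl} hops={hops} hosts={len(rtts)} "
          f"min_rtt={min(rtts)} ms, at most {bound:.0f} miles away")
```

Each distinct TTL corresponds to one candidate data center. For the fastest group, the bound comes out under 250 miles, which is consistent with the Ashburn guess; real routes are longer than straight lines and routers add delay, so the true distance is smaller than the bound.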
We’ve established that Google local searches are directed to a smaller subset of Google locations, whereas Google’s main search is a much more distributed process. This leaves the question of where the Local Onebox fits into everything. Since the Local Onebox is tied into the Google main search, it follows that they’d like to process the Local Onebox results in the same location where they process the main Google search. Thus, the Local Onebox results are probably replicated from the Business Directory listings, but are stored at the full set of Google locations and in an architecture similar to what drives the main search results. In addition, it appears that a separate group has administrative control over the Local Onebox results, as some fraudulent entries have been removed from the Onebox while still appearing in the Business Directory.
Ultimately, it makes sense to think of the Local Onebox as a separate database with a separate set of criteria from the Maps Business Directory, as it is most likely driven by Google’s proprietary database rather than an off-the-shelf database, and information is replicated to the Onebox database(s) on a different timetable.
Edit (Monday 7:25pm): I ran another test comparing the response times of three main Google searches, all sent to the same data center. A search that did not return a onebox took 505ms, and a search that did return a onebox took 650ms. Running that same onebox search again took 504ms. This indicates that Google may be looking up the onebox results once from a local business directory NOC, and then caching that result for an unknown period of time. Caching of this kind would also produce the inconsistent onebox results that have been observed.
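To make the caching hypothesis concrete, here is a minimal sketch of a time-limited cache sitting in front of a slower directory lookup. Everything in it is hypothetical: the class name `OneboxCache`, the 60-second lifetime, and the sample query are invented for illustration, and nothing here reflects how Google actually implements it.

```python
import time


class OneboxCache:
    """Hypothetical time-limited cache in front of a slower directory lookup."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # slow lookup against the directory
        self._ttl = ttl_seconds      # how long a cached answer stays valid
        self._clock = clock          # injectable so the demo is deterministic
        self._entries = {}           # query -> (expiry time, cached result)

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]          # fast path: serve the cached copy
        value = self._fetch(query)   # slow path: ask the directory
        self._entries[query] = (now + self._ttl, value)
        return value


# Demo with a fake directory and a fake clock (both assumptions).
directory = {"pizza washington dc": "listing v1"}
lookups = []


def slow_directory_lookup(query):
    lookups.append(query)
    return directory[query]


fake_now = [0.0]
cache = OneboxCache(slow_directory_lookup, ttl_seconds=60,
                    clock=lambda: fake_now[0])

first = cache.get("pizza washington dc")         # miss: goes to the directory
directory["pizza washington dc"] = "listing v2"  # the directory is corrected
second = cache.get("pizza washington dc")        # hit: still the old listing
fake_now[0] = 61.0                               # the cached entry expires
third = cache.get("pizza washington dc")         # refetched: new listing
```

In this sketch the second request is fast but stale, which mirrors both observations: the repeated search drops back to the no-onebox response time, and the onebox can keep showing a listing after the Business Directory has changed.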