Today, in the announcements section of the Google Help forums, Google published a simplified schematic showing where its business listing information comes from. Bill Slawski covered how this actually works in his review of the patent “Generating structured information,” filed in early 2006.
In the announcement, Google describes the following sources for the listing data shown in the schematic:
As many of you know, a local listing is often created by including data from multiple sources. We do our best to give attribution to the data appearing on a local listing. Here’s a rundown of our main sources of data and how it appears in a listing:
LBC: Local Business Center. Information submitted and verified as individual listings appears with the label Provided by business owner. Also, some feeds are submitted through the LBC.
YP: Yellow Pages. This describes information we get from public directories created and licensed from 3rd parties. In some areas, we provide attribution at the bottom of a list of results (e.g., business listings distributed by YellowPages.ca™).
EC: Enhanced Content, which can include reviews, photos, business hours, payment methods, and other details. This is provided to us via feeds from other websites. If this information is coming from a published web page, a link will be provided.
UGC + WEB: User Generated Content & other websites. Both these sources are either submitted to Google directly or crawled, just like other websearch results. If the content is hosted on a website, we’ll provide a link. Otherwise, you’ll see a Provided by Google users label that shows it was submitted using our community features.
This newly posted information essentially mirrors the details of the 2006 patent. In the next day or two, I will explain how this “clustering” technology causes Google’s problems with merging business records in the index.