The claim made by Google that “…fewer than 0.5% of local searches lead to fake listings” in Google search is NOT a conclusion that can be drawn from Google’s recently published paper. This number understates the number of local searches that lead to the visibility of fake listings due to its assumptions and flawed methodology. And it may do so by a large margin.
The paper, Pinning Down Abuse on Google Maps, while providing interesting insights into Map spam from 2014 and 2015, is fundamentally flawed in its approach to the question of fake listings in Google Local and in no way warrants the optimistic conclusion that Google noted [1] on their blog.
First you need to understand what Google defines as a fake listing for the purposes of this study. It only includes listings that were in gross violation of the guidelines and were caught by their algo or human curation and suspended. This excludes any listing that manipulated its name or any listing that had manipulated its reviews [2]. And more importantly it excludes fake listings that their algo didn’t catch.
Then you also need to understand what Google means when they say that “0.5% of local searches lead to fake listings”. They are not saying, as I erroneously thought, that only 0.5% of the listings are fake. They are saying that the listings that were fake and suspended were up for an average of X days and only seen Y times during that time. That is, compared to total searches, fake listings constituted 0.5% of users’ impressions in search and Maps. But as I will detail, even the visibility assumption is flawed.
The Good, The Bad & The (very) Ugly of the Google Research & Blog:
I think that it serves the industry and the public [3] to know more about Map spam and to understand which sectors are likely to experience it. I laud the transparency, as limited as it has been.
We can also presume, given the noted drop in fake listing creation, that Google has closed many of the loopholes that were leading to spam at such a large scale.
The paper also makes the point that it isn’t just the existence of fake listings that matters. What matters as much or perhaps more is whether they are seen by searchers. And many fake listings that we as professionals might uncover might never be seen at all because they are buried too deeply in the ranking hierarchy.
That is a critical point in analyzing Map spam that is all too often forgotten.
Factual errors? One hopes that a paper of this caliber would hew to academic standards and document facts and do a thorough literature review.
This paper did neither. The literature review was very thin, and statements to the effect that Map spam is new and that early manifestations occurred in 2015 are untrue on their face [4]. We saw high volume “on-premise” affiliate Maps spam in the florist industry in September of 2008 and massive “on-call” locksmith spam soon after.
“In this paper, we investigate a new form of blackhat search engine optimization that targets local listing services like Google Maps. … Early forms of attacks included defacement, such as graffiti posted to Google Maps in Pakistan.” [5]
Another problem is the big data, global approach and tone that this paper takes. Local listing spam is a hyper-local phenomenon that impacts high-density markets and high-value verticals. Lumping views of those listings in with views of rural churches and government offices makes no sense. This sort of “average” conclusion creates an impression that may be statistically true (although it’s not in this case) but that is also very misleading [6].
The paper noted this issue but didn’t deal with it in a systematic way in the publication:
Even so, such impressions can vary across geographic locations. In particular, users in West Harrison, NY were the most affected—where 83.3% of the search results for locksmiths were abusive. In contrast, 15.6% of search results for locksmiths in New York City were abusive.
A final issue, and maybe this can’t be easily overcome [7], is the assumption that is made in calculating how often fake listings are seen by searchers.
Assuming a uniform query rate, this average approximates the likelihood a user would encounter an abusive listing if Google Maps selected listings uniformly at random rather than based on search quality. Effectively, this metric discounts the (in)effectiveness of any particular listing’s SEO
Translated, that says that for the purposes of the analysis, one listing will be seen as often as the next and that there is no difference due to SEO.
Given Google’s heavy weighting of a business name, any decent fake listing isn’t just going to use a fake address. It will obviously use a name that ranks well on a high-value, high-frequency keyword search. And as we all know, with that keyword-laden name, it will be much more visible than the average listing.
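To make the effect of that assumption concrete, here is a rough sketch in Python. Every number in it is invented for illustration (the 95/5 split and the 4x visibility multiplier are my assumptions, not figures from the paper), and `impression_share` is a hypothetical helper, not anything Google or the researchers use. It simply compares the share of impressions going to fake listings when every listing is counted equally against the share when keyword-stuffed fakes are seen more often than the average listing.

```python
def impression_share(listings, use_weights):
    """Return the fraction of impressions that go to fake listings.

    listings: list of (is_fake, relative_visibility) tuples.
    use_weights=False mimics the uniform assumption, where every
    listing is seen as often as the next.
    """
    total = 0.0
    fake = 0.0
    for is_fake, visibility in listings:
        weight = visibility if use_weights else 1.0
        total += weight
        if is_fake:
            fake += weight
    return fake / total


# Hypothetical market (invented numbers): 95 ordinary listings with a
# relative visibility of 1.0 and 5 keyword-stuffed fake listings that
# are seen 4x as often as the average listing.
market = [(False, 1.0)] * 95 + [(True, 4.0)] * 5

uniform_share = impression_share(market, use_weights=False)
weighted_share = impression_share(market, use_weights=True)

print(f"uniform assumption:  {uniform_share:.1%}")   # 5.0%
print(f"visibility-weighted: {weighted_share:.1%}")  # 17.4%
```

In this made-up market the uniform assumption reports a share less than a third of the visibility-weighted one. The real gap could be larger or smaller; the point is only that the direction of the error is predictable when fake listings are built to rank.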
The Ugly: Problems with methodology
Small factual errors, disappointing definitions and doubtful assumptions are troublesome but do not, in and of themselves, call into question the conclusion. Flawed study design does.
One might forgive some of these sins if the overarching methodology were sound. It isn’t.
Google provided the researchers [8] with listings that were suspended during the research period. The study ONLY included fake listings that had been removed from the index. It makes NO effort to estimate the fake listings that remained in the index undetected during the study period.
Limitations to our approach: Our study is biased towards abuse caught by the suspension algorithms employed by Google Maps.
The main limitation with this approach is that we cannot estimate the number of false negatives, i.e., abusive listings overlooked by Google Maps.
Thus Google’s blog post conclusion that fewer than 0.5% of local searches lead to fake listings is like a policeman saying “I spent an hour at 5 am looking for speeders. I saw ten but charged five. I saw 100 cars go by, therefore 5% of all cars speed. But I was talking to my wife for some of that time so I may have missed a few.”
A simple thought experiment, based on recent findings, can show the rough impact of this study design decision. In November 2016, Google implemented advanced verification for plumbers in San Diego. 89% of all listings were dropped from the visible index and prevented from showing. One assumes that most of them were fake by the standards of the guidelines.
According to this paper, Google tightened up verification in July 2015. And yet the even tighter advanced verification uncovered 900% more fake listings than the algo suspended. In other private research we saw similar results in the locksmith arena. It bears noting that this was in just one midsize market that is not at the epicenter of Map spam. These were listings that, during the period of the study, were trusted in Google Maps and were frequently visible in the search results but remained uncounted.
Clearly their methodology grossly undercounts the number of fake listings and how often they are seen by searchers.
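A back-of-the-envelope sketch shows how sensitive the headline number is to what was missed. It assumes, and this is purely my assumption, that undetected fake listings are seen about as often on average as the suspended ones. The detection rates below are hypothetical, since the paper itself says the number of false negatives cannot be estimated, and `estimated_true_share` is an illustrative helper rather than anything from the study.

```python
def estimated_true_share(measured_share, detection_rate):
    """Scale a measured impression share up to account for missed listings.

    measured_share: share of impressions attributed to suspended listings.
    detection_rate: hypothetical fraction of all fake listings that were
        actually caught and suspended (0 < detection_rate <= 1).
    Assumes undetected listings are seen as often as detected ones.
    """
    if not 0 < detection_rate <= 1:
        raise ValueError("detection_rate must be in (0, 1]")
    return min(1.0, measured_share / detection_rate)


measured = 0.005  # the 0.5% figure from Google's blog post

# Hypothetical detection rates; none of these come from the paper.
for rate in (1.0, 0.5, 0.1):
    share = estimated_true_share(measured, rate)
    print(f"detection rate {rate:.0%}: estimated share {share:.1%}")
```

Only if every fake listing was caught does the estimate stay at 0.5%. If the suspension algorithm caught one fake listing in ten, the same data would imply roughly 5%. Without an estimate of the detection rate, the 0.5% figure can only be read as a floor.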
Beyond plumbers, etc., we know that lawyers broadly abuse virtual offices as well. Because they are not targeted by the algo, these are completely missing from the analysis [9]. We also know that UPS and drop boxes continue to be abused aggressively, and many of the listings using that method were obviously missed. These unsuspended fake listings in the live index add up.
Using only those listings that were suspended by the algo as the basis for this analysis invalidates the conclusion. The best that could be said from this data is that “fake listings that we allowed into the index and were subsequently suspended were seen in local search results 0.5% of the time”.
This study offers some interesting data. But the conclusions put forth are not warranted by the research. The visibility number offered by the paper is a lowest possible estimate of the visibility of fake listings, not anything more.
Who knows what the real number of fake listings in the Maps index is or how often they might be seen?
This research does a good job showing us what segments have been targeted by Google, which markets they were in and some of the techniques they used. It does little to shed any light on the questions of fake listing visibility to the average searcher. Or more importantly to the average searcher in markets where fake listings are prevalent.
We can perhaps give the researchers some slack in this. They did make some attempt to position their summary within the obvious limitations of their data. Although not persuasively in my opinion.
On the other hand, for the Google research blog to proclaim the results with NO qualification is at best cynical and at worst deceptive.
1 – Google Research Blog: Keeping fake listings off Google Maps hrrmph! Should have read: Listings we kept off of Google Maps.
2 – Which is more fake and does more consumer harm? A locksmith with a fake address or a chiropractor with 100 fake reviews? It very well could be the latter.
3 – The list of the top ten is filled with the usual suspects: Locksmiths 25.7%; Plumbers, electricians 14.6%; Restaurants, pizza delivery 7.3%; Motels, hotels, bed-and-breakfast 5.4%; Clothing stores, beauty salons 3.8%; Lawyers, consultants, accountants; Limousine, taxi, travel agents 1.9%; Car repair, towing, dealers 1.7%; Photographers, graphic designers 1.5%; Movers, packers, shippers 1.5%. But bail bonds, Internet service providers, real estate agencies, and dating agencies didn’t fail to get mentioned. I am sure that “internet service providers” likely equals SEOs.
4 – The starting dates for Map spam weren’t the only instances of factual errors. It was noted that Google Maps provides 4000 categories from which to choose. The number is actually closer to 3500. Regardless, factual errors, even small ones, reduce the credibility of the paper. And given that four of the paper’s authors work at Google, it is hard to understand how this assertion was left to stand.
5 – Massive spam of both types noted by Google was prevalent AT LEAST six years prior to the study. A simple Google search on the phrase “locksmith spam” quickly shows a result from 2011. Maybe seeing Map spam was new for them but that doesn’t excuse the factual errors.
6 – Brings to mind the phrase as noted in Wikipedia: ‘“Lies, damned lies, and statistics”: a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments. It is also sometimes colloquially used to doubt statistics used to prove an opponent’s point.’ Although often attributed to Mark Twain, it appears to be from Benjamin Disraeli. Who knew?
7 – Google does track impressions down to the listing level so it is conceivable that the data is available and could be integrated into this study. No effort appears to have been made to do so.
8 – While the lead researcher, Danny Yuxing Huang, is a 5th-year Computer Science Ph.D. student at the University of California, San Diego, four of the other authors work for Google. At least one of the Googlers (Abishek Kumar, Team Lead of the Local Guides) is involved in the local space.
9 – I have often wondered why lawyers are treated so differently than plumbers and locksmiths. Both are violating the guidelines in much the same way and yet plumbers and locksmiths are targeted for removal while lawyers, for the most part, are given a pass and are rarely removed for the very same offense.
Spam In, Garbage Out - Why Google's Recent Paper on Map Spam is Flawed by Mike Blumenthal