{"id":20394,"date":"2017-04-09T08:12:20","date_gmt":"2017-04-09T12:12:20","guid":{"rendered":"http:\/\/blumenthals.com\/blog\/?p=20394"},"modified":"2017-04-13T09:17:07","modified_gmt":"2017-04-13T13:17:07","slug":"spam-in-garbage-out-why-googles-recent-paper-on-map-spam-is-flawed","status":"publish","type":"post","link":"https:\/\/blumenthals.com\/blog\/2017\/04\/09\/spam-in-garbage-out-why-googles-recent-paper-on-map-spam-is-flawed\/","title":{"rendered":"Spam In, Garbage Out &#8211; Why Google&#8217;s Recent Paper on Map Spam is Flawed"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-thumbnail wp-image-20444\" src=\"http:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/map-spam1-243x200.jpg\" alt=\"\" width=\"243\" height=\"200\" srcset=\"https:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/map-spam1-243x200.jpg 243w, https:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/map-spam1-520x428.jpg 520w, https:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/map-spam1-768x632.jpg 768w, https:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/map-spam1.jpg 816w\" sizes=\"auto, (max-width: 243px) 100vw, 243px\" \/>The <a href=\"https:\/\/research.googleblog.com\/2017\/04\/keeping-fake-listings-off-google-maps.html\">claim<\/a> made by Google<span style=\"font-size: 12px;\">\u00a0<\/span>that<em> &#8220;&#8230;fewer than 0.5% of local searches lead to fake listings&#8221;<\/em> in Google search\u00a0is NOT a conclusion that can be drawn from Google&#8217;s recently published\u00a0<a href=\"https:\/\/static.googleusercontent.com\/media\/research.google.com\/en\/\/pubs\/archive\/45976.pdf\">paper<\/a>. This number understates the number of local searches that lead to\u00a0the\u00a0visibility of fake listings due to its assumptions and flawed methodology. 
And it may do so by a large margin.<\/p>\n<p>The paper, <a href=\"https:\/\/static.googleusercontent.com\/media\/research.google.com\/en\/\/pubs\/archive\/45976.pdf\">Pinning Down Abuse on Google Maps<\/a>, while providing interesting insights into Map spam from 2014 and 2015, is fundamentally flawed in its approach to the question of fake listings in Google Local and in no way warrants the optimistic conclusion that Google <a href=\"https:\/\/research.googleblog.com\/2017\/04\/keeping-fake-listings-off-google-maps.html\">noted<sup>1<\/sup>\u00a0<\/a> on their blog.<\/p>\n<p><strong>Definitions<\/strong><\/p>\n<p>First you need to understand what Google defines as a fake listing for the purposes of this study. It only includes listings that were in gross violation of the guidelines <strong>and<\/strong> caught by their algo or human curation and suspended. This excludes any listing that manipulated its name or any listing that had manipulated its reviews<sup>2<\/sup>. And more importantly, it excludes fake listings that their algo didn&#8217;t catch.<\/p>\n<p>Then you also need to understand what Google means\u00a0when they say that &#8220;0.5% of local searches lead to fake listings&#8221;. They are not saying, as I <a href=\"http:\/\/blumenthals.com\/blog\/2017\/04\/06\/google-announces-study-looking-abuses-in-google-maps\/\">erroneously thought<\/a>, that only 0.5% of the listings are fake. They are saying that the listings that were fake &amp; suspended were up for an average of X days and only seen Y times during that time. I.e., compared to total searches, fake listings constituted 0.5% of the user&#8217;s impressions in search and Maps.
But as I will detail, even the visibility assumption is\u00a0flawed.<\/p>\n<p><strong>The Good, The Bad &amp; The (very) Ugly of the Google Research &amp; Blog:<\/strong><!--more--><\/p>\n<p><strong>The Good<\/strong><\/p>\n<p>I think that it serves the industry and the public<sup>3<\/sup> to know more about Map spam and to understand which sectors are likely to experience it. I laud the transparency, as limited as it has been.<\/p>\n<p>We can also presume, given the noted drop in fake listing creation, that Google has closed many of the loopholes that were leading to spam at such large scale.<\/p>\n<p>The paper also makes the point that it isn&#8217;t just the existence of fake listings that matters. What matters as much or perhaps more is whether they are seen by searchers. And many fake listings that we as professionals might uncover might never be seen at all because they are buried too deeply in the ranking hierarchy.<\/p>\n<p>That is a critical point in analyzing Map spam that is all too often forgotten.<\/p>\n<p><strong>The Bad<\/strong><\/p>\n<figure style=\"width: 324px\" class=\"wp-caption alignleft\"><a href=\"http:\/\/blumenthals.com\/blog\/2009\/02\/18\/google-maps-proves-more-locksmiths-in-nyc-than-cabs\/\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"http:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2009\/02\/locksmith.jpg\" width=\"324\" height=\"258\" \/><\/a><figcaption class=\"wp-caption-text\">Screen shot of Locksmith spam in early 2009<\/figcaption><\/figure>\n<p>Factual errors? One hopes that a paper of this caliber would hew to academic standards and document facts and do a thorough literature review.<\/p>\n<p>This paper did neither. The literature review was very thin and statements to the effect that Map Spam is new and that early manifestations <a href=\"https:\/\/www.theguardian.com\/technology\/2015\/may\/12\/google-shuts-off-map-maker-urinating-robot-\">occurred in 2015\u00a0<\/a>are untrue on their face<sup>4<\/sup>.
We saw high volume &#8220;on-premise&#8221;\u00a0<a href=\"http:\/\/blumenthals.com\/blog\/2008\/09\/18\/google-maps-widespread-hijacking-of-business-listings-confirmed\/\">affiliate Maps spam<\/a> in the florist industry in September of 2008 and massive &#8220;on-call&#8221; <a href=\"http:\/\/blumenthals.com\/blog\/2009\/02\/18\/google-maps-proves-more-locksmiths-in-nyc-than-cabs\/\">locksmith spam<\/a> soon after.<\/p>\n<blockquote><p>In this paper, we investigate a new form of blackhat search engine optimization that targets local listing services like Google Maps. &#8230;\u00a0Early forms of attacks included defacement, such as graffiti posted to Google Maps in Pakistan.<sup>5<\/sup><\/p><\/blockquote>\n<p>Another problem is the big data, global approach and tone that this paper takes. Local listing spam is a hyper-local phenomenon that impacts high density markets and high value verticals. Lumping views of those listings in with views of rural churches and government offices makes no sense. This sort of &#8220;average&#8221; conclusion creates an impression that may be statistically true (although it&#8217;s not in this case) but that is also very misleading<sup>6<\/sup>.<\/p>\n<p>The paper noted this issue but didn&#8217;t deal with it in a systemic way in the publication:<\/p>\n<blockquote><p>Even so, such impressions can vary across geographic locations. In particular, users in West Harrison, NY were the most affected\u2014where 83.3% of the search results for locksmiths were abusive. 
In contrast, 15.6% of search results for locksmiths in New York City were abusive.<\/p><\/blockquote>\n<figure id=\"attachment_20412\" aria-describedby=\"caption-attachment-20412\" style=\"width: 387px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20412\" src=\"http:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/Screen-Shot-2017-04-08-at-12.47.21-PM.png\" alt=\"\" width=\"387\" height=\"265\" srcset=\"https:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/Screen-Shot-2017-04-08-at-12.47.21-PM.png 483w, https:\/\/blumenthals.com\/blog\/wp-content\/uploads\/2017\/04\/Screen-Shot-2017-04-08-at-12.47.21-PM-292x200.png 292w\" sizes=\"auto, (max-width: 387px) 100vw, 387px\" \/><figcaption id=\"caption-attachment-20412\" class=\"wp-caption-text\">Search on &#8220;<a href=\"https:\/\/www.google.com\/search?client=safari&amp;rls=en&amp;q=best+motorcycle+lawyer+los+angeles&amp;ie=UTF-8&amp;oe=UTF-8\">best motorcycle lawyer Los Angeles<\/a>&#8221; conducted 4\/8\/2017 that has been in the index for quite some time. Missed by the algo and not suspended examples like this abound.<\/figcaption><\/figure>\n<p>A final issue, and maybe this can&#8217;t be easily overcome<sup>7<\/sup>, is the assumption that is\u00a0made in calculating\u00a0how often fake listings are seen by searchers.<\/p>\n<blockquote><p>Assuming a uniform query rate, this average approximates the likelihood a user would encounter an abusive listing if Google Maps selected listings uniformly at random rather than based on search quality. 
Effectively, this metric discounts the (in)effectiveness of any particular listing\u2019s SEO<\/p><\/blockquote>\n<div class=\"mceTemp\"><\/div>\n<p>Translated, that says that for the purposes of the analysis, one listing will be seen as often as the next and that there is no difference due to SEO.<\/p>\n<p>Given Google&#8217;s heavy weighting of a business name, any decent fake listing isn&#8217;t just going to use a fake address. It will obviously use a name that ranks well on a high-value, high-frequency keyword search. And as we all know, with that keyword-laden name, it will\u00a0be much more\u00a0visible than the average listing.<\/p>\n<p><strong>The Ugly: Problems with methodology<\/strong><\/p>\n<p>Small factual errors,\u00a0disappointing definitions and doubtful assumptions are troublesome but do not, in and of themselves, call into question the conclusion. Flawed study design does.<\/p>\n<p>One might forgive some of these sins if the overarching methodology were sound. It isn&#8217;t.<\/p>\n<p>Google provided the researchers<sup>8<\/sup> with listings that were suspended during the research period. The study ONLY included fake listings that had been removed from the index. It makes NO effort to estimate the fake listings that remained in the index undetected during the study period.<\/p>\n<blockquote><p>Limitations to our approach: Our study is biased towards abuse\u00a0caught by the suspension algorithms employed by Google Maps.<\/p>\n<p>The main limitation with this approach is that we cannot estimate\u00a0the number of false negatives, i.e., abusive listings overlooked by\u00a0Google Maps.<\/p><\/blockquote>\n<p>Thus Google&#8217;s blog post conclusion that <em>fewer than 0.5% of local searches lead to fake listings<\/em> is like a\u00a0policeman saying \u201cI spent an hour at 5 am looking for speeders. I saw ten but charged five. I saw 100 cars go by, therefore 5% of all cars speed.
But I was talking to my wife for some of that time so I may have missed a few.\u201d<\/p>\n<p>A simple thought\u00a0experiment, based on recent findings, can show the rough impact of this study design decision. In\u00a0November 2016, Google <a href=\"http:\/\/searchengineland.com\/googles-advanced-verification-test-san-diego-just-dropped-89-listings-3-pack-263222\">implemented<\/a> advanced verification for plumbers in San Diego. 89% of all listings were dropped from the visible index and prevented from showing. One assumes that most of them were fake by the standards of the guidelines.<\/p>\n<p>According to this paper, Google tightened up verification in July 2015. And yet the even tighter advanced verification <a href=\"http:\/\/searchengineland.com\/googles-advanced-verification-test-san-diego-just-dropped-89-listings-3-pack-263222\">uncovered<\/a> 900% more fake listings than the algo suspended. In other private research we saw similar results in the locksmith arena. It bears noting that\u00a0this was in just one midsize market that is not at the epicenter of Map spam. These were listings that, during the period of the study, were trusted in Google Maps and were frequently visible in the search results but remained uncounted.<\/p>\n<p>Clearly their methodology grossly undercounts the number of fake listings and how often they are seen by searchers.<\/p>\n<p>Beyond plumbers etc., we know that lawyers broadly abuse virtual offices as well. Because they are\u00a0not targeted by the algo, these are completely missing from the analysis<sup>9<\/sup>. We also know that UPS and drop boxes continue to be abused aggressively and many of the listings using that method\u00a0were obviously missed. These numbers of unsuspended fake listings that are in the live index add up.<\/p>\n<p>Using only those listings that were suspended by the algo as the basis for this analysis invalidates the conclusion.
The best that could be said from this data is that &#8220;fake listings that we allowed into the index and were subsequently suspended were seen\u00a0in local search results\u00a00.5% \u00a0of the time&#8221;.<\/p>\n<p><strong>Conclusion<\/strong>:<\/p>\n<p>This study offers some interesting data. But the conclusions put forth are not warranted by the research.\u00a0The visibility\u00a0number offered by the paper is a lowest possible estimate of the visibility of fake listings, not anything more.<\/p>\n<p>Who knows what the real number of fake listings in the Maps index is or how often they might be seen?<\/p>\n<p>This research does a good job showing us what segments have been targeted by Google, \u00a0which markets they were in and some of the techniques they used.\u00a0It does little to shed\u00a0any light on\u00a0the\u00a0questions of fake listing visibility to the average searcher. Or more importantly to the average searcher in markets where fake listings are prevalent.<\/p>\n<p>We can perhaps give the researchers some slack in this. They did make some attempt to position their summary\u00a0within the obvious limitations of their data. Although not persuasively in my opinion.<\/p>\n<p>On\u00a0the other hand, for the Google research blog to proclaim the results with NO qualification is at best cynical and at worst deceptive.<\/p>\n<p><sup>1 &#8211; Google Research Blog:\u00a0<a href=\"https:\/\/research.googleblog.com\/2017\/04\/keeping-fake-listings-off-google-maps.html\">\u00a0Keeping fake listings off Google Maps\u00a0<\/a>hrrmph! Should have read: Listings we kept off of Google Maps. <\/sup><\/p>\n<p><sup>2\u00a0&#8211; Which is more fake and does more consumer harm? A locksmith with a fake address or a chiropractor with 100 fake reviews? 
It very well could be the latter.<\/sup><\/p>\n<p><sup>3\u00a0&#8211; The list of the top ten is filled with the usual suspects:\u00a0Locksmiths 25.7%; Plumbers, electricians 14.6%; Restaurants, pizza delivery 7.3%; Motels, hotels, bed-and-breakfast 5.4%; Clothing stores, beauty salons 3.8%; Lawyers, consultants, accountants; Limousine, taxi, travel agents 1.9%; Car repair, towing, dealers 1.7%; Photographers, graphic designers 1.5%; Movers, packers, shippers 1.5%. But\u00a0bail bonds, Internet service providers, real estate agencies, and dating agencies didn&#8217;t fail to get mentioned. I am sure that &#8220;internet service providers&#8221; likely equals SEOs.\u00a0<\/sup><\/p>\n<p><sup>4 &#8211; The starting dates for Map spam weren&#8217;t the only instances of factual errors. It was noted that Google Maps provides 4000 categories from which to choose. The number is actually closer to <del>2500 <\/del>\u00a03500. Regardless, factual\u00a0errors, even small ones, reduce the credibility of the paper. And given that four of the paper&#8217;s authors work at Google, it is hard to understand how this assertion was left to stand.\u00a0<\/sup><\/p>\n<p><sup>5\u00a0&#8211; Massive spam of both types noted by Google was prevalent AT LEAST six years prior to the study. A simple Google search on the phrase\u00a0<a href=\"https:\/\/www.google.com\/search?client=safari&amp;rls=en&amp;q=locksmith+spam&amp;ie=UTF-8&amp;oe=UTF-8\">locksmith spam<\/a>\u00a0quickly shows a result from 2011. Maybe seeing Map spam was\u00a0new for them but that doesn&#8217;t excuse the factual errors.
\u00a0<\/sup><\/p>\n<p><sup>6\u00a0&#8211; Brings to mind\u00a0the\u00a0phrase, as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Lies,_damned_lies,_and_statistics\">noted<\/a> in Wikipedia: &#8216;&#8221;<b>Lies, damned lies, and statistics<\/b>&#8220;: a phrase describing the persuasive power of numbers, particularly the use of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Statistics\" title=\"Statistics\">statistics<\/a> to bolster weak <a href=\"https:\/\/en.wikipedia.org\/wiki\/Argument\" title=\"Argument\">arguments<\/a>. It is also sometimes colloquially used to doubt statistics used to prove an opponent&#8217;s point.&#8217; \u00a0Although\u00a0often attributed to Mark Twain, it appears to be from Benjamin Disraeli. Who knew?<\/sup><\/p>\n<p><sup>7 &#8211; Google does track impressions down to the listing level so it is conceivable that the data is available and could be integrated into this study. No effort appears to have been made\u00a0to do so.\u00a0<\/sup><\/p>\n<p><sup>8 &#8211; While the lead researcher, <a href=\"https:\/\/www.linkedin.com\/in\/dannyyhuang\/\">Danny Yuxing Huang<\/a>, is a 5th-year Computer Science Ph.D. at University of California, San Diego, four of the other authors work for Google. At least one of the Googlers (Abishek Kumar, Team Lead of the Local Guides) is involved in the local space.<\/sup><\/p>\n<p><sup>9 &#8211; I have often wondered why lawyers are treated so differently than plumbers and locksmiths. Both are violating the guidelines in much the same way and yet plumbers and locksmiths are targeted for removal while lawyers, for the most part, are given a pass and are rarely removed for the very same offense.\u00a0<\/sup><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The claim made by Google\u00a0that &#8220;&#8230;fewer than 0.5% of local searches lead to fake listings&#8221; in Google search\u00a0is NOT a conclusion that can be drawn from Google&#8217;s recently published\u00a0paper.
This number understates the number of local searches that lead to\u00a0the\u00a0visibility of fake listings due to its assumptions and flawed methodology. And it may do so &#8230;<\/p>\n","protected":false},"author":262,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24],"tags":[],"class_list":["post-20394","post","type-post","status-publish","format-standard","hentry","category-google-plus"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/posts\/20394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/users\/262"}],"replies":[{"embeddable":true,"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/comments?post=20394"}],"version-history":[{"count":42,"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/posts\/20394\/revisions"}],"predecessor-version":[{"id":20470,"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/posts\/20394\/revisions\/20470"}],"wp:attachment":[{"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/media?parent=20394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/categories?post=20394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blumenthals.com\/blog\/wp-json\/wp\/v2\/tags?post=20394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}