
F.A.Q. Spam in Google Analytics

There is no doubt that spam is a big issue for many Google Analytics users.

To help you understand what you should worry about and what you shouldn't, here is a compilation of the most common questions and concerns I receive in emails and comments from users with this issue.

Is Google doing something to handle this threat?

This is one of the first questions my readers ask me: "Is the GA team doing something to stop it?"

The truth is that they are constantly fighting the spam. Believe me, it would be a lot worse if they weren't doing something to control it.

Is it enough? It is hard to tell. The problem is that every time Google implements a measure, the spammers find other ways to get through.

Remember that Google Analytics is a free service (unless you are one of the few using the premium service) and it is used by millions of sites. This has two consequences: (1) it is extremely attractive for spammers, and (2) any change in the core of the service has a great impact on all users. So they have to do it right and not break other functionality.

How to efficiently deal with the Spam in Google Analytics?

The main problem with the spam is that there are many articles offering all types of solutions, and people get confused. Some of these posts offer partial solutions or may even make the issue worse.

To avoid this, I built a detailed guide with the most efficient solutions against the spam in Google Analytics. These solutions have been proven to work for over 2 years.

Here are some examples of sites that used these solutions:

You can go to the Ultimate Guide for Getting Rid of the Spam in Analytics or, if you prefer, I can take care of it for you.

What is Referrer Spam?

Referrer spam, or referral spam, is fake traffic sent to Google Analytics in order to attract people to the spammers' sites. Originally the spammers were sending fake data only as referrals, but now you can find it everywhere in your reports: as keywords (organic), pages, events, or even as a language.

Google Analytics Spam dimensions
Spam in different dimensions

What are the most common types of spam?

The most active types of spam are:

  • Crawler Spam: this was the first type of spam. It uses bots to leave fake referrals.
  • Ghost Spam: this is the most common and aggressive type. You may find it almost anywhere in your reports, and it only affects Google Analytics; it never passes through your site.
  • Bot Direct Traffic: this is not technically spam but spiders (aka bots) compiling information. Some are good and some less so.

Each of them has different characteristics and, therefore, a different way of dealing with it.

What is Crawler Referrer Spam?

Crawler spam is a spider programmed to navigate through sites and leave fake referrals in Analytics and in the logs of the site. These URLs attract users to the spammer's site when they search for information about the referral.

Examples: All semalt.com variations, uptime-alpha.net

  • These crawlers usually ignore rules such as robots.txt.
  • Crawler spam is far less frequent than ghost spam since it requires more resources from the spammer.
  • Although they can be blocked using server solutions (.htaccess, web.config, WordPress plugins), it is recommended to use filters in GA, since the number of hits is very low.

Note: A common mistake when filtering this type of spam is using the Referral field in the filter. Instead, you should use Campaign Source; otherwise, the filter won't work.
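
To make the note above concrete, here is a minimal sketch of how the pattern for a Campaign Source exclude filter could be assembled and checked locally before saving it in GA. The helper names and the sample sources are illustrative assumptions, and the check assumes the filter matches the expression anywhere in the field.

```python
import re

# Crawler spam named in this article; extend the list with your own findings.
CRAWLER_DOMAINS = ["semalt.com", "uptime-alpha.net"]

def build_source_pattern(domains):
    """Join domains into one expression, escaping dots so they match literally."""
    return "|".join(domain.replace(".", r"\.") for domain in domains)

def is_excluded(source, pattern):
    """Return True if an exclude filter with this pattern would drop the source."""
    return re.search(pattern, source) is not None

pattern = build_source_pattern(CRAWLER_DOMAINS)
print(pattern)  # the expression to use as the filter pattern
for source in ["semalt.semalt.com", "google", "facebook.com"]:
    print(source, "->", "excluded" if is_excluded(source, pattern) else "kept")
```

Because the expression is not anchored, one entry such as semalt.com also covers its subdomain variations.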

What is Ghost Spam?

Unlike crawler spam, ghost spam NEVER accesses your site. Ghost spam is fake data sent directly to the Google Analytics servers. This type of spam only needs an existing tracking ID to hit; it doesn't matter whether or not that ID is installed on an active site. Some of the key characteristics:

  • No matter if you use WordPress, Joomla, Shopify, or any other Content Management System (CMS), the only way to stop ghost spam in Google Analytics is with filters.
  • Server-side solutions like WordPress plugins, the htaccess file or the web.config are useless.
  • You can find ghost spam in almost any report: Referral, Organic, Direct, Language, and Events.

This is the type of spam preferred by spammers and currently the most used.

If it never visits my site, how does Ghost Spam hit my Analytics?

They use the Analytics Measurement Protocol to reach your Analytics directly without passing through your Site. This Protocol is intended to allow developers to send data directly to Google Analytics Servers to measure how users interact with their business from almost any environment.

Contrary to what some people think, the spammer doesn't get your tracking ID from your page. That would require crawling your site, which means more effort.


What they most probably do is generate random codes with the GA pattern (UA-XXXXXX-Y), in combination with an automated script that sends the fake data to thousands of properties.
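
As a rough illustration of why no visit to your site is needed, a Measurement Protocol hit is just a set of key-value pairs sent to Google's collection servers. The sketch below only builds such a payload for a pageview and sends nothing. The parameter names (v, tid, cid, t, dh, dp) come from the Universal Analytics Measurement Protocol, while all the values are placeholders.

```python
from urllib.parse import urlencode, parse_qs

# Every field, including the hostname (dh), is supplied by whoever builds the hit.
hit = {
    "v": "1",              # protocol version
    "tid": "UA-XXXXXX-Y",  # tracking ID of the property (placeholder)
    "cid": "555",          # anonymous client ID (placeholder)
    "t": "pageview",       # hit type
    "dh": "example.com",   # document hostname (placeholder)
    "dp": "/home",         # document path (placeholder)
}

payload = urlencode(hit)
print(payload)

# Decode the payload again: the hostname is simply whatever the sender typed in.
fields = {key: values[0] for key, values in parse_qs(payload).items()}
print(fields["dh"])
```

Since the hostname is self-declared, a spammer who doesn't know your site has no way to fill it in correctly.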

Are you sure Ghost Spam never accesses my site? (demonstration)

I sometimes get emails asking how it is possible for this traffic to show up in GA if it never passes through the site.

So I decided to make a small demonstration that shows that it hits your Reports directly and why server solutions won't work. I took a segment of a Google Analytics Report with all Referrer Spam (crawler and ghost) that hit me in March 2015.

Referrer Spam List

I used AWStats to analyze the access log of my site (same month) and looked for the name of all the Spam on the previous list.

AwStats Ohow.co

As you can see, only semalt.com and buttons-for-website.com (marked in red), which are crawlers, are logged. The rest (marked in blue) are all ghost spam, and there is no trace whatsoever of them in the local access log.

What is the valid hostname filter and why is it so important?

The valid hostname filter is the best solution against spam in Google Analytics. There are four huge advantages to using this filter:

  • It's preventive, unlike the campaign source filter.
  • Very little maintenance is required, since only one filter is needed to do all the work.
  • It will stop any form of ghost spam, whether it shows as a referral, organic, event, or direct visit.
  • It will help you keep away other irrelevant traffic, such as hostnames that you use for testing.

It works because the spammer doesn't know your site, so all the data is faked, including the hostname.
Exclude Filter vs Valid Hostname filter - Google Analytics Ghost Spam

How does the valid hostname filter work?

Since all ghost spam (the most common and obnoxious type of spam) leaves a fake hostname in your reports, creating a filter that includes only valid hostnames will automatically leave it out.
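
A minimal sketch of the idea, assuming an include filter on the Hostname field whose expression is matched anywhere in the value. The list of valid hostnames and the sample rows are hypothetical; replace them with the hostnames found in your own reports.

```python
import re

# Hypothetical list: your domain plus any service where you added your tracking code.
VALID_HOSTNAMES = ["ohow.co", "translate.googleusercontent.com"]

def build_hostname_pattern(hostnames):
    """Build one include expression; dots are escaped to match literally."""
    return "|".join(name.replace(".", r"\.") for name in hostnames)

def passes_include_filter(hostname, pattern):
    """An include filter keeps a hit only if its hostname matches the pattern."""
    return re.search(pattern, hostname) is not None

pattern = build_hostname_pattern(VALID_HOSTNAMES)
for hostname in ["www.ohow.co", "(not set)", "free-share-buttons.xyz"]:
    status = "kept" if passes_include_filter(hostname, pattern) else "filtered out"
    print(f"{hostname}: {status}")
```

One expression covers the main domain and its subdomains, which is why a single filter is enough.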

What is a hostname in Google Analytics?

A hostname is every place where one of your visits arrives. It will mainly be your domain but it could also be services where you added your GA tracking code. Every visit in your Google Analytics will have a source and a hostname.

  • Source: the place where the visit originates (e.g. referral, organic, direct, social).
  • Hostname: the place where the visit arrives. In most cases, this is your domain.

To make it clearer, I will give you an example:

If we consider a visit that comes from Facebook to this article:

Facebook >> www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/

The visit will be recorded in Google Analytics like this: 

  • Hostname: www.ohow.co
  • Source (referral): facebook.com
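
The same split can be sketched in a few lines of Python. This is only an illustration of the two concepts, not how Google Analytics computes them internally; the function name is made up, the https:// scheme is assumed, and stripping a leading "www." from the referrer is a simplification to mirror the example above.

```python
from urllib.parse import urlsplit

def hostname_and_source(landing_url, referrer_url):
    """Return (hostname, source) for a visit that arrived through a link."""
    hostname = urlsplit(landing_url).hostname
    referrer_host = urlsplit(referrer_url).hostname or ""
    # Simplification: drop a leading "www." so the source reads like the example.
    source = referrer_host[4:] if referrer_host.startswith("www.") else referrer_host
    return hostname, source

landing = "https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/"
referrer = "https://www.facebook.com/"
print(hostname_and_source(landing, referrer))
```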

Why Should You Use Campaign Source Instead of Referral?

It may seem reasonable to use the Referral field in the filters to try to exclude referrals. However, that is not how you should do it.

Instead, you should use Campaign Source. But why?

First of all, this is what the Analytics documentation indicates for filtering referrals, whether they are spam or not.

To expand on why: a valid visit will usually come with a valid value for the HTTP Referer header. However, that's not the case for most of the spam, or even for some real sources.

When spammers send spam, they add the source and medium by using UTM parameters, so the hits appear as referrals even though they carry no information for the HTTP Referer header. That is why a filter with the Referral field won't work.
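
A small sketch of that situation, with a made-up URL: the campaign source and medium are declared through utm_ parameters, while the Referer header is simply absent. The function name and the sample values are assumptions for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

def campaign_fields(url):
    """Extract the utm_* parameters that set the campaign source and medium."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}

# Hypothetical spam hit: the source is declared in the URL and no Referer header exists.
page_url = "https://www.example.com/?utm_source=spammy-site.xyz&utm_medium=referral"
referer_header = None

fields = campaign_fields(page_url)
print(fields["utm_source"], fields["utm_medium"], referer_header)
```

A filter on Campaign Source sees spammy-site.xyz, whereas a filter on the Referral field has nothing to match against.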

On many occasions, a filter using the Referral field may seem to be working. This is because some of the spam has a short lifetime of weeks or even days, so the spam just stops coming, but it was never actually blocked.

Does the Spam harm my SEO/SERP?

The short answer is NO, at least not directly. If we consider that the spam corrupts your data and may cloud your decisions, then it can affect your SEO to some extent.

However, if we talk about the data left by the spammer, like the bounce rate or the Avg. Session time, then you shouldn't worry about it. Google has officially stated that they don't use Google Analytics data as a ranking factor in any way, and John Mueller, Webmaster Trends Analyst at Google, recently confirmed it.

If you think about it, using Google Analytics data for rankings wouldn't make sense for two main reasons.

  • First, although GA is widely used, not every website uses it, so it wouldn't be a fair benchmark.
  • Second, the data in GA can easily be manipulated in many ways, and people could fake the bounce rate. For example, if you insert the tracking code multiple times on your site, the bounce rate will be close to 0.

That said, we have to look at the other aspects. SEO is not only about your rankings; it is also about analyzing your data to make better decisions and improve your position in the search results (SERP).

In that case, the answer is Yes: the spam, and any other irrelevant traffic that lowers the accuracy of your reports, affects your SEO because the data you are analyzing is polluted and may mislead you in those decisions.

Does the Spam Represent a Security Issue?

No, as long as you don't insert any script from the spammer website.

Sometimes ghost spam leaves weird pages in Google Analytics, and people think that the website was hacked in some way. But as you already know, it is all fake, injected by the spammer directly into your GA reports.

Just make sure it is spam and not real pages injected on your website somehow. If you can open the page on your site then you might have been hacked.

How did the spammer target my analytics?

The truth is that they don't pick your analytics in particular. They just target random tracking codes in the form of UA-000000-1, and yours just happened to be on the list.

What is the purpose of the Spam?

You may wonder how they benefit from this. People are curious by nature, and they want to know what is going on with their websites, so they visit the URL of the referral without knowing it is fake. The surprise comes when they find that there is no mention of their site at all.

Google Analytics Spam Purpose

The spammers hit thousands of Google Analytics properties so you can imagine the amount of traffic they are getting with this blackhat technique.

While the common purpose is to lure people into visiting the fake referral, the final objective varies:

  • Promote a page
  • Get your email
  • Sell you a service
  • Try to make you insert a script on your site (case of free-share-button)
  • Redirect you to an online store where they get a commission through an affiliate program. A common store used is aliexpress.com

How to detect Spammy traffic?

Not all unusual traffic is spam, so before filtering it, you should do a little research.

First, you can check if the odd visit is on this list that is constantly updated.

If you can't find it there, then try searching for it. Don't type the URL directly in the browser, or you will be taken to the spammer's site; instead, search for it like this: suspicioussite.com / referral

If you still can't find information about it, you can analyze the data left by the spam. Ghost spam is easier to spot since all its data is fake. Just check the hostname of the referral: if it is (not set) or some weird name that doesn't belong to you, then it's spam. Some examples are shown below in red.

detect Referrer Spam by hostname

Crawlers (orange), on the other hand, are harder to detect because they do leave real data. You can try using a combination of the following characteristics to find if it is spam:

  • Landing Page and Page Title: Homepage /
  • Bounce rate: either close to 0% or close to 100%
  • Avg. Session Time: close to 0 seconds.
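
The three characteristics above can be combined into a simple score. This is only a rough sketch: the row structure, the function name, and the thresholds (within 1% of either extreme, at most 1 second) are assumptions, and a high score is a reason to investigate a source, not proof that it is spam.

```python
from dataclasses import dataclass

@dataclass
class ReferralRow:
    source: str
    landing_page: str
    bounce_rate: float          # percent, 0 to 100
    avg_session_seconds: float

def crawler_signals(row, rate_margin=1.0, max_seconds=1.0):
    """Count how many of the three crawler characteristics a row shows."""
    signals = 0
    if row.landing_page == "/":
        signals += 1
    if row.bounce_rate <= rate_margin or row.bounce_rate >= 100 - rate_margin:
        signals += 1
    if row.avg_session_seconds <= max_seconds:
        signals += 1
    return signals

# Made-up rows for illustration.
rows = [
    ReferralRow("semalt.semalt.com", "/", 100.0, 0.0),
    ReferralRow("facebook.com", "/blog/some-post/", 45.0, 120.0),
]
for row in rows:
    print(row.source, crawler_signals(row))
```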

Can I use the Referral Exclusion List for Spam?

No! This is one of the most common mistakes. The purpose of this feature is to exclude real referrals so they don't trigger a new session, for example, to keep payment gateways from being counted as referrals.

Adding spam to this list will only strip the referral part and leave it as a direct visit instead, which is even worse since it is harder to detect and filter later.

When is it OK to use the referral exclusion list?

Third-party payment processors

If you use third-party payment processors like PayPal, Shopify, etc., consider adding them to the Exclusion List.

Cross-subdomain tracking

When you use the same tracking code across your subdomains, you should add your domain to the Exclusion List.

referral exclusion list example

Why Can't You Use Server Solutions (WordPress plugins, .htaccess) for Ghost Spam?

If you read how ghost spam hits your analytics, then you know that it never passes through your site. So trying to block it with server-side solutions like plugins, the .htaccess file, or the web.config file won't do any good.

In the worst-case scenario, and I've seen this a lot, it will shut down your site completely, because these files are very sensitive and a single misplaced character can cause a lot of trouble.

You can block crawler spam with these methods; however, the amount of traffic it generates is too low to justify the risk. My recommendation is to use filters for both types of spam.

Why am I getting "This filter would not have changed your data..."?

There are two common reasons. The first is that your filter is not correctly configured. The second is related to the data used by the filter verification feature: this tool uses only a sample of your data, so if it doesn't find a match in the sample, you will get this message.

If you are sure you configured the filter correctly, just ignore the message. If you still want to make sure, you can use an advanced segment to test your filter.

Are there other ways of preventing the Spam?

I highly recommend filtering the spam from within Google Analytics, even the crawlers, but if you prefer other methods, you could use these:

Blocking the spam from your server (ONLY Crawlers)

You can use configuration files and rules to block the spam from your server. Just be aware that this will only work with a small portion of the spam, the crawlers. Ghost spam never visits your site, so it's not possible to block it from there.
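
The usual tools for this are .htaccess or web.config rules. Purely as an illustration of the same logic, and not as a replacement for those files, here is a minimal Python sketch written as WSGI middleware. It assumes a Python-served site; the function names, the demo app, and the sample URLs are hypothetical.

```python
# Crawler referrers named in this article; ghost spam never reaches this code.
BLOCKED_REFERRERS = ("semalt.com", "buttons-for-website.com")

def block_crawler_spam(app):
    """Wrap a WSGI app and answer 403 when the Referer header names a blocked domain."""
    def middleware(environ, start_response):
        referer = environ.get("HTTP_REFERER", "").lower()
        if any(domain in referer for domain in BLOCKED_REFERRERS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

def demo_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

def status_for(referer):
    """Call the wrapped app with a given Referer and return the response status."""
    seen = []
    app = block_crawler_spam(demo_app)
    app({"HTTP_REFERER": referer}, lambda status, headers: seen.append(status))
    return seen[0]

print(status_for("http://semalt.semalt.com/crawler.php"))
print(status_for("https://www.facebook.com/"))
```

The check only ever sees requests that actually reach the server, which is exactly why it can do nothing about ghost spam.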

Changing your tracking ID number (Only new accounts)

This method doesn't exactly block the spam, but it makes your Google Analytics property less likely to be targeted. It is a good option for new websites. Since the spam usually targets codes ending in -1, if you change your Google Analytics tracking ID to one with a higher number, such as UA-XXXXXXX-3, some of it won't reach you.

To do this just create a new property under the Admin section of your Analytics.
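
To check which property number a tracking ID carries, a short sketch like the following can help. It assumes the UA-account-property format described in this article; the function name and the sample IDs are made up.

```python
import re

# UA-<account number>-<property number>
TRACKING_ID = re.compile(r"^UA-(\d+)-(\d+)$")

def property_index(tracking_id):
    """Return the property number at the end of a UA tracking ID, or None if malformed."""
    match = TRACKING_ID.match(tracking_id)
    return int(match.group(2)) if match else None

for tracking_id in ["UA-1234567-1", "UA-1234567-3"]:
    index = property_index(tracking_id)
    note = "common spam target" if index == 1 else "less commonly targeted"
    print(tracking_id, index, note)
```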

 

GA tracking id 3

Is there a Spam List?

You can find here a comprehensive list of the Spam that hit Google Analytics over the last couple of years. Here is an example.

Historical Ghost Spam List

##-1.website-speed-check.site ##-1.website-speed-checker.site ##-1.website-speed-up.top
##-1.website-speed-up.site ##-1.site-speed-check.site  ##-site-speed-up.site 
##-1.site-speed-checker.site  ##-1.site-speed-up.top  24x7-server-support.site
golden-catalog.pro / referral californianews.cf  www1.free-share-buttons.top 
cdn front.to pinkduck.ga fashionindeed.ml 
scanner-[name].top  homemade.gq eyes-on-you.ga
compliance-checker.info cookielawblog.wordpress.com  gq-catalog.gq 
familyholiday.ml / referral compliance-***.top  executehosting.com 
bugof.gq www1.cookie-law.xyz fashionindeed.ml 
wowas31.ucoz.ru  expdom.com nyfinance.ml
globalscam.ga  spin2016.cf popup-jdh.xyz
turkeyreport.tk  alert-jdh.xyz  free-share-buttons.blogspot.com / keyword
biketank.ga ogodnyyeavarii.gq law-***.xyz
[NUMBER].social-s-***.xyz asacopaco.tk forum.topic###.ilovevitaly.xyz
kiwi237au.tk doyouknowtheword-flummox.ml law-enforcement-check-***.xyz
luxmagazine.cf bestofferswalkmydogouteveryday.gq free-social-buttons-***.xyz
exchangeit.gq ranking2017.ga slow-website.xyz
pokemongooo.ml  bestchoice.cf  site-auditor.online
eu cookie law eu-cookie-law.info / keyword itrevolution.cf law-enforcement-bot-**.xyz
eu-cookie-law.info / keyword priceg.com fix-website-errors.com
share buttons sharebutton.org / keyword share buttons www.get-free-social-traffic.com / keyword social-buttons-**.xyz
www.get-free-social-traffic.com / keyword eu-cookie-law.blogspot.com / keyword theguardlan.com
снятьдомвсевастополе.рф law-enforcement-**.xyz torture.ml
free-share-buttons-***.xyz cookie-law-enforcement-**.xyz trafficmonetize.org
smailik.org ghostvisitor.com trafficmonetizer.org
sanjosestartups.com magicdiet.gq ranksonic.info
s.click.aliexpress.com/e/ay3rfmzfi burn-fat.ga resellerclub.com
see-your-website-here.com monetizationking.net resellerclub scam
sharebutton.net popads.net resellerclub
sharebutton.to eu-cookie-law-enforcement-#.xyz ranksonic.org
shopping.ilovevitaly.com ownshop.cf cenoval.ru
simple-share-buttons.com getlamborghini.ga makeprogress.ga
dominateforex.ml cenokos.ru m-google.xyz / keyword
easycommerce.cf dailyrank.net facebook-mobile.xyz
topquality.cf darodar.com fuck-paid-share-buttons.xyz / keyword
unpredictable.ga econom.co socialbuttons.xyz
increasewwwtraffic.info egovaleo.it socialbutton.xyz
social-traffic-#.xyz erot.co social-button.xyz
smartphonediscount.info descargar-musica-gratis.net social-buttons.xyz
free-social-buttons#.xyz domination.ml website-stealer-warning-alert.hdmoviecams.com
free-video-tool.com ranksonic.com getrichquick.ml
share-button.xyz adviceforum.info lombia.co
addons.mozilla.org (ilovevitaly.com) яндех-херня.рф lombia.com
0n-line.tv adtiger.tk lumb.co
100dollars-seo.com wordpresscore.com make-money-online.7makemoneyonline.com
12masterov.com http://feedback.sharemyfile.ru/ iskalko.ru
4webmasters.org rank-checker.online kabbalah-red-bracelets.com
forum.darodar.com domain-tracker.com o-o-6-o-o.com
getrichquickly.info dktr.ru o-o-6-o-o.ru
bestwebsitesawards.com http://go.ekatalog.xyz/ o-o-8-o-o.com
free-traffic.xyz ilikevitaly.com o-o-8-o-o.ru
lomb.co net-profits.xyz onlinetvseries.me
web-revenue.xyz meendo-free-traffic.ga traffic2cash.xyz
free-social-buttons.xyz event-tracking.com w3javascript.com
social-widget.xyz get-free-traffic-now.com hdmoviecamera.net
traffic-cash.xyz guardlink.org pops.foundation
share-buttons.xyz floating-share-buttons.com nufaq.com
с.новым.годом.рф forum.smailik.org googlemare.com
snip.tw forum20.smailik.org top1-seo-service.com
trafficgenius.xyz forum69.info boost-my-site.com
website-analyzer.info free-social-buttons.com fast-wordpress-start.com
topseoservices.co ilovevitaly.co www.event-tracking.com
ilovevitaly.com santasgift.ml site#.free-share-buttons.com
iloveitaly.ru quit-smoking.ga googlsucks.com
ilovevitaly.info rusexy.xyz www.Get-Free-Traffic-Now.com
iloveitaly.ro traffic2cash.org buy-cheap-online.info
ilovevitaly.org cyber-monday.ga free-share-buttons.com
ilovevitaly.ru lsex.xyz social-buttons.com
humanorightswatch.org black-friday.ga best-seo-offer.com
hulfingtonpost.com /?from=http://adf.ly/1SDmxr site##.social-buttons.com
howtostopreferralspam.eu hosting-tracker.com alfa9.com
iedit.ilovevitaly.com get-your-social-buttons.info rednise.com
snip.to build-a-better-business.2your.site  seo-platform.com
traffic2cash.net build-audience.for-your.website  qualitymarketzone.com
ranksonic.net cash4traffic.xyz hongfanji.com
work-from-home-earn-money-online.com copyrightclaims.org free-floating-buttons.com
claim381811.copyrightclaims.org feedback.sharemyfile.ru sexyali.com
how-to-earn-quick-money.com forum.topic.6hopping.com video--production.com
traffictomoney.com free-share-buttons.xyz get-free-social-traffic.com
dbutton.net go.ekatalog.xyz chinese-amezon.com
alibestsale.com happy.new.yeartwit.com traffic2money.com
best-seo-software.xyz how.to.travel...ilovevitaly.com vitaly rules google / keyword
http://link.web-list.xyz e-buyeasy.com websites-reviews.com
maps.ilovevitaly.com yourserverisdown.com why.does.spacebarnot work? / keyword
marketland.ml www*.free-social-buttons.com wpsecuritycheck.co.uk
naturehelps.me webmaster-traffic.com wpthemedetector.co.uk
new-look.for-your.website  webmonetizer.net китай.с.новым.годом.рф
onlinetvseries.me kambasoft.com site#.floating-share-buttons.com
satellite.maps.ilovevitaly.com / keyword непереводимая.рф teedle.co
smarter-content.for-your.website     
List of URLs from legitimate sites showing as Page Title Spam
https://www.blackhatworld.com/seo/this-guy-not-one-f-ck-given.897424/
http://motherboard.vice.com/read/this-pro-trump-russian-is-spamming-google-analytics
 
http://motherboard.vice.com/read/spammer-now-spamming-google-analytics-with-motherboard-article-on-spam
https://twitter.com/3mapsVitaly
https://www.washingtonpost.com/politics/the-electoral-college-is-poised-to-pick-trump-despite-push-to-dump-him/2016/12/19/75265c16-c58f-11e6-85b5-76616a33048d_story.html?utm_term=.d80e546a6b26
www.freenom.com and http://www.dot.tk/

Some of the spammers use a different prefix or ending on the URL, with one of the following:

  • -1, -2, -3, -4, -5, -6, -7, -8, -9
  • -aa, -bb, -cc, -dd, -ee, -ff, -gg, -hh, -ii, -jj
  • -aaa, -bbb, -ccc, -ddd, -eee, -fff, -ggg, -hhh, -iii
  • site1, site2, site3, site4, site5, site6, site7, site8, site9
  • www1, www2, www3, www4, www5, www6, www7, www8, www9

It is also common to see your GA code number attached to the URL of the spammer.
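
Rather than listing every variation, the prefixes and endings above can be covered with one expression per spam domain. The sketch below is an assumption about how such an expression could be written and tested locally; the function name is made up, and the lettered suffix is simplified to any two or three letters from a to j instead of strictly repeated letters.

```python
import re

def variant_regex(stem, tld):
    """Match a spam domain with an optional numbered prefix and an optional suffix."""
    prefix = r"(?:(?:site|www)[1-9]\.)?"   # site1. to site9. or www1. to www9.
    suffix = r"(?:-[1-9]|-[a-j]{2,3})?"    # -1 to -9, -aa to -jj, -aaa to -iii (simplified)
    return re.compile(prefix + re.escape(stem) + suffix + r"\." + re.escape(tld))

spam = variant_regex("free-share-buttons", "xyz")
for source in [
    "free-share-buttons.xyz",
    "free-share-buttons-aaa.xyz",
    "site3.free-share-buttons.xyz",
    "free-share-buttons.com",
]:
    print(source, bool(spam.fullmatch(source)))
```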

Do you have any other questions?

I tried to cover the most common questions I get about this issue. If you have any other question, or something is not clear, leave a comment.

Excellent resources that helped build this guide.

Thanks to Ben from Viget and Nick from cucumber.co for the help in building this article
