Author

Analytics Expert. Passionate about SEO and User Experience or what he calls UX-SEO

Language Spam Google Analytics

Language spam is the latest and nastiest form of spam in Analytics. As we implement solutions to prevent this threat, the spammers keep finding new ways of getting into our analytics. 

This time the spammers used the language HTTP header and legitimate sites like Reddit, Twitter, motherboard.vice.com or TNW.

Learn how to prevent this without risking your real data.

Getting Rid of Google Analytics Language Spam

If you followed my main Analytics Spam guide, you should have prevented most of the language spam. However, some hits may still get through, so I created the following expression to help you get rid of the rest:

\s[^\s]*\s|.{15,}|\.|,

It may look a bit odd, but it is quite simple. This expression excludes any language value that doesn't have a proper format: one that contains two or more whitespace characters, is 15 characters or longer, or contains a period or a comma. I will show you how to use it in a filter.
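If you want to check the expression before adding it to a filter, you can try it against a few sample values. The sketch below uses Python's `re` module, which is not the engine Google Analytics runs, although for a pattern this simple the behavior should be the same. The function name and the sample values are made up for illustration.

```python
import re

# The expression from this article, used as a partial match
# (it only needs to match somewhere inside the language value).
LANGUAGE_SPAM = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")


def is_language_spam(language: str) -> bool:
    """Return True if the language value would be caught by the expression."""
    return LANGUAGE_SPAM.search(language) is not None


# Hypothetical sample values, not taken from real reports.
samples = [
    "en-us",
    "zh-hans-cn",
    "(not set)",
    "example.com",
    "one two three",
    "en, us",
    "abcdefghijklmno",
]

for value in samples:
    verdict = "exclude" if is_language_spam(value) else "keep"
    print(f"{value!r}: {verdict}")
```

With these samples, en-us, zh-hans-cn and (not set) are kept, and the rest are excluded. Note that a value with a single space, such as (not set), is not caught on its own; it takes two whitespace characters, a period, a comma, or 15 or more characters.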

How to Block Language Spam with a Simple Filter

To stop language spam, you will need to create an exclude filter on the Language settings filter field.

  1. Go to your Google Analytics and select the Admin tab
  2. In the last column, "VIEW," select Filters and then click + Add Filter
  3. In the name box enter "Language Spam."
  4. Filter Type > Custom > Exclude
  5. Filter Field > Language settings
  6. Filter Pattern > Copy and paste the following expression exactly as it is (don't leave any spaces before or after it).

    \s[^\s]*\s|.{15,}|\.|,

    Click on the blue text that says Verify this filter. You will see a preview table of how this filter will work. You should only see language spam on the left side of the table.

    Note: Don't worry if you get the message "This filter won't change your data..." If you followed the instructions correctly, the filter will work. This message just means that no match was found in the sample data taken by this feature.
  7. After verifying the filter click Save.

Remember that filters only apply to data collected after they are created; previous data can't be permanently deleted. However, you can use an advanced segment to get a clean view of your historical data.

How to Remove Language Spam from Historical Data

To save you some time I created the segment for you.

  1. To add the segment to your account, open the following link:
    https://analytics.google.com/analytics/web/template?uid=7nmXtBdIS0mtEHH3ByOzUQ
  2. On the segment window, select the option Any view and click on Create. This will automatically add the segment to your list.
  3. Now go to the reporting section and click the box that says All Users at the top of the graph.
  4. From the list of segments, uncheck All Users and check "0. All users - No Language Spam".

Now, while this segment is selected, all language spam will be gone from your past data!
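If you work with exported report data instead of the reporting interface, the same idea can be applied there. This is only a minimal sketch: it assumes rows with hypothetical `language` and `sessions` fields, and it reuses the expression from the filter. It is not how the shared segment is defined internally.

```python
import re

# Same expression used in the filter above.
LANGUAGE_SPAM = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")


def remove_language_spam(rows):
    """Keep only the rows whose language value does not match the expression."""
    return [row for row in rows if not LANGUAGE_SPAM.search(row["language"])]


# Hypothetical exported rows; field names and numbers are made up.
rows = [
    {"language": "en-us", "sessions": 120},
    {"language": "spam-site.example has a message for you", "sessions": 45},
    {"language": "es-es", "sessions": 30},
]

clean = remove_language_spam(rows)
total = sum(row["sessions"] for row in clean)
print(total)  # 150, the sessions left after dropping the spam row
```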

What to do next?

The solutions above only take care of language spam. However, there are other types of spam, like ghosts and crawlers, so you still have some work to do if you want to keep the spam away from your Analytics.

The following guide will help you prevent most of the spam without needing to update filters every time. Plus it will help you create a Segment that will remove ALL spam, not only language spam, from your historical data. 
