Tracking The Trackers: A Large-Scale Analysis of Embedded Web Trackers


We perform a large-scale analysis of third-party trackers on the World Wide Web. We extract third-party embeddings from more than 3.5 billion web pages of the CommonCrawl 2012 corpus and aggregate them into a dataset representing more than 41 million domains. With this dataset, we study online tracking on two levels: (1) On a global level, we give a precise figure for the extent of tracking and analyse which trackers (and, by extension, which companies) are used by how many websites. (2) On a country-specific level, we analyse which trackers are used by websites in different countries and identify the countries in which websites choose significantly different trackers than in the rest of the world. We find that trackers are widespread (as expected) and that very few trackers (those of Google, Facebook and Twitter) dominate the web, except in a few countries such as China and Russia.
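
The extraction and aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function and class names (`naive_site`, `EmbedCollector`, `third_party_sites`, `aggregate_by_domain`) are hypothetical, the choice of tags to inspect is an assumption, and the registrable domain is approximated by the last two labels of the hostname, which is wrong for suffixes such as `co.uk`.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def naive_site(host):
    """Approximate the registrable domain by keeping the last two labels.

    Assumption: this is wrong for suffixes such as co.uk; a real pipeline
    would consult the Public Suffix List instead.
    """
    return ".".join(host.lower().split(".")[-2:])


class EmbedCollector(HTMLParser):
    """Collects the src URLs of resources that a page embeds."""

    # Assumption: only these tags are treated as embeddings.
    TAGS = {"script", "img", "iframe"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def third_party_sites(page_url, html):
    """Return the set of embedded sites that differ from the page's own site."""
    first_party = naive_site(urlparse(page_url).hostname or "")
    collector = EmbedCollector()
    collector.feed(html)
    found = set()
    for src in collector.urls:
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(page_url, src)).hostname
        if host and naive_site(host) != first_party:
            found.add(naive_site(host))
    return found


def aggregate_by_domain(pages):
    """Merge page-level embeddings into one set of third parties per domain."""
    by_domain = defaultdict(set)
    for page_url, html in pages:
        site = naive_site(urlparse(page_url).hostname or "")
        by_domain[site] |= third_party_sites(page_url, html)
    return dict(by_domain)
```

Given an iterable of `(page_url, html)` pairs, `aggregate_by_domain` returns a mapping from each first-party domain to the third-party sites embedded on any of its pages. Counting how many domains embed a given third party would then give a usage figure per candidate tracker.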

AAAI International Conference on Web and Social Media (ICWSM)