Differential Tracking Across Topical Webpages of Indian News Media


Browser cookies are ubiquitous in the web ecosystem today. Although cookies were initially introduced to preserve user-specific state in browsers, they are now used for numerous other purposes, including user profiling and tracking across multiple websites. This paper sets out to understand and quantify the different uses of cookies and, in particular, the extent to which targeting and advertising, performance analytics and other uses that serve only the website and not the user add to overall cookie volumes. We start with 31 million cookies collected in Cookiepedia, which is currently the most comprehensive database of cookies on the Web. Cookiepedia provides a useful four-part categorisation of cookies into strictly necessary, performance, functionality and targeting/advertising cookies, as suggested by the UK International Chamber of Commerce. Unfortunately, we found that Cookiepedia data can categorise less than 22% of the cookies used by Alexa Top20K websites and less than 15% of the cookies set in the browsers of a set of real users. These results point to an acute problem with the coverage of current cookie categorisation techniques. Consequently, we developed a novel machine learning-driven framework that can categorise a cookie into one of the aforementioned four categories with an F1 score of more than 94% and a latency of less than 1.5 ms. We demonstrate the utility of our framework by classifying cookies in the wild. Our investigation revealed that, on Alexa Top20K websites, necessary and functional cookies constitute only 13.05% and 9.52% of all cookies respectively. We also apply our framework to quantify the effectiveness of tracking countermeasures such as privacy legislation and ad blockers. Our results identify a way to significantly improve the coverage of cookie classification today and reveal new patterns in the usage of cookies in the wild.

Proceedings of the 13th ACM Conference on Web Science
Pushkal Agarwal
PhD Student working with the UK Parliament on Digital Citizen Engagement

I am a PhD Scholar in Computer Science at King’s College London under the supervision of Dr. Nishanth Sastry. My doctoral work focuses on online digital citizen engagement with the UK Parliament. I completed my Bachelor of Engineering (Computer Science and Engineering) at The LNM Institute of Information Technology, India. My professional experience includes internships at Telefonica I+D (Barcelona, Spain) and with the Neilsen Group. My honours include a “Best research impact” award at the PhD poster competition of King’s College London, NMS, and the Chairman’s Gold Medal for best all-round performance in the graduating class of 2017 (BTech. CSE).