Periodic Reporting for period 1 - FARE_AUDIT (FARE_AUDIT: Fake News Recommendations - an Auditing System of Differential Tracking and Search Engine Results)
Reporting period: 2022-12-01 to 2024-05-31
Thus, the main aim of FARE_AUDIT was to address this imbalance through the development of an unbiased tool to audit search engines, particularly around situations of conflict (political or military), when stakes are high and disinformation is rampant. The rationale was to create a system of bots (web crawlers) and incrementally change their features, controlling for factors known to impact search engine results. The bots, made to resemble users from different countries and speaking different languages, visited different websites (including some known to share disinformation) to mimic human online behavior. Through this web surfing, they collected cookies and other “fingerprints”, becoming “profiled”. These profiled bots were then directed to different search engines and instructed to perform the exact same search. By comparing the search engine recommendations, it should be possible to “reverse engineer” the recommendation systems and better understand how browsing history influences those results, particularly the likelihood of being directed to disinformation.
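To illustrate the profiling-and-querying loop (this is a minimal sketch, not the project's actual implementation), the snippet below uses Selenium with a headless Firefox profile. The site list, language setting, search URL, and result selector are illustrative assumptions; a real audit would control many more fingerprinting factors.

```python
# Hedged sketch only: a single "profiled" crawler built with Selenium.
# The site list, language preference, search URL, and CSS selector below are
# illustrative assumptions, not FARE_AUDIT's actual configuration.
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options


def make_bot(accept_language: str = "en-US") -> webdriver.Firefox:
    """Start a fresh headless Firefox profile with a chosen language setting."""
    opts = Options()
    opts.add_argument("--headless")
    opts.set_preference("intl.accept_languages", accept_language)
    return webdriver.Firefox(options=opts)


def build_history(bot: webdriver.Firefox, sites: list[str]) -> int:
    """Visit a list of sites so the bot accumulates cookies ("profiling")."""
    for url in sites:
        bot.get(url)
    return len(bot.get_cookies())  # rough proxy for how "profiled" the bot is


def run_query(bot: webdriver.Firefox, query: str) -> list[str]:
    """Submit one identical query and collect the recommended result links."""
    bot.get("https://html.duckduckgo.com/html/?q=" + quote_plus(query))
    links = bot.find_elements(By.CSS_SELECTOR, "a.result__a")  # engine-specific selector
    return [a.get_attribute("href") for a in links]


if __name__ == "__main__":
    bot = make_bot(accept_language="pt-PT")            # e.g. a Portuguese-language profile
    build_history(bot, ["https://example.org/news"])   # placeholder browsing history
    print(run_query(bot, "EU parliamentary elections 2024"))
    bot.quit()
```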
More specifically, FARE_AUDIT’s main goals were to:
1. Develop and implement an unbiased bot-based audit tool;
2. Systematically identify how browsing history influences search-engine results using this system of “web crawlers” that mimics different user profiles;
3. Create and test an online interface that allows NGOs, journalists, and interested users to scrutinize search-engine platforms and understand how different profiles access information differently;
4. Extend this concept to novel tracking or search methodologies.
Overall, we expected this tool to have meaningful social impact on at least three levels: by increasing our knowledge of search-engine personalization, by raising public awareness of the role(s) of search engines in polarization and the spread of disinformation, and by better equipping civil society organizations with a tool to detect and monitor different ongoing narratives in close to real time. Moreover, by relying on web crawlers, our tool is privacy-protecting and does not require any real user data, paving the way for other unbiased audits. In fact, our tool is currently being adapted to include Large Language Model (LLM)-based chatbots (ChatGPT, Gemini, Llama), particularly when these are integrated with traditional search engines.
The bots can be increasingly personalized, from Step 0 (no browsing history, no cookies, English language, location set to a specific country), to Step 1 (no browsing history, no cookies, language matching location, and location set to a specific country), to a Step “N”, with bots set to specific languages and locations and having a “long history” of visiting specific content, including known disinformation websites (Figure 1). To audit the search engines, these bots, associated with different user features, were deployed to simultaneously query a search engine, inputting identical queries and collecting the resulting page listings. This process was then repeated across several different queries for four search engines. As Figure 2 shows, not only did the system work, it also revealed consistent differences in search-engine recommendations even at the lowest levels of personalization (in the depicted case, asking questions related to the 2024 EU Parliamentary Elections from different locations but using the default English language).
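To make the comparison step concrete, the sketch below shows one simple way to quantify differences between the result lists returned to differently profiled bots for the same query. The bot labels and URLs are hypothetical, and the overlap measure (Jaccard similarity over result URLs) is only one possible choice, not necessarily the metric behind Figure 2.

```python
# Illustrative comparison step (assumed data layout, not the project's code):
# given the ranked URLs each profiled bot collected for one identical query,
# measure how much any two bots' recommendations overlap.
from itertools import combinations


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard similarity between two sets of result URLs (1.0 = identical)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


# Hypothetical results for one query on one search engine:
results = {
    "step0_en_location_PT": ["https://site-a.eu/1", "https://site-b.eu/2", "https://site-c.eu/3"],
    "step1_pt_location_PT": ["https://site-a.eu/1", "https://site-d.eu/4", "https://site-e.eu/5"],
    "stepN_pt_PT_disinfo_history": ["https://site-f.eu/6", "https://site-d.eu/4"],
}

for (bot1, urls1), (bot2, urls2) in combinations(results.items(), 2):
    print(f"{bot1} vs {bot2}: overlap = {jaccard(urls1, urls2):.2f}")
```

Repeating this pairwise comparison across queries, engines, and personalization steps is what allows the differences in recommendations to be attributed to specific profile features.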
Regarding the online interface (goal 3), we intended to offer a tool that could be used independently by citizens and NGOs, according to their distinct needs and interests, allowing them to audit the systems and a) help raise awareness of personalization in search results, and b) track misinformation in real time. However, our work shows that the observed differences in profiling depend strongly on the searcher’s location, which cannot be easily implemented in a public tool. Therefore, we are now splitting this interface into two: one with pre-trained “user-bots” and search patterns that can be used by the general public (still helping to raise awareness and “break the information bubble” by showing how different people can be offered very different search results), and a second for use specifically by NGOs and journalism/democracy-related associations, which will be able to audit misinformation regarding selected, ongoing situations. These interfaces were piloted during the European Researchers’ Night in Lisbon, in 2023, with several participant pairs comparing search results.
The proposed web interface is being redesigned, but we expect to fully implement its new versions. By allowing citizens to see that the same queries can return very different results, the tool can help “burst the information bubbles”. In parallel, it may be useful to journalists and democracy watchdog associations for tracking disinformation narratives, and it was piloted in collaboration with two NGOs. Our expectation is to share these tools freely and openly with interested colleagues and other relevant actors, to increase their reach and potential impact.