Periodic Reporting for period 1 - DisAI (Improving scientific excellence and creativity in combating disinformation with artificial intelligence and language technologies)
Reporting period: 2022-12-01 to 2025-11-30
Disinformation disseminated through social media and digital platforms continues to pose a significant societal challenge, contributing to polarisation and undermining democratic processes. Despite sustained efforts by the European Commission, Member States, civil society and the research community, the problem remains unresolved, with particularly strong effects in post-communist EU countries characterised by weaker institutions, lower media literacy and reduced societal resilience. In Slovakia, for example, 56% of surveyed citizens report belief in conspiracy theories or misinformation narratives, while only 30% trust news media most of the time. In this context, the project strengthened KInIT’s research capacity in AI-based approaches to disinformation analysis, in line with the Slovak Recovery and Resilience Plan and the Digital Transformation Strategy 2030.
The overall objective of the project was to enhance the scientific excellence of KInIT and the consortium partners in trustworthy AI, multimodal natural language processing and multilingual language technologies for combating disinformation.
Novel methods, primarily focused on claim matching, were developed, resulting in 42 scientific publications targeting top-tier NLP venues, including ACL and EMNLP. KInIT strengthened its presence at leading scientific conferences, expanded its professional network and implemented a staff exchange programme. Improvements in scientific excellence were reflected in increased industry collaboration and knowledge transfer, growth in the average h-index of participating researchers, and extensive international engagement, with over 85% of researchers participating in international mobility. Beyond the planned networking activities, the LowResNLP workshop was organised, further strengthening the scientific network and advancing one of the project’s core themes: NLP for low-resource languages.
From a research management and administration perspective, an institutional assessment of KInIT was conducted, followed by the development and implementation of an improvement plan. As a result, the research management and administration unit was upgraded, with a significantly increased share of trained administrative staff. A research support network with partner organisations was established, two workshops on research management skills were organised, and 22 research project and grant proposals were submitted. Overall, KInIT substantially expanded its network, engaging with more than 70 industrial and over 100 research partners.
To maximise visibility and impact, a dissemination and communication strategy was developed and successfully implemented, with most performance indicators meeting or exceeding planned targets. Multiple communication channels were used to reach diverse target groups. The sustainability of project results is further ensured through follow-up initiatives building directly on DisAI’s outcomes, including the DisAI-AMPLIFIED project (2024–2026) funded under Slovakia’s Recovery and Resilience Plan, and the lorAI: Low Resource Artificial Intelligence project (2025–2031), supported by the Horizon WIDERA Teaming for Excellence programme.
– MultiClaim, a multilingual dataset comprising over 200k fact-checked claims and 28k social media posts, was created and published together with a scientific paper. The dataset enabled subsequent research within the project and beyond, including by third-party researchers.
– Extensive evaluations using MultiClaim under the multilingual CBFA framework demonstrated how cross-lingual retrieval can be reliably performed and compared the effectiveness of different system configurations, addressing the three original research questions defined in the proposal.
– Additional findings show that state-of-the-art generative LLMs can be effectively integrated into multilingual claim-matching pipelines, both as re-rankers and relevance classifiers, and that auxiliary components such as OCR engines based on multilingual LLMs can substantially improve performance in cross-lingual scenarios, addressing an additional research question introduced during the project.
– A version of MultiClaim augmented with visual data demonstrated that multimodal information can be effectively leveraged in multilingual and cross-lingual claim matching, outperforming unimodal baselines. A novel architecture, FACTOR, was proposed for this purpose. The dataset was also used to study the role of multimodal data in facilitating cross-lingual knowledge transfer, further extending the state of the art. Results additionally show that generative vision–language models can be effectively used as re-rankers.
– An AutoXAI framework was introduced to support the selection of suitable explainability methods for specific model–dataset combinations, demonstrating that automated selection can identify XAI methods balancing technical fidelity and human comprehensibility.
– Systemic inequities in multilingual systems were analysed, showing substantial performance variation across languages, particularly for low-resource and non-Latin script languages. Reasoning-enabled re-ranking models partially mitigate these biases but do not eliminate them.
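
For readers unfamiliar with claim matching, the retrieve-then-re-rank pattern referred to in the results above can be illustrated with a minimal sketch. It is not the project's implementation: the bag-of-words cosine similarity stands in for a multilingual embedding model, the `overlap_score` function stands in for an LLM-based re-ranker, and all function names and example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(post, claims, k=3):
    """First stage: return indices of the k claims most similar to the post.

    Assumption: a real system would compare multilingual embeddings here
    instead of raw token counts.
    """
    query = Counter(tokenize(post))
    scored = [(cosine(query, Counter(tokenize(c))), i) for i, c in enumerate(claims)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]


def rerank(post, claims, candidates, score_fn):
    """Second stage: reorder the retrieved candidates with a stronger scorer."""
    return sorted(candidates, key=lambda i: -score_fn(post, claims[i]))


def overlap_score(post, claim):
    """Toy stand-in for an LLM re-ranker: number of shared distinct tokens."""
    return len(set(tokenize(post)) & set(tokenize(claim)))


# Hypothetical example data, not taken from MultiClaim.
claims = [
    "Vaccines cause autism",
    "The Earth is flat",
    "5G towers spread viruses",
]
post = "Someone told me 5G towers spread viruses everywhere"

candidates = retrieve(post, claims, k=2)
ranked = rerank(post, claims, candidates, overlap_score)
print(ranked)  # [2, 0]
```

The two-stage design keeps the expensive scorer off the full claim collection: a cheap similarity search narrows the candidates, and only those are passed to the slower, more accurate model.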