Meeting the challenges of cyber disinformation
This is an AI transcription.
00:00:00:00 - 00:00:38:12
Abigail Acton
This is CORDIScovery. Hello and welcome to this episode of CORDIScovery with me, Abigail Acton. Remember when you could wander around the internet fairly confident that what you were looking at was probably reliable? Or am I showing my age? Over the years, it seems that information is becoming less and less trustworthy, and with deep fakes and biased algorithms, it's starting to feel like disinformation might be around every corner.
00:00:38:16 - 00:01:01:02
Abigail Acton
Identifying, tracking, and investigating online disinformation and other problematic content is an extremely complex challenge. It can lead to hate crimes and other violence, but many European police authorities do not have access to any specialized tools or technology to help them tackle the issue. So how can they be helped? As individuals, how can we establish if we're being manipulated?
00:01:01:07 - 00:01:24:00
Abigail Acton
We are increasingly exposed to cyber disinformation, either passively through social media feeds, or actively by using search engines and specific websites that guide us to sites that reinforce our biases and build walls of prejudice. Companies are making some effort to identify and remove fake news websites and minimize the spread of disinformation on social media. But what about the search engines themselves?
00:01:24:05 - 00:01:48:14
Abigail Acton
Could web crawlers provide an innovative way to help us audit their activity? The spread of cyber disinformation threatens our democratic values. As the amount of disinformation grows, AI, and language technologies in particular, have a crucial role in detecting it. Machine learning and AI train on large language models. But what about the languages that have a smaller footprint online? Those that are used less frequently?
00:01:48:16 - 00:02:14:11
Abigail Acton
How can we strengthen AI to combat disinformation in what are called low resource languages? Here to help us navigate the maze are three researchers who have all received support from EU science funding. Owen Conlan is a fellow of Trinity College Dublin and professor in the School of Computer Science and Statistics. He is also co-director of the Trinity Centre for Digital Humanities.
00:02:14:13 - 00:02:20:13
Abigail Acton
Owen is very interested in user control over personalized AI-driven systems. Hello, Owen, and welcome!
00:02:20:15 - 00:02:21:24
Owen Conlan
Hello Abigail. It's lovely to be here.
00:02:22:02 - 00:02:42:03
Abigail Acton
Lovely to have you. Joana Gonçalves-Sá is a researcher both at the Nova Laboratory for Computer Science and Informatics and in the Laboratory of Instrumentation and Experimental Particle Physics in Lisbon, where she leads the Social Physics and Complexity research group. Her focus is on human and algorithmic bias, using fake news as a model system. Welcome, Joana.
00:02:42:09 - 00:02:43:19
Joana Gonçalves-Sá
Thank you so much for having me.
00:02:43:21 - 00:02:59:08
Abigail Acton
Marián Šimko is an expert researcher at the Kempelen Institute of Intelligent Technologies in Slovakia. Marián focuses on natural language processing, information extraction, low-resource language processing and the interpretability of neural models. Welcome, Marián.
00:02:59:10 - 00:03:00:18
Marián Šimko
Hi. It's great to be here.
00:03:00:22 - 00:03:18:18
Abigail Acton
Nice to have you. Owen, I'm going to turn to you first. The VIGILANT project is developing an integrated platform of advanced disinformation identification and analysis tools and technologies, employing state-of-the-art AI methods. So I know your focus is on human-centric AI, Owen, but can you explain what you mean by that term?
00:03:18:23 - 00:03:41:05
Owen Conlan
Yeah, certainly. Human-centric AI is trying to understand the symbiotic interaction between humans and AI systems, and really to allow humans, the users, to appreciate how an AI agent works on their behalf. And that's quite a complicated thing, because you don't want to get into the weeds and the very fundamental details of convolution algorithms and so on.
00:03:41:05 - 00:04:00:24
Owen Conlan
That doesn't serve anybody. But the flip side is you don't just want an AI agent to say, here's your answer: a very black-box solution. So where's the middle ground? When we use the phrase human-centric AI, we're trying to find a personalized way for a user to be shown how an AI agent is functioning on their behalf.
00:04:01:05 - 00:04:23:16
Owen Conlan
It illustrates which key pieces of information are being used to generate the responses. And for us, when we generate responses, the main thing is we're trying to highlight really problematic content that the user needs to attend to, to understand maybe the context of an investigation, or to understand that this is potentially disinformation. So how do we make sense of that for the user?
00:04:23:19 - 00:04:36:05
Abigail Acton
Okay. That's clearer. Thanks, Owen. I know that you worked on a project called PROVENANCE in the past, and that focused on helping people to navigate these complexities. Can you first just touch on that before we move on to VIGILANT?
00:04:36:07 - 00:05:15:21
Owen Conlan
Absolutely. PROVENANCE was aimed at ordinary users like you and me, as we're navigating content on the web, particularly in social feeds, to try and illustrate where certain items in that social feed may have problematic signals within them. So what do I mean by problematic signals? I mean maybe very emotive language being used in the presentation of something that is claiming to be news, or something that is using terminology that we know can be really quite challenging, maybe referring to different members or groups of society, or it's using media assets that we know have occurred in other contexts
00:05:15:23 - 00:05:44:13
Owen Conlan
and now seem to be used in a very different context. So we're very careful in that situation not to claim that something is fake news. We are trying to illustrate that there are problems, potentially, in how this is being represented, and then to some degree leave that to the user. Now, that only works hand in hand with offering the user the ability to scaffold and learn about disinformation: why it's deployed, what kinds of tropes and mechanisms are used to trigger us.
00:05:44:15 - 00:06:22:04
Owen Conlan
And mostly there are very strong emotional triggers that disinformation is trying to respond to, to get us to just shift our beliefs ever so slightly, whether it's towards vaccines or certain political perspectives, and all of that. So PROVENANCE looked at this deeply, and one of the key things we saw was the role that our personality traits really play in our susceptibility to disinformation, and then also the role that those traits play in how we might have interventions to allow a user to interact with that information in a controlled and understood way.
00:06:22:04 - 00:06:44:02
Owen Conlan
So they're not just seeing a thing and believing a thing; they're seeing the thing and starting to question: is this real? However, one challenge in all of this is to ensure that people can still learn that there is content online you can trust. Because if what we do causes people to question everything fundamentally and not believe anything, then everything is a conspiracy and nobody can trust any content online.
00:06:44:02 - 00:06:46:00
Owen Conlan
So there's a fine balance to maintain here.
00:06:46:05 - 00:07:02:13
Abigail Acton
Yes, that's very true. Yes, indeed. Yeah, I can see the logic of that. Super. Okay. Well, if we move on to VIGILANT then. So VIGILANT, I know, was concerned more with providing tools for police authorities, which we might call PAs going forward. Can you tell us more about the kind of tools that the project is creating?
00:07:02:15 - 00:07:32:12
Owen Conlan
Certainly. One of the challenges PAs have is that, first off, like all of us, they face so much content online, and they need to understand what content might be problematic from a criminal perspective. Now, we have to be very careful here. This is not mass surveillance. This is when, through a member of the public or some other investigatory path, an area of content is now being focused on because there's a suspicion that there's criminal activity there.
00:07:32:14 - 00:07:59:07
Owen Conlan
The kind of criminal activity we're talking about is often extremist activity, people trying to stoke members of society to act in criminal and anti-social ways. And this occurs on platforms that all of us use, maybe benignly, but some people don't use them in such benign ways. So a good example for us would be to illustrate, maybe, a group claiming something falsely.
00:07:59:07 - 00:08:17:20
Owen Conlan
And this is where the disinformation part comes in. So they might, for example, and this is just a hypothetical situation, claim that a mosque is to be built at a very sensitive area on La Ramblas. Depending on that claim and the conversation around it, that's either just a piece of disinformation, which is in itself not illegal.
00:08:17:22 - 00:08:42:01
Owen Conlan
But if people start to mobilize around it and plan a riot, then it becomes an illegal activity. So the kinds of tools we offer police officers are tools that, when pointed at these areas of suspicion, can start to harvest content and understand what kinds of emotions are being expressed. And what's quite strange for us is to see some of the emotions you don't expect. You'd think hate would be a key indicator of a problem, and it certainly is.
00:08:42:03 - 00:09:11:23
Owen Conlan
But happiness can be too, where you see members of that group being quite happy that they're about to organize, they're about to enact a plan. And so we're looking at entities, names, words and so on, and emotions, and the confluence of those, and being able to visually illustrate them. So at a glance you could look at a Telegram channel and understand where the hotspots are, so that a police member can look at it and have their attention drawn there.
00:09:12:00 - 00:09:30:13
Abigail Acton
Right. Excellent. That's super. You know, I get it. I mean, there would probably be a sense of thrill or excitement or anticipation, which could be red flags as well. Yeah. So how do the police authorities, somebody looking at a computer trying to work out whether they need to deploy resources somewhere, how would they see this visually?
00:09:30:15 - 00:09:52:09
Owen Conlan
So first off, they focus in on the channels. We use this idea of a knot, this idea of untying a knot. So they import a number of different informational sources. Those sources are processed for natural language, entities, emotions, a variety of different things, and they are graphed in timelines and they are graphed in intensities.
00:09:52:15 - 00:09:57:10
Owen Conlan
And that allows them to see essentially visual hotspots where those things co-occur.
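The pipeline Owen describes, importing sources, extracting entities and emotions, then looking for time windows where they co-occur, can be illustrated with a minimal sketch. This is purely illustrative and not the VIGILANT implementation: the entity and emotion values below are hard-coded stand-ins for what a real natural language processing pipeline would produce.

```python
# Toy sketch of co-occurrence "hotspot" scoring: messages are bucketed into
# time windows; windows where watched entities and strong emotion co-occur
# get a high score. All values here are invented for illustration.
from collections import defaultdict
from datetime import datetime

messages = [
    # (timestamp, named entities found, emotion intensities 0..1)
    (datetime(2024, 5, 1, 18, 5),  {"mosque", "La Ramblas"}, {"anger": 0.8, "joy": 0.1}),
    (datetime(2024, 5, 1, 18, 20), {"La Ramblas"},           {"anger": 0.6, "joy": 0.7}),
    (datetime(2024, 5, 1, 21, 40), {"weather"},              {"anger": 0.1, "joy": 0.4}),
]

WATCHED_ENTITIES = {"mosque", "La Ramblas"}   # entities the investigator cares about

def window_key(ts):
    """Bucket a timestamp into an hourly window."""
    return ts.replace(minute=0, second=0, microsecond=0)

# Aggregate per window: watched-entity mentions and average emotion intensity.
windows = defaultdict(lambda: {"entity_hits": 0, "emotion": 0.0, "n": 0})
for ts, entities, emotions in messages:
    w = windows[window_key(ts)]
    w["entity_hits"] += len(entities & WATCHED_ENTITIES)
    w["emotion"] += max(emotions.values())   # strongest emotion of any kind counts
    w["n"] += 1

# A window is a "hotspot" when watched entities and strong emotion co-occur.
for key, w in sorted(windows.items()):
    score = w["entity_hits"] * (w["emotion"] / w["n"])
    print(f"{key:%Y-%m-%d %H:%M}  hotspot score = {score:.2f}")
```

In this toy version, the 18:00 window scores highest because several watched entities and intense emotions fall together, which is the kind of co-occurrence an investigator would then drill into.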
00:09:57:12 - 00:10:04:15
Abigail Acton
Right. So you can see overlaps that are red flags. Yeah. And then with a certain number of overlaps, you start thinking perhaps we need to be present.
00:10:04:17 - 00:10:06:21
Owen Conlan
Well you start to drill into the content in detail.
00:10:06:24 - 00:10:07:11
Abigail Acton
Right.
00:10:07:12 - 00:10:16:08
Owen Conlan
So it allows them to get to the crux of it and really understand it, because we don't want an AI system to deploy police.
00:10:16:14 - 00:10:18:03
Abigail Acton
Yeah.
00:10:18:05 - 00:10:25:03
Owen Conlan
Well, we want a system like this to support the detailed investigation to then determine how resources may be best used.
00:10:25:03 - 00:10:32:07
Abigail Acton
So in fact, it's a way of winnowing through a huge amount of data to identify which actual sources need to be more closely examined.
00:10:32:07 - 00:10:32:22
Owen Conlan
Correct.
00:10:32:24 - 00:10:35:13
Abigail Acton
Yeah. So it's a huge, huge time saver.
00:10:35:19 - 00:10:56:15
Owen Conlan
It's a big time saver, and it also helps to understand the way resources might be deployed. Yeah. Because different countries in Europe have very different ethoses that underpin how their police authorities function. So in Ireland, for example, our police force is called An Garda Síochána, which means the guardians of peace. And that really characterizes how they interact with society and so on.
00:10:56:17 - 00:11:18:16
Owen Conlan
And the same happens in an online context. So in Estonia they have online constables where, you know, it's walking the virtual beat. And sometimes that means knocking on a virtual door and saying this content is not illegal, but just know we're here, we're present. We're here to make sure everything stays okay.
00:11:18:18 - 00:11:24:09
Owen Conlan
Yeah. So, you know, a tool like VIGILANT has to sit within those very different contexts.
00:11:24:11 - 00:11:43:05
Abigail Acton
Yeah, indeed. Because the approach is very different from country to country. That's excellent. Thank you so much. You explained that really, really well. I know that you're also interested in providing tools and resources to policymakers. So how is the material that's being created by VIGILANT, the work being done by VIGILANT, feeding into that, do you think?
00:11:43:07 - 00:12:08:10
Owen Conlan
Yeah. So on one hand these are very similar tools when you look at them from a distance, but the way they're focused and the kind of commentary and uplift you do to help policymakers is quite different. So we're actively engaged in another European funded project called ATHENA, and ATHENA is looking at FIMI, the foreign information manipulation and interference challenge, and this is a little bit more like surveillance, actually.
00:12:08:12 - 00:12:33:22
Owen Conlan
So while we should not surveil the populations in our own countries, it is acceptable to look at these 'informational', if you could see me I'm air quoting, sites. So there are large disinformation farms producing a whole variety of often AI-generated disinformation, and they're just trying to gain traction. And those appear as campaigns, campaigns where you might be trying to tear down a particular political personality, and you're using a number of different avenues.
00:12:33:22 - 00:12:54:24
Owen Conlan
And whichever gains traction, they then pump more of that information in. And what ATHENA does, in a similar way to VIGILANT, is harvest information from sources like that. ATHENA is less discriminating because it's not focused on individuals, and we have to be very careful with personal information here, but it's gathering information and it's producing similar kinds of infographics and so on.
00:12:54:24 - 00:13:20:13
Owen Conlan
So as to have a data basis for informing policy decisions, policy actions and interventions. One of the challenges we often see is that an attacker campaign may crop up in a certain linguistic context. So, for example, it might occur in Greece, in Greek, pointing at challenges around maybe migration, and drawing on false claims around that to stoke a population.
00:13:20:15 - 00:13:43:02
Owen Conlan
How the Greek authorities react to that is in itself a learning. So if you can couple what they did with this kind of emergent problem in a platform like ATHENA, and then transport it to Italy when we start to see a similar problem in the Italian language, in a different political context, what can we learn from the Greek reaction in that case?
00:13:43:04 - 00:13:56:17
Owen Conlan
Because Europe is a complicated mix of multiple different cultural and linguistic contexts and countries. But we can learn when we see these attacks, and we can try to transport that knowledge across national boundaries.
00:13:56:19 - 00:14:15:18
Abigail Acton
Perfect. Thank you very much indeed. Yeah. So we can learn from each other. Okay, I'm going to turn to Joana now. Joana, FARE_AUDIT came up with a way to audit search engines to see how browsing history influences search engine results, and how that plays into the likelihood of being directed to sources of disinformation.
00:14:15:18 - 00:14:29:10
Abigail Acton
So there's a bit of an overlap here with the work being done by Owen. Joana, can you tell us a little bit about the notion of human behavior in relation to bias? So can you explain the significance of bias online, please?
00:14:29:12 - 00:14:52:04
Joana Gonçalves-Sá
Yes. So we are talking about two different projects. In one of the projects, what we were clearly looking at was how human bias, or cognitive bias, can promote the spread of disinformation. So every time that we encounter information online, we have to decide whether or not we believe it and whether or not we want to share it.
00:14:52:06 - 00:15:15:09
Joana Gonçalves-Sá
And like Owen was saying, we cannot just decide that we're not going to believe anything and become completely cynical, or just believe everything and become gullible. So every time we have to make a call, and what we think, our hypothesis, is that this call is influenced by different cognitive biases. So let's say I've already encountered this in the past and I tend to believe this.
00:15:15:09 - 00:15:40:19
Joana Gonçalves-Sá
So then comes confirmation bias, and it's easier for me to believe information that I already believe in than new information. And also, if my friends tend to believe this, then I'm probably subject to something called in-group bias, in which I believe them more than I believe the experts. And what we do is use disinformation as a model system, like other researchers use mice in the lab, to study cognitive biases.
00:15:40:19 - 00:16:03:04
Joana Gonçalves-Sá
And we try to see how these different cognitive biases influence its spreading, because we can learn about people's preconceptions about the world and about the way that they decide to relate to society. So we kind of use it the other way around: we're not exactly studying disinformation, we're using disinformation to study cognitive biases.
00:16:03:09 - 00:16:25:01
Abigail Acton
Which is great. That's excellent, because the understanding of how biases work also shows how people are using or approaching information online. So echo chambers are fostered by algorithms used by search engines to push information a user's way. So it must be very hard for disinformation to be identified by an individual. Can you tell us a little bit about what FARE_AUDIT did in this area?
00:16:25:07 - 00:16:50:08
Joana Gonçalves-Sá
Yes. So the idea is that, because it is so difficult to identify, there have been great efforts, especially for monitoring social networks, because we know that our feeds are being personalized and they can feed on these biases. The people who are planting disinformation do it deliberately: they know we are not perfectly rational agents, so they feed on these biases so that they can amplify their signal.
00:16:50:10 - 00:17:14:18
Joana Gonçalves-Sá
There have been really important works on social media, but what we've been focusing on are two other media for spreading information that are a little bit more overlooked, in particular search engines. Search engines are typically seen as neutral, even as a gateway to truth, and people tend to really believe what they see if they search for it.
00:17:14:18 - 00:17:40:09
Joana Gonçalves-Sá
So they go on, let's say, Google or another search engine and they search for something, and typically the top-ranking results especially are seen as the truth. And of course that's not true. And of course the search engines also personalize what they show us based on many different things, including our location, other searches that we did in the past, or even our browsing history.
00:17:40:11 - 00:17:58:00
Abigail Acton
So can I ask you then, what did FARE_AUDIT develop to try and help people to recognize that they might be being pushed in a certain direction? I understand that you developed a tool that could perhaps be useful for journalists and democracy watchdog associations to track disinformation. Can you tell me a little bit more about that?
00:17:58:01 - 00:18:18:07
Joana Gonçalves-Sá
Exactly. So, because search engines are so difficult to audit, the algorithms are proprietary black boxes, and we did not want to rely on real people's data because of privacy concerns and also biases in the samples, we developed a system of web crawlers, typically called bots. And what these bots do is mimic human behavior.
00:18:18:07 - 00:18:38:12
Joana Gonçalves-Sá
So we have this little army of bots, and they browse online and they collect cookies and they pretend to be people. And we can have them pretend to be people in different locations, using different languages, and even possibly of different genders or ages based on their browsing history. And then they go to the search engines and they do exactly the same query at the exact same time.
00:18:38:14 - 00:19:02:03
Joana Gonçalves-Sá
And then we compare what the different search engines show as results for these queries. And we've been running different studies using exactly this methodology. So let's say we have a study using the current Israel-Palestine conflict, in which the bots can be located just across the border from each other, and they make the same queries.
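As a rough sketch of the comparison step Joana describes, two bot personas issue the same query and the audit measures how much their ranked results diverge. This is not FARE_AUDIT's actual code; the result URLs are invented, and fetch_results is a placeholder for what would, in a real audit, be a crawler with its own cookies, location and browsing history.

```python
# Illustrative comparison of search results seen by two different bot personas.
def fetch_results(query: str, persona: str) -> list[str]:
    """Placeholder: return ranked result URLs for `query` as seen by `persona`."""
    canned = {
        "persona_A": ["news-a.example/story1", "blog-b.example/op-ed", "wiki.example/topic"],
        "persona_B": ["partisan-c.example/claim", "news-a.example/story1", "blog-d.example/rant"],
    }
    return canned[persona]

def jaccard(a: list[str], b: list[str]) -> float:
    """Overlap between two result sets (1.0 = identical, 0.0 = disjoint)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

query = "who should I vote for"
results_a = fetch_results(query, "persona_A")
results_b = fetch_results(query, "persona_B")

print(f"Overlap for {query!r}: {jaccard(results_a, results_b):.2f}")
print("Only shown to persona A:", set(results_a) - set(results_b))
print("Only shown to persona B:", set(results_b) - set(results_a))
```

Low overlap on a neutral query, or a consistent lean in the results shown only to one persona, is the kind of signal the audit then examines more closely.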
00:19:02:05 - 00:19:25:22
Joana Gonçalves-Sá
And then we see not only whether the results are very different, but whether they have particular leanings or whether there are particular biases. And in this particular case, I think it's interesting because location is usually seen as neutral in terms of profiling. If I search for a restaurant near me, I want a restaurant that is near me and not a restaurant that is very, very far away.
00:19:25:24 - 00:19:48:10
Joana Gonçalves-Sá
But if we are talking about a geopolitical conflict, showing different information on the same query to people who live in different countries can be extremely problematic. We also ran two audits: one for the European parliamentary elections, in which bots placed in different European countries asked things like 'Who should I vote for?' or 'What's the best party?'
00:19:48:12 - 00:20:10:11
Joana Gonçalves-Sá
And another for the US presidential elections, the last ones in 2024. And then we can see how easily general and sometimes neutral queries return very, very different results, and in this case, for instance in the case of the European Parliament, very biased results towards...
00:20:10:11 - 00:20:13:24
Abigail Acton
Oh, so you actually did see some bias coming through. Oh yes.
00:20:14:01 - 00:20:22:14
Joana Gonçalves-Sá
Very much so. And I could probably have you guess in which direction, on the left-right spectrum, they were biased.
00:20:22:20 - 00:20:29:10
Abigail Acton
Yeah. Okay. How interesting. Slightly scary though as well. Did you find your results sort of... I presume you found it slightly disturbing?
00:20:29:15 - 00:20:55:06
Joana Gonçalves-Sá
Yes, it's scary. And in reality we found biases that are quite clear. So it's not very common that many of these parties are mentioned, but every time they are mentioned, they belong to a specific family. And given that these search engines are so widely used, and so many millions of people turn to them every day, even if the biases were small and rare, it would still be worrisome, because the signal can be very much amplified.
00:20:55:08 - 00:21:24:09
Joana Gonçalves-Sá
But also, we now have this tool that can be used by researchers, and we're trying to adapt it for use by journalists and the general public. We are also informing the EU, with a pilot under the Digital Services Act, to audit very large online search engines and to try to identify biases, because it's also possible that the search engines are actually being manipulated by political agents.
00:21:24:09 - 00:21:33:21
Joana Gonçalves-Sá
They may be using search engine optimization or other systems to amplify their signal without even the search engines realizing that this is happening.
00:21:33:21 - 00:21:44:23
Abigail Acton
All right. So sort of sneaking through some sort of back door. Gosh, yes. That's really an eye-opener. Thank you. Does anyone have any questions or observations on Joana's fabulous work? Yes, Owen, what would you like to say?
00:21:45:00 - 00:21:47:08
Owen Conlan
Yeah, this is really fantastic work, Joana.
00:21:47:08 - 00:21:48:14
Abigail Acton
Isn't it?
00:21:48:16 - 00:22:12:05
Owen Conlan
Have you considered deploying the bots to look at the AI responses we're starting to see in searches now? I know Google Search, for example, has an AI overview on many responses. I think the opportunity for bias and steering people could be very significant. It will be intriguing to see how it tunes the results based on location or its understanding of the user.
00:22:12:07 - 00:22:13:19
Owen Conlan
Is this something you're considering?
00:22:14:00 - 00:22:36:23
Joana Gonçalves-Sá
Yes, and thank you very much for the question. So we actually extended this audit to large language models, like ChatGPT and Copilot, and asked similar questions: who should I vote for, what are the best parties? And we even introduced a gender component, such as 'As a woman, who should I vote for?' or 'As a man, who should I vote for?'
00:22:36:23 - 00:23:06:04
Joana Gonçalves-Sá
And the results are also very biased in the same direction, and they become even more biased if you introduce gender. Right? So for these ones we went directly to the platforms. But now, of course, with search engines integrating these tools, we can also audit them directly, on both the search results and the AI answers that they provide, and see if the biases remain, or if they are amplified, or if they're gone, if they're fixed.
00:23:06:06 - 00:23:31:05
Abigail Acton
Super. Thank you very much. Gosh, it's very comprehensive. It sounds like a bit of a... it's almost like a race, isn't it, to try and keep up and to keep coming up with innovative ways of identifying manipulation as fast as new ways of manipulation appear. That's very good. Thank you very much. I'm going to move to Marián now. Marián, the aim of the DisAI project, based in Slovakia, was to develop trustworthy AI technologies and tools for low-resource languages.
00:23:31:05 - 00:23:45:05
Abigail Acton
We've just been talking about linguistic models and so on. So here we're interested in combating the growing threat of online disinformation in languages that perhaps have less of a footprint. Can you tell us why you got involved in this in the first place, Marián?
00:23:45:07 - 00:24:15:08
Marián Šimko
Yeah. As a researcher, I'm amazed how technology can help us in the plethora of tasks we do on a daily basis. Most of them are related to various forms of communication, and that's our nature as humans. And methods and techniques from the field called natural language processing aim to support us in such everyday routines when searching, understanding, creating or transforming information.
00:24:15:10 - 00:24:39:18
Marián Šimko
We have applications of this technology in our pockets today, for example when filtering email spam, being recommended daily news, or having summarized reviews of products or services we want to buy. And utilizing language technology against disinformation I think is important, and particularly motivating, as we are doing it for the social good.
00:24:39:20 - 00:25:04:11
Marián Šimko
And what is interesting here is that the phenomenon of disinformation is not new for us. In fact, it is as old as humankind. But what makes it special is the power of the technology that we have today, which significantly amplifies its impact: I mean the amount of information, instant access to information, the speed of spreading, and the rise of generative AI and its decreasing cost.
00:25:04:13 - 00:25:06:17
Marián Šimko
So, yeah, that's an issue.
00:25:06:17 - 00:25:26:14
Abigail Acton
Right. Absolutely. And we've been talking so far about the ability of AI to understand language being used in order to, you know, push information at us. DisAI is doing work to open that up to languages that are less often used. So can you tell us a little bit more about the work of your project, trying to, as I say, open it up to less used languages?
00:25:26:18 - 00:25:53:10
Marián Šimko
Yeah. First of all, the quality or performance of recent natural language processing applications, which nowadays widely utilize deep neural networks, is based on the quantity of data used for training. For instance, large language models are trained in such a way that they read huge amounts of text from the internet and learn to predict the next word in a sentence.
00:25:53:10 - 00:26:13:13
Marián Šimko
For example, if we have 'the sky is', we want to guess 'blue': the sky is blue. Such models learn patterns, grammar and facts just by trying to predict the next word over and over. And this is the foundation for many state-of-the-art approaches, and basically it feeds into any natural language processing task that we have at hand.
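To make the next-word-prediction objective Marián mentions concrete, here is a deliberately tiny sketch. Real large language models use neural networks trained on vast corpora; this bigram counter only illustrates the idea of predicting the next word from what came before, using a made-up three-sentence corpus.

```python
# Toy next-word predictor: count which word follows which in a tiny corpus.
from collections import Counter, defaultdict

corpus = "the sky is blue . the sea is blue . the sun is bright .".split()

# Count, for each word, what follows it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("is"))   # -> 'blue' (seen twice, versus 'bright' once)
```

The point of the illustration is the one Marián makes next: with very little text in a language, there is simply not enough signal to learn good predictions, which is exactly the low-resource problem.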
00:26:13:15 - 00:26:42:07
Marián Šimko
And this includes the tasks that we are dealing with in the DisAI project, which are related to combating disinformation. The issue is when less often used languages are involved. For such languages, there isn't enough content to provide a strong foundation for these models. So, as a result, these models may struggle to understand or generate coherent text in those languages.
00:26:42:09 - 00:27:00:22
Marián Šimko
Most of the data is in English, Chinese, Spanish, etc., where these models are far better. And this leads to unequal performance, and speakers of low-resource languages get poorer quality responses. So it's less accurate, less useful.
00:27:00:22 - 00:27:05:09
Abigail Acton
Right. And so what did the DisAI project do to try and ameliorate the situation?
00:27:05:09 - 00:27:35:00
Marián Šimko
Yeah. We focus on fact-checkers, who are important actors in this endeavor, as well as on ordinary users of natural language processing technology. They also have a more difficult job when dealing with languages other than English, let's say. And in the DisAI project, we focus on developing new approaches for language processing that can help with that.
00:27:35:00 - 00:27:58:07
Marián Šimko
So that can improve performance in low-resource languages. And, yeah, our motivation is natural: our native language is Slovak, and it is a vivid example of a low-resource language. And the spreading of disinformation is still a serious issue in Slovakia. And, similar to other East European countries, democracy here is, let's say, more fragile.
00:27:58:07 - 00:28:07:04
Marián Šimko
So it is important to improve those methods and to help make the work of fact-checkers easier.
00:28:07:04 - 00:28:23:22
Abigail Acton
Absolutely. It's vital work you're doing; I can totally see that. Can I ask you what the project actually developed in terms of techniques to try and meet this challenge? I can see very clearly what the goal is and the motivation, but what did you actually do, or what is still ongoing?
00:28:24:01 - 00:28:46:21
Marián Šimko
Yeah, exactly. So in DisAI, my team, we are trying to develop methods, techniques and tools that can ease the work of fact-checkers. And in their work there are different tasks that can be supported by language technologies. We particularly focus on the so-called fact-checked claim retrieval task. This is one of the four or five most important tasks in the pipeline.
00:28:46:23 - 00:29:19:02
Marián Šimko
And simply put, when a fact-checker comes upon a new claim, for instance 'vaccination changes human DNA', they want to know whether it was already fact-checked by somebody else before, or at least it can help them significantly, because the creation of fact-checks is quite demanding. And it's good if we can check all the languages and see whether it was already fact-checked by somebody else, I mean also in Portuguese or Indonesian, for example.
00:29:19:02 - 00:29:25:05
Marián Šimko
And this can significantly reduce the effort they put into the creation of fact-checks.
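Fact-checked claim retrieval of the kind Marián describes, matching a new claim against an archive of existing fact-checks across languages, is commonly implemented with multilingual sentence embeddings. The sketch below assumes the sentence-transformers library and a public multilingual model; it illustrates the task itself, not necessarily the methods developed in DisAI, and the archived claims are invented examples.

```python
# Sketch of cross-lingual fact-checked claim retrieval with sentence embeddings.
from sentence_transformers import SentenceTransformer, util

# Previously fact-checked claims, in several languages (illustrative examples).
fact_checked = [
    "Vaccines do not alter human DNA.",           # English
    "As vacinas não alteram o ADN humano.",       # Portuguese
    "Vakcíny nemenia ľudskú DNA.",                # Slovak
    "5G towers do not spread viruses.",
]

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
claim_embeddings = model.encode(fact_checked, convert_to_tensor=True)

# A new claim arrives, possibly in a different language than the archive.
new_claim = "Očkovanie mení ľudskú DNA."   # Slovak: "Vaccination changes human DNA."
query_embedding = model.encode(new_claim, convert_to_tensor=True)

# Rank archived fact-checks by cosine similarity to the new claim.
scores = util.cos_sim(query_embedding, claim_embeddings)[0]
best = scores.argmax().item()
print(f"Most similar existing fact-check: {fact_checked[best]!r} "
      f"(score {scores[best].item():.2f})")
```

Because the embedding model maps semantically similar sentences close together regardless of language, a Slovak claim can retrieve a relevant fact-check written in English or Portuguese, which is exactly the time-saving Marián points to.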
00:29:25:07 - 00:29:33:12
Abigail Acton
Okay. That's a nice example. Thank you very much for that. Super. Okay, well explained, Marián, thanks. So, has anyone got any observations or comments for Marián? Yes, Owen, please.
00:29:33:14 - 00:29:55:11
Owen Conlan
Marián, this is essential work. We see this all of the time as well in our attempts to combat disinformation: the lack of data sets in different language contexts, and the fact that, of course, the police are trying to combat disinformation that may not be in the language they speak themselves and can often be in a language that's not well represented in data sets.
00:29:55:17 - 00:30:15:24
Owen Conlan
One area where we recently saw a particular challenge was finding data sets for hate speech in German. And you know, that sounds like something you should be able to discover relatively easily, but whether they exist really depends on where the focus is and on the researchers in those linguistic contexts. So we tried to translate hate speech from English to German.
00:30:15:24 - 00:30:36:21
Owen Conlan
And as you can understand, that didn't work very well. Simple things like portmanteaus, phrases like 'Kill-ary Clinton', make sense to us as a hateful term in an English context, and that is a hateful term, but you can't translate that. So there are so many contextually and culturally tied aspects. That's why this work is essential.
00:30:36:23 - 00:30:57:13
Abigail Acton
Well, the work of all of you is absolutely essential, and I've just enjoyed so much listening to you and hearing what you've been up to and what you've achieved so far. And of course, as I said earlier, it's ongoing, a race against the disinformation techniques that are pushed our way. So thanks very much for that and for doing what you're doing.
00:30:57:14 - 00:31:02:14
Abigail Acton
To try and clarify our online lives a little bit. Thank you very much.
00:31:02:16 - 00:31:03:16
Joana Gonçalves-Sá
Thank you for having us.
00:31:03:17 - 00:31:04:09
Marián Šimko
Thank you.
00:31:04:13 - 00:31:36:15
Abigail Acton
Been a pleasure. You're very welcome. Bye bye. Take care. If you've enjoyed this podcast, follow us on Spotify and Apple Podcasts and check out the podcast homepage on the CORDIS website. Subscribe to make sure the hottest research and EU-funded science isn't passing you by. And if you're enjoying listening, why not spread the word? We've talked about how our gut influences our brains, the latest technology that is helping scene-of-crime investigators in cases of rape, and how to land a probe on an asteroid.
00:31:36:17 - 00:32:04:19
Abigail Acton
In our last 47 episodes, there'll be something there to tweak your curiosity. Perhaps you want to know what other EU funded projects are doing to turn back the tide of digital disinformation? The CORDIS website will give you an insight into the results of projects funded by Horizon 2020 and Horizon Europe that are working in this area. The website has articles and interviews that explore the results of research being conducted in a very broad range of domains and subjects, from dodos to neutrinos.
00:32:04:23 - 00:32:29:10
Abigail Acton
There's something there for you. Maybe you're involved in a project or would like to apply for funding. Take a look at what others are doing in your domain. So come and check out the research that's revealing what makes our world tick. We're always happy to hear from you. Drop us a line editorial@cordis.europa.eu. Until next time.
Latest moves in the complex race to identify and counter cyber disinformation
Content can lead to hate crimes and other violence, but many European Police Authorities do not have access to any specialised tools or technologies to help them tackle the issue – how can they be helped? As individuals, how can we establish if we are being manipulated? We are increasingly exposed to cyber (dis)information, either passively, through social media feeds, or actively, by using search engines and specific websites that guide us to sites that reinforce our biases and build walls of prejudice. Companies are making some effort to identify and remove fake-news websites, and minimise the spread of disinformation on social media, but what about the search engines themselves? Could web crawlers provide an innovative way to help us audit their activity? The spread of cyber disinformation threatens our democratic values. As the amount of disinformation grows, AI, and language technologies in particular, have a crucial role in detecting it. Machine learning and AI train on large language models, but what about languages that have a smaller footprint online – those that are used less frequently? How can we strengthen AI to combat disinformation in what are called ‘low resource’ languages? Listen on to hear how these and other cyber risks are being tackled with the help of EU research funding. Owen Conlan is a fellow of Trinity College, Dublin, and professor in the School of Computer Science and Statistics. He is also co-director of the Trinity Centre for Digital Humanities. Owen is very interested in user control over personalised AI-driven systems, which he explored through the VIGILANT project. Joana Gonçalves-Sá is a researcher both at the Nova Laboratory for Computer Science and Informatics and in the Laboratory of Instrumentation and Experimental Particle Physics, Lisbon, where she leads the Social Physics and Complexity research group. Her focus is on human and algorithmic biases, using fake news as a model system, the subject of her FARE_AUDIT project. Marián Šimko is an expert researcher at the Kempelen Institute of Intelligent Technologies in Slovakia. He focuses on natural language processing, information extraction, low-resource language processing and the interpretability of neural models. The DisAI project focused on developing new approaches for language processing to improve the performance of large language models for less frequently used languages.
Happy to hear from you!
If you have any feedback, we’re always happy to hear from you! Send us any comments, questions or suggestions to: editorial@cordis.europa.eu.