Global Under-Resourced MEedia Translation

Project information

GoURMET

Grant agreement ID: 825299

DOI

10.3030/825299

Closed project

EC signature date: 25 October 2018

Start date: 1 January 2019

End date: 30 June 2022

Funded by

INDUSTRIAL LEADERSHIP - Leadership in enabling and industrial technologies - Information and Communication Technologies (ICT)

Total cost

€ 2 906 098,75

EU contribution

€ 2 906 098,75

Coordinated by

THE UNIVERSITY OF EDINBURGH
United Kingdom

Periodic Reporting for period 2 - GoURMET (Global Under-Resourced MEedia Translation)

Reporting period: 2020-07-01 to 2022-06-30

The BBC and DW are the flagships of European news broadcasting and are renowned around the world for their
editorial independence and accurate news reporting. Both are acutely aware of their
responsibility and want to meet the demands of their global audiences with innovative technology.
Machine translation (MT) is a key technology here. It allows analysts and
journalists to gather information quickly and effectively across very diverse languages, and to
understand how events are being perceived and reported on. It is also a powerful tool for
speeding up the dissemination of news reports in multiple languages.

Machine translation technology has delivered models for well-resourced languages, and their quality
has improved enormously over the last few years. However, there are many more
language pairs where little or no translated text is available for training translation models. Some
of these under-resourced languages have large speaker populations that are important for commercial,
strategic or humanitarian reasons.

GoURMET was structured around five objectives, all of which were fully achieved
over the course of the project:
- Advancing low-resource deep learning for natural language applications
- Development of high-quality machine translation for under-resourced language pairs and domains
- Development of tools for analysts and journalists
- Sustainable, maintainable platform and services
- Dissemination and communication of project results to stakeholders

The work done towards each objective is described in more detail below.

Objective 1: Advancing low-resource deep learning for natural language applications
The core objective of GoURMET was to develop methods for low-resource deep learning
that can learn effectively from small amounts of training data.
We made significant progress towards this objective. The strength of the research
pursued in this project is reflected in our scientific publications: at the time of writing, 74 publications
have been shared in an open-access manner via OpenAIRE. Further work is under review or has been
published as pre-prints or theses. We have also released
42 research software repositories that accompany these publications.

Objective 2: Development of high-quality machine translation for under-resourced language pairs and domains
We addressed this objective partly through research into improving the data collection pipeline.
We released 19 data sets, both parallel and monolingual.
The most significant part of this objective was accomplished by following a 9-month cycle
of building, delivering and evaluating translation models for low-resource languages. We
completed four rounds of translation model building, all into and out of English. We delivered the
following translation models: Gujarati, Bulgarian, Turkish, Swahili, Amharic, Kyrgyz, Serbian, Tamil,
Hausa, Igbo, Amharic, Macedonian, Turkish v2, Burmese, Yoruba and Urdu.
We developed one further language pair (Pashto-English) as part of our Surprise Language Challenge.
Languages were chosen in two steps. First, we prioritised languages that are strategically important for the BBC and DW.
Then the research partners selected from this shortlist the languages that were interesting for their research and
offered a variety of resources to work with. These models were released to the public.

Objective 3: Development of tools for analysts and journalists
This objective relates to the design and implementation of interfaces to translate, edit and evaluate
translations.
This objective was enabled by the development of the GoURMET
translation platform, which also has a demo front end linked from our website.
The main component of this objective was the development of six products and
prototypes, built and tested by the user partners:
BBC - Frank, Live Page Translation, Graphical Storytelling Tool
DW - plainX, SELMA, Benchmarking Prototype
All BBC prototypes remain active, and Live Page Translation has been used extensively by
journalists reporting on the Russian invasion of Ukraine. Frank is being developed further to extend
its compatibility with additional content management tools and workflows for the World Service. At DW,
plainX is being rolled out as a multilingual adaptation and subtitling platform and is used in Media
Production, and GoURMET models have also been deployed in other services through SELMA.

Objective 4: Sustainable, maintainable platform and services
This objective concerns the development of a plan for the sustainable exploitation and use of the
platforms, systems and technologies developed in GoURMET. We organised a number of events with
the media industry to encourage interest in low-resource translation and uptake of the resources we provided.

Objective 5: Dissemination and communication of project results to stakeholders
We engaged with organisations from all along the innovation chain, including broadcasters,
commercial players, governmental/EU agencies in the single digital market and relevant research
communities. One way we encouraged the machine translation community to focus on low-resource languages
was by developing test and training sets for WMT shared tasks: Gujarati (WMT 2019), Tamil (WMT
2020) and Hausa (WMT 2021). These tasks attracted significant participation, and these languages
were later included in the translation services of a number of large industrial labs.

GoURMET delivered three main use cases which have encouraged our media partners to adopt
innovative tools and workflows:
- Global content creation – managing content creation in several languages efficiently by
providing machine translations for correction by humans;
- Media monitoring for low-resource language pairs – tools to address the challenge of
monitoring media in strategically important languages;
- Health-related reporting – reliably translating and analysing news in the highly
specialised health domain.
As part of these use cases, GoURMET translation models, prototypes and products have been
deployed and evaluated within the BBC and DW.

GoURMET has published many peer-reviewed papers on improving low-resource machine translation.
We have led shared tasks on Gujarati, Tamil and Hausa that have had wide participation from
academic and industry groups. Our research and data have been adopted by the field.

As a project we have helped to push the field of low-resource machine translation forward.
The field has advanced very quickly. Low-resource methods were typically tested on small amounts of
English-German data, whereas recent research trains models on 10, 20 and, most recently, 200 languages.
Very large, massively multilingual pre-trained language models are currently maturing as
a powerful way of leveraging monolingual and parallel data to improve translation for low-resource
languages. We have been an integral part of this journey, and
we explore these changes in depth in our survey paper (Haddow et al., 2022).
We have also completed a detailed study of the advantages of massively multilingual models
for our surprise language challenge (Birch et al., 2021). This study provides a blueprint for future researchers interested in
applying these models to new low-resource languages.
