The GoURMET project fully achieved its objectives. We describe the work done towards them in more detail.
Objective 1: Advancing low-resource deep learning for natural language applications
The core objective of GoURMET was the development of methods for low-resource deep learning
that make the best possible use of small amounts of training data.
Our success in completing this objective has been significant. The research
pursued in this project is reflected in our scientific publications (at the time of writing, 74 publications
shared in an open-access manner via OpenAIRE). Further work is under review or has been
published as pre-prints or theses. We have also released
42 repositories of research software accompanying these publications.
Objective 2: Development of high-quality machine translation for under-resourced
language pairs and domains
We addressed this objective by pursuing research into improving the data collection pipeline.
We have released 19 data sets, both parallel data sets and monolingual data sets.
The most significant part of this objective was accomplished by following a 9-month cycle
of building, delivering, and evaluating translation models for low-resource languages. We
completed four rounds of translation model building, all into and out of English. We delivered the
following translation models: Gujarati, Bulgarian, Turkish, Swahili, Amharic, Kyrgyz, Serbian, Tamil,
Hausa, Igbo, Amharic, Macedonian, Turkish v2, Burmese, Yoruba and Urdu.
We developed one further language pair as part of our Surprise Language Challenge (Pashto-English).
We prioritised languages that are strategically important for the BBC and DW. From the resulting
shortlist, the research partners then selected languages that were interesting for their research and
that offered a variety of resources to work with. These models were released to the public.
Objective 3: Development of tools for analysts and journalists
This objective relates to the design and implementation of interfaces to translate, edit and evaluate
translations.
This objective was met through the development of the GoURMET
translation platform. A demo front end is also available and is linked from our website.
The main component of this objective was achieved through six products and
prototypes developed and tested by the user partners:
BBC - Frank, Live Page Translation, Graphical Storytelling Tool
DW - plainX, SELMA, Benchmarking Prototype
All BBC prototypes remain active, and Live Page Translation has been used extensively by
journalists reporting on the Russian invasion of Ukraine. Frank is being developed further to expand
compatibility with additional content management tools and workflows for World Service. At DW,
plainX is being rolled out as a multilingual adaptation and subtitling platform and is used in Media
Production, and GoURMET models have also been deployed in other services through SELMA.
Objective 4: Sustainable, maintainable platform and services
This objective concerns the development of a plan for sustainable exploitation and use of the
platforms, systems and technologies developed in GoURMET. We organised a number of events engaging
with the media industry to encourage interest in low-resource translation and uptake of the resources we have provided.
Objective 5: Dissemination and communication of project results to stakeholders
We engaged with organisations from all along the innovation chain, including broadcasters,
commercial players, governmental/EU agencies in the digital single market and relevant research
communities. One of the ways we encouraged the field of machine translation to focus on low-resource languages
was by developing test and training sets for WMT shared tasks: Gujarati (WMT 2019), Tamil (WMT
2020) and Hausa (WMT 2021). These tasks attracted a significant number of participants, and these languages
were later included in the translation services of a number of large industrial labs.