CORDIS - EU research results

Resolving the Paradoxes of Cross-lingual Transfer in Multilingual Language Models

Objective

The technical advances of Large Language Models (LLMs), and the societal opportunities they create, have principally benefited communities whose primary languages are well represented in the written data used to train LLMs (e.g. English). While these few high-resource languages are used by many people around the world, they do not cover large segments of the global population of 8.2 billion, who collectively speak over 7000 languages. For intelligent natural language systems to be adopted and useful, they must support interaction in their users' preferred languages and be knowledgeable about the environments those users live in. Extending LLM functionality in this way requires rethinking the cross-lingual transfer paradigm used to enable systems in low-resource languages. In an era where LLMs serve as knowledge bases, naive reasoners, and interactive agents, the intuitions that held for cross-lingual transfer on linguistic tasks will not extend to transferring regional and cultural knowledge, which may differ even between similar languages.

In this proposal, we first reformulate cross-lingual transfer using inference-time algorithms that dynamically localize, augment, and adapt the implicit language and knowledge representations of multilingual LLMs for queries presented in any language. These algorithms will leverage shared linguistic knowledge to transfer to new languages, while disentangling regional and cultural knowledge that is tied to a language but unique to individual language environments. Second, we will develop novel modular architectures that facilitate these adaptation algorithms by disentangling language and knowledge representations within multilingual LLMs during pretraining. Finally, we will develop new benchmarks, settings, and standards for reliably evaluating regional knowledge in multilingual contexts.

Funding Scheme

HORIZON-ERC - HORIZON ERC Grants

Call for proposal

ERC-2025-STG

Host institution

ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE
Net EU contribution

€ 1 499 597,00
Address
BATIMENT CE 3316 STATION 1
1015 LAUSANNE
Switzerland

Region
Schweiz/Suisse/Svizzera Région lémanique Vaud
Activity type
Higher or Secondary Education Establishments
Total cost

€ 1 499 597,00

Beneficiaries (1)
