Meet the Innovators: Samuel Frontull on Creating Machine Translation to Promote Ladin

14.11.2024

Samuel Frontull, a 28-year-old researcher at the University of Innsbruck (Austria), has created the first machine translation system for Ladin, a minority language spoken by around 30,000 people in the northern Italian provinces of South Tyrol, Trentino, and Belluno. His tool, accessible via tradutur-informatik.uibk.ac.at, supports translations between Ladin-English, Ladin-German and Ladin-Italian. Samuel’s work has drawn significant media attention, making him a prominent figure in the digitalisation of Ladin. In this interview, Samuel, who is also a speaker at the upcoming 8th Forum of European Minority Regions (26-27 November 2024 in San Sebastián/Donostia, Basque Country), discusses the inspiration behind his project, the challenges of promoting a small language, and his vision for the future of language technology.

Samuel, could you briefly explain what exactly you have developed, and how it benefits the Ladin language?
Over the past three years, through a research project at the University of Innsbruck in collaboration with the Ladin Cultural Institute “Micurá de Rü”, I’ve been developing a machine translation system for the Ladin language. We approached this as a research project because it was initially uncertain whether a translation system could be created for a smaller language with limited data resources. Now, after three years of dedicated work, we have an online tool that can translate texts to and from Ladin (currently for the Val Badia variant). Although this tool is still limited to this specific variant and has room for improvement, it’s already practical and can offer helpful guidance. It benefits the Ladin language by enabling those interested to engage with it and by supporting those who use it professionally, making the language more accessible and appealing.

What inspired you to develop a machine translation system for Ladin?
The idea came up during my studies at the University of Innsbruck, where I suggested this topic for my final thesis as part of the Data Science continuing education programme. While I found it a fascinating data science challenge, at its core, this project probably grew from my long-standing desire to make Ladin more visible and accessible. It’s a unique combination of my computer science background and my native language, which makes it especially meaningful to me. With funding from the Regione Autonoma Trentino-Alto Adige/Südtirol and support from the Ladin Cultural Institute “Micurá de Rü”, I was granted the opportunity to continue working on it and bring it to its current level.

How do you see your work impacting the future of the Ladin language?
It’s hard to predict, as this is just a digital tool. If I had done something to address issues like the high cost of living in Ladin-speaking areas, I could more confidently say I’d strengthened the language’s future. Nevertheless, it can help preserve Ladin, and it opens up exciting possibilities for further applications. Ideally, this project will inspire new initiatives to increase the language’s visibility and accessibility.

What challenges have you encountered while digitalising Ladin?
Ladin is in a relatively good position because it is still actively used in everyday life, is being promoted, has a media presence, is taught in schools, and studied academically. This provides an important foundation for the development of a machine translation system, as it ensures the availability of high-quality resources and a wider potential user base. Still, I faced several challenges, especially in gathering and preparing the data. One unique aspect of Ladin is its diversity, with different variants that each have their own spelling conventions. This diversity presents a challenge, as each variant requires a tailored approach, and high-quality texts must be collected for each one. For the Val Badia variant, a significant hurdle came from the 2015 spelling reform, which meant older texts needed preprocessing to align with the updated standards.

How has the media responded to your work, particularly in Italy and Austria?
In 2021, the project received an award from the Eduard Wallnöfer Foundation of Tyrolean Industry, which was covered by local media, including Rai Ladinia, Rai Südtirol, ORF’s Südtirol Heute programme, and the local newspapers Dolomiten and La Usc di Ladins. These media outlets have been very supportive, providing essential text resources and showing consistent interest in the project’s progress.

What are your plans for the future of language technology and minority languages?
Improved solutions are needed to make current technologies more effective for minority languages. Most modern systems rely on large datasets to mimic language patterns – a method that’s effective for major languages but often falls short in low-resource scenarios. Adapting these systems can also be challenging, as their internal workings aren’t easily interpretable. Solutions that incorporate linguistic knowledge could enable more advanced applications for minority languages. I envision future developments such as speech synthesis, voice recognition, intelligent writing assistants, innovative language courses, and storytelling tools. I would like to contribute to this ongoing journey.

Further Resources on Samuel’s Work:

Ladin Machine Translator: Explore Samuel’s tool, which supports translation pairs for Ladin-English, Ladin-German and Ladin-Italian (Val Badia variant of Ladin).
Stol.it Article: „Das Ladinische zur Welt hin öffnen“ (in German)
Lausc.it Article: Traduziun automatica ladina: pest por Samuel Frontull (in Ladin)
Scientific Paper: Traduzione automatica “neurale” per il ladino della Val Badia (in Italian); report that discusses the advancements in developing a machine translation system for the Ladin language using neural models
Scientific Paper: Rule-Based, Neural and LLM Back-Translation: Comparative Insights from a Variant of Ladin (in English); research that explores the impact of different back-translation approaches on machine translation for Ladin (Val Badia variant), including fine-tuned neural networks, rule-based systems, and large language models
