Meet the Innovators: Alexandru Jerpelea on Pioneering Machine Translation to Preserve Aromanian
12.11.2024
Alexandru Jerpelea, a 17-year-old high school student from Romania’s capital Bucharest, has developed the first neural machine translation system for Aromanian, a minority Romance language spoken by around 200,000 people across the Balkans. His tool, accessible via AroTranslate.com, supports translation for the Aromanian-Romanian, Aromanian-English and English-Romanian language pairs. His dedication to computational linguistics has attracted national attention in Romania, with media outlets featuring his work. In this interview, Alexandru, who is also a speaker at the upcoming 8th Forum of European Minority Regions (26-27 November 2024 in San Sebastián/Donostia, Basque Country), shares the inspiration behind his project, the challenges he faced, and what the work means for the preservation of Aromanian.
Alexandru, could you briefly explain what exactly you have invented and how it benefits the Aromanian language?
Together with Sergiu Nisioi from the University of Bucharest and with support from the Aromanian community across Romania and beyond – particularly with help from Florentina Costea, who introduced me to the community – we created the first neural machine translation system for Aromanian, an Eastern Romance language. The project required building a dataset of over 80,000 Romanian-Aromanian sentence pairs collected from various sources, along with a data collection pipeline tailored to Aromanian. This translation tool, powered by AI, aims to make Aromanian more accessible. You can find more technical details of the project in our publication: https://arxiv.org/abs/2410.17728.
What inspired you to develop the first machine translation system for Aromanian?
My inspiration came from self-studying computational linguistics as well as from witnessing other machine translation projects aimed at endangered languages like Sami and Cherokee. Seeing these efforts for other languages motivated me to create something similar for Aromanian.
What does it mean to you to contribute so significantly to the preservation of the Aromanian language?
I’m proud that this project has drawn attention to Aromanian preservation. Online articles and social media posts are helping raise awareness within the general public. In academia, I hope our corpus will inspire more research on Aromanian digitalisation. Although our project is a notable step forward, it remains a prototype with limitations, and Aromanian still faces challenges. While this project alone won’t “save” the language, I hope we’ve taken a step in the right direction.
What challenges did you encounter while developing this translation system?
The biggest challenge was data collection. Although our corpus (dataset) is the largest of its kind for Aromanian, it’s still quite small compared to high-resource languages, which often have datasets with millions or even billions of sentences. We aim to expand and improve on this.
What reactions have you received from the Aromanian community regarding your invention? How has the media in Romania responded to your work?
The Aromanian community has been very enthusiastic and offered constructive feedback on the software’s errors, which has been valuable. Major Romanian media outlets featured our project, generating excitement from non-Aromanian speakers as well. We received messages from people unaware of the language’s situation, and we’re glad to help raise awareness of its challenges.
What are your plans for future projects related to language technology?
Next year, I’ll be starting university, where I plan to continue studying NLP (Natural Language Processing) and contribute to more projects for low-resource languages. Once the university admission process settles down, I’ll return to further developing the Aromanian translation system.
Further Resources on Alexandru’s Work:
- AroTranslate – Aromanian Machine Translator: Explore Alexandru’s tool, which supports the Aromanian-Romanian, Aromanian-English and English-Romanian translation pairs.
- Digi 24 Interview on YouTube: Watch Alexandru’s TV interview, where he shares insights into his translator’s development and impact (in Romanian).
- Libertatea Article: Read a feature article about Alexandru’s journey, highlighting community reactions and the project’s importance (in Romanian).