CU at the forefront of European AI research

Tuesday, 11 March 2025 23:26

“Our main goal is to produce a language model that will compete with existing models and, moreover, will work very well for all European languages,” says Professor Jan Hajič from the Institute of Formal and Applied Linguistics (ÚFAL) at the Faculty of Mathematics and Physics, Charles University, who is in charge of the giant European project OpenEuroLLM. Research institutions, computing centres and companies from all over Europe are collaborating in the project, with Charles University as its main coordinator. This activity is intended to result in the creation of large-scale, next-generation open language models that will support the development of Europe's AI capabilities.

“This is a happy milestone for us, as it is a very large international project involving many countries and connecting the European space at a new level. Equally important, it is a collaboration with application partners across Europe. We often hear that Europe is not the dragon in the field of innovation compared to other areas of the world, but these projects and the work of Professor Hajič, and in general the scientific activity that the faculty can be proud of, contribute to making Europe an innovative environment,” the Rector of Charles University Milena Králíčková said, opening proceedings.

“We are proud of the number of projects we have recently been awarded,” said the Dean of the Faculty of Mathematics and Physics, Mirko Rokyta, who spoke after the rector. He added that the faculty is currently working on almost 450 different projects, almost a hundred of which are foreign and five of which are prestigious ERC projects. “However, a project of the scale that Professor Hajič and his team have been awarded is something absolutely exceptional. It involves eleven foreign universities and institutions, five foreign companies and four major European computing centres,” the Dean continued.

“It is amazing that the coordination of such an important project has fallen on us, so to speak. But the fall is not automatic, it needs to fall on those who are ready, who are able to handle such a project and show foreign partners that they have the skills to coordinate something so huge,” Rokyta said appreciatively, adding that the enthusiasm runs up the hierarchy from ÚFAL through the management of the faculty to the top ranks of Charles University. “I think that artificial intelligence should be dealt with primarily by people who are endowed with natural intelligence, and I am convinced that this is what the people from ÚFAL are,” the dean summarised.

One of the main features of the upcoming model is its complete openness from start to finish, from the training data to the final model. “Thanks to this, we will be able to prove that we meet all European regulations, which is important for the application of these models in practice,” said Jan Hajič, head of the Institute of Formal and Applied Linguistics and the main coordinator of the project. “There are six centres for high-capacity computing in Europe and our partners in the project are five of them. We firmly hope that these capacities, which we will acquire by the end of this year and in the next two years, will be sufficient to be able to produce a high-quality model,” Professor Hajič said, adding that the project will cover 32+ languages. “These are the 24 languages of the European Union and the eight languages of the countries that are in talks to join the EU. The plus means that we will try to include in the model the big languages that are important for trade between Europe and the rest of the world,” the head of ÚFAL specified.

“Another goal is to make the models easy to use for the large ecosystem of smaller businesses in Europe that either can't afford to pay for those big language models, or want to make sure that the model remains available for their needs and that they can use it locally and not as a paid service,” added ALT-EDIC director Edouard Geoffrois, a colleague from the faculty.

The project, branded STEP (Strategic Technologies for Europe Platform), builds on previous European projects and the experience of the partners. It makes use of extensive high-quality datasets as well as pilot language models that have been developed previously. The consortium started work on 1 February 2025; the project will run for three years and is funded by the European Commission under the Digital Europe Programme. It is also co-funded by industry and national providers, including the Czech Ministry of Education. “I am very pleased that a project of such enormous importance and prestige is being handled by Charles University, specifically by the research team of the Faculty of Mathematics and Physics of Charles University,” said Radka Wildová, Senior Director of the Higher Education, Science and Research Section of the Ministry of Education, Youth and Sports. The section is also the largest provider of financial support for the development of science and research in the Czech Republic.

“We would like the language models we develop to be available as soon as possible, but of course they must be of high quality and comparable to existing models. The project is for three years, and we hope that during that time we will develop models that will be competitive not only with today's models, but also with those models that will be available in three years' time,” the main coordinator of the OpenEuroLLM project, Professor Jan Hajič from the Faculty of Mathematics and Physics, said at the end of the meeting.

Author: Jitka Jiřičková
Photo: Hynek Glos
