Lab of the Month:
The Centre For Language Resources And Technologies At The University Of Ljubljana
(Center za jezikovne vire in tehnologije Univerze v Ljubljani)
by Špela Arhar Holdt & Teja Goli
The Centre for Language Resources and Technologies at the University of Ljubljana (CJVT UL) is a research unit that conducts scientific research and develops and maintains key digital language resources and technologies for contemporary Slovene.
It is an interdisciplinary centre connecting five faculties of the University of Ljubljana: the Faculty of Social Sciences, the Faculty of Arts, the Faculty of Education, the Faculty of Electrical Engineering, and the Faculty of Computer and Information Science. The core team includes about a dozen researchers from these different fields. Our resources have practical value and are accessible to all Slovene language users around the world. We constantly encourage open access to linguistic data and therefore ensure our databases are accessible under open licenses.
The Centre is part of the Network of Research and Infrastructural Centres at the University of Ljubljana, which means we offer infrastructural support (tools, data, expertise) for the Slovene language. In addition, we focus on scientific research and the development of a variety of modern resources geared towards the language's many end users. We are active in a variety of fields, including semantics, phraseology, corpus linguistics, computational linguistics, natural language processing, language teaching and crowdsourcing. Special emphasis is placed on crowdsourcing, especially since the Centre is participating in the enetCollect COST action.
Our signature crowdsourcing tools are known as “responsive language resources”. These are first compiled through automatic extraction methods and then undergo continuous lexicographic improvement, aided by user-provided feedback. For example, in the Thesaurus of Modern Slovene, users can vote positively or negatively on the automatically extracted data and suggest additional synonyms for inclusion in the dictionary database.
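As a rough illustration of this feedback loop, the Python sketch below models one thesaurus entry whose automatically extracted synonyms collect positive and negative votes, alongside a separate list of user suggestions. It is not the Centre's actual implementation: the class names, fields, and the simple "upvotes minus downvotes" score are all assumptions made for the example, and the Slovene words are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SynonymCandidate:
    """An automatically extracted synonym plus its user votes (hypothetical)."""
    word: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        # Assumed scoring rule: net votes. A real system may weight differently.
        return self.upvotes - self.downvotes


@dataclass
class ThesaurusEntry:
    """One headword with extracted candidates and user suggestions (hypothetical)."""
    headword: str
    candidates: Dict[str, SynonymCandidate] = field(default_factory=dict)
    suggestions: List[str] = field(default_factory=list)

    def add_extracted(self, word: str) -> None:
        # Step 1: register a synonym produced by automatic extraction.
        self.candidates.setdefault(word, SynonymCandidate(word))

    def vote(self, word: str, positive: bool) -> None:
        # Step 2: record a positive or negative user vote on extracted data.
        candidate = self.candidates[word]
        if positive:
            candidate.upvotes += 1
        else:
            candidate.downvotes += 1

    def suggest(self, word: str) -> None:
        # Step 3: keep new user suggestions apart until they are reviewed.
        if word not in self.candidates and word not in self.suggestions:
            self.suggestions.append(word)

    def ranked(self) -> List[SynonymCandidate]:
        # Candidates ordered from best to worst net score.
        return sorted(self.candidates.values(), key=lambda c: c.score, reverse=True)


# Illustrative data only, not taken from the actual thesaurus.
entry = ThesaurusEntry("hiter")
entry.add_extracted("brz")
entry.add_extracted("uren")
entry.vote("brz", positive=True)
entry.vote("brz", positive=True)
entry.vote("uren", positive=False)
entry.suggest("nagel")

for candidate in entry.ranked():
    print(candidate.word, candidate.score)
print("suggested:", entry.suggestions)
```

Keeping user suggestions in a separate list reflects the description above, in which feedback supports lexicographic improvement rather than replacing it: in this sketch a suggestion only becomes a candidate once someone explicitly adds it.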
We have also developed a crowdsourcing mobile word game called Igra besed (English: Game of Words), in which users play various games involving collocations and synonyms; the data they generate is then used to enhance our resources and tools.
Our main products can be found on our resources site and include:
Gigafida, a reference corpus of written standard Slovene
Sloleks, a Slovene morphological lexicon
Sopomenke, a Thesaurus of Modern Slovene
Kolokacije, a Collocations Dictionary of Modern Slovene
In addition to this, we are involved in a variety of domestic and international projects including ELEXIS (European Lexicographic Infrastructure), in which we are one of the observers and data providers. Our team members also serve on boards of international associations such as EURALEX and ELRA, or act as advisors to institutes such as the Dutch Language Institute and the Czech National Corpus.