Lab of the Month:
The Árni Magnússon Institute for Icelandic Studies
by Branislav Bédi
The Árni Magnússon Institute for Icelandic Studies has a team dedicated to developing and maintaining technologies that support Icelandic in the digital world. The team conducts research in linguistics, lexicography, natural language processing (NLP), linguistic infrastructure, linguistic corpora and computational linguistics. This work is important for strengthening the position of Icelandic both locally and globally.
Among the resources housed and developed by the Institute are the historical Written Language Archive (Ritmálssafn Orðabókar Háskólans), the online Database of Modern Icelandic Inflection – DMII (Beygingarlýsing íslensks nútímamáls – BÍN), and the Icelandic Word Web (Íslenskt orðanet), all of which are widely used by native as well as foreign speakers of Icelandic. Other projects include the Tagged Icelandic Corpus (Mörkuð íslensk málheild – MÍM), a corpus of approximately 25 million running words, and the Icelandic Gigaword Corpus (Risamálheildin), which includes around 1,300 million running words.
The web portal Málföng provides information, in both English and Icelandic, about the language data, tools, and online services available for the Icelandic language, together with instructions on how to search and/or download the data.
Two of these open-source resources, the DMII and the MÍM, are used in the tool LARA (Learning and Reading Assistant), which was developed by the University of Geneva and is closely linked to the enetCollect COST Action. LARA is used to create online hyperlinked texts in various languages, including Icelandic, that support language learning through reading. The DMII supplies information about the inflection of words, and the MÍM helps to process the text into a marked-up form.
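To give a rough idea of how inflection data can help turn plain text into a marked-up form, here is a minimal Python sketch. It is only an illustration under stated assumptions: the table `TOY_INFLECTIONS`, the function `mark_up` and the `word[lemma]` notation are all hypothetical, and they do not reproduce the DMII data format, the MÍM tag set or the markup that LARA actually uses.

```python
import re

# Hypothetical miniature inflection table (surface form -> headword).
# A real resource such as the DMII is far larger and structured differently.
TOY_INFLECTIONS = {
    "hestur": "hestur",  # 'horse', nominative singular
    "hest": "hestur",    # accusative singular
    "hesti": "hestur",   # dative singular
    "hests": "hestur",   # genitive singular
    "hestar": "hestur",  # nominative plural
    "hestum": "hestur",  # dative plural
}


def mark_up(text, table=TOY_INFLECTIONS):
    """Append [headword] to every word found in the table; leave other words unchanged."""

    def annotate(match):
        word = match.group(0)
        lemma = table.get(word.lower())
        return f"{word}[{lemma}]" if lemma else word

    # \w+ is Unicode-aware for Python 3 strings, so Icelandic letters are matched.
    return re.sub(r"\w+", annotate, text)


print(mark_up("Hestar og hestum"))
```

In this toy version, every inflected form that appears in the table is linked to its headword, which is the kind of information a reading tool could use to point each word in a text to a dictionary entry.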
Iceland has recently joined the CLARIN network as an observer, with the Árni Magnússon Institute for Icelandic Studies as a leading partner in the national consortium.