Mapping the language learning platforms worldwide in order to foster the crowdsourcing of language-related datasets and the automatic generation of exercises.
by Lionel Nicolas
(on behalf of the Task 4 crew of the crowdfest 2020)
Within the context of WG2, there is a strong interest in uncovering potential synergies between stakeholders interested in crowdsourcing language-related datasets and stakeholders that run a language learning platform. More specifically, such synergies could allow the first group to improve their datasets by crowdsourcing the answers that students give to exercises automatically generated from those datasets, while allowing the second group to add such automatically generated exercises to their platforms.
In order to foster the exploitation of such synergies, we first aim to find out which small, medium and large language learning platforms offering exercises exist today, and to compile this information in a way that allows stakeholders interested in crowdsourcing language-related datasets to understand which platforms they could start collaborating with in order to crowdsource the data they are interested in. At the same time, we are also interested in studying other relevant aspects of existing language learning platforms such as, among other things, the languages they cover or the types of skills they train their users on.
With these objectives in mind, the six of us (Claudia Borg, Marta Giralt, Sarah Grech, Julia Ostanina-Olszewska, Gokhan Ozkan and myself) started to work during Task 4 of the Crowdfest 2020 in Coimbra on a survey to gather as many references to language learning platforms as possible from language learners and teachers from all over the world, and especially from Europe. Because we knew that getting answers to surveys can be challenging, we devised an easily completed online form and coupled it with a contest to win e-vouchers in order to incentivize participation and favor references to small and lesser-known language learning platforms. We also devised a strategy to receive support in its dissemination from enetCollect members of the different COST countries (and would like to thank them again for their support). Indeed, we knew all too well that a message in English coming from some unknown and distant researcher can quickly be perceived as spam and swiftly sent to the trash by recipients. Therefore, it was of paramount importance to establish direct contact with local stakeholders and researchers who could agree to support us by introducing our survey in the mother tongues of the targeted audiences.
The survey was thus fine-tuned and launched at the end of June with a deadline in mid-July. We monitored participation week by week and pushed dissemination towards the countries and mother tongues that were least represented among the participants. In total, we received 628 answers from participants from 68 different countries speaking 43 different mother tongues. Of these, 253 participants expressed interest in taking part in the contest to win e-vouchers, 297 said they wanted to be kept informed of our results, and 129 indicated that they wanted to be kept informed about enetCollect in general.
Overall, the number of answers we collected was below our original expectations, but the number of references provided by these 628 answers was far above what we expected! Indeed, after grouping the references provided by each participant (up to ten per participant, 1699 in total) under a common set of labels (you wouldn’t imagine how many ways participants found to reference the same entity), we came to the conclusion that these references point us to around 600 different entities!
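To give a feel for what grouping references under common labels involves, here is a minimal sketch in Python. It is not the procedure we actually followed: the function names `normalize` and `group_references` are hypothetical, the example strings are invented for illustration and are not taken from the survey data, and the normalization rules (lowercasing, dropping accents, URL prefixes, a few common domain endings and punctuation) are assumptions that would only catch the easiest variants. Anything less regular would still need a manual pass.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(raw: str) -> str:
    """Reduce a free-text reference to a crude comparison key (assumed rules)."""
    # Drop accents and other non-ASCII decorations, then lowercase and trim.
    text = unicodedata.normalize("NFKD", raw)
    text = text.encode("ascii", "ignore").decode("ascii").lower().strip()
    # Remove a leading URL scheme and "www." if the reference is a link.
    text = re.sub(r"^(https?://)?(www\.)?", "", text, count=1)
    # Remove a few common domain endings and any trailing path.
    text = re.sub(r"\.(com|org|net)(/.*)?$", "", text)
    # Keep only letters and digits so spacing and punctuation do not matter.
    return re.sub(r"[^a-z0-9]+", "", text)


def group_references(raw_references):
    """Group raw spellings that share the same normalized key."""
    groups = defaultdict(list)
    for raw in raw_references:
        groups[normalize(raw)].append(raw)
    return dict(groups)


if __name__ == "__main__":
    # Invented example spellings, for illustration only.
    examples = [
        "Duolingo",
        "duo lingo",
        "https://www.duolingo.com/",
        "Babbel",
        "babbel.com",
    ]
    for label, spellings in group_references(examples).items():
        print(label, "<-", spellings)
```

A key-based grouping like this only merges spellings that differ in case, spacing or URL decoration. Abbreviations, translated names and typos would end up under separate keys, which is why a human check of the resulting labels remains necessary.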
The next step for us is to check whether the referenced entities are indeed language learning platforms offering exercises and not other types of entities, such as an entity offering a service related to language learning (e.g. a language learning school), an entity offering a service that can be used for language learning purposes but is meant for more general purposes (e.g. YouTube), or an entity allowing the exchange or download of language learning content without providing exercises (e.g. Teachers Pay Teachers).
While we check them, we will annotate the characteristics worth reporting on (e.g. languages covered, business plan, etc.) in order to crunch some interesting statistics. We have just finished the annotation guidelines for this task and created a dedicated form with LimeSurvey to perform the annotation in a standardized fashion. The next step is for us to perform the actual annotation of these 600 entities, write a report on it, make the data freely available and disseminate it all back to the participants.
After that, there will be other milestones to reach to fully meet our original objectives. Already during the Crowdfest 2020, we realized that our shared efforts would take time and that we would need to work steadily on the question if we wished to meet our objectives. This need became even more evident once we got the data from the survey and could catch a glimpse of its extent and quality. For example, we certainly didn’t expect to have 600 different entities to look at when we first devised the survey!
So stay tuned and do expect another blog article on our efforts a few months from now.