Difference between implicit and explicit crowdsourcing
Crowdsourcing is now an established concept, and the public is well aware of its tremendous potential. The taxonomy of crowdsourcing approaches, however, is far from settled, and approaches are categorized in many different ways. In enetCollect, we categorize crowdsourcing approaches as either explicit or implicit, and we have created a working group for each type (WG1 and WG2). We have seen that members of the Action can have difficulty understanding this categorization and, accordingly, deciding in which of the two working groups they should participate. This short essay, written during summer 2017, is meant to provide a starting point for discussion and to explain the reasoning behind the distinction. It is worth noting that, where practically possible, the categorization could evolve to better suit the needs of our Action.
The term implicit crowdsourcing is already established. It refers to approaches where users do not necessarily know that they are contributing. In other words, users do not necessarily know that their actions are being crowdsourced. To create such an approach, the crowdsourcing component needs to be inserted into an existing workflow that users would perform regardless of whether their actions are being crowdsourced. The most famous example is without doubt reCaptcha, which provided a CAPTCHA (“Completely Automated Public Turing test to tell Computers and Humans Apart”) system with the useful “side” effect of transcribing texts that state-of-the-art OCR algorithms could not properly recognize. CAPTCHA systems are meant to test whether a user is indeed human in order to, among other things, prevent automated bots from registering on websites and publishing or advertising inappropriate content. By creating such a CAPTCHA system, the reCaptcha initiative provided a useful service for an existing workflow while creating an unparalleled workforce for the highly precise manual transcription of texts that are difficult to recognize automatically. Another famous initiative is Duolingo which, by providing a language learning platform, crowdsources translations that are later sold to third parties. By using an existing user workflow (language learning), an unparalleled workforce is once more obtained to create datasets (translation datasets). In enetCollect, implicit approaches aim for a similar result, but for a far larger and more diverse range of datasets. Such approaches also aim at creating an endless source of exercise content.
The term explicit crowdsourcing, on the other hand, was coined in enetCollect in opposition to implicit crowdsourcing. It refers to approaches where users willingly contribute to creating an output that is of common interest to a large number of people. The most famous explicit approach is without doubt the one adopted by Wikipedia, which has managed over time to redefine the well-established landscape of encyclopedias. Unlike implicit approaches, explicit ones need to create a new user workflow, or upgrade an existing one, to allow the crowd to collaborate in a coordinated fashion. Also, the legal aspects regarding the created data must be carefully crafted to avoid issues related to Intellectual Property Rights (IPRs). In enetCollect, explicit crowdsourcing approaches will focus on collaboratively creating material that directly or indirectly benefits language learning. It is worth noting that the concept of explicit crowdsourcing is fairly similar to what is often referred to as the “wisdom of the crowd”.
Whereas the differences between explicit approaches such as Wikipedia's and implicit approaches such as reCaptcha's should be clear to most people, there are some use cases where the boundary can be fairly fuzzy. For example, crowdsourcing teachers' actions when they correct student essays through a digital platform can be a fairly effective way to obtain (pre-)annotated learner corpora. In such a case, the primary purpose of the teachers' actions is to correct essays, not to create a (pre-)annotated learner corpus. As such, the crowd does not need to be informed of its participation, and the approach could be labeled as implicit crowdsourcing. However, making the crowd aware of the crowdsourcing initiative should not harm the process, since the main goal of the teachers' actions is still to provide feedback to students. Knowing that the data will be reused for other purposes, teachers might actually be even more motivated to do a finer and more thorough job. Also, because student essays can contain very sensitive data, legal issues might arise, and the explicit consent of the crowd would be necessary to avoid any problems. Last but not least, because such a crowdsourcing approach requires teachers to correct essays in a very coordinated and standardized way, it can hardly be achieved without explicitly training them in some way. For all these reasons, despite being in principle an implicit crowdsourcing approach, such an approach has many aspects related to explicit crowdsourcing.
Because such use cases do exist, the border between implicit and explicit will always be fuzzy. The reason we decided to create two different working groups to handle implicit and explicit approaches separately is merely a pragmatic, people-oriented one. Indeed, even though many members will be interested in both types of approaches, many will be interested in only one. For example, NLP researchers are very likely to be interested only in implicit crowdsourcing, whereas stakeholders interested in creating language learning lessons (e.g. editors or teachers) are more likely to be interested only in explicit approaches. Having both types of approaches handled by a single working group would thus have resulted in WG meetings where, on numerous occasions, a large part of the audience would have had no interest in and/or expertise for what was being presented and discussed. We thus created a binary categorization of the approaches so that each working group has more homogeneous interests and expertise. We also ensure that members interested in participating in both working groups will be able to do so by, among other things, ensuring that their meetings are never run in parallel. Shared or overlapping sessions between the two working groups are also foreseen in the future.