Common Voice is a crowdsourcing platform for text and audio. It is an open-source project that allows individuals who are interested in building their language's digital footprint to participate, either by donating their voice or text, or by using the crowdsourced data to train and fine-tune NLP models for whatever task they choose.
The Lacuna grant is funding the process of building NLP text and speech datasets for low-resourced languages in East Africa. This work is being done with the help of the Common Voice (CV) tool, through which people from diverse demographics can contribute from any location, as long as they have a web browser and internet access. The language currently in focus is Luganda, owing to its recent launch on the CV platform. Other languages, such as Runyankore-Rukiga, Acholi, Swahili, and Lumasaaba, are also in the pipeline for launch on the CV platform so that their speakers can contribute as well. Among the NLP technologies that these datasets can be used to build are automatic speech recognition (ASR) models for each of these languages, in support of the Sustainable Development Goals (SDGs).
Monolingual and parallel text corpora are also being developed for use in several NLP applications, as shown below: