Last week DoReCo held its scheduled workshop in Berlin to officially kick off the project and to reflect on our experiences with processing corpora over the past six months. The workshop was attended by the creators of corpora for nearly twenty languages spoken on six continents, along with representatives of various language archives and of a number of related initiatives. It was a great opportunity to bring together many important voices from the fields of archival and corpus linguistics, language documentation, and typology to discuss the mission of the DoReCo project.
Discussions at the workshop were immensely helpful for clarifying how we can best make the DoReCo corpus available to the scientific community: for instance, which file formats to provide, how to standardize data output, and what metadata would be most helpful for future users of the DoReCo web portal.
We also discussed broader and sometimes thornier issues related to archiving and research, such as how best to version data that is stored in multiple places, how to deal with licensing and citations, and what kinds of research questions can be answered with a reference corpus consisting of many, but relatively small, corpora.
A number of new collaborations also began at the workshop, and we’re looking forward to working on these together. A big thank you to everyone who participated.