Challenges to Transcription in Languages Other than English

This panel presentation will bring researchers, librarians, project managers, and developers into conversation about the challenges and opportunities they encountered when administering transcription projects in multiple languages. We specifically designed this panel to include perspectives from DH practitioners in a variety of roles.

The projects they will present include a crowd-sourced project to transcribe over 4,000 pages of linguistic data from dozens of related Indigenous Mexican languages, a project to interpret medieval and modern orthography in transcribing an early 15th-century Middle English poem and its variants, and a project to create a historical gazetteer for Latin America and the Caribbean by indexing an 18th-century Spanish source. All three projects were conducted using the FromThePage transcription tool, which allows users to administer crowd-sourced transcription projects. By bringing the administrators of these projects together with the developer of the tool they are using, we hope to have a meaningful conversation about challenges in conducting multilingual DH and opportunities for change in the field. The challenges encountered include developing detailed transcription protocols that non-specialists can follow, deciphering the handwritten text of original documents, and recruiting and training volunteer transcribers. The opportunities include improving access to documents, creating datasets to further research on these languages, bringing together experts and novices to transcribe texts collaboratively, and increasing the visibility of these languages and texts. By presenting these experiences, we hope to underline the importance of developing DH tools and projects that support work in multiple languages and to encourage others to participate in this important and exciting work.

Some of the questions we will pose include:

  • What would motivate scholars to work on a collaborative transcription project?
  • What would motivate the students of those scholars to work on a collaborative transcription project and to teach from it?
  • Is it possible to create a transcription protocol detailed enough that collaborators can easily be trained to contribute to projects in languages they are unfamiliar with? Examples include modern or medieval orthography (shorthand or simply notational quirks) and “Mixtec,” which is a family of dozens of distinct, mutually unintelligible languages that can change from document to document.
  • Is it more worthwhile to do collaborative transcription from medieval manuscripts directly or from handwritten interpretations of medieval orthography?