How we digitize the 1926 census

Lincoln Mullen

The 1926 Census of Religious Bodies has over 232,000 schedules. How do we go about digitizing such a large collection, and how do we then turn it into a dataset? The answer is longer than a blog post can hold, but Greta Swain, the former project manager and graduate research assistant on the project, has written a lengthy case study, which has been published on the DataScribe website.

Here is the summary of the case study:

In the following case study we describe how the American Religious Ecologies project at the Roy Rosenzweig Center for History and New Media utilized DataScribe to transcribe tabular data from early twentieth-century digitized census forms in order to create a new dataset for American religious history. Because DataScribe was still under development when our team started transcription, we first used a basic spreadsheet structure for transcription and transitioned to DataScribe mid-way through the project. In this study, we will provide an overview of the American Religious Ecologies project and the sources we used. We will also detail the process of digitizing and readying our sources for transcription, the transition from using spreadsheets to using DataScribe for transcription, the workflows we developed for transcribing and reviewing, DataScribe as a project management tool, the final format of the data coming out of DataScribe, and the questions and visualizations this new dataset enabled. Finally, we discuss the decisions we made along the way. In total, this study will give you a better sense of how DataScribe was used by a diverse team at a University-based research center, and how it became a critical part of our efforts to transcribe thousands of sources, create new datasets and asynchronously manage a large-scale transcription project.