Data Services: Workshop Series_Archived Events

Erik Johnson

March 11, 2021 2:00-3:30 pm on Zoom

Reproducible OCR Research using RStudio, Slack, and GitHub


During this workshop, attendees will learn how to set up a collaborative and reproducible research workflow. Attendees will begin by setting up a collaborative workspace in Slack. Next, they will get an introduction to optical character recognition (OCR) as they import, digitize, and process raw image data using the Tesseract library in R. Attendees will then work through a data cleaning process using LaTeX and perform a basic analysis of the resulting data to produce a report using R Markdown. The workshop will also briefly cover how to start a project in GitHub to enable project and code versioning in RStudio.

For those who want to follow along, the following resources are recommended:

Optional Resources:

 

About the Instructor:

Erik Johnson is an Assistant Professor of Economics at The University of Alabama. His research focuses on topics such as measuring industrial agglomeration patterns, valuing visual amenities in cities through the use of deep learning methods and automated street photo scraping, and exploring how support for higher education funding is affected by the racial composition of students and voters. His work has been published in the Journal of Urban Economics, Regional Science and Urban Economics, and the Journal of Real Estate Finance and Economics. It has been presented at the NBER Summer Urban Institute, the Boston and Richmond Federal Reserves, and the AREUEA ASSA sessions, as well as the national and international UEA conferences.