
Research Data Management and Curation

This guide covers the data life-cycle and how it relates to your research.

Research Data Lifecycle

The research data lifecycle describes the process of scientific discovery. The cycle is usually represented as a wheel to show how new research builds on prior research. Although the terminology and number of steps vary (for example, the model below is from Harvard Medical School), the idea remains the same.

The core process steps are: Plan, Acquire, Process, Analyze, Preserve, Publish/Share, and Reuse.

Other elements that are sometimes included as interconnected to the main steps are data backup and security, quality assessment and assurance, documentation, and metadata.

[Figure: Harvard Medical School model for research data management, shown as a wheel with six sections: Plan & Design, Collect & Create, Analyze & Collaborate, Evaluate & Archive, Share & Disseminate, and Access & Reuse.]

Planning involves learning and thinking about how you will incorporate data management into your research. Formally, this is documented in a Data Management Plan (DMP), which outlines the major decisions for each step of the research data lifecycle.

The next step is to acquire the data for your project. This may include creating, collecting, or generating new data or obtaining existing data. 

Once you have collected your data, it usually needs some processing before it is analysis-ready. This may involve validating, summarizing, integrating, subsetting, or applying other transformations to the data. It is important to document the steps you take to change the data from its original format, so that your processing can be reproduced later.
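As a sketch of what such a processing step can look like in practice, the snippet below validates raw records, subsets them to the columns of interest, applies a unit conversion, and keeps a running log of each transformation so the processing can be reproduced. The column names, data values, and validation rules here are all hypothetical:

```python
import csv
import io

# Hypothetical raw data: the columns and values below are illustrative only.
RAW = """site,date,temp_c,notes
A,2023-06-01,21.5,clear
A,2023-06-02,,sensor fault
B,2023-06-01,19.8,clear
"""

def process(raw_text):
    """Validate, subset, and transform raw records, logging each step."""
    steps = []  # provenance log: what was done, in order
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    steps.append(f"read {len(rows)} raw records")

    # Validation: drop records with a missing measurement.
    valid = [r for r in rows if r["temp_c"]]
    steps.append(f"dropped {len(rows) - len(valid)} records with missing temp_c")

    # Subsetting and transformation: keep analysis columns, convert units.
    processed = [
        {"site": r["site"], "date": r["date"],
         "temp_f": round(float(r["temp_c"]) * 9 / 5 + 32, 1)}
        for r in valid
    ]
    steps.append("converted temp_c to temp_f; kept site, date, temp_f")
    return processed, steps

data, log = process(RAW)
print(data[0])  # {'site': 'A', 'date': '2023-06-01', 'temp_f': 70.7}
print(log)
```

Saving the log alongside the processed file (for example, in a README) is one simple way to record how the data diverged from its original format.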

Analysis is when you begin interpreting and drawing conclusions from the processed data. This may involve statistics, visualization, spatial analysis, image analysis, or modeling. Whatever the method, documenting the process is just as important as saving your data.

Once your analysis is complete, the next step is to preserve your data. This could mean submitting your data to a repository, planning for long-term archiving, or any other process aimed at ensuring long-term access to your data.

Finally, it is time to Publish and Share your data. In addition to publishing an article, this step includes other activities to increase the findability of your data, such as submitting records to data catalogues, assigning DOIs, and creating searchable metadata.

Reuse is the connection between sharing and planning, starting the cycle over again for the next iteration of the research lifecycle. 


FAIR Data and the FAIR Principles are a related conceptual model for research data management. FAIR mostly focuses on the sharing, documentation, and reuse aspects of the research data lifecycle, but it can also serve as a framework for implementing research data management more broadly.

First introduced in a 2016 paper (Wilkinson et al., Scientific Data), the FAIR framework presents a methodology for data management focused on creating Findable, Accessible, Interoperable, and Reusable data. The goal is to move toward a data culture that supports machine actionability and automation, given the increasing volume, complexity, and creation speed of data.

Findable - Data should be easy to find for both humans and computers. Machine-readable metadata files are important for the discovery of datasets because they can be searched algorithmically.

Accessible - Users need to be able to retrieve data, and it should be clear how to access datasets both manually and programmatically.

Interoperable - Data usually needs to be integrated with other data and to work across different applications and software packages, which calls for standard formats and vocabularies.

Reusable - FAIR's ultimate goal is to optimize the reuse of data, which relies on researchers creating well-described metadata so that data can be replicated and combined in different settings.
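To make the "machine-readable metadata" idea concrete, here is a minimal sketch of a dataset description using schema.org's Dataset vocabulary serialized as JSON-LD, one common way to make datasets discoverable by search engines. Every specific value below (the title, DOI, creator, keywords) is a placeholder, not a real record:

```python
import json

# A minimal, illustrative metadata record; every value below is a placeholder.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example stream temperature measurements",
    "description": "Hypothetical daily stream temperatures, 2023.",
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["hydrology", "temperature", "example"],
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

Because the record is structured JSON rather than free text, a catalogue or search engine can index its fields algorithmically, which is exactly the property the Findable principle asks for.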

Open Science Framework

Open Science Framework (OSF) is a free, open-source online tool that helps researchers plan and track collaborative projects and conduct transparent, accurate, and meaningful research. It was created by the non-profit Center for Open Science, which believes transparency is important at every step of the research process.

Specific uses of the platform include documenting and tracking contributions across institutions, managing and storing files, writing and sharing documentation, and providing a centralized location for following other research projects.
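OSF also exposes a public REST API (version 2, at api.osf.io) that returns records in JSON:API format. As a rough sketch, the snippet below builds the request URL for a public project ("node" in OSF terms) and pulls the title out of a response body. The node id and the sample response are made up, and the exact response shape is an assumption worth checking against the OSF API documentation:

```python
import json

API_BASE = "https://api.osf.io/v2"

def node_url(node_id):
    """Build the URL for a public OSF project ("node")."""
    return f"{API_BASE}/nodes/{node_id}/"

def project_title(response_text):
    """Pull the project title out of a JSON:API response body
    (assumed shape: data.attributes.title)."""
    return json.loads(response_text)["data"]["attributes"]["title"]

# A made-up node id and a made-up (but JSON:API-shaped) response body:
sample = json.dumps({"data": {"attributes": {"title": "Example Project"}}})
print(node_url("abc12"))      # https://api.osf.io/v2/nodes/abc12/
print(project_title(sample))  # Example Project
```

A real request would fetch `node_url(...)` over HTTPS (public projects need no authentication); the functions above only illustrate how the pieces fit together.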

The University of Alabama does not have institutional access to the OSF platform, but you can create an account using your ORCID or email address for free.