Intro to Data Management Plans

January 10, 2020

[Anna] Hello and welcome to Intro to Data Management Plans from ICPSR. As you might already know, ICPSR is one of the world’s largest social sciences data archives. I’d like to introduce you to your hosts, Chelsea Goforth and Shane Redman, data project managers at ICPSR. Chelsea and Shane, the floor is yours.

[Shane] Thank you, Anna. What is a Data Management Plan, or a DMP, as you’ll hear us reference it? A Data Management Plan is a written document
that describes the data that you expect to generate or otherwise acquire throughout your
research project. It explains how you intend to manage the data
during the project, as well as your plan for long-term preservation and data sharing, even
after the project has ended. Common elements of a data management plan
include a description of the types and formats of all of the data
you expect to acquire, a description of the accompanying documentation, such as codebooks
or user guides, and metadata that will be produced to help understand the data, information
about the repository or mechanism that you have chosen to archive your data for the long-term
and share it with other researchers, and information about your project’s informed consent process. We will discuss these and additional elements
that are typically included in a data management plan in greater detail later on in this presentation.

Why should a researcher write a data management plan? There are many reasons why it is a good
idea to write a data management plan. One increasingly common reason is that
it is required. Many funding agencies have started to require
applications for research support to include a data management plan, or some sort of data
sharing and dissemination plan. Funders want to extend the impact of their
research dollars. Writing a data management plan showing that
the data will eventually be accessible to other researchers for use, both now and in
the future, allows the funder to see the long-term value and impact of their research dollars
to the scientific community. A data management plan also helps formalize
the process of data management throughout the research project and provides a record
of what the researcher intended to do. For example, a question might arise in the
middle of the data gathering process about how or where to store the data. Looking back to the data management plan can
be a helpful reminder of what you planned to do with the data from the start. Writing a data management plan also allows
the researcher to identify any weaknesses in the project early on. Writing a DMP forces the researcher to think
through the types of issues that may be encountered throughout the project. For example, protecting respondents’ identities
may be something that you think is a natural part of your research project, but writing
a data management plan forces you to think through the practical steps of ensuring that this
actually happens, such as what files may need to be de-identified prior to archiving
the data or sharing the data with others. Similarly, determining certain elements
of data management at the beginning stage of a research project can help save time later
on. For example, having a repository chosen and
included in your plan ensures that you won’t run into any issues with the repository being
unable to accept your data format later on when you are actually ready to archive the
data. Lastly, writing a data management plan
encourages transparency in the scientific process. Sharing data and other information about your
research project with other researchers encourages and promotes replicability, accountability,
and efficiency.

Who is involved in writing a data management
plan? Ultimately, the PI and/or members of the
research team are responsible for writing the data management plan for their own project. However, the PI and research team should
consult several others to ensure the information provided in the data management plan is accurate
and complete. If the data management plan is part of
a grant proposal, you should check with the funding agency to see if there are any special
required elements or topics that need to be addressed in the data management plan. Additionally, the institutional review
board, or IRB, at the PI’s institution or institutions, if there are multiple PIs or co-PIs, should be consulted. IRB requirements and preferences vary across
institutions, so it is important to explain to your IRB what your plans are for sharing
the data from your project as their requirements may have implications for the information
that you provide, or the language that you use, in your data management plan. Similarly, once you have chosen a repository
for your data, it is a good idea to contact someone there or to research their website
to see if they have any guidance on the various elements that will be discussed in the DMP. For example, if you choose to archive your
data at ICPSR, you can contact us and we are happy to review your data management plan
to ensure that the information you include aligns with our standards and practices. Our website provides information about ICPSR
that would typically be included in a data management plan, and you can also find a sample
DMP and recommended language for informed consent documents on the website.

[Chelsea] Thanks, Shane. Next, you may be wondering when to write a data management plan. The answer is
nearly always as soon as possible, at the beginning of the data life cycle: when planning a new
research project, or when writing a new grant proposal. As Shane mentioned, DMPs are often required for funding requests, especially for proposals
to large federal agencies like NSF or NIH, but also for many other funding sources. You’ll note here that I’ve linked to an excellent resource from the Scholarly Publishing and Academic Resources Coalition. They’ve compiled this great resource with information about data management and data sharing requirements from all of the federal funding agencies. Regardless of whether your funding agency requires a DMP, writing one will be
helpful for anyone who is at all concerned
about research transparency, replication and reproducibility, or preservation for all
the reasons Shane previously discussed. You can find more information about the
data life cycle from the ICPSR Guide to Social Science Data Preparation and Archiving, which
is linked at the end of this presentation with additional resources. The key
thing to note here is that you should think about a data management plan early
on in a research project, even prior to other project start-up activities.

So, given the importance of a data management plan, how exactly do you write one and what
information is typically included? I’ll run through several common elements
typically included in a DMP, but also note that not everything I mention will be applicable
to every research project. You’ll typically start by providing
a description of the data and its collection. This includes a description of the nature
and scale of the data to be produced by the project. For example, you would state whether your project will
include experimental data, observational data, models or simulations, video or images, software
programs, applications, etc., and how, when, and where you plan to collect those data. In addition, you can address how the data
will be processed after collection, including information about which software or algorithms
you’ll use and anything else relevant to your data management workflow and short-term
data management. It’s here that you’ll specify the anticipated
submission, distribution, and preservation formats for the data and related files (perhaps
noting that these formats may be the same), and include a justification for the procedural
and archival appropriateness of those formats. You might also describe the naming conventions,
version control, and/or quality assurance and control procedures you intend to use. If relevant, you’ll also want to address
the origins of existing data that you’ll use in this project: for example, how newly collected
data will be combined with existing data, or any other details about the relationship
between newly collected data and existing data. Relatedly, you should also provide a description
of metadata, which are the contextual details, the basic characteristics of the data, including
any information important for using the data. You’ll often hear metadata described as
“data about data,” and they answer questions like: Who created the data? What does the data file contain? When and
where were the data generated? And why and how were the data generated? Good descriptive metadata are essential for
effective data use; this can include descriptions of instruments, parameters, units, files,
and other temporal or spatial details. This section should also include a discussion
of the metadata standards you plan to use, again including a justification for the format
chosen. We recommend using structured or tagged metadata,
such as the XML format of the Data Documentation Initiative, or DDI, because of the flexibility
that they offer in display. XML format is also preservation-ready and
machine-actionable, and I’ve included a link here if you’d like more information
about this. Next, you’ll address the long-term storage,
management, and backup of the data, including the physical and cyber resources and facilities
that will be used for the effective preservation and storage of the research data, by answering
questions such as: How and where will you store copies of your research files to ensure
their safety? How many copies will you maintain, and how
will you keep them synchronized? Here, you’ll provide details about data
security by describing the technical and procedural protections for information, including confidential
information, and how permissions, restrictions, and embargoes will be enforced. In addition, you should specify the name(s)
of the individuals responsible for data management in the research project or who will act as
the responsible steward for the data throughout the data life cycle. Many of these details can be addressed
by providing information about your archiving and preservation plans, including identifying
an appropriate archive for long-term preservation early on in your project. For example, you can write that by depositing
data with ICPSR, your project will ensure that the research data are migrated to new
formats, platforms, and storage media as required by good practice. Finally in this section, you’ll indicate
how data will be selected for archiving, how long the data will be held (if applicable),
and what your plans are for eventual transition or termination of the data collection in the
future. Some repositories like ICPSR store data in
perpetuity. However, if there is a compelling reason for
why some of your data should not be preserved permanently, you should address the proper
retention period and reasons in this section. Closely related are policies for access,
sharing, and re-use. This section should include a description
of how data will be shared, including access procedures, embargo periods, technical mechanisms
for dissemination, and whether access will be open or granted only to specific user groups. A timeframe for data sharing and publishing
should also be provided, as well as a description of the intended future uses or users of the data. Importantly, this section will include
any ethical or privacy issues with data sharing, including a discussion of how informed consent
will be handled, how privacy will be protected (perhaps through restricted-use data), any
exceptional arrangements that might be needed to protect participant confidentiality, or
other ethical issues that may arise. You’ll also describe here any obligations
that exist for sharing collected data, including obligations from funding agencies, institutions,
professional organizations, or any other legal requirements. And, you might also indicate how the data
should be cited by others and how the issue of persistent citation will be addressed,
for example, whether the dataset will have a digital object identifier, or DOI, assigned to it. Next, you’ll want to specify the entities
or persons who will hold the intellectual property rights for the data and other information
created by the project, and how intellectual property will be protected if necessary. Also address whether or not these rights will
be transferred to another organization for data distribution and archiving as we previously discussed. Any copyright issues or constraints should be noted here as well (for example, if you use copyrighted data collection instruments), including how the
researcher will obtain permission to use and disseminate copyrighted materials. Lastly, although not on this slide, it’s
worth noting that sometimes researchers will also include budget information for their
project in their data management plan; specifically, information about personnel time for data
preparation, management, documentation, and preservation; hardware and/or software needed
for data management, backup, security, documentation, and preservation; and/or any costs associated
with submitting the data to an archive. There are many options to archive data at
a repository at no cost to researchers, including at ICPSR. However, it is still a good idea to consult
with your repository of choice when writing your data management plan to ensure any cost
information is accurate and that the repository can handle the data you plan to archive.

We’ll close with information about where to find additional resources on data management
plans; you can see several links on this last slide. In particular I’ll highlight the first link
to the ICPSR Guidelines for Effective Data Management, which includes several examples of how to include each of these elements that I just discussed. Many thanks for taking the time today to watch
this webinar. We hope it was helpful in providing some preliminary
information about data management plans. If you have any questions or are interested
in having a conversation about your data archiving options, please don’t hesitate to reach out. Our contact information is here, or you can
also go to our website, or use our general contact email:
[email protected], if you have any questions.
