Data planning

This has been up in various guises before. It remains incomplete, but I’ve had recent requests to post it.

Use this document in conjunction with the Research data planning checklist and the research data guidelines above to help with the research data planning process.

1. Describing your data
– How will the data be generated and used in this project?
– Describe the data set as completely as possible. Include information about the format, average size, volume and/or estimated number of data files produced.
– Consider the life cycle of this data set:
  – What stages does the data go through (e.g. raw, processed, analyzed)?
  – What methodologies are used at each stage?
  – What tools and instruments are used?
  – Who is involved (e.g. professors, lab techs, students)?
– How will this data be managed?
– How will you identify and cover costs of managing data sets?
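As a rough aid for the "format, average size, volume and number of files" question above, the following is a minimal Python sketch that walks a project folder and summarises the files it finds by extension. The function name `summarise_data_files` and the idea that all data sits under a single folder are assumptions made for illustration, not part of the checklist.

```python
from collections import defaultdict
from pathlib import Path


def summarise_data_files(root):
    """Summarise the files under `root`, grouped by file extension.

    Returns a dict mapping each lower-cased extension to the number of
    files, their total size in bytes and their average size in bytes.
    `root` is a hypothetical folder holding the project's data files.
    """
    totals = defaultdict(lambda: {"files": 0, "total_bytes": 0})
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without a suffix are grouped under a placeholder key.
            ext = path.suffix.lower() or "(no extension)"
            totals[ext]["files"] += 1
            totals[ext]["total_bytes"] += path.stat().st_size

    summary = {}
    for ext, counts in totals.items():
        summary[ext] = {
            "files": counts["files"],
            "total_bytes": counts["total_bytes"],
            "average_bytes": counts["total_bytes"] / counts["files"],
        }
    return summary
```

Calling `summarise_data_files("path/to/your/data")` on a pilot data set gives figures that can be scaled up to estimate storage needs for the full project.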

Data collection
– Are you collecting on paper-based forms? How will they be digitised?
– Are there any additional requirements (e.g. image scanner, optical character recognition)?
– Are you collecting data by mobile phone or tablet? What software will you use, and how will you access the data?
– Who is responsible? Where will the data be hosted?

Data and metadata standards
– Are there any standard formats in your field for managing or disseminating the data sets (e.g. XML, ASCII, CSV)?
– Is your format proprietary rather than open, and if so, is this essential?
– If there is not a standard format, how will you format the data so that others in your field will be able to make use of it?
– Who in your team will have responsibility for ensuring that data standards are properly applied and data are properly formatted?

Metadata is structured information that describes or otherwise makes it easier to retrieve, use or manage an information resource. It represents the who, what, why, where and how of the resource.
– How will metadata be generated and captured for each of your data sets?
– Are you aware of any metadata standards that could be used for your data sets?
– If there is not a metadata standard, what metadata will you need to generate so that others in your field will be able to find, understand and make use of your data?
– Who in your research team will be responsible for ensuring metadata standards are followed?
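Where no metadata standard applies, one way to start is a simple record covering the who, what, why, where and how described above. The sketch below is only an illustration: the field names and the helper `missing_metadata_fields` are hypothetical and do not follow any published metadata standard.

```python
import json

# Hypothetical minimal field set, taken from the "who, what, why, where
# and how" description of metadata; not a published standard.
REQUIRED_FIELDS = ("who", "what", "why", "where", "how")


def missing_metadata_fields(record):
    """Return the required fields that are absent or blank in `record`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


example_record = {
    "who": "Name of the person or team that generated the data",
    "what": "Short description of the data set and its file format",
    "why": "Research question the data set was collected to address",
    "where": "",  # left blank on purpose to show the check
    "how": "Instruments, software and methods used",
}

print(missing_metadata_fields(example_record))  # prints ['where']
print(json.dumps(example_record, indent=2))
```

Storing a record like this as a plain JSON file next to each data set keeps the description readable without special software, and it can be mapped onto a formal standard later if one is adopted.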

2. Intellectual property, ownership and copyright
Funding agencies may have varying approaches toward IP, copyright and related issues. UCT has specific IP and ownership policies, and South African IP law differs from that of many other countries.
– Who will own these data sets? Do any other stakeholders need to be consulted before data sets are made available?
– Will you permit re-use of data, either with or without conditions?
– Will you permit re-distribution of the data, either with or without conditions?
– Will you permit the creation and publication of derivatives of the data, either with or without conditions?
– Will you permit others to use the data to develop commercial products or in ways that produce a financial benefit for themselves, either with or without conditions?

3. Data sharing
Funding agencies may recommend or require data sharing during the course of research.
– How will the people who generated the data sets receive attribution for their work?
– Who would be the target audience for your data sets, and how would they use your data?
– When will you share each of your data sets (e.g. after the data has been normalised or corrected, after publication, etc.)?
– Will you place any conditions on the sharing of your data with others (e.g. requiring some form of acknowledgement or attribution, forbidding for-profit use)?
– If these data sets contain sensitive information, what steps will you take to ensure protection?
– Do you need to get specific consent for data sharing? Note that this is often essential for qualitative interviews and genetic data.

Data archiving and preservation
– Which of your data sets have long-term value to others?
– How will you ensure ongoing access beyond the life of the project?
– What related information needs to be preserved with the data?
– How will you, or the repository you are working with, ensure that these data sets are able to withstand changes in, or the obsolescence of, storage techniques?

Additional resources
This document borrows heavily from: Purdue University Libraries. Data management plan self assessment questionnaire. Purdue University, West Lafayette IN. 2/4/11.