Next week is Love Your Library week, and it is also Love Your Data week, so we thought it would be helpful to clarify what data is and how you can treat it well to get the most from it during your studies and research.
Most of the following is relevant to undergraduates and graduates, but the later sections may be of more interest to researchers. The information is based on advice from Cambridge’s Research and Data Management Team. Explore the resources on their site or attend one of their courses to ensure you are creating, storing and (re)using data effectively and efficiently. Alternatively, contact us to discuss your data needs or look at the Data Management tab on our LibGuide.
There are lots of training opportunities in person and online if you want to learn more.
What do we mean by data?
Cambridge uses a broad definition of the term, which includes images, video, transcripts and historical documents as well as survey responses, lab books, field notes, physical samples, measurements and statistics. You should consider data to be everything used and produced as part of your studies and research, even your bibliography.
Why should you manage your data?
First and foremost, it is good practice: it increases your efficiency, meaning you can find what you need more easily, and it preserves your data so that it is available in years to come. However, funding bodies also increasingly require that research data is made freely available so that others can reuse it to replicate research findings or build upon them.
Data back up and file sharing
As an individual, it is very easy to lose data, whether by deleting something by mistake or by breaking your laptop. It happens to companies and organisations too. In 2017, the Cancer Research UK Manchester Institute at the Christie cancer hospital went up in flames, leading to the loss of equipment, data and samples.
Backing up is essential. Use departmental drives, external drives and cloud/online storage, and automate saving where possible. You should always keep backups in at least two locations. Don’t, for example, save your data to a USB drive and then keep it in your bag with your laptop! If you are using cloud storage, you may need to consider data sensitivity issues. Think about where the data is stored and which laws govern it.
Your strategy should be guided by three questions: what are you willing to lose, what is crucial to your research, and how often does it change? The more often your data changes, the more often you need to back it up.
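As an illustration of the two-locations rule, here is a minimal Python sketch that copies a file into several backup folders and checks each copy against the original. The function names (`sha256_of`, `back_up`) and the checksum step are our own assumptions for the example, not part of the Research Data Management Team's advice, and the destination folders stand in for whatever departmental, external or cloud-synced drives you actually use.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination folder and verify each copy."""
    # Hypothetical rule for this sketch: refuse to run with fewer than two locations.
    if len(destinations) < 2:
        raise ValueError("keep backups in at least two locations")
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"backup at {target} does not match the original")
        copies.append(target)
    return copies
```

A script like this could be scheduled to run daily or weekly, depending on how often your data changes.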
I have seen lots of computer desktops covered in a myriad of files. You need to organise your data into a consistent and meaningful system if you are to have any hope of finding something again and if you want others to look at it too.
For physical samples, you could create maps of your storage system, reference samples in notebooks, and add notes to the samples themselves.
With digital files, think about using the following method to help with retrieval:
prefix (for document type e.g. report, notes, essay)_document title_version_dateyyyymmdd
Keep folders structured similarly, using dates where practicable to divide up work. Nest folders to keep the number of items at each level to a minimum. Having 50 folders on your desktop is just as confusing as having 50 documents.
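The naming pattern above can be sketched in Python as follows. This is only one possible reading of it: the function name `build_filename`, the hyphenated lower-case title, the two-digit `v01` version style and the file extension are all assumptions made for the example.

```python
import re
from datetime import date


def build_filename(prefix: str, title: str, version: int, day: date, extension: str) -> str:
    """Build a name of the form prefix_title_version_yyyymmdd.extension."""
    # Collapse spaces and punctuation in the title to hyphens so that the
    # underscore remains the only separator between the fields.
    safe_title = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return f"{prefix}_{safe_title}_v{version:02d}_{day:%Y%m%d}.{extension}"


print(build_filename("report", "Data Management Plan", 2, date(2019, 2, 11), "docx"))
# prints: report_data-management-plan_v02_20190211.docx
```

Writing the date as yyyymmdd means that files sort chronologically when listed by name.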
Managing personal and sensitive data
Personal data is data relating to a living individual, which allows the individual to be identified from the information itself.
Sensitive data is personal data about:
- racial or ethnic origin
- political opinions
- religious beliefs
- Trade Union membership
- physical and mental health
- sexual life
- criminal offences and court proceedings about these
You therefore need to consider whether the data you collect falls into this category and how you will deal with it. The easiest thing is, of course, not to collect it in the first place. But if you do, ensure you get consent, try to anonymise it, and have a plan of action for how it will be managed in the future.
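As a rough illustration of one step towards anonymising data, the Python sketch below replaces a name field with a participant code. Strictly speaking this is pseudonymisation, not full anonymisation: the key linking codes to names still exists and must be stored separately and securely, and other fields may still identify someone. The function name `pseudonymise` and the field names are hypothetical.

```python
def pseudonymise(records, id_field="name"):
    """Replace the identifying field in each record with a participant code.

    Returns the cleaned records and the key mapping identities to codes.
    """
    key = {}
    cleaned = []
    for record in records:
        identity = record[id_field]
        # Give each new individual the next code: P001, P002, ...
        if identity not in key:
            key[identity] = f"P{len(key) + 1:03d}"
        # Copy every field except the identifying one.
        row = {field: value for field, value in record.items() if field != id_field}
        row["participant"] = key[identity]
        cleaned.append(row)
    return cleaned, key


responses = [
    {"name": "Ann", "answer": "yes"},
    {"name": "Bob", "answer": "no"},
    {"name": "Ann", "answer": "maybe"},
]
cleaned, key = pseudonymise(responses)
print(cleaned[2])
# prints: {'answer': 'maybe', 'participant': 'P001'}
```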
There can be a conflict between abiding by data protection legislation and ethical guidelines and, at the same time, fulfilling funders' and individuals' requirements to make research results available. Ethics committees may believe that any personal or sensitive data should remain confidential. It is therefore important to distinguish between personal and more general data gathered during research.
Do seek advice if you think you will be working with sensitive data. A 30-minute online course is also available to support you.
Why should you share your data?
You will potentially benefit from increased citations while helping move knowledge forward. In addition, sharing helps ensure the integrity of your findings and, as with open access publications, many funders now mandate that your data should be publicly available. Check with your funder as soon as possible so that you collect and store data appropriately throughout your research and it is shareable at the end of the process.
You should aim to store your data for at least ten years in a suitable repository and link it to your publication(s). Data can be uploaded directly through Symplectic.
Data Management Plans (DMP)
Many funders now demand a DMP to demonstrate that you are aware of best practice and the expectations of sharing. It should cover:
- type of data
- how sensitive data will be dealt with
There is a whole section of the Data Management site dedicated to helping you write your DMP, and DMP Online will take you through the process step by step.
- Find a system of organising your data and stick to it
- Always back up in at least two locations
- Be cautious of cloud storage, especially for sensitive data
- Start planning early in the research cycle
- Check funder requirements for research degrees