We recommend drawing up a data management plan that specifies, from the outset of a research project, which data and related documents are to be retained and protected with backup copies.
A good rule of thumb is 3-2-1: keep 3 copies of the data, on 2 different types of storage media, with 1 copy off-site.
Ideally, all research data judged to be of high quality should be preserved, along with the related documents: metadata, documents describing the data collection methodology and database design, and documents describing how the database can be used or transformed.
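As an illustration of the rule of thumb above, here is a minimal sketch that copies one file into several backup folders and checks each copy against the original with a SHA-256 checksum. The function names `sha256_of` and `back_up` are hypothetical and chosen for this example only; which folders are used, and which one is off-site, is left to the data management plan.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination folder and verify every copy.

    `destinations` is an assumption of this sketch: for example, one folder
    on a networked drive and one on an off-site or external device.
    """
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup at {target} does not match the original")
        copies.append(target)
    return copies
```

With the original file plus two verified copies in different places, the project has three copies in total. Comparing checksums rather than file sizes helps catch silent corruption during the copy.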
How often should backup files be updated?
- There is no general rule on how often database backups should be updated. As the research project progresses and new data becomes available, the stored files should be regularly updated.
- Thereafter, we recommend following an update plan that sends automatic reminders to review whether the database or any of the related documents need to be changed.
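The two points above could be supported by a small script. The sketch below lists the files whose backup is missing or older than the working copy, and reports whether a periodic review is due. The function names and the 90-day interval are assumptions made for this example, not recommended values; how the reminder is actually sent (email, calendar entry, etc.) is left out.

```python
from datetime import date, timedelta
from pathlib import Path


def files_needing_backup(source_dir: Path, backup_dir: Path) -> list[Path]:
    """List source files that are missing from the backup or newer than it."""
    stale = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        backup = backup_dir / source.relative_to(source_dir)
        # A file is stale if it has no backup or was modified after its backup.
        if not backup.exists() or source.stat().st_mtime > backup.stat().st_mtime:
            stale.append(source)
    return stale


def review_is_due(last_review: date, today: date, interval_days: int = 90) -> bool:
    """Return True once `interval_days` have passed since the last review.

    The default of 90 days is an arbitrary placeholder for this sketch.
    """
    return today - last_review >= timedelta(days=interval_days)
```

Comparing modification times is a simple heuristic; it assumes that the clocks and timestamps of both storage locations are reliable.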
Which technology should be used to save and store copies?
No type of technology is perfect, which is why it is advisable to use multiple technologies to copy a single database. Here is a list of current technologies:
- Networked drives: These drives are placed on institutional servers, which are normally well protected as well as regularly and automatically backed up.
- Computer hard disk: This is a flexible technology to use while the database is being developed, but it should be used in conjunction with another technology, as the risk of hardware failure or loss is significant.
- External storage devices (USB sticks, CDs, DVDs): These are affordable and particularly useful in the short term, such as when travelling. However, it is risky to rely on these technologies alone to maintain a database over the long term; their durability is limited, and the risk of loss or theft is high.
- Cloud storage: Services such as Dropbox, Google Drive, or OneDrive offer free or low-cost storage space. Although most offer encryption, data protection is not guaranteed. Moreover, bandwidth is limited, which may restrict the use of large databases. Finally, the service may change depending on the business direction and stability of the provider.
Which file format should be used?
It is important to use a file format that will enable easy long-term use of the database and won't require extensive conversion of the stored data. We recommend the use of open formats (txt, csv, tab, mp3, flac, xml), ideally with Unicode encoding (e.g. UTF-8). If the database includes a lot of metadata, then structured formats such as those of SPSS, SAS, or Stata are suggested.
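As an example of the recommendation above, the following sketch writes tabular data to a CSV file with explicit UTF-8 encoding and reads it back using only the Python standard library. The function names `export_csv` and `load_csv` are hypothetical, and the sketch assumes that every row is a dictionary with the same keys.

```python
import csv
from pathlib import Path


def export_csv(rows: list[dict], path: Path) -> None:
    """Write rows to a UTF-8 encoded CSV file with a header line."""
    fieldnames = list(rows[0].keys())  # assumes at least one row
    # newline="" lets the csv module control line endings itself.
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_csv(path: Path) -> list[dict]:
    """Read a UTF-8 encoded CSV file back into a list of dictionaries."""
    with path.open("r", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

Stating the encoding explicitly avoids depending on the default of the operating system, which matters when the data contains accented or non-Latin characters. Note that CSV stores every value as text, so numeric types have to be restored when the file is read back.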
If you use proprietary software, it is important to specify which one, so that future users know what is needed to open the files.