There are two slightly different types of "storing" data:

  1. Storing data that is being actively used in a current project, including periodic backups of that data. Backup storage serves as insurance against data loss: if data is lost, it can be restored from the backup.
  2. Storing data by submitting it to a long-term archive in case it is needed again, but where no further work or maintenance is done on it. The purpose of archiving is preservation. Funders and regulators may require that the data be archived for a certain period of time. A data archive should contain all the relevant metadata. Data archiving is vital to data sharing, discovery and dissemination (note: data should be archived whether or not they will be shared with others).
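
A backup is only useful as insurance if the copy matches the original. As a minimal sketch of the first kind of storage (not a replacement for a managed backup service), the following Python example copies a file and compares checksums of the original and the copy. The function names `sha256_of` and `backup_file` and the directory layout are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and verify the copy by checksum.

    Hypothetical helper: a real backup routine would also handle
    scheduling, multiple versions and off-site copies.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

Storing the checksum alongside the backup would also make it possible to check, at restore time, that the backup itself has not been corrupted.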

There are many repositories available, some very general and some quite discipline- or disease-specific. They may provide different levels of access and security, and you need to consider whether your data should be open, safeguarded or controlled. For instance, if you are looking for long-term storage of confidential data, you may prefer to use an archive that does not focus on increasing data visibility. However, even some open access repositories provide the option of embargoing data before publication. (Note: repositories may also be called “archives” or “data centres”.)

 

Version control should be part of data storage and maintenance – all changes to your data and code should be recorded. There are tools (version control systems) available that can automatically deal with this, creating a ‘history’ of all revisions, with timestamps.
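
To illustrate what such a ‘history’ records, here is a minimal Python sketch that appends a timestamped entry, with a checksum of the file's current contents, to a JSON log. It is an illustration only: the function `record_revision` and the log format are assumptions, and an established version control system (Git, for example) does far more, including storing the content of every revision so that earlier versions can be restored.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_revision(data_file: Path, log_file: Path, note: str) -> dict:
    """Append a timestamped entry describing the current state of `data_file`.

    Hypothetical helper: it records *that* the file changed (via its
    checksum) and when, but does not store the file contents.
    """
    entry = {
        "file": data_file.name,
        "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "note": note,
    }
    # Load the existing history, or start a new one on first use.
    history = json.loads(log_file.read_text()) if log_file.exists() else []
    history.append(entry)
    log_file.write_text(json.dumps(history, indent=2))
    return entry
```

Calling `record_revision` after each change to a data file builds up a list of entries in which a changed checksum marks a new revision.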

You should also familiarise yourself with the policies and legal aspects of storing data: your funders and your institution probably have specific rules that you need to follow. If the data has more than one owner, you also need to obtain the necessary permissions before depositing it. In addition, if your research involves human subjects, you have to take data security and sensitivity particularly seriously.

 

Particularly relevant keywords:

  • data archiving
  • data repository
  • reproducibility
  • data reusability
  • data security
  • data sharing
  • FAIR
  • metadata
  • open access