Organize Your Data

Principles

  1. Plan. Think in advance about key issues that will affect your research data. What types of data will be generated? How much data will be collected? What data do you need to retain long term? Consider creating a data inventory to understand and track your data.
  2. Choose appropriate file formats. Formats suited to long-term access are:
    • Non-proprietary
    • Based on an open, documented standard
    • In common use by the research community
    • Encoded with a standard character encoding (ASCII, UTF-8)
  3. Name your files well:
    • Be consistent (always use the same information in the same order)
    • Use unique identifiers (e.g., an acronym for the project)
    • Do not use spaces or special characters (\ / : * ? " < > |)
    • When using dates, put year, month, and day in that order, as in the Date and Time Formats (W3C-DTF) standard, written without separators (YYYYMMDD[hh][mm][ss])
    • To keep track of updated versions, use sequential numbering (v1, v2, etc.) rather than words such as “Final.”
  4. Separate ongoing and completed work. Before you amass lots of folders and files, it may be useful to separate your original data from the data you are currently working on, and to differentiate between ongoing and completed work. Create a copy of your original data and put it in a folder named something like “Original.” Make multiple backups in multiple locations.
  5. Be selective. Decide whether and when it is appropriate to delete digital materials and data, based on the standards of your discipline and the guidelines of your funding agency. Plan this with your colleagues.
  6. Describe your data. Create a data dictionary with a detailed description of your data set or data model. Use community-based standards when possible; here is a short list by discipline. Include the data collection methods, variable names, codes, algorithms, file formats and software versions, the structure of the data files, sources, quality control and related issues, transformations, and any issues regarding privacy, confidentiality, and use or re-use.
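A data dictionary as described in item 6 can be as simple as a CSV file with one row per variable. The sketch below is a minimal illustration, not a community standard: the column names in FIELDS and the two sample entries are assumptions made up for this example, so substitute the fields your discipline expects.

```python
import csv
import io

# Hypothetical column set for a data dictionary; adapt to your discipline.
FIELDS = ["variable", "description", "type", "units", "allowed_values"]


def write_data_dictionary(entries, stream):
    """Write one CSV row per variable, failing early if a field is missing."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for entry in entries:
        missing = [field for field in FIELDS if field not in entry]
        if missing:
            raise ValueError(f"{entry.get('variable', '?')}: missing {missing}")
        writer.writerow(entry)


# Illustrative entries only; they do not describe a real data set.
entries = [
    {"variable": "site_id", "description": "Sampling site code",
     "type": "string", "units": "", "allowed_values": "A01-A20"},
    {"variable": "temp_c", "description": "Water temperature",
     "type": "float", "units": "degrees Celsius", "allowed_values": "0-40"},
]

buffer = io.StringIO()
write_data_dictionary(entries, buffer)
dictionary_csv = buffer.getvalue()
print(dictionary_csv)
```

Writing to a file instead of a StringIO buffer works the same way; open the file with newline="" and a UTF-8 encoding so the output follows the format advice in item 2.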

Tools and Methods

File Renaming
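The naming principles listed earlier (consistent order, a project identifier, no spaces or special characters, YYYYMMDD dates, sequential versions) can be applied programmatically. The following is one possible sketch: the function name build_file_name, the field order, and the use of underscores between fields and hyphens within fields are assumptions chosen for illustration, not a required convention.

```python
import re
from datetime import datetime

# Characters to avoid in file names (including curly quotes), plus whitespace.
_UNSAFE = re.compile(r'[\\/:*?"<>|\s“”]+')


def build_file_name(project, description, when, version, extension):
    """Return PROJECT_description_YYYYMMDD_vN.ext with unsafe characters replaced."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    parts = [project.upper(), description, when.strftime("%Y%m%d"), f"v{version}"]
    # Replace each run of unsafe characters with a single hyphen.
    cleaned = [_UNSAFE.sub("-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


name = build_file_name("abc", "soil samples: site 3", datetime(2024, 3, 7), 2, ".csv")
print(name)  # ABC_soil-samples-site-3_20240307_v2.csv
```

Because the date is written year first and zero-padded, files named this way sort chronologically in an ordinary alphabetical listing.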

Workflow Management

Versioning

  • GitHub
  • Subversion (supported by Rice IT)
  • Cloud services such as Box often provide some level of versioning
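
Where a version control system is not in use, the sequential numbering recommended under the naming principles (v1, v2, etc.) can be kept consistent with a small helper. This is a rough sketch under assumed conventions: the function name next_version_name and the base_vN.ext pattern are hypothetical, and the file names in the example are invented.

```python
import re


def next_version_name(existing_names, base, extension):
    """Return base_vN.ext, where N is one more than the highest version found."""
    pattern = re.compile(re.escape(base) + r"_v([0-9]+)\." + re.escape(extension))
    highest = 0
    for name in existing_names:
        match = pattern.fullmatch(name)
        if match:
            # Compare numerically so that v10 ranks above v2.
            highest = max(highest, int(match.group(1)))
    return f"{base}_v{highest + 1}.{extension}"


# Invented file names for illustration.
existing = ["ABC_survey_v1.csv", "ABC_survey_v2.csv",
            "ABC_survey_v10.csv", "ABC_other_v12.csv"]
suggested = next_version_name(existing, "ABC_survey", "csv")
print(suggested)  # ABC_survey_v11.csv
```

In practice the list of existing names could come from os.listdir on the working folder. Files with a different base name are ignored, so each file series keeps its own sequence.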

Services

The Research Data Management Team can recommend best practices for organizing and naming files, help you develop and implement a plan for managing data, and assist with developing a framework for data documentation.