Organizing the data

From RedwoodCenter

Latest revision as of 22:37, 6 December 2007

Back to main page for Instructions for data contributors.

General guideline

Before uploading the data, try to organize and document it so that it will be useful to the wide range of people who might be interested in it. Picture a reader who is highly interested and reasonably intelligent but unfamiliar with your experiments and data. Please make things as easy as possible for that person by including not only what is strictly necessary but also any material that would be helpful.

To help organize your data, we suggest the following guidelines. Keep in mind that this is a first-time experience for us, as well. We welcome any suggestions regarding these guidelines.

Describe methods and other meta information

Include all documentation necessary to understand and analyze the data. Give at least the information that would go into a methods section. For example, include (if applicable):

  • Description of experimental conditions, experimental paradigms, etc.
  • Species, age of the animal, etc.
  • Surgical procedures
  • Information about the recording technique (electrode type, clamp method, etc.)
  • Locations of recording electrodes
  • Information about the recorded cells, such as cell anatomy, cell type, and laminar position
  • Tools and procedures used to process the data (e.g., spike sorting)
  • Information on how the stimuli were generated
  • Information on how the timing of the stimuli is aligned with the recordings
  • For cells in the visual system, the spatial relationship between the stimuli and the recorded cells

Provide necessary software

If you routinely use custom software to handle the data, please provide it as well, if possible, to help those analyzing the data.

Include Stimuli

If possible, include the files or other information necessary for generating the stimuli.

Format and organization of data

Initially, use whatever data format you think is most useful and which you can provide easily. For example, if all of your data is in Matlab format, and you think that format is useful, provide it in that format. (Don't worry about converting it to another format, such as HDF5.) However, whatever data format you use, it has to be carefully described, in particular any nonstandard conventions that you used.

If you already have a transparent system for organizing your data, just use it for the data to be uploaded. However, if your data are stored in a scheme that is hard to understand, please try to reorganize them. For example, if there are lots of files, group related files together in their own directories. If helpful, use subdirectories to arrange the files hierarchically.

Choose names for files and directories that are somewhat descriptive of the contents, but reasonably short. Feel free to use intuitive abbreviations. Example: exp23 (could be the name of a directory containing results for experiment number 23). "docs" might be a directory containing documentation. "tools" might be a directory containing programs used to analyze the data.
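
As an illustration only, the layout described above could be set up with a few lines of Python. This is a minimal sketch: the function name create_skeleton and the dataset name "my_dataset" are hypothetical, and the directory names simply reuse the examples from the text (exp23, docs, tools). The demonstration runs in a temporary directory so that it does not touch real data.

```python
from pathlib import Path
import tempfile


def create_skeleton(root, experiment_dirs):
    """Create docs/, tools/, and one directory per experiment under root.

    Returns the sorted names of the directories that now exist under root.
    """
    root = Path(root)
    for name in ["docs", "tools", *experiment_dirs]:
        # exist_ok=True makes the call safe to repeat on an existing layout.
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # "my_dataset" is a placeholder name, not a required convention.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_skeleton(Path(tmp) / "my_dataset", ["exp23", "exp24"]))
```

Any other scheme is equally acceptable as long as it is easy to understand and is described in the documentation.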

Add explanatory files

To help users get oriented in the content that you submit, it is important to have some generically named files providing guidance. These files should be in plain ASCII format and should contain brief summaries of the essential information necessary to understand the data. They could be excerpts from previously written documents. Don't worry if these files duplicate information that you have written in other descriptions that you provide. We ask for these separate files containing excerpts because you know best how to summarize the essential information. In addition, these files will help to prepare the conversion to HDF5 (which has options for descriptor files). If possible, please provide the following files:

Top-level README.txt

In the top-level directory, include a file named "README.txt". This is the most important file for orienting users of the data. Think of it as the global roadmap describing the experiment, how the different files and directories represent it, and how they are related. For example, the roadmap should contain:

  1. Brief summary of the gist of the experiments and the data.
  2. Pointers to files with more detailed explanatory material in this or lower-level directories (see below).
  3. List of directories and important files with short descriptions of their contents.
  4. A HOW TO GET STARTED section, which summarizes how you would recommend someone get started looking into the data.
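
The four roadmap items above could be laid out as plain sections of a text file. The following is a minimal sketch, not a required format: apart from "HOW TO GET STARTED", the section titles are assumptions, the function render_readme is hypothetical, and all text passed to it in the demonstration is placeholder content.

```python
def render_readme(summary, pointers, listing, getting_started):
    """Build README.txt text with the four roadmap items as plain sections.

    pointers is a list of paths to more detailed documentation;
    listing maps a file or directory name to a short description.
    """
    sections = [
        ("SUMMARY", [summary]),
        ("MORE DETAILED DOCUMENTATION", list(pointers)),
        ("DIRECTORIES AND IMPORTANT FILES",
         [f"{name}: {description}" for name, description in listing.items()]),
        ("HOW TO GET STARTED", [getting_started]),
    ]
    lines = []
    for title, body in sections:
        lines.append(title)
        lines.extend("  " + entry for entry in body)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder content only; replace with a description of your own data.
    print(render_readme(
        summary="(one-paragraph gist of the experiments and the data)",
        pointers=["docs/README.txt", "exp23/README.txt"],
        listing={"exp23": "results for experiment number 23",
                 "docs": "documentation",
                 "tools": "programs used to analyze the data"},
        getting_started="(which file to open first, and with which tool)",
    ))
```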

README.txt files in subdirectories

As appropriate, include additional documentation files in subdirectories, describing the contents of each directory. If possible, name each such file "README.txt", because this makes it easy to find. If a file has a different name, there should be a pointer to it in a higher-level README.txt file. Any pointer to a README.txt file should specify the subdirectory containing the file (to distinguish between different README.txt files).
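
Before uploading, it may help to see which directories have no README.txt and whether the README.txt files that do exist are plain ASCII. The sketch below is one possible check; the function name scan_readmes is hypothetical. A directory appearing in the first list is not necessarily a problem, since subdirectory README.txt files are only asked for where appropriate.

```python
import os
import tempfile


def scan_readmes(top):
    """Walk top and return (dirs_without_readme, dirs_with_non_ascii_readme).

    Directories are reported relative to top; top itself appears as ".".
    """
    missing, non_ascii = [], []
    for dirpath, _dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        if "README.txt" not in filenames:
            missing.append(rel)
            continue
        with open(os.path.join(dirpath, "README.txt"), "rb") as f:
            # bytes.isascii() is True only if every byte is below 128.
            if not f.read().isascii():
                non_ascii.append(rel)
    return sorted(missing), sorted(non_ascii)


if __name__ == "__main__":
    # Demonstration on a throwaway directory tree.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "exp23"))
        with open(os.path.join(top, "README.txt"), "w", encoding="ascii") as f:
            f.write("Global roadmap\n")
        print(scan_readmes(top))
```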

Add file specifying usage conditions

CONDITIONS.txt

Please create a file CONDITIONS.txt in the top-level directory. This file should specify the ground rules for publishing results emanating from your data. For example, do you require being consulted before publication? What form of acknowledgment is mandatory? Would you require co-authorship under certain circumstances? Are there other conditions of usage for your data?

RESTRICTIONS.txt (optional)

If there are parts of your data that are sensitive, and which you want to protect, create RESTRICTIONS.txt in the top-level directory. This file should specify the data files to be protected. Protection means that the data can only be accessed after passing a certain kind of background check, for example, verification of a university affiliation. If it's easy to do, also put all sensitive information into a single directory and indicate that in the RESTRICTIONS.txt file.
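
A quick way to confirm that the top-level files named on this page are in place is sketched below. The function name check_top_level is hypothetical; the file names are the ones given in the text, with README.txt and CONDITIONS.txt treated as requested and RESTRICTIONS.txt as optional.

```python
import os
import tempfile

REQUESTED = ("README.txt", "CONDITIONS.txt")
OPTIONAL = ("RESTRICTIONS.txt",)


def check_top_level(top):
    """Return (present, missing) for the top-level files of a dataset.

    present maps each file name to True/False; missing lists only the
    requested files that are absent (the optional one is never listed).
    """
    present = {name: os.path.isfile(os.path.join(top, name))
               for name in REQUESTED + OPTIONAL}
    missing = [name for name in REQUESTED if not present[name]]
    return present, missing


if __name__ == "__main__":
    # Demonstration on a throwaway directory containing only README.txt.
    with tempfile.TemporaryDirectory() as top:
        open(os.path.join(top, "README.txt"), "w").close()
        print(check_top_level(top))
```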

Hard disk shipping (for large data sets)

If the volume of the data that you plan to contribute is more than 50 GB, please let us know. We should then discuss whether to arrange to have a hard drive shipped to you. On our current server connection, the upload time for 50 GB would be around 8 hours.
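
To see where a data set stands relative to this threshold, its total size and a rough upload time can be estimated as sketched below. The estimate simply scales the figure quoted above (50 GB in about 8 hours); reading "GB" as decimal gigabytes is an assumption, the function names are hypothetical, and actual transfer times will depend on your own connection.

```python
import os

THRESHOLD_BYTES = 50 * 10**9          # 50 GB, assumed to mean decimal gigabytes
BYTES_PER_HOUR = THRESHOLD_BYTES / 8  # rate implied by "50 GB in around 8 hours"


def total_size(top):
    """Sum the sizes in bytes of all regular files below top (links skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def estimated_upload_hours(size_bytes):
    """Rough upload time, scaled linearly from the quoted 50 GB / 8 h figure."""
    return size_bytes / BYTES_PER_HOUR


def exceeds_threshold(size_bytes):
    """True if the data set is larger than the 50 GB mentioned in the text."""
    return size_bytes > THRESHOLD_BYTES


if __name__ == "__main__":
    size = total_size(".")  # replace "." with the top-level data directory
    print(f"{size / 10**9:.2f} GB, about {estimated_upload_hours(size):.1f} hours")
    print("Larger than 50 GB:", exceeds_threshold(size))
```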

Review and approve presentation of your data

Once we have received your data and incorporated it into the repository, we will contact you so you can review it. Your contribution will only be made available to others after your approval.