Skip to content

DICOM organization

Objective


This is a dataset specific process and needs to be customized based on local scanner DICOM dumps and file naming. This organization should produce, for a given session, participant specific dicom dirs. Each of these participant-dir contains a flat list of dicoms for the participant for all available imaging modalities and scan protocols. The manifest is used to determine which new subject-session pairs need to be processed, and a doughnut.csv file is used to track the status for the DICOM reorganization and BIDS conversion steps.


Key directories and files

  • <DATASET_ROOT>/tabular/manifest.csv
  • <DATASET_ROOT>/downloads
  • <DATASET_ROOT>/scratch/raw_dicoms
  • <DATASET_ROOT>/scratch/raw_dicoms/doughnut.csv
  • <DATASET_ROOT>/dicom

Procedure

  1. Run workflow/make_doughnut.py to update doughnut.csv based on the manifest. It will add new rows for any subject-session pair not already in the file.
    • To create the doughnut.csv for the first time, use the --empty argument. If processing has been done without updating doughnut.csv, use --regenerate to update it based on new files in the dataset.

Note

The doughnut.csv file is used to track the multi-step conversion of raw DICOMs to BIDS: whether raw DICOMs have been downloaded to disk, re-organized into a directory structure accepted by HeuDiConv, and converted to BIDS. This file is updated automatically by scripts in workflow/dicom_org and workflow/bids_conv. Backups are created in case it is needed to revert to a previous version: they can be found in <DATASET_ROOT>/scratch/raw_dicoms/.doughnuts.

Here is a sample doughnut.csv file:

participant_id session participant_dicom_dir dicom_id bids_id downloaded organized converted
001 ses-01 MyStudy_001_2021 sub-001 sub-001 True True True
001 ses-02 MyStudy_001_2022 sub-001 sub-001 True False False
002 ses-01 MyStudy_002_2021 sub-002 sub-002 True True False
002 ses-03 MyStudy_002_2024 sub-002 sub-002 False False False
  1. Download DICOM dumps (e.g. ZIPs / tarballs) in the <DATASET_ROOT>/downloads directory. Different visits (i.e. sessions) must be downloaded in separate sub-directories and ideally named as listed in the global_config.json. The DICOM download and extraction process is highly dataset-dependent, and we recommend using custom scripts to automate it as much as possible.
  2. Extract (and rename if needed) all participants into <DATASET_ROOT>/scratch/raw_dicoms separately for each visit (i.e. session).
    • At this point, the doughnut.csv should have been updated to reflect the new downloads (downloaded column set to True where appropriate). We recommend doing this in the download script (i.e. in Step 2), but workflow/make_doughnut.py can also be run with the --regenerate flag to search for the expected files (this can be very slow!).

Note

IMPORTANT: the participant-level directory names should match participant_ids in the manifest.csv. It is recommended to use participant_id naming format to exclude any non-alphanumeric chacaters (e.g. "-" or "_"). If your participant_id does contain these characters, it is still recommended to remove them from the participant-level DICOM directory names (e.g., QPN_001 --> QPN001).

Note

It is okay for the participant directory to have messy internal subdir tree with DICOMs from multiple modalities. (See data org schematic for details). The run script will search and validate all available DICOM files automatically.

  1. Run run_dicom_org.py to:
    • Search: Find all the DICOMs inside the participant directory.
    • Validate: Excludes certain individual dicom files that are invalid or contain scanner-derived data not compatible with BIDS conversion.
    • Symlink (default) or copy: Creates symlinks from raw_dicoms/ to the <DATASET_ROOT>/dicom, where all participant specific dicoms are in a flat list. The symlinks are relative so that they are preserved in containers.
    • Update status: if successful, set the organized column to True in doughnut.csv.

Sample cmd:

python run_dicom_org.py \
    --global_config <global_config_file> \
    --session_id <session_id> \