
@danlurie
Forked from anonymous/bddb_dilemma.md
Last active December 9, 2016 00:04

The Setup:

I have MRI data from a group of patients with brain damage (due to stroke, head injury, etc.). For each patient, I have a binary MRI image (a 3D array of ~870k voxels), where a value of 1 or 0 indicates the presence or absence of damage at a particular voxel (3D pixel).

The Goal:

Return a list of patients with damage at a given voxel.

Here's the stupid/inelegant way to do this:

voxel_id = 12345 # Index of the voxel I'm interested in.
patient_matches = []
for patient in patient_list:
  patient_img_file = '/path/to/images/{}.MRI'.format(patient) # Get the path to the image for this patient.
  img = load_image(patient_img_file) # Load the image as a NumPy ndarray.
  img_flat = img.ravel() # Flatten to a 1D vector.
  vox_val = img_flat[voxel_id] # Index the flattened array, not the 3D one.
  if vox_val == 1:
    patient_matches.append(patient)
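As an aside, flattening isn't strictly required for this: NumPy's `unravel_index` maps a flat voxel index back to 3D coordinates, so the original 3D array can be indexed directly. A minimal sketch with a tiny synthetic image (the shape and indices are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for one patient's binary damage image.
img = np.zeros((2, 3, 4), dtype=np.uint8)  # 24 voxels total
img[1, 2, 3] = 1                           # mark one voxel as damaged

voxel_id = 23  # Flat (raveled) index of that same voxel.

# Option 1: flatten, then index with the flat voxel ID.
assert img.ravel()[voxel_id] == 1

# Option 2: convert the flat index to 3D coordinates and index directly.
coords = np.unravel_index(voxel_id, img.shape)  # -> (1, 2, 3)
assert img[coords] == 1
```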

The Original Plan

  1. Load and flatten each image, so I have a 1x870k vector for each patient.
  2. Read these vectors into a DB (table='damage'), with 1 row per patient and 1 column per voxel.
  3. Query: `select patient_id from damage where vox12345 = 1`
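With a toy number of voxels, the plan above can be sketched with the stdlib `sqlite3` module (the `vox0`, `vox1`, ... column names and patient data are assumptions for illustration; the real ~870k columns would far exceed SQLite's column limit, which is the problem discussed below):

```python
import sqlite3

n_voxels = 5  # Toy stand-in for ~870k.
patients = {'p01': [0, 1, 0, 0, 1], 'p02': [1, 1, 0, 1, 0]}

conn = sqlite3.connect(':memory:')
cols = ', '.join('vox{} INTEGER'.format(i) for i in range(n_voxels))
conn.execute('CREATE TABLE damage (patient_id TEXT, {})'.format(cols))

# One row per patient, one column per voxel.
for pid, vec in patients.items():
    placeholders = ', '.join('?' * (n_voxels + 1))
    conn.execute('INSERT INTO damage VALUES ({})'.format(placeholders), [pid] + vec)

# Step 3: who has damage at voxel 1?
matches = [row[0] for row in
           conn.execute('SELECT patient_id FROM damage WHERE vox1 = 1')]
# matches -> ['p01', 'p02']
```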

Other Possibilities

I quickly realized that it isn't reasonable (or, in most databases, even possible) to have almost 1 million columns in a table, so my next thought was to have a separate table for each voxel (which would also reduce the amount of data read into memory with each query).

Another solution would be to read all the images into a single huge 2D table (n_patients x n_voxels) and query it using something like Pandas/xarray/PyTables/HDF.
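A minimal sketch of that last idea with pandas, using synthetic data in place of real images (the patient IDs and sizes are made up):

```python
import numpy as np
import pandas as pd

patients = ['p01', 'p02', 'p03']
n_voxels = 100  # Toy stand-in for ~870k voxels per image.

# Each row is one patient's flattened binary damage image.
damage = np.zeros((len(patients), n_voxels), dtype=np.uint8)
damage[0, 42] = 1  # p01 has damage at voxel 42.
damage[2, 42] = 1  # p03 has damage at voxel 42.

df = pd.DataFrame(damage, index=patients)

voxel_id = 42
matches = df.index[df[voxel_id] == 1].tolist()
# matches -> ['p01', 'p03']
```

At full scale the matrix is n_patients rows by ~870k columns; stored as uint8 that is under 1 MB per patient, so a few hundred patients fit comfortably in memory and the same column lookup works unchanged.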
