I have MRI data from a group of patients with brain damage (due to stroke, head injury, etc.). For each patient, I have a binary MRI image (3D array), where a value of 1 or 0 indicates the presence or absence of damage at a particular voxel (3D pixel). Each image is ~870k voxels.
I want to return a list of patients with damage at a given voxel.
Here's the stupid/inelegant way to do this:

    voxel_id = 12345  # Index of the voxel I'm interested in
    patient_matches = []
    for patient in patient_list:
        patient_img_file = '/path/to/images/{}.MRI'.format(patient)  # Path to this patient's image.
        img = load_image(patient_img_file)  # Load the image as a NumPy ndarray.
        img_flat = img.ravel()  # Flatten to 1D.
        vox_val = img_flat[voxel_id]
        if vox_val == 1:
            patient_matches.append(patient)
- Load and flatten each image, so I have a 1x870k vector for each patient.
- Read these vectors into a DB table (`damage`), with 1 row per patient and 1 column per voxel.
- Query: `select patient_id from damage where vox12345 = 1`
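One wrinkle with the table layout above: most SQL engines won't accept 870k columns (SQLite, for instance, caps columns per table far below that), so a sparse one-row-per-damaged-voxel schema is more practical. A minimal sketch with SQLite and toy 8-voxel "images" (all names and data here are hypothetical stand-ins):

```python
import sqlite3
import numpy as np

# Toy stand-in for the real images: 3 patients, 8-voxel binary "images".
images = {
    'p01': np.array([0, 1, 0, 0, 1, 0, 0, 0]),
    'p02': np.array([0, 0, 0, 0, 1, 0, 1, 0]),
    'p03': np.array([0, 1, 0, 0, 0, 0, 0, 0]),
}

conn = sqlite3.connect(':memory:')
conn.execute('create table damage (patient_id text, voxel_id integer)')
for patient, img in images.items():
    flat = img.ravel()
    # Store only the damaged voxels: one (patient, voxel) row per 1-valued voxel.
    rows = [(patient, int(v)) for v in np.flatnonzero(flat)]
    conn.executemany('insert into damage values (?, ?)', rows)

voxel_id = 4
matches = [r[0] for r in conn.execute(
    'select patient_id from damage where voxel_id = ?', (voxel_id,))]
print(sorted(matches))  # ['p01', 'p02']
```

An index on `voxel_id` would keep per-voxel lookups fast as the table grows.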
So, with bit packing, you're going to have 870k/8, or ~100KB per patient? How many patients do you have? (Tens? Hundreds? Thousands?)
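To make that size estimate concrete, NumPy's `packbits` packs 8 voxels per byte (the 870k-voxel image below is a hypothetical stand-in for the real data):

```python
import numpy as np

n_voxels = 870_000
img_flat = np.zeros(n_voxels, dtype=np.uint8)  # hypothetical flattened binary image
img_flat[12345] = 1

packed = np.packbits(img_flat)  # 8 voxels per byte, MSB first
print(packed.nbytes)            # 108750 bytes, i.e. ~106 KB per patient

# Read back a single voxel without unpacking the whole image:
voxel_id = 12345
byte_val = packed[voxel_id // 8]
bit = (byte_val >> (7 - voxel_id % 8)) & 1
print(bit)  # 1
```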
I think this is tractable just treating it as binary data, either mmapping or not. 1000 patients is under 1GB, so pop it in memory, do some lookups, and you're there. If you're doing more than a handful of searches, I'd preprocess your flat images, then read them in and run a bunch of searches.
You could even throw them into a NumPy array of bool with shape (features, patients), and just use index projection to get the vector of matches. That's going to scale well until you run out of memory.
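A sketch of that in-memory approach, with one row per voxel and one column per patient as suggested above (toy 8-voxel data; names are hypothetical):

```python
import numpy as np

patients = np.array(['p01', 'p02', 'p03'])

# shape = (voxels, patients): each column is one patient's flattened image.
data = np.array([
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
], dtype=bool)

voxel_id = 4
matches = patients[data[voxel_id]]  # boolean mask over the patient axis
print(matches)  # ['p01' 'p02']
```

Unpacked bools cost 1 byte per voxel, so 1000 patients at 870k voxels is ~870MB, consistent with the "under 1GB" estimate.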