library(curatedMetagenomicData)
library(dplyr)

# Age categories present in the full metadata table (NA removed)
agecats <- unique(sampleMetadata$age_category) |> na.omit()

# Keep stool samples from healthy control participants with a known age category
sm <- filter(sampleMetadata, study_condition == "control") |>
  filter(disease == "healthy") |>
  filter(body_site == "stool") |>
  filter(!is.na(age_category))

# Write one relative abundance file and one metadata file per age category
for (agecat in agecats) {
  sm1 <- filter(sm, age_category == agecat)
  se <- returnSamples(sm1, dataType = "relative_abundance", rownames = "NCBI")
  # Transpose so that samples are rows and NCBI taxon IDs are columns
  write.csv(t(assay(se)), file = paste0(agecat, "_relab.csv"))
  write.csv(colData(se), file = paste0(agecat, "_samplemetadata.csv"))
}
This gist writes a relative abundance file, with NCBI IDs in columns and observations (samples) in rows, and a corresponding metadata file for stool specimens from healthy control participants. I divided the files by age category, since the categories will have somewhat different properties:
$ wc -l *relab.csv
8983 adult_relab.csv
821 child_relab.csv
2328 newborn_relab.csv
229 schoolage_relab.csv
835 senior_relab.csv
13196 total
Note that the relative abundances won't always add up to exactly 100%, because some species that could not be mapped to the phylogeny were dropped; these are rare and low in abundance. Note also that an additional 1,301 control samples from body sites other than stool are not included here, but they are available if you want them. Finally, we'll be re-running these and some (possibly tens of) thousands more specimens through MetaPhlAn4, which will add a large number of Species Genome Bins: putative species, based on high-quality metagenome assemblies, that have not yet been isolated or named (or assigned NCBI identifiers).
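To see how far the per-sample totals fall short of 100%, something like the following minimal sketch could be run against any of the relative abundance files. The helper name `relab_totals` is hypothetical and not part of the gist. It assumes the layout written by the script above: sample IDs in the first column, one column per NCBI ID, and abundances expressed as percentages.

```r
# Hypothetical helper (not part of the gist): per-sample totals for one
# of the *_relab.csv files written by the script above.
relab_totals <- function(path) {
  # First column holds the sample IDs; check.names = FALSE keeps the
  # numeric NCBI IDs as column names instead of prefixing them with "X".
  relab <- read.csv(path, row.names = 1, check.names = FALSE)
  # One total per sample (row); values below 100 reflect dropped species.
  rowSums(relab)
}

# Example, assuming adult_relab.csv is in the working directory:
# totals <- relab_totals("adult_relab.csv")
# summary(totals)
```

Looking at `summary(totals)` or the smallest few values is probably enough to judge whether any samples lost a non-trivial share of abundance.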
Ran on May 3, 2024 on superstudio. Results at 2024-05-04_fornash.zip