KaiAragaki/0f9c122db6f0416dfb3f5431bdd3b50e
Last active June 8, 2023 13:05
My workflow
Almost all (if not all) of my new R projects are targets projects.
I'm usually bouncing between computers and occasionally have a collaborator, so targets makes it super simple to spin up
my analyses on another computer.
Nowadays, I don't just use a single targets project; I like having multiple targets projects side by side in one RStudio
project, all declared in a single _targets.yaml. This keeps each pipeline from getting too unwieldy, which is usually the
point at which I get pretty overwhelmed.
(From here on, when I say 'project' I mean a targets project, not an RStudio project. All of this happens in a
single RStudio project.)
The first project I create is called 'common', and it contains anything that other projects might need to access.
This keeps my hierarchies flat. In my _targets.yaml, I name every targets store 'store_proj-name', so for common,
the store is "store_common". I put the targets pipeline in ./R/targets/common.R and the functions it uses in
./R/functions/common.R. In this 'common' pipeline I usually set up the folder structure: one folder called 01_data,
where data gets downloaded to, and one called 02_figs, where figures get saved. I also create an individual subdirectory
for each project's figures (e.g., 02_figs/pcr; more on that below).
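A minimal base-R sketch of that folder setup might look like the following. `setup_dirs()`, its arguments, and the `<project>_plot_dir` names are hypothetical, chosen to match the `pcr_plot_dir` target read later on. In a real common.R pipeline you would presumably wrap each returned path in its own target so other projects can `tar_read()` it from "store_common".

```r
# Hypothetical helper (not part of targets): create the shared layout,
# 01_data/ for downloads, 02_figs/ for figures, and one figure subdirectory
# per project, then return the paths so they can be stored as targets.
setup_dirs <- function(root = ".", projects = character()) {
  data_dir <- file.path(root, "01_data")
  figs_dir <- file.path(root, "02_figs")
  proj_fig_dirs <- file.path(figs_dir, projects)

  # recursive = TRUE creates parents as needed; showWarnings = FALSE makes
  # re-running a no-op when the folders already exist
  for (d in c(data_dir, figs_dir, proj_fig_dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Name each project's figure folder "<project>_plot_dir"
  proj_list <- as.list(proj_fig_dirs)
  names(proj_list) <- paste0(projects, "_plot_dir")

  c(list(data_dir = data_dir, figs_dir = figs_dir), proj_list)
}

dirs <- setup_dirs(tempdir(), projects = c("pcr", "wb"))
dirs$pcr_plot_dir
```

Because the function only creates missing folders, the common pipeline can rerun it safely every time.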
The next projects I create are usually siloed by the kind of information they contain. I work in a wet lab,
so my information naturally separates itself by experiment type. I might call my next project "pcr".
This again gets a "store_pcr" store entry in _targets.yaml, as well as an "R/targets/pcr.R" script entry pointing to its
pipeline. Finally, any functions it uses are stored in "./R/functions/pcr.R".
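Since every project follows the same naming scheme, all of its paths can be derived from the project name alone. The sketch below assumes exactly the layout described above; `project_paths()` and `yaml_entry()` are hypothetical helpers written for illustration, not targets functions.

```r
# Hypothetical helper: a project called `name` keeps its pipeline in
# R/targets/<name>.R, its functions in R/functions/<name>.R, and its
# targets store in store_<name>.
project_paths <- function(name) {
  list(
    script    = file.path("R", "targets", paste0(name, ".R")),
    functions = file.path("R", "functions", paste0(name, ".R")),
    store     = paste0("store_", name)
  )
}

# Build the matching _targets.yaml entry as plain text lines
# (no yaml package needed for this simple two-level structure)
yaml_entry <- function(name) {
  p <- project_paths(name)
  c(
    paste0(name, ":"),
    paste0("  script: ", p$script),
    paste0("  store: ", p$store)
  )
}

writeLines(yaml_entry("pcr"))
```

If you'd rather let targets write the file, `targets::tar_config_set(script = "R/targets/pcr.R", store = "store_pcr", project = "pcr")` should add an equivalent entry to _targets.yaml.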
For me, each target usually means a single figure. I used to break my targets up quite a bit more, but I rarely reused
the upstream targets, so caching their results wasn't saving me much time. The results
of these figure targets get saved to 02_figs, in their own subdirectory (e.g., 02_figs/01_pcr/my-fig.png).
These 'figure targets' have a general skeleton like so:

```r
# tar_file() comes from the tarchetypes package; its command must return
# the path(s) of the file(s) it writes so targets can track them
tar_file(
  my_figure,
  make_my_figure("fig-name.png", pcr_plot_dir)
)
```

`pcr_plot_dir` usually comes from something at the top of the script, like:

```r
pcr_plot_dir <- tar_read("pcr_plot_dir", store = "store_common")
```
And the `make_my_figure` function looks like this:

```r
make_my_figure <- function(filename, out_dir) {
  # Oftentimes I download the data inside this function and use it immediately.
  # I know that this is an affront to functional programming and targets in general.
  # I don't care!
  # But sometimes, if I'm using data that will be used across multiple targets, I'll
  # download it to, say, 01_data/01_pcr/my-data.csv and use it as an input for this target.
  # Oftentimes, though, individual data files don't really make 'sense' as targets, so I'll
  # usually forgo functional purity for semantic clarity.

  # downloading_stuff(), making(), and my() are placeholders for your own code
  my_data <- downloading_stuff("path/to/cloud/storage")

  plot <- my_data |>
    making() |>
    my() |>
    ggplot2::ggplot()

  # path_ext_set() guarantees exactly one .png extension, so both "fig-name"
  # and "fig-name.png" end up as fig-name.png
  out <- fs::path(out_dir, fs::path_ext_set(filename, "png"))
  ggplot2::ggsave(out, plot, units = "in", width = 3.5, height = 3.5, dpi = 500)

  # Return the path so tar_file() can track the saved figure
  out
}
```
Note how the target name is "my_figure" and the function name is "make_my_figure". This is a common pattern I follow. | |
I also like to keep a kind of 'staging area' file at the top level called scratch.R.
It isn't part of any targets pipeline; it's a testbed for making figures and such before
I put them in a pipeline, which allows for rapid, piecemeal development.
Other things: | |
I like to use the `conflicted` package and include an RStudio project-level .Rprofile that determines the 'winners':

```r
conflicted::conflict_prefer("select", "dplyr")
conflicted::conflict_prefer("filter", "dplyr")
conflicted::conflict_prefer("rename", "dplyr")
conflicted::conflict_prefer("path", "fs")
```
I add anything created by the targets pipelines, like 01_data/ and 02_figs/, to .gitignore.
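That step can also be scripted. The following is a base-R sketch; `add_to_gitignore()` is a hypothetical helper, and `usethis::use_git_ignore()` does a similar job if you already use usethis.

```r
# Hypothetical helper: add pipeline-generated paths to .gitignore,
# skipping any pattern that is already listed so it is safe to re-run.
add_to_gitignore <- function(patterns, path = ".gitignore") {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  new <- setdiff(patterns, existing)  # also drops duplicate patterns
  if (length(new) > 0) {
    # Rewrite the whole file so a missing final newline can't merge lines
    writeLines(c(existing, new), path)
  }
  invisible(new)
}

add_to_gitignore(c("01_data/", "02_figs/"), path = file.path(tempdir(), ".gitignore"))
```

In a real project you would call it with the default `path` from the project root.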
Example _targets.yaml:

```yaml
human_seq:
  script: R/targets/human_seq.R
  store: store_human_seq
common:
  script: R/targets/common.R
  store: store_common
wb:
  script: R/targets/wb.R
  store: store_wb
pcr:
  script: R/targets/pcr.R
  store: store_pcr
tw:
  script: R/targets/tw.R
  store: store_tw
cell_seq:
  script: R/targets/cell_seq.R
  store: store_cell_seq
```
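With a _targets.yaml like this, targets picks the active project from the `TAR_PROJECT` environment variable; alternatively, `script` and `store` can be passed straight to `targets::tar_make()`, e.g. `tar_make(script = "R/targets/pcr.R", store = "store_pcr")`. The hypothetical `run_project()` wrapper below is one way to switch projects without leaving `TAR_PROJECT` set afterwards. It is a sketch, not part of targets; `make` is exposed only so another function (or a test stub) can be swapped in.

```r
# Hypothetical wrapper: run one project's pipeline by temporarily pointing
# TAR_PROJECT at its _targets.yaml entry, then restoring the previous value.
# The default `make` is only evaluated if used, so targets is needed only
# when you actually rely on it.
run_project <- function(project, make = targets::tar_make, ...) {
  old <- Sys.getenv("TAR_PROJECT", unset = NA)
  on.exit(
    if (is.na(old)) Sys.unsetenv("TAR_PROJECT") else Sys.setenv(TAR_PROJECT = old),
    add = TRUE
  )
  Sys.setenv(TAR_PROJECT = project)
  make(...)  # extra arguments are forwarded to `make`
}

# Usage (requires the targets package):
# run_project("pcr")
```

Restoring the variable in `on.exit()` means a failed pipeline run doesn't leave the session pointed at the wrong project.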