Skip to content

Instantly share code, notes, and snippets.

@joowkim
Created July 30, 2021 01:53
Show Gist options
  • Save joowkim/c50873ed9fb8f3b36f28b163d1ccbeb2 to your computer and use it in GitHub Desktop.
Save joowkim/c50873ed9fb8f3b36f28b163d1ccbeb2 to your computer and use it in GitHub Desktop.
Question about batch effect
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### I am going to make up the tech variable as a batch effect. For example, tech1 is measured by technician 1 and tech2 is measure by technician 2. And I am not interested in the effect of the tech variale, but would like to ignore or remove the effect of the tech variable. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"dataframe\">\n",
"<caption>A data.frame: 6 × 2</caption>\n",
"<thead>\n",
"\t<tr><th></th><th scope=col>weight</th><th scope=col>group</th></tr>\n",
"\t<tr><th></th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;fct&gt;</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>1</th><td>4.17</td><td>ctrl</td></tr>\n",
"\t<tr><th scope=row>2</th><td>5.58</td><td>ctrl</td></tr>\n",
"\t<tr><th scope=row>3</th><td>5.18</td><td>ctrl</td></tr>\n",
"\t<tr><th scope=row>4</th><td>6.11</td><td>ctrl</td></tr>\n",
"\t<tr><th scope=row>5</th><td>4.50</td><td>ctrl</td></tr>\n",
"\t<tr><th scope=row>6</th><td>4.61</td><td>ctrl</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"A data.frame: 6 × 2\n",
"\\begin{tabular}{r|ll}\n",
" & weight & group\\\\\n",
" & <dbl> & <fct>\\\\\n",
"\\hline\n",
"\t1 & 4.17 & ctrl\\\\\n",
"\t2 & 5.58 & ctrl\\\\\n",
"\t3 & 5.18 & ctrl\\\\\n",
"\t4 & 6.11 & ctrl\\\\\n",
"\t5 & 4.50 & ctrl\\\\\n",
"\t6 & 4.61 & ctrl\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 2\n",
"\n",
"| <!--/--> | weight &lt;dbl&gt; | group &lt;fct&gt; |\n",
"|---|---|---|\n",
"| 1 | 4.17 | ctrl |\n",
"| 2 | 5.58 | ctrl |\n",
"| 3 | 5.18 | ctrl |\n",
"| 4 | 6.11 | ctrl |\n",
"| 5 | 4.50 | ctrl |\n",
"| 6 | 4.61 | ctrl |\n",
"\n"
],
"text/plain": [
" weight group\n",
"1 4.17 ctrl \n",
"2 5.58 ctrl \n",
"3 5.18 ctrl \n",
"4 6.11 ctrl \n",
"5 4.50 ctrl \n",
"6 4.61 ctrl "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(PlantGrowth)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"dataframe\">\n",
"<caption>A data.frame: 6 × 3</caption>\n",
"<thead>\n",
"\t<tr><th></th><th scope=col>weight</th><th scope=col>group</th><th scope=col>tech</th></tr>\n",
"\t<tr><th></th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;fct&gt;</th><th scope=col>&lt;fct&gt;</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>1</th><td>4.17</td><td>ctrl</td><td>tech1</td></tr>\n",
"\t<tr><th scope=row>2</th><td>5.58</td><td>ctrl</td><td>tech2</td></tr>\n",
"\t<tr><th scope=row>3</th><td>5.18</td><td>ctrl</td><td>tech1</td></tr>\n",
"\t<tr><th scope=row>4</th><td>6.11</td><td>ctrl</td><td>tech2</td></tr>\n",
"\t<tr><th scope=row>5</th><td>4.50</td><td>ctrl</td><td>tech1</td></tr>\n",
"\t<tr><th scope=row>6</th><td>4.61</td><td>ctrl</td><td>tech2</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"A data.frame: 6 × 3\n",
"\\begin{tabular}{r|lll}\n",
" & weight & group & tech\\\\\n",
" & <dbl> & <fct> & <fct>\\\\\n",
"\\hline\n",
"\t1 & 4.17 & ctrl & tech1\\\\\n",
"\t2 & 5.58 & ctrl & tech2\\\\\n",
"\t3 & 5.18 & ctrl & tech1\\\\\n",
"\t4 & 6.11 & ctrl & tech2\\\\\n",
"\t5 & 4.50 & ctrl & tech1\\\\\n",
"\t6 & 4.61 & ctrl & tech2\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 3\n",
"\n",
"| <!--/--> | weight &lt;dbl&gt; | group &lt;fct&gt; | tech &lt;fct&gt; |\n",
"|---|---|---|---|\n",
"| 1 | 4.17 | ctrl | tech1 |\n",
"| 2 | 5.58 | ctrl | tech2 |\n",
"| 3 | 5.18 | ctrl | tech1 |\n",
"| 4 | 6.11 | ctrl | tech2 |\n",
"| 5 | 4.50 | ctrl | tech1 |\n",
"| 6 | 4.61 | ctrl | tech2 |\n",
"\n"
],
"text/plain": [
" weight group tech \n",
"1 4.17 ctrl tech1\n",
"2 5.58 ctrl tech2\n",
"3 5.18 ctrl tech1\n",
"4 6.11 ctrl tech2\n",
"5 4.50 ctrl tech1\n",
"6 4.61 ctrl tech2"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"tech <- rep(c('tech1', 'tech2'),nrow(PlantGrowth)/2)\n",
"PlantGrowth$tech <- as.factor(tech)\n",
"head(PlantGrowth)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'data.frame':\t30 obs. of 3 variables:\n",
" $ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...\n",
" $ group : Factor w/ 3 levels \"ctrl\",\"trt1\",..: 1 1 1 1 1 1 1 1 1 1 ...\n",
" $ tech : Factor w/ 2 levels \"tech1\",\"tech2\": 1 2 1 2 1 2 1 2 1 2 ...\n"
]
}
],
"source": [
"str(PlantGrowth)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"Call:\n",
"lm(formula = weight ~ group + tech, data = PlantGrowth)\n",
"\n",
"Residuals:\n",
" Min 1Q Median 3Q Max \n",
"-0.9710 -0.3765 -0.0330 0.2100 1.2600 \n",
"\n",
"Coefficients:\n",
" Estimate Std. Error t value Pr(>|t|) \n",
"(Intercept) 5.1410 0.2280 22.550 <2e-16 ***\n",
"grouptrt1 -0.3710 0.2792 -1.329 0.1955 \n",
"grouptrt2 0.4940 0.2792 1.769 0.0886 . \n",
"techtech2 -0.2180 0.2280 -0.956 0.3478 \n",
"---\n",
"Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n",
"\n",
"Residual standard error: 0.6244 on 26 degrees of freedom\n",
"Multiple R-squared: 0.2891,\tAdjusted R-squared: 0.2071 \n",
"F-statistic: 3.525 on 3 and 26 DF, p-value: 0.02881\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fit <- lm(weight ~ group + tech, data=PlantGrowth)\n",
"summary(fit)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>\n",
".dl-inline {width: auto; margin:0; padding: 0}\n",
".dl-inline>dt, .dl-inline>dd {float: none; width: auto; display: inline-block}\n",
".dl-inline>dt::after {content: \":\\0020\"; padding-right: .5ex}\n",
".dl-inline>dt:not(:first-of-type) {padding-left: .5ex}\n",
"</style><dl class=dl-inline><dt>(Intercept)</dt><dd>5.141</dd><dt>grouptrt1</dt><dd>-0.371</dd><dt>grouptrt2</dt><dd>0.494</dd><dt>techtech2</dt><dd>-0.218</dd></dl>\n"
],
"text/latex": [
"\\begin{description*}\n",
"\\item[(Intercept)] 5.141\n",
"\\item[grouptrt1] -0.371\n",
"\\item[grouptrt2] 0.494\n",
"\\item[techtech2] -0.218\n",
"\\end{description*}\n"
],
"text/markdown": [
"(Intercept)\n",
": 5.141grouptrt1\n",
": -0.371grouptrt2\n",
": 0.494techtech2\n",
": -0.218\n",
"\n"
],
"text/plain": [
"(Intercept) grouptrt1 grouptrt2 techtech2 \n",
" 5.141 -0.371 0.494 -0.218 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"coefficients(fit)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" group emmean SE df lower.CL upper.CL\n",
" ctrl 5.03 0.197 26 4.63 5.44\n",
" trt1 4.66 0.197 26 4.26 5.07\n",
" trt2 5.53 0.197 26 5.12 5.93\n",
"\n",
"Results are averaged over the levels of: tech \n",
"Confidence level used: 0.95 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"emmeans(fit, \"group\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### My question is, if I knew the coefficient of tech1, then ignoring both of tech(1,2) coefficients would be a way of estimating the effect of group? \n",
"\n",
"#### Is there a nice way of removing the unwanted effect? "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.0.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment