Skip to content

Instantly share code, notes, and snippets.

@aboSamoor
Created May 23, 2014 18:36
Show Gist options
  • Save aboSamoor/fb015e88c5a0152abd76 to your computer and use it in GitHub Desktop.
Save aboSamoor/fb015e88c5a0152abd76 to your computer and use it in GitHub Desktop.
Polynomial Features
{
"metadata": {
"name": "",
"signature": "sha256:4ec1ed51b2e0cc9a893c25d8fb0a4c94dd1bc1701b1d23736d16d4885147671b"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import itertools\n",
"from time import time\n",
"from itertools import combinations\n",
"import math"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Current Approach"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def _power_matrix(n_features, degree, include_bias):\n",
" \"\"\"Compute the matrix of polynomial powers\"\"\"\n",
" # Find permutations/combinations which add to degree or less\n",
" deg_min = 0 if include_bias else 1\n",
" powers = itertools.product(*(range(degree + 1)\n",
" for i in range(n_features)))\n",
" powers = np.array([c for c in powers if deg_min <= sum(c) <= degree])\n",
"\n",
" # sort so that the order of the powers makes sense\n",
" i = np.lexsort(np.vstack([powers.T, powers.sum(1)]))\n",
" return powers[i]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## New Approach\n",
"\n",
"Instead of iterating an exponential number of alternatives and then prune them by removing the wrong ones, we can use counting to iterate only over the right combinations.\n",
"\n",
"This problem could be mapped to **Indistinguishable Objects to Distinguishable Boxes**.\n",
"\n",
"For more details, check this [tutorial](http://2000clicks.com/mathhelp/CountingObjectsInBoxes.aspx#UnlabLab) or check the end of the notebook."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def nCr(n,r):\n",
" f = math.factorial\n",
" return f(n) / f(r) / f(n-r)\n",
"\n",
"def __power_matrix(n_features, degree):\n",
" k = n_features\n",
" balls = degree\n",
" bars = k - 1\n",
" total = balls + bars\n",
" rows = nCr(total, bars)\n",
" combs = np.asarray(list(combinations(xrange(total), bars)))\n",
" last_column = total*np.ones((combs.shape[0], 1))\n",
" powers_ = np.hstack((combs, last_column))\n",
" powers = np.zeros_like(powers_)\n",
" powers[:, 0] = powers_[:, 0]\n",
" for j in xrange(1, powers_.shape[1]):\n",
" powers[:, j] = powers_[:, j] - powers_[:, j-1] - 1\n",
" return powers"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def _power_matrix2(n_features, degree, include_bias):\n",
" \"\"\"Compute the matrix of polynomial powers\"\"\"\n",
" # Find permutations/combinations which add to degree or less\n",
" deg_min = 0 if include_bias else 1\n",
" powers = []\n",
" for n in range(deg_min, degree + 1):\n",
" powers_ = __power_matrix(n_features, n)\n",
" powers.extend(powers_)\n",
" powers = np.asarray(powers)\n",
" # sort so that the order of the powers makes sense\n",
" i = np.lexsort(np.vstack([powers.T, powers.sum(1)]))\n",
" return powers[i]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unit Test"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(\"{:<10}{:<15}{:<15}\".format(\"Test\", \"Old (Sec)\", \"New (Sec)\"))\n",
"print\n",
"for n_f in [3, 4, 5, 6, 7, 8]:\n",
" for d in [2, 3, 4, 5, 6]:\n",
" for bias in [True, False]:\n",
" t0 = time()\n",
" p1 = _power_matrix(n_f, d, bias)\n",
" t1 = time() - t0\n",
" p2 = _power_matrix2(n_f, d, bias)\n",
" t2 = time() - t0 - t1\n",
" if (p1 == p2).all():\n",
" print \"{:<10}{:<15.4f}{:<15.4f}\".format(\"PASS\", t1, t2)\n",
" else:\n",
" print \"BAD\""
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Test Old (Sec) New (Sec) \n",
"\n",
"PASS 0.0005 0.0009 \n",
"PASS 0.0002 0.0005 \n",
"PASS 0.0002 0.0009 \n",
"PASS 0.0003 0.0007 \n",
"PASS 0.0002 0.0010 \n",
"PASS 0.0002 0.0008 \n",
"PASS 0.0003 0.0012 \n",
"PASS 0.0003 0.0010 \n",
"PASS 0.0004 0.0014 \n",
"PASS 0.0004 0.0012 \n",
"PASS 0.0002 0.0007 \n",
"PASS 0.0002 0.0005 \n",
"PASS 0.0004 0.0009 \n",
"PASS 0.0003 0.0008 \n",
"PASS 0.0006 0.0013 \n",
"PASS 0.0006 0.0011 \n",
"PASS 0.0011 0.0017 \n",
"PASS 0.0011 0.0014 \n",
"PASS 0.0019 0.0021 \n",
"PASS 0.0018 0.0019 \n",
"PASS 0.0003 0.0010 \n",
"PASS 0.0003 0.0006 \n",
"PASS 0.0008 0.0011 \n",
"PASS 0.0009 0.0009 \n",
"PASS 0.0022 0.0017 \n",
"PASS 0.0022 0.0015 \n",
"PASS 0.0055 0.0024 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0051 0.0022 \n",
"PASS 0.0107 0.0033 \n",
"PASS 0.0106 0.0032 \n",
"PASS 0.0006 0.0009 \n",
"PASS 0.0006 0.0007 \n",
"PASS 0.0028 0.0014 \n",
"PASS 0.0027 0.0012 \n",
"PASS 0.0099 0.0022 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0098 0.0020 \n",
"PASS 0.0287 0.0035 \n",
"PASS 0.0295 0.0037 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0713 0.0057 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0734 0.0055 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0016 0.0011 \n",
"PASS 0.0016 0.0008 \n",
"PASS 0.0105 0.0018 \n",
"PASS 0.0104 0.0015 \n",
"PASS 0.0513 0.0030 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0489 0.0027 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.1056 0.0026 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0822 0.0025 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.2428 0.0049 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.2409 0.0047 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.0022 0.0006 \n",
"PASS 0.0021 0.0004 \n",
"PASS 0.0202 0.0010 \n",
"PASS 0.0201 0.0009 \n",
"PASS 0.1677 0.0043 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.2466 0.0039 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.7059 0.0047 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 0.5160 0.0041 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 1.7403 0.0099 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"PASS 1.7984 0.0087 "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Embedded Tutorial"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"\n",
"t = \"\"\"\n",
"<h3><b><a name=\"UnlabLab\"></a>Indistinguishable Objects to Distinguishable Boxes</b></h3>\n",
"<p>The number of different ways to distribute n indistinguishable balls into k\n",
"distinguishable boxes is C(n+k-1,k-1).</p>\n",
"<p>For example, 5 balls into 3 boxes can be done in these C(7,2) = 21 ways:</p>\n",
"<blockquote>\n",
"\t<table border=\"1\" cellpadding=\"0\" cellspacing=\"0\" bordercolordark=\"#003333\" bordercolorlight=\"#006666\">\n",
"<tbody><tr><td align=\"center\"><b>&nbsp;Box 1&nbsp;</b></td><td align=\"center\"><b>&nbsp;Box 2&nbsp;</b></td><td align=\"center\"><b>&nbsp;Box 3&nbsp;</b></td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaaa&nbsp;</td></tr>\n",
"</tbody></table>\n",
"</blockquote>\n",
"<p>The easiest way to understand this formula is to consider the permutations of\n",
"the string XXaaaaa -- in each permutation, the 2 X's divide the a's into one of\n",
"three boxes.&nbsp; From the first section, on permutations of sets with some\n",
"indistinguishable objects, we know that the number of distinct permutations of\n",
"XXaaaaa is C(7,2).&nbsp; In general, the number of permutations of a string of n\n",
"a's and k-1 X's is C(n+k-1,k-1), and each of these permutations corresponds to\n",
"one of the ways you can place n indistinguishable objects into k labeled boxes.</p>\n",
"<p>If you're told that the boxes can't be empty, then this is the same as\n",
"putting one ball in each of k boxes first (that part is forced, so there's one\n",
"way to do it), and then it's the number of ways to distribute n-k identical\n",
"balls into k labeled boxes, or C(n-1,k-1)</p>\n",
"\"\"\"\n",
"h = HTML(t); h"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"<h3><b><a name=\"UnlabLab\"></a>Indistinguishable Objects to Distinguishable Boxes</b></h3>\n",
"<p>The number of different ways to distribute n indistinguishable balls into k\n",
"distinguishable boxes is C(n+k-1,k-1).</p>\n",
"<p>For example, 5 balls into 3 boxes can be done in these C(7,2) = 21 ways:</p>\n",
"<blockquote>\n",
"\t<table border=\"1\" cellpadding=\"0\" cellspacing=\"0\" bordercolordark=\"#003333\" bordercolorlight=\"#006666\">\n",
"<tbody><tr><td align=\"center\"><b>&nbsp;Box 1&nbsp;</b></td><td align=\"center\"><b>&nbsp;Box 2&nbsp;</b></td><td align=\"center\"><b>&nbsp;Box 3&nbsp;</b></td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaaa&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aa&nbsp;</td><td align=\"center\">&nbsp;aaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;a&nbsp;</td><td align=\"center\">&nbsp;aaaa&nbsp;</td></tr>\n",
"<tr><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;&nbsp;</td><td align=\"center\">&nbsp;aaaaa&nbsp;</td></tr>\n",
"</tbody></table>\n",
"</blockquote>\n",
"<p>The easiest way to understand this formula is to consider the permutations of\n",
"the string XXaaaaa -- in each permutation, the 2 X's divide the a's into one of\n",
"three boxes.&nbsp; From the first section, on permutations of sets with some\n",
"indistinguishable objects, we know that the number of distinct permutations of\n",
"XXaaaaa is C(7,2).&nbsp; In general, the number of permutations of a string of n\n",
"a's and k-1 X's is C(n+k-1,k-1), and each of these permutations corresponds to\n",
"one of the ways you can place n indistinguishable objects into k labeled boxes.</p>\n",
"<p>If you're told that the boxes can't be empty, then this is the same as\n",
"putting one ball in each of k boxes first (that part is forced, so there's one\n",
"way to do it), and then it's the number of ways to distribute n-k identical\n",
"balls into k labeled boxes, or C(n-1,k-1)</p>\n"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"<IPython.core.display.HTML at 0x7fb832dde690>"
]
}
],
"prompt_number": 6
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment