Skip to content

Instantly share code, notes, and snippets.

@kevinrpb
Last active November 3, 2021 08:57
Show Gist options
  • Save kevinrpb/97ea7bd35ac3f465ab87bdcf1f716d87 to your computer and use it in GitHub Desktop.
Save kevinrpb/97ea7bd35ac3f465ab87bdcf1f716d87 to your computer and use it in GitHub Desktop.
Image Into Columns Algorithms

Images Into Columns Algorithms

This experiment surged from this tweet by @sindresorhus:

What's the simplest algorithm for splitting a (unordered) set of integers into n-2 even sub-sets where the sum of the elements in each sub-set have the minimum difference? Even better if someone has an example in Swift or JS. All the algorithms I found care about order.

— Sindre Sorhus (@sindresorhus) November 2, 2021

Basically, we want to put images into columns in such a way that the height of the columns is as similar as possible. As some people in the thread mentioned, this can be referred to as Greedy Number Partitioning, and several approximations exist.

In particular, I took three answers with suggestions and run an experiment to see how they differed.

Naive idea:

Get total height of all the images and calculate Total/n (you get what ideally each column should take).

Then iterate over the list until you get within a threshold of the ideal.

— Kevs 🦄 (@kevinrpb) November 2, 2021

They're likely to be similar in size (no order of magnitude diff) so all you need is a load balancer. Add the next image to the smallest-sum set.
Should be good enough.
Do you need perfect? Probably not.

— Naugtur (@naugtur) November 2, 2021

Haha funny, @christianhg had this exact same use case last week. My recommendation:
1. Sort list, descending
2. Create your empty subsets
2. Iterate over items, always adding them to the current smallest subset.

— jeppeReinhold 🇩🇰 (@DrReinhold) November 2, 2021

Here, I took the difference in height between the tallest and shortest generated columns for each algorithm, as well as the time it took to run the algorithm. I ran the algorithms for 1.000.000 lists of 1.000 integers each and calculated avg and std metrics.

Tweet avg_diff (px) std_diff (px) avg_time (s) std_time (s)
@kevinrpb 1551.5493 1172.2171 0.3626 34.7693
@naugtur 775.9586 585.9650 0.3726 39.1062
@DrReinhold 3.0816 3.3509 0.6445 33.3858

(this experiment took 28m to run in my old-ish machine and needs about 8GB of memory)

/// MARK: Util
// Normal Distribution With Min, Max, Skew. Taken from
// https://spin.atomicobject.com/2019/09/30/skew-normal-prng-javascript/
function normalRandom(mean = 0, stdev = 1, skewness = 0) {
// First, Box-Muller transform
let u1 = 0, u2 = 0
while(u1 === 0) u1 = Math.random() //Converting [0,1) to (0,1)
while(u2 === 0) u2 = Math.random()
const R = Math.sqrt(-2.0 * Math.log(u1))
const O = 2.0 * Math.PI * u2;
const v1 = R * Math.cos(O)
const v2 = R * Math.sin(O)
// Next, skew-normal transform
if (skewness === 0) {
return mean + stdev * v1
}
const d = skewness / Math.sqrt(1 + skewness * skewness);
const v3 = d * v1 + Math.sqrt(1 - d * d) * v2;
const z = v1 >= 0 ? v3 : -v3;
return mean + stdev * z;
}
// Is it really "normal random"? Probably not
function positiveNormalRandom(mean, stdev, skewness = 0) {
let n = normalRandom(mean, stdev, skewness)
while (n < 0) { n = normalRandom(mean, stdev, skewness) }
return n
}
// Sums the numbers in a list... I kinda miss Python here
function sumList(list) {
return list.reduce((s, n) => s + n, 0)
}
// Yeah, really missing math functions
function avgList(list) {
return sumList(list) / list.length
}
// I hate this
function stdList(list) {
const mean = avgList(list)
return Math.sqrt(
list.map(x => Math.pow(x - mean, 2))
.reduce((a, b) => a + b) / list.length
)
}
// Generates a list of size n with random numbers with a distribution
// ~ N(1000, 800)
function getList(n = 100) {
return [...Array(n).keys()]
.map(() => Math.round(positiveNormalRandom(1000, 800)))
}
// Sort a list
function sortListDesc(list) {
let newList = list
newList.sort((a, b) => b - a)
return newList
}
/// MARK: Algorithms
// https://twitter.com/kevinrpb/status/1455482116486086659
//
// Naive idea:
//
// Get total height of all the images and calculate Total/n (you get
// what ideally each column should take).
//
// Then iterate over the list until you get within a threshold of the ideal.
function kevinrpb(list, nColumns) {
const targetHeight = sumList(list) / nColumns
let columns = [...Array(nColumns).keys()].map(() => [])
let currentColumn = 0
let currentHeight = 0
for (let i = 0; i < list.length; i++) {
const nextImage = list[i]
if (currentHeight + nextImage < targetHeight) {
// We still have space in this column to put this image
currentHeight += nextImage
columns[currentColumn].push(nextImage)
} else if (currentColumn + 1 < nColumns) {
// The current image would go over the limit for this column,
// but we still got more columns to put it in
currentColumn += 1
currentHeight = nextImage
columns[currentColumn].push(nextImage)
} else {
// The image wouldn't fit but we have no more columns..
columns[currentColumn].push(nextImage)
}
}
return columns
}
// https://twitter.com/naugtur/status/1455515761062170627
//
// They're likely to be similar in size (no order of magnitude diff)
// so all you need is a load balancer. Add the next image to the
// smallest-sum set.
// Should be good enough.
// Do you need perfect? Probably not.
function naugtur(list, nColumns) {
let columns = [...Array(nColumns).keys()].map(() => [])
let heights = [...Array(nColumns).keys()].map(() => 0)
function getShortestColumn() {
let shortestIndex = 0
for (let i = 1; i < columns.length; i++) {
let height = heights[i]
if (height < heights[shortestIndex])
shortestIndex = i
}
return shortestIndex
}
for (let i = 0; i < list.length; i++) {
let shortestColumn = getShortestColumn()
heights[shortestColumn] += list[i]
columns[shortestColumn].push(list[i])
}
return columns
}
// https://twitter.com/DrReinhold/status/1455466203032526853
//
// Haha funny, @christianhg had this exact same use case last week.
// My recommendation:
// 1. Sort list, descending
// 2. Create your empty subsets
// 2. Iterate over items, always adding them to the current smallest subset.
function DrReinhold(list, nColumns) {
const sortedList = sortListDesc(list)
return naugtur(sortedList, nColumns) // basically the same algo w/o sorting
}
/// MARK: Benchmark
// Runs the algorithm over the list and returns the difference between
// the tallest column and the shortest
function benchmark(list, algorithm, nColumns = 2) {
const start = performance.now()
const columns = algorithm(list, nColumns)
const end = performance.now()
let shortest = undefined
let tallest = undefined
for (let column of columns) {
const height = sumList(column)
if (shortest === undefined || height < shortest)
shortest = height
if (tallest === undefined || height > tallest)
tallest = height
}
return {
'diff': tallest - shortest,
'time': end - start
}
}
function benchmarks(lists, algorithm, nColumns = 2) {
const results = lists
.map((list) => benchmark(list, algorithm, nColumns))
const diffs = results.map((result) => result['diff'])
const avgDiff = avgList(diffs)
const stdDiff = stdList(diffs)
const times = results.map((result) => result['time'])
const avgTime = avgList(times)
const stdTime = stdList(times)
return { avgDiff, stdDiff, avgTime, stdTime }
}
/// MARK: Main
const N_LISTS = 1000000
const LIST_SIZE = 1000
const ALGOS = [
{
'handle': '@kevinrpb',
'algo': kevinrpb
},
{
'handle': '@naugtur',
'algo': naugtur
},
{
'handle': '@DrReinhold',
'algo': DrReinhold
}
]
// Run the stuff
const lists = [...Array(N_LISTS).keys()].map(() => getList(LIST_SIZE))
let results = ALGOS
.map(({ handle, algo }) => (
{
handle,
results: benchmarks(lists, algo)
}
))
console.log(results)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment