layout | title | date | author | authorAvatar | image | tags | |||
---|---|---|---|---|---|---|---|---|---|
blog |
Explore Memoization |
2019-07-18 16:20:16 UTC |
Toan Nguyen Gia |
/img/uploads/t0d8dq73q-u9hr7hu1m-706d3758d60e-512.jpeg |
/img/uploads/deco-iskusstvoa-helix.jpg |
|
You may heard about the term memoization before. It's simply a technique that improves performance of an expensive function, by storing and returning the cached result if the same inputs occur. The minimal implementation of a memoize function can be as follow, in case of your targeted function only receives one argument:
const memoize = f => {
let cache = {};
return arg => {
if (cache[arg]) return cache[arg];
const result = f(arg);
cache[arg] = result;
return result;
};
};
Lets try applying it to a recursive function:
// Factorial
const fac = n => {
log(); // Keep track of number of recursions
if (n <= 1) return 1;
return n * fac(n - 1);
};
const memoizedFac = memoize(fac);
memoizedFac(6);
memoizedFac(6);
expect(log).toHaveBeenCalledTimes(6); // Passed
The result of 6! is cached so in the second call, there is no recursion happens. It may look ok until you try this:
memoizedFac(6);
memoizedFac(6);
memoizedFac(7);
expect(log).toHaveBeenCalledTimes(7); // Failed, it's 13
Because 6! is already cached, should the total number of recursions is 7 only? It turns out that since in our factorial implementation, we try to call the non-memoized version in its body. Let's fix that:
// Non-memoized version
const fac = n => {
if (n <= 1) return 1;
return n * fac(n - 1);
};
// Memoized version
const memoizedFac = memoize(n => {
if (n <= 1) return 1;
return n * memoizedFac(n - 1);
});
memoizedFac(6);
memoizedFac(6);
memoizedFac(7);
expect(log).toHaveBeenCalledTimes(7); // Passed
Problem solved, easy peasy!
But it has one down side, that is we are copying the logic. What if I want to maintain the logic in just one place, but also have two versions of factorial, memoized and non-memoized. Is it possible?
You can stop here for a while and try it yourself before jumping to the next section. This is where the fun begins.
My favorite solution is:
const fac = y(_fac); // Non-memoized version
const memoizedFac = y(memoize(_fac)); // Memoized version
Where y
, memoize
and _fac
are defined as follow:
const y = f => (g => n => f(g(g))(n))(g => n => f(g(g))(n));
// Factorial logic is here
const _fac = anotherFac => n => {
if (n <= 1) return 1;
return n * anotherFac(n - 1);
};
// Memoize is a bit different now
const memoize = f1 => {
let cache = {};
return f2 => arg => {
if (cache[arg]) return cache[arg];
const result = f1(f2)(arg);
cache[arg] = result;
return result;
};
};
The function y
is called y-combinator. It may looks confusing, and you may wonder why we need it. Well, actually there are other approaches, but the beauty of this is y-combinator allows you to decorate a recursive function by composition, so you can easily implement other strategies, like trampolining (to prevent stack overflow).
If you are curious about what y-combinator is, or how I came up with this solution, you can continue to the next section. It's gonna be very long and contains a lot of codes, so prepare yourself.
Let's forget everything about y-combinator above and make it from scratch. Take a look back at this code snippet first:
// Non-memoized version
const fac = n => {
if (n <= 1) return 1;
return n * fac(n - 1);
};
// Memoized version
const memoizedFac = memoize(n => {
if (n <= 1) return 1;
return n * memoizedFac(n - 1);
});
The difference between those definitions is the function called inside the body, fac
and memoizedFac
. To avoid copying logic, we can move that function out as an parameter. Base on that idea, we can create a special function, called _fac
:
const _fac = anotherFac => n => {
log();
if (n <= 1) return 1;
return n * anotherFac(n - 1);
};
As you can see, _fac
doesn't look like a normal factorial function. In fact, it's a not-working-yet factorial function. It requires another factorial function, in order to produce a factorial function! I will call it a factorial factory (prefix with a dash). So if I want to create a memoized factorial using _fac
, I should pass anotherFac
to it which should be already memoized, right?
Let's try using it to see what happens next:
const memoize = f => {
let cache = {};
return arg => {
if (cache[arg]) return cache[arg];
const result = f(arg);
cache[arg] = result;
return result;
};
};
const memoizedFac = memoize(_fac(anotherMemoizedFac)); // anotherMemoizedFac is not defined
There is no definition of anotherMemoizedFac
yet. We can simply create it before memoizedFac
:
const anotherMemoizedFac = memoize(_fac(anotherMemoizedFac1)); // anotherMemoizedFac1 is not defined
const memoizedFac = memoize(_fac(anotherMemoizedFac));
Just keep doing and at some points, it will start running:
const anotherMemoizedFac2 = memoize(_fac()); // Fall to the n <= 1 condition
const anotherMemoizedFac1 = memoize(_fac(anotherMemoizedFac2));
const anotherMemoizedFac = memoize(_fac(anotherMemoizedFac1));
const memoizedFac = memoize(_fac(anotherMemoizedFac));
expect(memoizedFac(3)).toEqual(6); // Passed
expect(memoizedFac(3)).toEqual(6); // Passed
expect(memoizedFac(4)).toEqual(24); // Passed
expect(log).toHaveBeenCalledTimes(4); // Failed, it's called 7 times
Great, no error, and some tests get passed. But seem like memoizing is not working properly. The reason is each time we call memoize
, it creates a different caching storage. Keep in mind that memoize
must be called just once, we can re-write the codes as follow:
const _fac = anotherFac => n => {
log();
if (n <= 1) return 1;
return n * anotherFac(n - 1);
};
// Change memoize a bit so it fits the _fac factory
const memoize = f1 => {
let cache = {};
return f2 => arg => {
if (cache[arg]) return cache[arg];
const result = f1(f2)(arg);
cache[arg] = result;
return result;
};
};
// This new factory return a memoized factorial function
const _memoizedFac = memoize(_fac);
const anotherMemoizedFac2 = _memoizedFac();
const anotherMemoizedFac1 = _memoizedFac(anotherMemoizedFac2);
const anotherMemoizedFac = _memoizedFac(anotherMemoizedFac1);
const memoizedFac = _memoizedFac(anotherMemoizedFac);
expect(memoizedFac(3)).toEqual(6); // Passed
expect(memoizedFac(3)).toEqual(6); // Passed
expect(memoizedFac(4)).toEqual(24); // Passed
expect(log).toHaveBeenCalledTimes(4); // Passed!!!
And yes, it works! However it's not actually usable now, because it only apply for 4. If I want to calculate 6!, I have to manually add anotherMemoizedFac3
and anotherMemoizedFac4
. Let's see what we can refactor in the next section.
You can notice that the pattern to create memoized factorial is repeated. We can split it into a function, called generator
, because I'm bad at naming things:
const _memoizedFac = memoize(_fac);
const anotherMemoizedFac2 = _memoizedFac();
const anotherMemoizedFac1 = _memoizedFac(anotherMemoizedFac2);
const anotherMemoizedFac = _memoizedFac(anotherMemoizedFac1);
const generator = () => _memoizedFac(anotherMemoizedFac);
const memoizedFac = generator();
Replace anotherMemoizedFac
with generator()
as well:
const _memoizedFac = memoize(_fac);
const generator = generator => _memoizedFac(generator(generator)); // Maximum call stack size exceeded
const memoizedFac = generator(generator);
We got stack overflow, since Javascript is not lazy evaluation. There is a tip to avoid this problem:
const _memoizedFac = memoize(_fac);
const generator = generator => number => _memoizedFac(generator(generator))(number);
const memoizedFac = generator(generator);
Create another function called y
, to move _memoizedFac
out as a parameter:
const _memoizedFac = memoize(_fac);
const y = _memoizedFac => {
const generator = generator => number => _memoizedFac(generator(generator))(number);
return generator(generator);
};
const memoizedFac = y(_memoizedFac);
Rename to make it shorter:
const _memoizedFac = memoize(_fac);
const y = f => {
const g = g => n => f(g(g))(n);
return g(g);
};
const memoizedFac = y(_memoizedFac);
Final step:
const y = f => (g => n => f(g(g))(n))(g => n => f(g(g))(n));
const memoizedFac = y(memoize(_fac));
Done! This is the completed code snippet so you can try it yourself:
const log = jest.fn();
const y = f => (g => n => f(g(g))(n))(g => n => f(g(g))(n));
const _fac = anotherFac => n => {
log();
if (n <= 1) return 1;
return n * anotherFac(n - 1);
};
const memoize = f1 => {
let cache = {};
return f2 => arg => {
if (cache[arg]) return cache[arg];
const result = f1(f2)(arg);
cache[arg] = result;
return result;
};
};
const fac = y(_fac); // Non-memoized version
const memoizedFac = y(memoize(_fac)); // Memoized version
expect(fac(3)).toEqual(6);
expect(fac(3)).toEqual(6);
expect(log).toHaveBeenCalledTimes(6);
log.mockClear();
expect(memoizedFac(3)).toEqual(6);
expect(memoizedFac(3)).toEqual(6);
expect(memoizedFac(4)).toEqual(24);
expect(log).toHaveBeenCalledTimes(4);
As you can see, _fac
alone is quite useless. Y-combinator helps function which is not-recursive-yet, like _fac
, become recursive. This is also why a startup accelerator names their company after it, as in the FAQs:
Why did you choose the name “Y Combinator?”
The Y combinator is one of the coolest ideas in computer science. It's also a metaphor for what we do. It's a program that runs programs; we're a company that helps start companies.
In this article, we have gone through the concept of memoization, its problem with recursive function, and a fancy solution for that problem using y-combinator. Although most of programming languages already support recursion, there are still cases for y-combinator to shine. There are also some good reads about this topic, On recursive function and Why Y. So, enjoy!