
@stedolan
Created January 5, 2021 12:03

OCaml GC Pacing

Current OCaml GC logic

I'm going to make the following simplifying assumptions:

  • There aren't many caml_alloc_custom_mem blocks (caml_dependent_size ~= 0)

  • There aren't many caml_alloc_custom blocks with nonzero mem/max (caml_extra_heap_resources ~= 0)

  • There are no manually triggered GC slices (howmuch = -1, caml_major_work_credit = 0)

  • The smoothing of slice amounts has no effect (filt_p ~= p)

  • There are few or no weak pointers and ephemerons (Phase_clean is negligible)

  • The number of roots is much smaller than the size of the heap (stat_heap_wsz + caml_incremental_roots_count ~= stat_heap_wsz)

Under these assumptions, the calculations in caml_major_collection_slice simplify to the following. (For readability, I've removed caml_ prefixes, casts, comments, logging, etc.)

p = allocated_words * 3 * (100 + percent_free)
    / stat_heap_wsz / percent_free / 2;

if (gc_phase == Phase_mark) {
  computed_work = p * stat_heap_wsz * 250 / (100 + percent_free);
} else {
  computed_work = p * stat_heap_wsz * 5 / 3;
}

Expanding p, we get:

if (gc_phase == Phase_mark) {
  computed_work = allocated_words * 3 * 250 / (2 * percent_free);
} else {
  computed_work = allocated_words * 5 * (100 + percent_free) / (2 * percent_free);
}

Note that the uses of stat_heap_wsz in p and in computed_work cancel out, so the amount of work done does not depend on the heap size.
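To make that concrete, here's a quick standalone check (my own sketch, not runtime code: the helper name is invented, and it uses floating point where the runtime uses integer arithmetic). Evaluating the two-step formula for two heaps differing in size by 16x yields the same amount of mark work:

#include <stdio.h>

/* Mark-phase work for one slice, per the two-step formula above. */
static double mark_work(double allocated_words, double stat_heap_wsz,
                        double percent_free) {
  double p = allocated_words * 3 * (100 + percent_free)
             / stat_heap_wsz / percent_free / 2;
  return p * stat_heap_wsz * 250 / (100 + percent_free);
}

int main(void) {
  /* Same allocation, different heap sizes: identical work (4687.5). */
  printf("%f\n", mark_work(1000, 1 << 20, 80));
  printf("%f\n", mark_work(1000, 1 << 24, 80));
  return 0;
}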

In each phase, computed_work equals allocated_words times a constant. Let's name these constants m and s. They represent the amount of work done per word of allocation during mark and sweep slices respectively, and their values are:

m = 3 * 250 / (2 * percent_free)
s = 5 * (100 + percent_free) / (2 * percent_free)

The default percent_free = 80 gives m = 4.6875, s = 5.625.
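As a quick sanity check on that arithmetic (a standalone sketch, not part of the runtime):

#include <stdio.h>

int main(void) {
  double percent_free = 80;  /* the default space_overhead setting */
  double m = 3 * 250 / (2 * percent_free);
  double s = 5 * (100 + percent_free) / (2 * percent_free);
  printf("m = %.4f, s = %.4f\n", m, s);  /* m = 4.6875, s = 5.6250 */
  return 0;
}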

This is open-loop control: the parameters m and s are determined only from configuration (percent_free), and there is no feedback mechanism. The open-loop nature of this controller makes it much simpler to analyse.

The interesting question is about the relationship between m, s, percent_free and the actual heap size.

A major GC cycle

As soon as a major GC cycle starts, the allocated data on the heap is snapshotted, dividing it into live data (reachable from the roots at the moment the cycle starts) and garbage (unreachable).

Write L for the size of the live data. It is convenient to work in units of L, so let's measure the garbage as a proportion g of L, so that the allocated data at cycle start has size:

L + Lg

The value of L at any given moment is a property of the program, unaffected by GC decisions. However, the value of g at the start of the cycle is a function of GC decisions in previous cycles.

The first phase is marking, as the collector determines what's live and what's garbage. The total amount of marking work to do is L (only live data need be marked), and since we do m marking work per word of allocation, we will allocate L/m words during the marking phase.

At the end of marking, the amount of allocated data is maximal: we have been allocating during marking, but we have not yet begun to collect any garbage. The amount of allocated data here is:

L + Lg + L/m

Next, we begin sweeping. We must sweep the entire heap, which at this point has size L + Lg + L/m, and we do s sweeping work per allocated word, so we allocate (L + Lg + L/m)/s words during the sweeping phase. By the end of sweeping, we have collected exactly Lg garbage, as none of the data allocated during marking or sweeping can be collected this cycle. This leaves the total amount of allocated data as the sum of live at the start, allocated during marking, and allocated during sweeping:

L + L/m + (L + Lg + L/m)/s

Finally, the next cycle begins, splitting the above into L' + L' g'.
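The whole cycle can be summarised as a recurrence on g. Here is a small standalone sketch (my own, working in units of L and assuming L constant) that steps through the phases described above:

#include <stdio.h>

/* One major cycle, in units of L: start with 1 + g allocated, mark
   (allocating 1/m), sweep (allocating peak/s while freeing g), and
   return the g' seen at the start of the next cycle. */
static double next_g(double g, double m, double s) {
  double marked = 1.0 / m;       /* allocated during marking  */
  double peak = 1 + g + marked;  /* heap size at end of mark  */
  double swept = peak / s;       /* allocated during sweeping */
  return marked + swept;         /* heap is now 1 + g'        */
}

int main(void) {
  double g = 0, m = 4.6875, s = 5.625;  /* defaults derived above */
  for (int cycle = 1; cycle <= 10; cycle++) {
    g = next_g(g, m, s);
    printf("cycle %d: g = %.4f\n", cycle, g);
  }
  return 0;  /* g converges to ~0.5218 */
}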

Steady-state and stability

An important property is that if the amount of live data is constant, then the heap size should also be constant. So, taking L to be constant (L = L'), let's see what happens to g.

L + L g' = L + L/m + (L + Lg + L/m)/s

g' = 1/m + (1 + g + 1/m)/s

g' = 1/m + 1/s + g/s + 1/ms

This has the form of a linear recurrence g -> αg + β, where

α = 1/s
β = 1/m + 1/s + 1/ms

This is stable as long as α < 1, and tends towards its unique fixpoint of β/(1-α), or:

(1/m + 1/s + 1/ms) / (1 - 1/s)

The stability condition is that s > 1: we must sweep faster than we're allocating, or sweeping won't terminate. There is no such condition on marking, as the amount of mark work is fixed at the start of the cycle.

The "overhead" is the ratio of non-live heap to live heap. The heap is at its largest just at the end of marking, at which time the overhead is g + 1/m. In the steady state, that gives an overhead of:

(1/m + 1/s + 1/ms) / (1 - 1/s) + 1/m
=
(2/m + 1/s) / (1 - 1/s)
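Numerically (again a standalone sketch), the fixpoint and both forms of the overhead agree, and match the steady-state g reached by the iteration above:

#include <stdio.h>

int main(void) {
  double m = 4.6875, s = 5.625;
  double g = (1/m + 1/s + 1/(m*s)) / (1 - 1/s);  /* fixpoint of g     */
  double overhead = g + 1/m;                     /* peak, end of mark */
  double simplified = (2/m + 1/s) / (1 - 1/s);   /* closed form       */
  printf("g* = %.4f, overhead = %.4f = %.4f\n", g, overhead, simplified);
  return 0;  /* prints g* = 0.5218, overhead = 0.7351 = 0.7351 */
}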

Choosing m and s

With the current default values of m and s, we end up with an overhead of 0.735. We can instead target a specific overhead o by solving for m and s.

There is one other parameter to pick: the ratio λ between mark and sweep work. We choose m = λs, and try to pick λ so that the amount of overhead per allocated word is approximately equal between marking and sweeping. Since we do m = λs words of marking or s words of sweeping per allocated word, this means ensuring that doing λ words of marking has about the same cost as doing 1 word of sweeping.

I think we could experimentally determine a reasonable hardcoded value for λ, although it will depend somewhat on the program behaviour (poor locality makes marking slower without affecting sweeping much, while a larger average object size speeds up sweeping without affecting marking much).

Then, we can take:

s = 1 + (1 + 2/λ) / o
m = λs
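For example (a sketch; λ here is just the implicit current ratio m/s = 5/6, and o = 0.735 is the overhead computed above), solving for s and m recovers approximately the current constants:

#include <stdio.h>

int main(void) {
  double lambda = 5.0 / 6.0;  /* current implicit ratio m/s */
  double o = 0.735;           /* target overhead            */
  double s = 1 + (1 + 2 / lambda) / o;
  double m = lambda * s;
  double achieved = (2/m + 1/s) / (1 - 1/s);
  /* prints s = 5.6259, m = 4.6882, overhead = 0.7350 */
  printf("s = %.4f, m = %.4f, overhead = %.4f\n", s, m, achieved);
  return 0;
}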
@NickBarnes

This is really handy, thank you. Why "try to pick λ so that the amount of overhead per allocated word is approximately equal between marking and sweeping."?

@stedolan (Author)

Why "try to pick λ so that the amount of overhead per allocated word is approximately equal between marking and sweeping."?

We'd like to minimise the maximum pause time. For a given amount of work, that happens when all pauses are equal. It's easy enough to make mark pauses approximately equal (do the same amount of marking work per slice), and likewise for sweep pauses, but it takes a careful choice of λ to make the mark pauses the same length as the sweep pauses.
