First, make the damn GC work consistently.
Parallel marking seems easier than parallel copying. The GC is already designed after https://dl.acm.org/doi/pdf/10.1145/512529.512546 and has the same mark queue design. We presumably “just” need to make marking atomic, give each thread its own input and output packets, and lock around weak-reference processing.
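A minimal sketch of the atomic-marking piece, assuming a mark byte in the object header and a compare-and-swap to claim it. The struct and function names here are made up for illustration, not the GC’s actual API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object header with a mark byte: 0 = unmarked, 1 = marked. */
struct object {
  _Atomic unsigned char mark;
  /* ... payload ... */
};

/* Returns true iff this thread won the race to mark the object, so
   exactly one marker thread pushes it onto its output packet. */
static bool try_mark(struct object *obj) {
  unsigned char unmarked = 0;
  return atomic_compare_exchange_strong(&obj->mark, &unmarked, 1);
}
```

The point of the CAS is that losing threads simply drop the object instead of enqueueing it twice; the rest of the packet machinery stays per-thread and lock-free.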
Concurrent marking seems easier than concurrent copying; the prior paper suggests as much, I assume. We’d need a write barrier, and to trace pointers stored into objects allocated during marking.
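One way that write barrier could look is a Dijkstra-style insertion barrier: when the mutator stores a pointer while marking is in progress, shade the target grey so the concurrent marker can’t lose it. This is a sketch under assumed names (toy mark queue, global marking flag), not the GC’s real packet machinery:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object header with a mark byte. */
struct object { _Atomic unsigned char mark; };

/* Toy stand-ins for the real GC's mark packets and phase flag. */
static struct object *mark_queue[1024];
static size_t mark_queue_len;
static bool marking_in_progress = true;

static bool try_mark(struct object *obj) {
  unsigned char unmarked = 0;
  return atomic_compare_exchange_strong(&obj->mark, &unmarked, 1);
}

/* Dijkstra-style insertion barrier: on every pointer store during
   marking, mark the stored-to value; if we newly greyed it, queue it
   so the collector will trace it. */
static void write_barrier(struct object **field, struct object *value) {
  if (marking_in_progress && value && try_mark(value))
    mark_queue[mark_queue_len++] = value;
  *field = value;
}
```

An already-marked target is left alone, so the barrier’s slow path only fires once per object per cycle.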
Do it incrementally to reduce space overhead. What should we target for copying? Presumably pages with high fragmentation, which is measured by … Maybe check how much space the mutator skips over while allocating (an indication of fragmentation)? LXR just picks the lowest-occupancy pages.
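The LXR-style heuristic is simple enough to sketch: sort pages by live bytes and take the emptiest ones until a copy budget is exhausted. The page struct and budget parameter are assumptions for illustration:

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical per-page bookkeeping: live bytes as measured by the
   last mark, and a flag saying the page was chosen for evacuation. */
struct page { size_t live_bytes; int evacuate; };

static int by_occupancy(const void *a, const void *b) {
  const struct page *pa = a, *pb = b;
  return (pa->live_bytes > pb->live_bytes) - (pa->live_bytes < pb->live_bytes);
}

/* Flag the lowest-occupancy pages for evacuation until copying them
   would exceed `budget` bytes; returns how many pages were chosen. */
static size_t select_candidates(struct page *pages, size_t n, size_t budget) {
  qsort(pages, n, sizeof *pages, by_occupancy);
  size_t copied = 0, chosen = 0;
  for (size_t i = 0; i < n && copied + pages[i].live_bytes <= budget; i++) {
    pages[i].evacuate = 1;
    copied += pages[i].live_bytes;
    chosen++;
  }
  return chosen;
}
```

Because the pages are sorted ascending by occupancy, the budget cuts off exactly at the point where evacuating more would stop paying for itself.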
With concurrent marking, we could build a remset of cards/pages/etc. that point into the pages to be copied, to avoid fixing up the whole heap. Then we could work out how much to copy at once, based on throughput measurements, to target a particular pause time, yielding something closer to a soft real-time collector. Something like https://dl.acm.org/doi/pdf/10.1145/773039.512442 perhaps.
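The remset idea could be sketched as follows: while the concurrent marker traces an edge, if the target lands in a page chosen for evacuation, record the field so the fixup pass only visits those fields instead of scanning the whole heap. Page granularity, sizes, and names are all assumptions here:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12            /* assumed 4 KiB pages */
#define MAX_PAGES 256

/* Pages flagged as evacuation candidates (set by the selection pass). */
static unsigned char candidate[MAX_PAGES];

/* Fields known to point into candidate pages; fixup walks only these. */
static void **remset[1024];
static size_t remset_len;

/* Called for each edge the concurrent marker traces: remember the
   field if its target lies in a page we intend to evacuate. */
static void remember_edge(uintptr_t heap_base, void **field) {
  uintptr_t target = (uintptr_t)*field;
  size_t page = (target - heap_base) >> PAGE_SHIFT;
  if (page < MAX_PAGES && candidate[page] && remset_len < 1024)
    remset[remset_len++] = field;
}
```

After evacuation, fixup is then a loop over `remset` rewriting each recorded field through the forwarding pointer, rather than a full-heap scan.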
There’s then the case of popular objects/pages: we could keep a reference count for each page while tracing, and then avoid compacting pages that have too many incoming references. But compacting is the backup way of reclaiming memory, so maybe it’s not so important.
In discussion with the authors of LXR, they said popular objects are less of an issue when compacting/copying is the backup approach rather than the primary one. So we should be okay.
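If we did want the per-page counts, they fall out of the same trace: bump a counter per target page on every edge, then refuse to evacuate pages over some threshold. The granularity and cutoff below are illustrative guesses, not tuned values:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12            /* assumed 4 KiB pages */
#define MAX_PAGES 256
#define POPULAR_THRESHOLD 128    /* assumed cutoff, would need tuning */

/* Incoming-reference count per page, accumulated during tracing. */
static unsigned page_rc[MAX_PAGES];

/* Called per traced edge: credit the target's page. */
static void count_edge(uintptr_t heap_base, uintptr_t target) {
  size_t page = (target - heap_base) >> PAGE_SHIFT;
  if (page < MAX_PAGES)
    page_rc[page]++;
}

/* Popular pages would be excluded from the evacuation candidate set,
   since fixing up that many incoming references costs more than the
   space the compaction recovers. */
static int is_popular(size_t page) {
  return page_rc[page] >= POPULAR_THRESHOLD;
}
```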
Update: I implemented the incremental compactor algorithm with STW tracing, and it works pretty well. I haven’t noticed the remset ballooning too much.