Created
November 18, 2019 11:36
-
-
Save OlivierLi/c9d53d6c8ef5167d59ea9603a22d5f02 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#pragma once | |
// Assumptions: | |
// 1) Taking a lock is considered costly on the platforms of interest to us, | |
// even with minimal contention. This is not true on pthread I think. | |
// 2) Using atomic pointers/indices on the platforms of interest is considered | |
// cheap. Of course this is not to say that they are free. | |
// Requirements : | |
// 1) Ability to share the work between workers with minimal synchronization. | |
// 2) Each worker has to have the ability to pick up the slack of others if | |
// necessary. | |
// 3) We do not need to support retries of jobs within the job dispatching | |
// mechanism. | |
// Assuming that most tasks succeed allows for the jobs to post retry requests | |
// to a costly synchronized data structure if necessary on failure. | |
// Algo 1: | |
// 1) Each worker starts with a pointer/index to the start of its slice. | |
// 2) Workers increment that pointer until their is no more work left. | |
// This option has literally zero synchronization involved. It does not allow | |
// workers to pick up the slack of others though which can be very important | |
// with heterogenous workloads. This option also has very good temporal | |
// localilty because workers will work on the items that were very close to the | |
// previous one. | |
// Algo 2: | |
// 1) Workers pop off work by incrementing an atomic pointer/index into the | |
// work. They stop if there is no more work at all. | |
// This option has the advantage of simplicity and slack handling comes for | |
// free. It has the disadvantage of introducing constant contention which is can | |
// be costly if we cannot use atomics or if they are not cheap. It also has very | |
// poor per-worker locality by design, | |
// Algo 3: | |
// 1) Each worker recieves it's index at startup and a pointer to the running | |
// index of the next worker. Whether this is linked list or index based the last | |
// worker need to know how to wrap around. 2) As long as a worker does not go | |
// over its personal last item it just keeps going. | |
// The worker probably needs a local non-atomic variable for the current | |
// work-load. This is needed to prevent two workers to work on the same item at | |
// the same time. | |
// 3) If the worker hits its limit it compares_and_exchanges the index of the | |
// next worker to the max of the next worker. This grabs some work for the next | |
// worker. 4) If there is no work in the next worker either the worker coul | |
// return or traverse the whole structure to find available work. | |
// This algorithm provides good locality and more coherent work history for the | |
// workers at the cost of added complexity. Contention is greatly reduced on the | |
// counter in most cases. | |
#include <atomic> | |
struct ControlBlock { | |
std::atomic<int> index; | |
int work_slice_size; | |
ControlBlock* GetNextWorker() {} | |
}; | |
template <typename T> | |
class JobDispatcher { | |
public: | |
private: | |
}; |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment