OlivierLi · November 18, 2019 11:36
diff --git a/jobs_dispatch.hpp b/jobs_dispatch.hpp
 #pragma once

 // Assumptions:
 // 1) Taking a lock is considered costly on the platforms of interest to us,
 //  even with minimal contention. This is not true on pthread I think.
 // 2) Using atomic pointers/indices on the platforms of interest is considered
 // cheap. Of course this is not to say that they are free.

 // Requirements :

 // 1) Ability to share the work between workers with minimal synchronization.
 // 2) Each worker has to have the ability to pick up the slack of others if
 // necessary.
 // 3) We do not need to support retries of jobs within the job dispatching
 // mechanism.
 //  Assuming that most tasks succeed allows for the jobs to post retry requests
 //  to a costly synchronized data structure if necessary on failure.

 // Algo 1:
 // 1) Each worker starts with a pointer/index to the start of its slice.
 // 2) Workers increment that pointer until their is no more work left.

 // This option has literally zero synchronization involved. It does not allow
 // workers to pick up the slack of others though which can be very important
 // with heterogenous workloads. This option also has very good temporal
 // localilty because workers will work on the items that were very close to the
 // previous one.

 // Algo 2:
 // 1) Workers pop off work by incrementing an atomic pointer/index into the
 // work. They stop if there is no more work at all.

 // This option has the advantage of simplicity and slack handling comes for
 // free. It has the disadvantage of introducing constant contention which is can
 // be costly if we cannot use atomics or if they are not cheap. It also has very
 // poor per-worker locality by design,

 // Algo 3:
 // 1) Each worker recieves it's index at startup and a pointer to the running
 // index of the next worker. Whether this is linked list or index based the last
 // worker need to know how to wrap around. 2) As long as a worker does not go
 // over its personal last item it just keeps going.
 //  The worker probably needs a local non-atomic variable for the current
 //  work-load. This is needed to prevent two workers to work on the same item at
 //  the same time.
 // 3) If the worker hits its limit it compares_and_exchanges the index of the
 // next worker to the max of the next worker. This grabs some work for the next
 // worker. 4) If there is no work in the next worker either the worker coul
 // return or traverse the whole structure to find available work.

 // This algorithm provides good locality and more coherent work history for the
 // workers at the cost of added complexity. Contention is greatly reduced on the
 // counter in most cases.

 #include <atomic>

 struct ControlBlock {
  std::atomic<int> index;
  int work_slice_size;

  ControlBlock* GetNextWorker() {}
 };

 template <typename T>
 class JobDispatcher {
 public:
 private:
 };
	#pragma once

	// Assumptions:
	// 1) Taking a lock is considered costly on the platforms of interest to us,
	// even with minimal contention. This is not true on pthread I think.
	// 2) Using atomic pointers/indices on the platforms of interest is considered
	// cheap. Of course this is not to say that they are free.

	// Requirements :

	// 1) Ability to share the work between workers with minimal synchronization.
	// 2) Each worker has to have the ability to pick up the slack of others if
	// necessary.
	// 3) We do not need to support retries of jobs within the job dispatching
	// mechanism.
	// Assuming that most tasks succeed allows for the jobs to post retry requests
	// to a costly synchronized data structure if necessary on failure.

	// Algo 1:
	// 1) Each worker starts with a pointer/index to the start of its slice.
	// 2) Workers increment that pointer until their is no more work left.

	// This option has literally zero synchronization involved. It does not allow
	// workers to pick up the slack of others though which can be very important
	// with heterogenous workloads. This option also has very good temporal
	// localilty because workers will work on the items that were very close to the
	// previous one.

	// Algo 2:
	// 1) Workers pop off work by incrementing an atomic pointer/index into the
	// work. They stop if there is no more work at all.

	// This option has the advantage of simplicity and slack handling comes for
	// free. It has the disadvantage of introducing constant contention which is can
	// be costly if we cannot use atomics or if they are not cheap. It also has very
	// poor per-worker locality by design,

	// Algo 3:
	// 1) Each worker recieves it's index at startup and a pointer to the running
	// index of the next worker. Whether this is linked list or index based the last
	// worker need to know how to wrap around. 2) As long as a worker does not go
	// over its personal last item it just keeps going.
	// The worker probably needs a local non-atomic variable for the current
	// work-load. This is needed to prevent two workers to work on the same item at
	// the same time.
	// 3) If the worker hits its limit it compares_and_exchanges the index of the
	// next worker to the max of the next worker. This grabs some work for the next
	// worker. 4) If there is no work in the next worker either the worker coul
	// return or traverse the whole structure to find available work.

	// This algorithm provides good locality and more coherent work history for the
	// workers at the cost of added complexity. Contention is greatly reduced on the
	// counter in most cases.

	#include <atomic>

	struct ControlBlock {
	std::atomic<int> index;
	int work_slice_size;

	ControlBlock* GetNextWorker() {}
	};

	template <typename T>
	class JobDispatcher {
	public:
	private:
	};