Transaction Protocol
The transaction protocol is intended to be a general and extensible substrate for communication in Robigalia systems. The base protocol takes any system capable of unreliable synchronous RPC and extends it to be capable of reliable asynchronous RPC. Further extensions enable concurrency, promise pipelining, efficient use of platform primitives, etc.
Base Protocol
Assumptions
- Underlying transport
- Clients perform operations on the executor using synchronous RPC, carrying arbitrary parameters and return values
- Synchronous RPC involves a send phase and a reply phase, and either may fail
- Any message (send or reply) that is repeated an unbounded number of times eventually reaches its destination (weak fairness)
- Client behavior
- If an RPC fails (in either phase), the client must resubmit it with the same parameters
- The trace of the client's RPCs must obey
(Submit+ -> Retrieve+ -> Reap+)*
, with all step repetition deriving from the above.
States
The Transaction Protocol is modeled in terms of making synchronous calls to an executor, which behaves according to a state machine. The states of the state machine are:
Ready
Executing
Completed
Ready
The executor is ready to have a task submitted to it, at which point it will transition to the Executing
state.
Executing
The executor has had a task submitted to it, and has not yet completed the task. Once the task completes, it will transition to the Completed
state.
Completed
The executor has completed executing a task, and the result is available for retrieval. The task may now be reaped, at which point the executor will transition to the Ready
state.
Operations
There are three operations the client may perform on the executor:
Submit(Executor, Task) -> Result<(), IllegalOperation>
Retrieve(Executor) -> Result<Task::Output, InProgress|IllegalOperation>
Reap(Executor) -> Result<(), IllegalOperation>
Submit
In the Ready
state, the server will transition to the Executing
state and reply with a success response.
In the Executing
or Completed
states, if the task being submitted is the same as the one executing, the server will perform no state transition and reply with a success response.
In the Executing
or Completed
states, if the task being submitted differs from the one executing, the server will perform no state transition and may reply with an error response indicating that the client has violated the contract (trace does not satisfy rule). This is a "may" rather than a "will" because detecting this case may be prohibitively expensive.
Retrieve
Under no circumstances will the server perform a state transition based on a Retrieve
operation.
In the Completed
state, the server will reply with the result of the task.
In the Ready
state, the server will reply with an error indicating that the client has violated the contract (trace does not satisfy rule).
In the Executing
state, the server will reply with an error indicating that the task is in progress.
Note: As the base protocol does not assume an asynchronous notification primitive, polling of Retrieve
is used to check for task completion.
Reap
In the Completed
state, the server will transition to the Ready
state and reply with a success response.
In the Ready
state, the server will not perform a state transition and will reply with a success response.
In the Executing
state, the server will not perform a state transition and will reply with an error indicating the client has violated the contract (trace does not satisfy rule).
Protocol Extensions
Concurrency
A concurrent executor exposes much the same interface as the executor in the base protocol, however:
- The executor may be queried for its maximum concurrency
Concurrency(Executor) -> u64
-
Submit
,Retrieve
, andReap
now take an additionalindex
parameter0 <= index < maximum_concurrency
Submit(Executor, Index, Task) -> Result<(), IllegalOperation>
Retrieve(Executor, Index) -> Result<Task::Output, InProgress|IllegalOperation>
Reap(Executor, Index) -> Result<(), IllegalOperation>
- The properties of the base protocol apply as-if operations on each index were performed on a non-concurrent executor.
Promise Pipelining (requires Concurrency)
A pipelined executor exposes much the same interface as a concurrent executor, however:
- An additional state,
Consuming
. This behaves identically toCompleted
, save thatReap
produces an "in use" error when performed on itReap(Executor, Index) -> Result<(), IllegalOperation|InUse>
-
Submit
now takes an additional parameter, which is a set of indices (not including its own) which this task depends on.- These indices must be in the
Executing
,Consuming
, orCompleted
states; if not, the server replies with a "not ready" error. - Any dependencies in the
Completed
state transition to theConsuming
state. - Any dependencies in the
Executing
state will transition to theConsuming
state on completion, rather thanCompleted
Submit(Executor, Index, Task, deps: [Index]) -> Result<(), IllegalOperation|NotReady>
- These indices must be in the
- When all tasks that depend on an index are
Consuming
orCompleted
, if that index isConsuming
, it transitions toCompleted
- A task has some means to access the results of its dependencies, once the dependencies are in the
Consuming
state.
Note: The burden is on the client to avoid Reap
ing tasks if it has tasks it intends to Submit
that will depend on them.
Dependency-only Tasks (requires Promise Pipelining)
A minor extension of promise pipelining, this allows a task to be marked as "dependency-only" at submission time. A dependency-only task does not support the Retrieve
operation and its result is only accessible by dependent tasks, which may allow the executor to optimize its resource usage.
Submit(Executor, Index, Task, deps: [Index], dep_only: bool) -> Result<(), IllegalOperation>
Protocol Optimizations
Interleaving (requires Concurrency)
Where the underlying transport is not synchronous (e.g. HTTP/2), then it is legal for operations on different indices to perform the Send
phase in parallel, without waiting for each other to complete the Retrieve
phase
Asynchronous Notification
Where the underlying transport offers an asynchronous notification primitive (e.g. seL4 notifiers), it may be used to signal when a task reaches the Completed
state. When available, this allows for Reap
to be performed without performing Retrieve
, so long as a completion has been signalled.
Trace: (Submit+ -> (Retrieve+|(Retrieve* -> Notify -> Retrieve*)) -> Reap+)*
Shared Notification (requires Concurrency)
Where the underlying transport provides an efficient means of asynchronous multiple notification (e.g. seL4 notifiers), multiple indices of the concurrent executor may share the notification channel. In the case of seL4 notifications, for example, an executor with a maximum concurrency the same as the number of bits in an seL4 word may use a single notifier.
Early notification (requires Promise Pipelining)
When using an asynchronous notifier together with promise pipelining, it may be beneficial to notify when the task reaches Consuming
(where the result may be fetched) as well as or instead of when the task reaches Completed
(where the task may be reaped).
Compressed dependencies (requires Promise Pipelining)
When the maximum concurrency is low enough, it may be beneficial to encode the dependencies as a bitmap of indices, rather than a list.
Niche-encoded dependency-only tasks (requires Compressed Dependencies, Dependency-only Tasks)
When using compressed dependencies, dependency-only tasks may be encoded by setting the bit that would indicate the task depends on itself, as this has no other meaningful use.
Out-of-band Bulk Transfer
When available, the bulk data of a task or a result may be transferred via out-of-band mechanisms, such as shared memory, references to a content-addressed store, or other external means of bulk data transfer.
Multiple Asynchronous Submission (requires Out-of-band Bulk Transfer, Shared Asynchronous Notification)
When out-of-band transfer is used for tasks, an asynchronous multiple notification from the client to the executor may be used to signal multiple tasks having been submitted at once.
Return New State
The operations could return the new state of the executor in addition to their other return values. This could allow for the executor to, in cases where the task completed immediately, avoid some unnecessary IPC by communicating that it is in the Completed
state rather than the Executing
state.
Historical contents of this ticket
Transaction layer:
- Guarantee:
- Every operation is eventually successfully recorded, in sequential order, so long as no single message continues to fail forever
- Client contract:
-
Submit
=>Retrieve
=>Reap
=>Submit
=> ... - Not permitted to elide or reorder steps
- If step fails, must repeat WITH SAME PARAMETERS until it succeeds
-
- Server state machine (ops take zero time)
- States
Ready
Completed
- Transitions
-
INITIAL
::Ready
-
Submit
::Ready
->Completed
-
Retrieve
::Completed
->Completed
(gets value) -
Reap
::Completed
->Ready
-
- Transitions reachable by repeating on failure:
-
Submit
::Completed
->Completed
(all parameters identical - silent success) -
Reap
::Ready
->Ready
(silent success)
-
- Transitions only reachable if client violates contract:
-
Submit
::Completed
->Completed
(EBADE
) (some parameters differ)- Likely too onerous to mandate servers detect this, as it requires saving (at least a hash of) all the parameters
- However, likely very useful as a debugging tool
-
Retrieve
::Ready
->Ready
(EBADE
)
-
- States
- Server state machine (ops take nonzero time, extends the above)
- Additional states
Executing
- Replaced transitions
-
Submit
::Ready
->Executing
-
- Additional transitions
- (internal) ::
Executing
->Completed
- (internal) ::
- Additional transitions reachable by race, but repeating upholds invariants:
-
Retrieve
::Executing
->Executing
(EINPROGRESS
) -
Submit
::Executing
->Executing
(all parameters identical)
-
- Additional transitions only reachable if client violates contract:
-
Reap
::Executing
->Executing
(EBADE
) -
Submit
::Executing
->Executing
(EBADE
) (some parameters differ)- See "
Submit
::Completed
->Completed
" above
- See "
-
- Additional states
- Optimizations:
-
Executing
->Completed
flags notification -
Executing
->Completed
sends result- Obviates
Retrieve
- ...except that sends fail silently if they do fail.
- Requires thread to avoid blocking
- Obviates
-
Executing
->Completed
stores data in shared memory- Obviates
Retrieve
if combined with optimization (1) -
Reap
indicates space can be reused - No thread needed
- Obviates
- Can execute arbitrary number of state machines in parallel
-
WORD_BITS
state machines can reuse same notification channel - No dependencies will be honored between state machines - treat these like HW queues on NICs or HBAs
-
- Notifier can be shared among thread pool at
Retrieve
/Reap
time- So long as each thread does a
Recv
rather than anNBRecv
(clearing/claiming the result) - This is to avoid racing
Retrieve
andReap
between two threads. - If a thread does the
Recv
and then dies beforeRetrieve
andReap
both complete, this can leave a state machine stuck. This is the "riskiest" of these optimizations, as a result.
- So long as each thread does a
- Named (i.e. client-triggered) transitions can (in non-error cases) return the state transitioned to
- This opens the door to the server non-deterministically selecting one of
Submit
::Ready
->Completed
vs.Submit
::Ready
->Executing
at runtime
- This opens the door to the server non-deterministically selecting one of
-