Transaction Protocol

The transaction protocol is intended to be a general and extensible substrate for communication in Robigalia systems. The base protocol takes any system capable of unreliable synchronous RPC and extends it to be capable of reliable asynchronous RPC. Further extensions enable concurrency, promise pipelining, efficient use of platform primitives, etc.

Base Protocol

Assumptions

Underlying transport
- Clients perform operations on the executor using synchronous RPC, carrying arbitrary parameters and return values
- Synchronous RPC involves a send phase and a reply phase, and either may fail
- Any message (send or reply) that is repeated an unbounded number of times eventually reaches its destination (weak fairness)
Client behavior
- If an RPC fails (in either phase), the client must resubmit it with the same parameters
- The trace of the client's RPCs must obey (Submit+ -> Retrieve+ -> Reap+)*, with all step repetition deriving from the above.

States

The Transaction Protocol is modeled in terms of making synchronous calls to an executor, which behaves according to a state machine. The states of the state machine are:

Ready
Executing
Completed

Ready

The executor is ready to have a task submitted to it, at which point it will transition to the Executing state.

Executing

The executor has had a task submitted to it, and has not yet completed the task. Once the task completes, it will transition to the Completed state.

Completed

The executor has completed executing a task, and the result is available for retrieval. The task may now be reaped, at which point the executor will transition to the Ready state.

Operations

There are three operations the client may perform on the executor:

Submit(Executor, Task) -> Result<(), IllegalOperation>
Retrieve(Executor) -> Result<Task::Output, InProgress|IllegalOperation>
Reap(Executor) -> Result<(), IllegalOperation>

Submit

In the Ready state, the server will transition to the Executing state and reply with a success response.

In the Executing or Completed states, if the task being submitted is the same as the one executing, the server will perform no state transition and reply with a success response.

In the Executing or Completed states, if the task being submitted differs from the one executing, the server will perform no state transition and may reply with an error response indicating that the client has violated the contract (trace does not satisfy rule). This is a "may" rather than a "will" because detecting this case may be prohibitively expensive.

Retrieve

Under no circumstances will the server perform a state transition based on a Retrieve operation.

In the Completed state, the server will reply with the result of the task.

In the Ready state, the server will reply with an error indicating that the client has violated the contract (trace does not satisfy rule).

In the Executing state, the server will reply with an error indicating that the task is in progress.

Note: As the base protocol does not assume an asynchronous notification primitive, polling of Retrieve is used to check for task completion.

Reap

In the Completed state, the server will transition to the Ready state and reply with a success response.

In the Ready state, the server will not perform a state transition and will reply with a success response.

In the Executing state, the server will not perform a state transition and will reply with an error indicating the client has violated the contract (trace does not satisfy rule).

Protocol Extensions

Concurrency

A concurrent executor exposes much the same interface as the executor in the base protocol, however:

The executor may be queried for its maximum concurrency
- Concurrency(Executor) -> u64
Submit, Retrieve, and Reap now take an additional index parameter 0 <= index < maximum_concurrency
- Submit(Executor, Index, Task) -> Result<(), IllegalOperation>
- Retrieve(Executor, Index) -> Result<Task::Output, InProgress|IllegalOperation>
- Reap(Executor, Index) -> Result<(), IllegalOperation>
The properties of the base protocol apply as-if operations on each index were performed on a non-concurrent executor.

Promise Pipelining (requires Concurrency)

A pipelined executor exposes much the same interface as a concurrent executor, however:

An additional state, Consuming. This behaves identically to Completed, save that Reap produces an "in use" error when performed on it
- Reap(Executor, Index) -> Result<(), IllegalOperation|InUse>
Submit now takes an additional parameter, which is a set of indices (not including its own) which this task depends on.
- These indices must be in the Executing, Consuming, or Completed states; if not, the server replies with a "not ready" error.
- Any dependencies in the Completed state transition to the Consuming state.
- Any dependencies in the Executing state will transition to the Consuming state on completion, rather than Completed
- Submit(Executor, Index, Task, deps: [Index]) -> Result<(), IllegalOperation|NotReady>
When all tasks that depend on an index are Consuming or Completed, if that index is Consuming, it transitions to Completed
A task has some means to access the results of its dependencies, once the dependencies are in the Consuming state.

Note: The burden is on the client to avoid Reaping tasks if it has tasks it intends to Submit that will depend on them.

Dependency-only Tasks (requires Promise Pipelining)

A minor extension of promise pipelining, this allows a task to be marked as "dependency-only" at submission time. A dependency-only task does not support the Retrieve operation and its result is only accessible by dependent tasks, which may allow the executor to optimize its resource usage.

Submit(Executor, Index, Task, deps: [Index], dep_only: bool) -> Result<(), IllegalOperation>

Protocol Optimizations

Interleaving (requires Concurrency)

Where the underlying transport is not synchronous (e.g. HTTP/2), then it is legal for operations on different indices to perform the Send phase in parallel, without waiting for each other to complete the Retrieve phase

Asynchronous Notification

Where the underlying transport offers an asynchronous notification primitive (e.g. seL4 notifiers), it may be used to signal when a task reaches the Completed state. When available, this allows for Reap to be performed without performing Retrieve, so long as a completion has been signalled.

Trace: (Submit+ -> (Retrieve+|(Retrieve* -> Notify -> Retrieve*)) -> Reap+)*

Shared Notification (requires Concurrency)

Where the underlying transport provides an efficient means of asynchronous multiple notification (e.g. seL4 notifiers), multiple indices of the concurrent executor may share the notification channel. In the case of seL4 notifications, for example, an executor with a maximum concurrency the same as the number of bits in an seL4 word may use a single notifier.

Early notification (requires Promise Pipelining)

When using an asynchronous notifier together with promise pipelining, it may be beneficial to notify when the task reaches Consuming (where the result may be fetched) as well as or instead of when the task reaches Completed (where the task may be reaped).

Compressed dependencies (requires Promise Pipelining)

When the maximum concurrency is low enough, it may be beneficial to encode the dependencies as a bitmap of indices, rather than a list.

Niche-encoded dependency-only tasks (requires Compressed Dependencies, Dependency-only Tasks)

When using compressed dependencies, dependency-only tasks may be encoded by setting the bit that would indicate the task depends on itself, as this has no other meaningful use.

Out-of-band Bulk Transfer

When available, the bulk data of a task or a result may be transferred via out-of-band mechanisms, such as shared memory, references to a content-addressed store, or other external means of bulk data transfer.

Multiple Asynchronous Submission (requires Out-of-band Bulk Transfer, Shared Asynchronous Notification)

When out-of-band transfer is used for tasks, an asynchronous multiple notification from the client to the executor may be used to signal multiple tasks having been submitted at once.

Return New State

The operations could return the new state of the executor in addition to their other return values. This could allow for the executor to, in cases where the task completed immediately, avoid some unnecessary IPC by communicating that it is in the Completed state rather than the Executing state.

Historical contents of this ticket

Some notes from @eternaleye

Transaction layer:

Guarantee:
- Every operation is eventually successfully recorded, in sequential order, so long as no single message continues to fail forever
Client contract:
- Submit => Retrieve => Reap => Submit => ...
- Not permitted to elide or reorder steps
- If step fails, must repeat WITH SAME PARAMETERS until it succeeds
Server state machine (ops take zero time)
- States
  - Ready
  - Completed
- Transitions
  - INITIAL :: Ready
  - Submit :: Ready -> Completed
  - Retrieve :: Completed -> Completed (gets value)
  - Reap :: Completed -> Ready
- Transitions reachable by repeating on failure:
  - Submit :: Completed -> Completed (all parameters identical - silent success)
  - Reap :: Ready -> Ready (silent success)
- Transitions only reachable if client violates contract:
  - Submit :: Completed -> Completed (EBADE) (some parameters differ)
    - Likely too onerous to mandate servers detect this, as it requires saving (at least a hash of) all the parameters
    - However, likely very useful as a debugging tool
  - Retrieve :: Ready -> Ready (EBADE)
Server state machine (ops take nonzero time, extends the above)
- Additional states
  - Executing
- Replaced transitions
  - Submit :: Ready -> Executing
- Additional transitions
  - (internal) :: Executing -> Completed
- Additional transitions reachable by race, but repeating upholds invariants:
  - Retrieve :: Executing -> Executing (EINPROGRESS)
  - Submit :: Executing -> Executing (all parameters identical)
- Additional transitions only reachable if client violates contract:
  - Reap :: Executing -> Executing (EBADE)
  - Submit :: Executing -> Executing (EBADE) (some parameters differ)
    - See "Submit :: Completed -> Completed" above
Optimizations:
1. Executing -> Completed flags notification
2. Executing -> Completed sends result
  - Obviates Retrieve
  - ...except that sends fail silently if they do fail.
  - Requires thread to avoid blocking
3. Executing -> Completed stores data in shared memory
  - Obviates Retrieve if combined with optimization (1)
  - Reap indicates space can be reused
  - No thread needed
4. Can execute arbitrary number of state machines in parallel
  - WORD_BITS state machines can reuse same notification channel
  - No dependencies will be honored between state machines - treat these like HW queues on NICs or HBAs
5. Notifier can be shared among thread pool at Retrieve/Reap time
  - So long as each thread does a Recv rather than an NBRecv (clearing/claiming the result)
  - This is to avoid racing Retrieve and Reap between two threads.
  - If a thread does the Recv and then dies before Retrieve and Reap both complete, this can leave a state machine stuck. This is the "riskiest" of these optimizations, as a result.
6. Named (i.e. client-triggered) transitions can (in non-error cases) return the state transitioned to
  - This opens the door to the server non-deterministically selecting one of Submit :: Ready -> Completed vs. Submit :: Ready -> Executing at runtime

Edited May 05, 2020 by Alex Elsayed