Allow setting/configuring advanced parameters for LTTng
Superset of #67 (closed), which this issue therefore supersedes
We can currently provide the following parameters through both the Trace action and the underlying tracing interface (tracetools_trace):
- kernel event names
- userspace event names
- context field names
There are, however, more parameters for LTTng that users might want to set:
- buffering scheme
  - per-user or per-process
  - applies to: each channel in the userspace domain (kernel tracing always has a "global root user" buffer)
  - all userspace channels must use the same buffering scheme
  - set at: channel creation time
  - current default: per-user for the one UST channel we create
- event record loss mode
  - discard or overwrite
  - applies to: each userspace and kernel tracing channel, separately
  - set at: channel creation time
  - current defaults: discard for userspace tracing and for kernel tracing
- sub-buffer size and count
  - size in bytes
  - total ring buffer size = sub-buffer size * count
  - sub-buffer count is pointless in discard mode
  - applies to: each userspace and kernel tracing channel, separately
  - set at: channel creation time
  - current defaults: 8 sub-buffers of 2*4096 bytes and 8 sub-buffers of 8*4096 bytes
  - ❗ since we are using discard mode, we could use only 2 sub-buffers to reduce sub-buffer switching frequency (and thus reduce tracing overhead, see LTTng doc)
- timers
  - switch timer, read timer, monitor timer
  - values in microseconds
  - value of 0 means the timer is disabled
  - use read timer for real-time
  - monitor timer
  - applies to: each userspace and kernel tracing channel, separately
  - set at: channel creation time
  - current defaults: switch timer disabled and read timer set to 200 µs for userspace tracing and kernel tracing
- channel output type
  - mmap or splice
  - how the ring buffers are shared between the tracer and the consumer daemon
  - splice is only available for the kernel domain, so userspace channels can only use mmap
  - applies to: each userspace and kernel tracing channel, separately
  - set at: channel creation time
  - current defaults: mmap for userspace tracing and kernel tracing
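To make the per-channel parameters above concrete, here is a minimal Python sketch that groups them in one object and turns them into an `lttng enable-channel` command line. The `ChannelConfig` class and its field names are hypothetical (they do not exist in tracetools_trace), the values in the usage example are only illustrative, and the flag names should be double-checked against the installed LTTng version.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical container for the per-channel parameters listed above."""

    name: str
    domain: str           # 'userspace' or 'kernel'
    buffering: str        # 'per-user' or 'per-process' (userspace only)
    loss_mode: str        # 'discard' or 'overwrite'
    subbuf_size: int      # bytes
    subbuf_count: int
    switch_timer_us: int  # 0 disables the timer
    read_timer_us: int    # 0 disables the timer
    output: str           # 'mmap' or 'splice'

    def ring_buffer_size(self) -> int:
        # total ring buffer size = sub-buffer size * count
        return self.subbuf_size * self.subbuf_count

    def to_command(self, session: str) -> List[str]:
        """Build an `lttng enable-channel` argument list for this channel."""
        if self.domain not in ('userspace', 'kernel'):
            raise ValueError(f'unknown domain: {self.domain}')
        if self.loss_mode not in ('discard', 'overwrite'):
            raise ValueError(f'unknown loss mode: {self.loss_mode}')
        if self.domain == 'userspace' and self.output == 'splice':
            raise ValueError('splice is only available for the kernel domain')
        cmd = ['lttng', 'enable-channel', f'--session={session}', f'--{self.domain}']
        if self.domain == 'userspace':
            # The buffering scheme only applies to userspace channels.
            cmd.append('--buffers-uid' if self.buffering == 'per-user' else '--buffers-pid')
        cmd += [
            f'--{self.loss_mode}',
            f'--subbuf-size={self.subbuf_size}',
            f'--num-subbuf={self.subbuf_count}',
            f'--switch-timer={self.switch_timer_us}',
            f'--read-timer={self.read_timer_us}',
            f'--output={self.output}',
            self.name,
        ]
        return cmd


# Illustrative values only; the channel and session names are made up.
ust_channel = ChannelConfig(
    name='ros2', domain='userspace', buffering='per-user', loss_mode='discard',
    subbuf_size=2 * 4096, subbuf_count=8, switch_timer_us=0, read_timer_us=200,
    output='mmap',
)
print(ust_channel.ring_buffer_size())  # 65536
print(' '.join(ust_channel.to_command('my-session')))
```

Since every one of these parameters is set at channel creation time, a single object per channel seems like a natural unit to expose, whichever option ends up being implemented.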
There are other advanced parameters:
- maximum trace file size and count
- recording session mode
  - local, network streaming to relay daemon, snapshot (through command or rule-based trigger), live (streaming to relay daemon for live reading)
  - current default: local
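As a rough illustration of how these two parameters map onto the LTTng CLI, the sketch below builds `lttng create` arguments for each recording session mode, plus the trace file size/count options that `lttng enable-channel` accepts. The function names and the `mode` strings are hypothetical, and the flags should be verified against the installed LTTng version.

```python
from typing import List, Optional


def create_session_command(
    name: str,
    mode: str = 'local',
    output_dir: Optional[str] = None,
    relay_url: Optional[str] = None,
) -> List[str]:
    """Build an `lttng create` argument list for the given session mode."""
    cmd = ['lttng', 'create', name]
    if mode == 'local':
        if output_dir is not None:
            cmd.append(f'--output={output_dir}')
    elif mode == 'network':
        if relay_url is None:
            raise ValueError('network streaming needs a relay daemon URL')
        cmd.append(f'--set-url={relay_url}')
    elif mode == 'snapshot':
        cmd.append('--snapshot')
    elif mode == 'live':
        cmd.append('--live')
        if relay_url is not None:
            cmd.append(f'--set-url={relay_url}')
    else:
        raise ValueError(f'unknown session mode: {mode}')
    return cmd


def tracefile_options(max_size_bytes: int, max_count: int) -> List[str]:
    """Options to append to `lttng enable-channel` to limit trace file size and count."""
    return [f'--tracefile-size={max_size_bytes}', f'--tracefile-count={max_count}']


# Illustrative values only; the session name and URL are made up.
print(create_session_command('my-session', mode='local', output_dir='/tmp/my-trace'))
print(create_session_command('my-session', mode='network', relay_url='net://localhost'))
print(tracefile_options(max_size_bytes=16 * 1024 * 1024, max_count=4))
```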
We need to find a middle ground between not allowing users to set any of these other parameters and allowing users to set all of these parameters.
Some options:
1. Allow users to customize some of the defaults above (through the Trace action and tracetools_trace)
   - buffering scheme, event record loss mode, sub-buffer size and count, timers, channel output type
   - channel output type might not be that important
   - users wouldn't really be able to set other parameters separately (see item 2 below)
2. Re-implement the Trace action (or more likely provide a second one) that only interacts with the LTTng CLI instead of the Python interface to allow setting any and all parameters (so putting tracetools_trace aside for this)
   - as a simpler way to create a bunch of ExecuteProcess actions to execute a bunch of commands
     - I say "a bunch," but it's probably fewer than a dozen commands
     - might provide some utilities like replacing some placeholders in command strings
     - and then of course we automatically call lttng stop && lttng destroy at the end, similar to how we stop tracing now through the LTTng Python interface
   - because we might not be able to modify some parameters twice. For example, we might not be able to modify attributes of an existing channel (created using the Python interface) using the LTTng CLI
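The CLI-based option could look roughly like the sketch below: the user provides LTTng command strings with placeholders, a small utility expands them, and the stop/destroy commands are always added for the end of the launch. Everything here is hypothetical (`expand_commands`, the `${session}` and `${path}` placeholder names, the example commands); in a real action, each resulting argument list would presumably be handed to an ExecuteProcess action through its `cmd` parameter.

```python
import shlex
from string import Template
from typing import Dict, List, Sequence, Tuple

# Always run when the launch shuts down, mirroring what the current
# implementation does through the LTTng Python interface.
TEARDOWN_TEMPLATES = ('lttng stop ${session}', 'lttng destroy ${session}')


def expand_commands(
    setup_templates: Sequence[str],
    placeholders: Dict[str, str],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Replace placeholders and split each command string into an argument list.

    Returns (setup commands, teardown commands). A missing placeholder
    raises KeyError instead of silently producing a broken command.
    """
    def expand(template: str) -> List[str]:
        return shlex.split(Template(template).substitute(placeholders))

    setup = [expand(template) for template in setup_templates]
    teardown = [expand(template) for template in TEARDOWN_TEMPLATES]
    return setup, teardown


# Illustrative user-provided commands; names and values are made up.
user_commands = [
    'lttng create ${session} --output=${path}',
    'lttng enable-channel --session=${session} --userspace --buffers-uid '
    '--discard --subbuf-size=8192 --num-subbuf=8 ros2',
    'lttng enable-event --session=${session} --userspace --channel=ros2 "ros2:*"',
    'lttng start ${session}',
]
setup, teardown = expand_commands(
    user_commands, {'session': 'my-session', 'path': '/tmp/my-trace'})
for argv in setup + teardown:
    print(argv)
```

Keeping the teardown commands separate from the user-provided ones means the action can run them on shutdown even if one of the setup commands fails.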
Those two options are not mutually exclusive, of course, but I would lean towards option 2 because:
- ROS 2 itself doesn't generate that many events, so the default values we currently use are just fine and thus most users probably won't need to go any further
- using 1 channel for all userspace events and 1 channel for all kernel events is just fine, etc.
- Those who do need more control probably wouldn't mind providing LTTng commands directly
- It would avoid making the tracetools_trace code any more complicated; ros2 trace probably doesn't need to support all of this
Edited by Christophe Bédard