Revisit architecture for pure pursuit
Controller and Vehicle Interface Reference Implementation Review
This document seeks to define a concrete, and extensible interface for both the vehicle interface and the controller, while providing discussion points and rationale. A tl;dr for reviewers is at the bottom.
After discussion and consensus, this document can be rolled into a document for the controller, and a document for the vehicle interface which concretely lays out concerns, interfaces, and behaviors the components are expected to conform to.
High Summary
Below are a proposed high-level description of the controller and vehicle interface as broad components in the autonomous driving stack. Further details and discussion points for interface definitions (message definitions) can be seen further below in the document.
Controller
A controller's purpose is to take a reference sequence of kinematic vehicle states and the vehicle's current kinematic state, and generate a command which results in the error between the vehicle's actual kinematic state and the desired sequence being rejected or reduced.
A controller has three inputs:
- A Trajectory message, from a motion planner
- A position proxy message (vehicle state), from an ego state estimator
- Transforms (to handle vehicle states from a different coordinate frame from the trajectory)
A controller has two outputs:
- A control command message
- A algorithm diagnostic message
The controller is tightly coupled with a Vehicle Interface.
Vehicle Interface
The vehicle interface receives instantaneous commands from the AV stack, and translates them into real-world actuation via a drive-by-wire interface. Additional, well-documented logic may and should exist to prevent unsafe command sequences (e.g. state machine, low pass filtering). Additionally, gear shifting to satisfy e.g. negative accelerations from a stopped state may be a concern of this component.
The vehicle interface has two inputs:
- A control command message, from a controller
- A vehicle state command message, from a behavior planner
The vehicle interface has one main output:
- A vehicle State report message
Schematically, the setup looks as follows:
Legend: (unscoped components) |scoped components| Message
(Behavior Planner)
| (Localization/Map Store)
| (Motion Planner) /
| | Transforms
VehicleStateCmd Trajectory /
| | /
| V V
| |Controller| <-----VehicleKinematicState--- (Ego State Estimator)
| / \
| ControlCmd ControllerStatus
| / \---------------
V V \
|Vehicle Interface| ----VehicleStateRpt-----> (Logger?)
- Controller and Vehicle Interface Reference Implementation Review
Introduction
The goal of Autoware.Auto is to produce certifiable algorithms. To do this, rigorous development practices are necessary. This document intends to instantiate an aspect of these development practices by designing interfaces for the Controller aspect of the autonomous driving stack.
Having a concrete interface (input/output messages) defined for a given component will aid in interoperability of implementations as the Autoware.Auto ecosystem grows.
Why a controller
A controller is the core algorithmic component of a simple waypoint-following stack. The remaining components are largely boilerplate. As a result, this component should be appropriately scoped out as the central component of the first milestone implementation.
Related to this, a controller is a reasonable first choice for interface definition because it is relatively isolated in the AV stack.
Broadly speaking, the AV stack can be conceptualized as having five layers:
-------------------------------------
| Sensing |
| Driver1 | Driver 2 | ... | ... |
|-----------------------------------|
| Understanding |
|-----------------------------------|
| Reasoning |
|-----------------------------------|
| Planning |
|-----------------------------------|
| Vehicle Interface |
-------------------------------------
- Sensing is simply the interface layer with sensor hardware, e.g. LiDAR, camera, GNSS, etc. Maps can broadly be thought of as being in this category
- Understanding converts the raw measurements of the sensor devices into concrete observations, e.g. object detection, localization, etc.
- Reasoning involves applying higher-level reasoning on the observations from the previous layer, e.g. data fusion can happen at the level, tracking, so on
- Planning involves taking the aposteriori world understanding from reasoning, and appropriately generating plans pursuant to the vehicle's current goals (e.g. a route, safety, not hitting things, etc.)
- The vehicle interface then take atomic commands from the planning stack and commands the vehicle to actually do things
The planning stack with the vehicle interface can broadly be conceptualized as follows:
Map \ Object Tracks\
| ---------------- | ----------
V V V V
----------------------------------------------------------------------------------------------------
| Global Planning -> Behavior Planning -> Motion/Local Planning -> Controller -> Vehicle Interface | ->
----------------------------------------------------------------------------------------------------
^ ^ ^ ^
---------------- ------ --- ------------
\ --------Localization---/---------- /
(You get the picture)
From this view, it can generally be agreed that the controller and vehicle interface as less tightly coupled to the rest of the AV stack (representing the "bottom-most" portion), and thus the interfaces are probably the simplest with the fewest design ramifications. This makes it a good first candidate for attempting to architect things properly
Vehicle Interface
Generally speaking, the vehicle interface is conceptually straight forward. It is intended to take the place of a person sitting in the driver's seat.
As such, it should be able to do the following basic manipulations of the vehicle:
- Move the steering wheel
- Actuate gas and brake pedals
- Shift the vehicle's gears
- Manipulate vehicle lights (headlights, hazard lights, turning blinkers)
- Manipulate wipers (and wiper fluid)
In addition, such an interface should be able to report standard instantaneous vehicle information:
- Fuel/charge state
- Vehicle velocity
- Vehicle steering angle
- Vehicle light state
- Vehicle gear state
- Wiper state
At the same time, it is also the responsibility of the vehicle interface to ensure that not too much is asked of the vehicle, e.g.
- Crazy command signals (e.g. steer angles/throttle commands at opposite extremes)
- Unsafe operations (e.g. shifting gears at any velocity over a bare crawl, shifting out of autonomous mode when no one is in the driver seat)
As such, some reasonable program logic, and low-pass filtering is reasonable in the vehicle interface. This platform-specific logic and filtering should be well documented so developers of upstream algorithms e.g. a controller can properly adjust their configurations.
Additionally, to simplify the development of motion planners and controllers, the vehicle interface can have some autonomous ownership over gear shifting between drive and reverse. For example, if a control command is continually requesting negative acceleration while the vehicle is stopped (has zero velocity), then the vehicle may shift into reverse and begin applying throttle to satisfy this acceleration.
Outputs
Conceptually, we can group all of the report items together as an output, e.g.
# VehicleStateReport.msg
# Reports instantaneous state of vehicle
# Should be published at > 10 Hz, so it can be reasonably used in a planning loop (e.g. behavior, motion)
uint32 fuel_state # or float, but having some normalized range is probably a good idea
float32 velocity_mps # longitudinal, should be negative when reversed
float32 steer_angle_rad # 0 is neutral, CCW positive
bool is_hazard_on
bool hand_brake
int32 lamp # headlights: off, on, high, fog?
int32 blinker_state # left, right, off
int32 gear # e.g. 0 = Drive, 1 = Reverse, 2 = Park, 3 = Neutral, 4 = third gear ...
int32 wiper # 0 = off, up to some max speed, e.g. 100, does there need to be something for a rear wiper?
int32 auto_state # e.g. 1 = autonomous, 2 = ready, 0 = not ready (e.g. doors open etc.)
uint64 ecu_code # a field for reporting if there are internal ecu faults?
# Additional fields could include: tire pressures, angular rates of wheels, etc..
This message is intended to represent a generic vehicle state, something that is valid for all platforms (and thus vehicle interfaces). In the case that a platform cannot support a particular field, it should be agreed upon what default value should be provided for each field.
In the case of advanced vehicle platforms, e.g. with built-in radar, ultrasonics, GNSS, etc. The vehicle interface should publish that information on a different topic of the appropriate type.
Inputs
Similarly, the actuation of the vehicle can conceptually broken into two components: an always-on part, which also concerns the kinematic state of the vehicle, and an infrequent part controlling the rest of the vehicle.
Control Command
The high frequency parts can be bottled up into a straightforward message, which would come from a controller:
# VehicleControlCmd.msg
# Instantaneous actuation command of the brake/throttle and steering wheel
# Should be reported at > 20 Hz (e.g. 50-100-1000 Hz?)
# There are actual dynamics of the vehicle actuation system, so the interface should do it's best
# There is no timestamp, since this is something that should happen right now
float32 long_accel_mps
float32 steer_angle_rad
However a number of other message definitions are equally reasonable:
- Wrench(Stamped)
- Autoware's ControlCommand
- Autoware's VehicleCommand
Of these, ControlCommand
is perhaps the most appropriate. Wrench and WrenchStamped are both over
and ill-defined (e.g. 3 DoF in force, etc..). Similarly, VehicleCommand
also has many aliased
fields (e.g. ctrl_cmd
vs brake_cmd
--which should be obeyed?), in addition to combining
manipulation of the vehicle's non-kinematic states, which may happen at a different frequency.
State Command
While the lower frequency, non-kinematic commands can similarly be bottled up into another message:
# VehicleStateCmd.msg
# Other controls, should be relatively low frequency, e.g. < 10 Hz (1 Hz-ish?)
int32 blinker_state # left, right, off
bool hazard # true = on
int32 headlight # off, normal, high (fog, etc..)
int32 wiper # 0 = off ...
int32 gear
bool autonomous
bool hand_brake
The VehicleStateCmd
can come from a different source, e.g. a behavior planner.
Controller
In general, the controller should take a trajectory, which is a sequence of kinematic vehicle states, and a current vehicle pose or kinematic vehicle state, and determine a control input which should appropriately result in the vehicle following the state sequence (i.e. rejecting state error).
Outputs
The main output of the controller should be the control command consumed by the vehicle interface. There are a few options for this. See "Vehicle Interface::Inputs::Control Command" for more details.
Additionally, as an algorithm, it is important for the controller to report how it's doing. As such, a diagnostic message should also be published on every iteration.
# ControllerStatus.msg
# These first three could be a part of a standard "diagnostic header"
String controller_name # e.g. "pure_pursuit", "bicycle_mpc", etc.. could also be an integer type, but adds overhead of mapping
Time stamp
Time runtime
uint32 iterations
# (Error) terms specific to controllers
bool new_trajectory
String trajectory_source
String pose_source
float32 lateral_error_m
float32 longitudinal_error_m
float32 velocity_error_mps
float32 acceleration_error_mps
float32 yaw_error_rad
float32 yaw_rate_error_rps
Inputs
The inputs to the controller are a trajectory and a vehicle state proxy. In addition, transforms should also be consumed or consumable by the controller such that the trajectory and vehicle state proxy are in the same coordinate frame during internal algorithmic processing.
Trajectory
Fundamentally, a trajectory should come from a motion planner and have the following:
- A coordinate frame
- A time stamp
- A vehicle kinematic state sequence with at least:
- The vehicle's position
- When the vehicle should be in this state
- Orientation (probably definitely)
- Velocity (probably definitely)
- Higher order kinematics (acceleration, yaw rate, jerk etc..., maybe?)
In addition, a trajectory should be rooted in both space and time. That means if /base_link
is a
dynamic frame, and the given trajectory is in the /base_link
frame, then the vehicle's state
should be transformed relative to the /base_link
frame at the time stamp specified by the
trajectory message.
Further, it should be agreed whether or not transforming only position and heading is sufficient. The rationale is that velocity, and acceleration are assumed to be longitudinal, these are coupled to heading. Yaw rate is orthogonal to the 2D plane upon which a trajectory point lives.
Finally, the orientation should have an agreed-upon reference angle, for example yaw or heading 0 should correspond to the +x basis direction (or E in an ENU coordinate frame). Similarly, extraneous fields should also have agreed-upon default values if they cannot be provided. For example, if acceleration values are not provided, then the default value should be 0.
A couple of relevant messages exist:
============================
We can use JointTrajectory.msg, where we enforce:
float64[] positions # x and y and heading
float64[] velocities # longitudinal velocity; optional: heading rate
float64[] accelerations # optional: longitudinal acceleration, second heading derivative
float64[] effort # optional: longitudinal jerk, third heading derivative
Pros:
- Built-in type
- Some room for extensibility
- Possible interoperability with existing planners
Cons:
- Room for variability (e.g. switching x and y in positions)
- Double precision may or may not be needed
- Extensibility with semantic correctness is limited (e.g. up to third derivative)
=============================
Autoware's Waypoint
Pros:
- Very expressive, can be used in other contexts
Cons:
- Appears to be more for global planning
- Doesn't have kinematics
======================================
Alternatively, a custom message definition could be used:
# TrajectoryPoint.msg
Duration time_from_start
float x
float y
float heading # or a 2D quaternion
float velocity
# higher derivatives can be defined
# float acceleration
# float heading_rate
# ...
# Trajectory
std_msgs/Header header # or just string frame_id
TrajectoryPoint[] points
Pros:
- This type is specific to the autonomous driving use case (i.e. concrete)
- We would also be able to arbitrarily define further extensions to the type
- We have concrete, unambiguous fields
Cons:
- This is not a built-in type
Of these, I would advocate for the custom type as above. This is because it is concrete for the autonomous driving use case, and thus leaves less room for ambiguity.
Vehicle State
A vehicle state is needed as an input to the controller so that the controller knows where the vehicle is relative to the reference trajectory.
Fundamentally, a vehicle state should have the following:
- A coordinate frame
- A time stamp
- The vehicle's position (definitely)
- Orientation (probably definitely)
- Velocity (probably definitely)
- Higher order kinematics (acceleration, yaw rate, jerk etc..., maybe?)
Similarly, a couple of messages exist which are usable in this context:
- JointTrajectoryPoint (as mentioned above, but possibly stamped)
- PointStamped
- PoseStamped
- TransformStamped
- NavSatFix
Of these, I personally advocate for TrajectoryPointStamped
, since it would fit best with the
definition of trajectory. This could also be the output of a reasoning component, e.g. Ego State Estimator
, which can take in odometry, localization etc. to estimate the vehicle state.
Transforms
Finally, the controller should also have access to the /tf
topic, so that it can appropriately
transform it's vehicle state to a coordinate frame consistent with the trajectory
References
Dataspeed can provide inspiration for the types needed for a vehicle interface:
Autoware has many relevant message types:
Apollo also has interfaces which we can use for inspiration:
Controller/Interface messages
Configuration Messages
Algorithm Status Messages
tl;dr (Questions for Reviewers)
Fundamentally, we should agree on things in order of increasing granularity:
Component Concerns
Do you agree that the controller (as a node or group or nodes) should accept vehicle state and trajectory, and emit control commands such that trajectory error is rejected?
Do you agree that the vehicle interface accepts commands and actuates them (with potentially extra logic e.g. drive<->reverse gear shifting, low pass filtering), while applying logic to gate-keep unrealistic or unsafe commands?
Component Interfaces in General
If you agree with the general concerns of the component, then please respond to the following:
Do you agree that the controller should have at least the following inputs:
- A trajectory type, from a motion planner (~5-20 Hz)
- A vehicle state type, from an ego state estimator (> 20 Hz)
- Transforms
And at least output the following:
- A diagnostic/status type
- A control command type (> 20 Hz)
Do you agree that the vehicle interface should at least have the following inputs:
- A control command type, from a controller (> 20 Hz)
- A vehicle state command type, from a behavior planner (~1-5 Hz)
And at least the following output:
- A vehicle state report (>= 10 Hz)
Component Interfaces in Detail
If you agree with the above interfaces and concerns, then please look over the interface definitions in detail, agree/disagree/propose changes with them.