Skip to content

Structured logging via logfmt

Alex requested to merge alx.k/ns-3-dev:logfmt into master
  • Add documentation on the workflow using agrind
  • Update log examples in the documentation
  • Convert the rest of MacLow and WifiPhy

I have previously proposed to add structured logging support to NS-3 as a GSoC 2019 project: https://www.nsnam.org/wiki/GSOC2019Projects#Structured_logging_support Read the proposal for background on this PR.

C++ lacks good structured logging libraries, and changing the logs to build dictionaries instead of strings sounds like a lot of work. Today I learned about logfmt format, which turns out to be an ideal format for structured logs that is easy to write on the C++ side, easy to read, and can be converted to JSON lines or any other format to get support from all the tooling available for structured log analysis.

In this MR I have changed the logs to write time prefix, level prefix and context in logfmt format, as well as some messages in wifi module. It is already pretty usable, which shows that we can convert logs to this format gradually over time.

To demonstrate how structured logs work, I have ran the following example with new logs enabled:

$ NS_LOG="WifiPhy=debug|prefix_node|prefix_time" ./waf --run "wifi-rate-adaptation-distance --standard=802.11a --staManager=ns3::MinstrelWifiManager --apManager=ns3::MinstrelWifiManager --outputFileName=minstrel --steps=5" 2>1.log

Resulting log looks like this:

time_s=0.004954016 node=1 tag=sync_to_signal msg="Sync to signal with power power_w" power_w=6.8767e-09
time_s=0.004958016 node=1 tag=start_rx_header msg="Start receive header" snr_db=42.3398 per=0
time_s=0.004974016 node=1 tag=rx_header_nonht msg="Received non-HT PHY header"
time_s=0.004974016 node=1 tag=rx_psdu msg="Receiving PSDU"
time_s=0.005050016 node=1 tag=rx_status mode=6000000 snr_db=42.3398 per=0 size=54 relative_start_ns=0 duration_ns=96000
time_s=0.005050016 node=1 tag=rx_succeeded msg="Reception succeeded" psdu=0x55c42435f9e0
time_s=0.107354000 node=0 tag=no_power_restriction msg="Transmitting without power restriction"
time_s=0.107354016 node=1 tag=sync_to_signal msg="Sync to signal with power power_w" power_w=6.8767e-09
time_s=0.107358016 node=1 tag=start_rx_header msg="Start receive header" snr_db=42.3398 per=0
time_s=0.107374016 node=1 tag=rx_header_nonht msg="Received non-HT PHY header"
time_s=0.107374016 node=1 tag=rx_psdu msg="Receiving PSDU"
time_s=0.107450016 node=1 tag=rx_status mode=6000000 snr_db=42.3398 per=0 size=54 relative_start_ns=0 duration_ns=96000
time_s=0.107450016 node=1 tag=rx_succeeded msg="Reception succeeded" psdu=0x55c42435b8e0
time_s=0.120034000 node=1 tag=no_power_restriction msg="Transmitting without power restriction"
time_s=0.120034016 node=0 tag=sync_to_signal msg="Sync to signal with power power_w" power_w=6.8767e-09
time_s=0.120038016 node=0 tag=start_rx_header msg="Start receive header" snr_db=42.3398 per=0
time_s=0.120054016 node=0 tag=rx_header_nonht msg="Received non-HT PHY header"
time_s=0.120054016 node=0 tag=rx_psdu msg="Receiving PSDU"
time_s=0.120122016 node=0 tag=rx_status mode=6000000 snr_db=42.3398 per=0 size=46 relative_start_ns=0 duration_ns=88000
time_s=0.120122016 node=0 tag=rx_succeeded msg="Reception succeeded" psdu=0x55c4242ac400
time_s=0.120138016 node=0 tag=no_power_restriction msg="Transmitting without power restriction"
time_s=0.120138032 node=1 tag=sync_to_signal msg="Sync to signal with power power_w" power_w=6.8767e-09
time_s=0.120142032 node=1 tag=start_rx_header msg="Start receive header" snr_db=42.3398 per=0
time_s=0.120158032 node=1 tag=rx_header_nonht msg="Received non-HT PHY header"
time_s=0.120158032 node=1 tag=rx_psdu msg="Receiving PSDU"
time_s=0.120182032 node=1 tag=rx_status mode=6000000 snr_db=42.3398 per=0 size=14 relative_start_ns=0 duration_ns=44000
time_s=0.120182032 node=1 tag=rx_succeeded msg="Reception succeeded" psdu=0x55c4243609f0

It can be converted to JSON lines format by filtering out structured log lines, which all have tag key and parsing as logfmt with agrind:

$ agrind <1.log '" tag=" | logfmt' -o json > 1.jl

Filtering by tag is not strictly necessary as any garbage can be parsed as logfmt and filtered out later.

After collecting the log, it can be analyzed with the following Python 3 script:

#!/usr/bin/env python3
import numpy as np
import pandas as pd
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


def main():
    data_frame = pd.read_json("1.jl", lines=True)

    figure = Figure()
    ax = figure.add_subplot()

    for node, df in data_frame.query('tag == "sync_to_signal"').groupby("node"):
        ax.plot(
            df["time_s"],
            30.0 + np.log10(0.1 * df["power_w"]),
            ["x", "+"][node],
            label=f"Node {node}",
        )

    ax.set_xlabel("Time, s")
    ax.set_ylabel("Power, dBm")
    ax.legend()
    ax.grid()

    canvas = FigureCanvas(figure)
    canvas.print_figure("1.png")


if __name__ == "__main__":
    main()

Script outputs the following image:

1

It seems to be a viable approach to make logs way more usable then they currently are. You can have such scripts lying around to quickly visualize which conditions nodes operate in and which mechanisms are triggered, e.g. plot established connections, retransmission timeout events vs current congestion window sizes and find out which nodes have abnormal behavior in large scenarios. And it does not require experiment modification, compared to adding traces, binding to them and writing C++ code to analyze data or collect it into multiple CSV files.

Edited by Alex

Merge request reports

Loading