
Diagnostics System

For the Diagnostics system to work correctly with a device, the device software must implement the MQTT messages described below.

MQTT Messages

At the most basic level, the Diagnostics system operates through two MQTT messages. One is sent from the platform to the device. The other is regularly broadcast from the device.

Diagnostics Request

Topic

mon/d/<device-id>/ctrl/diagnostics

Example Payload

{ "level": "debug", "broadcast_interval": 60, "debug_level": "warn" }

This command is sent from the platform to the device to request that the device update its diagnostics configuration.

Diagnostic Levels

You can find information about each of the diagnostic levels here.

By default, every device should assume that the diagnostics level is normal.

A device is not required to support all the diagnostics data file types.

Broadcast Interval

Diagnostic broadcast intervals are documented here.

By default, every device should assume that the broadcast interval is null. This value indicates that the device should completely disable broadcasts of the diagnostics information MQTT message.

Devices may be limited in how frequently they can reasonably broadcast diagnostics information. If the broadcast_interval value is too high or too low for the device to support, the device should select the closest broadcast_interval value it can support. For example, if a broadcast_interval value of 1 is given, but the device can only support broadcasting diagnostics information at most once a minute, then the device should use a broadcast_interval value of 60.
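The fallback rule above can be sketched as a simple clamp. This is a minimal illustration, not a required implementation: the limits MIN_INTERVAL and MAX_INTERVAL and the function name are hypothetical, and the interval is assumed to be in seconds, as the once-a-minute example implies.

```python
from typing import Optional

# Hypothetical limits for this particular device (assumed to be seconds).
MIN_INTERVAL = 60
MAX_INTERVAL = 3600


def effective_broadcast_interval(requested: Optional[int]) -> Optional[int]:
    """Return the closest interval the device supports.

    None (JSON null) means broadcasts are disabled and is passed through.
    """
    if requested is None:
        return None
    # Clamp the requested value into the supported range.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, requested))
```

With these assumed limits, a requested value of 1 becomes 60, and a value already inside the range is used unchanged.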

Debug Level

The debug levels are documented here.

By default, every device should assume that the debug level is info. This means that more verbose debug messages should be ignored.
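The three defaults described above (level normal, broadcast_interval null, debug_level info) could be held in a small configuration object that is updated whenever a Diagnostics Request arrives. The sketch below is one possible shape; the class and function names are hypothetical, and keeping the current value for any key missing from the request is an assumption, not documented behavior.

```python
import json
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DiagnosticsConfig:
    # Defaults the device should assume before any request is received.
    level: str = "normal"
    broadcast_interval: Optional[int] = None  # None (null) disables broadcasts
    debug_level: str = "info"


def apply_request(current: DiagnosticsConfig, payload: str) -> DiagnosticsConfig:
    """Return a new config updated from a Diagnostics Request payload."""
    data = json.loads(payload)
    # Assumption: keys absent from the request leave the current value as-is.
    known = {
        key: data[key]
        for key in ("level", "broadcast_interval", "debug_level")
        if key in data
    }
    return replace(current, **known)
```

Applying the example payload shown earlier to a default DiagnosticsConfig yields level debug, broadcast_interval 60, and debug_level warn.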

Broadcast Diagnostics Information

Topic

mon/d/<device-id>/mon/diagnostics

Example Payload

{
  "level": "debug",
  "broadcast_interval": 60,
  "debug_level": "warn",
  "timestamp": 1727798008783,
  "modules": {
    "display-app": {
      "uptime": 131234,
      "cpu_usage": [24],
      "cpu_temperature": [30],
      "memory_usage": 2345553141,
      "memory_total": 4000000000,
      "storage_usage": 62831400000,
      "storage_total": 64000000000
    },
    "can-controller-1": {
      "is_connected": true,
      "uptime": 43230
    },
    "can-controller-2": {
      "is_connected": false,
      "uptime": 0
    }
  }
}

Payload Contents

The device should report the value it is currently using for each of the diagnostic configuration settings:

  • level
  • broadcast_interval
  • debug_level

These values are important because they allow the platform to know which values are actually currently being used by the device. For example, the platform might send a broadcast_interval of 3 to the device, but the device falls back to 60 because it is not capable of broadcasting diagnostics data more frequently than once a minute. It is important for the device to report the actual value being used for each configuration setting, so that the platform can accurately record the status of the device.

The device should report its current timestamp reading in milliseconds from the UTC epoch.

  • This can be useful for diagnosing problems in the device’s internal clock or calculating network delay.
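The payload rules above (report the settings actually in effect, plus a timestamp in milliseconds since the UTC epoch) can be sketched as follows. The function name and its parameters are hypothetical; the now_ms argument exists only to make the example testable.

```python
import json
import time
from typing import Optional


def build_diagnostics_payload(
    level: str,
    broadcast_interval: Optional[int],
    debug_level: str,
    modules: dict,
    now_ms: Optional[int] = None,
) -> str:
    """Serialize a diagnostics broadcast using the device's effective settings."""
    if now_ms is None:
        # Device clock reading in milliseconds since the UTC epoch.
        now_ms = time.time_ns() // 1_000_000
    return json.dumps(
        {
            # Report the values in use, not the values that were requested.
            "level": level,
            "broadcast_interval": broadcast_interval,
            "debug_level": debug_level,
            "timestamp": now_ms,
            "modules": modules,
        }
    )
```

A device that was asked for a broadcast_interval of 3 but fell back to 60 would pass 60 here, so the platform records the value really in use.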

Modules Diagnostics

The modules object is used to report diagnostics information from each module connected to the device.

The keys of this object should correspond to the key of each module.

The diagnostic information for each module can contain whatever information is useful. This information can easily be displayed in a dashboard. Spoke Zone has built-in support for several pieces of diagnostics information.

Built-in Diagnostic Values

  • uptime - an integer
    • Represents the number of seconds the device has been running since the last boot.
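    • You can get this on a Linux system with awk '{print int($1)}' /proc/uptime (the first field of /proc/uptime is a decimal number of seconds, so it is truncated to an integer here).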
  • cpu_usage - an array of numbers (integer or floating point) in the inclusive range [0, 100]
    • Represents the usage of each CPU core as a percentage.
    • If a device only has one CPU core, the array should only include one item.
  • cpu_temperature - an array of numbers (integer or floating point)
    • Represents the temperature of each CPU core in degrees Celsius.
    • If a device only has one CPU core, the array should only include one item.
  • memory_usage - an integer
    • Represents the device’s RAM usage in bytes.
  • memory_total - an integer
    • Represents the device’s total RAM in bytes.
  • storage_usage - an integer
    • Represents the device’s storage usage in bytes.
  • storage_total - an integer
    • Represents the device’s total storage space in bytes.
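As a rough illustration of how some of the built-in values listed above might be gathered on a Linux device, the sketch below parses the text of /proc/uptime and /proc/meminfo and queries storage with the standard library. The function names are hypothetical, and computing memory_usage as MemTotal minus MemAvailable is an assumption about what "usage" should mean, not something this document specifies.

```python
import shutil


def parse_uptime(proc_uptime_text: str) -> int:
    """Whole seconds since boot, from the contents of /proc/uptime."""
    # The first field is a decimal number of seconds; truncate to an integer.
    return int(float(proc_uptime_text.split()[0]))


def parse_meminfo(meminfo_text: str) -> dict:
    """memory_total and memory_usage in bytes, from the contents of /proc/meminfo."""
    wanted = {"MemTotal", "MemAvailable"}
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            # These entries are reported in kB (units of 1024 bytes).
            fields[name] = int(rest.split()[0]) * 1024
    total = fields["MemTotal"]
    # Assumption: "usage" is everything that is not currently available.
    return {"memory_total": total, "memory_usage": total - fields["MemAvailable"]}


def storage_values(path: str = "/") -> dict:
    """storage_usage and storage_total in bytes for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return {"storage_usage": usage.used, "storage_total": usage.total}
```

On a device, the text arguments would come from reading the two /proc files, and the resulting dictionaries could be merged into that module's entry in the modules object.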