Diagnostics System
For the Diagnostics system to work correctly with a device, the device software must implement the behavior described below.
MQTT Messages
At the most basic level, the Diagnostics system operates through two MQTT messages. One is sent from the platform to the device. The other is regularly broadcast from the device.
Diagnostics Request
Topic
mon/d/<device-id>/ctrl/diagnostics
Example Payload
{ "level": "debug", "broadcast_interval": 60, "debug_level": "warn" }
This command is sent from the platform to the device to request that the device update its diagnostics configuration.
Diagnostic Levels
You can find information about each of the diagnostic levels here.
By default, every device should assume that the diagnostics level is normal.
A device is not required to support all the diagnostics data file types.
Broadcast Interval
Diagnostic broadcast intervals are documented here.
By default, every device should assume that the broadcast interval is null. This value indicates that the device should completely disable broadcasts of the diagnostics information MQTT message.
Devices may be limited in how frequently they can reasonably broadcast diagnostics information. If the broadcast_interval value is too high or too low for the device to support, the device should select the closest broadcast_interval value it can support. For example, if a broadcast_interval value of 1 is given, but the device can only support broadcasting diagnostics information at most once a minute, then the device should use a broadcast_interval value of 60.
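The fallback rule above can be sketched as a simple clamp. This is a minimal illustration, not a required implementation: the function name and the two limit constants are hypothetical, and the limits assume a device that can broadcast at most once a minute and at least once an hour.

```python
from typing import Optional

# Hypothetical device limits, in seconds; real values depend on the hardware.
MIN_INTERVAL_S = 60
MAX_INTERVAL_S = 3600


def effective_broadcast_interval(requested: Optional[int]) -> Optional[int]:
    """Return the closest supported interval, or None when broadcasts are disabled."""
    if requested is None:
        return None  # null disables diagnostics broadcasts entirely
    # Clamp the requested value into the range the device can support.
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, requested))


print(effective_broadcast_interval(1))     # 60
print(effective_broadcast_interval(300))   # 300
print(effective_broadcast_interval(None))  # None
```

The value returned here, rather than the requested one, is what the device would then report back in broadcast_interval.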
Debug Level
The debug levels are documented here.
By default, every device should assume that the debug level is info. This means that more verbose debug messages should be ignored.
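As a rough sketch, a device could keep the three documented defaults in one place and apply each Diagnostics Request on top of its current configuration. The function and constant names are hypothetical. The sketch also assumes that a field missing from the request leaves the current value unchanged, which this document does not specify.

```python
import json
from typing import Any, Dict

# Defaults stated in this document.
DEFAULT_CONFIG: Dict[str, Any] = {
    "level": "normal",
    "broadcast_interval": None,  # null: broadcasts disabled
    "debug_level": "info",
}


def apply_diagnostics_request(current: Dict[str, Any], payload: str) -> Dict[str, Any]:
    """Return a new config with the fields present in the request applied."""
    request = json.loads(payload)
    updated = dict(current)
    for key in DEFAULT_CONFIG:
        # Assumption: fields absent from the request keep their current value.
        if key in request:
            updated[key] = request[key]
    return updated


config = apply_diagnostics_request(
    DEFAULT_CONFIG,
    '{ "level": "debug", "broadcast_interval": 60, "debug_level": "warn" }',
)
print(config)
```

An explicit "broadcast_interval": null in the request is applied like any other value, so it turns broadcasts off again.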
Broadcast Diagnostics Information
Topic
mon/d/<device-id>/mon/diagnostics
Example Payload
{
  "level": "debug",
  "broadcast_interval": 60,
  "debug_level": "warn",
  "timestamp": 1727798008783,
  "modules": {
    "display-app": {
      "uptime": 131234,
      "cpu_usage": [24],
      "cpu_temperature": [30],
      "memory_usage": 2345553141,
      "memory_total": 4000000000,
      "storage_usage": 62831400000,
      "storage_total": 64000000000
    },
    "can-controller-1": { "is_connected": true, "uptime": 43230 },
    "can-controller-2": { "is_connected": false, "uptime": 0 }
  }
}
Payload Contents
The device should report all the values it is currently using for the diagnostic configuration settings:
- Diagnostics level in level
- Broadcast interval in broadcast_interval
- Debug level in debug_level
These values tell the platform which settings the device is actually using. For example, the platform might send a broadcast_interval of 3, but the device falls back to 60 because it cannot broadcast diagnostics data more often than once a minute. Reporting the value actually in use for each setting lets the platform accurately record the status of the device.
The device should report its current time in timestamp, as milliseconds since the Unix epoch (UTC).
- This can be useful for diagnosing problems in the device’s internal clock or calculating network delay.
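A minimal sketch of assembling the broadcast payload might look like the following. The function names are hypothetical, and publishing the result with an MQTT client is left out. The field names and topic layout are taken from the example payload and topic above.

```python
import json
import time
from typing import Any, Dict, Optional


def diagnostics_topic(device_id: str) -> str:
    """Topic on which the device broadcasts its diagnostics information."""
    return f"mon/d/{device_id}/mon/diagnostics"


def build_diagnostics_payload(
    level: str,
    broadcast_interval: Optional[int],
    debug_level: str,
    modules: Dict[str, Dict[str, Any]],
) -> str:
    """Serialize the settings actually in use, the current time, and module data."""
    payload = {
        "level": level,
        "broadcast_interval": broadcast_interval,
        "debug_level": debug_level,
        # Milliseconds since the Unix epoch (UTC).
        "timestamp": time.time_ns() // 1_000_000,
        "modules": modules,
    }
    return json.dumps(payload)


message = build_diagnostics_payload(
    "debug", 60, "warn", {"can-controller-1": {"is_connected": True, "uptime": 43230}}
)
print(diagnostics_topic("<device-id>"), message)
```

The values passed in should be the ones the device is really using, for example 60 rather than a requested 3.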
Modules Diagnostics
The modules object is used to report diagnostics information from each module connected to the device.
The keys of this object should correspond to the key of each module.
The diagnostic information for each module can contain whatever information is useful. This information can easily be displayed in a dashboard. Spoke Zone has built-in support for several pieces of diagnostics information.
Built-in Diagnostic Values
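- uptime - an integer
  - Represents the number of seconds the device has been running since the last boot.
  - You can get this on a Linux system with the command below. The first field of /proc/uptime includes fractional seconds, so it is truncated to an integer.
    awk '{print int($1)}' /proc/uptime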
- cpu_usage - an array of numbers (integer or floating point) in the inclusive range [0, 100]
  - Represents the usage of each CPU core as a percentage.
  - If a device only has one CPU core, the array should only include one item.
- cpu_temperature - an array of numbers (integer or floating point)
  - Represents the temperature of each CPU core in degrees Celsius.
  - If a device only has one CPU core, the array should only include one item.
- memory_usage - an integer
  - Represents the device’s RAM usage in bytes.
- memory_total - an integer
  - Represents the device’s total RAM in bytes.
- storage_usage - an integer
  - Represents the device’s storage usage in bytes.
- storage_total - an integer
  - Represents the device’s total storage space in bytes.
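On a Linux device, several of these values could be gathered roughly as follows. This is a sketch under assumptions: the function names are hypothetical, memory_usage is taken to be MemTotal minus MemAvailable (this document does not define how usage is measured), and cpu_usage and cpu_temperature are omitted because their sources vary by platform.

```python
import shutil
from typing import Dict


def parse_uptime(proc_uptime: str) -> int:
    """Whole seconds since boot, from the first field of /proc/uptime."""
    return int(float(proc_uptime.split()[0]))


def parse_meminfo(proc_meminfo: str) -> Dict[str, int]:
    """Return memory_total and memory_usage in bytes from /proc/meminfo text."""
    fields: Dict[str, int] = {}
    for line in proc_meminfo.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # MemTotal and MemAvailable are reported in kB (1024 bytes).
            fields[name] = int(parts[0]) * 1024
    total = fields["MemTotal"]
    # Assumption: "usage" means memory that is not available to new workloads.
    return {"memory_total": total, "memory_usage": total - fields["MemAvailable"]}


def collect_diagnostics(path: str = "/") -> Dict[str, int]:
    """Read uptime, memory, and storage figures for one module (Linux only)."""
    with open("/proc/uptime") as f:
        values = {"uptime": parse_uptime(f.read())}
    with open("/proc/meminfo") as f:
        values.update(parse_meminfo(f.read()))
    disk = shutil.disk_usage(path)
    values["storage_usage"] = disk.used
    values["storage_total"] = disk.total
    return values
```

Calling collect_diagnostics() returns a dictionary that can be placed under a module key in the modules object.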