Linux kernel doesn't care about your disk health

We set out to add disk health monitoring to simob, the open-source agent powering the Simple Observability platform. On paper, it was a simple addition: pull some SMART data, ship it to the Simple Observability backend, and call it a day.

What looked like a relatively small task turned into a frustrating tour of thirty years of legacy storage protocols, kernel permission traps, and the realization that “modern” NVMe support is still a second-class citizen in the world of standard Linux daemons.

Wall #1: SMART is messy

A quick introduction first: SMART (Self-Monitoring, Analysis and Reporting Technology) is a health-tracking mechanism built into almost every ATA and SATA drive since the nineties. It’s a great early warning system for predicting gradual failures, but it carries three decades of legacy baggage.

In theory, SMART provides a structured table of health indicators. In practice, it’s a vendor-specific wild west. A modern drive might report anywhere from 20 to 50 attributes, but there is no rigid standard for what those attributes actually mean.

Each entry in the table gives you more than just a number; it gives you a riddle:

An ID: Heavily vendor-dependent. Vendor A might use ID XXX for a metric while Vendor B uses YYY for the exact same thing.
A normalized score: A vendor-scaled value that typically starts at 100 (some drives use 200 or 253) and decreases as the attribute degrades.
A “worst” value: The lowest the score has ever dropped.
A raw value: This is where the real pain lives. Different manufacturers interpret “raw” differently, making the data notoriously difficult to parse.

This lack of standardization is why smartmontools is the gold standard in the Unix world. It maintains a large, human-curated database of drives to map these messy values into something readable.

If you run smartctl with JSON output, you can see how much metadata it has to juggle beyond the health scores:

$ sudo smartctl -a /dev/sda --json | jq 'keys'
[
  "ata_sct_capabilities",
  "ata_smart_attributes",
  "ata_smart_data",
  "ata_smart_error_log",
  "ata_smart_selective_self_test_log",
  "ata_smart_self_test_log",
  "ata_version",
  "device",
  "firmware_version",
  "in_smartctl_database",
  "interface_speed",
  "json_format_version",
  "local_time",
  "logical_block_size",
  "model_family",
  "model_name",
  "physical_block_size",
  "power_cycle_count",
  "power_on_time",
  "rotation_rate",
  "sata_version",
  "serial_number",
  "smart_status",
  "smartctl",
  "temperature",
  "trim",
  "user_capacity",
  "wwn"
]
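
For an agent that shells out to smartctl, a minimal Python sketch might look like the following. The names read_smart_json and decode_exit_status, and the injectable runner parameter, are our own illustration and not part of simob or smartmontools. One detail worth knowing: smartctl's exit status is a bitmask (documented in its man page), so a non-zero exit code does not necessarily mean the JSON is missing.

```python
import json
import subprocess

# Meaning of each bit in smartctl's exit status, per the smartctl man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed",
    3: "SMART status reports DISK FAILING",
    4: "prefail attribute at or below threshold",
    5: "attribute was at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_exit_status(code):
    """Return the human-readable meaning of every bit set in the exit code."""
    return [text for bit, text in EXIT_BITS.items() if code & (1 << bit)]


def read_smart_json(device, runner=subprocess.run):
    """Run smartctl on `device` and return (parsed JSON, list of exit flags).

    `runner` is injectable so the function can be exercised without a disk.
    """
    proc = runner(
        ["smartctl", "-a", device, "--json"],
        capture_output=True,
        text=True,
        check=False,  # non-zero exit codes are informational bit flags
    )
    data = json.loads(proc.stdout) if proc.stdout.strip() else {}
    return data, decode_exit_status(proc.returncode)
```

Calling read_smart_json("/dev/sda") for real still needs the privileges discussed in the next section.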

Diving into the ata_smart_attributes, you get a glimpse of the mapping at work:

$ sudo smartctl -a /dev/sda --json | jq '.ata_smart_attributes.table[0]'
{
  "id": 5,
  "name": "Reallocated_Sector_Ct",
  "value": 100,
  "worst": 100,
  "thresh": 10,
  "when_failed": "",
  "flags": {
    "value": 51,
    "string": "PO--CK ",
    "prefailure": true,
    "updated_online": true,
    "performance": false,
    "error_rate": false,
    "event_count": true,
    "auto_keep": true
  },
  "raw": {
    "value": 0,
    "string": "0"
  }
}
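
The relationship between value, worst, and thresh in that entry can be sketched in a few lines of Python. The sample dictionary is trimmed from the output above; the function name and the returned field names are hypothetical. The rule applied here (an attribute is failing when its normalized value is at or below its threshold) follows the smartctl documentation, while treating a threshold of 0 as "cannot fail" is an assumption.

```python
def summarize_attributes(report):
    """Flatten smartctl's ata_smart_attributes table into simple rows."""
    rows = []
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        thresh = attr["thresh"]
        rows.append({
            "id": attr["id"],
            "name": attr["name"],
            "value": attr["value"],
            "worst": attr["worst"],
            "thresh": thresh,
            "raw": attr["raw"]["value"],
            "prefailure": attr["flags"]["prefailure"],
            # Assumption: a threshold of 0 means the attribute cannot fail.
            "failing_now": thresh > 0 and attr["value"] <= thresh,
            "failed_before": thresh > 0 and attr["worst"] <= thresh,
        })
    return rows


# Sample trimmed from the smartctl output above.
sample = {
    "ata_smart_attributes": {
        "table": [{
            "id": 5,
            "name": "Reallocated_Sector_Ct",
            "value": 100,
            "worst": 100,
            "thresh": 10,
            "flags": {"prefailure": True},
            "raw": {"value": 0, "string": "0"},
        }]
    }
}

for row in summarize_attributes(sample):
    print(row["id"], row["name"], row["value"], row["failing_now"])
```

Note that the raw value is passed through untouched: interpreting it is exactly the vendor-specific part that the smartmontools database exists for.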

Wall #2: You need root

Here is where it gets tricky. smartctl (the command-line utility from smartmontools) needs root access to function. This is a significant hurdle because simob runs as a non-privileged user (like any sane monitoring agent should).

It feels like a weird inconsistency. Why can you fetch granular disk I/O stats as a regular user, but not SMART data?

The answer lies in how the kernel mediates hardware access. For standard metrics like disk throughput or latency, the kernel acts as a middleman. It tracks these values in its own memory to manage process scheduling and resource allocation. When you ask for I/O stats, the kernel just hands you a copy of the numbers it’s already keeping, through world-readable files such as /proc/diskstats and /sys/block/<dev>/stat.
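
As an illustration of how cheap that copy is, any unprivileged process can read the per-device counters from /sys/block/<dev>/stat. Below is a hedged Python sketch; the field names are our own labels for the columns described in the kernel's block stat documentation, and parse_block_stat and read_block_stat are hypothetical helpers.

```python
from pathlib import Path

# Column order of /sys/block/<dev>/stat. Newer kernels append discard and
# flush counters after these eleven, so extra columns are simply ignored.
STAT_FIELDS = (
    "read_ios", "read_merges", "read_sectors", "read_ticks_ms",
    "write_ios", "write_merges", "write_sectors", "write_ticks_ms",
    "in_flight", "io_ticks_ms", "time_in_queue_ms",
)
SECTOR_BYTES = 512  # these counters always use 512-byte sectors


def parse_block_stat(text):
    """Turn the single line of a block device stat file into a dict."""
    values = [int(v) for v in text.split()]
    stats = dict(zip(STAT_FIELDS, values))
    stats["read_bytes"] = stats["read_sectors"] * SECTOR_BYTES
    stats["write_bytes"] = stats["write_sectors"] * SECTOR_BYTES
    return stats


def read_block_stat(device="sda"):
    """Read the counters for `device`; no special privileges are needed."""
    return parse_block_stat(Path(f"/sys/block/{device}/stat").read_text())
```

Nothing in this path touches the drive itself: the numbers come from kernel memory, which is why no elevated permissions are involved.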

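SMART attributes are different. The kernel doesn’t care about them for its day-to-day operations, so it doesn’t bother tracking them. To get that data, a tool has to talk directly to the hardware by sending pass-through commands via ioctls (SG_IO for ATA and SCSI devices, the NVMe admin-command ioctl for NVMe drives).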

Passing raw commands through the kernel to whisper directly to the disk is a privileged capability for a good reason: if you can send raw commands to the drive, you can potentially bypass file system permissions or even brick the hardware.

Wall #3: A workaround that isn’t enough

There is one potential escape hatch: udisks2.

On most modern Linux distros, udisks2 is the system daemon responsible for managing storage devices. Its primary job is mounting drives and managing partitions, but it can also query SMART data on your behalf. It runs with the elevated privileges we lack and exposes the results over D-Bus, acting as a middleman.

You start by identifying the drive object:

$ udisksctl info -b /dev/sda

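Once you have the drive’s identifier (the last component of the Drive: path in that output, written here as the placeholder <drive-id>), you can pull a high-level health report from the drive object itself:

$ udisksctl info -d <drive-id> | grep Smart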
    SmartEnabled:                               true
    SmartFailing:                               false
    SmartNumAttributesFailedInThePast:          0
    SmartNumAttributesFailing:                  0
    SmartNumBadSectors:                         0
    SmartPowerOnSeconds:                        281872800
    SmartSelftestPercentRemaining:              0
    SmartSelftestStatus:                        success
    SmartSupported:                             true
    SmartTemperature:                           315.15000000000003
    SmartUpdated:                               1775581936

It’s worth noting that udisks2 doesn’t give you the full, raw attribute table that smartctl provides. You lose some of the granular “vendor-specific” nuance.
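
The values in that report are in base units: SmartTemperature is in kelvin, SmartPowerOnSeconds is in seconds, and SmartUpdated is a Unix timestamp. A small Python sketch for turning the text output into friendlier numbers could look like this. It parses the udisksctl text format shown above, and parse_udisks_smart and summarize are hypothetical helpers; an agent would more likely read the same properties over D-Bus.

```python
def parse_udisks_smart(text):
    """Collect the Smart* properties from `udisksctl info` text output."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key.startswith("Smart"):
            props[key] = value
    return props


def summarize(props):
    """Convert base units into the units people usually expect."""
    return {
        "failing": props["SmartFailing"] == "true",
        "temperature_c": round(float(props["SmartTemperature"]) - 273.15, 1),
        "power_on_hours": int(props["SmartPowerOnSeconds"]) // 3600,
        "bad_sectors": int(props["SmartNumBadSectors"]),
    }


# Sample lines taken from the output above.
sample = """\
    SmartFailing:                               false
    SmartNumBadSectors:                         0
    SmartPowerOnSeconds:                        281872800
    SmartTemperature:                           315.15000000000003
"""

print(summarize(parse_udisks_smart(sample)))
```

For the drive above this works out to 42.0 °C and 78,298 power-on hours.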

Wall #4: NVMe breaks everything

Just when we thought udisks2 was the silver bullet, we hit the modern era. If you’re running NVMe drives, udisks2 often goes silent. No SMART data is returned.

The reason is that NVMe support was only added in udisks 2.10.0. If you are on an older version, the tool is essentially blind to the NVMe bus.
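
A monitoring agent can guard against this by checking the daemon version before expecting NVMe data. Here is a minimal Python sketch, assuming the version string has already been obtained somehow (how to fetch it is left out); parse_version and supports_nvme are hypothetical helpers. Versions have to be compared as numbers, because "2.9.4" sorts after "2.10.0" when compared as a string.

```python
import re

NVME_MIN_VERSION = (2, 10, 0)  # first udisks release with NVMe support


def parse_version(version):
    """Turn '2.10.1' into (2, 10, 1), ignoring any non-numeric suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if not match:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def supports_nvme(version):
    """True when this udisks version should expose NVMe health data."""
    return parse_version(version) >= NVME_MIN_VERSION


print(supports_nvme("2.9.4"))   # False
print(supports_nvme("2.10.1"))  # True
```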

NVMe is a completely different protocol. It ditches the messy, vendor-dependent SMART IDs in favor of a standardized health log, which the NVMe specification calls the SMART / Health Information log page.

Unlike the SATA “Wild West,” the NVMe log uses a fixed set of fields that every drive must report. There are no arbitrary “normalized scores”. You get real, raw values.

If you run the nvme-cli tool, you see a much cleaner, more logical structure:

$ sudo nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning			: 0
temperature				: 33 C (306 Kelvin)
available_spare				: 100%
available_spare_threshold		: 5%
percentage_used				: 5%
endurance group critical warning summary: 0
data_units_read				: 13 250 396
data_units_written			: 14 985 598
host_read_commands			: 82 857 028
host_write_commands			: 153 288 189
controller_busy_time			: 767
power_cycles				: 56
power_on_hours				: 17 636
unsafe_shutdowns			: 26
media_errors				: 0
num_err_log_entries			: 85
Warning Temperature Time		: 0
Critical Composite Temperature Time	: 0
Temperature Sensor 1           : 33 C (306 Kelvin)
Temperature Sensor 2           : 35 C (308 Kelvin)
Temperature Sensor 8           : 33 C (306 Kelvin)
Thermal Management T1 Trans Count	: 0
Thermal Management T2 Trans Count	: 0
Thermal Management T1 Total Time	: 0
Thermal Management T2 Total Time	: 0
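
Because the fields are fixed, interpreting them takes very little code. The Python sketch below parses the text output shown above (real output may use tabs or locale-specific digit separators) and applies two rules from the NVMe specification: a data unit is 1,000 blocks of 512 bytes, and critical_warning is a bitmask. The function names are hypothetical, and only the lower five warning bits are decoded.

```python
DATA_UNIT_BYTES = 1000 * 512  # one NVMe data unit = 1,000 x 512-byte blocks

CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "reliability degraded",
    3: "media is read-only",
    4: "volatile memory backup failed",
}


def parse_smart_log(text):
    """Extract the purely numeric fields from `nvme smart-log` text output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        digits = "".join(value.split())  # drop digit-group separators
        if sep and digits.isdigit():
            fields[key.strip()] = int(digits)
    return fields


def decode_critical_warning(value):
    """List the warnings whose bit is set in critical_warning."""
    return [t for bit, t in CRITICAL_WARNING_BITS.items() if value & (1 << bit)]


# Sample lines taken from the output above.
sample = """\
critical_warning                        : 0
data_units_read                         : 13 250 396
data_units_written                      : 14 985 598
power_on_hours                          : 17 636
"""

log = parse_smart_log(sample)
read_tb = log["data_units_read"] * DATA_UNIT_BYTES / 1e12
warnings = decode_critical_warning(log["critical_warning"])
print(f"read: {read_tb:.2f} TB, warnings: {warnings}")
```

For the drive above, 13,250,396 data units works out to about 6.78 TB read. No drive database is needed for any of this, which is the practical difference from the SATA attribute table.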

Technically, this isn’t even SMART anymore. The industry just keeps using the name because it’s “sticky” branding. In fact, smartmontools handles NVMe by formatting these health logs into a familiar structure so that existing parsers don’t break. It does the heavy lifting of translation for you.

But unless you’re on udisks 2.10.0 or later, udisks2 doesn’t perform this translation. It looks for the classic SMART attributes, finds none on the NVMe interface, and remains silent.

This journey left us with a nagging question: Why does the Linux kernel leave such a massive blind spot when it comes to disk health?

If you look at /proc/stat or /sys/class/thermal, the kernel is happy to give you near real-time data on CPU cycles, context switches, and thermal throttles. It’s understandable that the kernel needs this data to do its job, but disk health should be a first-class citizen too.

To find out if a disk is dying, we shouldn’t need a 5MB human-curated database like the one in smartmontools living in user space. If the kernel can normalize a thousand different Wi-Fi chips into a standard interface, it can certainly normalize a “Life Remaining” percentage for a block device.

Simple Observability is a platform that provides full visibility into your servers. It collects logs and metrics through a lightweight agent, supports job and cron monitoring, and exposes everything through a single web interface with centrally managed configuration.

To get started, visit simpleobservability.com.

The agent is open source and available on GitHub.
