Saturday, May 24, 2025

[Tech] What do SATA/AHCI SStatus and SControl mean?

 If you're troubleshooting a SATA drive connection in Linux and you run into errors like this:

  [14465.577496] ata2: SATA link down (SStatus 4 SControl 300)

  [14476.281867] ata2: SATA link down (SStatus 4 SControl 3F0)

you will likely have trawled the open Internet for solutions and found wonderful advice like "check your cables" and "hard drive is dying". In one case you even get the odd "you dirty boy, elsewhere in your dmesg you loaded a non-GPL module!"

None of it, however, actually attempts to understand *the error message itself*!!

Of course, this is Linux Land, where "just read the source" is the general ethos, so I did. Hopefully I can spare you the pain.

In Linus' GitHub mirror, you can find this section:

https://github.com/torvalds/linux/blob/b1427432d3b656fac71b3f42824ff4aea3c9f93b/drivers/ata/libata-core.c#L3190 

You don't have to click that, though:

    if (sata_scr_read(link, SCR_STATUS, &sstatus))
        return;
    if (sata_scr_read(link, SCR_CONTROL, &scontrol))
        return;

    if (ata_phys_link_online(link)) {
        tmp = (sstatus >> 4) & 0xf;
        ata_link_info(link, "SATA link up %s (SStatus %X SControl %X)\n",
                  sata_spd_string(tmp), sstatus, scontrol);
    } else {
        ata_link_info(link, "SATA link down (SStatus %X SControl %X)\n",
                  sstatus, scontrol);
    }

So what this message is reporting is merely the raw SStatus and SControl register values for that SATA link, printed in hex (%X), as read when this function is called. Neat.
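If you want to pull those two values out of a log without squinting, here is a minimal sketch. It assumes the line looks like the format strings above; the regex and the `parse_link_line` name are mine, not anything from the kernel.

```python
import re

# Assumption: the line follows the ata_link_info() format strings quoted
# above, e.g. "ata2: SATA link down (SStatus 4 SControl 300)".
LINK_RE = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): SATA link (?P<state>up|down)"
    r".*?\(SStatus (?P<sstatus>[0-9A-Fa-f]+) SControl (?P<scontrol>[0-9A-Fa-f]+)\)"
)


def parse_link_line(line):
    """Return the port, link state, and register values, or None if no match."""
    m = LINK_RE.search(line)
    if m is None:
        return None
    return {
        "port": m.group("port"),
        "up": m.group("state") == "up",
        # The kernel prints both registers with %X, so parse them as hex.
        "sstatus": int(m.group("sstatus"), 16),
        "scontrol": int(m.group("scontrol"), 16),
    }
```

The one thing to remember is that "300" in the log is 0x300, not three hundred.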

So, what are these registers? Well:

https://github.com/torvalds/linux/blob/b1427432d3b656fac71b3f42824ff4aea3c9f93b/drivers/ata/ahci.h#L124

     PORT_SCR_STAT        = 0x28, /* SATA phy register: SStatus */
     PORT_SCR_CTL        = 0x2c, /* SATA phy register: SControl */

AHCI exposes these SATA status and control registers (SCRs) as per-port registers. From Intel's AHCI spec (thanks, Wikipedia!):

3.3.11 Offset 2Ch: PxSCTL – Port x Serial ATA Control (SCR2: SControl)

3.3.10 Offset 28h: PxSSTS – Port x Serial ATA Status (SCR0: SStatus)

In my case, SStatus of 4 means the device detection (DET) field in bits 3:0 reads:

4h Phy in offline mode as a result of the interface being disabled or running in a BIST loopback mode

So, the physical connection part of the SATA controller is off for some reason.

Also, for SControl: 

300h: no detection or init requested, no speed negotiation restrictions, Partial and Slumber sleep disabled

3F0h: no detection or init requested, RESERVED, Partial and Slumber sleep disabled
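The decoding above can be sketched in a few lines. This assumes the field layout from the AHCI spec (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8 for both registers); the lookup tables are my own paraphrase of the spec's value lists and are not exhaustive, and the names `split_scr`, `describe`, `SSTATUS`, and `SCONTROL` are made up for this post.

```python
# Assumed layout for both SStatus and SControl:
# DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.
def split_scr(value):
    return {
        "det": value & 0xF,
        "spd": (value >> 4) & 0xF,
        "ipm": (value >> 8) & 0xF,
    }


# Paraphrased from the AHCI spec; anything not listed is treated as reserved.
SSTATUS = {
    "det": {
        0x0: "no device detected, Phy communication not established",
        0x1: "device detected, Phy communication not established",
        0x3: "device detected, Phy communication established",
        0x4: "Phy offline (interface disabled or BIST loopback)",
    },
    "spd": {0x0: "no device or no communication", 0x1: "Gen 1", 0x2: "Gen 2", 0x3: "Gen 3"},
    "ipm": {0x0: "no device or no communication", 0x1: "active", 0x2: "Partial", 0x6: "Slumber"},
}

SCONTROL = {
    "det": {
        0x0: "no detection or init requested",
        0x1: "interface init (COMRESET) requested",
        0x4: "interface disabled, Phy offline",
    },
    "spd": {0x0: "no speed restrictions", 0x1: "limit to Gen 1", 0x2: "limit to Gen 2", 0x3: "limit to Gen 3"},
    "ipm": {
        0x0: "no restrictions",
        0x1: "Partial disabled",
        0x2: "Slumber disabled",
        0x3: "Partial and Slumber disabled",
    },
}


def describe(value, tables):
    """Map each field of a register value to a human-readable string."""
    return {
        name: tables[name].get(code, "reserved or not listed here (%Xh)" % code)
        for name, code in split_scr(value).items()
    }
```

Calling `describe(0x4, SSTATUS)` and `describe(0x3F0, SCONTROL)` reproduces the readings above, with the Fh speed field falling through to the "reserved" catch-all.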

Unfortunately, this message doesn't also print SCR1 (SError), so it's hard to say what actually went wrong.

Saturday, April 12, 2025

A Letter

 Dear Doctor,

 It would seem that I'm here for the same thing I was here for a decade ago, when you said it would be fine, that it's very manageable, just do these things and don't worry.

Those things didn't work. Well, they did work! But I didn't do them consistently, so my body didn't learn and grow, so they didn't have the desired effect.

I know this is frustrating for you. You have all of the research and training of your profession available to you, pointing to the pattern of healing that best fits my circumstance. With such a clear path, you still watch me come back asking for help after not really taking it.

Take a deep breath. Feel the frustration, the rage, the impotence. Let it flow. Let it pass.

Then walk with me.

On this path we should be out of shelling range. Do you see these trenches? The gnawed, chewed, desolate no-man's land? The emplacements? Their complements on the far side? Here, you can borrow my binoculars.

The line hasn't moved in ten years, yet we fight. Wave after wave of all the little packets of executive function I can muster, mowed down day after day. These, my men, trained, ready, willing, enough to hold the line but never enough to move it. A field of corpses of days, weeks, months, years past, but no progress, just piles.

Walk with me again.

It's hot here. The jungle thrives on the heat and the moisture. They say that our objective is that hill, just over there.

As we walk, notice the tenuous grasp we have. Forest and plains gear rots in the jungle's stew, so we improvise everything. Every foot of paved, level, or open ground cost so, so much. Lives, opportunities, truncated futures relegated to never being, entire barges of materiel.

Up here is a trail into the jungle towards the hill. Note the ruts, the gravel, the slashes and paint on the trees. Out here it seems like barely a walking trail, yet if you stub your toe don't look down. The sunken truck shoring up this square foot mocks you with its empty, dead eyes, only a bit of fender and bumper and cowling reminding you that there are no masters in the swamp, only the swamp and what the swamp decides to swallow. I think the driver escaped this one. Maybe. Don't dig.

The missions into the swamp sometimes come back with interesting stories and close calls or even strange discoveries. The others find unmarked graves to record their progress...and their inventory.

Yet X marks the spot, and as we can, we try. Occasionally the jungle provides, but usually it sides with the swamp.

So as you cradle your grief that a patient didn't follow the golden path to healing, please find it in your heart to cry for me too. Only recently did we figure out machetes that don't blunt after two strokes, and the promise of air support whispers in the wind. There may even be peace one day. One day the golden path will be clear.

It is not clear today.

Sincerely,

Your Patient

Sunday, February 9, 2025

[Tech] ESPHome, Raspberry Pi, PlatformIO, and "sh: 1: xtensa-esp32s3-elf-g++: not found"

If you have bumped into this locked thread:

https://github.com/esphome/issues/issues/3904

You may be facing NEITHER a PlatformIO bug NOR an ESPHome bug. IF you are on a non-64-bit Raspberry Pi 2 (so, most of them, I think, but some were 64-bit-capable), it might just be that the toolchain was linked incorrectly for Ubuntu (and probably all Debian-based distros). The easy way to tell is:

$ uname -a
Linux pi 5.15.0-1071-raspi #74-Ubuntu SMP PREEMPT Fri Jan 17 12:09:29 UTC 2025 armv7l armv7l armv7l GNU/Linux

If you see 'aarch64' or you aren't on Ubuntu 22.04, then the thread is 100% a better source of info than this post.

Of interest is the first error in this post:

https://github.com/esphome/issues/issues/3904#issuecomment-1552547642

$ ./.esphome/platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-g++
-bash: ./.esphome/platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-g++: No such file or directory

We can elaborate with strace:

$ strace ./xtensa-esp32s3-elf-g++
execve("./xtensa-esp32s3-elf-g++", ["./xtensa-esp32s3-elf-g++"], 0xbedd91a0 /* 27 vars */) = -1 ENOENT (No such file or directory)
strace: exec: No such file or directory
+++ exited with 1 +++

See, the ENOENT comes from execve() itself, so very, very early in process start, even though the file plainly exists. In my experience (please don't ask), this means the dynamic loader that the binary asks for is missing, which, indeed, it is:

$ file xtensa-esp32s3-elf-g++
xtensa-esp32s3-elf-g++: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=3b97cade4ee2d3b55df21b3dd333eb95dd42f5dd, stripped

$ file /lib/ld-linux.so.3
/lib/ld-linux.so.3: cannot open `/lib/ld-linux.so.3' (No such file or directory)
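If you'd rather not eyeball `file` output, the same check can be sketched in Python: read the interpreter path from the binary's PT_INTERP program header and see whether it exists. This is a rough sketch that assumes a well-formed ELF file; the function names are mine.

```python
import os
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def elf_interpreter(path):
    """Return the interpreter path an ELF binary requests, or None if static."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is_64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if is_64:
            # Elf64_Phdr starts: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
            p_type, _, p_offset, _, _, p_filesz = struct.unpack_from(
                endian + "IIQQQQ", data, off)
        else:
            # Elf32_Phdr starts: p_type, p_offset, p_vaddr, p_paddr, p_filesz
            p_type, p_offset, _, _, p_filesz = struct.unpack_from(
                endian + "IIIII", data, off)
        if p_type == PT_INTERP:
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None


def interpreter_present(path):
    """True if the binary is static or its requested loader exists on disk."""
    interp = elf_interpreter(path)
    return interp is None or os.path.exists(interp)
```

On the Pi above, I'd expect `elf_interpreter("xtensa-esp32s3-elf-g++")` to come back with the same /lib/ld-linux.so.3 that `file` reported, and `interpreter_present` to say no.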

I was worried that the EABI5 and armv7l mismatch was going to cause problems, but no, as mentioned here:

https://github.com/esphome/issues/issues/3904#issuecomment-1554071496 

it's just a matter of letting the kernel find the dynamic loader where the ELF binary says it should be:

$ sudo ln -s /lib/arm-linux-gnueabihf/ld-linux.so.3 /lib/ld-linux.so.3

In my case that was actually:

$ sudo ln -s /lib/arm-linux-gnueabihf/ld-linux.so.3 /lib/ld-linux.so.3 

So, maybe it could be considered a PlatformIO bug that it pulls an Espressif toolchain with bad dynamic linking configuration, an Espressif toolchain bug that their toolchain was not available linked for Ubuntu, or a Linux bug that dynamically linked binaries are a kludge. Either way, acidic and vague as they are on that thread, [ssieb] is right that this is NOT an ESPHome bug.