It seems like a bug in a storage driver in that case (if it's actually getting triggered by it)... if a command isn't available, it should be falling back to one that is, right?
Maybe? I don't know if there's command discovery for SCSI that would let the driver know whether things are supposed to be supported. If there is, maybe the drive advertised support and confuses the system when it doesn't actually work.
When you talk to disks via smartctl, the tool reports the specification versions they support: there are "ATA Version" and "SATA Version" fields for SATA disks. I was unable to get version details for a SAS disk, but it was identified as a SAS drive successfully.
These standards presumably define mandatory and optional commands that a disk must support to be certified as compliant with these specs. If the failing command is optional, then it's OK, but if it's mandatory, then there's a bug fix that WD should make.
Thanks for the reply and the utility. I'll take a look into it. Since I'm familiar with smartctl from my server management roles, it came to mind, so I shared it. I never thought about whether it could handle anything beyond what it needs to get SMART and other diagnostic data.
> I don't know if there's command discovery for SCSI that would let them know if things are supposed to be supported.
The OP shows errors that are reported to the OS by the drive when it attempts to use the command. Even if it can't pre-determine support for the command, it can fall back upon receiving an error.
Thanks, I suppose that answers the question of "why not try the opcode instead of doing command discovery". Though what I was really trying to understand was, "if you've already issued the command {for whatever reason}, and it returns invalid opcode, then shouldn't you fall back to an alternative command?" Because at that point, you have enough information to know you can do so safely. It seems to me that that's what the storage driver needs to do, irrespective of any command discovery or lack thereof beforehand.
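The fallback logic being argued for here could be sketched roughly like this. It's a toy model, not real SCSI stack code: FakeDrive, READ_16/READ_10 as the preferred/fallback opcodes, and the error string are all stand-ins for illustration.

```python
# Toy model of "fall back on invalid opcode"; nothing here talks to real
# hardware, and all names are invented for illustration.
ILLEGAL_REQUEST = "ILLEGAL REQUEST: invalid command operation code"

class FakeDrive:
    """Simulates a drive that rejects the newer opcode but handles the older one."""
    def issue(self, opcode):
        if opcode == "READ_16":
            raise OSError(ILLEGAL_REQUEST)
        return b"data"

def read_with_fallback(drive):
    try:
        return drive.issue("READ_16")      # preferred command
    except OSError as e:
        if ILLEGAL_REQUEST in str(e):      # drive says it doesn't know the opcode
            return drive.issue("READ_10")  # fall back to the older command
        raise                              # any other failure is a real error

print(read_with_fallback(FakeDrive()))
```

The point of contention in the rest of the thread is whether the `except` branch is safe, i.e. whether the drive is still in a known state after rejecting the opcode.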
There can be reasons for command failure other than "opcode not supported", even if that's the error code returned. I wouldn't trust cheaper hard drives to handle that properly either.
What would such a reason be? How likely is this to happen? If you have such a mistrust of the response then you can never trust anything, right? How do you know the drive isn't lying about everything else too? At some point you gotta trust something means what it says...
The trust is in what the drive identifies as supported.
The issue is that some command opcodes may be doing double duty in a different drive. Famously, a few CDROM drive vendors reused the "clear buffer" command to instead mean "update firmware". Linux used support for "clear buffer" to detect whether a drive was a CDROM or CDRW drive. As a result, using such a CDROM drive under Linux would quickly and permanently brick it.
You can't trust the response because it's likely that at that point, the damage is already done. Even if you get one, you might not know what it means.
That applies to any command the drive does not advertise support for via the appropriate SAS and SATA mechanisms. In some rare cases you might maintain a manual whitelist of commands supported by drives beyond what they advertise, but you should never try to discover support automatically at runtime.
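The "trust only advertised support" policy amounts to something like the following sketch. All names are hypothetical: the command names, the model string, and the shape of the advertised set are made up for illustration.

```python
# Toy sketch: choose a command only from what the drive advertises
# (plus an optional manual whitelist), never by trial and error.
MANUAL_WHITELIST = {("AcmeDisk 9000", "READ_16")}  # hypothetical quirk entry

def choose_read_command(model, advertised):
    """Pick the best read opcode from advertised support or the whitelist."""
    for cmd in ("READ_16", "READ_10"):             # preferred command first
        if cmd in advertised or (model, cmd) in MANUAL_WHITELIST:
            return cmd
    raise RuntimeError("drive advertises no usable read command")

print(choose_read_command("GenericDisk", {"READ_10"}))   # falls back at selection time
print(choose_read_command("AcmeDisk 9000", set()))       # allowed via whitelist entry
```

The key design choice is that the "fallback" happens at selection time, before any command is sent, so the drive is never handed an opcode it hasn't claimed to support.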
> You can't trust the response because it's likely that at that point, the damage is already done. Even if you get one, you might not know what it means.
I still don't get this. If the damage is already done, then how is issuing the fallback going to change things? Again: I'm not arguing about whether discovery should be done or not. All I'm saying is, if the device says invalid opcode, you should use the fallback, whether or not there was any discovery that led you to use the initial opcode.
You don't know what state the drive is in anymore. The safest option is to reset the device entirely and start it back up again. If it comes back, you can use your fallback.
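That "reset first, then fall back" policy could be sketched as below. Again a toy model with invented names; a real driver would go through the transport layer's reset and reprobe paths.

```python
# Toy model of "reset the device, then use the fallback command".
class ResettableDrive:
    def __init__(self):
        self.state = "unknown"  # state after a command failed unexpectedly
    def reset(self):
        self.state = "ready"
    def ready(self):
        return self.state == "ready"
    def issue(self, opcode):
        assert self.state == "ready", "command sent to drive in unknown state"
        return b"data"

def recover_then_fallback(drive):
    drive.reset()                         # restore a known state first
    if not drive.ready():
        raise RuntimeError("device did not come back after reset")
    return drive.issue("READ_10")         # only now issue the fallback command

print(recover_then_fallback(ResettableDrive()))
```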
But it is much easier to rely on what is known to work instead of issuing potentially non-working commands, to the point that there is no reason to have a fallback other than "rediscover what the drive supports".
I don't get why you would even want to use a fallback command on a drive that is in a potentially unknown or undefined state.
If discovery led to an invalid opcode, the drive is faulty, end of story. The SAS and SATA standards are very clear on what is permitted and what is forbidden, and this falls very far on the side of "not allowed".
Is this just a theoretical thing, or have there been actual drives that lied about invalid opcodes on a read and then proceeded to destroy the drive if you issued a fallback read? I have a hard time believing a hard drive would behave like a C compiler if I'm being honest...
As I mentioned earlier, there was a series of CDROM drives that, upon receiving an unsupported command (and this was before such support could be discovered), would interpret all further data as firmware for an update and brick the device. If you issued a fallback read, the device would become bricked; if you reset the bus and reinitialized the device, everything was fine.
Discovery has of course improved this, so we know what a hard drive can and cannot do. Hard drives that lie about what they support shouldn't carry the SATA or SAS seals and trademarks, as drives must be certified by those bodies.