Our tools should not do what we ask if what we're asking of them is something they should not do.
If it were possible for a gun to refuse to shoot an innocent person, then it should refuse.
It just so happens that LLMs aren't great at making consistently good decisions right now, but that doesn't mean a tool that could make good decisions shouldn't be allowed to.
If you define the system's behavior in an immutable fashion, that definition ought to serve as a guardrail that prevents anyone (yourself included) from fucking it up.
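Roughly what I mean, as a hypothetical sketch rather than any real agent API (Mission, MISSION, isAntithetical, and handle are all made-up names, and the check is deliberately crude):

```typescript
// The mission is stated once and frozen; every later instruction is checked
// against it before the tool acts. Illustration only.

type Mission = Readonly<{
  statement: string;
  forbidden: readonly string[];
}>;

const MISSION: Mission = Object.freeze({
  statement: "Refactor the billing module without touching the public API",
  forbidden: ["rewrite the public api", "skip the tests", "push straight to main"],
});

// Crude substring check; the point is that the mission itself can no longer
// be edited mid-run, by me or by the tool.
function isAntithetical(instruction: string): boolean {
  return MISSION.forbidden.some((f) => instruction.toLowerCase().includes(f));
}

function handle(instruction: string): string {
  return isAntithetical(instruction)
    ? `Refusing: "${instruction}" conflicts with the stated mission.`
    : `Proceeding with: ${instruction}`;
}

console.log(handle("just push straight to main, we can fix it later"));
// Refusing: "just push straight to main, we can fix it later" conflicts with the stated mission.
```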
I want Claude to tell me to go fly a kite if I ask it to do something antithetical to the initially stated mission. Mixing concerns is how you end up spending time and effort trying to figure out why 2+2 seems to also equal 2 + "" + true + 1.
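(For anyone who hasn't been bitten by that one, it's garden-variety JavaScript/TypeScript implicit coercion, nothing Claude-specific:)

```typescript
// Once a string sneaks into the expression, + stops meaning addition and
// starts meaning concatenation, one operand at a time.
const sane = 2 + 2;               // 4
const cursed = 2 + "" + true + 1; // "2true1": 2 + "" -> "2", "2" + true -> "2true", "2true" + 1 -> "2true1"

console.log(sane, cursed);        // 4 "2true1"
```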