every system that can access plain text can access binary as well.
True, but it should take fewer cycles and fewer amp-hours for an x86 or ARM system to translate a textual intermediate representation into its own native instructions than to translate an x86-64 binary into an intermediate representation and then into x86(-32) or ARM instructions.
Binary doesn't necessarily mean instruction code for any particular processor. Is this why people insist on text so much, because they've come to equate the word "binary" with "compiled executable"?
You certainly wouldn't want it to be x86 code anyway, as modern operating systems lack the security measures that would make running untrusted x86 code a good idea. You'd want a pseudo-bytecode which simply indicates operations to perform, and you'd translate that into your native machine code or just run an interpreter on it. The pseudo-bytecode would be designed such that it cannot even represent operations you don't want your untrusted code to perform, so, as long as your programmers aren't completely inept, converting it to native code and running that would be relatively safe.
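To make that concrete, here's a minimal sketch in C of the sort of thing I mean; the opcodes and encoding are invented for illustration, not taken from any real format. Because the instruction set has no loads, stores, arbitrary jumps, or syscalls, unsafe operations simply can't be expressed, so there's nothing to sandbox after the fact:

    /* A tiny stack machine. Unsafe operations aren't forbidden at
     * runtime -- they're unrepresentable in the opcode set. */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    void run(const uint8_t *code, size_t len)
    {
        int32_t stack[64];
        size_t sp = 0, pc = 0;

        while (pc < len) {
            switch (code[pc++]) {
            case OP_PUSH: /* next byte is an immediate value */
                if (pc >= len || sp >= 64) return;
                stack[sp++] = code[pc++];
                break;
            case OP_ADD:
                if (sp < 2) return;
                stack[sp - 2] += stack[sp - 1]; sp--;
                break;
            case OP_MUL:
                if (sp < 2) return;
                stack[sp - 2] *= stack[sp - 1]; sp--;
                break;
            case OP_PRINT:
                if (sp < 1) return;
                printf("%d\n", stack[--sp]);
                break;
            case OP_HALT:
                return;
            default: /* unknown opcode: reject, don't guess */
                return;
            }
        }
    }

    int main(void)
    {
        /* (2 + 3) * 4, then print */
        const uint8_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                                 OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
        run(prog, sizeof prog);
        return 0;
    }

A real design would obviously need more than arithmetic, but the principle holds: whatever the opcode set can't say, untrusted code can't do.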
Also, compiler-enforced memory safety seems like an accident waiting to happen.
How is memory safety enforced in a compiler more of "an accident waiting to happen" than the same thing enforced in hardware, which is the source of the terms "segfault" and "general protection fault"? Other than that Oracle has had a poorer track record than AMD and Intel, that is.
Well, first of all, have you ever known your CPU to fail to trigger a segmentation fault when a program performs an invalid access? No; in practice it's 100% effective at that.
What we're concerned about is not detected accidents, but undetected ones. Like when someone finds some way to write a statement that performs something which isn't allowed but which the compiler fails to notice due to the unusual syntax of the statement. In binary form there are only so many ways you can describe an operation. In text there are far more, because there are generally no rules about whitespace, variable name length, the number of digits in your numbers, and many other things. So by using plain text you greatly increase the number of opportunities you have to screw up.

That's just fine if you're one of those programmers who believes you'll never accidentally code a buffer overflow into your software, but if you're realistic and realize you're only human, then you realize the wise thing to do is to design a system that's as simple as it can be, so that you give yourself as few opportunities to fuck it up as possible.
I mean, technically you can drive your car from the back seat just by sticking your ten year old in the driver's seat and telling them when to turn the wheel, and with some luck you may even be able to do this rather well, but it's a scheme that's more likely to result in disaster than simply sitting behind the wheel yourself, and so you just don't do it that way.
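Back to the text-versus-binary point: here's a toy C program showing how many different spellings a text parser has to treat as the same value, versus a fixed-width binary field that has exactly one encoding. strtod is just a stand-in for whatever decoder you'd actually use, and the 4-byte read assumes a little-endian host:

    /* All of these textual spellings decode to the same value, 42,
     * and a text parser has to get every one of them (and countless
     * others) right. A fixed 4-byte field has exactly one encoding
     * and one code path. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *spellings[] = { "42", "  42", "0042", "+42", "4.2e1" };
        for (size_t i = 0; i < sizeof spellings / sizeof *spellings; i++)
            printf("\"%s\" -> %d\n", spellings[i],
                   (int)strtod(spellings[i], NULL));

        /* The binary version: one representation, one bounds check. */
        const uint8_t bytes[4] = { 42, 0, 0, 0 };
        int32_t value;
        memcpy(&value, bytes, 4);    /* assumes little-endian host */
        printf("binary -> %d\n", value);
        return 0;
    }

Every one of those textual variants is another path through the parser, and every path is another chance to get it wrong.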
so does any processing of plain text, as nothing overflows buffers more easily than data which you have to accept no matter how ridiculous its length may be.
How is this any less true of images, audio, video, or any other content type on the World Wide Web?
It's not. One particular browser exploit I remember involved "width" and "height" attributes on images with values prefixed by 64k of zeros, which I think demonstrates my point rather well. We've been parsing text since the beginning of computing and it's still something we regularly fuck up. We should avoid doing it whenever we don't have to, and when we do have to, we should do it on a machine under the control of the person who supplies the text, since if they hack themselves no one gives a shit.
As for audio and video, they of course include some variable-length buffers, but it's kind of unavoidable to have a few if you want to do anything. The problem with plain text is that you end up with way more than a few, since every single instruction that might otherwise just read a 4-byte float turns into hundreds of instructions working to decode a series of bytes, and that code has to function correctly no matter how long the series of bytes is and no matter whether it contains errors.

And it's not just that. You start with a buffer of the entire text, which may be any length. Then you split it into lines, which also may be any length. Then you follow any length of whitespace until you find some sort of function name or whatever, which again might be of any length, then maybe some numbers, which may be of any length, etc. Plain text isn't just one buffer of data you have to avoid overflowing: you end up with hundreds of them as you write function after function to split one item of unknown length into other items of unknown length. Then you have to parse each of these things using one of many different functions (each another potential source of error) depending on what type of data you expect it to be, and each of them has to work correctly even if the series of bytes it's given turns out to include some special case the programmer wasn't able to anticipate, or some case that isn't even valid but which the programmer didn't think to look out for.
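For what it's worth, here's roughly what the binary alternative I keep describing looks like in C. The record layout is invented for illustration, and it assumes IEEE 754 floats and that writer and reader agree on endianness. One length check, one copy, and no unknown-length intermediate buffers anywhere:

    /* Pulling a 4-byte float out of a binary record at a fixed
     * offset: a single bounds check replaces an entire parser. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Returns 0 on success, -1 if the record is too short. */
    int read_f32(const uint8_t *buf, size_t len, size_t off, float *out)
    {
        if (len < off || len - off < 4)   /* the only check needed */
            return -1;
        memcpy(out, buf + off, 4);  /* assumes IEEE 754, same endianness */
        return 0;
    }

    int main(void)
    {
        uint8_t record[8] = { 0 };
        float pi = 3.14159f, got;

        memcpy(record + 4, &pi, 4);       /* field at a fixed offset */
        if (read_f32(record, sizeof record, 4, &got) == 0)
            printf("%f\n", got);
        return 0;
    }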