Every time the systemd thing comes up, I want to hate it, but I don't truly
know enough about it to actually hold a defensible opinion.
One of the charges constantly levelled against systemd is its propensity to corrupt its own system logs, and that the official response to this defect is to ignore it. The uselessd page has a link to the bug report in question, which was reported in May 2013 and, over a year later, closed and marked NOTABUG. However, it seems Mr. Poettering is getting annoyed by people using his own bug reports against him, and today he added a comment to the bug report purporting to clarify his position.
Unfortunately, his "clarifications" serve only to reinforce my suspicion that systemd is a thing to be avoided. To wit:
Since this bugyilla [sic] report is apparently sometimes linked these days as an example how we wouldn't fix a major bug in systemd:
Well, yeah, corrupt logs would be regarded by many as a major bug...
...Now, our strategy to rotate-on-corruption is the safest thing we can do, as we make sure that the internal corruption is frozen in time, and not attempted to be "fixed" by a tool, that might end up making things worse. After all, in the case the often-run writing code really fucks something up, then it is not necessarily a good idea to try to make it better by running a tool on it that tries to fix it up again, a tool that is necessarily a lot more complex, and also less tested.
Okay, so freeze the corrupted data set so things don't get worse, and start a new data set. A reasonable defensive practice. You still haven't addressed how the corruption happened, or how to fix it.
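To make sure I understand the strategy, here's a rough sketch of what "rotate on corruption" seems to amount to -- a hypothetical illustration in plain C, not systemd's actual code; the file naming and the detection step are my own assumptions:

```c
/* Hypothetical sketch of "rotate on corruption": when the writer decides a
 * log file is damaged, it renames the file aside untouched (freezing the
 * corruption in time) and opens a fresh file for all future writes.
 * Illustration only -- this is not systemd's implementation. */
#include <stdio.h>
#include <time.h>

FILE *rotate_on_corruption(const char *path)
{
    char frozen[4096];

    /* Freeze the damaged file under a new name; never rewrite its contents. */
    snprintf(frozen, sizeof(frozen), "%s.corrupted-%ld", path, (long)time(NULL));
    if (rename(path, frozen) != 0) {
        perror("rename");
        return NULL;
    }

    /* Start a brand-new log file and carry on writing. */
    return fopen(path, "a");
}
```

Note that nothing in this scheme repairs anything; the damaged file just gets a new name.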
Now, of course, having corrupted files isn't great, and we should make sure the files even when corrupted stay as accessible as possible. Hence: the code that reads the journal files is actually written in a way that tries to make the best of corrupted files, and tries to read of them as much as possible, with the the subset of the file that is still valid. We do this implicitly on every access.
Okay, so journalctl tries to be robust, assumes the journal data might be crap, and works around it. So we can assume journalctl is probably pretty solid and won't make things worse.
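For concreteness, a reader built on libsystemd's sd-journal API looks roughly like the sketch below (minimal, with error handling abbreviated). The claim, as I read it, is that any tolerance for corruption lives inside this read path, invisibly to the caller:

```c
/* Minimal journal reader using the sd-journal API from libsystemd.
 * Build, roughly: gcc reader.c $(pkg-config --cflags --libs libsystemd)
 * The caller just iterates whatever entries the library can still parse;
 * per the bug report, skipping damaged regions happens inside the library. */
#include <stdio.h>
#include <systemd/sd-journal.h>

int main(void)
{
    sd_journal *j;
    const void *data;
    size_t len;

    if (sd_journal_open(&j, SD_JOURNAL_LOCAL_ONLY) < 0) {
        fprintf(stderr, "failed to open journal\n");
        return 1;
    }

    /* Iterate every readable entry and print its MESSAGE field
     * (returned as the raw "MESSAGE=..." pair). */
    SD_JOURNAL_FOREACH(j) {
        if (sd_journal_get_data(j, "MESSAGE", &data, &len) >= 0)
            printf("%.*s\n", (int)len, (const char *)data);
    }

    sd_journal_close(j);
    return 0;
}
```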
Hence: journalctl implicitly does on read what a theoretical journal file fsck tool would do, but without actually making this persistent. This logic also has a major benefit: as our reader gets better and learns to deal with more types of corruptions you immediately benefit of it, even for old files!
....Uhhhhh-huh. So, yeah, newer tools will do a better job of working around the corruption, and we'll be able to recover more data, assuming we kept known-corrupt logs around. But what I still don't understand is WHY THE LOGS ARE CORRUPT. And why aren't there log diagnostic and analysis tools? If you already know your logs can turn to crap, surely there should be structure-analysis tools that let you pick through the debris and recover the data that your automated heuristics can't.
And why do I get the feeling that the implication of all this is, "You don't need to know the log structure or how to repair it. We'll write the tools for that, and we'll release better ones when we get around to it"?
File systems such as ext4 have an fsck tool since they don't have the luxury to just rotate the fs away and fix the structure on read: they have to use the same file system for all future writes, and they thus need to try hard to make the existing data workable again.
....AAAAnd you lost me. Seriously, this is your defense: "Filesystems are more important than system logs, so they have to try harder?" I find this insinuation... surprising. You do realize that btrfs didn't become worthy of general use overnight, right? (Some might argue it still hasn't.) It took years of development, and hundreds of people risking corrupt or destroyed filesystems before the kinks got worked out, and the risk of lost or corrupt files approached zero. More significantly, during this long development time, no one ever once suggested making btrfs the default filesystem for Linux. People knew btrfs could ruin their shit. No one ever suggested, "Oh, well, keep a copy of the corrupt block image and format a new one; we'll release better read tools Real Soon Now." No one suggested putting btrfs into everyday use until it proved its reliability.
Likewise, until systemd can demonstrate, with the same level of reliability as common filesystems, that it doesn't trash data, it is experimental -- an interesting experiment with interesting ideas and some promise, but still an experiment. I would appreciate it if you didn't experiment on my machines, thankyouverynice.
I hope this explains the rationale here a bit more.
No, sir. No it does not.
P.S.: Is there any evidence to suggest that systemd's log-corruption issues have since been solved?