Comment Re: Why is this not easy? (Score 1) 22
Think of something like Linux. When you boot it up, it prints a banner containing the version, a build timestamp, and who built the kernel (Linux 6.7.1, built on such-and-such a date by so-and-so). That timestamp is handy during development: the version number might stay the same between builds, but the timestamp gives you a rough idea of which build you're running and what changes it might contain. It's also murder on reproducible builds.
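The usual fix for embedded timestamps is the SOURCE_DATE_EPOCH convention from the reproducible-builds effort: if that environment variable is set, the build embeds that time instead of the wall clock. A minimal sketch (the function name is made up for illustration):

```python
import os
import time

def build_timestamp() -> int:
    """Timestamp to embed in a build banner.

    When SOURCE_DATE_EPOCH is set, every rebuild of the same source
    embeds the same timestamp; otherwise fall back to the current
    (non-reproducible) wall-clock time.
    """
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return int(epoch)
    return int(time.time())  # non-reproducible fallback

# A reproducible build pipeline would export this once, e.g. from the
# date of the latest commit, so every builder gets the same banner.
os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
print(build_timestamp())
```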
Another one is parallel builds. The build server may have 20+ cores to make builds speedy, so you build Linux using all 20 cores. But which cores get which source files is essentially random, and if you have a mix of Intel performance and efficiency cores, files compiled on the E cores take longer to finish. That can change the link order of the kernel objects, and changing the link order means objects can be laid out differently in the final binary, along with all the branch and jump addresses. It's especially tricky since the kernel partially links its objects along the way.
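A toy model of why this matters, and the standard fix: a linker's output depends on the order of its inputs, so sorting the object list before linking makes the order independent of which compile job finished first. (The "linker" here is just a hash over its inputs, purely for illustration.)

```python
import hashlib

def link(objects):
    """Toy 'linker': the output depends on input order, just as real
    link order affects symbol layout and jump addresses."""
    blob = b"".join(name.encode() for name in objects)
    return hashlib.sha256(blob).hexdigest()

# Two parallel builds whose compile jobs finish in different orders...
run_a = ["sched.o", "fork.o", "exit.o"]
run_b = ["exit.o", "sched.o", "fork.o"]
print(link(run_a) == link(run_b))                    # False: not reproducible

# ...but sorting the object list before linking restores determinism.
print(link(sorted(run_a)) == link(sorted(run_b)))    # True
```

Real build systems do the equivalent: they link in the order objects are listed in the makefile, not the order compilation happened to finish.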
The fun one is filesystem layouts. When you traverse a directory, you often get files back in the order they were created. If your build system does parallel builds, the order files get created is no longer deterministic, because as the code is built, outputs land in the output directories in whatever order they finish. When the filesystem image is built, it's often done by pointing a tool at a directory: the tool enters the directory and traverses it, adding files and directories to the image as it encounters them. (Most tools use calls like opendir() and readdir() to walk the tree, so the order they add files to the final image depends on the order the files were created in the directory.) If the build system creates files in a non-deterministic order, the image comes out different every build, and compression and encryption compound the problem. This is the hardest one to solve, but it can be done if the tool sorts the entries alphabetically before processing, ensuring files are added in a deterministic order.
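Here's a sketch of that fix using a tar image as the stand-in for a filesystem image: sort the directory entries instead of taking readdir() order, and clamp the timestamps while you're at it. Two trees whose files were created in different orders then pack to byte-identical images.

```python
import io
import os
import tarfile
import tempfile

def pack(tree):
    """Build a tar image of `tree`, sorting entries so the result does
    not depend on directory creation (readdir) order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for root, dirs, files in os.walk(tree):
            dirs.sort()                      # deterministic descent order
            for name in sorted(files):       # deterministic file order
                path = os.path.join(root, name)
                info = tar.gettarinfo(path, arcname=os.path.relpath(path, tree))
                info.mtime = 0               # clamp timestamps too
                with open(path, "rb") as fh:
                    tar.addfile(info, fh)
    return buf.getvalue()

def make_tree(order):
    """Create a directory whose files were created in the given order."""
    tree = tempfile.mkdtemp()
    for name in order:
        with open(os.path.join(tree, name), "w") as fh:
            fh.write("data")
    return tree

t1 = make_tree(["b.txt", "a.txt"])
t2 = make_tree(["a.txt", "b.txt"])
print(pack(t1) == pack(t2))   # identical images despite creation order
```

Real image tools (mksquashfs, mkfs tools, tar's --sort=name) grew equivalent options for exactly this reason.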
That's usually the biggest source of non-determinism, especially since parallel builds cause it: even if you start with exactly the same files, the final arrangement on disk can vary. If you're unpacking 20 tarballs in parallel, it just creates chaos.
Of course, the easiest way out is to make the build deterministic by only allowing one build step to proceed at a time, but then what takes 20 minutes to build now takes hours. And even then you might get some randomization, because the disk cache might cause one file to be written ahead of another.