I've been on the "NTP Hackers" mailing list for ~15 years now, my last major effort was to develop a server-optimized multi-threaded version of the core ntpd sw: I was hoping for wire speed packet processing on an embedded linux platform, but had to settle for 3-500 Mbit/s since the target kernel version did not support multi-thread targeting of incoming packets, i.e. I needed to have a single receive thread which would fetch the incoming packets, timestamp them and queue them up, then all the other threads/cores would grab them from there.
Back to the "why are there bugs in such a trivial protocol?" question:
By far the biggest cause of required effort when trying to modify or optimize the NTPD distribution is the need to support a big number of OSs and even larger number of OS versions, some of them more than 20 years old, even if the main targets are Unix-like or Windows.
The second problem is the need to support 30+ reference clocks, with all sorts of OS/version specific interfaces needed in order to timestamp events as accurately as possible.
The third and final major stumbling block is all the crypto stuff, which got added in order to be able to authenticate both time packets and monitoring/configuration requests, and this is where the latest major bugs have been found.
PHK (who is working on Ntimed) has spent a lot of time on NTP, including his time as a core FreeBSD hacker when he made sure that the FBSD had the best possible timekeeping kernel. This is the reason that my personal pool server has always used FreeBSD.
If your only need is to get 0.1s level time sync on a number of client only machines, then it really doesn't matter how you implement the NTP protocol, except that you should really try to measure and adjust the local clock frequency so as to track the reference time!
The default Windows time code implements Simple NTP (SNTP) which uses the NTP packet format but doesn't try to implement the proper control loop to steer the local clock, instead it just yanks the OS clock resulting in a sawtooth-like pattern of clock offsets.