What too many people do not seem to understand about LLMs is that everything one spits out is simply sampled from a probability distribution derived from the input you gave it. It first breaks down the input you provided, runs it statistically against its trained knowledge base, and then spits out the letters, words, phrases, and punctuation that most closely resemble the outputs it was trained to produce from its training materials.
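As a rough, toy-level sketch of that sampling step (made-up vocabulary and scores, not any real model's API), the generation loop boils down to turning per-token scores into probabilities and drawing from them:

```python
import numpy as np

# Toy illustration: the network assigns a score ("logit") to each candidate
# next token, the scores are converted into probabilities, and the output is
# literally a random draw from that probability distribution.
vocab = ["the", "answer", "is", "probably", "not", "definitely", "."]
logits = np.array([2.1, 0.3, 0.2, 1.5, 0.9, 0.8, -0.4])  # made-up scores

probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax: scores -> probabilities

next_token = np.random.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```

Run it a few times and you get different tokens in proportion to those probabilities, which is the whole point: the output is statistical resemblance, not reasoning.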
Until this version, ChatGPT evidently lacked enough material in its trained network to overcome the typed grammar conventions of English: it could not discern that em dashes are not typically used in everyday conversation, and the instruction not to use them was not enough to shift its underlying probability distribution so that it would ignore those conventions and adapt its output to drop the em dash. This is a very difficult behavior to train into a neural network. It has to be trained on specifically this input/output case long enough for that training to override the base English grammar patterns, which are a fundamental piece of knowledge an LLM requires to function and one of the very first things it is trained to handle.
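To make the "input/output case" idea concrete, here is a purely hypothetical sketch of the kind of targeted training pairs that would be needed; the field names and wording are my own illustration, not any vendor's actual fine-tuning format:

```python
# Hypothetical demonstrations of the behavior being described: many examples
# where the "no em dashes" instruction is actually honored, so the tuned
# model's output probabilities shift away from emitting that character.
training_pairs = [
    {
        "instruction": "Rewrite without em dashes: The launch—delayed twice—finally happened.",
        "output": "The launch, delayed twice, finally happened.",
    },
    {
        "instruction": "Answer casually and do not use em dashes.",
        "output": "Sure, happy to keep it casual and skip that punctuation entirely.",
    },
]
```

The author's point is that examples like these have to show up often enough to outweigh the grammar patterns already baked into the weights.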
It also exposes a flaw in how neural networks typically work. There is a training/learning mode, and then there is the functional mode of just using the trained network. In the functional mode, the network's links, nodes, and weights are effectively static. Unless inputs were built into the network that let it flag certain functionality, it cannot change its underlying probability distribution to effectively forget something it was trained to do. Once training has changed any part of the underlying network, you cannot effectively untrain it (short of reverting to a backup copy of the network from before that training).

This is why it is so important to scrutinize every piece of data used to train the network. Once you have added some piece of garbage training input, you are stuck with the changes it made to the output probabilities. Any model that is effectively trained on the content of the internet itself has absorbed so much bad information that the results can never really be trusted as anything better than asking a random person for the answer, because it will have trained on and internalized phrases like "The earth is flat", "birds are not real", and "the moon landing was a hoax". It will have seen those things enough times that it assigns them higher and higher probability as the proper response to questions about them.
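A toy counting model makes the "stuck with the garbage" point visible; real LLMs are vastly more complex, but the frequency pressure works in the same direction:

```python
from collections import Counter

# Toy illustration: the more often a claim appears in the training text, the
# higher the probability a purely frequency-driven model assigns it as the
# "proper" answer. The ratios used here are invented for the example.
corpus = (
    ["the earth is round"] * 7
    + ["the earth is flat"] * 3   # repeated garbage inputs
)

counts = Counter(corpus)
total = sum(counts.values())
for claim, n in counts.most_common():
    print(f"P({claim!r}) = {n / total:.2f}")

# Add more "the earth is flat" documents and its probability only climbs;
# there is no way to subtract them back out after the fact short of rebuilding
# from an earlier copy of the data or the model.
```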