The above is subject to misinterpretation. The copyright owner must demonstrate it's a derivative and win in court. The owner must prove guilt; the publisher does not need to prove innocence.
It's a civil case, so you don't need to prove "guilt", just that it is more likely than not that they looked at your source.
No, it's more like "preponderance of the evidence". That is certainly a looser standard than "beyond a reasonable doubt", but it demands far more than a bare "more likely than not".
For that reason, any competitive proprietary project coded with an LLM can be credibly accused of being a derivative work, as the preponderance of circumstantial evidence would point to the LLM being "tainted" by the OSS project, unless you could demonstrate that it was excluded from the training data.
It's not quite that simple. One would need to establish that the AI did not merely learn general concepts from its training, but rather directly transformed the copyrighted source code. Like a human, an AI can learn concepts from copyrighted material and then write its own unique implementations of those concepts. All of our data structures textbooks are copyrighted works, yet we are free to implement the concepts we have learned from them.