While true, legally it makes no difference whether you steal the sources or the binary. It is still stolen.
Why would you steal it if you can simply license it?
And a clean-room implementation requires that the people writing the code have never seen the original in any form. You cannot have an engineer analyze the original and then write a copy.
As the article explains, it's a clean room implementation because you use two different instances of an AI.
* AI 1 documents how the code works in human-readable descriptions. (i.e. does the reverse engineering)
* AI 2 constructs an entirely new codebase from the human-readable descriptions in the documentation. (i.e. does the forward engineering)
Since AI 2 has never seen or analyzed the original code/binary and has only ever read the documentation about it, it is a clean room implementation.
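The two-stage process described above can be sketched as a pipeline that structurally enforces the separation: stage 2 only ever receives the written spec, never the original source. The function and parameter names here (`clean_room_port`, `describe_model`, `implement_model`) are hypothetical illustrations, not any real tool's API:

```python
def clean_room_port(original_source, describe_model, implement_model):
    """Hypothetical sketch of a two-AI clean-room pipeline.

    describe_model:  AI 1 -- sees the original, emits a human-readable
                     spec (the reverse engineering step).
    implement_model: AI 2 -- sees ONLY the spec, never the original
                     (the forward engineering step).
    """
    # Stage 1: reverse engineer the original into prose documentation.
    spec = describe_model(original_source)

    # Crude leakage check: the spec must not contain the original verbatim.
    if original_source in spec:
        raise ValueError("spec leaks original source; not clean-room")

    # Stage 2: forward engineer a fresh implementation from the spec alone.
    new_code = implement_model(spec)
    return spec, new_code
```

The key design point is that `implement_model` is never handed `original_source`, mirroring how a traditional clean-room team is firewalled from the engineers who did the analysis.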
> Hence it is immediately plausible that having an AI train on the original or ingest it in a query and then writing a new version
The AI isn't training on the original (a very important point); it's generating written documentation on how it functions. The important part of this process is that it creates a human-level description, not simply an algebraic representation in words. An algebraic representation in words would just regenerate near-identical source code, which would be copyright infringement.
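To illustrate the distinction (with a made-up `clamp` function as the thing being documented): an "algebraic" description transcribes the code's structure word for word, so reconstructing source from it yields a near-copy, while a human-level description states intent that many distinct implementations can satisfy:

```python
# Hypothetical original being documented:
#   def clamp(x, lo, hi):
#       return max(lo, min(hi, x))

# "Algebraic" description: a word-for-word transcription of the code.
# Forward engineering from this reproduces the original almost exactly.
algebraic = "return the max of lo and the min of hi and x"

# Human-level description: states what the function is for.
# Any of several different implementations would satisfy it.
human_level = "limit a value to a given inclusive range"
```

Only the second kind of documentation keeps the forward-engineering step genuinely independent of the original's expression.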
The current situation may or may not last, but it's reasonable to assume they are working on the problem of generating cleaner, less verbose code.