Comment Re:Clean room? (Score 5, Interesting) 108
Even if you use an AI to extract an extremely condensed specification out of the source code, it's hardly clean room if the LLM was pre-trained on the source code any way.
I once worked at a place that had a clean room process to create code compatible with a proprietary product. Anybody who had ever seen the original code or even loaded the original binary into a debugger was not allowed to write any code at all for the cloned product. The clone writers generally worked only off of the specifications and user documentation.
There were a handful of people who were allowed to debug the original to resolve a few questions about low-level compatibility. The only way they were allowed to communicate with the software writers was through written questions and answers that left a clear paper trail, and the answers had to be as terse as possible (usually just yes or no). Everyone knew that these memos were highly likely to be used as evidence in legal proceedings.
I highly doubt that any AI tech bros have ever been this rigorous, and I'd bet that most of these AIs have been trained on the exact same source code that they are cloning.