Comment Re:oh no (Score 4, Interesting) 65
All of them really. What's typically open source is
1) the code used for training, but never the dataset for the initial LLM and never the RLHF (reinforcement learning from human feedback) data used to turn a text-vomiting LLM into a useful question-answering machine.
2) the resulting weights - these are totally uninterpretable.
So it's never fully replicable; even if you had the infra and were willing to burn the electricity, you have no way of getting to 2) yourself.
AFAIK that's not just the Chinese ones but also the "open-source" (really open-weights) Llama and Mistral.