Comment Evaluating an unreleased model ... (Score 2) 12
The news here is that gpt2-chatbot is confirmed to be an anonymized, unreleased model. See yesterday's discussion here: https://news.ycombinator.com/i... The https://chat.lmsys.org/ page now says this:
> *gpt2-chatbot is currently unavailable.* See our model evaluation policy [here].
That link takes you to https://lmsys.org/blog/2024-03... , which is this post's URL. The page was edited yesterday (when gpt2-chatbot was added) to include this section:
> Evaluating unreleased models: We collaborate with open-source and commercial model providers to bring their unreleased models to community for preview testing.
> Model providers can test their unreleased models anonymously, meaning the models' names will be anonymized. A model is considered unreleased if its weights are neither open, nor available via a public API or service. Evaluating an unreleased model consists of the following steps:
> 1. Add the model to Arena with an anonymous label. i.e., its identity will not be shown to users.
> 2. Keep it until we accumulate enough votes for its rating to stabilize or until the model provider withdraws it.
> 3. Once we accumulate enough votes, we will share the result privately with the model provider. These include the rating, as well as release samples of up to 20% of the votes. (See Sharing data with the model providers for further details).
> 4. Remove the model from Arena.
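
For concreteness, here is roughly how I read the two testable parts of that policy (the "unreleased" definition and the 20% cap on shared votes) as a Python sketch. This is only my interpretation, not LMSYS's code: `Model`, `is_unreleased`, and `sample_votes_for_provider` are hypothetical names, and the policy does not say how the sample is drawn, so uniform random sampling is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Model:
    # Hypothetical record; the field names are illustrative, not from LMSYS.
    name: str
    open_weights: bool
    public_api: bool


def is_unreleased(model: Model) -> bool:
    """Quoted definition: weights are not open and there is no public API or service."""
    return not model.open_weights and not model.public_api


def sample_votes_for_provider(votes, fraction=0.2, seed=None):
    """Return at most `fraction` of the votes; the quoted policy caps this at 20%.

    Uniform sampling without replacement is an assumption.
    """
    if not 0 <= fraction <= 0.2:
        raise ValueError("the policy allows sharing at most 20% of the votes")
    k = int(len(votes) * fraction)  # round down so the cap is never exceeded
    return random.Random(seed).sample(list(votes), k)
```

Rounding down means a provider can never receive more than the stated share, even for small vote counts.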