It's not just that, it's a refactoring of the llama code that doesn't seem to change anything. And it's clearly an edit of the original Apache 2.0 llama file, but with no mention of llama:

https://www.diffchecker.com/bJTqkvmQ/
And instead of being PR'd into transformers, it's just slapped on as external code, which means you either accept the security risk of running remote code or find it unsupported by frameworks. The HuggingFace leaderboard won't even queue the 200K version to benchmark, due to its no-custom-code policy.
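For anyone who hasn't hit this: transformers refuses to load repo-hosted modeling code unless you explicitly opt in to executing whatever Python the repo ships. Rough sketch, assuming the 01-ai/Yi-34B repo ID, nothing Yi-specific beyond that:

    from transformers import AutoModelForCausalLM

    # Omit trust_remote_code=True and this load fails, because the architecture
    # lives in the model repo rather than in the transformers library itself.
    model = AutoModelForCausalLM.from_pretrained(
        "01-ai/Yi-34B",
        trust_remote_code=True,  # opt in to running Python shipped with the weights
    )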
And they claim it's a 32K model, but it's configured as a 4K model with no RoPE scaling config, and no explanation of how it's supposed to be stretched out. For now, there's zero info on its tuning data. They didn't include instructions to reproduce their benchmarks, including the suspiciously high MMLU score.
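You can check that yourself in a couple of lines. A rough sketch, assuming the 01-ai/Yi-34B repo and the usual Llama-style config fields:

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("01-ai/Yi-34B", trust_remote_code=True)
    # A real 32K model would normally advertise 32768 here, not 4096.
    print(cfg.max_position_embeddings)
    # Stock llama configs declare context extension here; None means no RoPE
    # scaling is configured at all.
    print(getattr(cfg, "rope_scaling", None))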
...Anyone who's been in the AI world for a while won't bat an eye at this. Disingenuous claims? Hit-and-run release? License violations? Actual benchmark cheating? Who cares!? Just move on to the next paper, or in this case, take all the VC money. Yi is at least above par because it's a base model, and it does feel pretty performant.