Kai-Fu Lee's Yi-34B uses exactly Llama's architecture except for 2 renamed tensors (huggingface.co)
39 points by vissidarte_choi on Nov 14, 2023 | 7 comments


It's not just that, it's a refactoring of the llama code that doesn't seem to change anything. And it's clearly an edit of the original Apache 2.0 llama file, but with no mention of llama:

https://www.diffchecker.com/bJTqkvmQ/
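If the only architectural difference really is two renamed tensors, as the title says, then mapping a checkpoint back to Llama-style names should be a pure key rewrite. A rough sketch of that idea follows. The names in the rename table (`ln1`, `ln2` mapping to `input_layernorm`, `post_attention_layernorm`) are an assumption for illustration and are not stated in this thread, and `to_llama_key` / `rename_state_dict` are hypothetical helpers, not part of any library.

```python
# Hypothetical rename table: these tensor names are an assumption for
# illustration, not something confirmed in this thread.
YI_TO_LLAMA = {
    "ln1": "input_layernorm",
    "ln2": "post_attention_layernorm",
}


def to_llama_key(key: str) -> str:
    """Rename one checkpoint key component-wise, leaving other parts untouched."""
    return ".".join(YI_TO_LLAMA.get(part, part) for part in key.split("."))


def rename_state_dict(state_dict: dict) -> dict:
    """Return a new dict with Llama-style keys; tensor values pass through as-is."""
    return {to_llama_key(key): value for key, value in state_dict.items()}


if __name__ == "__main__":
    # Placeholder values stand in for real tensors.
    demo = {
        "model.layers.0.ln1.weight": "tensor-a",
        "model.layers.0.self_attn.q_proj.weight": "tensor-b",
    }
    for key in rename_state_dict(demo):
        print(key)
```

Matching on whole dotted components rather than substrings keeps the rewrite from touching unrelated keys that merely contain `ln1` or `ln2` inside a longer name.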

And instead of being PR'd into transformers, it's just slapped on as external code, which is either a security risk or unsupported by frameworks. The HuggingFace leaderboard won't even queue the 200K version to benchmark, due to its no-custom-code policy.

And they claim it's a 32K model, but it's configured as a 4K model with no RoPE stretching config, and no explanation of how it's supposed to be stretched out. For now, there's zero info on its tuning data. They didn't include instructions to reproduce their benchmarks, including the suspiciously high MMLU score.
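For anyone who wants to check the context-length point on a downloaded `config.json`, here is a minimal sketch. It assumes the Llama-style config keys `max_position_embeddings` and `rope_scaling` (the latter a dict with `type` and `factor`, or absent). `describe_context` is a hypothetical helper, and the sample values are made up to mirror the "4K, no scaling" description above, not copied from the actual Yi repo.

```python
import json


def describe_context(config: dict) -> str:
    """Summarize the native context length and any RoPE scaling in a config dict."""
    native = config.get("max_position_embeddings")
    scaling = config.get("rope_scaling")
    if not scaling:
        return f"native context {native}, no RoPE scaling configured"
    factor = scaling.get("factor", 1.0)
    kind = scaling.get("type")
    return f"native context {native}, {kind} RoPE scaling x{factor} -> ~{int(native * factor)}"


if __name__ == "__main__":
    # Illustrative values only; load the real file with json.load(open(path)).
    sample = json.loads('{"max_position_embeddings": 4096, "rope_scaling": null}')
    print(describe_context(sample))
```

A config that actually advertised 32K through linear scaling would be expected to carry something like `{"type": "linear", "factor": 8.0}` next to the 4096 native length.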

...Anyone who's been in the AI world for a while won't bat an eye over this. Disingenuous claims? Hit-and-run release? License violations? Actual benchmark cheating? Who cares!? Just move on to the next paper, or in this case, take all the VC money. Yi is at least above par because it's a base model, and it does feel pretty performant.


It seems to have been, at least in part, a simple oversight. They're being pretty upfront about it over on HuggingFace: https://huggingface.co/01-ai/Yi-34B/discussions/11#655314587...


I tried the 6B version on Ollama and it behaved similarly to the Pythia < 1B models: English words with no meaning, poor formatting, etc.

Maybe there's a bug, as Llama 2 7B works much better.


The 34B 200K version works well for me. And a finetune came out... just now: https://huggingface.co/NousResearch/Nous-Capybara-34B

There was some kind of GGUF bug that ruined its output quality, dunno if it was fixed. I am running it in exllamav2.




Yi posted an update on Hugging Face. Sounds like it was an oversight left over from running experiments, and they're open to a pull request or a version update: https://huggingface.co/01-ai/Yi-34B/discussions/11#655314587...



