It's not just that, it's a refactoring of the llama code that doesn't seem to change anything. And it's clearly an edit of the original Apache 2.0 llama file, but with no mention of llama:

https://www.diffchecker.com/bJTqkvmQ/
And instead of being PR'd into transformers, it's just slapped on as external code, which means you either accept the security risk of running remote code or find it unsupported by frameworks. The HuggingFace leaderboard won't even queue the 200K version to benchmark, due to its no-custom-code policy.
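For anyone who hasn't hit this: transformers refuses to load repo-hosted modeling code unless you explicitly opt in to executing whatever Python the repo ships. Rough sketch, assuming the 01-ai/Yi-34B repo ID, nothing Yi-specific beyond that:

    from transformers import AutoModelForCausalLM

    # Omit trust_remote_code=True and this load fails, because the architecture
    # lives in the model repo rather than in the transformers library itself.
    model = AutoModelForCausalLM.from_pretrained(
        "01-ai/Yi-34B",
        trust_remote_code=True,  # opt in to running Python shipped with the weights
    )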
And they claim it's a 32K model, but it's configured as a 4K model with no RoPE scaling config, and no explanation of how it's supposed to be stretched out. For now, there's zero info on its tuning data. They didn't include instructions to reproduce their benchmarks, including the suspiciously high MMLU score.
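You can check that yourself in a couple of lines. A rough sketch, assuming the 01-ai/Yi-34B repo and the usual Llama-style config fields:

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("01-ai/Yi-34B", trust_remote_code=True)
    # A real 32K model would normally advertise 32768 here, not 4096.
    print(cfg.max_position_embeddings)
    # Stock llama configs declare context extension here; None means no RoPE
    # scaling is configured at all.
    print(getattr(cfg, "rope_scaling", None))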
...Anyone who's been in the AI world for a while won't bat an eye at this. Disingenuous claims? Hit-and-run release? License violations? Actual benchmark cheating? Who cares!? Just move on to the next paper, or in this case, take all the VC money. Yi is at least above par because it's a base model, and it does feel pretty performant.