DeepSeek is definitely not real open source. To be open source, you need a real open source license (like the ones the OSI lists), and you need to share all pre- and post-training code, any tuning code, any evaluation code, everything related to safety/censorship/etc., and probably the full training data as well. Otherwise you can't reproduce their weights. Sharing weights alone is like sharing a compiled program.
Yes, releasing the training source code is like releasing the source code of the compiler used to compile and link the binary.
Let's say you took GCC, modified its sources, compiled your code with it, and released your binaries along with the modified GCC source code, claiming that your software is open source. Well, it wouldn't be.
Releasing training data is extremely hard, as the licensing and redistribution rights for that data are difficult to tackle. And it is not clear what exactly the benefits of releasing it would be.
As far as I know the only true open source model that is competitive is the OLMo 2 model from AI2:
https://allenai.org/blog/olmo2
They even recently released an app, also open source, that does on-device inference:
https://allenai.org/blog/olmoe-app
They also have this other model called Tülu 3, which outperforms DeepSeek V3:
https://allenai.org/blog/tulu-3-405B