> "[Specialized expertise is] the antithesis of democracy."
> "Democracy works best when men and women do things for themselves, with the help of their friends and neighbors, instead of depending on the state."
These are nice sentiments, but they don't work in the real world. At a certain point, some problems are simply too complex for a regular person to understand.
It is impossible for every citizen to fully understand every scientific issue. Part of living in a society—in fact, one of the primary purposes of living in a society—is having different people specialize in different things, and trusting each other to actually be good at what they specialize in.
None of this implies that people don't know enough to vote.
Indeed, to the best of my knowledge, the available evidence suggests that a major part of the problem right now is votes being suppressed and people being poorly represented by their supposed representatives. That's partly due to deliberate gerrymandering, and partly due to the simple fact that the size of the House of Representatives was capped at 435 seats in the early 20th century. With roughly 330 million people spread across those 435 seats, one person now represents hundreds of thousands of constituents or more, rather than the ~10k or so each represented prior to the cap.
The thing they are testing for is reasoning performance, so it makes sense not to give tool access.
This is the same as the critiques of the LLM paper by Apple, where they showed that LLMs fail to solve the Tower of Hanoi problem past a certain number of disks. The test was to see how well these models can reason through a long task. People online argued that the models could have solved the problem if they'd had access to a coding environment. Again, the test was checking reasoning capability, not whether the model knew how to code an algorithm to solve the problem (the algorithm itself is trivial, as sketched below).
If model performance degrades a lot after a certain number of reasoning steps, it's good to know where the limits are. Whether the model had access to tools or not is orthogonal to this problem.
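To make that concrete, here's a minimal sketch (mine, not anything from the Apple paper) of why this is a reasoning test and not a coding test: the Tower of Hanoi algorithm fits in a few lines, but the move sequence it produces grows as 2^n - 1, so executing it step by step in a chain of thought gets exponentially long.

```python
# Classic recursive Tower of Hanoi: trivially easy to *write*,
# exponentially long to *execute* step by step.
def hanoi(n, src="A", aux="B", dst="C"):
    """Return the full list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, src, dst, aux)    # park the n-1 smaller disks on the spare peg
        + [(src, dst)]                 # move the largest disk to the target
        + hanoi(n - 1, aux, src, dst)  # stack the n-1 disks back on top of it
    )

for n in (3, 7, 15, 20):
    print(f"{n} disks -> {len(hanoi(n))} moves")  # 7, 127, 32767, 1048575
```

The recursion itself is textbook; the punchline is the length of the output, which is exactly the dimension the benchmark stresses.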
It's selecting a random word from a probability distribution over words, and that distribution is crafted by the LLM. The random sampler is not going to choose a word with 1e-6 probability anytime soon. Besides, with thinking models, the LLM has the ability to correct itself, so it's not like the model is at the mercy of a random number generator.
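A rough illustration of that point (the vocabulary and numbers here are made up for the example, not any particular model's API): the sampler draws from the softmax of the model's logits, so a token the model has assigned ~1e-6 probability essentially never comes out.

```python
# Minimal sketch of temperature sampling from a next-token distribution.
# The model's logits define the distribution; the sampler just draws from it.
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Softmax the logits and draw one token index from the result."""
    scaled = logits / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy vocabulary: one dominant token, two plausible ones, one ~1e-6 tail token.
logits = np.array([10.0, 8.0, 7.5, -4.0])
draws = [sample_next_token(logits) for _ in range(10_000)]
print(np.bincount(draws, minlength=4) / 10_000)  # tail token ~never appears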
Quart was interesting, but it didn't seem to get as much traction as FastAPI. My understanding is that Flask has also been integrating some of Quart's ideas.
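For example, Flask 2.x added support for `async def` views, an idea Quart pioneered. A minimal sketch (requires the `flask[async]` extra):

```python
# Minimal sketch: an async view in plain Flask, an idea Quart pioneered.
# Needs: pip install "flask[async]"  (pulls in asgiref)
import asyncio
from flask import Flask

app = Flask(__name__)

@app.route("/slow")
async def slow():
    # Flask runs the coroutine to completion per request; the app is
    # still WSGI underneath, unlike Quart, which is ASGI end to end.
    await asyncio.sleep(0.1)  # stand-in for an async I/O call
    return {"status": "ok"}   # dicts are JSON-serialized automatically

if __name__ == "__main__":
    app.run()
```

If I recall the Flask docs correctly, each request gets its own event loop, so this doesn't buy you the concurrency of a true ASGI server, which is why Quart still has a niche.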
From what I understand, they used a diffusion model (DiffDock) to predict the mechanism. These types of models are not LLMs that need to be trained on text.
The mechanism by which refined linoleic acid, when heated, creates higher amounts of free radicals (which are known to cause oxidative stress and inflammation) is well understood.
I agree a large-scale RCT for this would be great, but I doubt anyone would fund it, and if it did get done, I'd be surprised if it wasn't designed to suit the biases of the side that funded it.
I feel pretty confident that Captain Marvel, Emilia Pérez, Spy Kids, Sausage Party, The Last Jedi, and Ghostbusters (2016) aren't at much risk of going over the heads of audiences, but you're entitled to your opinion if you think they count as avant-garde cinema.
> "Democracy works best when men and women do things for themselves, with the help of their friends and neighbors, instead of depending on the state."
These are nice sentiments to have but it does not work in the real world. At a certain point certain problems are too complex for a regular person to understand.