Hacker News | leecarraher's comments

Are you referring to this paper: https://arxiv.org/abs/1501.01711 ? I believe they won best paper at ICML or another high-impact venue. As I recall, the published paper and algorithm were compact and succinct, something that took less than a day to implement.


I was referring to even older stuff that I happened to see while doing my masters back in 2007-2008 or so. But that one looks more approachable.


I agree, it's not just the multinomial sampling that causes hallucinations. If that were the case, setting temp to 0 and just taking the argmax over the logits would "solve" hallucinations. While round-off error causes some stochasticity, it's unlikely to be the primary cause; rather, it's the lossy compression over the layers that causes it.

First compression: you create embeddings that need to differentiate N tokens. The JL lemma gives us a bound on the dimension needed for that, and modern architectures are well above it. At face value, the embeddings could encode the tokens and discriminate between them deterministically. But words aren't monolithic; they mean many things and get contextualized by other words. So despite being above the JL bound, the model still forces a lossy compression.
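To put a rough number on the JL bound, here is a small sketch using a commonly quoted sufficient dimension from the lemma, k >= 4 ln(N) / (eps^2/2 - eps^3/3). The vocabulary size and the distortion values eps are illustrative assumptions, not figures from any particular model.

```python
import math

def jl_min_dim(n_points: int, eps: float) -> int:
    """Sufficient embedding dimension from the Johnson-Lindenstrauss lemma.

    Uses the bound k >= 4 ln(n) / (eps^2/2 - eps^3/3), which is enough to
    preserve pairwise distances between n points within a factor of
    (1 +/- eps). It is a sufficient condition, not a tight one.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in (0, 1)")
    denom = eps ** 2 / 2 - eps ** 3 / 3
    return math.ceil(4 * math.log(n_points) / denom)

# Hypothetical vocabulary size, chosen only for illustration.
vocab_size = 50_000
for eps in (0.5, 0.3, 0.1):
    print(f"eps={eps}: k >= {jl_min_dim(vocab_size, eps)}")
```

The required dimension grows quickly as the tolerated distortion shrinks, so how far a given embedding width sits above the bound depends on the eps you are willing to accept.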

Next compression: each layer of the transformer blows up the input into K, V and Q, then compresses it back to the inter-layer dimension.

Finally, there is the output layer, which at 0 temp is deterministic, but it is heavily path dependent on getting to that token. The space of possible paths is combinatorial, so any non-deterministic behavior elsewhere will inflate the likelihood of non-deterministic output, including things like round-off. Heck, most models are quantized down to 4 or even 2 bits these days, which is wild!
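As a toy illustration of the round-off point: greedy (temperature 0) decoding is a deterministic function of the logits, but the logits themselves can depend on the order in which floating-point sums are reduced. The two "tokens" and their values below are hypothetical, picked so that the tie is visible.

```python
def argmax(values):
    """Index of the largest value; ties go to the first index (greedy decoding)."""
    return max(range(len(values)), key=lambda i: values[i])

# The same three terms summed in two different orders. Floating-point addition
# is not associative, so the two results differ in the last bit.
logit_a = (0.1 + 0.2) + 0.3   # one reduction order
logit_b = 0.1 + (0.2 + 0.3)   # another reduction order

print(logit_a == logit_b)      # False

# Token 0 has a fixed logit of 0.6; token 1's logit is the sum above.
print(argmax([0.6, logit_a]))  # 1: the sum lands one bit above 0.6
print(argmax([0.6, logit_b]))  # 0: the sum ties with 0.6, first index wins
```

In a real model the reduction order can vary with kernel choice or batch layout, and one flipped token changes everything generated after it.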


For an interesting reversal of the "problem" of the speed of light, IEX is a stock exchange designed to combat HFT by adding a physical speed bump in the form of 38 miles of fiber optic cable. The general idea is to level the playing field and improve market liquidity using the physical limits on how fast light can carry a signal. https://en.wikipedia.org/wiki/IEX
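A back-of-the-envelope sketch of the delay a coil like that introduces. The refractive index of 1.47 is an assumed typical value for silica fiber, not a figure published by IEX.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # in vacuum (exact, by definition)
METERS_PER_MILE = 1609.344             # international mile
FIBER_REFRACTIVE_INDEX = 1.47          # assumption: typical silica fiber

def fiber_delay_us(miles: float, refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """One-way propagation delay through a length of fiber, in microseconds."""
    meters = miles * METERS_PER_MILE
    speed_in_fiber = SPEED_OF_LIGHT_M_PER_S / refractive_index
    return meters / speed_in_fiber * 1e6

print(f"{fiber_delay_us(38):.0f} microseconds")
```

Under these assumptions the 38 miles of fiber account for roughly 300 microseconds one way.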


That marketing gimmick adds hundreds of microseconds to order latency. It's not designed to level any playing fields; it's designed to get publicity.


I agree. Add to that the fact that many Python modules are FOSS projects maintained on a limited basis or budget. Refactoring code that may have some unsafe async routines would be costly for an org, and dreadful to do as a hobby. So you can either have a rich library of modules, or go async and risk something you need not working and then having to find a workaround. Personally, if parallelism is important enough, I use ctypes and OpenMP. If I need something more portable, I have a few multiprocessing wrappers that implement prange and a few other widgets for shared memory.


What was the position? What are your credentials to fulfill that position? I feel like cover letters and recommendations are just icing on the cake of core skills and experience, not the entire cake.


On the surface, cutting less essential resources during a power supply event makes sense, but the ranking of essentialness seems problematic. While the decision to stop dumping megawatts of power into training a company's next-gen LLM so that the power can go to life-saving/sustaining systems makes sense, it's pretty hard to implement in all but the most extreme cases. Hospital vs. gpt6 training is an easy decision, but what about deciding between someone who wants to run AC at their unoccupied home and cutting power to a multi-day training epoch worth hundreds of thousands of dollars? It all feels very un-capitalistic, which in the US, like it or not, is how many edge cases get resolved. Right now datacenters are just the easy target, but why not Texas' numerous fracking sites, or other less desirable industries? My guess is that an injunction over the constitutionality of this will hold it up in court for a while.


My slightly next-gen todo is a notebook on my reMarkable. Added features are sharing between devices, and since it's e-ink, it's a good paper-like alternative to sticky notes. For me, beating procrastination can be more important than organizing many subtasks.

FWIW, I only use this for work todos, and I keep todos separate from calendars (paper calendar and dry-erase board for home, Outlook for the work calendar).


I feel like you could buy a Furby and shave it for way less than $299.


What's wrong with the PS/2-style serial port on my Roomba and an RPi 0 W?


I've used pbzip2, which takes the same parallel blocked compression approach 7-Zip seems to be taking (going by an AI's analysis of the changes). Theoretically the compression is less efficient, but I haven't noticed a difference in practice.
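A minimal sketch of the blocked approach using Python's bz2 module, for anyone who wants to measure the difference on their own data. This is not pbzip2's or 7-Zip's actual code; compress_blocked, the block size and the sample data are made up for illustration, and threads are used on the assumption that CPython's bz2 releases the GIL while compressing.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

def compress_blocked(data: bytes, block_size: int = 900_000, workers: int = 4) -> bytes:
    """Compress fixed-size blocks independently and concatenate the streams."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    # Each block becomes its own complete bzip2 stream, so blocks can be
    # compressed concurrently; pool.map returns them in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        streams = list(pool.map(bz2.compress, blocks))
    return b"".join(streams)

# Hypothetical sample data, highly repetitive on purpose.
sample = b"the quick brown fox jumps over the lazy dog\n" * 50_000
blocked = compress_blocked(sample, block_size=100_000)
single = bz2.compress(sample)

# bz2.decompress accepts concatenated streams, so the blocked output round-trips.
assert bz2.decompress(blocked) == sample
print(f"single stream: {len(single)} bytes, blocked: {len(blocked)} bytes")
```

Each block starts with no knowledge of the data before it and carries its own stream header, which is where the theoretical loss in ratio comes from.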

