
I like the unconventional approach. A few minutes with GPT raises two issues:

1. We've raised CO2 from about 280 ppm to 420 ppm, roughly a 50% increase. Diluting it back down to 280 ppm would require 50% more total atmosphere, which would also raise surface air pressure to about 1.5x its current value.

2. How much heat is trapped depends on the absolute amount of CO2 in the atmosphere, not its fraction, so the diluted atmosphere would retain just as much heat.
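
To spell out the arithmetic behind point 1 (same numbers as above, just restated as a quick sketch):

  c_now, c_target = 420e-6, 280e-6   # CO2 mixing ratios: today vs pre-industrial
  dilution = c_now / c_target        # total atmosphere needed, relative to today
  print(f"{dilution:.2f}x total air, i.e. {dilution - 1:.0%} more, so ~{dilution:.1f}x surface pressure")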


Would it increase the steady-state surface air pressure by 50%, or would more molecules offgas into outer space to compensate?

If the latter, it might actually work. Assuming they offgas in proportion. Which they probably wouldn't…


It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

Einstein's paper "On the Electrodynamics of Moving Bodies", introducing special relativity, was published in 1905. His work on general relativity was published ten years later, in 1915. The earliest knowledge cutoff of these models is 1913, in between the relativity papers.

The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.


> It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

Definitely. Even more interesting could be seeing them fall into the same traps of quackery and come up with things like over-the-counter lobotomies and colloidal silver.

On a totally different note, this could be very valuable for writing period-accurate books, screenplays, games, etc.


Accurate-ish; let's not forget their tendency to hallucinate.



the issue is there is very little text before the internet, so not enough historical tokens to train a really big model


And it's a 4B model. I worry that nontechnical users will dramatically overestimate its accuracy and underestimate hallucinations, which makes me wonder how it could really be useful for academic research.


Valid point. It's more of a stepping stone towards larger models. We're figuring out the best way to do this before scaling up.


If there's very little text before the internet, what would scaling up look like?


I think not everyone in this thread understands that. Someone wrote "It's a time machine", followed up by "Imagine having a conversation with Aristotle."


There's quite a lot of text in pre-Internet daily newspapers, of which there were once thousands worldwide.

When you're looking at e.g. the 19th century, a huge number are preserved somewhere in some library, but the vast majority don't seem to be digitized yet, given the tremendous amount of work involved.

Given how much higher-quality newspaper content tends to be compared to the average internet forum thread, there actually might be quite a decent amount of text. Obviously still nothing compared to the internet, but vastly more than from published books alone. After all, print newspapers were essentially the internet of their day. Oh, and don't forget pamphlets in the 18th century.


> the issue is there is very little text before the internet,

Hm, there is a lot of text from before the internet, but most of it is not on the internet. There is a weird gap in some circles because of that: people are rediscovering work from pre-1980s researchers that exists only in books that have never been reprinted and that virtually no one knows about.


There are no doubt trillions of tokens of general communication in all kinds of languages tucked away in national archives and private collections.

The National Archives of Spain alone hold 350 million pages of documents going back to the 15th century, ranging from correspondence to testimony to charts and maps, but only 10% of it is digitized and a much smaller fraction is transcribed. Hopefully, with how good LLMs are getting, they can accelerate the transcription process and open up these historical documents as a huge LLM dataset.


I've relied heavily on seamless capture for a couple of decades, à la Getting Things Done.

My solution is a Twilio text number that automatically inserts any texts it receives at the top of my todo.md file. Previously todo.org, until about a year ago.

iOS has ubiquitous support for quickly sharing to SMS from anywhere. It's easy to send a text to this contact from a Home Screen shortcut, but also from the share sheet in most every app.
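
Roughly, the receiving side can be as simple as something like this. This is a minimal sketch assuming Flask; the /sms route and the todo.md path are just illustrative, and a real setup would at least check the sender:

  from pathlib import Path

  from flask import Flask, request

  app = Flask(__name__)
  TODO = Path("todo.md")  # illustrative path; point this at the real todo file

  @app.route("/sms", methods=["POST"])
  def incoming_sms():
      # Twilio's inbound-SMS webhook POSTs the message text as the "Body" form field
      text = request.form.get("Body", "").strip()
      if text:
          existing = TODO.read_text() if TODO.exists() else ""
          TODO.write_text(f"- {text}\n{existing}")  # prepend as a new item at the top
      # return empty TwiML so Twilio doesn't send any reply text
      return ("<Response></Response>", 200, {"Content-Type": "text/xml"})

  if __name__ == "__main__":
      app.run(port=8000)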


I regularly use LLM-as-OCR and find it really helpful to:

1. Minimize the number of PDF pages per context/call. Don't dump a giant document set into one request. Break them into the smallest coherent chunks.

2. In a clean context, re-send the page and the extracted target content and ask the model to proofread/double-check the extracted data.

3. Repeat the extraction and/or the proofreading steps with a different model and compare the results.

4. Iterate until the proofreading steps pass without altering the data, or flag failures for stronger models or human intervention.
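
As a rough sketch of that loop, the shape is something like the following. call_llm() is a placeholder for whatever LLM/vision API you use; the prompts, model names, and yes/no convention are all illustrative:

  # call_llm() stands in for your LLM/vision API of choice
  def call_llm(model: str, prompt: str, page_image: bytes) -> str:
      raise NotImplementedError("plug in your LLM API here")

  def extract_with_checks(page_image: bytes, models=("model-a", "model-b"), max_rounds=3):
      # step 1 happens upstream: send one small, coherent chunk (here, a single page)
      extracted = call_llm(models[0], "Extract the target data from this page.", page_image)
      for _ in range(max_rounds):
          # steps 2-3: proofread in fresh contexts, using more than one model
          verdicts = [
              call_llm(m, "Does this extraction exactly match the page? Answer yes or no.\n\n" + extracted, page_image)
              for m in models
          ]
          if all(v.strip().lower().startswith("yes") for v in verdicts):
              return extracted
          # step 4: try again; after max_rounds, fall through and flag for escalation
          extracted = call_llm(models[0], "Re-extract the target data from this page, carefully.", page_image)
      return None  # caller escalates to a stronger model or a human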


What does a typical run cost for you?


I've been building proficiency with quantum optics equipment, repeating classic quantum entanglement experiments like the quantum eraser [0] and violating the CHSH inequality (the work behind the 2022 Nobel Prize in Physics). I'm working towards a novel quantum eraser variant.

[0] https://github.com/paul-gauthier/entangled-pair-quantum-eras...


I really like LLM+sympy for math. I have the LLM write me a sympy program, so I can trust that the symbolic manipulation is done correctly.

The code is also a useful artifact that can be iteratively edited and improved by both the human and LLM, with git history, etc. Running and passing tests/assertions helps to build and maintain confidence that the math remains correct.

I use helper functions to easily render from the sympy code to LaTeX, etc.
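
A tiny example of the pattern (the identity here is just an illustration, not from the experiment): the LLM writes the sympy program, and assertions baked into it let every re-run verify the algebra as the code evolves.

  import sympy as sp

  theta = sp.symbols("theta", real=True)

  # a claimed identity and its sympy check
  expr = sp.cos(theta)**2 - sp.sin(theta)**2
  assert sp.simplify(expr - sp.cos(2 * theta)) == 0  # fails loudly if the math drifts

  print(sp.latex(expr))  # render to LaTeX for the write-up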

A lot of the math behind this quantum eraser experiment was done this way.

https://github.com/paul-gauthier/entangled-pair-quantum-eras...


Below are some great videos on the physics and practicalities of single-mode fiber (SMF). They are Thorlabs videos, so they are slanted more towards the use of SMF in a laser lab than a telecom setting. They reference a lot of the theory, but also provide a good intuition about how and why SMF works so well.

https://youtu.be/FbOXRuBQt_U

https://youtu.be/HvJeXakc8Kc


If you like this, then you will probably enjoy the book How to Invent Everything: A Survival Guide for the Stranded Time Traveler by cartoonist and computer scientist Ryan North.

https://www.howtoinventeverything.com/


Scott Aaronson has an interesting take on the 2018 paper being discussed in the article:

https://scottaaronson.blog/?p=3975


Yeah this post nails the issue.

In order to do the X-basis measurement described in the paper, it's necessary to do very funky things to the simulated agents inside the computers. Probably the easiest way to implement the measurement would be to literally rewind the simulation back to before the measurement started, when the superposition was limited to a single qubit, do the measurement at that time, and then run the simulation back forwards to the current time. The paper doesn't specify an implementation, so the argument should hold for any implementation, and this should therefore be a valid way of doing the operation.

But this implementation implies you're undoing all the reasoning the agents did and changing the initial state, so as you run them forwards again they redo the same reasoning steps in a new context where the premises no longer apply. Which of course results in them making mistakes. The same thing would happen classically, if you rewound a simulated agent to change some crucial fact and then assumed reasoning from premises that no longer held should still be valid.

I think Scott also co-authored a follow-up paper, where they made some steps towards proving that the only computationally efficient way to implement the X-basis measurement was to do this simulation-rewinding thing. But unfortunately I can't seem to find it now.


Looks very interesting. I was hoping it would solve a problem I’ve had recently:

I want to ssh into a Windows box that I only have a normal user account on, so I can't (and don't want to) change any admin settings or install anything as admin.

All the obvious approaches hit roadblocks.

Seems like this tool solves the opposite problem: sshing out from a minimally privileged environment.


You can start your own ssh daemon from the unprivileged account, listening on a random port.


Yeah, you would think so. But when you connect to it and sshd tries to fork a process to handle the session… you get a privileges error.

