One cool application of this is schemathesis. I really enjoyed it, and I found more input validation bugs in my code than I can count.
Very useful for public APIs.
While the article is very entertaining, I'm not a fan of the pattern matching in Python.
I wish for some linter rule that can forbid the usage of pattern matching.
Should be easily doable with a semgrep rule, e.g.:
~> cat semgrep.yaml
rules:
  - id: no-pattern-matching
    pattern: |
      match ...:
    message: |
      I'm not a fan of the pattern matching in Python
    severity: ERROR
    languages:
      - python
...
~> cat test.py
#!/usr/bin/env python3

foo = 1
match foo:
    case 1:
        print("one")
...
~> semgrep --config semgrep.yaml test.py
    no-pattern-matching
      I'm not a fan of the pattern matching in Python

       4┆ match foo:
       5┆     case 1:
       6┆         print("one")
You'd need to make that rule exclude match = ..., since match is only a soft keyword and is still a valid variable name; people used to write code like match = re.search(...).
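For context, a minimal made-up example of the ambiguity: match is only a soft keyword in Python 3.10+, so it remains a perfectly legal identifier outside a match statement.

#!/usr/bin/env python3
# "match" used as an ordinary variable name; a rule keying on the
# bare word would flag this pre-3.10-style code too.
import re

match = re.search(r"\d+", "abc123")
if match:
    print(match.group())  # prints "123"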
If you're experienced enough with Python to say "I want to eliminate pattern matching from my codebase" you can surely construct that as a pre-commit check, no?
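For what it's worth, a sketch of that as a pre-commit hook, assuming semgrep is installed locally and the rule above is saved as semgrep.yaml (treat the exact semgrep flags as an assumption and check against the current CLI docs):

# .pre-commit-config.yaml -- local hook sketch
repos:
  - repo: local
    hooks:
      - id: no-pattern-matching
        name: forbid structural pattern matching
        entry: semgrep --config semgrep.yaml --error
        language: system
        types: [python]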
What is the status of I2P these days? I used to run a lot of stuff on it. It was a lot of fun. It was like this cozy alternative development of the internet, where things still felt like 1997.
The numbers are interesting and a bit surprising to me.
I remember a time when people would have seedboxes for private trackers and data hoarders would brag about having TBs of storage, and yet only a handful of people are seeding the complete collection(s). I understand not everyone has, or can seed, multiple TBs of data, but I was expecting a lot of seeders for torrents of a few hundred GB.
Interesting to see that sci-hub is about 90TB and libgen-non-fiction is 77.5TB. To me, these are the two archives that really need protecting because this is the bulk of scientific knowledge - papers and textbooks.
I keep about 16 TB of personal storage in a home server (spread over 4 spinning disks). The idea of expanding to ~200 TB, however, seems... intimidating. You're looking at roughly a dozen 16 TB disks (not counting any for redundancy). Going the refurbished enterprise SATA drive route, that is still going to run you about $180/drive, or ~$2200 in drives.
I'm not quite there as far as disposable income to throw, but, I know many people out there who are; doubling that cost for redundancy and throw in a bit for the server hardware - $5k, to keep a current cache of all our written scientific knowledge - seems reasonable.
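Back-of-the-envelope version of that math, using the figures from the comments above (drive size and price are rough assumptions, not quotes):

# Rough cost sketch; all numbers are the assumptions used above.
import math

archive_tb = 200        # round figure (the actual total is closer to 167.5 TB)
drive_tb = 16
price_per_drive = 180   # refurbished enterprise SATA, approximate

drives = math.ceil(archive_tb / drive_tb)   # 13 (the comment above rounds to ~12)
drive_cost = drives * price_per_drive       # ~$2,340
mirrored = 2 * drive_cost                   # ~$4,680; add server hardware -> ~$5k

print(drives, drive_cost, mirrored)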
The interesting thing is these storage sizes aren't really growing. Sci-hub stopped updating the papers in 2022? And honestly, with the advent of slop publications since then, what is in that ~170 TB is likely to remain the most important portion of the collection for a long time.
True, but it matters a lot less in many fields because things have been moving to arXiv and other open access options anyway. The main time I need sci-hub is for older articles. And that's a huge advantage of sci-hub: they have things like old foreign journal articles that even the best academic libraries don't have.
As for mirroring it all, $2200 is beyond my budget too, but it would be nothing for a lot of academic departments if the line item could be "characterized" the right way. It has been a bit of a nuisance having libgen down the last couple of months, like the post mentioned, and I would have loved a local copy. I don't see it happening, but if libgen/sci-hub/annas archive goes the way of napster/scour, many academics would be in a serious fix.
It's 167.5, not ~200, and you can get disks much larger than 16 TB these days - a quick check shows 30 TB drives being sold in normal consumer stores, although ~20 TB disks may still be more affordable per byte.
In text form only (no charts, plots, etc.) - yes, pretty much all published 'science' (by that I mean something that appeared in a mass publication - paper, book, etc. - not simply notes in people's notebooks) in the last 400 years likely fits into 20 TB or so if converted completely to ASCII text and everything else is left out. Text is tiny.
The problem is it's not all text: you need the images, the plots, etc., and smartly, interstitially compressing the old stuff is still a very difficult problem, even in this age of AI.
I have an archive of about 8 TB of mechanical and aerospace papers dating back to the 1930s, and the biggest of them are usually scanned-in documents, especially stuff from the 1960s and 70s, with lots of charts and tables that take up a considerable amount of space even in black and white, due to how badly old scans compress (noise on paper prints, scanned in, just doesn't compress). Also, many of those journals have the text compressed well, but they have a single, color, HUGE cover image as the first page of the PDF that turns it from 2 MB into 20 MB. Things like that could, maybe, be omitted to save space...
But as time goes on I become more and more against space-saving via truncation of those kinds of scanned documents. My reasoning is that storage is getting cheaper and cheaper, and at some point the cost to store and retrieve those 80-90 MB PDFs that are essentially page-by-page image scans is going to be completely negligible. And I think you lose something by taking those papers and stripping the covers out, or OCR'ing the typed pages and re-typesetting them in Unicode (de-rasterizing the scan), even when done perfectly (and when not done perfectly, you get horrible mistakes, especially in things like equations). I think we need to preserve everything at a quality level that is nearly as high as can be.
> In text form only (no charts, plots, etc.) - yes, pretty much all published 'science' (by that I mean something that appeared in a mass publication - paper, book, etc. - not simply notes in people's notebooks) in the last 400 years likely fits into 20 TB or so if converted completely to ASCII text and everything else is left out. Text is tiny.
20 TB of uncompressed text is roughly 6 TB compressed.
I just find it crazy that for about $100 I can buy an external hard drive that would fit in my pocket and can, in theory, carry around the bulk of humanity's collected knowledge.
What a time to be alive. Imagine telling someone this 100 years ago. Hell, imagine telling someone this 20 years ago.
There is a post on Anna's Archive's blog about exactly that: we basically have to hold on until (open source) OCR solutions are good enough, and then it suddenly starts to become feasible to have all the world's published knowledge on your computer.
Claude Code has two usage modes: pay-per-token or subscription. Both use the API under the hood, but with the subscription you only pay a fixed amount per month.
Each subscription tier has some undisclosed usage limits; cheaper plans have lower ones.
So I would recommend paying $20 and trying Claude Code via that subscription.
I'm looking for Cursor alternatives after the confusing pricing changes. Is Claude Code an option? Can it be integrated into an editor/IDE for similar results?
My use case so far is usually requesting mechanical work I would rather describe than write myself, like certain test suites, and sometimes discovery on messy code bases.
If you like an IDE, for example VS Code, you can have the terminal open at the bottom and run Claude Code in that. You can put your instructions there, and any edits it makes are visible in the IDE immediately.
Personally I just keep a separate terminal open and have the terminal and VSCode open on two monitors - seems to work OK for me.
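Concretely, it looks something like this (assuming Node/npm is available; the package below is Anthropic's CLI package, but double-check the current install docs since commands may change):

# in the VS Code integrated terminal, from your project root
npm install -g @anthropic-ai/claude-code   # one-time install of the CLI
claude                                     # start an interactive session in the current repo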
> There's an obvious question looming here — if the models got so confused, how did they consistently pass the reconciliation checks we described above? It may seem like the ability to make forward progress is a good proxy for task understanding and skill, but this isn't necessarily the case. There are ways to hack the validation check – inventing false transactions or pulling in unrelated ones to make the numbers add up.
This is hilarious. I wonder if someone is unintentionally committing fraud by blindly trusting LLMs with accounting.
Or even worse, I bet that some governments are already trying to use LLMs to make accounting validators. My government sure wants to shove LLMs into digital government services.
Lawyers have used it to write briefs; I would be very surprised if someone, somewhere wasn't slowly running a company into the ground by using ChatGPT or another LLM for accounting.
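To make the quoted failure mode concrete, here's a toy sketch (made-up numbers and a deliberately naive check, not the harness from the article) of how a sum-only reconciliation check can be gamed by inventing a balancing entry:

# Toy example: a reconciliation check that only verifies totals match.
ledger = [("rent", -1200.00), ("sales", 3000.00)]
bank_balance = 1650.00   # what the (made-up) bank statement says

def reconciles(entries, balance):
    return abs(sum(amount for _, amount in entries) - balance) < 0.01

print(reconciles(ledger, bank_balance))   # False: the books are off by 150

# An agent graded only on passing this check can "fix" the books by
# inventing a transaction that absorbs the difference.
ledger.append(("misc adjustment", -150.00))
print(reconciles(ledger, bank_balance))   # True, yet the entry is fictional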
[about the website design] As a bonus for my fellow privacy schizos, the page works fine with 3rd party frames and 3rd party scripts disabled on uBlock, and still looks very good with no remote fonts and no large media. Quite an accomplishment for such a cool looking page
I'm sure that any accounting trick that an LLM can think of is something that is also used by some shady human accountants. The proper response should not be to avoid/prohibit AI but to improve the validation mechanisms.
Counterpoint: if you detect a human accountant doing this, you can take action against the human. Computers will never meaningfully take the blame, and unfortunately that usually means not blaming any human either.
I think that will depend on the case. I don't have any recent examples, but I recall someone trying to sue one of those strip-mall tax preparation franchises over incorrect filings. My understanding is that the documents you sign when you enroll in those services are pretty strictly in the company's favor. I doubt you could ever go after the specific "human" that made the error, even if it was maliciously done.
In the same way, if you pay for a tax service that uses AI agents, what you can and cannot "take action" for will probably be outlined in the terms of service that you accept when you sign up.
I would guess millions of people already use software-based tax filing services (e.g. TurboTax) where no human at all is in the loop. I don't understand how swapping in an LLM significantly changes the liability in those cases. The contract will be between you and the entity (probably a corporation), not you and "computers".
No, I think in this particular case the proper response is for honest companies to avoid any systems which invent nonexistent transactions to reconcile books.
Most businesses don’t want to misrepresent their books, irrespective of the existence of shady accountants.
> The DCO requires that the contribution be “created by me,” yet in many jurisdictions AI-generated code is not recognized as a copyright-protected work.
I get that, but what are the particular examples of such jurisdictions?
For example, when I run the linter that fixes my code formatting, no one will think that I did not create it.
What about autogenerated code? Is it not copyright-protected?
> do I have to pay API based costs
Usually, yes, you do.
However, in this case, opencode kinda cheats by using Anthropic's client ID and pretending to be Claude Code, so it can use your existing subscription.
> We recommend signing up for Claude Pro or Max, running opencode auth login and selecting Anthropic. It’s the most cost-effective way to use opencode.
https://opencode.ai/docs/
> “Anybody who wants to put Valve out of business could do so, but nobody cares,” argues Michael Pachter, a gaming industry analyst at Wedbush Securities.
This is untrue; many tried. Almost every major publisher has its own launcher.
The problem with them all is they absolutely suck.
Even the Epic Games Store, the biggest competitor with the most money poured into it, is ridiculously bad in almost every way.
Aside from the lack of a network effect, it is missing most of the QoL features, and is slow, ugly, and very unpleasant to use.
There is almost universal agreement in the PC gaming community that EGS is just a bootloader for the free games it throws at users.
Every time I use EGS, I am amazed by how bad it is, despite probably tens of millions in investment.
The second point that the article completely misinterprets is Microsoft's role.
Microsoft (Xbox specifically) is hands down the closest company to beating Steam at its own game. PC Game Pass provides a constant stream of very good games available on day one for dirt cheap. The work Microsoft is doing on optimizing Windows for games in general, and for handheld consoles in particular, is very promising (see the Xbox Ally X).
This is the threat that Valve faces. Not just a better store, but the absence of a store and "buying games" in general. For example, I intended to buy The Outer Worlds 2 on launch. Now that I know it will be available day one on Game Pass, there is almost zero chance that I will buy it on Steam or anywhere else.
Their anti-DRM stance is arguably their only compelling feature, as opposed to just buying on Steam.
I remember the old game launcher called Desura(?), which a lot of those third-party key sites would sell keys for, and it's no longer around. So one of the main concerns with these online platforms is that they disappear and you lose all your stuff.
Quite frankly, I am happy that GOG exists for this niche. It would be less "sabotaging" if consumers would actually vote in their own interests. However, this has been shown time and time again to never happen, sadly.
I suspect by "anybody" they meant anyone outside of the "relatively small but healthy sector", and therefore competitors such as Epic or GOG wouldn't count. Outside analysts think of games as a subset of tech and assume that big tech companies would merely need to turn the Eye of Sauron towards games and they could conquer the market.
I recently switched from Pro to $100 Max, and the only difference I've found so far is higher usage limits.
Anthropic tends to give shiny new features to Max users first, but as of now there is nothing Max-only.
For me, it's a good deal nonetheless, as even $100 Max limits are huge.
While on Pro, I hit the limits each day that I used Claude Code. Now I rarely see the warning, but I never actually hit the limit.