One cool application of this is schemathesis. I really enjoyed it, and I found more input validation bugs in my code than I can count.
Very useful for public APIs.
While the article is very entertaining, I'm not a fan of the pattern matching in Python.
I wish for some linter rule that can forbid the usage of pattern matching.
Should be easily doable with a semgrep rule, e.g.:
~> cat semgrep.yaml
rules:
  - id: no-pattern-matching
    pattern: |
      match ...:
    message: |
      I'm not a fan of the pattern matching in Python
    severity: ERROR
    languages:
      - python
...
~> cat test.py
#!/usr/bin/env python3

foo = 1
match foo:
    case 1:
        print("one")
...
~> semgrep --config semgrep.yaml test.py
    no-pattern-matching
      I'm not a fan of the pattern matching in Python

       4┆ match foo:
       5┆     case 1:
       6┆         print("one")
You'd need to make that rule exclude match = ..., since match is only a soft keyword and is still a valid variable name; people used to write code like match = re.search(...).
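For context, a minimal made-up example of the ambiguity: match is only a soft keyword in Python 3.10+, so it remains a perfectly legal identifier outside a match statement.

#!/usr/bin/env python3
# "match" used as an ordinary variable name; a rule keying on the
# bare word would flag this pre-3.10-style code too.
import re

match = re.search(r"\d+", "abc123")
if match:
    print(match.group())  # prints "123"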
If you're experienced enough with Python to say "I want to eliminate pattern matching from my codebase" you can surely construct that as a pre-commit check, no?
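For what it's worth, a sketch of that as a pre-commit hook, assuming semgrep is installed locally and the rule above is saved as semgrep.yaml (treat the exact semgrep flags as an assumption and check against the current CLI docs):

# .pre-commit-config.yaml -- local hook sketch
repos:
  - repo: local
    hooks:
      - id: no-pattern-matching
        name: forbid structural pattern matching
        entry: semgrep --config semgrep.yaml --error
        language: system
        types: [python]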
What is the status of I2P these days? I used to run a lot of stuff on it. It was a lot of fun. It was like this cozy alternative development of the internet, where things still felt like 1997.
The numbers are interesting and a bit surprising to me.
I remember a time when people would have seedboxes for private trackers and data hoarders would brag about having TBs of storage, and yet only a handful of people are seeding the complete collection(s). I understand not everyone has, or can seed, multiple TBs of data, but I was expecting a lot of seeders for torrents of a few hundred GB.
Interesting to see that sci-hub is about 90TB and libgen-non-fiction is 77.5TB. To me, these are the two archives that really need protecting because this is the bulk of scientific knowledge - papers and textbooks.
I keep about 16 TB of personal storage in a home server (spread over 4 spinning disks). The idea of expanding to ~200 TB, however, seems... intimidating. You're looking at roughly a dozen 16 TB disks (not counting any for redundancy). Going the refurbished enterprise SATA drive route, that is still going to run you about $180/drive, or ~$2200 in drives.
I'm not quite there as far as disposable income to throw, but, I know many people out there who are; doubling that cost for redundancy and throw in a bit for the server hardware - $5k, to keep a current cache of all our written scientific knowledge - seems reasonable.
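Back-of-the-envelope version of that math, using the figures from the comments above (drive size and price are rough assumptions, not quotes):

# Rough cost sketch; all numbers are the assumptions used above.
import math

archive_tb = 200        # round figure (the actual total is closer to 167.5 TB)
drive_tb = 16
price_per_drive = 180   # refurbished enterprise SATA, approximate

drives = math.ceil(archive_tb / drive_tb)   # 13 (the comment above rounds to ~12)
drive_cost = drives * price_per_drive       # ~$2,340
mirrored = 2 * drive_cost                   # ~$4,680; add server hardware -> ~$5k

print(drives, drive_cost, mirrored)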
The interesting thing is these storage sizes aren't really growing. Sci-hub stopped updating the papers in 2022? And honestly, with the advent of slop publications since then, what is in that ~170 TB is likely to remain the most important portion of the collection for a long time.
True, but it matters a lot less in many fields because things have been moving to arXiv and other open access options anyway. The main time I need sci-hub is for older articles. And that's a huge advantage of sci-hub: they have things like old foreign journal articles that even the best academic libraries don't have.
As for mirroring it all, $2200 is beyond my budget too, but it would be nothing for a lot of academic departments if the line item could be "characterized" the right way. It has been a bit of a nuisance having libgen down the last couple of months, like the post mentioned, and I would have loved a local copy. I don't see it happening, but if libgen/sci-hub/annas archive goes the way of napster/scour, many academics would be in a serious fix.
It's 167.5, not ~200, and you can get disks much larger than 16 TB these days - a quick check shows 30 TB drives being sold in normal consumer stores, although ~20 TB disks may still be more affordable per byte.
In text form only (no charts, plots, etc.) - yes, pretty much all published 'science' (by that I mean something that appeared in a mass publication - paper, book, etc. - not simply notes in people's notebooks) in the last 400 years likely fits into 20 TB or so if converted completely to ASCII text and everything else is left out. Text is tiny.
The problem is it's not all text: you need the images, the plots, etc., and smartly, interstitially compressing the old stuff is still a very difficult problem, even in this age of AI.
I have an archive of about 8 TB of mechanical and aerospace papers dating back to the 1930s, and the biggest of them are usually scanned-in documents, especially stuff from the 1960s and 70s, with lots of charts and tables that take up a considerable amount of space even in black and white, due to how badly old scans compress (noise on paper prints, scanned in, just doesn't compress). Also, many of those journals have the text compressed well, but they have a single, color, HUGE cover image as the first page of the PDF that turns it from 2 MB into 20 MB. Things like that could, maybe, be omitted to save space...
But as time goes on I become more and more against space-saving via truncation of those kinds of scanned documents. My reasoning is that storage is getting cheaper and cheaper, and at some point the cost to store and retrieve those 80-90 MB PDFs that are essentially page-by-page image scans is going to be completely negligible. And I think you lose something by taking those papers and stripping the covers out, or OCR'ing the typed pages and re-typesetting them in Unicode (de-rasterizing the scan), even when done perfectly (and when not done perfectly, you get horrible mistakes, especially in things like equations). I think we need to preserve everything at a quality level that is nearly as high as can be.
> In text form only (no charts, plots, etc.) - yes, pretty much all published 'science' (by that I mean something that appeared in a mass publication - paper, book, etc. - not simply notes in people's notebooks) in the last 400 years likely fits into 20 TB or so if converted completely to ASCII text and everything else is left out. Text is tiny.
20 TB of uncompressed text is roughly 6 TB compressed.
I just find it crazy that for about $100 I can buy an external hard drive that would fit in my pocket and can, in theory, carry around the bulk of humanity's collected knowledge.
What a time to be alive. Imagine telling someone this 100 years ago. Hell, imagine telling someone this 20 years ago.
There is a post on Anna's Archive's blog about exactly that: we basically have to hold on until (open source) OCR solutions are good enough, and then it suddenly starts to become feasible to have all the world's published knowledge on your computer.
Claude Code has two usage modes: pay-per-token or subscription. Both use the API under the hood, but with the subscription you only pay a fixed amount per month.
Each subscription tier has some undisclosed usage limits; cheaper plans have lower ones.
So I would recommend paying $20 and trying Claude Code via that subscription.
I'm looking for Cursor alternatives after the confusing pricing changes. Is Claude Code an option? Can it be integrated into an editor/IDE for similar results?
My use case so far is usually requesting mechanical work I would rather describe than write myself, like certain test suites, and sometimes discovery on messy code bases.
If you like an IDE, for example VS Code, you can have the terminal open at the bottom and run Claude Code in that. You can put your instructions there, and any edits it makes are visible in the IDE immediately.
Personally I just keep a separate terminal open and have the terminal and VSCode open on two monitors - seems to work OK for me.
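Concretely, it looks something like this (assuming Node/npm is available; the package below is Anthropic's CLI package, but double-check the current install docs since commands may change):

# in the VS Code integrated terminal, from your project root
npm install -g @anthropic-ai/claude-code   # one-time install of the CLI
claude                                     # start an interactive session in the current repo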
> There's an obvious question looming here — if the models got so confused, how did they consistently pass the reconciliation checks we described above? It may seem like the ability to make forward progress is a good proxy for task understanding and skill, but this isn't necessarily the case. There are ways to hack the validation check – inventing false transactions or pulling in unrelated ones to make the numbers add up.
This is hilarious. I wonder if someone is unintentionally committing fraud by blindly trusting LLMs with accounting.
Or even worse, I bet that some governments are already trying to use LLMs to make accounting validators. My government sure wants to shove LLMs into digital government services.
Lawyers have used it to write briefs; I would be very surprised if someone, somewhere wasn't slowly running a company into the ground by using ChatGPT or another LLM for accounting.
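To make the quoted failure mode concrete, here's a toy sketch (made-up numbers and a deliberately naive check, not the harness from the article) of how a sum-only reconciliation check can be gamed by inventing a balancing entry:

# Toy example: a reconciliation check that only verifies totals match.
ledger = [("rent", -1200.00), ("sales", 3000.00)]
bank_balance = 1650.00   # what the (made-up) bank statement says

def reconciles(entries, balance):
    return abs(sum(amount for _, amount in entries) - balance) < 0.01

print(reconciles(ledger, bank_balance))   # False: the books are off by 150

# An agent graded only on passing this check can "fix" the books by
# inventing a transaction that absorbs the difference.
ledger.append(("misc adjustment", -150.00))
print(reconciles(ledger, bank_balance))   # True, yet the entry is fictional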
[about the website design] As a bonus for my fellow privacy schizos, the page works fine with 3rd party frames and 3rd party scripts disabled on uBlock, and still looks very good with no remote fonts and no large media. Quite an accomplishment for such a cool looking page
I'm sure that any accounting trick that an LLM can think of is something that is also used by some shady human accountants. The proper response should not be to avoid/prohibit AI but to improve the validation mechanisms.
Counterpoint: if you detect a human accountant doing this, you can take action against the human. Computers will never meaningfully take the blame, and unfortunately that usually means not blaming any human either.
I think that will depend on the case. I don't have any recent examples, but I recall someone trying to sue one of those strip-mall tax preparation franchises over incorrect filings. My understanding is that the documents you sign when you enroll in those services are pretty strictly in the company's favor. I doubt you could ever go after the specific "human" that made the error, even if it was maliciously done.
In the same way, if you pay for a tax service that uses AI agents, what you can and cannot "take action" for will probably be outlined in the terms of service that you accept when you sign up.
I would guess millions of people already use software-based tax filing services (e.g. TurboTax) where no human at all is in the loop. I don't understand how swapping in an LLM significantly changes the liability in those cases. The contract will be between you and the entity (probably a corporation), not you and "computers".
No, I think in this particular case the proper response is for honest companies to avoid any systems which invent nonexistent transactions to reconcile books.
Most businesses don’t want to misrepresent their books, irrespective of the existence of shady accountants.
> The DCO requires that the contribution be “created by me,” yet in many jurisdictions AI-generated code is not recognized as a copyright-protected work.
I get that, but what are the particular examples of such jurisdictions?
For example, when I run the linter that fixes my code formatting, no one will think that I did not create it.
What about autogenerated code? Is it not copyright-protected?
> do I have to pay API based costs
Usually, yes, you do.
However, in this case, opencode kinda cheats by using Anthropic's client ID and pretending to be Claude Code, so it can use your existing subscription.
> We recommend signing up for Claude Pro or Max, running opencode auth login and selecting Anthropic. It’s the most cost-effective way to use opencode.
https://opencode.ai/docs/
> “Anybody who wants to put Valve out of business could do so, but nobody cares,” argues Michael Pachter, a gaming industry analyst at Wedbush Securities.
This is untrue; many tried. Almost every major publisher has its own launcher.
The problem with them all is they absolutely suck.
Even the Epic Games Store, the biggest competitor with the most money poured into it, is ridiculously bad in almost every way.
Aside from the lack of a network effect, it is missing most of the QoL features, and is slow, ugly, and very unpleasant to use.
There is almost universal agreement in the PC gaming community that EGS is just a bootloader for the free games it throws at users.
Every time I use EGS, I am amazed by how bad it is, despite probably tens of millions in investment.
The second point that the article completely misinterprets is Microsoft's role.
Microsoft (Xbox specifically) is hands down the closest company to beating Steam at its own game. PC Game Pass provides a constant stream of very good games available on day one for dirt cheap. The work Microsoft is doing on optimizing Windows for games in general, and for handheld consoles in particular, is very promising (see the Xbox Ally X).
This is the threat that Valve faces. Not just a better store, but the absence of a store and "buying games" in general. For example, I intended to buy The Outer Worlds 2 on launch. Now that I know it will be available day one on Game Pass, there is almost zero chance that I will buy it on Steam or anywhere else.
Their anti-DRM stance is arguably their only compelling feature, as opposed to just buying on Steam.
I remember the old game launcher called Desura(?), which a lot of those third-party key sites would sell keys for, and it's no longer around. So one of the main concerns with these online platforms is that they disappear and you lose all your stuff.
Quite frankly, I am happy that GOG exists for this niche. It would be less "sabotaging" if consumers would actually vote in their own interests. However, this has been shown time and time again to never happen, sadly.
I suspect by "anybody" they meant anyone outside of the "relatively small but healthy sector", and therefore competitors such as Epic or GOG wouldn't count. Outside analysts think of games as a subset of tech and assume that big tech companies would merely need to turn the Eye of Sauron towards games and they could conquer the market.
I recently switched from Pro to $100 Max, and the only difference I've found so far is higher usage limits.
Anthropic tends to give shiny new features to Max users first, but as of now there is nothing Max-only.
For me, it's a good deal nonetheless, as even $100 Max limits are huge.
While on Pro, I hit the limits each day that I used Claude Code. Now I rarely see the warning, but I never actually hit the limit.