Sorry, but AI still seems to be trash at anything moderately more complex than b...

bogtog · 2025-12-24T12:40:22 1766580022

> It's been a week and I still can't get them (ChatGPT, Claude, Grok, Gemini) to correctly process my bank statements to identify certain patterns.

Can you give any more details on what you mean? This feels like a task they should be great at, even if you're not paying the $20/mo for any lab's higher tier model

Razengan · 2025-12-24T12:52:28 1766580748

I have a couple banks that are peculiar in the way they handle transactions made in a different currency while traveling etc. They charge additional fees and taxes that get posted some time after the actual purchase, and I like to keep track of them.

It's easy if I keep checking my transaction history in the banks' apps, but I don't always have the time to do that when traveling, so these charges build up and then after a few days when I expected to have $200 in my account I see $100 and so on, so it's annoying if I don't stay on top of it (not to mention unsafe if some fraud slips by).

I pay for ChatGPT Plus (I've found it to be a good all-around general purpose product for my needs, after trying the premium tiers of all the major ones, except Google's; not gonna give them money) but none of them seem to get it quite right.

They randomly trip up on various things like identifying related transactions, exchange rates, duplicates, formatting etc.

> This feels like a task they should be great at

That's what I thought too: Something that you could describe with basic guidelines, then the AI's "analog" inference/reasoning would have some room in how it interprets everything to catch similar cases.

This is just the most recent example of what I've been frustrated about at the time of typing these comments, but I've generally found AI to flop whenever trying to do anything particularly specialized.

CPLX · 2025-12-24T13:06:32 1766581592

If you installed Claude Code and put all your statements into a local folder and asked it to process them it could do literally anything you could come up with all the way up to setting up an AWS instance with a website that gives nifty visualizations of your spending. Or anything else you are thinking of.

Razengan · 2025-12-24T13:15:57 1766582157

I may try that, but at this point it's already more work wrestling with the AI than just doing it myself.

The most important factor is confidence: After seeing them get some things mixed up a few times, I would have to manually verify the output myself anyway.

----

Re: the multiple comments that suggest asking AI for code instead of feeding data to the chatbot:

I get what you mean, but I WANT the AI's non-deterministic AIness in this case!

For example, in some countries there are these "omni apps" that can be used for ride hailing or ordering food etc. The bank statement lists all such transactions with the same merchant name. I want the AI to do its AI thing to guess which transactions were rides and which were food deliveries, based on the prices and times etc. Like if there are multiple small transactions those are taxis, and the most expensive transactions during a day are my lunch and dinner.

And there are other cases that would take too much "imperative" code, which would fail anyway.

Like I said, this is a task that any human could do easily after a short explanation, but takes a hell of a lot of wrangling with AI.
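For comparison, the rule of thumb above (small charges are rides, the priciest charges of a day are meals) can be written as a naive baseline. This is only a sketch: the function name `label_omni_app`, the `(date, amount)` tuple layout, and the `meals_per_day=2` cutoff are assumptions for illustration, and the last sample row shows how quickly such a fixed rule misfires.

```python
from collections import defaultdict


def label_omni_app(txns, meals_per_day=2):
    """Label each (date, amount) pair as "food" or "ride".

    Hypothetical heuristic: the `meals_per_day` largest charges on a
    given day are guessed to be meals, everything else a ride.
    """
    by_day = defaultdict(list)
    for idx, (day, amount) in enumerate(txns):
        by_day[day].append((amount, idx))

    labels = ["ride"] * len(txns)
    for entries in by_day.values():
        # Sort descending by amount and mark the top charges as meals.
        for _amount, idx in sorted(entries, reverse=True)[:meals_per_day]:
            labels[idx] = "food"
    return labels


# Made-up sample data: amounts are positive spend in one currency.
sample = [
    ("2025-12-01", 3.5),
    ("2025-12-01", 18.0),
    ("2025-12-01", 4.0),
    ("2025-12-01", 22.5),
    ("2025-12-02", 2.0),  # a lone cheap ride, mislabeled as "food"
]
labels = label_omni_app(sample)
```

A day with a single small charge gets labeled as a meal, which is exactly the kind of edge case where rigid rules fall short and fuzzier judgement is wanted.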

CPLX · 2025-12-24T14:17:40 1766585860

I had the same vague impression as you did when using AI via browser/chat interaction. Like it’s very impressive but how useful is it really?

Using it via the CLI approach is an entirely different experience. It’s literally shocking what you can do.

For context, among many other things I have done this exact thing I am recommending. I just hit export on a Quickbooks instance of a complex multimillion dollar business and had Claude Code generate reports on various things I wanted to optimize and it just handles it in seconds.

The real limit to these tools is knowing what to ask for and stating the requirements clearly and incrementally. Once you get the hang of it, it’s literally shocking how many use cases you can find.

scotty79 · 2025-12-24T19:59:49 1766606389

I think a good mental model of what you can expect from a chat bot is imagining that somebody read the bank statement to you and then asked you a bunch of questions. Could you follow that, not make any mistakes, not forget anything? Can you perform the task "off the top of your head", not writing anything down, not pulling up Excel or a calculator? If you can, there's a good chance AI will be able to do that too. The fact that it sometimes can do more is a pure miracle. And if you want it to do those things consistently, you need to provide it with access to the tools you'd need to perform this task consistently.

Razengan · 2025-12-25T04:04:11 1766635451

> somebody read the bank statement to you…

But it's not that. I'm GIVING it the data.

It's simple, I can do it myself:

Go row by row. See a certain phrase in the transaction description? Look a few rows ahead. Spot associated fees with just a glance. Write that group of transactions down somewhere else.

That's it.

I tried different kinds of prompts, from imperative to declarative, including telling the AI to write a script for its own internal use, but they just don't seem to get it.
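As a point of reference, the row-by-row procedure above (spot a phrase, look a few rows ahead, collect the associated fees) might be sketched like this. Everything specific here is an assumption: the marker phrases `FOREIGN PURCHASE`, `FX FEE` and `INTL TAX`, the `LOOKAHEAD` window of 5 rows, and the rule that each purchase takes at most one fee of each kind. Real statements will need different phrases and probably a different matching rule.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    date: str          # ISO date string, e.g. "2025-12-01"
    description: str
    amount: float


# Hypothetical marker phrases; substitute whatever your bank prints.
PURCHASE_MARKER = "FOREIGN PURCHASE"
FEE_MARKERS = ("FX FEE", "INTL TAX")
LOOKAHEAD = 5  # how many rows ahead to scan for late-posted fees


def group_fees(rows):
    """Return a list of (purchase, [fees]) pairs, in statement order."""
    groups = []
    claimed = set()  # indexes of fee rows already attached to a purchase
    for i, row in enumerate(rows):
        if PURCHASE_MARKER not in row.description:
            continue
        fees = []
        kinds_taken = set()
        for j in range(i + 1, min(i + 1 + LOOKAHEAD, len(rows))):
            if j in claimed:
                continue
            kind = next((m for m in FEE_MARKERS if m in rows[j].description), None)
            # Skip non-fee rows and fee kinds this purchase already has.
            if kind is None or kind in kinds_taken:
                continue
            kinds_taken.add(kind)
            claimed.add(j)
            fees.append(rows[j])
        groups.append((row, fees))
    return groups


# Made-up sample statement.
sample = [
    Txn("2025-12-01", "FOREIGN PURCHASE CAFE", -10.00),
    Txn("2025-12-01", "FOREIGN PURCHASE TAXI", -5.00),
    Txn("2025-12-02", "GROCERY STORE", -20.00),
    Txn("2025-12-03", "FX FEE", -0.30),
    Txn("2025-12-03", "INTL TAX", -0.10),
    Txn("2025-12-03", "FX FEE", -0.15),
]
groups = group_fees(sample)
```

The matching is purely positional, so it cannot tell which fee truly belongs to which purchase when several overlap; that ambiguity is where a manual check (or a model's judgement) would still be needed.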

scotty79 · 2025-12-25T09:28:04 1766654884

AI has a purely linear input channel. It gets tokens one by one. Context is a form of short-term memory. I know that, because you give it written text, it seems like you provide it with a document it should be able to process in any way it likes, but the system is set up as if you read the document to the AI word by word and asked questions about it that it needs to answer "off the top of its head".

> It's simple, I can do it myself:

> Go row by row. See a certain phrase in the transaction description? Look a few rows ahead.

Can you do it without looking at the document? Just by ear? Every time correctly? Without missing something?

Razengan · 2025-12-25T13:18:39 1766668719

Whatever the reasons/excuses, the initial assessment stands: AI is still far from "butler" level assistance with anything much beyond simple tasks.

Maybe by next Christmas?

scotty79 · 2025-12-25T19:13:40 1766690020

I think you can find what you are looking for in agentic AIs that can use tools, write programs and execute them even today.

In short, you are holding it wrong. ;-)

dgacmu · 2025-12-24T14:15:28 1766585728

This is exactly why you have it write code instead of analyzing the data. You can have tests, you can inspect the code, you know that the process will be deterministic. The chatbot LLMs are a bad match for bulk data analysis on regular, structured data. But they're often quite decent at writing code.
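To make the "you can have tests" point concrete, here is a minimal sketch of a deterministic check with a test next to it. The function `find_duplicates`, the dict keys `date`, `description` and `amount`, and the definition of a duplicate (same date, same normalized description, same amount) are all assumptions for illustration; two genuine identical purchases on one day would be flagged too.

```python
def find_duplicates(rows):
    """Return indexes of rows that repeat an earlier row.

    Assumed definition of "duplicate": identical date, description
    (ignoring case and surrounding whitespace) and amount.
    """
    seen = set()
    dupes = []
    for i, row in enumerate(rows):
        key = (
            row["date"],
            row["description"].strip().upper(),
            round(row["amount"], 2),
        )
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def test_find_duplicates():
    rows = [
        {"date": "2025-12-01", "description": "Coffee Shop", "amount": -4.50},
        {"date": "2025-12-01", "description": "Bookstore", "amount": -12.00},
        {"date": "2025-12-01", "description": " coffee shop ", "amount": -4.50},
        {"date": "2025-12-02", "description": "Coffee Shop", "amount": -4.50},
    ]
    # Only the third row repeats an earlier one; the fourth is a new day.
    assert find_duplicates(rows) == [2]


test_find_duplicates()
```

The same input always yields the same answer, so once the test passes the output does not need to be re-verified by hand on every run.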

CPLX · 2025-12-24T15:57:35 1766591855

> Like I said, this is a task that any human could do easily after a short explanation, but takes a hell of a lot of wrangling with AI.

Replying to your edit. It just doesn’t. It’s almost effortless and fast to do exactly what you’re describing, capturing the subjective judgement of AI, to do what you want.

It took me a couple of weeks to get very, very good at it, with good results in the first day or two. If you’re a competent programmer you’ll have the same experience, and quickly, if you get into the flow that’s being described to you.

I’m the ultimate skeptic and I understand where you’re coming from, but these workflows are crazy powerful.

darkstarsys · 2025-12-24T13:45:53 1766583953

This is the right answer. Don't just feed the data to a chatbot; have it write code to do what you want, repeatably and testably. You can probably have working python (and a docker container for it) in under 30 min.

bogtog · 2025-12-24T13:18:25 1766582305

Thanks for sharing. I'm surprised you can't just ctrl-a + copy-paste your bank statement and get it to work easily

yeasku · 2025-12-24T13:22:33 1766582553

Don't worry, somebody will tell you it's your fault and then provide zero explanation on how to do it.

azuanrb · 2025-12-24T16:39:20 1766594360

I've been dealing with this in 2 ways:

1. Put a bunch of bank statement PDFs in a folder, give a deterministic output for each PDF. Then ask Claude Code to do whatever I want. Good enough.

2. My preferred approach is similar to the above, but I ask it to write a script instead, e.g. in Ruby. That way I have proper tests, a 100% guarantee it'll work, and no regressions. AI is non-deterministic by default, so asking any kind of agent to give a deterministic output seems unreliable to me. In the end I've turned it into a CLI, and have been using it till now.

That's how I use AI. Indirectly to get what I want. Chat, CLI, it's all just a medium.

cyberrock · 2025-12-24T14:43:55 1766587435

Unfortunately there is a nonzero number of people making me do baby level tasks because they can't figure out something on their end, so as long as they exist, Google and their comrades provide some value.

brap · 2025-12-24T14:43:19 1766587399

I generally agree that they are garbage at producing code beyond things that are trivial. And the fact that non-techies use them as “fact checkers” is also disturbing because they are constantly wrong.

But I have found them to be very helpful for certain things, for example I can dump a huge log file and a chunk of the codebase and ask it to trace the root cause, 80% of the time it manages to find it. Would have taken me many hours otherwise.

jeffbee · 2025-12-24T14:55:27 1766588127

Do you actually pay for all these or are you basing your judgement on the free models (Gemini Fast, etc)?

Anyway the way to succeed in this task is to ask the model to write the program that analyses your bank statements, then read and check the program, and use it.

wepple · 2025-12-24T14:20:39 1766586039

> Sorry, but AI still seems to be trash at anything moderately more complex than baby level tasks.

How familiar are you with the concept of the jagged frontier? That is, AI does indeed fail at things we might expect a third grader to be capable of. However, it is also absolutely exceptional at a lot of things. The trick is A) knowing which is which and B) being able to update yourself when new capabilities are unlocked.

So yeah, it’s unsurprising you found a use case it couldn’t trivially do. But being able to one-shot quite complicated applications that may have taken a day to get right previously is an astonishingly useful thing, no?