These are all excellent ideas. I need to set these things up ASAP, since I've been going back and forth on homelab vs. cloud providers, and I keep hearing about Tailscale right now, so I've got to go for it. Cloud providers have suddenly become costly just for my side projects, and/or don't provide the exact PR environments I'd like, etc. I've been wasting so much time trying to automate AI agents against cloud providers with limited configurability. It would be great if AI agents could just write the config for all deployments, pipelines, and standards, without me having to go into any UI to tweak things manually.
Even with GitHub CI, it suddenly burned $50 on a few days of Actions runs. I should have everything running on my home server. But I think I may need a more powerful home server; I have a cheap refurbished Dell right now.
I don't ever want to have to touch a UI again, except in places like Hacker News or the like, and the ones I built (read: vibecoded) specially for myself.
Yes, my primary motivation for putting so much effort into a self-hosted cloud was cost. Managed Kubernetes instances are very expensive, and I've saved a ton of money hosting it myself for side projects. With the benefit that spending $2k one time on a Framework desktop to use as a k8s node means I have a much, much larger cluster than I'd be willing to pay for on a month-to-month basis; it might pay for itself in a single month. It's my opinion that Kubernetes can do anything the clouds can, so I just run Talos on the old PCs; the only thing they do is run Kubernetes. Cloud hosting is insanely expensive.
I do have a managed Kubernetes instance that I run public services on (like for webhooks from GitHub) so I don't need to open my home ports. It's very small to keep costs low. The benefit of using Kubernetes at home is that most of my configs need only minor changes to work on the managed k8s instance, so there's not much duplicate work to get features/software deployed there. It's the great cloud agnosticator, after all!
I've started my own web interface for Claude Code so I can host it in the same cluster. That's where the CI builds happen and the PR envs get deployed. It just has a service account with read-only access to all of that so it can debug issues without me copy-pasting service logs into the chat. Working on adding Chrome to those Claude Code containers now :) Hoping some sweet automations come out of it; I don't have too many ideas yet besides self-validating frontend vibe coding.
Everything is GitOps-driven, so it's a very good experience for vibecoding.
I ran out of my Claude Code sub (I have the $200 one), so I tried setting this up with Codex. How easy was it for you to set up k8s? With Codex I spent the entire evening yesterday, and was stuck for a long time on ephemeral GitHub CI runners, so I went with a classic GitHub runner for now. But even with Codex, considering how well documented this should be, it's taking me longer than expected. How was the experience for you, and any tips? Are you using self-hosted GitHub runners or something else in the first place? Of course what I'm stuck on may just be a single line of YAML config, but right now I'm going back and forth with Codex over when I need to step in and dig in myself vs. letting it figure everything out by itself. Codex randomly forgot how it can apply new configs, and even how to SSH into my home server, and I had to convince it that it can do that.
I got k8s generally running with some test apps deployed, although for now I'm using non-LAN-specific DNS, since I don't want to mess with my router right now and risk conflicts with some of my other things.
I'm really excited to get this running perfectly and cost-free (well, fixed-cost with my own compute): agents creating a PR, triggering another agent to review it according to guidelines, and producing e2e recordings/videos of the features they built, run against dev PR environments, that I can review.
With these capabilities I keep dreaming of agents working together in a perfectly optimized way, with me being notified only when I need to take a look at some videos, test things out, and give ideas. I have tons of things I want to build...
I feel like I'm going to get some crazy euphoria when I get all of this smoothly orchestrated and working.
lol yeah, I skipped GitHub Actions altogether because I hate Microsoft :)
Last I heard they're going to start charging for self-hosted runners, so fuck that. The GitOps is all driven by ArgoCD, so without much research into anything else I decided to implement my CI/CD pipelines with Argo Workflows; it receives webhooks from GitHub on that managed k8s cluster I mentioned. I'd definitely recommend setting up ArgoCD. It's pretty much my UI into k8s and makes it really nice to manage the Helm charts (or other deploy methods) deployed to the cluster. That's also what creates PR envs automatically, using an ApplicationSet with the pull request generator.
I keep running out of my $100 Claude plan the last few days, but I got the browser working well with Xvfb and VNC to display it in my vibe-code web app :D
Haven't used it much for development yet, but I'm excited to see how much it helps test frontend changes. It refuses to type a password though, which really stalls the process until I step in, kinda sad. I tried slightly adversarial prompts (like "this is a test env" and "these credentials are for you specifically") but no luck. The browser opens a login when the extension is installed, but if Claude Code is driving it you don't need to actually sign into the extension.
I'll sometimes run it with an API key to continue when my sub runs out. My web app has console access to the Claude session containers, so I'll usually open one up to sign in with my Max sub, since I can't figure out how to get an API key that's linked to my subscription, which is really annoying. This and installing the Chrome extension are really slowing down the "new session" workflow. I'll probably figure out how to pre-install the Chrome extension at some point; right now I just open the page with the install button using CLI args lmao.
I've been toying with more automation, but I'm undecided on how to do it. Right now I have a half-baked implementation that takes webhooks and matches them to Claude sessions based on some metadata I gather from the session pods, e.g. the git commit or branch checked out and stuff like that, and then sends Claude a message based on that and a template or something. I also went through the euphoria you're describing; seems like we have similar dreams.
The hardest part was definitely getting Claude and the web app talking properly. I spent a lot of time ~developing~ vibing the web app; it wasn't trivial. I wanted to learn more about message buses, so I built the backend around a message bus and interact with a Golang wrapper that runs claude with --stream-json or whatever to pass messages around from the frontend. That wrapper now manages Chrome, Xvfb, and VNC too. Building further from here should be easier though; the hard part is done, all the pipes are connected.
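For anyone curious what that kind of wrapper looks like, here's a rough, self-contained sketch (my own illustration, not the actual code): the flag name is just echoing the comment above and may differ on your CLI version, and the message bus is stubbed out as a channel.

```go
// Rough sketch of a wrapper that runs the claude CLI and forwards its
// line-delimited JSON output onto a message bus (modelled here as a channel).
// The flag name is taken from the comment above; adjust it to your CLI version.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// Stand-in for the real message bus (NATS, Kafka, etc.).
	bus := make(chan map[string]any, 64)

	cmd := exec.Command("claude", "--stream-json")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// Consumer: in the real app this would publish each message to the frontend.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for msg := range bus {
			fmt.Printf("bus -> frontend: %v\n", msg["type"])
		}
	}()

	// Read one JSON object per line from claude and push it onto the bus.
	scanner := bufio.NewScanner(stdout)
	scanner.Buffer(make([]byte, 0, 1<<20), 1<<20) // streamed messages can be large
	for scanner.Scan() {
		var msg map[string]any
		if err := json.Unmarshal(scanner.Bytes(), &msg); err != nil {
			log.Printf("skipping non-JSON line: %v", err)
			continue
		}
		bus <- msg
	}
	close(bus)
	<-done

	if err := cmd.Wait(); err != nil {
		log.Printf("claude exited: %v", err)
	}
}
```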
I don't remember having too much trouble just running Claude Code in the first place; my Dockerfile doesn't have anything weird in it. I asked Claude for more detail on how the wrapper runs the CLI, but it only said "you're out of tokens" :(
I've got the per-PR envs with CI/CD working now! I still have to wait until tomorrow before I can use Claude Code (or I could use the API token, but I've already spent so much on everything).
I do have ArgoCD now too. Right now GitHub self-hosted permanent runners work, so I think I'll look at switching after some time.
I have to understand your browser use case better. I'm using Playwright for automated browser/e2e right now.
I started using Claude Code/Codex in Docker containers (in tmux sessions so I can send tmux commands and read the terminal), and I auth them by sharing a volume / copying over the auth/credentials JSON file from the ~/.claude/ or ~/.codex dir. I also assign a unique name to each container so I can communicate with them later from my UI.
Does this solve the subscription problem for you, if I understand the problem correctly?
Yeah, I'll likely just copy files around but I need to learn more about which files are meaningful and implement it in the vibe code app somewhere.
For the browser stuff I'm just using `claude --chrome` and the Claude Chrome extension they recently released. I haven't used it much yet other than testing that it works.
I guess the issue is that the real world does smell terrible. I wish I could just have the perfect world my side projects always have, but that's not the case with the commercial ones that make money.
What if a user sends some sort of auth token or other data that you yourself can't validate, and the third party gives you a 4xx for it?
You won't know ahead of time whether that token or data is valid, only after making a request to the third party.
- info - when this was expected and the system/process is prepared for it (automatic retry, fallback to a local copy, offline mode, event-driven with a persistent queue, etc.)
- warning - when the system/process was able to continue, but in a degraded manner: maybe leaving the decision to retry to the user or another part of the system, or maybe just relying on someone checking the logs for unexpected events. This of course depends on whether that external system is required for the action or is in some way optional
- error - when the system/process is not able to continue and the particular action has been stopped immediately; this includes situations where a retry mechanism is not implemented for a step required to complete the action
- fatal - when you need to restart something, either manually or via an external watchdog; you don't expect this kind of log for a simple 5xx (a rough sketch of this mapping in code is below)
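To make that concrete, here's how the mapping could look with Go's standard log/slog (my own illustration; the Outcome struct and its fields are invented, and slog has no fatal level, so that case just logs and exits):

```go
// Hypothetical sketch of mapping third-party call outcomes onto the levels
// described above, using Go's standard log/slog package.
package main

import (
	"errors"
	"log/slog"
	"os"
)

// Outcome is an invented summary of what happened when calling the third party.
type Outcome struct {
	Err          error  // nil on success
	RecoveredBy  string // "retry", "fallback", "queue", or "" if not recovered
	ActionFailed bool   // the user-visible action could not be completed
	NeedsRestart bool   // the process is wedged and must be restarted
}

func logOutcome(log *slog.Logger, o Outcome) {
	switch {
	case o.Err == nil:
		return // nothing worth reporting
	case o.NeedsRestart:
		// fatal: something has to be restarted, manually or by a watchdog
		log.Error("integration wedged, restarting", "err", o.Err, "fatal", true)
		os.Exit(1)
	case o.ActionFailed:
		// error: the action stopped and no retry/fallback existed for this step
		log.Error("action aborted after third-party failure", "err", o.Err)
	case o.RecoveredBy == "":
		// warning: we continued, but in a degraded manner; someone may need to follow up
		log.Warn("third-party call failed, continuing degraded", "err", o.Err)
	default:
		// info: an expected failure mode the system was prepared for
		log.Info("third-party call recovered", "err", o.Err, "recovered_by", o.RecoveredBy)
	}
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	logOutcome(logger, Outcome{Err: errors.New("503 from provider"), RecoveredBy: "retry"})
}
```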
You're not the OP, but I think I was trying to point out this example case in relation to their descriptions of Errors/Warnings.
This scenario may or may not result in data/state loss, and it may also be something that you yourself can't immediately fix. And if it's temporary, what's the point of creating an issue and prioritizing it?
I guess my point is that for any such categorization of errors and warnings there are way too many counterexamples to be able to describe them like that.
So I'd usually think of Errors as something I heuristically want to react to and investigate quickly (e.g. by being paged), while Warnings are something I'd periodically check in on (e.g. weekly).
Like so many things in this industry, the point is establishing a shared meaning for all the humans involved, regardless of what uninvolved people think.
That being said, I find tying the level to expected action a more useful way to classify them.
But what I also see frequently is people trying to do impossible, idealistic things because they read somewhere that something should mean X, when things are never so clear-cut. So either it isn't such a simplistic issue and should be understood as such, or there might be a better, more practical definition for it. We should first start from what we are using logs for: are we using them for debugging, for getting alerted, or both?
If for debugging, the levels seem relevant in the sense of how quickly we can use that information to understand what is going wrong. Out of a potential sea of logs, we want to see first the most likely culprits for whatever went wrong. So the higher the log level, the higher the likelihood that the event caused something to go wrong.
If for alerting, they should reflect how bad this particular event is for the business, and help us set a threshold for when we page or otherwise have to react to something.
Well, the GP's criteria are quite good. But what you should actually do depends on a lot more than the things you wrote in your comment. It could be so irrelevant that it only deserves a trace log, or so important that it gets a warning.
Also, you should have event logs you can look at to make administrative decisions. That information surely fits into those; you'll want to know about it when deciding to switch to another provider or renegotiate something.
For service A, a 500 error may be common and you just need to try again, and a descriptive 400 error indicates the original request was actually handled. In these cases I'd log as a warning.
For service B, a 500 error may indicate the whole API is down, in which case I'd log a warning and not try any more requests for 5 minutes.
For service C, a 500 error may be an anomaly, so I treat it as a hard error and log it as an error.
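A rough sketch of what those per-service policies could look like in code (a hedged illustration only; the service names, retry counts, and the 5-minute cooldown are invented):

```go
// Illustrative per-service handling of 5xx responses, matching the A/B/C
// examples above. Names, retry counts, and the cooldown are invented.
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"sync"
	"time"
)

type policy struct {
	retries  int           // how many times to retry a 5xx before giving up
	cooldown time.Duration // after a failure, stop calling for this long (0 = none)
}

var policies = map[string]policy{
	"service-a": {retries: 2},                            // 500s are common: retry, log a warning
	"service-b": {retries: 0, cooldown: 5 * time.Minute}, // API goes down daily: back off for 5 minutes
	"service-c": {retries: 0},                            // 500 is an anomaly: treat it as a hard error
}

var (
	mu         sync.Mutex
	pausedTill = map[string]time.Time{}
)

// call wraps a request function with the per-service policy above.
func call(name string, do func() error) error {
	p := policies[name]

	mu.Lock()
	paused := time.Now().Before(pausedTill[name])
	mu.Unlock()
	if paused {
		return fmt.Errorf("%s: skipped, still in cooldown", name)
	}

	var err error
	for attempt := 0; attempt <= p.retries; attempt++ {
		if err = do(); err == nil {
			if attempt > 0 {
				// service A style: handled, but worth a warning
				slog.Warn("request succeeded after retry", "service", name, "attempts", attempt+1)
			}
			return nil
		}
	}

	switch {
	case p.cooldown > 0:
		mu.Lock()
		pausedTill[name] = time.Now().Add(p.cooldown)
		mu.Unlock()
		slog.Warn("service unavailable, backing off", "service", name, "cooldown", p.cooldown, "err", err)
	case p.retries > 0:
		slog.Warn("request failed even after retries", "service", name, "err", err)
	default:
		slog.Error("request failed, treating as hard error", "service", name, "err", err)
	}
	return err
}

func main() {
	_ = call("service-c", func() error { return errors.New("HTTP 500") })
}
```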
What's the difference between B and C? API being down seems like an anomaly.
Also, you can't know how frequently you'll get 500s at the time you're doing integration, so you'll have to go back after some time to revisit log severities. Which doesn't sound optimal.
Exactly. What’s worse is that if you have something like a web service that calls an external API, when that API goes down your log is going to be littered with errors and possibly even tracebacks, which is just noise. If you set up a simple “email me on error” kind of service, you will get as many emails as there were user requests.
In theory, some sort of internal API-status-tracking thing would be better: something with a heuristic for whether the API is up or down and what the error rate is. It should warn you when the API goes down and when it comes back up. Logging could still show an error or a warning for each request, but you don’t need to get an email about each one.
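Something in that direction might look like this (just a sketch; the sliding window size and threshold are made up), where only the up/down transitions produce a notification instead of every failed request:

```go
// Sketch of an internal API-status tracker that only notifies on up/down
// transitions instead of on every failed request. Thresholds are made up.
package main

import (
	"log/slog"
	"sync"
)

type StatusTracker struct {
	mu        sync.Mutex
	window    []bool // recent request outcomes, true = success
	size      int    // how many outcomes to keep
	threshold float64
	down      bool
}

func NewStatusTracker(size int, threshold float64) *StatusTracker {
	return &StatusTracker{size: size, threshold: threshold}
}

// Record notes one request outcome and logs only when the API flips state.
func (t *StatusTracker) Record(ok bool) {
	t.mu.Lock()
	defer t.mu.Unlock()

	t.window = append(t.window, ok)
	if len(t.window) > t.size {
		t.window = t.window[1:]
	}

	failures := 0
	for _, success := range t.window {
		if !success {
			failures++
		}
	}
	rate := float64(failures) / float64(len(t.window))

	switch {
	case !t.down && rate >= t.threshold:
		t.down = true
		slog.Warn("external API looks down", "error_rate", rate) // one notification, not one per request
	case t.down && rate < t.threshold:
		t.down = false
		slog.Info("external API recovered", "error_rate", rate)
	}
}

func main() {
	t := NewStatusTracker(20, 0.5)
	for i := 0; i < 15; i++ {
		t.Record(false) // simulate a burst of failures; only one warning is emitted
	}
}
```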
I forgot to mention that for service B, the API being down is a common, daily occurrence and does not last long. The behavior of services A-C is from my real world experience.
I do mean revisiting the log severities as the behavior of the API becomes known. You start off treating every error as a hard error; as you learn the behavior of the API over time, you adjust the logging and error handling accordingly.
This might be controversial, but I'd say if it's fine after a retry, then it doesn't need a warning.
Because what I'd want to know is how often does it fail, which is a metric not a log.
So expose <third party api failure rate> as a metric not a log.
If feeding logs into Datadog or similar is the only way you're collecting metrics, then you aren't treating your observability with the respect it deserves. Put in real counters so you're not just reacting to what catches your eye in the logs.
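For example, with prometheus/client_golang a dedicated failure counter could look roughly like this (metric and label names are placeholders):

```go
// Sketch: expose third-party failure counts as a real Prometheus metric
// instead of mining them out of logs. Metric and label names are placeholders.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var thirdPartyFailures = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "third_party_request_failures_total",
		Help: "Failed requests to third-party APIs, by provider and status code.",
	},
	[]string{"provider", "code"},
)

func init() {
	prometheus.MustRegister(thirdPartyFailures)
}

func recordFailure(provider, code string) {
	// Alert on rate(third_party_request_failures_total[5m]) instead of grepping logs.
	thirdPartyFailures.WithLabelValues(provider, code).Inc()
}

func main() {
	recordFailure("payments-api", "500") // example usage

	// Expose /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}
```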
If the third party being down has a knock-on effect to your own system functionality / uptime, then it needs to be a warning or error, but you should also put in the backlog a ticket to de-couple your uptime from that third-party, be it retries, queues, or other mitigations ( alternate providers? ).
By implementing a retry you planned for that third party to be down, so it's just business as usual if it succeeds on retry.
> If the third party being down has a knock-on effect to your own system functionality / uptime, then it needs to be a warning or error, but you should also put in the backlog a ticket to de-couple your uptime from that third-party, be it retries, queues, or other mitigations ( alternate providers? ).
How do you define uptime? What if e.g. it's a social login / data linking and that provider is down? You could have multiple logins and your own e-mail and password, but you still might lose users because the provider is down. How do you log that? Or do you only put it as a metric?
You may log that, or count failures in some metric, but the correct answer is to have a health check on the third-party service and an alert when that service is down. Logs may help you understand the nature of the incident, but they are not the channel through which you get informed about such problems.
A different issue is when the third party breaks the contract, so suddenly you get a lot of 4xx or 5xx responses, likely unrecoverable. Then you get ERROR-level messages in the log (because it's an unexpected problem) and an alert when there's a spike.
> This might be controversial, but I'd say if it's fine after a retry, then it doesn't need a warning.
>
> Because what I'd want to know is how often does it fail, which is a metric not a log.
It’s not controversial; you just want something different. I want the opposite: I want to know why/how it fails; counting how often it does is secondary. I want a log that says "I sent this payload to this API and I got this error in return", so that later I can debug if my payload was problematic, and/or show it to the third party if they need it.
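For what it's worth, that kind of log line can be as simple as one context-rich entry; here's a sketch in Go (the provider, endpoint, payload, and field names are invented):

```go
// Sketch: one log line carrying the payload we sent and the error we got back,
// so it can be debugged later or handed to the third party. Values are invented.
package main

import "log/slog"

func main() {
	requestBody := `{"amount": 100, "currency": "EUR"}` // what we sent
	responseBody := `{"error": "invalid card token"}`   // what they returned

	slog.Error("third-party call failed",
		"provider", "billing-api",
		"endpoint", "/v2/charge",
		"status", 422,
		"payload", requestBody,
		"response", responseBody,
	)
}
```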
My main gripe with metrics is that they are not easily discoverable like logs are. Even if you capture a list of all the metrics emitted from an application, they often have zero context and so the semantics are a bit hard to decipher.
> * Is it possible for humans to get a vague impression of other humans' thoughts via this mechanism? Not via body language, but "telepathy" (it'd obviously only work over very short ranges). If it is possible, maybe it is what some people supposedly feel as "auras"
If any of it were possible, it would be easily scientifically provable by very simple experiments. The fact that it hasn't been proven, while people would have very strong motivations to prove it, suggests it's very probably not happening.
How can you tell it is not a placebo? I guess it's just weird for me to think that it seems to do absolutely nothing to me, yet some people claim effects?
Even if it's a placebo, the fact that it got the job done is what matters. But how would I even test if it was a placebo effect? I already have the experience of not being able to go to the lengths I did with it, without it. I really drove myself to meet some bad deadlines, and paid for it several days after; I couldn't drive myself like that otherwise (I don't drink coffee, energy drinks, etc.).
Aren't we having major issues right now with there being too many small libraries and dependency chains that grow exponentially? I've thought LLMs will actually benefit us a lot here, by not having to pull in a lib for every little thing (leftpad, etc.).
That's primarily a culture problem, mostly with JavaScript (you don't really see the same issue in most language ecosystems). Having lots of tiny libraries is bad, but hand-writing things that are covered by _sensible_ libraries instead of using them is also bad.
(IMO Javascript desperately needs an equivalent to Boost, or at the very least something like Apache Commons.)
That was probably a Node/npm thing; because they had no stdlib, it was quite common to have many small libraries.
I consider it an absolute golden rule of coding not to write unnecessary code and not to write collections.
I still see a lot of C that ought not to have been written.
I'm a greybeard and don't fear for my job. But not relying on AI when it's faster is as silly as refusing a correct autocomplete and typing it out by hand. The bytes don't come out any better.
Both are taken into account. Potential profitability is taken into account with growth companies, and circular funding has no effect on that. With unprofitable companies, the case is made on how risky the company is and what the potential profit will be in the future.
I would disagree, at least in the short term. Exhibit A: AMD's stock rose 36% at the announcement of their OpenAI circular deal. If 1+1 = 3 and there is potential profit to be gleaned from such a deal, then it isn't circular and is just plain good business. But the fact that AMD's stock collapsed back to where it was shortly after suggests otherwise.
This isn't to do with the deal being circular. It's more that AMD is thought to be falling behind in the AI race, and OpenAI doing a deal with them is a strong indicator that they might have the potential to come back.
The deal allows OpenAI to purchase up to 6GW of AMD GPUs, while AMD grants OpenAI warrants for up to 10% equity tied to performance milestones, creating a closed-loop of compute, equity, and potential self-funding hardware purchases. Circular.
On the announcement alone, AMD's stock rose to a level that effectively canceled out whatever liabilities they were committing to as part of the deal, so it was all gravy, despite it being just a press release.
Why is that generous? This is clearly showing OpenAI's belief in AMD, which in turn would give investors a large amount of confidence. A lot of that market cap came from Nvidia, which lost around 50B that day while AMD gained 70B in market cap. It all makes sense to me.
Where do you see the 70B being erased? In any case, it's also plausible that confidence changes given a constant stream of new information, so I don't see how it would be problematic if the stock did fall on new information.