Fwiw, in my experience you don't actually need a CDN just to survive HN. It may be enough to just make sure you're not hitting a DB on every request; ideally you'd be caching the HTML output wholesale (via static site generator or otherwise)
For reference: with cached HTML, my single Node.js process running on Heroku's cheapest tier has weathered the front page multiple times without breaking a sweat
I switched backends a bunch of times because everything I tried (Go stdlib HTTP, Tornado, etc.) kept getting taken out whenever I hit the front page, either due to CPU overload or some sort of resource leak. I ended up using Warp+Wai+Servant (https://github.com/yesodweb/wai) and it has been smooth sailing ever since on my $3/mo VPS. It can take thousands of req/sec without flinching, which is higher than what you see from the top of HN - that maxes out at a few hundred req/s.
Yes, I first tested locally with httperf and some other tools. I took it as a good sign when the load-testing tools crashed before my server did. Then I found a few services by searching for something like "website load test" and used their free tiers (which would typically generate a few hundred req/sec - enough to simulate HN).
For me: it's less to manage and less to learn (AWS is a nightmare from my perspective), and I enjoy other benefits, like the fact that one codebase can both generate and serve the site, and that it's vendor-agnostic (just clone/npm install/run). It also allows easy customization of headers and redirects, allows for the odd dynamic route, and makes local dev/previewing super simple
OP may or may not feel the same! Just wanted to communicate that a simple server can definitely do the job
A CDN is its own thing: it's distributed across a provider's network, so it can't just be served off a simple box. It requires having or gaining familiarity with a specific provider, and it comes with other constraints: you have to statically export to the file system (you can't just cache responses in memory), you can't have any dynamic content without standing up a separate server, etc
Makes sense for a lot of things, but it comes with downsides, especially for hobbyists! I've found I prefer sticking with a simple server for my website, and OP might find it's easier to do that too
That's not at all true. A CDN is a content delivery network. There is nothing that says it isn't a network of a single host on the same machine as the original content.
It's just a cache that returns content faster than the origin would.
> A content delivery network, or content distribution network, is a *geographically distributed* network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially relative to end users.
Emphasis mine. CDN implies some sort of edge-hosting topology (or at least more-proximal than the main servers).
Not sure why you think CDN implies geographically distributed edge hosting. Are you getting that from Cloudflare's marketing? A company that sells a geographically distributed network?
CDN is a term used for years before Cloudflare was founded[0][1] to mean exactly that. Specifically[0]:
>CDNs act as trusted overlay networks that offer high-performance delivery of common Web objects, static data, and rich multimedia content by distributing content load among servers that are close to the clients.
and:
>CDNs first emerged in 1998 to address the fact that the Web was not designed to handle large content transmissions over long distances.
No, I'm getting it from Wikipedia. It has nothing (directly) to do with Cloudflare or any other commercial interests - BitTorrent is a content delivery network, and you could run a private CDN. The whole purpose is moving the data to be consumed away from some centralized primary host to nodes that are more proximal to the consumer (either spatially, or solely in terms of bandwidth - torrent bandwidth is decoupled from the primary server). BitTorrent sort of automatically works out "proximity" by pulling from the highest-bandwidth seeders. It's also geographically distributed, providing redundancy and availability, which is arguably more important than proximity.
I think the criteria are that it a) delivers/distributes content, b) is a network, implying multiple nodes, c) lowers the latency and/or bandwidth cost of data consumption, by d) leveraging geographically distributed redundancy and/or proximity. I think the key feature is geographically distributed redundancy, which differentiates it from a regular cache.
AWS CloudFront is configurable to be single-region or global. So would CloudFront not count as a CDN in your mind if it's configured to be a single region?
For reference, it's a fully static site on a low-end shared host. The post had quite a few images, which were PNGs from DALL-E, but I've just now recompressed them as smaller JPGs.
I'll see on the order of 10k-25k hits (hard to say exactly, since most of HN uses adblockers/tracker blockers and I use CloudFlare for caching) from an article on the HN front page. It's not that bad, and I could almost certainly serve it off my colo'd server without any trouble - bandwidth just isn't that high.
But as my blog is entirely static (except for the comment threads, hosted on my Discourse forum), I just let CloudFlare serve it. I had to do some tweaks to the configs to say, "No, really, cache everything!" (it doesn't do that by default for a range of very valid reasons, none of which apply to me), but once that change went in, I'll see 98.5% or higher "served out of cache" ratios when I'm seeing a lot of traffic from HN or somewhere.
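On the origin side, the main knob is the Cache-Control response header - the "cache everything" switch itself is a Cloudflare rule set in the dashboard, but the origin can cooperate by sending sensible cache lifetimes. A hypothetical helper for picking values per content type:

```javascript
// Hypothetical helper: choose a Cache-Control value based on the request
// path. The specific lifetimes are illustrative, not recommendations.
function cacheControlFor(path) {
  if (/\.(png|jpe?g|gif|webp|svg|ico)$/.test(path)) {
    return 'public, max-age=31536000, immutable'; // images: cache ~forever
  }
  if (/\.(css|js)$/.test(path)) {
    return 'public, max-age=86400'; // assets: a day
  }
  return 'public, max-age=3600'; // HTML: an hour is plenty for a static blog
}
```

Long-lived image caching is what makes the "let images stay at the edge" strategy below cheap: the CDN rarely has to come back to the origin for them.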
I'd originally designed it to be hosted out of a Google Cloud bucket with CloudFlare (egress traffic is cheaper that way than out to the internet), but I eventually decided to host on my server, as I could then do Tor and some other stuff more easily. I've got the server anyway...
One of these days, I may play with dropping analytics entirely and just passing requests through to my server, letting images remain cached since they're the bulk of my bandwidth. Then I can go even more oldskool and parse my server logs for stats and referrers and such!
> Then I can go even more oldskool and parse my server logs for stats and referrers and such!
Expect to see a bunch of bots. I tried setting up server-side analytics for a WordPress-based website, but I had to get rid of it as the bot traffic made it essentially useless.
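A first pass at that kind of server-log parsing, with a crude bot filter, might look like the following. The user-agent marker list is illustrative and nowhere near exhaustive - well-behaved crawlers identify themselves, but plenty don't.

```javascript
// Crude markers for self-identifying bots and scripted clients.
const BOT_MARKERS = ['bot', 'crawler', 'spider', 'curl', 'wget', 'python-requests'];

// Combined log format:
// IP - - [date] "METHOD path HTTP/x" status bytes "referrer" "user-agent"
const LINE_RE = /^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"/;

function parseLine(line) {
  const m = LINE_RE.exec(line);
  if (!m) return null;
  return {
    ip: m[1],
    method: m[2],
    path: m[3],
    status: Number(m[4]),
    referrer: m[5],
    userAgent: m[6],
  };
}

function isLikelyBot(entry) {
  const ua = entry.userAgent.toLowerCase();
  return ua === '' || ua === '-' || BOT_MARKERS.some(marker => ua.includes(marker));
}

// Count hits per path, skipping unparseable lines and likely bots.
function tally(lines) {
  const hits = {};
  for (const line of lines) {
    const entry = parseLine(line);
    if (!entry || isLikelyBot(entry)) continue;
    hits[entry.path] = (hits[entry.path] || 0) + 1;
  }
  return hits;
}
```

Even with a filter like this, expect residual bot traffic from clients that spoof browser user agents; IP-based filtering or known-crawler lists help, but it never gets to zero.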
My $3/mo vultr box can handle HN loads easily when using a fast and well-designed (namely resource-leak-free) backend (I've settled on https://github.com/yesodweb/wai based apps - the only thing that has worked well for me so far).
I front paged a couple times back in the day. In the neighborhood of thousands of pageloads and hundreds of concurrent users. Totally trivial for static HTML, but most people get into trouble with hand-rolled or poorly tuned blog frameworks that make multiple database calls on every visitor.
Second place for a few hours and ~1k points resulted in around 50k unique visits.
If your website is a collection of static files and you're hosting them on S3+CloudFront or something similar (GitHub pages works too), then it'll work without any issues and cost pennies for the whole thing.
I've gotten on the front page more than a few times. In my experience, it usually peaks around 1.5k concurrent visitors for a blog post. My best run was 50k total visits over a couple of days, but it has often been much less. It depends on the content and how interesting it is to the wider HN audience.
I just hit #1 last week and frontpaged a bunch in the past. Peaked at around 250-300 concurrent visitors, totaling around 10k in a 24-hr period, which is on par with past experience.
Should work fine. I personally avoided using a reverse proxy like nginx or Apache because they tend to have a ton of vulnerabilities (check out the CVE database results for "nginx"), making them a security-management headache.
Any serious vulnerability in NGINX would be big news, since it is so widespread. The CVE database shows some entries when you search for "nginx", but I looked at all the 2022 entries, and the only ones affecting NGINX itself are in the njs module, so they don't actually affect NGINX's core functionality.