
You should've invested in CloudFront first because your site isn't loading.


I know. I wasn't actually expecting to hit the front page. Trying to reinforce it as best I can.


Fwiw, in my experience you don't actually need a CDN just to survive HN. It may be enough to just make sure you're not hitting a DB on every request; ideally you'd be caching the HTML output wholesale (via static site generator or otherwise)

For reference: with cached HTML, my single Node.js process running on Heroku's cheapest tier has weathered the front page multiple times without breaking a sweat
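To make "caching the HTML output wholesale" concrete, here is a minimal sketch in Python (not the Node.js setup described above). The `HtmlCache` class, its `render` callback, and the 60-second TTL are all hypothetical names and values chosen for illustration; the idea is only that the expensive render (DB queries, templating) runs once per path per TTL window rather than once per request.

```python
import time
from typing import Callable, Dict, Tuple


class HtmlCache:
    """Keeps fully rendered HTML in memory so repeat requests skip the render step."""

    def __init__(
        self,
        render: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render          # hypothetical: the slow function that hits the DB
        self._ttl = ttl_seconds        # assumed freshness window, tune to taste
        self._clock = clock            # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # cache hit: no render, no DB
        html = self._render(path)      # cache miss or expired: render once
        self._entries[path] = (now, html)
        return html
```

A request handler would then call `cache.get(request_path)` instead of rendering directly. This sketch never evicts entries, so it only suits a site with a bounded set of pages.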


I switched backends a bunch of times because everything I tried (Go stdlib HTTP, Tornado, etc.) kept getting taken out whenever I would hit the front page, either due to CPU overload or some sort of resource leak. I ended up using Warp+Wai+Servant (https://github.com/yesodweb/wai) and it has been smooth sailing since then off my $3/mo VPS. It can take thousands of req/sec without flinching (which is higher than what you see from top of HN - that maxes out at a few hundred req/s).


Did you use some software/service for load-testing the alternatives?


Yes, I first tested locally with httperf and some other tools. I took it as a good sign when the load testing tools crashed before my server did. Then I found a few services by searching for something like "website load test" and used their free tiers (which would typically generate something like a few hundred req/sec, sufficient to simulate HN).
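For anyone who wants a quick sanity check before reaching for httperf or a hosted service, a rough sketch of a load generator in Python follows. It is not what was used above; `run_load` and `http_status` are hypothetical helper names, and a thread-based Python client will likely top out well below what a dedicated tool can generate.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def http_status(url: str, timeout: float = 5.0) -> int:
    """Fetch a URL and return its HTTP status (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def run_load(fetch: Callable[[], int], total: int, concurrency: int) -> Dict[str, float]:
    """Call fetch() `total` times across `concurrency` threads and summarize."""

    def one(_: int) -> bool:
        try:
            return fetch() == 200
        except Exception:
            return False  # timeouts, refused connections, HTTP errors all count as failures

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        ok = sum(pool.map(one, range(total)))
    elapsed = time.monotonic() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return {"ok": ok, "failed": total - ok, "req_per_sec": rate}
```

Usage would look like `run_load(lambda: http_status("http://localhost:8000/"), total=2000, concurrency=100)`, where the URL is a placeholder for a server you own. Only point this at your own machines.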


What VPS are you using and can you recommend it?


Vultr. They have been totally solid for me. Nothing fancy - just a reliable cloud VM. Have had no reason to look for alternatives for web hosting.


Right, something like Varnish.


Hadn't heard of Varnish, but yeah, it looks like a good solution the OP could probably layer over their current setup without too much trouble


At that point, why not just have a static blog hosted on an AWS bucket?


For me: it's less to manage, it's less to learn (AWS is a nightmare from my perspective), and I enjoy other benefits like the fact that one codebase can generate and then serve up the site, and the fact that it's vendor-agnostic (just clone/npm install/run). Also allows easy customization of headers and redirects, allows for the odd dynamic route, and makes local dev/previewing super simple

OP may or may not feel the same! Just wanted to communicate that a simple server can definitely do the job


Right. So a CDN?


A CDN is its own thing: it's distributed across a provider; it can't just be served off a simple box. It requires having or gaining familiarity with a specific provider, and it comes with other constraints: you have to statically export to the file system (you can't just cache responses in memory), you can't have any dynamic content without standing up a separate server, etc.

Makes sense for a lot of things, but it comes with downsides, especially for hobbyists! I've found I prefer sticking with a simple server for my website, and OP might find it's easier to do that too


That's not at all true. A CDN is a content delivery network. There is nothing that says it isn't a network of a single host on the same machine as the original content.

It's just a cache that returns content faster than the original content.


Nope, that's just a proxy/cache.

> A content delivery network, or content distribution network, is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially relative to end users.

Emphasis mine. CDN implies some sort of edge-hosting topology (or at least more-proximal than the main servers).


Not sure why you think CDN implies geographically distributed edge hosting. Are you getting that from Cloudflare's marketing? A company that sells a geographically distributed network?


CDN is a term used for years before Cloudflare was funded[0][1] to mean exactly that. Specifically[0]:

>CDNs act as trusted overlay networks that offer high-performance delivery of common Web objects, static data, and rich multimedia content by distributing content load among servers that are close to the clients.

and:

>CDNs first emerged in 1998 to address the fact that the Web was not designed to handle large content transmissions over long distances.

[0]: https://ieeexplore.ieee.org/document/1250586

[1]: https://link.springer.com/book/10.1007/978-3-540-77887-5


No, I'm getting it from Wikipedia. It has nothing (directly) to do with Cloudflare or any other commercial interests. BitTorrent is a content delivery network. You could run a private CDN. The whole purpose is moving the data to be consumed away from some centralized primary host, to nodes which are more proximal to the data consumer (either spatially or solely in terms of bandwidth; torrent bandwidth is decoupled from the primary server). BitTorrent sort of automatically works out "proximity" by pulling from the highest-bandwidth seeders. It's also geographically distributed, providing redundancy and availability, which is arguably more important than proximity.

I think the criteria are that it a) delivers/distributes content b) is a network, implying multiple nodes c) lowers the latency and/or bandwidth cost of data consumption, by d) leveraging geographically distributed redundancy and/or proximity. I think the key feature is geographically distributed redundancy which differentiates it from a regular cache.

https://en.wikipedia.org/wiki/Content_delivery_network


AWS CloudFront is configurable to be one region or global. So would CloudFront not count as a CDN in your mind if it's configured to be a single region?


A region has multiple access points.


For reference, it's a fully static site on a low-end shared host. The post had quite a few images, which were pngs from DALL-E, but I've just now recompressed as smaller jpg.


Just curious, is there any general number range for how much traffic a front page post might get? Less than 10k, 10-100,000, 100,000+, etc


I'll see on the order of 10k-25k hits (hard to say exactly, most of HN uses adblockers/tracker blockers and I use CloudFlare for caching) from an article on the HN frontpage. It's not that bad, and I could almost certainly serve it off my colo'd server without any trouble - bandwidth just isn't that high.

But as my blog is entirely static (except for the comment threads, hosted on my Discourse forum), I just let CloudFlare serve it. I had to do some tweaks to the configs to say, "No, really, cache everything!" (it doesn't do that by default for a range of very valid reasons, none of which apply to me), but once that change went in, I'll see 98.5% or higher "served out of cache" ratios when I'm seeing a lot of traffic from HN or somewhere.
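To put that hit ratio in perspective, here is a back-of-the-envelope sketch. The 300 req/s peak is an assumed figure for illustration, not a measurement from my site, and `origin_req_per_sec` is just a hypothetical helper name.

```python
def origin_req_per_sec(edge_req_per_sec: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the origin server."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return edge_req_per_sec * (1.0 - cache_hit_ratio)


# Assumed peak of 300 req/s at the edge, with a 98.5% hit ratio.
print(round(origin_req_per_sec(300, 0.985), 2))  # 4.5
```

So under those assumptions the origin would see only a handful of requests per second even at peak.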

I'd originally designed it to be hosted out of a Google Cloud bucket with CloudFlare (egress traffic is cheaper that way than out to the internet), but I eventually decided to host on my server, as I could then do Tor and some other stuff more easily. I've got the server anyway...

One of these days, I may play with dropping analytics entirely and just passing requests through to my server, let images remain cached as that's the bulk of my bandwidth. Then I can go even more oldskool and parse my server logs for stats and referrers and such!


> Then I can go even more oldskool and parse my server logs for stats and referrers and such!

Expect to see a bunch of bots. I tried setting up server-side analytics for a WordPress-based website, but I had to get rid of it as the bot traffic made it essentially useless.
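For what it's worth, a minimal sketch of the log-parsing approach with a crude bot filter might look like the following. It assumes the common "combined" access log format; `LOG_LINE`, `BOT_HINTS`, and `count_referrers` are hypothetical names, and matching on user-agent substrings only catches bots that identify themselves, which is a big part of why the results can still be noisy.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed "combined" log format: ip ident user [time] "request" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Crude heuristic: substrings that self-identifying crawlers tend to include.
BOT_HINTS = ("bot", "crawler", "spider")


def count_referrers(lines: Iterable[str]) -> Counter:
    """Count referrers per log line, skipping unparseable lines and obvious bots."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the assumed format
        agent = match.group("agent").lower()
        if any(hint in agent for hint in BOT_HINTS):
            continue  # self-identified bot
        referrer = match.group("referrer")
        if referrer and referrer != "-":
            counts[referrer] += 1
    return counts
```

Feeding it an open log file, e.g. `count_referrers(open(path))` with `path` pointing at your access log, gives a `Counter` whose `most_common()` shows the top referrers.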


I can tell you it's more than a $5 dreamhost box can handle right now.


My $3/mo Vultr box can handle HN loads easily when using a fast and well-designed (namely resource-leak-free) backend (I've settled on https://github.com/yesodweb/wai based apps, the only thing that has worked well for me so far).


I front paged a couple times back in the day. In the neighborhood of thousands of pageloads and hundreds of concurrent users. Totally trivial for static HTML, but most people get into trouble with hand-rolled or poorly tuned blog frameworks that make multiple database calls on every visitor.


Second place for a few hours and ~1k points resulted in around 50k unique visits.

If your website is a collection of static files and you're hosting them on S3+CloudFront or something similar (GitHub pages works too), then it'll work without any issues and cost pennies for the whole thing.


I've gotten on the front page more than a few times. In my experience, it usually peaks around 1.5k concurrents for a blog post. Peak was 50k total visits over a couple days, but has been much less too. Depends on the content and how interesting it is to the wider HN audience.


I once frontpaged with a funny article that readers also shared with others. Back when people still did that instead of taking a screenshot.

1,000,000 uniques over 3 days in ~2011.

I have been trying to re-create that high ever since, lol. Going viral is one hell of a drug.


Wow, that's insane! I always think this is what going viral on social media is like, although I am probably better off not having experienced that.


I just hit #1 last week and frontpaged a bunch in the past. Peaked at around 250-300 concurrent visitors, totaling around 10k in a 24-hr period, which is on par with past experience.


10-100k iirc. Peak requests maybe 200-500req/sec (but not sustained). I had a few posts get >250k but those were on Reddit as well.


Hit the front page twice with two articles in the past.

Total traffic both times was around 60k over the course of 2-3 days.


Would nginx (caching everything) work on a $5/month VPS?


Should work fine. I personally avoided using a reverse proxy like nginx or apache because they tend to have a ton of vulnerabilities (check out the CVE database results for "nginx"), making them a security management headache.


Any serious vulnerability in NGINX will be big news since it is so widespread. The CVE database shows some entries when you search for "nginx", but I looked at all the 2022 entries and the only ones affecting NGINX itself are in the njs module, so they don't actually affect NGINX core functionality.

https://nginx.org/en/security_advisories.html shows one "medium severity" vulnerability in the last 4 years.


Huh, guess I haven't really checked on this since the early-to-mid 2010s. Was a lot worse back then.



