PostgreSQL Configuration for Humans (postgresqlco.nf)
262 points by sharjeelsayed on Nov 8, 2020 | 42 comments


Most of the time I just use this:

https://pgtune.leopard.in.ua/#/


One of the main reasons https://postgresql.conf exists is precisely to provide as much information as possible to understand how to tune PostgreSQL's configuration. If generating a config were as simple as filling in a form, the site wouldn't make sense.

It's tempting to think it may be a good starting point. Unfortunately, it may just as well be a bad one: form-based configurations can be totally counter-productive. For instance, max_connections is a parameter that may cause outages or very bad performance if not properly configured, and its value depends, among other things, on whether or not you have a connection pool. work_mem heavily depends on the types of queries that you run. Plus I disagree with some of the other choices made by pgtune's default behaviour.
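To make that interaction concrete, here's a purely illustrative postgresql.conf sketch; the machine size, the assumption of a pooler and every value below are hypothetical, not a recommendation. The point is that work_mem is allocated per sort/hash operation per connection, so it only makes sense once max_connections and the presence of a pooler are known:

    # Hypothetical 8 GB server dedicated to PostgreSQL, with a connection
    # pooler (e.g. PgBouncer) in front keeping actual connections low.
    max_connections = 100          # without a pooler this may need to be much higher
    shared_buffers = 2GB           # ~25% of RAM is a common starting point
    work_mem = 16MB                # per sort/hash node, per connection: budget accordingly
    maintenance_work_mem = 512MB   # used by VACUUM, CREATE INDEX, etc.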

All in all, I'd recommend doing a deeper study of the parameters, or seeking some expert help, especially for production environments.

Disclaimer: I'm part of the team behind postgresql.conf


Hi,

Unfortunately the site is unavailable:

https://postgresql.conf/

I never said it's a brainless go-to config generator.

>max_connections is a parameter that may cause outages or very bad performance if not properly configured.

That's why you can define the "Number of Connections".


But if you don't, it suggests one for you.

My point here is that some of the suggestions provided, if taken by a non-expert user, may do more harm than good. The reason is that there are many factors that need to be considered for most of the parameters, something that's hard to capture in a form-based configuration generation tool.

That's why I recommend not using form-based configuration generation tools: they can be counter-productive.


Edit: the URL above is wrong, the correct one is https://postgresqlco.nf; my brain tricked me ;)


That's awesome! Do you know something similar for sane security settings?


No, sorry. But for SSL security this is really nice:

https://ssl-config.mozilla.org/#server=postgresql&version=12...
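For reference, the output is along these lines; this is only a sketch of the kind of settings involved, and the file paths and exact values are placeholders you'd take from the generator itself rather than from me:

    # PostgreSQL 12+ TLS settings, illustrative only
    ssl = on
    ssl_cert_file = 'server.crt'            # placeholder paths
    ssl_key_file = 'server.key'
    ssl_min_protocol_version = 'TLSv1.2'    # disallow old TLS versions
    ssl_prefer_server_ciphers = on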


This seems pretty cool, but it would benefit from some extra information on what it considers, and maybe links to documentation about the options you're setting so you can double-check.



Do you know if the differences in recommended settings between Web Application and Desktop Application are mainly about network-related aspects (latency etc.), or more about typical access patterns of web vs desktop? I'm thinking of the configuration for a remote PostgreSQL server, to be accessed from a local desktop application over the internet.


Desktop Application assumes that the server is running on the user's workstation and is very conservative in terms of memory usage to avoid interfering with other applications. Web Application, on the other hand, assumes that the server is running on a dedicated computer and has all memory for itself.


For me, the greatest pain of PostgreSQL is upgrading between major versions. You need to install both versions on the same DB server at the same time, and then the data must be copied (using pg_upgrade) between the two clusters, which leads to downtime proportional to the size of the data (i.e. how much time it takes to copy it and run some maintenance commands). If you consider that our cluster hosts databases for around 15 apps, and each needs to be changed to display some "maintenance mode" home page, you'll understand my frustration and fear of major upgrades...

I know there are some ways to reduce the downtime, like hard linking the data between the clusters instead of copying it, or using logical replication between the clusters and switching over to the new-version cluster. The problem is that since I am not a full-time DBA (and my organisation doesn't have one), I don't trust myself with these techniques and rely on the trusted pg_upgrade method, which leaves the old cluster as it was, with all its data intact, in case anything goes wrong!

The only good thing is that PostgreSQL supports each major version for a long time (5 years), so such upgrades don't need to be very frequent :)


> I know there are some ways to reduce the downtime, like hard linking the data between the clusters instead of copying it

You need to have a proper backup of the database anyway, so I don't really see the risk in using the --link mode (which makes the downtime pretty much independent of the size of the actual data).
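For what it's worth, a link-mode invocation looks roughly like this; the paths below assume a Debian/Ubuntu-style layout and are placeholders for your actual bindir/datadir, so this is a sketch rather than a copy-paste recipe:

    # Verify paths and versions against your own installation first.
    pg_upgrade \
      --old-bindir=/usr/lib/postgresql/11/bin \
      --new-bindir=/usr/lib/postgresql/13/bin \
      --old-datadir=/var/lib/postgresql/11/main \
      --new-datadir=/var/lib/postgresql/13/main \
      --link --check    # --check does a dry run; drop it to perform the upgrade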


Well, the problem is that to actually take the backup (using pg_dump) I'd also need time proportional to the size of the database, and during that time the database can't get any writes, so the system will again be down!

Also I'd really rather avoid restoring the data from a backup unless there's a real disaster that happened.


pg_basebackup defaults to streaming WAL from whatever server it is taking a copy from, as backups are otherwise inconsistent. There's no need for the source database to be paused while doing it, and even if your backup takes several hours to complete, once it finishes the target will have all the writes that occurred while the backup was being taken.
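A minimal invocation looks something like the following; the host, user and target directory are placeholders, so treat this as a sketch of the behaviour described above rather than a finished backup script:

    # Streams WAL alongside the base copy so the result is consistent
    # even while writes continue on the source server.
    pg_basebackup -h db.example.com -U replicator -D /backups/base \
      -X stream --checkpoint=fast --progress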

Thought you might find this useful, as no one should have to treat a Postgres backup as a process that requires downtime!


Thank you very much; I was using pg_dump to take backups until now!


There are other options, though. For upgrades, a virtual machine snapshot is a perfect backup, and with really large databases you would likely have something using WAL archiving, which provides a continuous backup up until the point that you shut down the database for the upgrade. Finally, if you have a database replica, that can also functionally act as your backup.
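For reference, continuous WAL archiving is enabled with a couple of postgresql.conf settings like the following; the archive path is a placeholder and the copy command is only the simplest possible variant (essentially the documentation's own example), not something tuned for production:

    wal_level = replica
    archive_mode = on
    # Copy each completed WAL segment to the archive, refusing to overwrite:
    archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'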


Thanks! I'll consider the vmware snapshot and the WAL archiving (I wasn't familiar with that till now) :)


I was talking about a backup, not a dump (snapshot).

https://blog.dbi-services.com/what-is-a-database-backup-back...


>that is proportional to the size of the data

I agree upgrades are still a pain point, but pg_upgrade running time is not proportional to the size of the data.


Maybe I didn't express myself clearly (English is not my first language), but since with pg_upgrade the data is copied from the old cluster to the new one, upgrading a 10 GB database would take roughly 10 times longer than upgrading a 1 GB database, just for the data copying. So the time is proportional to the size of the data.


I see, that's a good point: I'm used to pg_upgrade in --link mode, but if you want to be safe and have an isolated old cluster around, you're right: you'll still need to copy the data. Thanks for clarifying.


I wonder if you can attach a replica on a newer major version, let it sync, and then drop the old one?


Yes, I think this is possible (at least there are some blog posts recommending that approach for upgrading without downtime), however configuring (and testing) that mechanism is not something I want to do :|


I thought this was going to be a written article. It is a 52-minute video.

Shame on me for not looking at the domain.

It is a conference talk.


It is not. Look at the list in the left-side menu. It gives recommendations and detailed descriptions for each tunable parameter in Postgres.


Yeah, it's definitely a disappointing onboarding. It looks like the site is there to host a single video, but actually the real value is hidden on the left.


And there's not much value there, because there's no structure apart from alphabetical order in that list.

Which are the most important knobs to twist? Which ones should you never, never touch without 20 years' experience and a very peculiar requirement?

Give me something to read, please. "In one ear and out the other" is a saying for a very good reason.


Thank you for the feedback, we take it seriously.

There's an upcoming new version that will come with something similar to what you describe: a "guide" on which parameters to tune, classified by category and by level of "expertise" (basic, medium, advanced), with specific guidance for each set of parameters.

Would that be useful to you?


Yes, it would. Thanks!


Sorry for the confusion. As pointed out downthread, the idea is to provide help and information about postgresql.conf parameters.

This is important feedback for us. We will fix this landing page soon to include a search box, like a search engine's landing page, just for parameters.

There's also a significant new version coming soon, with capabilities to manage postgresql.conf configurations. Stay tuned. Hopefully that will make things clearer.

Disclaimer: I'm part of the team behind postgresql.conf


How much does something like RDS get us there regarding configuration tuning?


We have seen up to a 2x improvement in Postgres' throughput after tuning it with OtterTune over the default Amazon configuration. Here is an overview of how OtterTune's algorithms work: https://aws.amazon.com/blogs/machine-learning/tuning-your-db...

Hit us up to see if we can do the same for you: https://ottertune.com/demo.html


RDS takes care of maybe a third to a half of the configuration, particularly the hardware-related options. However, since Postgres uses MVCC, it has a lot of tuning options around autovacuum and replication that impact performance. This is even more true if you have access patterns that don't take Postgres' MVCC into account (are you updating 10%+ of large tables daily, triggering autovacuums?) but which would be fine with MySQL.
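As an illustration of the kind of autovacuum tuning that's left to you, here is a hypothetical snippet for a write-heavy workload; the numbers are assumptions for the sake of the example, not defaults or recommendations:

    # Vacuum sooner on large, frequently-updated tables (default scale factor is 0.2):
    autovacuum_vacuum_scale_factor = 0.05
    # Let each autovacuum worker do more work per cycle before sleeping:
    autovacuum_vacuum_cost_limit = 1000
    autovacuum_max_workers = 4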


I use this almost daily. Last time it was an hour ago -- checked when hot_standby's default changed from off to on (it was in Postgres 10).

Highly recommend it.


Does anyone know if there's a way to pull secrets into PG from an environment variable or similar and access it within PG in a way that does not directly expose the value to the db user or to logs?


While environment variables are not the most bulletproof place to store secrets, a plpython function will be able to access environment variables via the os module.
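A minimal sketch of that approach, assuming the plpython3u extension is available (it's untrusted, so creating it needs superuser) and the variable is set in the postmaster's environment; the function and variable names below are placeholders:

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    CREATE OR REPLACE FUNCTION get_secret(name text) RETURNS text AS $$
        import os
        # Returns NULL if the variable is not set in the server's environment
        return os.environ.get(name)
    $$ LANGUAGE plpython3u;

    -- Usage (note: the call itself may still show up in query logs):
    -- SELECT get_secret('MY_API_KEY');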



Won't the advice in that cursory blog post expose the plaintext to query logs?

IMHO, doing the hashing and password comparison in application code has benefits: it shifts the compute workload to horizontally scalable non-DB workers and keeps the plaintext off the wire and out of queries.


Any guides on how to combine SSDs, software RAID (Linux md) and ext4 for best performance? There are plenty of guides for the individual components, but not for all of them together.


Has anyone other than Amazon successfully migrated off Oracle and ported everything to Postgres, e.g. thousands of PL/SQL procedures/packages?


The nice thing about this website is the extra explanation on most of the options, with the considerations explained.



