Presented by Christophe Pettus.
Author's site: thebuild.com
- a database is a set of tables & schema objects,
- MySQL "databases" === Postgres "schemas"
- a "cluster" is a collection of databases served by one Postgres instance
pg_ctl: start/stop the server
$PGDATA (the data directory) holds:
- postgresql.conf
- security: "role" === object that can own other objects and that has privileges
- "user" === role that can log into the system; otherwise synonyms
- important params in config: logging, memory, checkpoints, planner; done.
- use SSDs; put the transaction log onto spinning disk if you want. use server-grade drives that flush their caches gracefully on power failure.
- change logging first, to get the data you need:
- where to log? syslog, or CSV format to files.
- log_min_duration_statement = 250ms   <-- important! logs every statement slower than 250 ms; how you find slow queries
- log_lock_waits = on   <-- log whenever anything is waiting on a lock
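A sketch of applying these from psql (assumes PostgreSQL 9.4+ for ALTER SYSTEM; on older versions, edit postgresql.conf and reload):

    ALTER SYSTEM SET log_min_duration_statement = '250ms';  -- log statements slower than 250 ms
    ALTER SYSTEM SET log_lock_waits = on;                   -- log waits longer than deadlock_timeout
    SELECT pg_reload_conf();                                -- pick up the changes without a restart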
memory configs:
- don't run other servers on the same machine(?)
shared_buffers:
- best: 8GB on systems with >= 32GB RAM; otherwise 25% of total system RAM.
- if the server won't start, look for SHMMAX and SHMALL errors in the kernel logs -- tweak the kernel params:
-- calculate the shared memory in bytes + 20% (a huge decimal number)
-- sudo sysctl -w kernel.shmmax=(value)
-- sudo sysctl -w kernel.shmall=(value)/4096   (shmall is counted in 4096-byte pages)
work_mem:
- start low: 32-64MB.
- look for "temporary file" lines in the logs, to see if queries need more.
- set it to 2-3x the largest temp file you see. (why bigger? the on-disk format is more compact than the same data in actual hot RAM.)
- don't exceed 5-10% of system RAM: this much can be allocated per planner node, per connection.
- can be configured per-session.
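A minimal per-session sketch (the 256MB value is hypothetical; size it off your temp files):

    SET work_mem = '256MB';   -- just for this session/connection
    -- ... run the big sort/aggregate query ...
    RESET work_mem;           -- back to the server default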
maintenance_work_mem:
- 10% of system RAM, up to 1GB
- if VACUUM is having problems keeping up, it may need more.
effective_cache_size:
- set to the amount of file-system cache available.
- no idea what to set? use 50% of system RAM.
- this isn't allocated; it's a hint to the planner about how much cache is effectively available.
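To see where the memory settings currently stand, a quick pg_settings query:

    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'work_mem',
                    'maintenance_work_mem', 'effective_cache_size');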
- random_page_cost && work_mem === biggest performance gains
- prefer xfs or ext4 on linux.
checkpoints: a complete flush of dirty buffers to disk. Triggered by either:
- a set number of WAL segments having been written to disk, or
- a timeout expiring.
tuning these parameters:
- wal_buffers = 16MB
- checkpoint_completion_target = 0.9
- checkpoint_timeout = 10m-30m   # longer to start with; less IO while operating
- checkpoint_segments = 32
monitoring:
- look for checkpoint entries in the log.
- how often are they happening? more often than checkpoint_timeout? if so, the WAL segments are being exhausted faster than the timeout fires; bump up checkpoint_segments until checkpoints are less frequent than the timeout.
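pg_stat_bgwriter gives the same answer without log-grubbing (on the versions this talk covers):

    -- checkpoints_timed: triggered by checkpoint_timeout (good)
    -- checkpoints_req: forced by running out of WAL segments (raise checkpoint_segments)
    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;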
effective_io_concurrency: set to the number of IO channels, otherwise ignore it; an SSD with 32 channels → set to 32, etc.
random_page_cost: the ratio between the time to grab a random disk page vs. a sequential one; controls index vs. sequential scans (lower favors indexes). 3.0 is typical for RAID10, 2.0 for a SAN, 1.1 for Amazon EBS; SSDs land around 1.5.
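random_page_cost can be tried per-session before committing it to the config; the value and table here are hypothetical:

    SET random_page_cost = 1.5;  -- SSD-ish value, this session only
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- did the plan flip to an index scan?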
fsync = on. Never change this; it controls whether postgres flushes to disk and then waits for the result.
synchronous_commit = on. You can turn this off (it won't corrupt the DB on failure, though the last few commits may be lost).
WAL (write-ahead log):
- a continuous stream of committed database modifications, broken into 16MB segments.
- starts with DB cluster creation, lasts forever.
- checkpoints mark the "last consistent state"; segments from before that checkpoint can be thrown away.
- PUT IT ON ITS OWN FILE SYSTEM: it's basically append-only. its own set of disks, etc. its footprint stays put.
MVCC:
- helps prevent locking; an alternative to pessimistic locking; allows higher performance.
- writers don't block readers; readers get the old version of the row.
- writers block writers to the same row.
- multiple versions of a row may be in the DB; deleted/updated rows aren't immediately removed.
- VACUUM cleans up tuples (rows) that are no longer visible to any transaction.
- post-8.0, autovacuum runs automatically; still a good idea to do a manual vacuum after bulk update/delete operations.
- ANALYZE regenerates table stats to help the planner make good guesses about how to execute queries; always do this after major database changes, such as a restore from backup.
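Sketch, with a hypothetical events table:

    VACUUM ANALYZE events;  -- after a bulk update/delete: reclaim dead tuples, refresh stats
    ANALYZE;                -- after a restore from backup: refresh stats database-wide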
- "share" vs "exclusive" locks exist.
- surprising locks: table-level locks when you add a new non-NULL column -- fix by creating it NULL then changing the column later
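A sketch of that workaround, with a hypothetical table t (on the versions this talk covers, the one-step form rewrites the whole table under an exclusive lock):

    ALTER TABLE t ADD COLUMN status text;            -- fast: no table rewrite
    UPDATE t SET status = 'new';                     -- backfill, ideally in batches
    ALTER TABLE t ALTER COLUMN status SET NOT NULL;  -- add the constraint last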
transaction isolation levels:
- read committed (the default)
- repeatable read
- serializable
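Set per transaction, e.g.:

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- ... work that needs a fully consistent view ...
    COMMIT;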
- keep data in normal form, don't fear joins
- "fast/slow" rule: "fast data" changes a lot, "slow data" infrequently -- put these into separate tables. Slow data tends to be the parent of other data via foreign keys.
- some indexing strategies:
-- an index should be selective: when it's used, it should return a small number of rows.
-- partial index: an index that applies only to rows matching specific conditions (along the lines of "index where (clause)").
-- an index should be frequently used.
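A partial-index sketch with hypothetical names (most queries only care about unshipped orders):

    CREATE INDEX orders_unshipped_idx
        ON orders (created_at)
     WHERE shipped = false;   -- index only the rows the hot queries touch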
- drop unused indexes. create indexes on the basis of real-life needs, and look for sequential scans
- built-in views to check index usage:
-- pg_stat_user_tables -- how many times a sequential scan has been done
-- pg_stat_user_indexes -- how often an index has been used
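Sketches against those views:

    -- tables doing lots of sequential scans (index candidates):
    SELECT relname, seq_scan, idx_scan
      FROM pg_stat_user_tables
     ORDER BY seq_scan DESC LIMIT 10;

    -- indexes never used since stats were reset (drop candidates):
    SELECT indexrelname
      FROM pg_stat_user_indexes
     WHERE idx_scan = 0;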
- SELECT COUNT(*) FROM myhugetable is implemented as a full table scan. Don't do it. pg_class has an approximate count (reltuples), but try to avoid counting; it's not a fast operation on Postgres.
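The approximate count, for a hypothetical myhugetable (only as fresh as the last ANALYZE/autovacuum):

    SELECT reltuples::bigint AS approx_rows
      FROM pg_class
     WHERE relname = 'myhugetable';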
- taming autovacuum -- you can adjust the number of workers, how frequently it runs, etc.; see the autovacuum settings in the configuration file.
- bulk loading: use COPY, not INSERT.
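Sketch, with a hypothetical table and path:

    COPY mytable FROM '/path/to/data.csv' WITH (FORMAT csv);  -- file read by the server process
    -- or, client-side from psql:
    \copy mytable FROM 'data.csv' WITH (FORMAT csv)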
- EXPLAIN or EXPLAIN ANALYZE -- gets the query plan
- http://explain.depesz.com
- estimated vs. actual rows returned differing wildly means the planner's confused
- nested loops often mean joins that you can't use an index for
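e.g. a hypothetical query (note EXPLAIN ANALYZE actually executes it, so be careful with writes):

    EXPLAIN ANALYZE
    SELECT o.*
      FROM orders o
      JOIN customers c ON c.id = o.customer_id
     WHERE o.created_at > now() - interval '1 day';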
- pg_stat_activity -- is it waiting on a query? etc
- tail -f logs
- pg_locks, in connection with pg_stat_activity.
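A sketch joining the two (column names per 9.2+; older versions use procpid/current_query):

    SELECT a.pid, a.query, l.mode, l.relation::regclass
      FROM pg_locks l
      JOIN pg_stat_activity a ON a.pid = l.pid
     WHERE NOT l.granted;   -- sessions stuck waiting on a lock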
- cloud hosting has terrible IO; since DBs are IO-bound, get as much RAM as you can (up to 2x the DB size); CPU capacity isn't as important as RAM; always replicate.
- store configurations in VCS
- our-own-hardware:
-- get SSDs, otherwise SAS drives.
-- RAID10
-- put pg_xlog on its own volume
-- move pg_stat_tmp to a RAMdisk if you want to (transient data, write-intensive)
-- if you only have a small SSD, put your busiest indexes / tables on it.
- nagios: disk, cpu, mem, replication lag (if replication is used)
- "checkpostgres.pl" from Bucardo.org <- use
- pgAdmin3 for management, handy
- log analysis: pgFouine (traditional, not maintained much); pgBadger (newer, active).