Skip to content

Instantly share code, notes, and snippets.

@corporatepiyush
Last active August 27, 2024 06:04
Show Gist options
  • Save corporatepiyush/e2fe36e19172e5742b6c50b8bb219b8a to your computer and use it in GitHub Desktop.
Save corporatepiyush/e2fe36e19172e5742b6c50b8bb219b8a to your computer and use it in GitHub Desktop.
PostgreSQL Best Practices

Technical Guide: PostgreSQL and Stored Procedure Best Practices

Schema and Data Management

  1. Schema Organization: Description: Organize your database objects into logical schemas to improve manageability and security.

    CREATE SCHEMA auth;
    CREATE SCHEMA billing;
    
    -- Grant usage to roles
    GRANT USAGE ON SCHEMA auth TO web_user;

    Caution: Be aware of search_path to avoid schema name conflicts.

  2. Indexing: Description: Proper indexing is crucial for query performance. PostgreSQL supports several types of indexes, each suited for different scenarios.

    a. B-tree Index (default): Best for equality and range queries on sortable data.

    CREATE INDEX idx_users_email ON users(email);

    b. Hash Index: Optimized for equality comparisons, not for range queries.

    CREATE INDEX idx_users_email_hash ON users USING HASH (email);

    c. GiST (Generalized Search Tree): Useful for full-text search, geospatial data, and custom data types.

    CREATE INDEX idx_articles_fts ON articles USING GIST (to_tsvector('english', body));

    d. GIN (Generalized Inverted Index): Efficient for multi-value columns like arrays and full-text search.

    CREATE INDEX idx_products_tags ON products USING GIN (tags);

    e. BRIN (Block Range INdex): Useful for very large tables with natural ordering.

    CREATE INDEX idx_logs_timestamp_brin ON logs USING BRIN (timestamp);

    f. Partial Index: Index only a subset of a table.

    CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';

    Monitor index usage:

    SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
    FROM pg_stat_user_indexes;

    Caution: Over-indexing can slow down write operations. Regularly review and remove unused indexes.

  3. Constraints: Description: Constraints ensure data integrity at the database level.

    ALTER TABLE users ADD CONSTRAINT chk_email_format 
    CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}$');
    
    ALTER TABLE orders ADD CONSTRAINT fk_user_id 
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT;

    Error Handling: Catch constraint violations in application code or stored procedures.

  4. Transactions: Description: Use transactions to ensure data consistency across multiple operations.

    BEGIN;
    -- Operations here
    COMMIT;

    Error Handling:

    BEGIN;
    SAVEPOINT my_savepoint;
    
    -- Risky operations here
    
    EXCEPTION WHEN OTHERS THEN
      ROLLBACK TO my_savepoint;
      -- Handle error
    END;
  5. Naming Conventions: Description: Consistent naming conventions improve code readability and maintainability.

    CREATE TABLE t_users (
      id SERIAL PRIMARY KEY,
      c_email VARCHAR(255) NOT NULL,
      c_created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE FUNCTION f_get_user_by_email(p_email VARCHAR) RETURNS t_users AS $$
    -- Function body
    $$ LANGUAGE plpgsql;
  6. Prepared Statements: Description: Use prepared statements to improve performance and prevent SQL injection.

    In application code (e.g., Node.js with node-postgres):

    const query = {
      text: 'INSERT INTO users(name, email) VALUES($1, $2)',
      values: [userName, userEmail],
    }
    client.query(query)
  7. VACUUM and ANALYZE: Description: Regular maintenance is crucial for optimal performance.

    -- Automated vacuum (adjust these settings in postgresql.conf)
    autovacuum = on
    autovacuum_vacuum_threshold = 50
    autovacuum_analyze_threshold = 50
    
    -- Manual operation
    VACUUM (VERBOSE, ANALYZE) users;

    Caution: VACUUM FULL locks the table. Use with care on production databases.

  8. Data Types: Description: Choose appropriate data types to ensure data integrity and optimize storage.

    CREATE TABLE products (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      name VARCHAR(100) NOT NULL,
      price NUMERIC(10,2) NOT NULL,
      created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
      tags TEXT[]
    );
  9. Connection Pooling: Description: Use connection pooling to efficiently manage database connections.

    Using PgBouncer (in pgbouncer.ini):

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb
    
    [pgbouncer]
    pool_mode = transaction
    max_client_conn = 1000
    default_pool_size = 20

Advanced Features

  1. Row-Level Security (RLS): Description: RLS allows you to restrict access to row data based on user attributes.

    CREATE POLICY user_policy ON users
    USING (id = current_user_id());
    
    ALTER TABLE users ENABLE ROW LEVEL SECURITY;

    Caution: Ensure all access to the table goes through RLS-aware connections.

  2. Partitioning: Description: Table partitioning can significantly improve query performance for very large tables.

    CREATE TABLE orders (
      id SERIAL,
      created_at TIMESTAMP NOT NULL,
      total NUMERIC(10,2) NOT NULL
    ) PARTITION BY RANGE (created_at);
    
    CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

    Error Handling: Handle cases where data doesn't fit into any partition.

  3. Materialized Views: Description: Materialized views can dramatically speed up complex queries by pre-computing results.

    CREATE MATERIALIZED VIEW mv_monthly_sales AS
    SELECT date_trunc('month', created_at) AS month, SUM(total) AS total_sales
    FROM orders
    GROUP BY 1
    WITH DATA;
    
    CREATE UNIQUE INDEX ON mv_monthly_sales (month);
    
    -- Refresh
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_sales;

    Caution: Be aware of the trade-off between data freshness and query performance.

  4. Optimistic Locking: Description: Implement optimistic locking to handle concurrent updates without using exclusive locks.

    CREATE TABLE products (
      id SERIAL PRIMARY KEY,
      name TEXT,
      price NUMERIC(10,2),
      version INTEGER NOT NULL DEFAULT 0
    );
    
    -- In application code or stored procedure
    UPDATE products 
    SET name = 'New Name', price = 19.99, version = version + 1
    WHERE id = 1 AND version = 0;
    
    -- Check if any row was updated

    Error Handling: Handle cases where the update fails due to concurrent modifications.

  5. Extensions: Description: PostgreSQL extensions provide additional functionality. Use them to extend database capabilities.

    CREATE EXTENSION pg_stat_statements;
    CREATE EXTENSION pgcrypto;

    Caution: Some extensions may have security implications. Review carefully before use.

  6. Error Handling in Stored Procedures: Description: Implement robust error handling in stored procedures to manage exceptions gracefully.

    CREATE FUNCTION process_order(order_id INTEGER) RETURNS VOID AS $$
    DECLARE
      v_user_id INTEGER;
    BEGIN
      -- Process order
    EXCEPTION
      WHEN no_data_found THEN
        RAISE EXCEPTION 'Order % not found', order_id;
      WHEN OTHERS THEN
        RAISE EXCEPTION 'Error processing order %: %', order_id, SQLERRM;
    END;
    $$ LANGUAGE plpgsql;
  7. Roles and Grants: Description: Implement a robust security model using roles and grants.

    CREATE ROLE read_only;
    GRANT CONNECT ON DATABASE mydb TO read_only;
    GRANT USAGE ON SCHEMA public TO read_only;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only;
    
    CREATE ROLE john_doe LOGIN;
    GRANT read_only TO john_doe;
  8. JSON/JSONB: Description: Use JSON/JSONB types for flexible schema designs and complex data structures.

    CREATE TABLE events (
      id SERIAL PRIMARY KEY,
      payload JSONB NOT NULL
    );
    
    CREATE INDEX idx_events_payload ON events USING GIN (payload);
    
    -- Query
    SELECT * FROM events 
    WHERE payload @> '{"type": "login", "status": "success"}';

    Caution: Complex JSON operations can be slower than operations on normalized data.

Stored Procedure Techniques

  1. Function Overloading: Description: Create multiple functions with the same name but different parameters for flexibility.

    CREATE FUNCTION get_user(id INTEGER) RETURNS users AS $$
    -- Function body
    $$ LANGUAGE plpgsql;
    
    CREATE FUNCTION get_user(email VARCHAR) RETURNS users AS $$
    -- Function body
    $$ LANGUAGE plpgsql;
  2. Input Validation: Description: Validate input parameters within stored procedures to ensure data integrity.

    CREATE FUNCTION create_user(p_email VARCHAR, p_name VARCHAR) RETURNS INTEGER AS $$
    BEGIN
      IF p_email !~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}$' THEN
        RAISE EXCEPTION 'Invalid email format';
      END IF;
      
      -- Rest of function
    END;
    $$ LANGUAGE plpgsql;
  3. Custom Types: Description: Use custom types to group related data and improve code readability.

    CREATE TYPE user_info AS (
      id INTEGER,
      email VARCHAR,
      name VARCHAR
    );
    
    CREATE FUNCTION get_user_info(p_id INTEGER) RETURNS user_info AS $$
    -- Function body
    $$ LANGUAGE plpgsql;
  4. PL/pgSQL Features: Description: Leverage advanced PL/pgSQL features for complex logic and data processing.

    CREATE FUNCTION process_orders() RETURNS VOID AS $$
    DECLARE
      r RECORD;
      v_total NUMERIC := 0;
    BEGIN
      FOR r IN SELECT * FROM orders WHERE status = 'pending' LOOP
        -- Process each order
        v_total := v_total + r.amount;
      END LOOP;
      
      RAISE NOTICE 'Processed orders with total amount: %', v_total;
    END;
    $$ LANGUAGE plpgsql;
  5. Logging in Procedures: Description: Implement logging within procedures for debugging and auditing.

    CREATE TABLE audit_log (
      id SERIAL PRIMARY KEY,
      action VARCHAR NOT NULL,
      timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE FUNCTION audit_action(p_action VARCHAR) RETURNS VOID AS $$
    BEGIN
      INSERT INTO audit_log (action) VALUES (p_action);
    END;
    $$ LANGUAGE plpgsql;
    
    -- Usage in another function
    PERFORM audit_action('User created');
  6. Idempotent Procedures: Description: Design procedures to be idempotent, allowing safe re-execution.

    CREATE FUNCTION ensure_user_exists(p_email VARCHAR, p_name VARCHAR) RETURNS INTEGER AS $$
    DECLARE
      v_user_id INTEGER;
    BEGIN
      SELECT id INTO v_user_id FROM users WHERE email = p_email;
      
      IF v_user_id IS NULL THEN
        INSERT INTO users (email, name) VALUES (p_email, p_name)
        RETURNING id INTO v_user_id;
      END IF;
      
      RETURN v_user_id;
    END;
    $$ LANGUAGE plpgsql;
  7. RETURNS TABLE: Description: Use RETURNS TABLE for functions that return multiple rows and columns.

    CREATE FUNCTION get_recent_orders(p_user_id INTEGER, p_limit INTEGER DEFAULT 10)
    RETURNS TABLE (order_id INTEGER, order_date TIMESTAMP, total NUMERIC) AS $$
    BEGIN
      RETURN QUERY
      SELECT id, created_at, total
      FROM orders
      WHERE user_id = p_user_id
      ORDER BY created_at DESC
      LIMIT p_limit;
    END;
    $$ LANGUAGE plpgsql;
  8. Transaction Management: Description: Implement fine-grained transaction control within procedures.

    CREATE FUNCTION transfer_funds(
      sender_id INTEGER, 
      recipient_id INTEGER, 
      amount NUMERIC
    ) RETURNS VOID AS $$
    DECLARE
      v_sender_balance NUMERIC;
    BEGIN
      -- Start a subtransaction
      SAVEPOINT my_savepoint;
      
      -- Deduct from sender
      UPDATE accounts SET balance = balance - amount 
      WHERE id = sender_id 
      RETURNING balance INTO v_sender_balance;
      
      IF v_sender_balance < 0 THEN
        ROLLBACK TO my_savepoint;
        RAISE EXCEPTION 'Insufficient funds';
      END IF;
      
      -- Add to recipient
      UPDATE accounts SET balance = balance + amount 
      WHERE id = recipient_id;
      
      -- If we get here, commit the subtransaction
      RELEASE SAVEPOINT my_savepoint;
    EXCEPTION
      WHEN OTHERS THEN
        ROLLBACK TO my_savepoint;
        RAISE;
    END;
    $$ LANGUAGE plpgsql;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment