Skip to content

Instantly share code, notes, and snippets.

@koshov
Created April 23, 2013 10:10
Show Gist options
  • Save koshov/5442374 to your computer and use it in GitHub Desktop.
Save koshov/5442374 to your computer and use it in GitHub Desktop.
DBS - University of Edinburgh revision notes

SPJU - Select, Project, Join, Union queries
ID - Inclusion dependency
FD - Functional dependency
MVD - Multivalued Dependency

Overview

Typical DBMS functionality

  • Define a particular database in terms of its data types, structures and constraints
  • Construct or Load the initial database contents on a secondary storage device
  • Manipulating the database
    • Retrieval: QUerying, generating reports
    • Modification: Insertions, deletions and updates
    • Accessing the database through Web applications
  • Processing and Sharing by a set of concurrent users and apps - yet, keeping data valid and consistent
  • Protection and security
  • Maintaining the db over the lifetime of the db application

Why use DBMS

  • Self-describing nature of database system (description called meta-data - stored in catalog itself)
  • Data independence - don't need to know the implementation of the database to access the data
  • Application insulated from how data is structured and stored
    • change the order of tuples
    • add or modify columns
    • add or modify indexes
  • Query does not change when physical structure changes
  • Efficient access - queries are optimized
  • Reduced application development time - declarative expression, don't need to indicate how to execute query
  • Data integrity and security - some constraints enforced automatically
  • Concurrent access, recovery from crashes - users behave as if using a single-user system
  • Speed - even with huge data

Data Models

Schema - description of a particular collection of data, using a given data model
Relation model of data - a table with rows and columns, has a schema describing columns or fields

Database Users

  • End users
  • Database administrators
  • database developers

Relation Model

-> All SPJU and SPJ queries are safe from infinite negation, because everything in output must have occurred in input (no elements are created)

Table Name: [List of attributes]
Unique attributes within a table and unique names of tables

Relational Calculus - Rule-based queries

  • existential quantifiers
  • universal quantification 'for all' - ∀x F(x) = ¬∃x ¬F(x)
  • conjunction
  • disjunction - F ∨ G = ¬(¬F ∧ ¬G)
  • Bound Variable - occurring in ∃ or ∀
  • Free variable - not bound
  • Queries without free variables are called Boolean queries
  • Safe Relational Calculus Query - if it always returns a finite result (e.g. Boolean query or SPJ/SPJU query)

SQL

  • SELECT
  • FROM Relation Variable
  • WHERE
  • AND, OR
  • AS (e.g. SELECT Mother AS Parent)
  • UNION
  • INTERSECT
  • EXCEPT
  • IN, NOT IN, EXISTS

Relational Algebra

  • Projection π

  • Selection σ

  • Cartesian Product ×

  • Natural Join

  • Renaming ρ

  • Union

  • Intersection

  • Difference -

    Properties of relational algebra operators

    • size(π(R)) ≤ size(R)
    • 0 ≤ size(σ(R)) ≤ size(R)
    • size(R × S) = size(R) × size(S)
    • 0 ≤ size(R ⋈ S) ≤ size(R) × size(S)

Disjunctions OR

  • WHERE A.x="asd" OR A.x="qwe"
  • SELECT.. FROM.. WHERE
    UNION
    SELECT.. FROM.. WHERE
  • answer(y) :- A(x,y,z), x="asd"
    answer(y) :- A(x,y,z), x="qwe"
  • Relational Algebra - ∪

Calclulus to Algebra translation

  • Active domain of relation: the set of all constants that occur in it: ADOM(R1,...,Rn) = ADOM(R1) ∪...∪ ADOM(Rn)
  • A safe query over relations R1,...,Rn cannot produce an element outside ADOM(R1,...,Rn)
  • ∀ in calculus could be translated to M - (A×B - A⋈B) with appropriate projections

Functional Dependencies

A relation R satisfies a functional dependency A->B where A and B are attributes, if for every two tuples t1 and t2 in R:

__π__A(t1) = __π__A(t2) implies __π__B(t1) = __π__B(t2)

Let K be a set of attributes of R and U the set of all attributes of R. Then K is a key if R satisfies functional dependency K->U

__π__K(t1) = __π__K(t2) implies t1 = t2

Inclusion Dependencies - foreign keys

E.g.
Employee(EmplId, Name, Dept, Salary)
ReportsTo(Empl1, Empl2)
Expect both Empl1 and Empl2 to be found in Employee, hence:
ReportsTo[Empl1] ⊆ Employee[EmplId]
ReportsTo[Empl2] ⊆ Employee[EmplId]

SQL

Create table

CREATE TABLE Name (attribute1 type, ..., attributeN type)

Types

char(n)
varchar(n)
bit(n) - fixed length bit string of exactly n bits
bit varying(n) - variable length bit string of up to n bits
int - signed 4 bytes
smallint - signed 2 bytes
real
float(s)
date - e.g. DATA '2013-04-22'
time - e.g. TIME '17:15:00'
timestamp - e.g. TIMESTAMP '2013-04-22 17:15:00'
character, etc.

Populating tables

INSERT INTO Name VALUES (...)

Dropping tables

DROP TABLE Name

Changing tables

ALTER TABLE Name ADD COLUMN newColumnName type
ALTER TABLE Name DROP COLUMN columnName

Default values

CREATE TABLE Name (field type DEFAULT value)
e.g. CREATE TABLE Fck (U INT DEFAULT 0, B INT)

Fixed and variable length

field CHAR(10) field VARCHAR(10)
INSERT ... "xxx" INSERT ... "xxx"
LENGTH(field) = 10 LENGTH(field) = 3

Constraints on fields

  • not null
  • primary key
    field int not null primary key
    OR
    field int not null
    primary key (field)
  • unique (field1, field2)
  • foreign key (field1, field2) references OtherTable
  • foreign key (field1, field2) references OtherTable on delete cascade
  • foreign key (field1, field2) references OtherTable on delete set null

SELECT

  • SELCET DISTINCT
  • SUM, COUNT - still requires DISTINCT
  • AVG - average, keep duplicates
    SELECT Director, AVG(CAST (Length AS REAL)) AS Avgl
    FROM Movies
    GROUP BY Director
  • MIN, MAX
  • HAVING, e.g.:
    SELECT Director, AVG(Length+0.0)
    FROM Movies
    GROUP BY Director
    HAVING MAX(Length) > 120
  • < or > ANY or ALL
  • ORDER BY field (, field,..) (DESC)

WHERE

  • UNION, INTERSECT and EXCEPT eliminate duplicates
  • UNION ALL, INTERSECT ALL and EXCEPT ALL to keep duplicates
  • LIKE 'string%' or LIKE 'string_'
    _ - stands for any letter
    % - stands for any substring, including empty
    Note: LIKE 'char_' may return empty because of trailing spaces - use VARCHAR or %

UPDATE

UPDATE table
SET field = value
WHERE condition

NATURAL JOIN

  • Generally: R JOINT S ON c
  • preserve all tuples and fill with nulls
    • Relation1 NATURAL LEFT OUTER JOIN Relation2 - keep non-matching tuples from R1
    • Relation1 NATURAL RIGHT OUTER JOIN Relation2 - keep non-matching tuples from R2
    • Relation1 NATURAL FULL OUTER JOIN Relation2 - keep all

VIEWS

  • CREATE VIEW Name (field1, field2, ...) AS [query]
  • DROP VIEW Name

WITH

  • Same as view, but temporary

Design

Entities

  • Entity - an object that is involved in the enterprise
  • Entity type - set of similar objects (e.g. students, courses, professors)
  • Attribute - describes on aspect of an entity type (e.g. name, address, Id)
    • Domain - possible values (e.g. {Informatics, EE, Joint})
    • Key - minimum set of attributes that uniquely identifies an entity
    • Entity Schema - Type name, attributes, key

E-R diagrams

E-R diagram E-R diagram further

IsA - Attributes of supertype apply to subtype

  • Can create more concise and readable E-R diagram
    common attributes need not be repeated
    attributes of sibling types can be different

E-R diagram further

Participation E-R diagram further

Constraints and Design

  • FDs - X -> Y
  • Keys - X -> U, where U is the set of all attributes in a relation
  • Candidate keys - a minimal key X, no subset of X is key
  • Inclusion dependency - R[A1, .., An] ⊆ S[B1, .., Bn] (unary if n=1)
  • Foreign key - R[A1, .., An] ⊆ S[B1, .., Bn] and {B1, .., Bn} is a key

Good Properties of DB schemas

  • no update anomalies
  • no redundancies
  • no information loss

What causes update anomalies

Functional dependencies X -> Y where X is not a key

BCNF

A relation is in BCNF if for every nontrivial FD X -> Y, X is a key.

A database is in BCNF if every relation in it is in BCNF.

Decomposition

Given a set (U, F) of attributes U and FDs F, a decomposition is:

(U1, F1), .., (Un, Fn)

where Ui ⊆ U and Fi is a set of FDs over Ui.
BCNF decomposition - if each (Ui, Fi) is in BCNF.

4NF - 4th Normal Form

Can only be produced in the presence of MVDs
4NF implies BCNF
Any BCNF schema that has one key that consists of single attribute, is in 4NF.

3NF - 3rd Normal Form

  • (U, F) is in 3NF if for any FD X -> A, where A ∉ X is an attribute F ⊢ X -> A implies that one of the following is true:
    • either X is a key, or
    • A is prime

Query Processing and Optimization

  • σc1(σc2(E)) = σc1∧c2(E)
  • πA1,..,An(σc(E)) = σc(πA1,..,An(E))
  • σc(E1 ⋈ E2) = σc(E1) ⋈ E2 (if c only involves attributes of E1)
  • σc(E1 ⋈ E2) = σc(E1) ⋈ σc(E2)

Tailored optimization

  • Order matters - in WHERE use more limiting conditions first, e.g.
    WHERE grade=’A’
    AND sex=’female’
  • Use a>x OR a<x instead of a<>x
  • Provide more join information (e.g. all equal fields in >3 relations)
  • Avoid UNION if OR is sufficient
  • Use JOIN instead of nested queries

Transaction and Concurrency Control

ACID properties

  • Atomicity - either all operations of a transaction are reflected properly in the db, or none are.
  • Consistency - execution of a transaction in isolation preserves the consistency of the db.
  • Isolation - for any two transactions T and T', it appears to T that either T' finished before or started execution after T.
  • Durability - after a transaction completes successfully, the changes it has made persist.

States of transaction

Swapping a pair of operations in a schedule is allowed when:

  • they refer to different data items
  • they refer to the same data item and are both read operations

A schedule is called conflict serializable if it can be transformed into a serial schedule by a sequence of such swap operations

Precedence Graphs__

With read and write
An edge from T to T' if:

  • T executes write(X) before T' executes read(X)
  • T executes read(X) before T' executes write(X)
  • T executes write(X) before T' executes write(X)

A schedule S is conflict serializable iff its precedence graph contains no cycles.
A topological order of a precedence graph gives a serial schedule.

With lock and unlock
An edge from Ti to Tj when operation Ti:unlock(A) and a consecutive Tj:lock(A)

Two-phase locking

  • A protocol which guarantees conflict-serializable schedule
  • Two phases:
    • Growing phase - request new locks, but not release locks
    • Shrinking phase - release locks, but not request locks

Deadlocks

  • Prevention - low data utilization
  • Detection - if T2 request a resource that T1 holds:
    • let T2 wait
    • roll back T1 and grant the lock to T2
    • roll back T2
      Decision based on timestamps - older transaction gets it.
      Starvation may occur - as some transaction are rolled back constantly

Wait-fore graph

  • There is an edge from T to T' if T' waits for T to release a lock.
  • There is a lock if there is a cycle in the graph
  • Solution: identify a minimal set of transactions such that rolling them back (removing from graph) would result in a cycle-free graph.

XML - eXtensive Markup Language

  • Tags and text
  • User-defined structure
  • Tags come in pairs - must be properly nested
  • Has only one "basic" type: PCDATA (Parsed Character DATA)
  • XML elements
    • Element - segment between an start and corresponding end tag
    • Subelement - -the relation between an element and its component elements
    • Attribute - unique per tag, cannot be nested, e.g.

Query Languages

  • XPath
  • XSLT
  • XQuery

Tasks

  • FIlter, select values - navigation, selection, extraction
  • Merge, integrate values from multiple XML sources - joins, aggregation
  • Transform XML values from one schema to another - XML construction

DTD

<!ELEMENT node_name (regex)>
<!ELEMENT node_name attribute_name TYPE #required>
<!ELEMENT leaf_name (#PCDATA)>
E.g.:
<!ELEMENT book (title, (author+ | editor+), publisher?, price?) >
<!ATTLIST book year CDATA #required >
<!ELEMENT publisher (#PCDATA) >

XPath

  • Downward traversal
    • lib/book -- parent/child
    • lib//last -- parent descendant
    • lib/book/* -- wild card
    • lib/book/@year -- attribute
    • //book/@* -- attribute wild card
    • book/(editor | author)
  • Filters (qualifiers)
    • //book[price]/title -- titles of books with price
    • //book[@year > 1991]/title -- titles of books published after 1991
    • //book[title and author and not(price)]/title
    • //book[author/first = "Neil"]/title -- titles of books with an author whose fist name is Neil
    • //book[editor | author]/title -- titles of books with either an author or editor
  • Upward traversal
    • //author[../title="Databases"]/last == //book[title="Databases"]/author/last
    • ancestor :: book[//last="Gaiman"] -- find book ancestors with "Gaiman" as its last descendant
  • Sideways
    • following-sibling :: book[//last="Gaiman"] -- find the next book written by Gaiman
    • preceding-sibling :: book[//last="Gaiman"] -- find the previous book written by Gaiman

XQuery

  • I'm not revising that shite, unless it shows up in a past paper :/

DTD

  • <!ELEMENT E P>
    • P: EMPTY|ANY|#PCDATA|E'|P1,P2|(P1|P2)|P?|P+|P*
    • Recursive definition, e.g.:
      <!ELEMENT node (leaf| (node,node))>
      <!ELEMENT leaf (#PCDATA)>
  • <!ATTLIST element_name (attribute-name attribute-type default-declaration)+>
    • attribute-type:
      CDATA -- text
      ID -- element's unique ID
      IDREF -- value must be some other element's value already in document
      IDREFS -- a set of IDREF
    • default-declaration:
      #required - a value must be provided
      #implied - a value is optional

Limitations

  • ID unique within entire document, while a key needs only to uniquely identify a tuple within a relation
  • IDREF untyped: one has no control over what it points to
  • Keys in relations could be multi-valued, while XML IDs must be single-valued
  • a relation may have multiple keys, while an element can have at most one ID (primary)
  • ID/IDREF can only be defined in a DTD, while XML data may not come with a DTD/schema
  • ID/IDREF, even relational keys / foreign keys, fail to capture semantics of hierarchical data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment