SPJU - Select, Project, Join, Union queries
ID - Inclusion dependency
FD - Functional dependency
MVD - Multivalued Dependency
- Define a particular database in terms of its data types, structures and constraints
- Construct or Load the initial database contents on a secondary storage device
- Manipulating the database
- Retrieval: QUerying, generating reports
- Modification: Insertions, deletions and updates
- Accessing the database through Web applications
- Processing and Sharing by a set of concurrent users and apps - yet, keeping data valid and consistent
- Protection and security
- Maintaining the db over the lifetime of the db application
- Self-describing nature of database system (description called meta-data - stored in catalog itself)
- Data independence - don't need to know the implementation of the database to access the data
- Application insulated from how data is structured and stored
- change the order of tuples
- add or modify columns
- add or modify indexes
- Query does not change when physical structure changes
- Efficient access - queries are optimized
- Reduced application development time - declarative expression, don't need to indicate how to execute query
- Data integrity and security - some constraints enforced automatically
- Concurrent access, recovery from crashes - users behave as if using a single-user system
- Speed - even with huge data
Schema - description of a particular collection of data, using a given data model
Relation model of data - a table with rows and columns, has a schema describing columns or fields
- End users
- Database administrators
- database developers
-> All SPJU and SPJ queries are safe from infinite negation, because everything in output must have occurred in input (no elements are created)
Table Name: [List of attributes]
Unique attributes within a table and unique names of tables
- existential quantifiers ∃
- universal quantification 'for all' ∀ - ∀x F(x) = ¬∃x ¬F(x)
- conjunction ∧
- disjunction ∨ - F ∨ G = ¬(¬F ∧ ¬G)
- Bound Variable - occurring in ∃ or ∀
- Free variable - not bound
- Queries without free variables are called Boolean queries
- Safe Relational Calculus Query - if it always returns a finite result (e.g. Boolean query or SPJ/SPJU query)
- SELECT
- FROM Relation Variable
- WHERE
- AND, OR
- AS (e.g. SELECT Mother AS Parent)
- UNION
- INTERSECT
- EXCEPT
- IN, NOT IN, EXISTS
-
Projection π
-
Selection σ
-
Cartesian Product ×
-
Natural Join ⋈
-
Renaming ρ
-
Union ∪
-
Intersection ∩
-
Difference -
Properties of relational algebra operators
- size(π(R)) ≤ size(R)
- 0 ≤ size(σ(R)) ≤ size(R)
- size(R × S) = size(R) × size(S)
- 0 ≤ size(R ⋈ S) ≤ size(R) × size(S)
- WHERE A.x="asd" OR A.x="qwe"
- SELECT.. FROM.. WHERE
UNION
SELECT.. FROM.. WHERE - answer(y) :- A(x,y,z), x="asd"
answer(y) :- A(x,y,z), x="qwe" - Relational Algebra - ∪
Calclulus to Algebra translation
- Active domain of relation: the set of all constants that occur in it: ADOM(R1,...,Rn) = ADOM(R1) ∪...∪ ADOM(Rn)
- A safe query over relations R1,...,Rn cannot produce an element outside ADOM(R1,...,Rn)
- ∀ in calculus could be translated to M - (A×B - A⋈B) with appropriate projections
A relation R satisfies a functional dependency A->B where A and B are attributes, if for every two tuples t1 and t2 in R:
__π__A(t1) = __π__A(t2) implies __π__B(t1) = __π__B(t2)
Let K be a set of attributes of R and U the set of all attributes of R. Then K is a key if R satisfies functional dependency K->U
__π__K(t1) = __π__K(t2) implies t1 = t2
E.g.
Employee(EmplId, Name, Dept, Salary)
ReportsTo(Empl1, Empl2)
Expect both Empl1 and Empl2 to be found in Employee, hence:
ReportsTo[Empl1] ⊆ Employee[EmplId]
ReportsTo[Empl2] ⊆ Employee[EmplId]
CREATE TABLE Name (attribute1 type, ..., attributeN type)
char(n)
varchar(n)
bit(n) - fixed length bit string of exactly n bits
bit varying(n) - variable length bit string of up to n bits
int - signed 4 bytes
smallint - signed 2 bytes
real
float(s)
date - e.g. DATA '2013-04-22'
time - e.g. TIME '17:15:00'
timestamp - e.g. TIMESTAMP '2013-04-22 17:15:00'
character, etc.
INSERT INTO Name VALUES (...)
DROP TABLE Name
ALTER TABLE Name ADD COLUMN newColumnName type
ALTER TABLE Name DROP COLUMN columnName
CREATE TABLE Name (field type DEFAULT value)
e.g. CREATE TABLE Fck (U INT DEFAULT 0, B INT)
field CHAR(10) | field VARCHAR(10) |
---|---|
INSERT ... "xxx" | INSERT ... "xxx" |
LENGTH(field) = 10 | LENGTH(field) = 3 |
- not null
- primary key
field int not null primary key
OR
field int not null
primary key (field) - unique (field1, field2)
- foreign key (field1, field2) references OtherTable
- foreign key (field1, field2) references OtherTable on delete cascade
- foreign key (field1, field2) references OtherTable on delete set null
- SELCET DISTINCT
- SUM, COUNT - still requires DISTINCT
- AVG - average, keep duplicates
SELECT Director, AVG(CAST (Length AS REAL)) AS Avgl
FROM Movies
GROUP BY Director - MIN, MAX
- HAVING, e.g.:
SELECT Director, AVG(Length+0.0)
FROM Movies
GROUP BY Director
HAVING MAX(Length) > 120 - < or > ANY or ALL
- ORDER BY field (, field,..) (DESC)
- UNION, INTERSECT and EXCEPT eliminate duplicates
- UNION ALL, INTERSECT ALL and EXCEPT ALL to keep duplicates
- LIKE 'string%' or LIKE 'string_'
_ - stands for any letter
% - stands for any substring, including empty
Note: LIKE 'char_' may return empty because of trailing spaces - use VARCHAR or %
UPDATE table
SET field = value
WHERE condition
- Generally: R JOINT S ON c
- preserve all tuples and fill with nulls
- Relation1 NATURAL LEFT OUTER JOIN Relation2 - keep non-matching tuples from R1
- Relation1 NATURAL RIGHT OUTER JOIN Relation2 - keep non-matching tuples from R2
- Relation1 NATURAL FULL OUTER JOIN Relation2 - keep all
- CREATE VIEW Name (field1, field2, ...) AS [query]
- DROP VIEW Name
- Same as view, but temporary
- Entity - an object that is involved in the enterprise
- Entity type - set of similar objects (e.g. students, courses, professors)
- Attribute - describes on aspect of an entity type (e.g. name, address, Id)
- Domain - possible values (e.g. {Informatics, EE, Joint})
- Key - minimum set of attributes that uniquely identifies an entity
- Entity Schema - Type name, attributes, key
IsA - Attributes of supertype apply to subtype
- Can create more concise and readable E-R diagram
common attributes need not be repeated
attributes of sibling types can be different
- FDs - X -> Y
- Keys - X -> U, where U is the set of all attributes in a relation
- Candidate keys - a minimal key X, no subset of X is key
- Inclusion dependency - R[A1, .., An] ⊆ S[B1, .., Bn] (unary if n=1)
- Foreign key - R[A1, .., An] ⊆ S[B1, .., Bn] and {B1, .., Bn} is a key
- no update anomalies
- no redundancies
- no information loss
Functional dependencies X -> Y where X is not a key
A relation is in BCNF if for every nontrivial FD X -> Y, X is a key.
A database is in BCNF if every relation in it is in BCNF.
Given a set (U, F) of attributes U and FDs F, a decomposition is:
(U1, F1), .., (Un, Fn)
where Ui ⊆ U and Fi is a set of FDs over Ui.
BCNF decomposition - if each (Ui, Fi) is in BCNF.
Can only be produced in the presence of MVDs
4NF implies BCNF
Any BCNF schema that has one key that consists of single attribute, is in 4NF.
- (U, F) is in 3NF if for any FD X -> A, where A ∉ X is an attribute F ⊢ X -> A
implies that one of the following is true:
- either X is a key, or
- A is prime
- σc1(σc2(E)) = σc1∧c2(E)
- πA1,..,An(σc(E)) = σc(πA1,..,An(E))
- σc(E1 ⋈ E2) = σc(E1) ⋈ E2 (if c only involves attributes of E1)
- σc(E1 ⋈ E2) = σc(E1) ⋈ σc(E2)
- Order matters - in WHERE use more limiting conditions first, e.g.
WHERE grade=’A’
AND sex=’female’ - Use a>x OR a<x instead of a<>x
- Provide more join information (e.g. all equal fields in >3 relations)
- Avoid UNION if OR is sufficient
- Use JOIN instead of nested queries
ACID properties
- Atomicity - either all operations of a transaction are reflected properly in the db, or none are.
- Consistency - execution of a transaction in isolation preserves the consistency of the db.
- Isolation - for any two transactions T and T', it appears to T that either T' finished before or started execution after T.
- Durability - after a transaction completes successfully, the changes it has made persist.
Swapping a pair of operations in a schedule is allowed when:
- they refer to different data items
- they refer to the same data item and are both read operations
A schedule is called conflict serializable if it can be transformed into a serial schedule by a sequence of such swap operations
With read and write
An edge from T to T' if:
- T executes write(X) before T' executes read(X)
- T executes read(X) before T' executes write(X)
- T executes write(X) before T' executes write(X)
A schedule S is conflict serializable iff its precedence graph contains no cycles.
A topological order of a precedence graph gives a serial schedule.
With lock and unlock
An edge from Ti to Tj when operation Ti:unlock(A) and a consecutive Tj:lock(A)
- A protocol which guarantees conflict-serializable schedule
- Two phases:
- Growing phase - request new locks, but not release locks
- Shrinking phase - release locks, but not request locks
- Prevention - low data utilization
- Detection - if T2 request a resource that T1 holds:
- let T2 wait
- roll back T1 and grant the lock to T2
- roll back T2
Decision based on timestamps - older transaction gets it.
Starvation may occur - as some transaction are rolled back constantly
Wait-fore graph
- There is an edge from T to T' if T' waits for T to release a lock.
- There is a lock if there is a cycle in the graph
- Solution: identify a minimal set of transactions such that rolling them back (removing from graph) would result in a cycle-free graph.
- Tags and text
- User-defined structure
- Tags come in pairs - must be properly nested
- Has only one "basic" type: PCDATA (Parsed Character DATA)
- XML elements
- XPath
- XSLT
- XQuery
- FIlter, select values - navigation, selection, extraction
- Merge, integrate values from multiple XML sources - joins, aggregation
- Transform XML values from one schema to another - XML construction
<!ELEMENT node_name (regex)>
<!ELEMENT node_name attribute_name TYPE #required>
<!ELEMENT leaf_name (#PCDATA)>
E.g.:
<!ELEMENT book (title, (author+ | editor+), publisher?, price?) >
<!ATTLIST book year CDATA #required >
<!ELEMENT publisher (#PCDATA) >
- Downward traversal
- lib/book -- parent/child
- lib//last -- parent descendant
- lib/book/* -- wild card
- lib/book/@year -- attribute
- //book/@* -- attribute wild card
- book/(editor | author)
- Filters (qualifiers)
- //book[price]/title -- titles of books with price
- //book[@year > 1991]/title -- titles of books published after 1991
- //book[title and author and not(price)]/title
- //book[author/first = "Neil"]/title -- titles of books with an author whose fist name is Neil
- //book[editor | author]/title -- titles of books with either an author or editor
- Upward traversal
- //author[../title="Databases"]/last == //book[title="Databases"]/author/last
- ancestor :: book[//last="Gaiman"] -- find book ancestors with "Gaiman" as its last descendant
- Sideways
- following-sibling :: book[//last="Gaiman"] -- find the next book written by Gaiman
- preceding-sibling :: book[//last="Gaiman"] -- find the previous book written by Gaiman
- I'm not revising that shite, unless it shows up in a past paper :/
- <!ELEMENT E P>
- P: EMPTY|ANY|#PCDATA|E'|P1,P2|(P1|P2)|P?|P+|P*
- Recursive definition, e.g.:
<!ELEMENT node (leaf| (node,node))>
<!ELEMENT leaf (#PCDATA)>
- <!ATTLIST element_name (attribute-name attribute-type default-declaration)+>
- attribute-type:
CDATA -- text
ID -- element's unique ID
IDREF -- value must be some other element's value already in document
IDREFS -- a set of IDREF - default-declaration:
#required - a value must be provided
#implied - a value is optional
- attribute-type:
- ID unique within entire document, while a key needs only to uniquely identify a tuple within a relation
- IDREF untyped: one has no control over what it points to
- Keys in relations could be multi-valued, while XML IDs must be single-valued
- a relation may have multiple keys, while an element can have at most one ID (primary)
- ID/IDREF can only be defined in a DTD, while XML data may not come with a DTD/schema
- ID/IDREF, even relational keys / foreign keys, fail to capture semantics of hierarchical data