Data Denormalization

Data DeNormalization

What is Data Normalization?

What about data normal forms?

Avoid data duplication to reduce data redundancy
- Less data
Avoid multiple updates - one source of truth
- Avoid data editing anomaly
A more representative data model

Sample

Employee	Phone	Floor	Room №
Alice	651435	1	20 b
Bob	4324234	3	30 a
John	534315	3	30 a

3NF

John	534315	2
----------+---------+---------+ ~~---------~~----------+-------
Employee	Phone	Room ID	Room ID	Room №	Floor
----------+---------+---------+ ~~---------~~----------+-------
Alice	651435	1	1	20 b	1
Bob	4324234	2	2	30 a	3

Why denormalization?

Project on begin

`Employee`		`Room`
id		->	id
name			floor
phone			room№
room_id	–

Project on prod

`Salary`		`Room`
id		id	<----+
employee	----+	floor
date			room№
amount

DeparmentEmployee

->	id		->	id
	name			name
	head	-+	phone
	room_id	-+

+---------------------| department |

Project after 1 year

	`Project`		`Customer`
	-----------		------------
╔═>	id	╔═>	id
║	customer	═════╝	name
║	name		phone
║	logo
║
║	`Member`
╚══	project
	role
	employee	══════╗
║
	`Salary`	║	`Room`
	id	║	id	<══╗
	employee	════╣	floor	║
	date	║	room№	║
	amount	║ ║
║ ║
	`Deparment`	║	`Employee`	║
╔═>	id	╠═>	id	║
║	name	║	name	║
║	head	═╝	phone	║
║	room_id	═╝
╚═════════════════════	department

New feature

~~--------------------------------------------~~

Search: __________________
John +1(991)33434 - Development Room 42
Year Salary: 100$ Lead: Max
Projects: QA in Cars, Dev in Toys
Ivan +1(991)4325355 - Management Room 36
Year Salary: 80$ Lead: Ken
Projects: QA in Cars, Manager in Food

~~--------------------------------------------~~

SQL

-- pseudo sql =)
select
  employee
  , sum(salary)
  , array_agg(member)
  ...
from employee
join department on ..
join employee   on ..
join salary     on ..
join member     on ..
join project    on ..
group by employee
where
  employee      ilike $search
  or member     ilike $search
  or project    ilike $search
  or department ilike $search
  ...

Alter `Employee`

`Employee`
id
name
phone
room_id
department
=salary
=boss
=member[]
=fts

Add subs/hooks

Add subs/hooks for all related tables

Salary
Deparment
Employee
Project
Member

Goals of denormalization

Improve search
- Avoid joins and aggregations
- Avoid multiple or
Simplify complex data relations and structure

How to denormalize?

Edit existing entity

`Employee`
id
name
phone
room_id
department
=salary
=boss
=member[]
=fts

Real case

Normal form data:

~~--------------~~ ~~------------------~~ ~~------------------------~~ ~~-----------------~~

Practitioner	<-+-	PractitionerRole	->	Organization	->	Organization
Иванов И.И.			Врач стоматолог		Стоматологическое отд.		Гор. стом. пол.

~~--------------~~ | ~~------------------~~ ~~------------------------~~ ~~-----------------~~

~~------------------~~ ~~------------------------~~ ~~-----------------~~

PractitionerRole

Organization

Терапевт

Терапевт. отделение

ГКБ 1

~~------------------~~ ~~------------------------~~ ~~-----------------~~

After denormalization:

~~-------------------------~~

Practitioner

Иванов И.И.

=Врач стоматолог

=Стоматологическое отд.

=Гор. стом. пол.

=Терапевт

=Терапевт. отделение

=ГКБ 1

~~-------------------------~~

Profit

Super easy

Problems

Standarts conflict
Integration
Conflict existing ui forms/controllers
Business Logic
Heavy write load

MVCC

~~----------~~

8kb page
1 -----
2 -----
3 -----

~~----------~~

Update row 1 and 2

~~----------~~

8kb page
-1 ------
-2 ------
3 ------
1*------

~~----------~~

8kb page
2*------

~~----------~~

TOAST - The Oversized-Attribute Storage Technique

~~----------~~

8kb page
1 -----
2 -----
3 -----

~~----------~~

Update row 1 insert too large object

~~----------~~

8kb page
-1 -----
2 -----
1*-------	----------------

~~----------~~

~~----------~~ ~~--------------~~

8kb page		Toast
----------		--------------
-1 -----		1.1* -------
2 -----		1.2* -------
1* toast

~~----------~~ ~~--------------~~

Create new entity

`EmployeeInfo`
employee_id
=phone
=salary
=boss
=member[]
=fts

Real case

~~-----------~~ ~~---------------~~ ~~---------~~ ~~---------~~ ~~--------~~ ~~--------------~~ ~~-----------------~~

Condition

EpisodeOfCare

Patient

Binding

Sector

Practitioner

PractitonerRole

~~-----------~~ ~~---------------~~ ~~---------~~ ~~---------~~ ~~--------~~ ~~--------------~~ ~~-----------------~~

Separate administrative and clinical parts

Administrative part

~~---------~~ ~~---------~~ ~~--------~~ ~~--------------~~ ~~-----------------~~

Patient

Binding

Sector

Practitioner

PractitonerRole

~~---------~~ ~~---------~~ ~~--------~~ ~~--------------~~ ~~-----------------~~

`PatientBinding`
patient_id
sector_id
practitionerrole_id

Clinical part

~~---------~~ ~~---------------~~ ~~-----------~~

Patient

EpisodeOfCare

Condition

~~---------~~ ~~---------------~~ ~~-----------~~

`PatientCondition`
patient_id
condition_type

Profits

Less and easiera update logic
No update logic
Explicit denormalization
Less toast risk

Bloat protection =)

-- pseudo sql
drop table if exists denormalization.patientbinding;
CREATE TABLE if not exists denormalization.patientbinding (like public.patientbinding);

insert into denormalization.patientbinding (patient_id, sector_id, practitionerrole)
select patient.id ....
from patient
join binding on ..
join sector on ..
join practitioner on ..
join practitionerrole on ..;

create index patientbinding_patient_id on denormalization.patientbinding (patient_id);
create index patientbinding_sector_id on denormalization.patientbinding (sector_id);

BEGIN;
-- Drop target table
drop table public.patientbinding;

-- Replace target table on public schema wiht temp table
alter table denormalization.patientbinding set schema public;

COMMIT;

Conclusion

Use denormalization careful
Database maintenance
Try to decomposit your complex task

References

PG bloating - https://www.youtube.com/watch?v=-GNHIHEHDmQ&t=2361s&ab_channel=HighLoadChannel PG autovacuum -https://www.youtube.com/watch?v=TDWC66qzxCs&ab_channel=HighLoadChannel

Aitem/denormalization.org

Data Denormalization

Data DeNormalization

What is Data Normalization?

Sample

Why denormalization?

Project on begin

Project on prod

Project after 1 year

New feature

SQL

Alter Employee

Add subs/hooks

Goals of denormalization

How to denormalize?

Edit existing entity

Real case

Profit

Problems

MVCC

TOAST - The Oversized-Attribute Storage Technique

Create new entity

Real case

Separate administrative and clinical parts

Administrative part

Clinical part

Profits

Bloat protection =)

Conclusion

References

Alter `Employee`