Skip to content

Instantly share code, notes, and snippets.

@Aitem
Created February 4, 2021 08:10
Show Gist options
  • Save Aitem/44135fd19ae1c76fd5b260c265ee0196 to your computer and use it in GitHub Desktop.
Save Aitem/44135fd19ae1c76fd5b260c265ee0196 to your computer and use it in GitHub Desktop.
Data Denormalization meetup

Data Denormalization

Data DeNormalization

What is Data Normalization?

What about data normal forms?

  • Avoid data duplication to reduce data redundancy
    • Less data
  • Avoid multiple updates - one source of truth
    • Avoid data editing anomaly
  • A more representative data model

Sample

EmployeePhoneFloorRoom №
Alice651435120 b
Bob4324234330 a
John534315330 a

3NF

----------+---------+---------+ -------------------+-------
EmployeePhoneRoom IDRoom IDRoom №Floor
----------+---------+---------+ -------------------+-------
Alice6514351120 b1
Bob43242342230 a3
John5343152-------------------+-------

Why denormalization?

Project on begin

EmployeeRoom
id->id
namefloor
phoneroom№
room_id

Project on prod

SalaryRoom
idid<----+
employee----+floor
dateroom№
amount
DeparmentEmployee
->id->id
namename
head-+phone
room_id-+

+---------------------| department |

Project after 1 year

ProjectCustomer
-----------------------
╔═>id╔═>id
customer═════╝name
namephone
logo
Member
╚══project
role
employee══════╗
SalaryRoom
idid<══╗
employee════╣floor
dateroom№
amount║ ║
║ ║
DeparmentEmployee
╔═>id╠═>id
namename
head═╝phone
room_id═╝
╚═════════════════════department

New feature

--------------------------------------------

Search: __________________
John +1(991)33434 - Development Room 42
Year Salary: 100$ Lead: Max
Projects: QA in Cars, Dev in Toys
Ivan +1(991)4325355 - Management Room 36
Year Salary: 80$ Lead: Ken
Projects: QA in Cars, Manager in Food

--------------------------------------------

SQL

-- pseudo sql =)
select
  employee
  , sum(salary)
  , array_agg(member)
  ...
from employee
join department on ..
join employee   on ..
join salary     on ..
join member     on ..
join project    on ..
group by employee
where
  employee      ilike $search
  or member     ilike $search
  or project    ilike $search
  or department ilike $search
  ...

Alter Employee

Employee
id
name
phone
room_id
department
=salary
=boss
=member[]
=fts

Add subs/hooks

Add subs/hooks for all related tables

  • Salary
  • Deparment
  • Employee
  • Project
  • Member

Goals of denormalization

  • Improve search
    • Avoid joins and aggregations
    • Avoid multiple or
  • Simplify complex data relations and structure

How to denormalize?

Edit existing entity

Employee
id
name
phone
room_id
department
=salary
=boss
=member[]
=fts

Real case

Normal form data:

-------------- ------------------ ------------------------ -----------------

Practitioner<-+-PractitionerRole->Organization->Organization
Иванов И.И.Врач стоматологСтоматологическое отд.Гор. стом. пол.

-------------- | ------------------ ------------------------ -----------------

------------------ ------------------------ -----------------
-PractitionerRole->Organization->Organization
ТерапевтТерапевт. отделениеГКБ 1

------------------ ------------------------ -----------------

After denormalization:

-------------------------

Practitioner
Иванов И.И.
=Врач стоматолог
=Стоматологическое отд.
=Гор. стом. пол.
=Терапевт
=Терапевт. отделение
=ГКБ 1

-------------------------

Profit

  • Super easy

Problems

  • Standarts conflict
  • Integration
  • Conflict existing ui forms/controllers
  • Business Logic
  • Heavy write load
MVCC

----------

8kb page
1 -----
2 -----
3 -----

----------

Update row 1 and 2

----------

8kb page
-1 ------
-2 ------
3 ------
1*------

----------

8kb page
2*------

----------

TOAST - The Oversized-Attribute Storage Technique

----------

8kb page
1 -----
2 -----
3 -----

----------

Update row 1 insert too large object

----------

8kb page
-1 -----
2 -----
1*-----------------------

----------

---------- --------------

8kb pageToast
------------------------
-1 -----1.1* -------
2 -----1.2* -------
1* toast

---------- --------------

Create new entity

EmployeeInfo
employee_id
=phone
=salary
=boss
=member[]
=fts

Real case

----------- --------------- --------- --------- -------- -------------- -----------------

Condition<-EpisodeOfCare->Patient<-Binding->Sector->Practitioner->PractitonerRole

----------- --------------- --------- --------- -------- -------------- -----------------

Separate administrative and clinical parts

Administrative part

--------- --------- -------- -------------- -----------------

Patient<-Binding->Sector->Practitioner->PractitonerRole

--------- --------- -------- -------------- -----------------

PatientBinding
patient_id
sector_id
practitionerrole_id
Clinical part

--------- --------------- -----------

Patient<-EpisodeOfCare->Condition

--------- --------------- -----------

PatientCondition
patient_id
condition_type
Profits
  • Less and easiera update logic
  • No update logic
  • Explicit denormalization
  • Less toast risk
Bloat protection =)
-- pseudo sql
drop table if exists denormalization.patientbinding;
CREATE TABLE if not exists denormalization.patientbinding (like public.patientbinding);

insert into denormalization.patientbinding (patient_id, sector_id, practitionerrole)
select patient.id ....
from patient
join binding on ..
join sector on ..
join practitioner on ..
join practitionerrole on ..;

create index patientbinding_patient_id on denormalization.patientbinding (patient_id);
create index patientbinding_sector_id on denormalization.patientbinding (sector_id);

BEGIN;
-- Drop target table
drop table public.patientbinding;

-- Replace target table on public schema wiht temp table
alter table denormalization.patientbinding set schema public;

COMMIT;

Conclusion

  • Use denormalization careful
  • Database maintenance
  • Try to decomposit your complex task

References

PG bloating - https://www.youtube.com/watch?v=-GNHIHEHDmQ&t=2361s&ab_channel=HighLoadChannel PG autovacuum -https://www.youtube.com/watch?v=TDWC66qzxCs&ab_channel=HighLoadChannel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment