In a recent project, we had a REST API with a `GET /profile` route, which returned profile data. The profile data needed to be gathered from the `users` table as well as several related tables: `universities`, `identities`, `sectors`, `companies`, `question_answers`, and `can_help`. All of those tables are related to the user by a foreign key, and initially each one was queried in its own network request, so building a profile took O(N) round trips to the database — linear in the number of tables. After optimisation that dropped to O(1): a single round trip. The optimisation was to send the queries in bulk (multiple SELECT statements in a single network request) and then pick each table's rows out of the DB response like so:
```js
const [ query_tables ] = await db.read.raw(`
  SELECT
    un.id AS id,
    un.name AS name
  FROM universities un
  INNER JOIN user_universities us
    ON un.id = us.university_id
  WHERE us.user_id = ?
  ORDER BY us.id DESC;

  SELECT
    i.id AS id,
    i.name AS name,
    i.emoji AS emoji,
    i.is_active AS is_active
  FROM user_identities ui
  INNER JOIN identities i
    ON i.id = ui.identity_id
  WHERE ui.user_id = ?
  ORDER BY ui.id DESC;

  SELECT
    s.id AS id,
    s.name AS name,
    s.emoji AS emoji
  FROM user_sectors us
  INNER JOIN sectors s
    ON s.id = us.sector_id
  WHERE us.user_id = ?
  ORDER BY us.id DESC;

  SELECT
    c.id AS id,
    c.company AS company,
    c.is_active AS is_active
  FROM user_companies uc
  INNER JOIN companies c
    ON c.id = uc.company_id
  WHERE uc.user_id = ?
  ORDER BY uc.id DESC;

  SELECT
    aq.id AS question_id,
    aq.question AS question,
    aq.placeholder AS placeholder,
    aq.is_required AS is_required,
    aqa.id AS answer_id,
    aqa.answer AS answer,
    aqa.hashtags AS hashtags
  FROM ama_question_answer aqa
  INNER JOIN ama_questions aq
    ON aq.id = aqa.question_id
  WHERE aqa.user_id = ?
  ORDER BY aq.is_required DESC, aq.id ASC;

  SELECT
    ch.id AS id,
    ch.name AS name,
    ch.group_title AS group_title,
    ch.created_at AS created_at
  FROM can_help_user chu
  INNER JOIN can_help ch
    ON ch.id = chu.can_help_id
  WHERE chu.user_id = ?
`, [id, id, id, id, id, id])

// Result sets come back in statement order.
const [ universities, identities, sectors, companies, question_answers, can_help ] = query_tables

user['universities'] = universities
user['identities'] = identities
user['sectors'] = sectors
user['companies'] = companies
user['question_answers'] = question_answers
user['can_help'] = can_help
```
knex.js was used as the query builder.
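For reference, multi-statement queries like the one above are disabled by default in the MySQL drivers, so the connection has to opt in. A minimal config sketch, assuming the `mysql2` driver behind knex (host, user, and database names here are placeholders, not from the original project):

```js
// Sketch of a knex setup that allows several statements per .raw() call.
const knex = require('knex')({
  client: 'mysql2',
  connection: {
    host: '127.0.0.1',
    user: 'app',
    password: process.env.DB_PASSWORD,
    database: 'app',
    // Required for sending multiple SELECTs in one network request.
    // Note: this also widens the blast radius of any SQL injection,
    // so it should only be combined with parameterised queries.
    multipleStatements: true,
  },
})
```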
Another SQL query optimisation I remember was in the hashtag system, where there was a `hashtags` table with the following structure:

| id | hashtag | times_used |
|---|---|---|
| pk | varchar(50), unique | int |
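As a DDL sketch, that structure looks roughly like the following (the engine defaults, column sizes beyond those listed, and the `DEFAULT 1` for a freshly inserted hashtag's first use are assumptions, not from the original schema):

```sql
CREATE TABLE hashtags (
  id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  hashtag    VARCHAR(50) NOT NULL UNIQUE,
  times_used INT NOT NULL DEFAULT 1  -- assumed: first insert counts as one use
);
```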
Each time a post is tagged, all of its hashtags are extracted and inserted into the table. The catch is that the unique `hashtag` column raises an error on duplicate insertions, so with an `ON DUPLICATE KEY UPDATE` clause I could turn those conflicts into an increment of the `times_used` column instead. The original code sent a separate request for each hashtag, making redundant network calls to the database, so I came up with the following optimisation, where all the hashtags are validated and converted into a single raw SQL insertable string like so:
```js
const data_table = answers.map(a => ({
  user_id: user.id,
  question_id: a.question_id,
  answer: a.answer.trim(),
  hashtags: a.hashtags || null,
}))

const flattened_hashtags = data_table
  .filter(x => !!x.hashtags && typeof x.hashtags === 'string')
  .map(({ hashtags }) => hashtags)
// ['founders', 'fundraising', 'nofilter', 'tag', 'tag']

const unique_hashtags = [...new Set(flattened_hashtags)]
// ['founders', 'fundraising', 'nofilter', 'tag']

if (unique_hashtags.length > 0) {
  const sqlized_values = unique_hashtags.map(hashtag => `('${hashtag}')`).join(', ')
  // ('founders'), ('fundraising'), ('nofilter'), ('tag')
  await db.write.raw(`
    INSERT INTO hashtags (hashtag)
    VALUES ${sqlized_values}
    ON DUPLICATE KEY UPDATE hashtag = VALUES(hashtag), times_used = times_used + 1
  `)
}
```
`data_table` comes from a well-validated `request.body`.
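One caveat with inlining `sqlized_values` directly is that safety rests entirely on that upstream validation. The same single-round-trip upsert can instead use placeholder bindings, letting the driver handle escaping. A sketch under that assumption — `buildHashtagUpsert` is a hypothetical helper, not part of the original code:

```javascript
// Builds a parameterised bulk upsert for a list of unique hashtags:
// one "(?)" placeholder group per hashtag, plus the flat bindings
// array expected by knex's .raw(sql, bindings).
function buildHashtagUpsert(uniqueHashtags) {
  const placeholders = uniqueHashtags.map(() => '(?)').join(', ')
  const sql =
    `INSERT INTO hashtags (hashtag) VALUES ${placeholders} ` +
    'ON DUPLICATE KEY UPDATE times_used = times_used + 1'
  return { sql, bindings: uniqueHashtags }
}

// Usage would then be: await db.write.raw(sql, bindings)
const { sql, bindings } = buildHashtagUpsert(['founders', 'fundraising'])
```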