-
-
Save yorzi/2edd05829351df05cb964bd6f21c275b to your computer and use it in GitHub Desktop.
Rails Exercise by Andy Wang
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Background: | |
# The following code works but I have intentionally introduced issues to make it inefficient. | |
# UserEvents are created for every interaction a user has in the system. It stores what employer, partner and | |
# controller they were on. When there were 10 users and only a little bit of traffic, this would run quickly. | |
# But now with 10,000 users and months of traffic - (millions of user events), this code would be very inefficient. | |
# Instructions: | |
# 1. Please make this code more efficient through refactoring so that it can run at a greater scale, | |
# along with adding some comments to improve the readability. | |
# 2. Please explain what the variable 'distinct_groups' will contain by the end of the execution and what | |
# this data would be useful for. | |
# 3. Send the resulting file as a gist, .rb file or zip package via email to ryan@zevobenefits.com | |
start_range = Date.today.beginning_of_month | |
end_range = Date.today.end_of_month | |
users = User.where(role: role) | |
# Initialize these as sets to avoid duplicates and make lookup more efficient | |
employers = Set.new | |
partners = Set.new | |
controller_resources = Set.new | |
distinct_groups = {} | |
# Use find_each to avoid that all users are loaded in memory, it only loads 1000 one time by default. | |
users.find_each do |user| | |
# Track the total time for each user | |
total_time = 0 | |
# Track the last time we saw an event for this user | |
last_time = nil | |
# Use the range in the query to avoid loading unnecessary records into memory, this will improve the performance a lot | |
user_events = user.user_events.where(created_at: start_range..end_range).order(:created_at) | |
user_events.each do |event| | |
# Only need to get the user_name once per user | |
user_name = event.user_name unless user_name | |
if event.last_known_session.present? | |
employers << event.last_known_session["employer"] | |
partners << event.last_known_session["partner"] | |
end | |
if event.data["params"].present? | |
controller_resources << event.data["params"]["controller"] | |
end | |
if last_time.nil? || last_time + 10.minutes < event.created_at | |
last_time = event.created_at | |
else | |
total_time += event.created_at - last_time | |
last_time = event.created_at | |
end | |
end | |
distinct_groups[user_name] = total_time | |
end | |
# Note: | |
# - The variable `distinct_groups` will contain a hash where the keys are | |
# the user names and the values are the total time spent by that user on | |
# events within the specified date range. This could be useful for understanding | |
# user engagement during a month. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Several updates/improving after refactoring the code:
Using AR's
where
to filter user events, it's on database level and it's faster than filtering items in Ruby.Using the
find_each
to fetch users, it loads users in batches to avoid unnecessary memory usage.Using
Set
instead of Array foremployers
,partners
, andcontroller_resources
, it avoids duplicated entries and makes later lookup more quicker.It calculates the total time directly, it's better summing items from the Array in the last step.
If the
created_at
field inuser_events
is indexed, the date range query would be more faster.