for mike

So just looking at your code a bit, I can’t run anything because I don’t have the data. An idea I had that might speed up the process:

Instead of your double for loop where you count the number of times each keyword apeears in the query, you could iterate over all the search queries and build a dictionary (or use counter module) to count the number of times each word appears in total.

# i.e.
totals = dict()
query = 'an example query'
for word in query.split():
    totals.setdefault(word, 0)
    totals[word] = totals.get(word, 0) + 1

Once you have your total word count you can just pull out the keyword values you care about. A dictionary comprehension would work.

{key:val for key in totals if key in keywords

This method is roughly O(n) as you are only iterating over the query set once (and the length of the query as well. I said roughly)

Also I notice you normalize your data with .upper() in the following line:

if bool(keyword.upper() in query.upper()):

This is good! In the approach I suggest I think it doesn't matter but I'd try to do this when you are pulling the data out of the csv rather than when you actually use the data. For example, in your double loop approach you end up calling .upper() multiple times on each keyword. This isn't a bad thing just less efficient.

hassanshamim/search_review.md