Skip to content

Instantly share code, notes, and snippets.

@hassanshamim
Created November 12, 2016 00:05
Show Gist options
  • Save hassanshamim/a4d2981ae03f3aa50c6d20a205a5d711 to your computer and use it in GitHub Desktop.
Save hassanshamim/a4d2981ae03f3aa50c6d20a205a5d711 to your computer and use it in GitHub Desktop.
for mike

So just looking at your code a bit, I can’t run anything because I don’t have the data. An idea I had that might speed up the process:

Instead of your double for loop where you count the number of times each keyword apeears in the query, you could iterate over all the search queries and build a dictionary (or use counter module) to count the number of times each word appears in total.

# i.e.
totals = dict()
query = 'an example query'
for word in query.split():
    totals.setdefault(word, 0)
    totals[word] = totals.get(word, 0) + 1

Once you have your total word count you can just pull out the keyword values you care about. A dictionary comprehension would work.

{key:val for key in totals if key in keywords

This method is roughly O(n) as you are only iterating over the query set once (and the length of the query as well. I said roughly)

Also I notice you normalize your data with .upper() in the following line:

if bool(keyword.upper() in query.upper()):

This is good! In the approach I suggest I think it doesn't matter but I'd try to do this when you are pulling the data out of the csv rather than when you actually use the data. For example, in your double loop approach you end up calling .upper() multiple times on each keyword. This isn't a bad thing just less efficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment