@MaxLarue
Created August 6, 2019 20:37
string matching
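from collections import defaultdict


def histogram_match(reference, input):
    """Return how well input's letters cover reference's letters, from 0.0 to 100.0."""
    if not reference:
        # Nothing to match against; this also avoids dividing by zero below.
        return 0.0
    reference_histogram = defaultdict(int)
    input_histogram = defaultdict(int)
    for letter in reference.lower():
        reference_histogram[letter] += 1
    for letter in input.lower():
        input_histogram[letter] += 1
    mismatch = 0
    # Visit each distinct letter once so repeated letters are not counted twice.
    for letter, count in reference_histogram.items():
        # Count only the occurrences that input is missing; extras are ignored.
        mismatch += max(count - input_histogram[letter], 0)
    # worst mismatch is the number of letters in reference string
    # best mismatch is 0
    worst = float(len(reference))
    return max(((worst - float(mismatch)) / worst) * 100, 0.0)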
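
The same letter-histogram idea can probably be written more compactly with `collections.Counter`. The sketch below is not part of the original gist: the name `histogram_match_counter` and the parameter name `candidate` are hypothetical, and returning 0.0 for an empty reference is an assumption. It relies on the fact that subtracting one `Counter` from another keeps only positive counts, so the result holds exactly the reference letters the candidate fails to cover, with each missing occurrence counted once.

```python
from collections import Counter


def histogram_match_counter(reference, candidate):
    """Hypothetical Counter-based variant; returns a score from 0.0 to 100.0."""
    ref = reference.lower()
    if not ref:
        # Assumption: an empty reference is scored as no match.
        return 0.0
    # Counter subtraction drops zero and negative counts, leaving only the
    # reference letters that the candidate does not supply.
    missing = Counter(ref) - Counter(candidate.lower())
    mismatch = sum(missing.values())
    return (len(ref) - mismatch) / len(ref) * 100


# Anagrams share the same histogram, so every letter is covered.
print(histogram_match_counter("listen", "silent"))  # 100.0
# Only one of the three reference letters is covered here.
print(histogram_match_counter("aab", "a"))
```

Because only letter counts are compared, the score ignores letter order entirely, so it works best as a cheap first filter before a stricter comparison.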