Skip to content

Instantly share code, notes, and snippets.

@phargogh
Last active August 2, 2022 21:30
Show Gist options
  • Save phargogh/aff32bb689220fb317b0a5196c7cf307 to your computer and use it in GitHub Desktop.
Save phargogh/aff32bb689220fb317b0a5196c7cf307 to your computer and use it in GitHub Desktop.
Can HRA Reclassification Percentages be Derived from `pygeoprocessing.zonal_statistics` outputs?

Can HRA Reclassification Percentages be Derived from pygeoprocessing.zonal_statistics outputs?

Background

Among its outputs, HRA produces a raster where the calculated floating-point risk score is reclassified into the following classifications:

  1. 0 representing no risk
  2. 1 representing low risk
  3. 2 representing medium risk
  4. 3 representing high risk

The current iteration of HRA does a lot of extra work to tally up the counts of pixels under each AOI polygon to determine the percentage of pixels under the polygon that are in each risk classification category because this output is not directly supported by HRA.

For reference, the structure of the return dictionary from zonal_statistics is:

{FID: {
    'min': 0,
    'max': 1,
    'sum': 1.7,
    'count': 3,
    'nodata_count': 0,
}}

Thus the question becomes: can the breakdown of high/medium/low classifications be derived from the outputs of zonal_statistics?

Mathy Approach: this can't be done with just these values

At first glance, we appear to be able to construct a system of linear equations:

0*x_0 + 1*x_1 + 2*x_2 + 3*x_3 = stats_{sum}

x_0 + x_1 + x_2 + x_3 = stats_{count}

It's worth noting that althoug this appears plausible, it is impossible to find a single solution to all of the unknowns: to solve 4 unknowns, we need 4 equations. Thus, solving a system of linear equations is only possible when there are 2 or fewer unknowns.

But in the general case, we end up with 2 unknowns that cannot be accounted for, effectively a whole plane of possible solutions

Demonstration

By way of demonstration, I have added a program here, test_lulc_pixelcounts.py, that:

  1. Generates a set of 3 random integers representing pixel counts under a hypothetical area of interest
  2. Uses a brute-force method to determine if there are any alternate solutions.

Example output:

In this example, this hypothetical polygon has:

  • 0 pixels with risk classification 0
  • 11 pixels with risk classification 1
  • 62 pixels with risk classification 2
  • 4 pixels with risk classification 3

Given the sum of these risk classifications ( 11+(2*62)+(3*4)=147 ) and the count of valid pixels (11+62+4=77), we can see that there are at least 36 other possible solutions:

Testing 0, 11, 62, 4, 36 solutions
    Solutions:
        0 7 70 0
        0 8 68 1
        0 9 66 2
        0 10 64 3
        0 11 62 4
        0 12 60 5
        0 13 58 6
        0 14 56 7
        0 15 54 8
        0 16 52 9
        0 17 50 10
        0 18 48 11
        0 19 46 12
        0 20 44 13
        0 21 42 14
        0 22 40 15
        0 23 38 16
        0 24 36 17
        0 25 34 18
        0 26 32 19
        0 27 30 20
        0 28 28 21
        0 29 26 22
        0 30 24 23
        0 31 22 24
        0 32 20 25
        0 33 18 26
        0 34 16 27
        0 35 14 28
        0 36 12 29
        0 37 10 30
        0 38 8 31
        0 39 6 32
        0 40 4 33
        0 41 2 34
        0 42 0 35

Conclusion

The only way to be completely sure about the per-class breakdown per AOI polygon is to inspect them. zonal_statistics cannot be used at this time.

import json
import random
import cython
MAX_N_PIXELS = 100
def sum_scores(a, b, c, d):
return 0*a + 1*b + 2*c + 3*d
@cython.compile
def find_solutions(sum_of_scores, pixelcount):
solutions = []
a = 0
for b in range(0, MAX_N_PIXELS+1):
for c in range(0, MAX_N_PIXELS+1):
for d in range(0, MAX_N_PIXELS+1):
if (sum_scores(a, b, c, d) == sum_of_scores and
pixelcount == a + b + c + d):
solutions.append([a, b, c, d])
return solutions
def main():
combos_tested = set()
for i in range(5):
for i in range(4):
ints = (0,) + tuple(round(random.uniform(0, 100)) for i in range(3))
if ints not in combos_tested:
break
sum_of_scores = sum_scores(*ints)
pixelcount = sum(ints)
solutions = find_solutions(sum_of_scores, pixelcount)
print(f"Testing {', '.join(str(i) for i in ints)}, {len(solutions)} solutions")
if len(solutions) > 1:
print(" Solutions:")
for solution in solutions:
print(f" {' '.join(str(i) for i in solution)}")
with open('problem_combinations.txt', 'a') as combos:
combos.write(
f"{json.dumps({'codes': ints, 'solutions': solutions})}")
if __name__ == '__main__':
main()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment