@jemygraw
Last active December 20, 2016 09:55
compare flow
import sys


def load_flows(fname):
    """Read a tab-separated "path<TAB>flow" file into a {path: flow} dict."""
    flows = {}
    with open(fname) as fp:
        for line in fp:
            items = line.strip().split("\t")
            path = items[0]
            # A flow value of "NULL" is counted as 0.
            flow = 0
            if items[1] != "NULL":
                flow = int(items[1])
            flows[path] = flow
    return flows


fname1 = sys.argv[1]
fname2 = sys.argv[2]
first_dict = load_flows(fname1)
second_dict = load_flows(fname2)

# Check: split the paths into those present in both files and those
# present in only one of them.
first_keys = set(first_dict.keys())
second_keys = set(second_dict.keys())
shared_keys = first_keys & second_keys
first_only_keys = first_keys - second_keys
second_only_keys = second_keys - first_keys

# Output columns: path, flow in first file, flow in second file, difference.
for k in shared_keys:
    f_flow = first_dict[k]
    s_flow = second_dict[k]
    d_flow = s_flow - f_flow
    print("{0}\t{1}\t{2}\t{3}".format(k, f_flow, s_flow, d_flow))

# Paths missing from the second file count as flow 0 there.
for k in first_only_keys:
    f_flow = first_dict[k]
    print("{0}\t{1}\t{2}\t{3}".format(k, f_flow, 0, -f_flow))

# Paths missing from the first file count as flow 0 there.
for k in second_only_keys:
    s_flow = second_dict[k]
    print("{0}\t{1}\t{2}\t{3}".format(k, 0, s_flow, s_flow))
@jemygraw

The input to this program is the output of a Hadoop aggregation job, in the format:
path\tsum(length)
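
To make the format concrete, here is a minimal sketch that runs the same comparison on two in-memory samples. The paths and numbers are made up for illustration, and `parse_flows` and `compare_flows` are hypothetical helper names that do not appear in the script. Unlike the script, this sketch iterates over the sorted union of paths, so the row order is deterministic.

```python
# Hypothetical sample data in the "path<TAB>sum(length)" format.
# The paths and values below are invented for illustration only.
first_sample = "/a/index.html\t1200\n/b/logo.png\tNULL\n/c/old.js\t300\n"
second_sample = "/a/index.html\t1500\n/b/logo.png\t50\n/d/new.css\t80\n"


def parse_flows(text):
    """Parse "path<TAB>flow" lines into a dict; "NULL" counts as 0."""
    flows = {}
    for line in text.splitlines():
        path, value = line.strip().split("\t")
        flows[path] = 0 if value == "NULL" else int(value)
    return flows


def compare_flows(first, second):
    """Return (path, first_flow, second_flow, difference) rows sorted by path."""
    rows = []
    for path in sorted(set(first) | set(second)):
        # A path missing from one side is treated as flow 0 on that side.
        f_flow = first.get(path, 0)
        s_flow = second.get(path, 0)
        rows.append((path, f_flow, s_flow, s_flow - f_flow))
    return rows


rows = compare_flows(parse_flows(first_sample), parse_flows(second_sample))
for row in rows:
    print("{0}\t{1}\t{2}\t{3}".format(*row))
```

A path that appears in only one sample gets a flow of 0 on the other side, which matches how the script reports first-only and second-only paths.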
