@harshildarji
Created March 27, 2022 14:44
Compare two DataFrames and store the rows missing from the second in a separate one
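import pandas as pd

# Load the raw and the previously cleaned data (pipe-delimited files)
df1 = pd.read_csv('raw.csv', sep='|')
df2 = pd.read_csv('clean_OLD.csv', sep='|')

# ids that appear in the raw data but not in the clean data
not_clean = list(set(df1['id']) - set(df2['id']))
print(f'Raw DF: {df1.shape}\nClean DF: {df2.shape}\nNot clean: {len(not_clean)}')

# Keep only the raw rows whose id is missing from the clean data
data = df1[df1['id'].isin(not_clean)].reset_index(drop=True)
print(f'Not clean DF: {data.shape}')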
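
The same comparison can be sketched as a small reusable function that skips the intermediate set and list and filters with a negated `isin` mask instead. This is only an illustration: the function name `rows_missing_from`, the `key` parameter, and the in-memory sample data are assumptions, not part of the original snippet, and the sample frames stand in for `raw.csv` and `clean_OLD.csv`.

```python
import pandas as pd


def rows_missing_from(raw: pd.DataFrame, clean: pd.DataFrame, key: str = "id") -> pd.DataFrame:
    """Return the rows of `raw` whose `key` value does not appear in `clean`."""
    # True for every raw row whose key is absent from the clean frame
    mask = ~raw[key].isin(clean[key])
    return raw[mask].reset_index(drop=True)


# Hypothetical sample data standing in for raw.csv and clean_OLD.csv
raw = pd.DataFrame({"id": [1, 2, 3, 4], "text": ["a", "b", "c", "d"]})
clean = pd.DataFrame({"id": [2, 4], "text": ["b", "d"]})

not_clean_df = rows_missing_from(raw, clean)
print(f"Not clean DF: {not_clean_df.shape}")
```

Unlike the set difference, the mask keeps the rows in their original order, and the result is the same whether or not `id` contains duplicates, since `isin` is applied to every row.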