Kyle Streepy (kstreepy)

1 follower · 0 following

Data Scientist
Arlington, VA

kstreepy / dict_map function

Created June 1, 2020 15:33

Map a dict onto a column to create a new column.

	def gs_group(df):
	    gs_dict = {'GS-1': 'GS 1-6',
	               'GS-2': 'GS 1-6',
	               'GS-3': 'GS 1-6',
	               'GS-4': 'GS 1-6',
	               'GS-5': 'GS 1-6',
	               'GS-6': 'GS 1-6',
	               'GS-7': 'GS 7-9',
	               'GS-8': 'GS 7-9',
	               'GS-9': 'GS 7-9',
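The preview above is cut off before the mapping is applied. As a minimal sketch of the idea in the description, the block below builds the same lookup for the grades visible in the preview and uses pandas `Series.map` to add the grouped label as a new column. The function name `add_grade_group` and the column names `grade` and `grade_group` are hypothetical, since the preview does not show which columns the gist uses, and grades beyond GS-9 are left unmapped because they are not visible here.

```python
import pandas as pd


def add_grade_group(df, source_col="grade", new_col="grade_group"):
    """Return a copy of df with new_col holding the grouped label for source_col.

    Column names are assumptions for illustration; the gist preview
    does not show which columns it reads or writes.
    """
    # Only the grades visible in the preview: GS-1..GS-6 and GS-7..GS-9.
    gs_dict = {f"GS-{n}": "GS 1-6" for n in range(1, 7)}
    gs_dict.update({f"GS-{n}": "GS 7-9" for n in range(7, 10)})

    out = df.copy()
    # Series.map yields NaN for any value that is not a key in the dict.
    out[new_col] = out[source_col].map(gs_dict)
    return out


sample = pd.DataFrame({"grade": ["GS-3", "GS-8", "GS-15"]})
result = add_grade_group(sample)
```

Working on a copy keeps the caller's dataframe unchanged, and unmapped values surface as NaN so they are easy to spot with `result[new_col].isna()`.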

kstreepy / gz_extract.py

Created June 11, 2019 16:09

For a given directory, unzip all .gz files in the folder, save the unzipped files there, and delete the zipped files. A Python solution for instances where you do not have access to PowerShell.

	import os, gzip, shutil

	dir_name = 'x'

	def gz_extract(directory):
	    extension = ".gz"
	    os.chdir(directory)
	    for item in os.listdir(directory):  # loop through items in dir
	        if item.endswith(extension):  # check for ".gz" extension
	            gz_name = os.path.abspath(item)  # get full path of files
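The preview stops before the decompression step. One way the behavior in the description could be completed, using only the standard library, is sketched below. The name `gz_extract_sketch` is hypothetical, and this version joins paths explicitly instead of calling `os.chdir`, which is a substitution and not necessarily what the gist does.

```python
import gzip
import os
import shutil


def gz_extract_sketch(directory):
    """Decompress every .gz file in directory, then delete the archive.

    Returns the list of paths that were written.
    """
    extension = ".gz"
    extracted = []
    for item in sorted(os.listdir(directory)):
        if not item.endswith(extension):
            continue
        gz_path = os.path.join(directory, item)
        out_path = gz_path[:-len(extension)]  # same name without ".gz"
        # Stream the decompressed bytes to disk without loading the whole file.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(gz_path)  # delete the archive only after a successful write
        extracted.append(out_path)
    return extracted
```

Avoiding `os.chdir` leaves the process working directory untouched, so a relative `directory` argument keeps working for the `os.listdir` call.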

kstreepy / quick_melt.py

Created May 30, 2019 12:02

Given a wide dataframe, use melt to transform it from wide to long. Declare the ID columns and treat all remaining columns in the dataframe as value columns.

	import pandas as pd

	def quick_melt(wide_df):
	    '''
	    Take wide dataframe and melt to long. Declare ID Columns (id_cols)
	    and then establish all remaining columns as value columns
	    '''

	    id_cols = {'A', 'B', 'C'}
	    value_cols = set(wide_df.columns) - id_cols
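The preview ends before the call to melt. A minimal sketch of how the remaining step might look with `pandas.melt` follows. The name `quick_melt_sketch` and the `id_cols` parameter are hypothetical, and the value columns are collected with a list comprehension instead of a set difference so that the original column order is preserved; that is a deliberate substitution, not the gist's code.

```python
import pandas as pd


def quick_melt_sketch(wide_df, id_cols=("A", "B", "C")):
    """Melt wide_df to long form, treating every non-ID column as a value column."""
    id_cols = list(id_cols)
    # A list comprehension keeps the dataframe's column order, which a set would not.
    value_cols = [col for col in wide_df.columns if col not in id_cols]
    return pd.melt(wide_df, id_vars=id_cols, value_vars=value_cols)


wide = pd.DataFrame({"A": [1], "B": [2], "C": [3], "x": [10], "y": [20]})
long_df = quick_melt_sketch(wide)
```

With the default arguments, `pd.melt` names the two new columns `variable` and `value`; pass `var_name` and `value_name` to change them.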

kstreepy / read_multi_csv_source.py

Last active May 29, 2019 17:48

Read multiple CSV files in a folder into a single dataframe, adding a new column with the name of the source file.

	import pandas as pd
	import os
	import glob

	def read_multi_csv(path):
	    '''
	    Given a file path with wildcard and extension, parse all files with that extension in directory
	    into a single dataframe.
	    '''
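The body of the function is not shown in the preview. Based on the description and the docstring, one possible shape is sketched below: expand the wildcard with `glob`, read each file, tag its rows with the file name, and concatenate. The name `read_multi_csv_sketch` and the column name `source_file` are assumptions for illustration.

```python
import glob
import os

import pandas as pd


def read_multi_csv_sketch(path):
    """Read every CSV matching the wildcard path into one dataframe.

    Adds a 'source_file' column (hypothetical name) with each row's file name.
    """
    frames = []
    for filename in sorted(glob.glob(path)):
        df = pd.read_csv(filename)
        df["source_file"] = os.path.basename(filename)
        frames.append(df)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

A call such as `read_multi_csv_sketch("data/*.csv")` would then return one dataframe with a fresh 0..n-1 index; the `data/` folder is only an example path.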

kstreepy / read_multi_excel_source.py

Created May 29, 2019 17:47

Read multiple Excel files into a single dataframe, with the filename as a column in the new dataframe.

	import pandas as pd
	import os
	import glob

	def read_multi_excel(path):
	    '''
	    Given a file path with wildcard and extension, parse all files with that extension in directory
	    into a single dataframe.
	    '''

kstreepy / read_multi_csv.py

Created May 29, 2019 17:37

Read multiple CSV files in a folder into a single pandas dataframe.


	import pandas as pd
	import glob

	def read_multi_csv(path):
	    '''
	    Given a file path with wildcard and extension, parse all files with that extension in directory
	    into a single dataframe.
	    '''

kstreepy / read_multi_excel.py

Last active May 29, 2019 17:49

Read multiple Excel files within a folder into a single pandas dataframe.

	import pandas as pd
	import glob

	def read_multi_excel(path):
	    '''
	    Given a file path with wildcard and extension, parse all files with that extension in directory
	    into a single dataframe.
	    '''

	    all_files = glob.glob(path)
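The preview ends right after the wildcard is expanded. A hedged sketch of how the remaining steps might look is given below: read each matched file and concatenate the results. The name `read_multi_sketch` and the `reader` parameter are hypothetical additions; the parameter defaults to `pd.read_excel` but lets the same loop serve other formats.

```python
import glob

import pandas as pd


def read_multi_sketch(path, reader=pd.read_excel):
    """Read every file matching the wildcard path and stack them into one dataframe.

    reader is any callable that takes a file path and returns a dataframe.
    """
    all_files = sorted(glob.glob(path))
    if not all_files:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    frames = [reader(filename) for filename in all_files]
    return pd.concat(frames, ignore_index=True)
```

Reading `.xlsx` files with `pd.read_excel` needs an Excel engine such as openpyxl to be installed. Passing `reader=pd.read_csv` reuses the same function for CSV folders.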