Peeush Agarwal peeush-agarwal

@peeush-agarwal
peeush-agarwal / handle_multicollinearity_by_VIF.py
Created May 12, 2024 07:48
Handle multicollinearity using VIF by dropping correlated columns with higher VIF values
def multicollinearity_by_vif(X, vif=5):
    """Remove columns from X whose VIF is greater than the supplied 'vif'.

    Parameters:
        X: array or DataFrame containing the data, excluding the target variable
        vif: int or float, the limiting value of VIF
    Note:
        This function changes X in place.
    """
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor
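The preview above stops after the imports, so the body of the function is not shown. The following is only a sketch of one common approach, not the author's code: repeatedly drop the column with the largest VIF until every remaining column is at or below the limit. It assumes `X` is a pandas DataFrame of numeric columns, and it computes VIF as 1 / (1 - R^2) with NumPy least squares (with an intercept) instead of calling statsmodels. The names `drop_high_vif` and `_vif` are hypothetical.

```python
import numpy as np
import pandas as pd


def _vif(values: np.ndarray, j: int) -> float:
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the others."""
    y = values[:, j]
    others = np.delete(values, j, axis=1)
    # Add an intercept so R^2 is measured around the mean of y.
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    if ss_tot == 0.0:
        # Constant column: treated as infinite VIF (an assumption of this sketch).
        return float("inf")
    r2 = 1.0 - ss_res / ss_tot
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


def drop_high_vif(X: pd.DataFrame, vif: float = 5.0) -> list:
    """Drop, in place, the column with the largest VIF until all are <= vif."""
    dropped = []
    while X.shape[1] > 1:
        values = X.to_numpy(dtype=float)
        vifs = [_vif(values, j) for j in range(values.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= vif:
            break
        dropped.append(X.columns[worst])
        X.drop(columns=X.columns[worst], inplace=True)
    return dropped


# Synthetic demo data: the third column is almost exactly a + b.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
demo = pd.DataFrame(
    {"a": a, "b": b, "a_plus_b": a + b + rng.normal(scale=0.01, size=200)}
)
removed = drop_high_vif(demo, vif=5)
```

Dropping one column at a time and recomputing matters here: all three demo columns start with a very large VIF, but removing any one of them brings the other two back under the limit.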
peeush-agarwal / handle_outliers_eliminate_rows.py
Last active May 12, 2024 07:35
Handle outliers using IQR and eliminate rows
import numpy as np
# data as pandas DataFrame
train = ...
# Dictionaries to store the outlier bounds for each column
upper_bounds = {}
lower_bounds = {}
# Best practice
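The preview is truncated here, so the rest of the snippet is not visible. As a rough sketch of how IQR-based row elimination is often done, the example below fills `lower_bounds` and `upper_bounds` per numeric column and keeps only rows inside every bound. The 1.5 multiplier is the conventional choice and is an assumption, as are the synthetic `train` frame and the name `train_clean`.

```python
import numpy as np
import pandas as pd

# Small synthetic frame standing in for `train` (hypothetical data).
train = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0],
    "y": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
})

upper_bounds = {}
lower_bounds = {}
for col in train.select_dtypes(include=np.number).columns:
    q1, q3 = train[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    # Conventional 1.5 * IQR fences (assumed, not taken from the source).
    lower_bounds[col] = q1 - 1.5 * iqr
    upper_bounds[col] = q3 + 1.5 * iqr

# Keep only rows that fall inside the bounds for every numeric column.
mask = pd.Series(True, index=train.index)
for col in lower_bounds:
    mask &= train[col].between(lower_bounds[col], upper_bounds[col])
train_clean = train[mask]
```

In this demo only the row with `x == 100.0` falls outside its fences, so it is the only row removed.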
{
"basics": {
"name": "Peeush Agarwal",
"label": "Senior ML Engineer",
"email": "agarwal.peeush@gmail.com",
"phone": "+91 82377 28795",
"location": {
"city": "Pune",
"region": "Maharashtra",
"countryCode": "IN"
peeush-agarwal / resume_mg.json
Last active September 18, 2024 10:22
Resume in JSON format for Megha
{
"basics": {
"name": "Megha Goyal",
"label": "WFM Business Partner | Chief of Staff | MIS Executive",
"email": "meghagoyal0602@gmail.com",
"phone": "+91-90961 69255",
"location": {
"address": "Wagholi",
"city": "Pune",
"region": "Maharashtra"
peeush-agarwal / compute_eigen_vals.py
Created March 14, 2022 17:25
PCA: computation of eigenvalues for the dataframe
import numpy as np
cov_mat = np.cov(X_train_std.T)
eigen_vals, eigen_vecs = np.linalg.eig(cov_mat)
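A covariance matrix is symmetric, so `np.linalg.eigh` is a possible alternative to `np.linalg.eig`: it returns real eigenvalues in ascending order. The sketch below uses it on synthetic data standing in for `X_train_std` and then computes each component's share of the total variance. The names `X_demo`, `order`, and `explained_ratio` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for X_train_std: 100 samples, 4 standardized features.
X_demo = rng.normal(size=(100, 4))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

cov_mat = np.cov(X_demo.T)
# eigh is intended for symmetric matrices; eigenvalues come back real and ascending.
eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)

# Reorder to descending and compute each component's share of total variance.
order = np.argsort(eigen_vals)[::-1]
eigen_vals, eigen_vecs = eigen_vals[order], eigen_vecs[:, order]
explained_ratio = eigen_vals / eigen_vals.sum()
```

The columns of `eigen_vecs` are the principal directions, reordered to match the sorted eigenvalues.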
peeush-agarwal / standardize_data.py
Created March 14, 2022 17:23
PCA: standardize the wine dataframe
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# split into training and testing sets
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3,
    stratify=y, random_state=0
)
# standardize the features
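The preview ends at this comment. A typical way to carry out the step with the imported `StandardScaler`, shown here only as a sketch on synthetic data, is to fit the scaler on the training split and reuse the fitted statistics on the test split. The arrays `X_demo` and `y_demo` and the variable name `sc` are hypothetical stand-ins, not taken from the gist.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the wine features and class labels.
X_demo = rng.normal(loc=5.0, scale=3.0, size=(90, 4))
y_demo = np.repeat([1, 2, 3], 30)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

sc = StandardScaler()
# Learn mean and standard deviation from the training data only.
X_train_std = sc.fit_transform(X_train)
# Apply the same training statistics to the test data.
X_test_std = sc.transform(X_test)
```

Fitting on the training split alone keeps information from the test split out of the preprocessing step.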
peeush-agarwal / load_wine_data.py
Created March 14, 2022 17:19
PCA: load wine data into the dataframe
import pandas as pd
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/'
                      'machine-learning-databases/wine/wine.data',
                      header=None)
df_wine.head()
peeush-agarwal / main.yml
Last active February 28, 2022 02:58
GitHub workflow file changes with docker container
# This is a basic workflow to help you get started with Actions
name: Deploy to Raspberry Pi
# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the main branch
  push:
    branches: [ main ]
peeush-agarwal / .dockerignore
Created February 27, 2022 16:13
Docker ignore file to ignore specific files and folders when building Docker container
Dockerfile
__pycache__/
peeush-agarwal / gunicorn_starter.sh
Created February 27, 2022 04:26
Gunicorn server shell script
#!/bin/sh
gunicorn -w 1 -b 0.0.0.0:4000 app:app