Tony Shouse redpoint13

@redpoint13
redpoint13 / install_nvidia_driver.md
Last active September 20, 2022 04:31 — forked from espoirMur/install_nvidia_driver.md
How I fixed this issue: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running"

I had many drivers installed in my virtual machine, and that was the reason I was getting the error.

To fix it, I first had to remove all the drivers I had installed before, using:

  • sudo apt purge nvidia-*

  • sudo apt update

  • sudo apt autoremove

After that I went ahead and installed the latest version of the NVIDIA driver:

@redpoint13
redpoint13 / ubuntu-install-gcc-6
Created December 29, 2020 20:05 — forked from zuyu/ubuntu-install-gcc-6
Install gcc 6 on Ubuntu
sudo apt-get update && \
sudo apt-get install build-essential software-properties-common -y && \
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y && \
sudo apt-get update && \
sudo apt-get install gcc-6 g++-6 -y && \
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 60 --slave /usr/bin/g++ g++ /usr/bin/g++-6 && \
gcc -v
@redpoint13
redpoint13 / missing value heatmap
Created December 28, 2020 18:10
missing value heatmap from pandas dataframe
import numpy as np
import seaborn as sns
cols = df.columns[:30] # first 30 columns
colors = ['#000099', '#ffff00'] # specify the colours - yellow is missing, blue is not missing.
sns.heatmap(df[cols].isnull(), cmap=sns.color_palette(colors))
# if it's a larger dataset and the visualization takes too long, do this instead.
# % of missing values per column.
for col in df.columns:
    pct_missing = np.mean(df[col].isnull())
    if pct_missing > 0.009:
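The loop above is cut off in this preview, so the body of the `if` is not shown. A minimal sketch of one way to finish it, assuming the goal is to collect the columns whose missing share exceeds the 0.009 threshold. The sample `df` and the `flagged` dict are made up for illustration and are not part of the original gist:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "a" is half missing, "b" a quarter, "c" complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
    "c": [1, 2, 3, 4],
})

# Assumed loop body: remember every column above the threshold.
flagged = {}
for col in df.columns:
    pct_missing = float(np.mean(df[col].isnull()))
    if pct_missing > 0.009:
        flagged[col] = pct_missing

for col, pct in flagged.items():
    print("{} - {}% missing".format(col, round(pct * 100)))
```

The same shares are also available without a loop as `df.isnull().mean()`.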
@redpoint13
redpoint13 / variance_inflation_factor
Created December 28, 2020 18:06
variance_inflation_factor
# variance_inflation_factor
# VIF >= 5 indicates an issue with multicollinearity
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant
X = add_constant(data.select_dtypes(include=np.number).fillna(-1))
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])], index=X.columns)
vif
@redpoint13
redpoint13 / cleanNonPrintableCharFromDF
Created December 11, 2020 20:25
"""Clean non-printable characters from headers and data."""
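def clean_df_chars(df):
    """Clean non-printable characters from headers and data."""
    # Python regexes take no /.../g delimiters; [^ -~]+ matches any run of
    # characters outside printable ASCII (space through tilde).
    # The result must be assigned back, str.replace does not work in place.
    df.columns = df.columns.str.replace(r"[^ -~]+", "", regex=True).str.strip()
    df = df.replace(to_replace=r"[^ -~]+", value="", regex=True)
    return df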
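A short usage sketch of the idea, assuming string column labels. The sample frame is invented, and this variant works on a copy so the caller's frame is left alone, which is a design choice of the sketch rather than something the gist states. Note that the `[^ -~]` class also strips legitimate non-ASCII text such as accented letters:

```python
import pandas as pd

def clean_df_chars(df):
    """Strip characters outside printable ASCII from headers and string cells."""
    df = df.copy()  # assumption: leave the caller's frame untouched
    df.columns = df.columns.str.replace(r"[^ -~]+", "", regex=True).str.strip()
    return df.replace(to_replace=r"[^ -~]+", value="", regex=True)

# Hypothetical dirty input: NUL, tab, BEL and newline characters.
raw = pd.DataFrame({" na\x00me ": ["al\tice", "bob\x07"], "age\n": [30, 41]})
cleaned = clean_df_chars(raw)
print(list(cleaned.columns))
print(cleaned["name"].tolist())
```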
@redpoint13
redpoint13 / confusion_matrix_calc.py
Created July 31, 2019 18:32
Formulas for confusion matrix components
# True Positive (TP): we predict a label of 1 (positive), and the true label is 1.
TP = np.sum(np.logical_and(pred_labels == 1, true_labels == 1))
# True Negative (TN): we predict a label of 0 (negative), and the true label is 0.
TN = np.sum(np.logical_and(pred_labels == 0, true_labels == 0))
# False Positive (FP): we predict a label of 1 (positive), but the true label is 0.
FP = np.sum(np.logical_and(pred_labels == 1, true_labels == 0))
# False Negative (FN): we predict a label of 0 (negative), but the true label is 1.
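The false-negative line is cut off in this preview. A self-contained sketch of all four counts, following the pattern of the lines above and the definition in the FN comment. The label arrays and the derived metrics are illustrative assumptions, not part of the original gist:

```python
import numpy as np

# Hypothetical binary predictions and ground truth.
pred_labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])
true_labels = np.array([1, 0, 0, 1, 1, 0, 1, 0])

TP = int(np.sum(np.logical_and(pred_labels == 1, true_labels == 1)))
TN = int(np.sum(np.logical_and(pred_labels == 0, true_labels == 0)))
FP = int(np.sum(np.logical_and(pred_labels == 1, true_labels == 0)))
# Predicted 0 while the true label is 1, per the FN comment above.
FN = int(np.sum(np.logical_and(pred_labels == 0, true_labels == 1)))

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
print("TP: %i, FP: %i, TN: %i, FN: %i" % (TP, FP, TN, FN))
```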
@redpoint13
redpoint13 / compare_dir_contents.py
Created July 1, 2019 11:38
Snippet that goes through two folders recursively, displays all files that have the same name but different contents, and lists all files that exist only on the left or only on the right filepath:
import filecmp
c = filecmp.dircmp(filepath1, filepath2)
def report_recursive(dcmp):
    for name in dcmp.diff_files:
        print("DIFF file %s found in %s and %s" % (name,
            dcmp.left, dcmp.right))
    for name in dcmp.left_only:
        print("ONLY LEFT file %s found in %s" % (name, dcmp.left))
    for name in dcmp.right_only:
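The preview stops inside the last loop. A runnable sketch of the full idea using only the standard library: the "ONLY RIGHT" message and the recursion through `dcmp.subdirs` are assumptions based on the description, and the temporary directory tree is invented for the demonstration. This version collects lines in a list instead of printing so the result can be inspected:

```python
import filecmp
import os
import tempfile

def report_recursive(dcmp, lines=None):
    """Collect differing and one-sided files for a dircmp and all its subdirs."""
    lines = [] if lines is None else lines
    for name in dcmp.diff_files:
        lines.append("DIFF file %s found in %s and %s" % (name, dcmp.left, dcmp.right))
    for name in dcmp.left_only:
        lines.append("ONLY LEFT file %s found in %s" % (name, dcmp.left))
    for name in dcmp.right_only:
        # Assumed wording, mirroring the ONLY LEFT message.
        lines.append("ONLY RIGHT file %s found in %s" % (name, dcmp.right))
    for sub_dcmp in dcmp.subdirs.values():
        report_recursive(sub_dcmp, lines)
    return lines

def _write(path, text):
    with open(path, "w") as handle:
        handle.write(text)

def build_demo_tree(root):
    """Create two small hypothetical folders that differ in three ways."""
    left = os.path.join(root, "left")
    right = os.path.join(root, "right")
    for base in (left, right):
        os.makedirs(os.path.join(base, "sub"))
    _write(os.path.join(left, "a.txt"), "one")
    _write(os.path.join(right, "a.txt"), "a different body")
    _write(os.path.join(left, "only_left.txt"), "x")
    _write(os.path.join(right, "sub", "only_right.txt"), "y")
    return left, right

with tempfile.TemporaryDirectory() as root:
    left, right = build_demo_tree(root)
    report = report_recursive(filecmp.dircmp(left, right))

for line in report:
    print(line)
```

`dircmp` compares files shallowly by default (type, size and modification time, falling back to contents when those differ), so two files with identical metadata are reported as equal.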
for c in cols_cat:
    df = pd.concat([df, pd.get_dummies(df[c], prefix=c, dummy_na=True)], axis=1).drop([c], axis=1)
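The two lines above one-hot encode each categorical column and drop the original. A small sketch of the same loop on made-up data, assuming `cols_cat` is a list of categorical column names (the sample frame and that list are not from the original):

```python
import pandas as pd

# Hypothetical frame with one categorical column containing a missing value.
df = pd.DataFrame({"color": ["red", "blue", None], "n": [1, 2, 3]})
cols_cat = ["color"]

for c in cols_cat:
    # dummy_na=True adds an extra indicator column for missing values.
    dummies = pd.get_dummies(df[c], prefix=c, dummy_na=True)
    df = pd.concat([df, dummies], axis=1).drop([c], axis=1)

print(df.columns.tolist())
```

With `dummy_na=True` every row sets exactly one indicator, including rows where the category was missing.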
@redpoint13
redpoint13 / useful_pandas_snippets.py
Last active January 2, 2018 20:18 — forked from bsweger/useful_pandas_snippets.md
Useful Pandas Snippets
# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()
# Convert Series datatype to numeric (will error if column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])
# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!