@MattPitlyk
MattPitlyk / fine-tuning-gpt-2-on-a-custom-dataset.ipynb
Created February 14, 2020 19:14
Fine-Tuning GPT-2 on a Custom Dataset
@MattPitlyk
MattPitlyk / send_email.py
Created October 15, 2018 01:07
Send an email from Python
import smtplib

def send_mail(msg, to_add, from_add, smtp_server=('smtp.gmail.com', 587),
              login=None, subj=None, header=True):
    """Function to consolidate sending a single email. to_add can be a list."""
    to_list = to_add if isinstance(to_add, list) else [to_add]
    if header:  # prepend To/From/Subject headers to the message body
        msg = 'To: %s\nFrom: %s\nSubject: %s\n\n%s' % (', '.join(to_list),
                                                       from_add, subj or '', msg)
    with smtplib.SMTP(*smtp_server) as server:
        server.starttls()  # Gmail requires TLS on port 587
        if login:  # login is an (email, password) tuple
            server.login(*login)
        server.sendmail(from_add, to_list, msg)
@MattPitlyk
MattPitlyk / testcustomscript.sh
Created November 3, 2017 19:44
Test custom script
echo "test script" > test.txt
@MattPitlyk
MattPitlyk / mutliprocessing_a_generator.py
Last active March 13, 2017 04:44
How to partially consume a generator during multiprocessing.
'''
This file demonstrates how to prevent a generator from being fully consumed by
multiprocessing. It avoids using multiprocessing.map() in favor of a single
queue with a max size from which processes pull elements to work on. As elements
are being processed, a loop fills the queue with elements from the generator
until it reaches its max size, at which point it blocks until elements are
retrieved from the queue by the processes.
'''
from multiprocessing import Process, JoinableQueue
import os
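Only the imports survive in the snippet above; a minimal sketch of the pattern the docstring describes might look like the following (the squaring "work", the function names, and the sentinel shutdown are illustrative assumptions, not the gist's code):

```python
from multiprocessing import Process, JoinableQueue, Queue

def worker(in_q, out_q):
    """Pull items from the bounded queue until a None sentinel arrives."""
    while True:
        item = in_q.get()
        if item is None:
            in_q.task_done()
            return
        out_q.put(item * item)  # stand-in for real work
        in_q.task_done()

def consume(gen, n_procs=2, max_size=4):
    """Feed `gen` into a bounded JoinableQueue: put() blocks when the queue
    is full, so the generator is only consumed as fast as workers drain it."""
    in_q = JoinableQueue(maxsize=max_size)
    out_q = Queue()
    procs = [Process(target=worker, args=(in_q, out_q)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    count = 0
    for item in gen:       # blocks here whenever the queue is at max_size
        in_q.put(item)
        count += 1
    for _ in procs:        # one sentinel per worker
        in_q.put(None)
    in_q.join()
    results = [out_q.get() for _ in range(count)]
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    print(sorted(consume(iter(range(5)))))  # → [0, 1, 4, 9, 16]
```

Because the queue's `maxsize` bounds how far the feeding loop can run ahead, the generator is never exhausted all at once, which is the point of avoiding `multiprocessing.map()`.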
@MattPitlyk
MattPitlyk / multithreading_requests.py
Last active July 26, 2016 03:12
Template for making multithreaded requests calls.
from threading import Thread
from queue import Queue  # 'Queue' module in Python 2
import requests

input_q = Queue()
output_q = Queue()

def worker():
    while True:
        url = input_q.get()  # grab a task (a url to scrape) from the input_q
        res = requests.get(url)
        output_q.put((url, res.content))  # save just the content or the entire response object to process later
        input_q.task_done()  # tells the input_q that the task we grabbed above (the url) has been processed

for _ in range(8):  # start a pool of daemon threads running the worker loop
    Thread(target=worker, daemon=True).start()
"""Example pulled from the pandas docs: highlight
the highest number in each column to illustrate the
experimental DataFrame.style API.
"""
import pandas as pd

df = pd.DataFrame.from_dict({'Medications': {'By Alphabetical': 7.7914173413117429e-05,
                                             'By Diagnosis': 0.12608248612763967,
                                             'By Occurrences': 0.024339338681789544}})
df.style.highlight_max()  # shade the cell holding each column's maximum
# coding: utf-8
"""
Scan all folders in path and output a list of folder names and sizes sorted by descending size.
"""
import os
from os.path import join, getsize
from collections import defaultdict
import argparse
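Only the imports of this last gist are shown; a minimal sketch of the scan the docstring describes might be (the function name and return shape are assumptions, not the gist's code):

```python
import os
from os.path import join, getsize

def folder_sizes(path):
    """Return [(subfolder, total_bytes)] for each immediate subfolder
    of `path`, sorted by descending size."""
    sizes = {}
    for entry in os.listdir(path):
        folder = join(path, entry)
        if not os.path.isdir(folder):
            continue  # only immediate subfolders are reported
        total = 0
        for root, _dirs, files in os.walk(folder):  # recurse for the full size
            total += sum(getsize(join(root, f)) for f in files)
        sizes[entry] = total
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
```

The `argparse` import in the gist suggests the original took the path from the command line; this sketch just takes it as an argument.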