Dani El-Ayyass dayyass

@veekaybee
veekaybee / normcore-llm.md
Last active September 25, 2024 00:46
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models

@dayyass
dayyass / logreg_sklearn2torch.ipynb
Created May 19, 2022 20:07
Convert sklearn logreg to torch neural network
@dayyass
dayyass / torch_unpack_sequence.py
Created October 13, 2021 12:04
Inverse function for torch.nn.utils.rnn.pack_sequence.
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence


def unpack_sequence(packed_sequences):
    """Unpacks PackedSequence into a list of variable length Tensors"""
    unpacked_sequences = []
    padded_sequences, lengths = pad_packed_sequence(packed_sequences, batch_first=True)
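
The preview above stops after the call to `pad_packed_sequence`. A minimal sketch of how the remaining step could look, assuming the intent is to trim each padded row back to its true length, is given below. The name `unpack_sequence_sketch` and the sample tensors are illustrative and are not taken from the gist.

```python
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence


def unpack_sequence_sketch(packed_sequences):
    """Unpack a PackedSequence into a list of variable-length tensors.

    Hypothetical completion: the truncated original may differ.
    """
    # Pad back to a dense (batch, max_len, ...) tensor and recover the lengths.
    padded, lengths = pad_packed_sequence(packed_sequences, batch_first=True)
    # Slice every row down to its own length to drop the padding.
    return [seq[:length] for seq, length in zip(padded, lengths.tolist())]


# Illustrative data; pack_sequence expects decreasing lengths by default.
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]
packed = pack_sequence(sequences)
unpacked = unpack_sequence_sketch(packed)
```

Recent PyTorch releases also provide a built-in `unpack_sequence` in `torch.nn.utils.rnn`, which may be preferable where it is available.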
@dayyass
dayyass / permutation_accuracy.py
Created October 9, 2021 14:01
Find a labels mapper with the highest accuracy.
from itertools import permutations
import numpy as np
from sklearn.metrics import accuracy_score
np.random.seed(42)
y_true = np.random.randint(low=0, high=3, size=100)
noize_mapper = {0: 1, 1: 2, 2: 0}
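
The preview is cut off after the mapper definition. The sketch below shows one way the search could proceed, assuming the goal is to try every permutation of the predicted labels and keep the one that maximizes accuracy. It uses a plain NumPy mean in place of `accuracy_score`, and the names `y_pred`, `best_mapper`, and `best_accuracy` are assumptions rather than the gist's own.

```python
from itertools import permutations

import numpy as np

rng = np.random.RandomState(42)
y_true = rng.randint(low=0, high=3, size=100)

# Same mapper as in the preview (spelling kept from the source).
noize_mapper = {0: 1, 1: 2, 2: 0}
# Assumed setup: "predictions" are the true labels with permuted names.
y_pred = np.array([noize_mapper[label] for label in y_true.tolist()])

labels = np.unique(y_pred).tolist()
best_accuracy, best_mapper = -1.0, None

# Brute force over all relabelings; feasible only for a small label set.
for perm in permutations(labels):
    mapper = dict(zip(labels, perm))
    remapped = np.array([mapper[label] for label in y_pred.tolist()])
    accuracy = float(np.mean(remapped == y_true))
    if accuracy > best_accuracy:
        best_accuracy, best_mapper = accuracy, mapper
```

The search costs k! accuracy evaluations for k labels, so it is only practical for a handful of classes.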
@dayyass
dayyass / tfidf_lemmatization.py
Created September 29, 2021 09:20
How to use sklearn TfidfVectorizer with lemmatizer.
from sklearn.feature_extraction.text import TfidfVectorizer

# pymorphy2 lemmatizer
import pymorphy2


class Lemmatizer:
    def __init__(self):
        self.morph = pymorphy2.MorphAnalyzer()

    def __call__(self, x: str) -> str:
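
The body of `__call__` is not shown in the preview. As a rough sketch of how a callable lemmatizer with a `str -> str` signature can be plugged into `TfidfVectorizer`, the example below passes it as the `preprocessor` argument. `ToyLemmatizer` and its lookup table are hypothetical stand-ins so the example runs without pymorphy2, and whether the gist uses `preprocessor` or another hook is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


class ToyLemmatizer:
    """Hypothetical stand-in for a pymorphy2-backed lemmatizer."""

    def __init__(self):
        # Tiny hand-written lemma table, for illustration only.
        self.lemmas = {"documents": "document", "cats": "cat", "running": "run"}

    def __call__(self, x: str) -> str:
        # Lowercase, split on whitespace, and map each token to its lemma.
        return " ".join(self.lemmas.get(tok, tok) for tok in x.lower().split())


# A callable preprocessor replaces the default lowercasing step;
# tokenization by the default token pattern still happens afterwards.
vectorizer = TfidfVectorizer(preprocessor=ToyLemmatizer())
matrix = vectorizer.fit_transform(["cats running", "cat run documents"])
vocab = sorted(vectorizer.vocabulary_)
```

With a real morphological analyzer, the lookup in `__call__` would be replaced by a call into that library.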
@dayyass
dayyass / tfidf_token2idf.py
Last active September 29, 2021 12:25
Extract token2idf mapper from TfidfVectorizer.
from sklearn.feature_extraction.text import TfidfVectorizer

# data
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
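
The preview ends at the corpus definition. One plausible way to build the token-to-IDF mapping, sketched here under the assumption that the fitted vectorizer's `vocabulary_` and `idf_` attributes are what the gist relies on, is shown below. The name `token2idf` follows the gist title, and the rest is illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

vectorizer = TfidfVectorizer()
vectorizer.fit(corpus)

# vocabulary_ maps token -> column index; idf_ holds one weight per column.
token2idf = {
    token: float(vectorizer.idf_[idx])
    for token, idx in vectorizer.vocabulary_.items()
}
```

Tokens that occur in every document (such as `this`) receive the smallest weight, while tokens seen in a single document (such as `second`) receive the largest.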
@dayyass
dayyass / lemmatized.py
Last active May 26, 2022 11:26
Pymorphy2 lemmatizer class.
import pymorphy2


class Lemmatizer:
    """
    Pymorphy2 lemmatizer class.
    """

    def __init__(self):
        """
@dayyass
dayyass / humanize_bytes.py
Created July 25, 2021 08:25
Convert bytes to human readable format.
def humanize_bytes(bytes: int, suffix: str = "B") -> str:
    """
    Convert bytes to human readable format.

    :param int bytes: number of bytes.
    :param str suffix: bytes suffix.
    :return: human readable size.
    :rtype: str
    """
@dayyass
dayyass / Dockerfile
Last active July 19, 2021 10:06
jupyter-cuda10.1-tf2.2.0-docker-mlspace
FROM cr.msk.sbercloud.ru/aicloud-jupyter/jupyter-cuda10.1-tf2.2.0-mlspace:latest
MAINTAINER Dani El-Ayyass <dayyass@yandex.ru>
USER root
# Docker
# Set up the repository
RUN apt-get update
RUN apt-get -y install apt-transport-https ca-certificates curl gnupg lsb-release
- repo: local
  hooks:
    - id: unittest
      name: unittest
      entry: python -m unittest discover
      language: python
      always_run: true
      pass_filenames: false