Koba Khitalishvili (KobaKhit)
KobaKhit / snowflake-classify-text.md
Last active September 16, 2024 16:06
Text Classification in Snowflake SQL

Classifying Text in Snowflake SQL

With LLMs becoming available in Snowflake as part of their Cortex suite of products, in this piece we will explore what the experience is like when classifying text. First, Snowflake has a native CLASSIFY_TEXT function that does exactly what it says when given a piece of text and an array of possible categories. Second, one could classify text using embeddings (EMBED_TEXT_768) and similarity to possible categories calculated by one of the distance functions, such as cosine similarity (VECTOR_COSINE_SIMILARITY). Finally, when going the embeddings + similarity route, we could either use a cross join with a categories table or create a column for each category's similarity score and then assign the category with the greatest one. So we have three approaches.
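Snowflake specifics aside, the embeddings + similarity route boils down to the following plain-Python sketch. The 3-dimensional vectors and category names here are made up purely for illustration; in Snowflake the vectors would come from EMBED_TEXT_768 and the similarity from VECTOR_COSINE_SIMILARITY.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(text_embedding, category_embeddings):
    # Assign the category whose embedding is most similar to the text's.
    return max(category_embeddings,
               key=lambda c: cosine_similarity(text_embedding, category_embeddings[c]))

# Toy "embeddings", not real model output.
categories = {
    "sports":  [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.1],
}
print(classify([0.8, 0.2, 0.1], categories))  # sports
```

The cross-join variant scores every (text, category) pair and keeps the max per text; the column-per-category variant computes the same scores side by side and picks the greatest one.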

KobaKhit / sql-spines.md
Created February 8, 2024 02:58
Examples of generating spines/dates in SQL. Assisted by Caleb Kassa.

Spines in SQL

Given a starting date 2024-02-01, I would like to generate 7 days into the future, up to February 8th (2024-02-08), e.g.

dt
2024-02-01
2024-02-02
2024-02-03
2024-02-04
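The same spine can be sketched outside of SQL; a minimal Python version of the 2024-02-01 through 2024-02-08 example:

```python
from datetime import date, timedelta

def date_spine(start, end):
    # One date per day from start to end, inclusive.
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

for d in date_spine(date(2024, 2, 1), date(2024, 2, 8)):
    print(d.isoformat())  # 2024-02-01, 2024-02-02, ..., 2024-02-08
```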
KobaKhit / repartition_pyspark_dataframe.py
Last active August 27, 2020 23:59
Repartition skewed pyspark dataframes.
from pyspark.sql.functions import monotonically_increasing_id, row_number
from pyspark.sql import Window
from functools import reduce

def partitionIt(size, num):
    '''
    Create a list of partition indices, each partition of size num, where the number of partitions is ceiling(size/num)
    Args:
        size (int): number of rows/elements
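The preview above cuts off mid-docstring, but the core trick can be sketched without Spark. This plain-Python version is my reconstruction, assuming the helper simply buckets row numbers into fixed-size groups:

```python
import math

def partition_indices(size, num):
    # Bucket `size` row numbers into groups of at most `num` rows each;
    # the number of groups is ceil(size / num).
    return [i // num for i in range(size)]

print(partition_indices(10, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
print(math.ceil(10 / 3))         # 4 groups
```

In the gist these indices would presumably be attached to rows via row_number over a Window and then used to repartition the dataframe evenly, which is why the Spark imports appear above.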
KobaKhit / tableau_server_export.py
Last active October 18, 2023 18:36
A simple class that enables you to download workbooks or view CSVs from a Tableau Server.
import tableauserverclient as TSC
import pandas as pd
from io import StringIO
class Tableau_Server(object):
    """Download workbooks or view CSVs from a Tableau Server."""
    def __init__(self, username, password, site_id, url, https=False):
        super().__init__()  # http://stackoverflow.com/questions/576169/understanding-python-super-with-init-methods
KobaKhit / visualforce_embed_with_user.html
Last active June 13, 2019 21:13
Create a dynamic embed in Visualforce which displays information by user
<apex:page>
  <html>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
    <!-- User Id in a span -->
    <span id="user" style="display: none;">
      <apex:outputText label="Account Owner" value="{!$User.Id}"></apex:outputText>
    </span>
    <!-- Embed placeholder -->
KobaKhit / reddit_posts_and_comments.py
Created October 24, 2018 19:42
A class that enables the user to download posts and comments from a subreddit
import praw

class Reddit():
    def __init__(self, client_id, client_secret, user_agent='My agent'):
        self.reddit = praw.Reddit(client_id=client_id,
                                  client_secret=client_secret,
                                  user_agent=user_agent)

    def get_comments(self, submission):
        # get comments information using the Post as a starting comment
        comments = [RedditComment(author=submission.author,
                                  commentid=submission.postid,
KobaKhit / unnest_byseat.R
Last active August 3, 2018 14:27
Example of how to unnest rows by seat or any other array in a cell.
library(tidyr)
setwd("~/Desktop/unnest")
fname = "file-name"
df = read.csv(paste0(fname, '.csv'), stringsAsFactors = F)
df$seats =
  sapply(1:nrow(df), function(x) {
    seats = c(df[x,]$first_seat, df[x,]$last_seat)
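The same unnest can be illustrated in Python. The rows, column names, and seat values below are invented for the example; the idea is one output row per seat in the first_seat..last_seat range:

```python
def unnest_by_seat(rows):
    # Expand each row with a first_seat..last_seat range into one row per seat.
    out = []
    for row in rows:
        for seat in range(row["first_seat"], row["last_seat"] + 1):
            expanded = dict(row)
            expanded.pop("first_seat")
            expanded.pop("last_seat")
            expanded["seat"] = seat
            out.append(expanded)
    return out

orders = [{"order_id": 1, "first_seat": 5, "last_seat": 7}]
print(unnest_by_seat(orders))  # one row each for seats 5, 6, 7
```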
KobaKhit / stubhub_inventory_v2.py
Last active November 14, 2017 14:57
Example of using the StubHub inventory v2 API to download all listings for a given event id.
import requests
import base64
import pprint
import pandas as pd
import json
from tqdm import tqdm
# https://stubhubapi.zendesk.com/hc/en-us/articles/220922687-Inventory-Search
KobaKhit / hmtl_table_parser.py
Last active July 18, 2022 07:25
Parse all html tables on a page and return them as a list of pandas dataframes. Modified from @srome
# http://srome.github.io/Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas/
class HTMLTableParser:
    @staticmethod
    def get_element(node):
        # for XPATH we have to count only for nodes with same type!
        length = len(list(node.previous_siblings)) + 1
        if length > 1:
            return '%s:nth-child(%s)' % (node.name, length)
        else:
            return node.name
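A dependency-free sketch of the same table-scraping idea, using only the standard library's html.parser instead of BeautifulSoup (the class and variable names are mine, not the gist's):

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    # Collect the text of every <td>/<th> cell, one list per <tr>.
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

p = CellCollector()
p.feed("<table><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>")
print(p.rows)  # [['a', 'b'], ['1', '2']]
```

From here the row lists convert directly into a pandas DataFrame, which is what the gist's parser returns.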
KobaKhit / Large dataframe to csv in chunks in R
Last active September 7, 2017 19:56
Write a large dataframe to csv in chunks
df = read.csv("your-df.csv")
# Number of items in each chunk
elements_per_chunk = 100000
# List of vectors [1] 1:100000, [2] 100001:200000, ...
l = split(1:nrow(df), ceiling(seq_along(1:nrow(df))/elements_per_chunk))
# Write large data frame to csv in chunks
fname = "inventory-cleaned.csv"
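The R snippet above is truncated before the write loop; as an illustration, here is the same chunked-write pattern in Python (the file naming scheme and chunk size are arbitrary choices for the example, not taken from the gist):

```python
import csv
import math
import os
import tempfile

def write_csv_in_chunks(rows, header, path_template, chunk_size):
    # Split rows into ceil(len(rows) / chunk_size) chunks and write each
    # chunk to its own CSV file, repeating the header in every file.
    paths = []
    for i in range(math.ceil(len(rows) / chunk_size)):
        path = path_template.format(i + 1)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows[i * chunk_size:(i + 1) * chunk_size])
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as d:
    rows = [[i, i * 2] for i in range(250)]
    paths = write_csv_in_chunks(rows, ["a", "b"], os.path.join(d, "part-{}.csv"), 100)
    print(len(paths))  # 3 files: 100 + 100 + 50 rows
```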