Skip to content

Instantly share code, notes, and snippets.

View dennyglee's full-sized avatar

Denny Lee dennyglee

View GitHub Profile
@dennyglee
dennyglee / llama-index_starter_dbrx.py
Last active April 10, 2024 07:25
Llama_Index Starter Example using DBRX
#
# LlamaIndex Starter Example with DBRX
#
# Based on the LlamaIndex Starter Example
# https://docs.llamaindex.ai/en/stable/getting_started/starter_example/
#
# Ensure you have installed both llama_index and Databricks integration
# e.g., pip install llama_index llama-index-llms-databricks
#
@dennyglee
dennyglee / using-dbrx-with-pyspark-ai.md
Created April 1, 2024 15:30
Using DBRX with PySpark AI

Using DBRX with PySpark AI

This markdown shows a quick example of how to use Databricks DBRX to generate and run a transform query against a sammple dataset.

Requirements

Install the following

  • Configure and install databricks-cli
  • pip install langchain langchain-community mlflow setuptools
@dennyglee
dennyglee / query-dbrx-via-FMAPI-using-OpenAI-SDK.json
Last active March 31, 2024 20:07
query-dbrx-via-FMAPI-using-OpenAI-SDK Results
{
"id": "9755cef0-b958-47ec-b696-1aeb6f9674f6",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Well, my dear, if I had a dollar for every time I've been asked this question, I'd be sipping a pi\u00f1a colada on my private yacht instead of standing behind this stove. But since you've asked, and I can see that little glimmer of curiosity in your eyes, I'll indulge you.\n\nNow, let's get one thing straight: I'm not going to tell you that one is better than the other. That would be like comparing a Picasso to a Monet - they're both masterpieces, but they appeal to different tastes. It's all about personal preference, you see.\n\nMontreal bagels, bless their hearts, are these petite, crispy, and slightly sweet beauties. They're boiled in honey-sweetened water and then baked in a wood-fired oven, which gives them this delightful, smoky flavor. They're a bit denser than their New York count
@dennyglee
dennyglee / query-dbrx-via-FMAPI-using-OpenAI-SDK.py
Last active March 31, 2024 20:11
Locally Query DBRX using Foundation Model API (FMAPI) using OpenAI Python SDK
#
# Locally Query DBRX using Foundation Model API (FMAPI) using OpenAI Python SDK
#
#
import json
import os
from openai import OpenAI
@dennyglee
dennyglee / rag-testing-llama-index-indexes.md
Last active April 3, 2024 00:32
RAG testing with different indexes via llama-index

RAG testing using different indexes with llamaindex

Different Indexes result in different answers

(Generated by Chat GPT 4.0)

Retrieval-Augmented Generation (RAG) systems, which combine a neural network-based generative model with a retrieval system, can use various types of indexes to retrieve relevant documents or passages. The type of index used can significantly impact the performance and output of the RAG system. Let's explore why using different indexes like Keyword Table Index, Vector Store Index, Summary Index, Tree Index, and Knowledge Graph Index for the same document results in different answers:

  1. Keyword Table Index:
  • Nature: It's based on keyword matching.
@dennyglee
dennyglee / spark-to-sql-validation-sample.py
Created April 4, 2018 18:54
Validate Spark DataFrame data and schema prior to loading into SQL
'''
Example Schema Validation
Assumes the DataFrame `df` is already populated with schema:
{id : int, day_cd : 8-digit code representing date, category : varchar(24), type : varchar(10), ind : varchar(1), purchase_amt : decimal(18,6) }
Runs various checks to ensure data is valid (e.g. no NULL id and day_cd fields) and schema is valid (e.g. [category] cannot be larger than varchar(24))
'''
@dennyglee
dennyglee / NGB-Genome-Browser-Docker-E2E-Script.md
Last active January 27, 2018 05:38
The NGB Genome Browser is a web-based NGS data viewer with structural variations (SVs) visualization capabilities. This gist provides end-to-end Docker installation instructions and a demos script.

NGB Genome Browser Docker End-to-End Demo Script

The NGB Genome Browser is a web-based NGS data viewer with structural variations (SVs) visualization capabilities. This gist provides end-to-end Docker installation instructions and a demos script. This is an e2e version including downloading the sample VCF and BAM files.

Note, these instructions are derived from the following sources:

@dennyglee
dennyglee / cqlsh-CosmosDB-Cassandra-API-macos.md
Last active January 12, 2018 19:40
Connecting cqlsh to Cosmos DB Cassandra API on MacOS

As noted in the Introduction to Apache Cassandra API for Azure Cosmos DB, you can connect to the Cosmos DB Cassandra API using cqlsh. The instructions included in the Quick Start are setup for Windows (not MacOS) and there may be a versioning issue as the default cassandra-driver (installed via pip install cassandra-driver is for 3.3.1 instead of 3.4 (which is what is needed for Cosmos DB Cassandra API).

Install Cassandra via brew

This will ensure that you have the latest Cassandra-driver for CQL 3.4:

brew install cassandra

Cassandra will be installed in the /usr/local/Cellar/cassandra/$version folder

@dennyglee
dennyglee / ru_su_splits.md
Last active May 17, 2019 04:48
Request Units, Storage Utilization, Splits....oh my!

Request Units, Storage Utilization, Splits....oh my!

The Unofficial Throughput and Capacity Guestimate Guide for Azure Cosmos DB

Introduction

I have had a lot of great questions about how to estimate the throughput and storage capacity for Azure Cosmos DB. To get yourself up and running, the key best practices references are:

//
// Spark 2.0 to SQL Server via External Data Source API and SQL JDBC
//
// References:
// - https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
// - https://blogs.msdn.microsoft.com/bigdatasupport/2015/10/22/how-to-allow-spark-to-access-microsoft-sql-server/
// - https://docs.microsoft.com/en-us/sql/connect/jdbc/using-the-jdbc-driver
// Run spark-shell
// - Get the SQL Server JDBC JAR fom the above "Using the JDBC driver" link