Bernardas Ališauskas Granitosaurus

Granitosaurus /
Created July 17, 2024 06:39
How to use LlamaIndex to scrape multiple pages and parse data with LLMs

What are you using for your LLM execution? If you're using something like LlamaIndex, it can work with multiple documents surprisingly well, especially in markdown.

For example, here's a script that I explored with Scrapfly. It scrapes multiple pages as markdown and then loads them all into an index you can query (with OpenAI in this case):

import os
from llama_index.core import VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.readers.web import ScrapflyReader
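
The preview above stops after the imports. A minimal sketch of how the described flow (scrape several pages as markdown, load them into one index, query it) could continue is shown below. It is not the author's script: the function name `build_query_engine`, the model name, and the placeholder URLs in the usage note are assumptions, and the `ScrapflyReader` arguments should be checked against the LlamaIndex version in use.

```python
def build_query_engine(urls, scrapfly_key, openai_model="gpt-4o-mini"):
    """Scrape `urls` as markdown with Scrapfly and index them for querying."""
    # Imported lazily so the sketch can be loaded without the packages installed.
    from llama_index.core import VectorStoreIndex
    from llama_index.llms.openai import OpenAI
    from llama_index.readers.web import ScrapflyReader

    reader = ScrapflyReader(api_key=scrapfly_key, ignore_scrape_failures=True)
    # One Document per scraped page, with the page body converted to markdown.
    documents = reader.load_data(urls=urls, scrape_format="markdown")
    # All pages end up in a single vector index.
    index = VectorStoreIndex.from_documents(documents)
    # The OpenAI integration reads OPENAI_API_KEY from the environment.
    return index.as_query_engine(llm=OpenAI(model=openai_model))
```

With both API keys available, something like `build_query_engine(["https://example.com/page-1", "https://example.com/page-2"], scrapfly_key).query("What do these pages have in common?")` would then answer across all scraped pages at once.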
Granitosaurus /
Created May 24, 2024 04:30
export qutebrowser bookmarks to firefox compatible bookmarks HTML file.
"""convert qutebrowser bookmarks to firefox bookmark HTML file for importing to firefox"""
from pathlib import Path
from os import getenv
xdg_config = Path(getenv('XDG_CONFIG_HOME', Path.home() / '.config'))
bookmarks = (xdg_config / 'qutebrowser/bookmarks/urls').read_text().splitlines()
html = [
    '<!DOCTYPE NETSCAPE-Bookmark-file-1>',
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
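
The preview is cut off inside the `html` list. As a rough sketch of how the conversion could be completed, the function below assumes each line of qutebrowser's `bookmarks/urls` file has the form `<url> <title>` (split on the first space) and wraps the entries in the usual Netscape bookmark skeleton. The name `bookmarks_to_html` and the exact set of header lines are assumptions, not taken from the gist.

```python
from html import escape


def bookmarks_to_html(lines):
    """Convert 'url title' lines into a Netscape bookmark HTML document."""
    html = [
        '<!DOCTYPE NETSCAPE-Bookmark-file-1>',
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        '<TITLE>Bookmarks</TITLE>',
        '<H1>Bookmarks</H1>',
        '<DL><p>',
    ]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # assumed format: URL first, then an optional title after a space
        url, _, title = line.partition(' ')
        # escape both parts so '&', '<' and quotes cannot break the markup
        html.append(f'    <DT><A HREF="{escape(url)}">{escape(title or url)}</A>')
    html.append('</DL><p>')
    return '\n'.join(html)
```

The result can be written to a file with `Path('bookmarks.html').write_text(...)` and imported through Firefox's bookmark manager.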
Granitosaurus /
Created January 17, 2024 03:47
scraper for
import asyncio
import httpx
from bs4 import BeautifulSoup
async def scrape():
    async with httpx.AsyncClient() as client:
        # get first page and extract links
        resp_first_page = await client.get("")
        soup_first_page = BeautifulSoup(resp_first_page.content, "html.parser")
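
The pattern this snippet starts (fetch the first page, extract its links, then request the linked pages) can be sketched without the target URL, which is elided above. To keep the sketch runnable without third-party packages it swaps in standard-library stand-ins: `html.parser` instead of BeautifulSoup, and a hypothetical `fetch` coroutine that you would back with `httpx.AsyncClient.get` in practice. `LinkParser` and the `scrape(fetch, first_url)` signature are illustrative names, not the author's.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


async def scrape(fetch, first_url):
    """`fetch` is a hypothetical async callable mapping a URL to HTML text."""
    # get first page and extract links
    first_html = await fetch(first_url)
    parser = LinkParser()
    parser.feed(first_html)
    # resolve relative links against the first page's URL
    urls = [urljoin(first_url, href) for href in parser.links]
    # request all linked pages concurrently; results keep the order of `urls`
    pages = await asyncio.gather(*(fetch(url) for url in urls))
    return dict(zip(urls, pages))
```

Passing the fetcher in as an argument also makes the crawl logic easy to exercise against canned HTML before pointing it at a live site.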
"data": [
"label": "05/09/2022 - 19:00",
"value": [
Granitosaurus / ultimate-gaming-glossary.txt
Created May 15, 2022 08:05
Ultimate list of popular, gaming-related terms
Aggro - Refers to attention from in-game enemies, often relating to the distance at which an enemy will notice the player character and engage in combat, or - in a multiplayer setting - the player being targeted by an enemy. See also: 'Tanking'
ARPG - An acronym for 'action role-playing game', ARPG is a sub-genre of RPG that focuses primarily on combat, exploration and character development. Typically involving lengthy dungeons and tough boss fights, ARPGs are popular with min-max playstyles.
ADS - Short for 'Aim Down Sights', ADS is the action of raising a ranged weapon to eye-level and aiming along the sights.
AoE - Short for 'Area of Effect', AoE refers to attacks which cover a certain area with a damaging effect.
Bots - A bot is a computer-controlled character playing alongside regular players in a multiplayer setting. It may also refer to an inexperienced or poor player, whose ability is no greater than an AI bot.
Backwards compatibility - The ability of a games console to run software created for prior c
Granitosaurus /
Created November 17, 2021 07:20
aircraft scraper for website.
import asyncio
from aiohttp.client import ClientSession
from parsel import Selector
async def get_page(page: int, session: ClientSession):
    """This is a little shortcut function that requests a specific page of our pagination"""
    url = f"{page}"
    print(f"requesting {url}")
    return await session.get(url)
"contexts": [
"searchTerm": "toast",
"pn": 14
import requests
headers = {
    # when web scraping we always want to appear as
    # a web browser to prevent being blocked
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
}
data = {
    # our recipe search term
    "searchTerm": "Toast",
Granitosaurus /
Created September 18, 2020 10:14
python object hook
def object_hook(dict_, func: Callable, types: Union[Tuple[Type], Type] = dict):
    """
    object hook for python dictionary - applies function to every object of type recursively
    note: dictionary keys are not considered to be objects and will not be type matched,
    for processing dictionary keys you should process whole dict type.
    >>> object_hook({'foo': 'bar'}, str.upper, str)
    {'foo': 'BAR'}
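
The function body is not part of the preview. One way the documented behaviour could be implemented is sketched below. Everything beyond the signature and the doctest is an assumption: this version also descends into lists and tuples, and it processes nested values before their container (bottom-up). Keys are left untouched, which matches the doctest.

```python
from typing import Any, Callable, Tuple, Type, Union


def object_hook(obj: Any, func: Callable, types: Union[Tuple[Type, ...], Type] = dict):
    """Apply func to every value of the given type(s), recursively."""
    # Rebuild containers first so nested values are handled before their parent.
    if isinstance(obj, dict):
        # keys are copied as-is; only values are passed through the hook
        obj = {key: object_hook(value, func, types) for key, value in obj.items()}
    elif isinstance(obj, (list, tuple)):
        obj = type(obj)(object_hook(item, func, types) for item in obj)
    # Finally apply func to the (possibly rebuilt) object itself if it matches.
    if isinstance(obj, types):
        return func(obj)
    return obj
```

With the default `types=dict`, `func` receives every dictionary in the structure, innermost first, which is the natural place to rename or filter keys.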
Granitosaurus /
Created February 24, 2020 07:48
Infer date format from date string in python
somedate = '2020-02-24 07:22'
somedates = ['2020-02-24 07:22', '1995-12-01 22:00', '2001-01-01 11:54']
import dateinfer
#'%H' incorrect
#'%Y-%m-%d %H:%M' correct!
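
The two comments above appear to contrast a wrong result from a single sample with a correct one from several samples. The same idea can be sketched with the standard library alone. This is not how `dateinfer` works internally: it is a simple stand-in that tries a fixed list of candidate formats (the list and the name `infer_format` are assumptions) and returns the first one that parses every sample.

```python
from datetime import datetime

# Candidate formats, tried in order; extend this list for your own data.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d",
    "%d/%m/%Y %H:%M",
    "%m/%d/%Y %H:%M",
]


def parses(date_string, fmt):
    """Return True if date_string matches fmt exactly."""
    try:
        datetime.strptime(date_string, fmt)
        return True
    except ValueError:
        return False


def infer_format(date_strings, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses every sample, else None."""
    for fmt in candidates:
        if all(parses(sample, fmt) for sample in date_strings):
            return fmt
    return None
```

More samples narrow the choice: a lone `'05/09/2022 19:00'` fits both the day-first and month-first candidates, while adding `'24/02/2020 07:22'` rules out month-first.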