Lukas Kriesch LukasKriesch

LukasKriesch / gist:e75a0132e93ca989f8870c4f95be734b
Created August 26, 2024 09:12
Python translation of the Jina AI chunking regex
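import regex as re  # third-party `regex` package, not the stdlib `re`
import requests

# Upper bounds used as quantifier limits in the chunking pattern
MAX_HEADING_LENGTH = 7  # heading marker run, e.g. "#######"
MAX_HEADING_CONTENT_LENGTH = 200  # characters of heading text
MAX_HEADING_UNDERLINE_LENGTH = 200  # setext-style "====" / "----" underline
MAX_HTML_HEADING_ATTRIBUTES_LENGTH = 100  # attributes inside <h1>..<h6> tags
MAX_LIST_ITEM_LENGTH = 200  # characters per list item
MAX_NESTED_LIST_ITEMS = 6  # nested items under one list item
MAX_LIST_INDENT_SPACES = 7  # leading spaces before a nested item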