Minglei Yin axhiao

🏠
Working from home
View GitHub Profile
@axhiao
axhiao / testRegex.js
Created August 16, 2024 06:37 — forked from hanxiao/testRegex.js
Regex for chunking by using all semantic cues
// Updated: Aug. 15, 2024
// Run: node testRegex.js testText.txt
// Used in https://jina.ai/tokenizer
const fs = require('fs');
const util = require('util');
// Named constants for the magic numbers (length limits) used in the regex
const MAX_HEADING_LENGTH = 7;
const MAX_HEADING_CONTENT_LENGTH = 200;
const MAX_HEADING_UNDERLINE_LENGTH = 200;