hanfangyuan4396 / testRegex.js
Created August 17, 2024 05:15 — forked from hanxiao/testRegex.js
Regex for chunking by using all semantic cues
// Updated: Aug. 15, 2024
// Run: node testRegex.js testText.txt
// Used in https://jina.ai/tokenizer
const fs = require('fs');
const util = require('util');
// Named constants in place of magic numbers (maximum lengths)
const MAX_HEADING_LENGTH = 7;
const MAX_HEADING_CONTENT_LENGTH = 200;
const MAX_HEADING_UNDERLINE_LENGTH = 200;