I asked ChatGPT the following questions:
- Memory Factor (n = length of word) what is the memory impact of * 1. The value of n is firefox
- Memory Factor (n = length of word) what is the memory impact of * n. The value of n is firefox
- Memory Factor (n = length of word) what is the memory impact of * 2n - 1. The value of n is firefox
- Memory Factor (n = length of word) what is the memory impact of * n * (n - 1). The value of n is firefox
It responded with the following:
- The memory impact of 1 when the value of n is “firefox” would be minimal, it would be the minimal amount of memory required to store a single character.
- The memory impact of n when the value of n is “firefox” would be 8 bytes, assuming that each character in the word “firefox” is stored using 1 byte of memory.
- The memory impact of 2n - 1 when the value of n is “firefox” would be 15 bytes, assuming that each character in the word “firefox” is stored using 1 byte of memory.
- The memory impact of n * (n - 1) when the value of n is “firefox” would be 28 bytes, assuming that each character in the word “firefox” is stored using 1 byte of memory. It is important to note that these are rough estimates as they do not take into consideration any additional memory overhead for storing the word in memory or any data type or encoding used. It is also important to note that the equation 2n-1 and n*(n-1) are not standard memory factors used to calculate the memory usage, they are custom equations.
Thoughts? For additional context, I am trying to understand/make sense of the following: https://github.com/nextapps-de/flexsearch#tokenizer-prefix-search
For this one:
The answer provided is not helpful at all. Looking at http://eckstein.rutgers.edu/mis/handouts/storage-calcs.pdf, I would gather that for the strict tokenizer, which indexes only whole words, the memory impact would be 1 byte (8 bits) per character. So if the index consists of just the word "firefox", the memory used will be 1 byte per character, i.e. 7 bytes (56 bits). If you index the sentence "Firefox is a Gecko based browser" using the strict tokenizer, it would use 27 bytes (216 bits) of memory, counting only the letters and not the spaces.
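To make the factors concrete, here is a small Python sketch (my own illustration, not FlexSearch code) that enumerates the tokens each tokenizer mode would index for a single word, based on the prefix/suffix/substring behaviour described in the FlexSearch README:

```python
def strict_tokens(word):
    # "strict": index the whole word only -> 1 token
    return [word]

def forward_tokens(word):
    # "forward": every prefix of the word -> n tokens
    return [word[:i] for i in range(1, len(word) + 1)]

def reverse_tokens(word):
    # "reverse": every prefix plus every proper suffix;
    # the whole word is counted once -> 2n - 1 tokens
    n = len(word)
    return forward_tokens(word) + [word[i:] for i in range(1, n)]

def full_tokens(word):
    # "full": every contiguous substring (every start/end pair)
    n = len(word)
    return [word[i:j] for i in range(n) for j in range(i + 1, n + 1)]

word = "firefox"  # n = 7
for name, fn in [("strict", strict_tokens), ("forward", forward_tokens),
                 ("reverse", reverse_tokens), ("full", full_tokens)]:
    tokens = fn(word)
    print(f"{name:8s} {len(tokens):3d} tokens, "
          f"{sum(map(len, tokens)):3d} characters indexed")
```

For n = 7 this reproduces the README's factors 1, n, and 2n - 1 exactly as token counts. Note that enumerating substrings for "full" yields n(n + 1)/2 = 28 start/end pairs, not the n * (n - 1) = 42 the README quotes, so the factors are probably best read as rough growth rates rather than exact byte counts, and the actual bytes also depend on the string encoding and index overhead.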