@bmarwell
Last active May 20, 2019 08:44
zchunk splitter proposal
yaml:
  extensions:
    - ".yaml"
    - ".yml"
  split:
    type: string-before
    separators:
      - "\0-"
      - "\0[a-zA-Z]"
  min_chunk_size: 10240
  max_chunk_size: 1048576
fedora-metadata:
  extensions:
    - "-comps-Everything.x86_64.xml"
  split:
    type: string-before
    separators:
      - "<group>"
  # Keep the minimum chunk size small enough to capture small groups:
  # merging groups can result in never getting the same hashes for chunks.
  min_chunk_size: 1024
  # Set this high enough that bigger groups can easily fit into a chunk.
  max_chunk_size: 102400
sqlite3:
  file-magic:
    - "SQLite format 3"
  # min: 10 KiB, max: 100 KiB.
  min_chunk_size: 10240
  max_chunk_size: 102400
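
The proposal does not spell out how a "string-before" split would be applied, so the following is only a minimal sketch under stated assumptions: a chunk boundary is placed immediately before each separator occurrence, a separator is ignored if cutting there would leave a chunk shorter than min_chunk_size, and a cut is forced once a chunk reaches max_chunk_size. The function name split_before is hypothetical, and separators are matched as literal byte strings rather than patterns.

```python
def split_before(data: bytes, separators: list[bytes],
                 min_size: int, max_size: int) -> list[bytes]:
    """Split data so that each separator starts a new chunk (assumed semantics)."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    chunks = []
    start = 0
    while start < len(data):
        # Skip separators that would leave a chunk shorter than min_size.
        # The max(..., 1) also skips the separator the current chunk starts with.
        search_from = start + max(min_size, 1)
        hits = [pos for pos in (data.find(sep, search_from) for sep in separators)
                if pos != -1]
        cut = min(hits) if hits else len(data)
        # Force a boundary if no usable separator appears within max_size bytes.
        cut = min(cut, start + max_size)
        chunks.append(data[start:cut])
        start = cut
    return chunks


# Illustrative input only, shaped like the fedora-metadata case above.
sample = (b"<comps>"
          b"<group>a</group>"
          b"<group>bb</group>"
          b"<group>ccc</group>")
small_min = split_before(sample, [b"<group>"], min_size=1, max_size=1000)
large_min = split_before(sample, [b"<group>"], min_size=20, max_size=1000)
```

With min_size=1 every group lands in its own chunk, so an edit to one group changes only that chunk's hash. With min_size=20 the first group is merged into the header chunk and the remaining two groups share a chunk, which is the effect the comment in the fedora-metadata section warns about.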