@joshuawscott
Created March 7, 2018 21:07
LZF Format

An LZF stream consists of one or more "chunks" that are simply concatenated.

Chunk Layout

   "Z"   |  "V"   | Type   |compressed length| original length | payload
+--------+--------+--------+--------+--------+--------+--------+=================
|01011010|01010110|0000000C|LLLLLLLL|LLLLLLLL|BBBBBBBB|BBBBBBBB| L bytes of data
+--------+--------+--------+--------+--------+--------+--------+=================
  • C is a flag: 0 = chunk is not compressed, 1 = chunk is compressed.
  • L is a uint16: the length of the payload, i.e. the number of bytes that follow the header.
  • B is a uint16 (only present if compressed): the original uncompressed length.

If C == 0, the payload is appended to the output buffer unchanged.
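
The chunk header can be read in a few lines. The following is a minimal Python sketch, not a reference implementation. The function name `read_chunk_header` is made up for illustration, and it assumes both uint16 fields are big-endian, which this document does not state.

```python
import struct


def read_chunk_header(data: bytes, pos: int = 0):
    """Return (compressed, payload_len, original_len, payload_start)."""
    if data[pos:pos + 2] != b"ZV":
        raise ValueError("missing 'ZV' chunk signature")
    chunk_type = data[pos + 2]
    if chunk_type not in (0, 1):
        raise ValueError("unknown chunk type")
    # Assumption: uint16 fields are stored big-endian.
    (payload_len,) = struct.unpack_from(">H", data, pos + 3)
    if chunk_type == 0:
        # Uncompressed chunk: 5-byte header, no original-length field.
        return False, payload_len, payload_len, pos + 5
    # Compressed chunk: 7-byte header with the original length.
    (original_len,) = struct.unpack_from(">H", data, pos + 5)
    return True, payload_len, original_len, pos + 7
```

For an uncompressed chunk the sketch reports the payload length as the original length too, since the payload is copied as-is.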

Otherwise, the payload is a sequence of one or more segments concatenated together.

Each segment has a 1 to 3 byte header. The top 3 bits of the first header byte determine the segment type. There are 3 segment types:

  • Literal Run
  • Short Backreference
  • Long Backreference
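
As a small illustration, the segment type can be read off the top 3 bits of the first header byte. This is a sketch only; `segment_type` is a hypothetical helper name, and the values 0 and 7 follow the segment layouts described in this document.

```python
def segment_type(first_byte: int) -> str:
    """Classify a segment by the top 3 bits of its first header byte."""
    top = first_byte >> 5  # value in 0..7
    if top == 0:
        return "literal run"
    if top == 7:
        return "long backreference"
    return "short backreference"  # top is 1..6
```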

Literal Run

| Header | payload
+--------+=========================
|000LLLLL| RunLength bytes of data
+--------+=========================
  • L is a uint5
  • RunLength = L + 1 (so a literal run holds 1 to 32 bytes)
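
A literal run could be decoded roughly as follows. This is a sketch under the layout above; `decode_literal_run` and its parameters are illustrative names, not part of any library.

```python
def decode_literal_run(payload: bytes, pos: int, out: bytearray) -> int:
    """Copy one literal run starting at payload[pos]; return the next position."""
    run_length = (payload[pos] & 0x1F) + 1  # L is the low 5 bits
    start = pos + 1
    if start + run_length > len(payload):
        raise ValueError("literal run extends past end of payload")
    out += payload[start:start + run_length]
    return start + run_length
```

For example, the header byte 0x02 followed by "abc" appends the three bytes "abc" and advances the position by 4.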

Short Backreference - allows backreferences from 3 to 8 bytes in length.

+--------+--------+
|RRRXXXXX|XXXXXXXX|
+--------+--------+
  • R is a uint3 between 1 and 6 (0 is used for literal run, 7 is used for long backreference)
  • X is a uint13
  • Offset = -(X + 1)
  • RunLength = R + 2
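
The two header bytes of a short backreference can be unpacked as sketched below. `parse_short_backref` is a hypothetical name; the bit layout is taken from the diagram above, with the 5 bits in the first byte treated as the high bits of X.

```python
def parse_short_backref(b0: int, b1: int):
    """Return (offset, run_length) for a 2-byte short backreference header."""
    r = b0 >> 5  # top 3 bits, expected to be 1..6
    if not 1 <= r <= 6:
        raise ValueError("not a short backreference")
    x = ((b0 & 0x1F) << 8) | b1  # 13-bit value
    return -(x + 1), r + 2
```

For instance, the bytes 0xA0 0x02 give R = 5 and X = 2, i.e. Offset = -3 and RunLength = 7.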

Long Backreference - allows backreferences from 9 to 264 bytes in length

+--------+--------+--------+
|111XXXXX|RRRRRRRR|XXXXXXXX|
+--------+--------+--------+
  • X is a uint13 (the high 5 bits come from the first byte, the low 8 bits from the third byte)
  • R is a uint8
  • Offset = -(X + 1)
  • RunLength = R + 9
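
The long form differs only in where the length comes from. A sketch, assuming the layout in the diagram above (the name `parse_long_backref` is illustrative):

```python
def parse_long_backref(b0: int, b1: int, b2: int):
    """Return (offset, run_length) for a 3-byte long backreference header."""
    if b0 >> 5 != 7:
        raise ValueError("not a long backreference")
    x = ((b0 & 0x1F) << 8) | b2  # X is split around the length byte
    return -(x + 1), b1 + 9
```

The extremes are 0xE0 0x00 0x00 (Offset = -1, RunLength = 9) and 0xFF 0xFF 0xFF (Offset = -8192, RunLength = 264).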

Putting it all together

Backreferences are decoded against the current output buffer. Move back by the magnitude of Offset from the end of the buffer; this is the start position. Take RunLength bytes from there. If fewer than RunLength bytes are available before the end of the buffer, cycle through the available bytes until you have accumulated RunLength bytes. Append the result to the end of the output buffer.
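
One simple way to get the cycling behaviour is to copy one byte at a time, so that bytes appended earlier in the same copy become available as source bytes. The sketch below illustrates this; `copy_backreference` is a made-up name, and Offset is passed as a negative number as defined above.

```python
def copy_backreference(out: bytearray, offset: int, run_length: int) -> None:
    """Append run_length bytes copied from `offset` bytes before the end of out."""
    start = len(out) + offset  # offset is negative
    if offset >= 0 or start < 0:
        raise ValueError("backreference points outside the output buffer")
    for i in range(run_length):
        # out grows as we go, so overlapping copies repeat the pattern.
        out.append(out[start + i])
```

A slice copy such as `out += out[start:start + run_length]` would not work when the run overlaps the end of the buffer, which is why the sketch copies byte by byte.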

Example

Starting output buffer = "123abc"

The next segment is a short backreference with Offset = -3 and RunLength = 7. Moving back 3 bytes from the end points us to "a", so we begin taking bytes from there. After taking 3 bytes ("abc"), we hit the end of the output buffer.

We then cycle back to "a" and repeat until we have 7 bytes. In this case, we get "abcabca", which is added to the end of the output buffer, giving "123abcabcabca".
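
To tie the pieces together, here is a compact decoder sketch for a whole stream of chunks. It is an illustration of the format as described in this document, not a tested replacement for a real LZF library. The name `lzf_decompress` is hypothetical, the uint16 fields are assumed to be big-endian, and the sketch does not restrict backreferences to the current chunk because the text does not say whether they may reach into earlier chunks.

```python
def lzf_decompress(data: bytes) -> bytes:
    """Decode a concatenation of LZF chunks (sketch, assumptions noted above)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        if data[pos:pos + 2] != b"ZV":
            raise ValueError("missing 'ZV' chunk signature")
        compressed = data[pos + 2] == 1
        # Assumption: uint16 fields are big-endian.
        length = int.from_bytes(data[pos + 3:pos + 5], "big")
        pos += 5
        if not compressed:
            out += data[pos:pos + length]  # copy payload unchanged
            pos += length
            continue
        original_len = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        end = pos + length
        chunk_start = len(out)
        while pos < end:
            ctrl = data[pos]
            kind = ctrl >> 5
            if kind == 0:  # literal run; top bits are zero, so ctrl == L
                run = ctrl + 1
                out += data[pos + 1:pos + 1 + run]
                pos += 1 + run
                continue
            if kind == 7:  # long backreference
                run, low = data[pos + 1] + 9, data[pos + 2]
                pos += 3
            else:  # short backreference
                run, low = kind + 2, data[pos + 1]
                pos += 2
            start = len(out) - ((((ctrl & 0x1F) << 8) | low) + 1)
            if start < 0:
                raise ValueError("backreference before start of output")
            for i in range(run):  # byte-wise copy handles overlap
                out.append(out[start + i])
        if len(out) - chunk_start != original_len:
            raise ValueError("decoded length does not match header")
    return bytes(out)


# The example above as one compressed chunk: literal "123abc", then a
# short backreference with Offset = -3 and RunLength = 7.
example = b"ZV\x01\x00\x09\x00\x0d" + b"\x05123abc" + b"\xa0\x02"
result = lzf_decompress(example)
```

Here `result` holds the 13 bytes "123abcabcabca", matching the original-length field in the chunk header.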
