I am developing this format for Clasp to replace our previous FASL formats, which are largely not under our control (e.g. they are ELF shared objects). The direct impetus is the desire to allow our Lisp VM to be targeted by cl:compile-file
, but there are several broader design goals. Fundamentally, I would like to shoot for a real portable FASL format, that can be used for code interchange between different Lisp implementations running on different architectures. This is quite a ways off.
The format is vaguely based on Java classfiles, because the Java standards are some of the only standards for remotely dynamic languages I have found that are sufficiently clear, and because I think the Java "no crashing" design goal is something to aim for. The actual encoding of objects is very roughly based on conspack, which has similar speed and compactness goals as far as binary encoding goes.
In no particular order of priority.
- Standards-compliance: The format can be used to implement
cl:compile-file
andcl:load
correctly. - No native code: The FASL does not need to contain native machine code. This is both intended to improve portability (e.g. it's a ways towards FASLs being transferable between machines of different architecture) and speed, since machine code compilation is more time consuming than producing VM bytecode. (It might contain native code as an option later.)
- Portability: FASLs can be produced by any Lisp implementation, and ideally, loaded on any implementation as well. Besides generally improving interoperability, this will hopefully facilitate a cleaner build process for Clasp.
- No crashing: Generally speaking, the loader should not "crash", like by segfaulting or signaling an inscrutable error. If the FASL is syntactically (if not semantically) correct it should load, and if it's syntactically incorrect that can be detected and signaled properly.
- Scrutable errors: Generally speaking, if the loader hits a problem, it's reported sensibly.
- Versioning: The loader can detect FASLs in an obsolete (or unknown future) format. It can then load using the correct old format, or it can signal a version error, but it can't just crash, and it can't partially load the FASL before failing.
- Offline operations: It is possible to interrogate certain information about a FASL without loading it. For example, syntactic validity, version, and possibly more detailed information such as what packages need to be loaded with what symbols for the FASL to be semantically valid can be read. Other important operations include bundling, i.e. combining multiple FASLs into one big FASL with all of their effects for ease of distribution and compactness.
- Simple loading: FASLs can be loaded by a primitive system containing little besides the essential Lisp runtime. For example a loader may be written that does not use CLOS. (Nothing restricts a CLOS-based loader from being written, though, and my prototype loader is one.)
- Subsumption of implementation definitions: Implementation-specific information can be embedded in the FASL in such a way that ignorant implementations can still load the FASL. For example, source information could be included in some bespoke format, and on an implementation unfamiliar with the source information the load will complete correctly but without the source information. This is intended to facilitate the format's stability.
Efficiency is also a design goal, although it is less of a priority. There are several aspects of efficiency:
- FASLs are loaded quickly.
- FASLs are compiled quickly (although probably less quickly). Ideally more time is spent compiling the actual Lisp code than putting the FASL together.
- FASLs are compact, though not so compact that encoding or especially decoding takes very long.
A FASL consists of a header, a sequence of instructions, and a set of attributes. The header and instructions describe how to build a vector of objects I call the literals vector, and the instructions also lay out how to perform the FASL's side effects. The attributes are additional, more optional information.
The instructions are executed by the loader one after the other. While they could be considered to form a simple virtual machine, there is no control flow, so it would be pretty simplistic indeed. Each instruction is either a creator or an effect. A creator describes how to create some object that the loader places in the vector at a specified position. An effect describes how to mutate a previously created object in the literals vector, or is just some general side effect that must be performed. The serial, side effectual nature of the instructions is more or less required by how Lisp semantics work in general.
Once all instructions are executed, the literals vector can be garbage collected. Objects within the vector may persist of course.
A FASL is encoded as a sequence of 8-bit bytes. Compilers and loaders should be able to write to and read from byte streams as well as files.
The overall format is as follows. Here, u4
means 4 bytes, interpreted as a big-endian integer, and so on. All multi-byte encoding of integers is big-endian. There is no padding or alignment between fields.
((u4 magic)
(u2 major-version)
(u2 minor-version)
(u8 object-count)
(u8 instruction-count)
((vector u8) instructions)
(u4 attribute-count)
((vector u8) attributes))
The magic number is #x8d7498b1
. This was randomly chosen, since something clever like the ASCII for "FASL" or "LISP" or "CLSP" is more likely to cause conflicts with other file formats.
The current version is 0.4. When things are fully operational in Clasp, the version will be 1.0.
object-count
is how many objects will eventually fill the literals vector, and so this number determines the length of the literals vector.
There are instruction-count
instructions and attribute-count
attributes. The lengths of instructions and attributes are variable in general, so the instruction count alone does not determine the byte index where the attributes start, for example.
Bytes past the end of the attributes are not permitted.
Instructions and attributes refer to positions in the literals vector by numeric indices. In any encoded FASL, the length of an index in bytes depends on how large the literals vector is. If there are less than 256 objects, an index is one byte; between 256 and 65535 objects an index is two bytes; etc. Indices are indicated below by the index
type.
The encoding of an instruction consists of a one-byte opcode followed by instruction-specific data. Some instructions are variable-length. The length of an instruction is always apparent from just the byte stream, and not for example previous instructions or previously created objects.
Type: Creator
Opcode: #x41
, #x42
Syntax: ((index sind))
Operation: Place cl:nil
or cl:t
respectively into the literals vector at sind
.
Type: Creator
Opcode: #x43
Syntax: ((index sind) (index numerator) (index denominator))
Operation: Read the literals at numerator
and denominator
, which must be integers with a gcd of 1. Create a ratio with that numerator and denominator and place it at sind
.
Notes:
It is intended that a loader can use a primitive ratio creation function without having to actually carry out integer division.
Since it's probably mostly pointless to coalesce integers, it might be prudent to have ratio opcodes that inline the numerator and denominator.
Type: Creator
Opcode: #x44
Syntax: ((index sind) (index realpart) (index imagpart))
Operation: Read the literals at realpart
and imagpart
. These must be reals, and if realpart
is rational than imagpart
must not be zero. Create a complex with that realpart and imagpart and place it at sind
.
Notes:
It is intended that a loader can use a primitive complex creation function without having to canonicalize.
Since it's probably mostly pointless to coalesce numbers, it might be prudent to have complex opcodes that inline the realpart and imagpart.
Type: Creator
Opcode: #x45
Syntax: ((index sind))
Operation: Create a cons with undefined car
and cdr
and place it at sind
.
Notes:
- For safety it might be better to specify
car
andcdr
ofnil
. - Currently this is the only instruction to create a list. Something like
cons
orlist*
is necessary in general in order for circular lists to work correctly, but for the more common case it might be more convenient and compact to have alist
instruction.
Type: Effect
Opcode: #46
, #47
respectively
Syntax: ((index consind) (index valueind))
Operation: Read the literal at consind
, which must be a cons. Replace the car
or cdr
respectively with the literal at valueind
. Only one rplaca
instruction and one rplacd
instruction may refer to a given cons.
Type: Creator
Opcode: #4a
Syntax: ((index sind) (u8 uaet-code) (u8 packing-code) (u8 rank) ((vector u2) dimensions) ((vector u8) data))
Operation:
Creates an array and places it at sind
. The array's rank can be up to 256, and the rank is followed by however many dimensions. Vectors are not specially indicated.
The packing determines the format of the data. The packing is independent of the upgraded array element type, so for example an array with element type T
that happens to only contain bits may be efficiently packed.
The upgraded array element type and packing share the same code (see Notes):
#b00000000
: element typenil
. Data is zero length.#b10000000
:base-char
. Data contains one byte per array element. Each byte is the Latin-1 encoding of a character.#b11000000
:character
. Data contains four bytes per array element, the UTF-32 encoding of characters. (See Notes)#b10100000
:short-float
. Data contains two bytes per array element. Each two bytes are the IEEE754 binary16 encoding of a short float. (See Notes)#b00100000
:single-float
. Ditto, but four bytes, the binary32 encoding.#b01100000
:double-float
. Ditto, but eight bytes, the binary64 encoding.#b11100000
:long-float
. Not yet specified (see Notes).#b10110000
:(complex short-float)
. Data contains four bytes per array element. The first two bytes are the IEEE754 binary16 encoding of the realpart and the next two the binary16 encoding of the imagpart.#b00110000
:(complex single-float)
. Ditto, eight bytes, binary32.#b01110000
:(complex double-float)
. Ditto, sixteen bytes, binary64.#b11110000
:(complex long-float)
. Not yet specified (see Notes).#b00000001
:bit
. Data contains one byte for every eight elements, rounded up. Elements are stored big endian, i.e. the first array element is the most significant bit of the first byte, the next element is the next most significant bit of the first byte, etc.#b00000010
:(unsigned-byte 2)
. Ditto but two bits per element.#b00000011
:(unsigned-byte 4)
. Ditto but four bits per element.#b00000100
:(unsigned-byte 8)
. One byte per element.#b00000101
:(unsigned-byte 16)
. Two bytes per element.#b00000110
:(unsigned-byte 32)
. Four bytes per element.#b00000111
:(unsigned-byte 64)
. Eight bytes per element.#b10000100
:(signed-byte 8)
. Data is stored in two's complement, one byte per element.#b10000101
:(signed-byte 16)
. Ditto, two bytes.#b10000110
:(signed-byte 32)
. Ditto, four bytes.#b10000111
:(signed-byte 64)
. Ditto, eight bytes.#b11111111
:t
. Data is zero length. Array elements are instead recorded with(setf row-major-aref)
instructions; before such an instruction defines an element, that element is undefined.
Notes:
This instruction needs the most work.
Character strings should be stored as UTF-8 for compactness. The only reason I haven't implemented this yet is laziness.
Different implementations can have different sets of upgraded array element types, so it's probably impossible to actually encode them with a byte. It might be better to encode the UAET as a normal literal, and then have the loader call upgraded-array-element-type
on it to ensure things work in the loader's implementation. This might be a bit slower, but it's probably negligible compared to actually reading the array data.
Different implementations may have different ideas of short, single, etc floats that don't slot neatly into the IEEE754 standard. For example if they have a single float that's forty bits, they would have to be encoded as double floats, but then the loader would make them as double floats instead of singles. But, I think this would be pretty perverse, and making it so that the FASL format just doesn't support non-IEEE754 implementations is fine.
Long floats are a bit of another story though. On Clisp at least you can have multi-precision floats, and it might be nice to have this as a general capability somehow. But it might also be nice to support binary128 floats.
Dimensions greater than 65535 should probably be allowed. Four bytes might be fine, eight bytes would definitely be fine. The rank only goes up to 256 because while implementations may technically support highly multidimensional arrays, they're not very practically usable or used.
T
arrays are stored via a mutating instruction in order to support arrays that contain themselves. However, this is pretty rare, and it would probably make sense to have a packing type that's just a sequence of indices, to save a bit of space and complexity in the common case.
This is not required by the standard, but it might be nice to allow encoding of complex arrays.
Type: Effect
Opcode: #x4b
Syntax: ((index sind) (u2 rmindex) (index valueind))
Operation: Read the literal at sind
, which must be a T
array. Replace the rmindex
-th row major element of the array with the value at valueind
. A FASL may only define a given array element once.
Notes: The array index should definitely be larger.
Type: Creator
Opcode: #x4c
Syntax: ((index sind) (u8 test) (u16 count))
Operation:
Create an empty hash table and place it at sind
.
The hash table test is encoded as follows:
#b00
:eq
#b01
:eql
#b10
:equal
#b11
:equalp
The count is the count of key-value pair sin the hash table. This can be used as a sizing hint so that the hash table does not need to be resized during loading.
Notes:
It might be nice to support custom hash table tests, for example by having another index that's the literal for the test. But I don't know how portably useful that would be.
As part of the running theme, the elements of the hash table are set via side effects in order to support hash tables that contain themselves, but this is a rare case and there should probably be an inline encoding for the common case.
The count presents a bit of an issue for loaders since it specifies the actual count, but cl:make-hash-table
the function accepts a capacity or size, which is larger than the count. A portable loader cannot determine exactly what the size should be to avoid rehashing.
It might be nice to encode the rehash size and rehash threshold, though they are just hints.
Type: Effect
Opcode: #x4d
Syntax: ((index hash-table) (index key) (index value))
Operation: Set the key to be the value in the hash table.
Type: Creator
Opcode: #x4e
Syntax: ((index sind) (u8 sb64))
Operation: Create a signed integer from the given eight bytes and place it at sind
.
Notes: This is intended to be able to create fixnums (and small bignums) but without relying on the implementation-defined fixnum size.
Type: Creator
Opcode: #x4f
Syntax: ((index sind) (index namei))
Operation: Read the literal at namei
, which must be a string. Look up the package by that name and place it at sind
.
Notes: The standard states that is the responsibility of the programmer to ensure that the package at compile time matches the package at load time, so there's no way to actually specify a package in a FASL. It might be interesting to provide this as an extension, so that incompatibility between compiler and loader can be detected.
Type: Creator
Opcode: #x50
Syntax: ((index sind) (u8 nwords) ((vector u8) words))
Operation: Treat nwords
as a signed integer. There are (abs nwords)
words in the instruction; read them to form an integer (big-endian). (abs nwords)
must be greater than one. If nwords
is negative, negate the integer. The resulting integer is placed at sind
.
Notes:
The sign and magnitude encoding seems to be the usual implementation of bignums. The "signed size" convention is taken from GMP.
The prohibition of "small bignums" is so that an integer can only be encoded one way - either as an sb64 if it has at most 64 bits, or else as a "bignum". A loader for an implementation with fixnums of more than 64 bits will have to check for that in this instruction.
Type: Creator
Opcode: #x51
Syntax: ((index sind) (index nameind))
Operation: Read the literal at nameind
, which must be a string. Create an uninterned symbol of that name and place it at sind
.
Type: Creator
Opcode: #x52
Syntax: ((index sind) (index packageind) (index nameind))
Operation: Read the literals at packageind
, which must be a package, and nameind
, which must be a string. Intern a symbol of that name in that package and place it at sind
.
Notes:
For compactness, it might be worthwhile to define special opcodes for defining symbols in the cl:
and keyword:
packages.
The standard does not really contemplate that some symbols must be interned before loading. For example, if I write a library that uses the alexandria:factorial
function, and that library's FASL is loaded into an image that contains an alexandria
package that does not contain that symbol, my library will not work, but this will not become apparent until things reach that point, because the loader will intern
the symbol rather than just find-symbol
-ing it. It might be interesting to provide a find-symbol
operation as an extension.
Type: Creator
Opcode: #x53
Syntax: ((index sind) (u4 code))
Operation: Place the character with the code attribute (in the CL sense) of code
at sind
.
Notes: In current Clasp FASLs, code is actually a u8, but this seems like a bit much considering that even unicode is only 21 bits at the moment.
Type: Creator
Opcode: #x55
Syntax: ((index sind) (index hosti) (index devicei) (index directori) (index namei) (index typei) (index versioni))
Operation: Create a pathname with the given components and place it at sind
.
Type: Creator
Opcode: #x57
Syntax: ((index sind) (u4 entry) (u2 nlocals) (u2 nclosed) (index modulei) (index namei) (index lambda-listi) (index docstringi))
Operation: Create a bytecode function and place it at sind
. The module is at modulei
, and the function starts at byte index entry
within that module.
Notes:
The name, lambda-list, and docstring basically only exist for documentation purposes. It might be good to make them optional via standard attributes.
It would be interesting to allow dumping closures as an extension to the standard. As-is, nclosed
just indicates how many closure cells the function needs, and these cells are expected to be filled by the bytecode at some point, rather than by the loader.
Type: Creator
Opcode: #x58
Syntax: ((index sind) (u4 bytecode-len) ((vector u8) bytecode))
Operation: Create a bytecode module with the given bytecode and place it at sind
.
Notes:
The intersection of modules with FASLs is something I've struggled with. Originally I thought there would be one bytecode module for the whole FASL, shared by all functions. This turned out to be somewhat difficult (but probably still possible) to arrange, because dumping a constant may entail creating new bytecode functions via make-load-form
. Things got messy.
So now a FASL can have and probably does have multiple modules, and the literals vector for a given module is distinct from the FASL literals vector (and in fact is itself an entry in said FASL vector). This can mean some extra space taken up due to different modules having different vectors with the same elements (although the elements themselves are still shared). But there is an advantage in that it means different modules and their literal vectors can be garbage collected separately, whereas if all functions/modules share the FASL literals vector, all constants in the literals vector will stay alive as long as any function from the FASL is still alive.
Another possible advantage is making it easier to specify native code functions eventually. Native code blobs are more difficult to re-link if the numbering of their literals changes, as it might when multiple modules/blobs are being linked together. With separate vectors this is not a problem.
Type: Effect
Opcode: #x59
Syntax: ((index moduleind) (index vectorind))
Operation: Read the literal at vectorind
, which must be a T vector. Set it as the literals vector of the module at moduleind
. This is done exactly once for each module.
Notes: This instruction exists because of an intrinsic circularity with modules. Their literals vector can contain bytecode functions, which in turn need the module to exist. So the usual sequence of instructions for creating bytecode functions is make-bytecode-module; make-bytecode-function; ... setf-literals
.
Type: Creator
Opcode: #x5a
Syntax: ((index sind) (u4 float))
Operation: Read float
as an IEEE754 binary32 float, make a single-float out of that, and place it at sind
.
Notes: Similar portability considerations to those described in the notes for make-array
above apply here.
Type: Creator
Opcode: #x5b
Syntax: ((index sind) (u8 float))
Operation: Read float
as an IEEE754 binary64 float, make a double-float out of that, and place it at sind
.
Notes: Similar portability considerations to those described in the notes for make-array
above apply here.
Type: Creator
Opcode: #x5d
Syntax: ((index sind) (index fun))
Operation: Read the literal at fun
, which must be a function. Call it with no arguments, and place the result at sind
.
Notes: This can be used to implement the creation form of make-load-form
as well as load-time-value
.
Type: Effect
Opcode: #x5e
Syntax: ((index fun))
Operation: Read the literal at fun
, which must be a fun. Call it with no arguments. Discard any return values.
Notes: This can be used to implement general top level forms, as well as the initialization form of make-load-form
.
Each attribute has the same general syntax:
((index attribute-name)
(u4 nbytes)
((vector u8) data))
The attribute-name
is the index (into the usual literals vector) of a string. This is a string instead of a symbol so that attributes can be understood regardless of the state of the package system. The attribute name is used by the loader to determine the meaning of the attribute.
nbytes
is the length in bytes of the data. This is in place so that a loader that does not know about a particular kind of attribute can simply skip over it.
Currently there are no standard attributes. For illustration purposes, here is one from Clasp:
Syntax: ((index funi) (index pathnamei) (u8 lineno) (u8 column) (u8 filepos))
Meaning: The function at funi
is marked as having the given source position info.
Currently I have a prototype implementation of the compiler, loader, and some offline operations in Clasp. It is mostly but not entirely portable. Soon I will write a C++ version of the loader for Clasp's purposes. After that I would like to make a really portable version of the compiler so that Clasp code can be built in other implementations. From there I can think about making the loader portable, but that will require making the bytecode VM portable again as well.
I put attributes after the instructions to mimic Java, but given that we're calling functions and otherwise using objects as we produce them, it might be better to make attributes a kind of instruction that are executed in sequence with the other instructions.
Currently the file compiler performs very inefficiently at complex objects dumped via make-load-form
, because it compiles every single creation and initialization form as their own functions. We ran into this problem with the earlier Clasp FASL formats, and the basic solution is to add a few more instructions for common cases of make-load-form
:
find-class
: Given a class name, look up its class. This is used for the standard dumping ofclass
objects.allocate-instance
: Given a class, allocate an instance. This is used for the creation form ofmake-load-form-saving-slots
.funcall
: Likefuncall-initialize
, but takes a fixed number of literal arguments. This is used for the most common kinds of initialization and creation forms, but in general requires the file compiler to recognize certain "easy" kinds of Lisp forms that it can deal with without calling the Lisp compiler.initialize-slots
: Given a standard object and a plist of slot names with values, set the slots in the object. This is used for the initialization form ofmake-load-form-saving-slots
(which doesn't useinitialize-instance
, hence the distinct name). This would require the file compiler to recognize the forms used in different implementations'make-load-form-saving-slots
implementations. It might also be possible to usestandard-instance-access
rather than slot names, but this would be less safe.
Another important set of convenience instructions would be for non-circular aggregate objects, as mentioned in the notes to instructions above.
Another possibility is to include instructions for common defining operators such as defun
or defparameter
. These could be executed directly by the loader. Currently all nontrivial top level forms must end up as funcall-initialize
instructions; this means that the Lisp compiler has to do some pretty stereotypical work repeatedly, and the loader must produce functions that are only called once. (Plus, for defun
, the (setf (fdefinition ...) ...)
kinds of forms are compiled into functions that are only called once but remain in memory indefinitely as they are referenced from the same module as the function actually defined.)
As described, the format makes all objects in one literals vector. This includes subobjects. For example, if a literal (:foo :bar :baz)
appears in the code, there will be entries in the literals vector for all of (nil "BAZ" "KEYWORD" #<package keyword> :baz #1=(:baz) "BAR" :bar #2=(:bar . #1#) "FOO" :foo (:foo . #2#))
. While the literals vector is discarded afterwards, this may take up a nontrivial amount of memory during loading. It might be worth it to complicate the format with a stack machine so that objects can be created and discarded during loading rather than only discarded after loading. Thi would be enough of a change that it should not be done unless memory use turns out to be a real problem.
This FASL format is very side-effect-based because that is how the semantics of Common Lisp are defined. However, practically speaking, many if not most Common Lisp source files could be recorded in a declarative fashion, e.g. if they consist entirely of defining forms like defparameter
, defun
, defmacro
, defclass
. A declarative FASL format would be easier to interrogate offline, and could potentially be loaded in parallel.
Various numeric limits described above are quite arbitrary and could be reevaluated. For example it seems a bit unlikely that a FASL could need more than four bytes as an object count - that would mean one FASL with over four billion object. But on the other hand, taking up an extra four bytes isn't that big a deal.
The opcodes are currently very arbitrary and not structured very well (in fact they are vaguely based on the opcodes in Clasp's existing FASL machinery, which for reasons I'm not clear on are valid ASCII characters). Conspack structures opcodes for clarity and this might be worth mimicking.
Comments from jackdaniel (reproduced from his website with permission):
some more on irc as well, which i will paste in here as soon as the logs come back up