Binary matching is a powerful Elixir feature for extracting information from binaries and for pattern matching on them. This article gives a short overview of the options available when pattern matching on binaries and demonstrates a few common use cases.
Binary matching can be used by itself to extract information from binaries:
iex> <<"Hello, ", place::binary>> = "Hello, World"
"Hello, World"
iex> place
"World"
Or as a part of function definitions to pattern match:
defmodule ImageTyper do
  @png_signature <<137::size(8), 80::size(8), 78::size(8), 71::size(8),
                   13::size(8), 10::size(8), 26::size(8), 10::size(8)>>
  @jpg_signature <<255::size(8), 216::size(8)>>

  # Match on the leading signature bytes; the remainder is ignored.
  def type(<<@png_signature, _rest::binary>>), do: :png
  def type(<<@jpg_signature, _rest::binary>>), do: :jpg
  def type(_), do: :unknown
end
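As a rough usage sketch of the same idea, the module below repeats the signatures in hex and calls the function on a few hand-built binaries. The module name `SignatureSketch` and the sample byte sequences are made up for illustration; only the two signatures come from the example above.

```elixir
defmodule SignatureSketch do
  # Same signatures as ImageTyper, written with hex and a string literal.
  @png_signature <<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A>>
  @jpg_signature <<0xFF, 0xD8>>

  def type(<<@png_signature, _rest::binary>>), do: :png
  def type(<<@jpg_signature, _rest::binary>>), do: :jpg
  def type(_), do: :unknown
end

# Hypothetical inputs: a signature followed by a few arbitrary bytes.
png_result = SignatureSketch.type(<<137, 80, 78, 71, 13, 10, 26, 10, 0, 0>>)
jpg_result = SignatureSketch.type(<<255, 216, 255, 224>>)
other_result = SignatureSketch.type("plain text")
# png_result is :png, jpg_result is :jpg, other_result is :unknown
```

Because the clauses are tried in order, the catch-all clause has to come last.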
There are 9 types used in binary matching:
integer
float
bits (alias for bitstring)
bitstring
binary
bytes (alias for binary)
utf8
utf16
utf32
When no type is specified, the default is integer.
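The difference between the default integer type and the utf8 type is easiest to see on a multi-byte character. The snippet below is a small sketch; the variable names are arbitrary.

```elixir
# "\u00E9" is the letter é, which UTF-8 encodes as the two bytes 195 and 169.
text = "\u00E9l"

# No type given: the segment is an integer and matches a single byte.
<<first_byte, _rest::binary>> = text
# first_byte is 195

# utf8 matches one whole code point, however many bytes it occupies.
<<codepoint::utf8, rest::binary>> = text
# codepoint is 233 (U+00E9) and rest is "l"
```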
The length of the match is equal to the unit (a number of bits) times the size (the number of repeated segments of length unit).
Type | Default Unit |
---|---|
integer | 1 bit |
float | 1 bit |
binary | 8 bits |
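To make the size-times-unit rule concrete, here is a minimal sketch with an explicit unit for an integer and the default unit for a binary. The values are chosen only for illustration.

```elixir
# size(2) * unit(8) = 16 bits, so both bytes are read as one big-endian integer.
<<value::integer-size(2)-unit(8)>> = <<1, 2>>
# value is 1 * 256 + 2 = 258

# binary has a default unit of 8 bits, so size(3) means 3 bytes (24 bits).
<<head::binary-size(3), tail::binary>> = "abcdef"
# head is "abc" and tail is "def"
```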
Sizes for types are a bit more nuanced. The default size for integers is 8.
For floats, it is 64, and size * unit must result in 32 or 64, corresponding to IEEE 754 binary32 and binary64, respectively (Erlang/OTP 24 and later also accept 16, for binary16).
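As a quick check of the float sizes, the sketch below matches the well-known IEEE 754 encodings of 1.0 at both widths. The byte literals are the standard binary64 and binary32 bit patterns for 1.0; the variable names are arbitrary.

```elixir
# Default float size is 64 bits: 0x3FF0000000000000 encodes 1.0 as binary64.
<<double::float>> = <<63, 240, 0, 0, 0, 0, 0, 0>>
# double is 1.0

# With size(32) the segment is read as binary32: 0x3F800000 also encodes 1.0.
<<single::float-size(32)>> = <<63, 128, 0, 0>>
# single is 1.0

# The same sizes apply when building binaries.
double_bytes = byte_size(<<1.5::float>>)
single_bytes = byte_size(<<1.5::float-size(32)>>)
# double_bytes is 8 and single_bytes is 4
```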
For binaries, the default is the size of the binary. Only the last binary in a binary match can use the default size. All others must have their size specified explicitly, even if the match is unambiguous.
For example:
iex> <<name::binary, " the ", species::binary>> = <<"Frank the Walrus">>
** (CompileError): a binary field without size is only allowed at the end of a binary pattern
iex> <<name::binary-size(5), " the ", species::binary>> = <<"Frank the Walrus">>
"Frank the Walrus"
iex> {name, species}
{"Frank", "Walrus"}
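The explicit size does not have to be a literal. A common pattern is to read the size from an earlier segment of the same match. The sketch below assumes a made-up framing in which one length byte precedes the payload; the module name `LengthPrefixed` and the function `split/1` are hypothetical.

```elixir
defmodule LengthPrefixed do
  # Hypothetical format: one length byte, then that many bytes of payload.
  # `len` is bound by the first segment and reused as the size of the second.
  def split(<<len::integer-size(8), payload::binary-size(len), rest::binary>>) do
    {:ok, payload, rest}
  end

  # Fewer bytes than the length byte promises, or no length byte at all.
  def split(_other), do: :error
end

framed = LengthPrefixed.split(<<5, "Frank the Walrus">>)
# framed is {:ok, "Frank", " the Walrus"}

too_short = LengthPrefixed.split(<<9, "short">>)
# too_short is :error because only 5 payload bytes are available
```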
Some types have associated modifiers to clear up ambiguity in byte representation. The following table lists each modifier and the types it applies to:
Modifier | Relevant Type(s) |
---|---|
signed | integer |
unsigned (default) | integer |
little | integer, float, utf16, utf32 |
big (default) | integer, float, utf16, utf32 |
native | integer, float, utf16, utf32 |
Integers can be signed or unsigned, defaulting to unsigned.
iex> <<int::integer>> = <<-100>>
<<156>>
iex> int
156
iex> <<int::integer-signed>> = <<-100>>
<<156>>
iex> int
-100
Elixir has three options for endianness: big, little, and native. The default is big. native is determined by the VM at startup.
iex> <<number::little-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
256
iex> <<number::big-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
1
iex> <<number::native-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
256
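The native result of 256 above reflects a little-endian machine; on a big-endian machine the same match would give 1. The sketch below shows one way to check which order native resolves to and how modifiers can be combined in one segment. The variable names are illustrative.

```elixir
# Encode 1 as a 16-bit integer in native order and compare it with the
# little-endian encoding to see which byte order the running VM uses.
native_order =
  if <<1::native-integer-size(16)>> == <<1::little-integer-size(16)>> do
    :little
  else
    :big
  end

# Modifiers combine with hyphens: a signed, little-endian, 16-bit integer.
<<reading::little-signed-integer-size(16)>> = <<0xFE, 0xFF>>
# The bytes form 0xFFFE, which is -2 when interpreted as signed.

# The same bytes read as unsigned big-endian give a different value.
<<raw::big-unsigned-integer-size(16)>> = <<0xFE, 0xFF>>
# raw is 0xFEFF, that is 65279
```

Spelling out the endianness is usually the safer choice when parsing file formats or network protocols, since native depends on the machine the code runs on.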