© 2016 Mantas Mikulėnas grawity@gmail.com
This documentation is released under CC BY 4.0.
OBML (Opera Binary Markup Language) files are self-contained, rendered versions of HTML documents generated by the Presto v2 engine. They are static, containing pixel-positioned regions adapted for a specific device's screen size & font metrics. (Thus OBML documents generated for one device tend to look slightly 'off' everywhere else, and a perfect rendering is impossible without knowing the original device.)
Several OBML versions were used over time, and each Opera Mini version is compatible with only one OBML format, so an upgrade might leave old saved pages unreadable. (The OBML version in use can be seen by visiting debug:.)
Most saved pages use OBML v12, v13, v15, or v16; I haven't investigated the format used by earlier "modded" Opera Mini versions which had this feature added unofficially.
OBML uses these primitive types:
- byte – unsigned integer (1 byte)
- short – unsigned integer (2 bytes, big-endian)
- medium – unsigned integer (3 bytes, big-endian)
- blob – consists of a short indicating the length, followed by that many bytes of data
- char – a byte containing an ASCII character
- string – a blob containing UTF-8 encoded text
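The primitive types above can be read with a handful of helpers. The following is a minimal sketch, not a reference implementation; the function names (`read_byte`, `read_short`, and so on) are made up for illustration, and the input is assumed to be any binary file-like object.

```python
import io

def _read_exact(f, n):
    """Read exactly n bytes or fail loudly on a truncated file."""
    data = f.read(n)
    if len(data) != n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data

def read_byte(f):
    # byte: unsigned integer, 1 byte
    return _read_exact(f, 1)[0]

def read_short(f):
    # short: unsigned integer, 2 bytes, big-endian
    return int.from_bytes(_read_exact(f, 2), "big")

def read_medium(f):
    # medium: unsigned integer, 3 bytes, big-endian
    return int.from_bytes(_read_exact(f, 3), "big")

def read_blob(f):
    # blob: a short giving the length, then that many bytes
    return _read_exact(f, read_short(f))

def read_char(f):
    # char: a byte containing an ASCII character
    return chr(read_byte(f))

def read_string(f):
    # string: a blob containing UTF-8 encoded text
    return read_blob(f).decode("utf-8")

if __name__ == "__main__":
    sample = io.BytesIO(b"\x07\x01\x02\x00\x01\x00\x00\x03abc")
    print(read_byte(sample), read_short(sample),
          read_medium(sample), read_string(sample))
```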
Each OBML file has a "base URL", essentially a reusable prefix. When another URL starts with a null byte, that byte is to be replaced with the base URL. For example, if the base is http://example.com/dir and you have a URL value \x00/index.html, it expands to http://example.com/dir/index.html.
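The prefix substitution can be sketched as follows. The helper name `expand_url` is hypothetical, and the sketch assumes the raw URL value has already been read as bytes and is UTF-8 encoded like other strings.

```python
def expand_url(raw: bytes, base: str) -> str:
    """Replace a leading null byte with the file's base URL."""
    if raw[:1] == b"\x00":
        # Assumption: the remainder is UTF-8 text, as with other strings.
        return base + raw[1:].decode("utf-8")
    return raw.decode("utf-8")

if __name__ == "__main__":
    print(expand_url(b"\x00/index.html", "http://example.com/dir"))
```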
Colors are stored as ARGB tuples, with one byte (0–255) per component.
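As an illustration, an ARGB tuple can be turned into a CSS color for rendering. This is only a sketch; the function names are made up, and mapping the alpha byte linearly onto the CSS 0 to 1 range is an assumption.

```python
def read_color(data: bytes):
    """Split four bytes into (alpha, red, green, blue)."""
    a, r, g, b = data[:4]
    return a, r, g, b

def css_rgba(color):
    """Format an ARGB tuple as a CSS rgba() value (alpha scaled to 0..1)."""
    a, r, g, b = color
    return f"rgba({r}, {g}, {b}, {a / 255:.3f})"

if __name__ == "__main__":
    print(css_rgba(read_color(b"\xff\x10\x20\x30")))
```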
Coordinates are stored as a short for the X position (0–65535) followed by a medium for the Y position (0–16777215). The origin (0, 0) is in the top-left corner.
In format versions ≤ 13, "position" coordinates are absolute and stored as-is.
In format versions ≥ 15, "position" coordinates are stored relative to the last position coordinate. Relative coordinates wrap around, so negative offsets are stored as very large positive values. For example, (-4, -2) is stored as (0xFFFC, 0xFFFFFE).
(Note that positions are relative only to the last position – absolute coordinates like sizes do not affect the relative offset in any way.)
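The wrap-around arithmetic can be sketched like this. The class and function names are hypothetical, and starting the running position at (0, 0) is an assumption that the text above does not confirm.

```python
X_MOD = 1 << 16  # X is a short (2 bytes)
Y_MOD = 1 << 24  # Y is a medium (3 bytes)

def to_signed(value: int, bits: int) -> int:
    """Interpret a wrapped unsigned offset as a signed delta."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

class PositionTracker:
    """Accumulates relative 'position' coordinates (format v>=15)."""

    def __init__(self):
        # Assumption: the first relative position is measured from (0, 0).
        self.x = 0
        self.y = 0

    def apply(self, dx: int, dy: int):
        # Adding modulo the field width undoes the wrap-around encoding.
        self.x = (self.x + dx) % X_MOD
        self.y = (self.y + dy) % Y_MOD
        return self.x, self.y

if __name__ == "__main__":
    tracker = PositionTracker()
    print(tracker.apply(10, 20))
    print(tracker.apply(0xFFFC, 0xFFFFFE))  # stored form of (-4, -2)
```

Sizes are not passed through the tracker, since only positions affect the running offset.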
The file starts with a file_size: medium followed by version: byte.
In v≥15, file_size is always 0x02d355 and version is always 16; they're followed by a second identical header containing the real values. The reason for that is unknown.
Note that file_size only includes the bytes following it. It doesn't include the field's own size, nor the preceding fields.
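Reading the header, including the placeholder header seen in v≥15, might look like the sketch below. The function name `read_header` is made up, and it assumes the second header uses the same medium + byte layout as the first.

```python
import io

PLACEHOLDER_SIZE = 0x02D355   # fixed file_size seen in v>=15 files
PLACEHOLDER_VERSION = 16      # fixed version seen in v>=15 files

def read_header(f):
    """Return (file_size, version), skipping the v>=15 placeholder header."""
    file_size = int.from_bytes(f.read(3), "big")
    version = f.read(1)[0]
    if file_size == PLACEHOLDER_SIZE and version == PLACEHOLDER_VERSION:
        # Assumption: the real values follow in an identically laid out header.
        file_size = int.from_bytes(f.read(3), "big")
        version = f.read(1)[0]
    return file_size, version

if __name__ == "__main__":
    old = io.BytesIO(b"\x00\x00\x10\x0d")
    new = io.BytesIO(b"\x02\xd3\x55\x10" + b"\x00\x01\x00\x0f")
    print(read_header(old), read_header(new))
```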
Following is page_size: coords.
In v16, following is unknown: bytes[3] (always S\x00\x00).
Following is unknown: short (always 0xFFFF).
Following are page_title: string, unknown: blob, page_url_base: string, and page_url: url. The unknown blob seems to always start with C\x10\x10... on v15, empty otherwise.
Following is an unknown header (6 bytes for v≥15, 5 bytes for v≤13).
- In v13, the format appears to be counter: short, unknown: medium.
Following are the "metadata" section and the "content" section, both composed of tagged chunks.
This section consists of several chunks. Each chunk starts with type: char (an ASCII letter), followed by a variable number of fields.
In v≥15, always contain byte[23] of unknown data.
Always contain subtype: char, unknown: byte (always 0x00), data: blob.
Secure connection (TLS) information. Contains byte[6], cert_expiry: string, secure_status: string, tls_details: string, cert_common_name: string.
Appear to contain links_size: medium followed by a "links" sub-section of that size.
This section consists of chunks and appears to be a sub-section of the preceding 'S' chunk.
Always contain unknown: byte and count: byte, followed by that many (id: string, label: string) pairs. Each of these chunks seems to store the <option> choices for an HTML <select> widget.
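A sketch of reading such an option list follows. The function name `read_select_options` is hypothetical, and the chunk's leading type character is assumed to have been consumed already.

```python
import io

def _blob(f):
    """Read a short length prefix followed by that many bytes."""
    length = int.from_bytes(f.read(2), "big")
    return f.read(length)

def read_select_options(f):
    """Return (unknown_byte, [(id, label), ...]) for one option chunk."""
    unknown = f.read(1)[0]
    count = f.read(1)[0]
    options = []
    for _ in range(count):
        option_id = _blob(f).decode("utf-8")
        label = _blob(f).decode("utf-8")
        options.append((option_id, label))
    return unknown, options

if __name__ == "__main__":
    payload = b"\x00\x02" + b"\x00\x01a\x00\x03Red" + b"\x00\x01b\x00\x04Blue"
    print(read_select_options(io.BytesIO(payload)))
```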
Most other chunks in this sub-section define 'regions' and share the same data format.
In v≥15 the format is box_count: byte, followed by that many (pos: coords[rel], size: coords) coordinate pairs, followed by link_target: blob, unknown: byte[2] (always \x01\x74), link_type: blob.
In v13 the format is box_count: byte, followed by that many (pos: coords, size: coords) coordinate pairs, followed by link_target: blob, unknown: byte[2], link_type: blob.
In v12 the format is box_count: byte, followed by that many (pos: coords, size: coords) pairs, followed by link_type: blob, link_target: blob.
link_type, if non-empty, seems to be a string with the MIME type.
link_target can be a URL, a string, or an unknown blob.
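The three layouts differ only in the order of the trailing fields, which the sketch below tries to capture. The name `parse_region` and the dictionary keys are made up; positions are returned exactly as stored (so in v≥15 they are still relative), and versions other than 12, 13, and ≥15 are rejected because their layout is not described here.

```python
import io

def _uint(f, size):
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated region chunk")
    return int.from_bytes(data, "big")

def _blob(f):
    length = _uint(f, 2)
    data = f.read(length)
    if len(data) != length:
        raise EOFError("truncated blob")
    return data

def _coords(f):
    # X is a short, Y is a medium
    return _uint(f, 2), _uint(f, 3)

def parse_region(f, version):
    """Parse one region chunk body (type char already consumed)."""
    box_count = _uint(f, 1)
    boxes = [(_coords(f), _coords(f)) for _ in range(box_count)]  # (pos, size)
    if version >= 15 or version == 13:
        link_target = _blob(f)
        unknown = f.read(2)
        link_type = _blob(f)
    elif version == 12:
        link_type = _blob(f)
        link_target = _blob(f)
        unknown = b""
    else:
        raise ValueError(f"region layout for v{version} is not documented")
    return {"boxes": boxes, "link_target": link_target,
            "link_type": link_type, "unknown": unknown}

if __name__ == "__main__":
    box = b"\x01" + b"\x00\x01\x00\x00\x02" + b"\x00\x03\x00\x00\x04"
    v12 = box + b"\x00\x00" + b"\x00\x02hi"
    print(parse_region(io.BytesIO(v12), 12))
```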
Unknown region types. These follow the "region chunk" format.
In v≥15, an unknown region type. In v12, unknown: byte[24]; likely to also be a region type but I haven't checked yet.
Image region (link_target: url links to the original image). Note that this doesn't actually render an image; it only defines a link region for the original URL. The image itself is drawn by the content section.
Link region (link_target: url is the link target). Note that this isn't directly associated with link text in any way; it merely defines the 'active' rectangle overlaid on top of the text.
URLs starting with b: seem to be JavaScript links.
Link region similar to 'L' but containing a "platform" link (usually mailto:).
Link region similar to 'L' but meant to trigger a file download dialog (for image "Save" buttons). The target URL is hosted by the Opera Mini proxy, and expires after some time.
Link region similar to 'w' but meant to open the target in the platform's native web browser (for image "Open" buttons).
Define a filled rectangle (a "box"); used to draw background colors, borders, other lines (including even link underlines).
In v≥15, contain pos: coords[rel], size: coords, fill: color.
In v≤13, contain pos: coords, size: coords, fill: color.
Form fields.
In v≥15, contain pos: coords[rel], size: coords, foreground: color, type: byte[2], field_id: string, value: string, byte[5].
In v≤13, contain pos: coords[rel], size: coords, foreground: color, type: byte[2], field_id: string, value: string, byte[3].
Types:
- a – multi-line input box (textarea)
- c – checkbox
- r – radio button
- x – single-line input box
- s – select drop-down
Image.
In v16, contain pos: coords[rel], size: coords, fill: color, file_addr: medium, unknown: byte[11].
In v15, contain pos: coords[rel], size: coords, fill: color, unknown: byte[14].
In v≤13, contain pos: coords[rel], size: coords, fill: color, unknown: byte[3], file_addr: medium.
fill is the image's average color, used as a placeholder when images are disabled or still loading.
file_addr is the byte offset within the 'S'-chunk, relative to the end of data_size.
Unknown. Contain byte[9] with unknown data.
Unknown. Contain byte[2], blob with unknown data.
Not sure if an actual chunk, or just part of the preceding 'I'-chunk.
Contain blob with unknown data.
Embedded images.
Contain data_size: medium, followed by some number of file_data: blob. (The blob count isn't given, so keep reading blobs until you've consumed at least data_size bytes.)
Each blob contains an image (PNG or JPEG) to be drawn in all 'I'-chunks whose file_addr matches the blob's offset relative to the end of data_size.
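The "keep reading until data_size is consumed" loop can be sketched as below. The function name `read_image_table` is hypothetical. The sketch assumes each offset points at the start of a blob (its length prefix), which is one reading of "the blob's offset" and has not been verified against real files.

```python
def read_image_table(buf: bytes, start: int = 0) -> dict:
    """Map blob offsets (relative to the end of data_size) to image bytes."""
    data_size = int.from_bytes(buf[start:start + 3], "big")
    base = start + 3  # offsets are counted from the end of data_size
    images = {}
    offset = 0
    while offset < data_size:
        pos = base + offset
        length = int.from_bytes(buf[pos:pos + 2], "big")
        # Assumption: file_addr refers to the blob start, not the image data.
        images[offset] = buf[pos + 2:pos + 2 + length]
        offset += 2 + length
    return images

if __name__ == "__main__":
    chunk = b"\x00\x00\x09" + b"\x00\x03PNG" + b"\x00\x02JP"
    for addr, data in read_image_table(chunk).items():
        print(addr, data)
```

An 'I'-chunk's file_addr can then be looked up directly in the returned dictionary.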
Text.
In v16, contain pos: coords[rel], size: coords, foreground: color, unknown: byte, font: byte, unknown_count: byte, unknown_count × (byte, blob) pairs, text: string. (It seems that the unknown pairs define some sort of links.)
In v15, contain pos: coords[rel], size: coords, foreground: color, font: byte, text: string.
In v≤13, contain pos: coords, size: coords, foreground: color, font: byte, text: string.
In font, the least-significant bit indicates bold text. With the 'bold' bit masked out, the remaining value indicates the font size:
- 0 – medium (approx. 11px)
- 2 – large (approx. 12px)
- 4 – extra large (approx. 13px)
- 6 – small (approx. 10px)
The following CSS results in an acceptable rendering:
font-family: sans-serif;
line-height: 1.1;
white-space: pre;
Unknown. Rare. The only occurrence seen contains byte[6].
Form buttons do not have a special representation; they just consist of an image + text + link region, using special b:… URLs.
Input fields and select dropdowns haven't been fully researched yet.