Status: Draft 1 In Progress. This document is undergoing its first revision. Initial implementation has begun alongside editing Draft 1. Your feedback is hoped and dreamed of.
Mathematical Structured Object Notation is a JSON-based representation
for most of the common subset of what LaTeX and Presentation MathML can
represent, but distilled down to the essential content and structure.
It can also represent diffs between two formulas in a similar fashion to
ottypes/text and ottypes/rich-text, and the associated cursor
positions are a better fit for editing math than the DOM Ranges
associated with MathML.
Okay.
// the quadratic formula
> MathSON.fromLatex('x = \\frac{ -b \\pm \\sqrt{ b^2 - 4ac } }{ 2a }').ops
['x=',
{ numer: ['-b±', { sqrt: ['b', { sup: ['2'] }, '-4ac'] }],
denom: ['2a'] }]
// diff: simplifying the derivative of square root
// \frac{1}{2} x^{ - \frac{1}{2} }
> var a = MathSON([{numer: ['1'], denom: ['2']}, 'x', {sup: ['-', {numer: ['1'], denom: ['2']}]}]);
// \frac{ 1 }{ 2 \sqrt{x} }
> var b = MathSON([{numer: ['1'], denom: ['2', {sqrt: ['x']}]}]);
> a.diff(b).ops
[{ _denom: [1, { sqrt: ['x'] }] },
{ delete_: ['x', { sup: ['-', { numer: ['1'], denom: ['2'] }] }] }]
When editing the quadratic formula, '2.numer.3.sqrt.5'
represents a cursor in this position:
Currently, the viable Web formats for math formulae are:
- AMS-LaTeX subset (as rendered by MathJax and KaTeX) is a compromise between human- and machine-readability, plus legacy/compat concerns.
- MathML is a compromise between people who like XML and people who are reasonable...who am I kidding, MathML is unnecessarily complex even beyond what XML alone accounts for (see "What do you have against MathML?")
- obscure formats like AsciiMath that optimize even more for human-readability than TeX & friends, at the expense of machine-readability and simplicity
MathSON is intended to fill the niche of being easily parseable (unlike TeX) into a tree structure that's easy to understand and use (unlike MathML).
My primary motivation is MathQuill, my formula editor whose API uses an
AMS-LaTeX subset to represent math. It's impractical to use this API to
implement stuff like what typing a slash / does in MathQuill, which is to
scan backwards until a + or similar and move the group into the numerator of
a fraction (so typing 1+1/x yields 1+\frac{1}{x}). You'd need to parse the
LaTeX into an AST, scan & modify the AST, then serialize the AST back to LaTeX.
Which sucks, because MathQuill already has a perfectly good (well, not
perfectly, but still) internal AST that it parses the LaTeX to, so that'd be
so much wasted, duplicated parsing & serialization. (More on why MathQuill
needs MathSON)
Completely unintentionally, this format turns out to be surprisingly useful
for accessibility: for most math its tree structure is isomorphic to the
corresponding MathSpeak, the speech protocol used by mathematician Abe Nemeth
(inventor of the widely-used Nemeth Braille Code for Math). Notably, whereas
MathML has lots of extraneous information that'd be ignored when converting to
MathSpeak (like <mo> vs <mi> vs <mn>), that's all implicit in MathSON just
like in MathSpeak; in other words, virtually anything explicit in MathSON is also
explicit in MathSpeak. Even the cursor positions represent closely what a screen
reader would read when navigating an editing interface for MathSpeak.
(It wasn't until we started work on just such an accessible math editing interface that I noticed this.)
This suggests that MathSON does in fact perfectly extract the essential content and structure of any given math.
> MathSON.fromLatex('x = \\frac{ -b \\pm \\sqrt{ b^2 - 4ac } }{ 2a }')
['x=',
{ numer: ['-b±', { sqrt: ['b', { sup: ['2'] }, '-4ac'] }],
denom: ['2a'] }]
> MathSON.fromLatex('\\frac{ \\sin x }{ x }')
[{ numer: [{ inline_op: ['sin'] }, 'x'], denom: ['x'] }]
> MathSON.fromLatex('\\sin\\left( \\frac{1}{x} \\right)')
[{inline_op: ['sin']}, {$left: '(', group: [{numer: ['1'], denom: ['x']}], $right: ')'}]
// KaTeX's homepage example:
> MathSON.fromLatex('f(x) = \\int_{-\\infty}^\\infty \\hat f(\\xi) e^{2 \\pi i \\xi x} d\\xi')
[
"f(x)=∫",
{ "sub": ["-∞"], "sup": ["∞"] },
{ "hat": ["f"] },
"(ξ)e",
{ "sup": ["2πiξx"] },
"dξ"
]
Notes:
- The top-level MathSON object is always an array. Arrays represent snippets of math known as "blocks". Arrays contain strings which represent math symbols, and objects which represent "commands" i.e. complex math notation like fractions and paren groups.
- Command objects' keys will usually be letters-only (not even _ allowed), in which case the value must be a math block (an array of strings and objects, as described above; in the future we may allow arrays of arrays to support LaTeX's cases and matrix).
- Command objects' keys can also be any string starting with a dollar sign $, these can have any JSON value (these are "attributes" rather than "content", basically).
- What do the special inline_* keys do? They're for blocks of math that don't have a boundary or border that the cursor has to cross. At the edge of a normal block of math, like a square root or paren group, the cursor can cross between inside and outside; but at the left edge of sin, there's no inside or outside, the s, i, and n are "inline" in the containing block. At most one is ever allowed, and if present, all other keys must start with $ (i.e. be "attribute" keys, so that cursor position can make sense, see below).
- This just uses Unicode rather than TeX and friends' backslash names
  for fancy math symbols, which strikes me as both simpler (it's just
  text!) and better specced (Unicode is a mess but at least has a
  standards body, nobody likes hunting through Plain TeX, LaTeX,
  AMS-LaTeX, nonstandard MathJax commands and more for the right
  backslash name for every symbol).
  - We do still have to restrict to a subset of Unicode and ban stuff like Unicode subscripts and superscripts.
  - For ASCII-only environments, built into JSON is a Unicode escape
    sequence (e.g. \u2264 for ≤).
- Uniqueness: any given MathSON value is represented by exactly one
  JSON value. The serialization of a JSON value isn't unique, of course
  (e.g. whitespace insensitivity), but this means that deep comparison
  of JSON values tells you all you need to know about MathSON.
  For MathML, what if two trees are equivalent except for an id
  attribute? What about different lspace or minsize/maxsize
  attributes? Who knows?
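The structural rules in these notes are small enough to check mechanically. Here is a minimal sketch of such a check; isBlock and isCommand are hypothetical names, not part of any MathSON implementation, and the requirement that a command has at least one content key is my own assumption (the notes don't say whether an attributes-only command is allowed).

```javascript
// Hypothetical helpers (not from any MathSON library): they check only the
// structural rules listed in the notes, not which commands are meaningful.

// A block is an array of strings (math symbols) and command objects.
function isBlock(value) {
  return Array.isArray(value) && value.every(function (item) {
    return typeof item === 'string' || isCommand(item);
  });
}

function isCommand(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return false;
  }
  // Keys starting with "$" are attributes and may hold any JSON value.
  var contentKeys = Object.keys(value).filter(function (key) {
    return key[0] !== '$';
  });
  var inlineKeys = contentKeys.filter(function (key) {
    return /^inline_[A-Za-z]+$/.test(key);
  });
  // At most one inline_* key; if present, all other keys must be attributes.
  if (inlineKeys.length > 1) return false;
  if (inlineKeys.length === 1 && contentKeys.length !== 1) return false;
  // Assumption: a command needs at least one content key.
  if (contentKeys.length === 0) return false;
  // Content keys are letters-only (or inline_*) and must hold blocks.
  return contentKeys.every(function (key) {
    var wellNamed = /^[A-Za-z]+$/.test(key) || /^inline_[A-Za-z]+$/.test(key);
    return wellNamed && isBlock(value[key]);
  });
}
```

For instance, the quadratic formula above passes, while a bare {numer: ['1'], denom: ['2']} at the top level (no enclosing array) does not.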
// \frac{1}{2} x^{ - \frac{1}{2} }
var a = MathSON([{numer: ['1'], denom: ['2']}, 'x', {sup: ['-', {numer: ['1'], denom: ['2']}]}]);
// \frac{ 1 }{ 2 \sqrt{x} }
var b = MathSON([{numer: ['1'], denom: ['2', {sqrt: ['x']}]}]);
a.diff(b) // => [{_denom: [1, {sqrt: ['x']}]}, {delete_: ['x', {sup: ['-', {numer: ['1'],
// denom: ['2']}]}]}]
Note that just like in ottypes/text, an insert of a piece of MathSON is
represented by "itself", a retain/skip is represented by a raw number (different
from a numeral string), and a delete is represented by an object with a special
key (but it's an invertible delete because, c'mon, diffs should be invertible).
One new thing is a syntax to mutate an existing thing, using keys prefixed with
_ or insert_ or delete_:
// \frac{1}{2} + \frac{1}{2} + x_1 + x_1 + x_2
var c = MathSON([{numer: ['1'], denom: ['2']}, '+', {numer: ['1'], denom: ['2']}, '+x', {sub: ['1']},
'+x', {sub: ['1']}, '+x', {sub: ['2']}]);
// \frac{x}{y}\frac{1}{2} + \frac{x1}{y2} + x_1^2 + x^2{}_1 + x^2
// (the second-to-last one can be typed in MathQuill by typing x^2 y_1 and backspacing the y)
var d = MathSON([{numer: ['x'], denom: ['y']}, {numer: ['1'], denom: ['2']}, '+', {numer: ['x1'], denom: ['y2']},
'+x', {sub: ['1'], sup: ['2']}, '+x', {sup: ['2']}, {sub: ['1']}, '+x', {sup: ['2']}]);
c.diff(d) // => [{numer: ['x'], denom: ['y']}, 2, {_numer: ['x'], _denom: ['y']}, 2, {insert_sup: ['2']},
// 2, {sup: ['2']}, 3, {delete_sub: ['2'], insert_sup: ['2']}]
(Alternative: ottypes/text is deliberately noninvertible, we could do the same to slightly simplify our syntax:
a.diff(b) // => [{_denom: [1, {sqrt: ['x']}]}, {_delete_: 2}]
c.diff(d) // => [{numer: ['x'], denom: ['y']}, 2, {_numer: ['x'], _denom: ['y']}, 2,
// {insert_sup: ['2']}, 2, {sup: ['2']}, 3, {_delete_: 'sub', insert_sup: ['2']}]
)
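To make the (invertible) diff syntax concrete, here is a rough sketch of applying such a diff to a block. Everything in it is an assumption for illustration: applyDiff, mutate, flatten and unflatten are hypothetical names, a trailing retain is assumed to be implicit (as in ottypes/text), and inline_* blocks are treated as a single item for brevity even though they really span several indices.

```javascript
// Split a block into one entry per cursor index: single symbols and commands.
function flatten(block) {
  var items = [];
  block.forEach(function (item) {
    if (typeof item === 'string') items.push.apply(items, Array.from(item));
    else items.push(item);
  });
  return items;
}

// Merge adjacent symbols back into strings.
function unflatten(items) {
  var block = [];
  items.forEach(function (item) {
    var last = block.length - 1;
    if (typeof item === 'string' && typeof block[last] === 'string') block[last] += item;
    else block.push(item);
  });
  return block;
}

// A mutation op has a key like _denom, insert_sup or delete_sub.
function isMutation(op) {
  return Object.keys(op).some(function (key) {
    return /^(_|insert_|delete_)./.test(key);
  });
}

function mutate(command, op) {
  var result = Object.assign({}, command);
  Object.keys(op).forEach(function (key) {
    if (key.indexOf('insert_') === 0) result[key.slice(7)] = op[key];
    else if (key.indexOf('delete_') === 0) delete result[key.slice(7)];
    else result[key.slice(1)] = applyDiff(command[key.slice(1)], op[key]); // _key: nested diff
  });
  return result;
}

function applyDiff(block, diff) {
  var input = flatten(block), output = [], pos = 0;
  diff.forEach(function (op) {
    if (typeof op === 'number') {              // retain/skip
      output = output.concat(input.slice(pos, pos + op));
      pos += op;
    } else if (typeof op === 'string') {       // insert symbols
      output = output.concat(Array.from(op));
    } else if ('delete_' in op) {              // delete: skip what was removed
      pos += flatten(op.delete_).length;
    } else if (isMutation(op)) {               // mutate the command at pos
      output.push(mutate(input[pos], op));
      pos += 1;
    } else {                                   // insert a command
      output.push(op);
    }
  });
  return unflatten(output.concat(input.slice(pos))); // implicit trailing retain
}
```

Under those assumptions, applying the a.diff(b) ops above to a's ops yields b's ops, and likewise for c and d.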
A cursor position is just a sequence of indices and keys (typically alternating,
but cases and matrix may change that), always starting and ending with an index.
For example, consider:
[{numer: ['1'], denom: ['2']}, 'x', {sup: ['-', {numer: ['1'], denom: ['2']}]}]
To get to the cursor position, we start in the root block, go to its 2nd item
(0-indexed) which is the superscript, go into its sup block, go to its 1st item
which is the fraction, go into its denominator, and go to slice index 1 (slicing
from index 0 would slice from before the 2). In JavaScript this could be
mathObj[2].sup[1].denom[1]; for simplicity, in MathSON this is represented by
the string '2.sup.1.denom.1'.
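Since content keys are letters-only, a dotted cursor string can be split back into indices and keys without ambiguity. A minimal sketch follows; parseCursor is a hypothetical helper name, not an API of any MathSON implementation.

```javascript
// Hypothetical helper: turn '2.sup.1.denom.1' into [2, 'sup', 1, 'denom', 1].
function parseCursor(position) {
  var steps = position.split('.').map(function (part) {
    // Digit-only parts are indices; anything else is a command key.
    return /^\d+$/.test(part) ? Number(part) : part;
  });
  var first = steps[0];
  var last = steps[steps.length - 1];
  // A cursor position always starts and ends with an index.
  if (typeof first !== 'number' || typeof last !== 'number') {
    throw new Error('cursor position must start and end with an index: ' + position);
  }
  return steps;
}

console.log(parseCursor('2.sup.1.denom.1'));
```

Going the other way is just steps.join('.').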
Note that these indices aren't quite array indices, since strings can span a range
of indices. Consider:
['ax', {sup: ['2']}, '+by', {sub: ['2']}]
The cursor is in the middle of the string '+by', which is the 2nd item in the
array, but in MathSON there are cursor positions between adjacent symbols, so the
cursor is at index 4.
There is one special case, inline_* blocks. Whereas normal commands only count
for one index increment, inline_* blocks are like strings, they can span a range
of indices. Consider:
[{numer: [{inline_op: ['sin']}, 'x+', {inline_op: ['cos']}, 'x'],
denom: ['x']}]
In this case, the cursor position is '0.numer.7', there isn't a step in the
cursor position where we go "into" the cos. This makes sense if you consider what
happens between the cos and the x: there is no going "into" or coming "out of"
the cos, from the cursor's perspective the c, o, and s are at the same
"level" as the x.
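The index counting in the last two examples can be sketched in a few lines. The names span, blockSpan and locate are hypothetical, and counting string length by code point (via Array.from) is an assumption, not something this draft has settled.

```javascript
// How many cursor index increments one array item occupies.
function span(item) {
  if (typeof item === 'string') return Array.from(item).length; // one per symbol
  var inlineKey = Object.keys(item).find(function (key) {
    return key.indexOf('inline_') === 0;
  });
  // inline_* blocks span their contents; any other command counts once.
  return inlineKey ? blockSpan(item[inlineKey]) : 1;
}

function blockSpan(block) {
  return block.reduce(function (total, item) { return total + span(item); }, 0);
}

// Which array item does cursor index `index` fall in, and how far into it?
function locate(block, index) {
  var start = 0;
  for (var i = 0; i < block.length; i++) {
    var end = start + span(block[i]);
    if (index < end) return { item: i, offset: index - start };
    start = end;
  }
  return { item: block.length, offset: 0 }; // at the end of the block
}

var numer = [{inline_op: ['sin']}, 'x+', {inline_op: ['cos']}, 'x'];
console.log(locate(['ax', {sup: ['2']}, '+by', {sub: ['2']}], 4)); // { item: 2, offset: 1 }
console.log(locate(numer, 7));                                     // { item: 2, offset: 2 }
```

So index 4 lands one symbol into '+by', and index 7 lands two symbols into the cos without any extra path step.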
This is also why inline_* commands may only have "attributes" but no "content" child
blocks. If the cos block had a child: ['y'], what would be the cursor position
of a cursor next to the y? The c is index 5, the o is index 6, the s is
index 7, but what index is the command with a .child?
Nothing about diffs or cursor positions is math-specific. We could use this for rich text:
['This sentence has both ', {inline_text: ['bold'], $bold: true}, ' and ',
{inline_text: ['italic'], $italic: true}, ' words.']
and the diff and cursor position definitions (and hopefully, implementations) would work equally well.
I'm not sure what to call that—MathSON Level 0, Base MathSON, Core MathSON, DocSON, EditSON, EdSON—but I think it's very important. MathQuill's edit tree and associated cursor and selection model was originally designed by Jeanine and then haphazardly evolved by me basically just to generalize the cases we thought of at the time (fractions, square roots, and paren groups I guess—we also had supsubs but the tree model already didn't generalize well to them—hence the double-layered tree where blocks have a variable number of commands, each with some fixed number of blocks).
We didn't, couldn't, and can't think about all the other math notation supported by TeX and friends that we want to eventually support. There will be continuing work to add commands to MathQuill, and that needs to be possible without having to change the underlying tree and cursor model that everything else relies on. In fact, a safe tree and cursor model opens up entirely new API possibilities, since the lack of safety is a key reason that MathQuill's tree and cursor are super hidden away from the API ([For MathQuill, this is more than just a notation.](#for-mathquill-this-is-more-than-just-a-notation)).
By the by, this is why inline_* needs to be a block (array) and not just a
string, even though the only immediate use-case is operator names like sin
whose contents are only ever strings. MathSON Level 0 shouldn't know that those
contents are only ever strings, it only knows about cursor position semantics.
And, I can totally imagine use cases that aren't just strings, like exotic
sup/sub, or like if in the rich text example above, a bold region of text had
some math in it:
{$bold: true, inline_text: ['7⋅10', {sup: ['2']}, ' weight bold']}
Separate from MathSON Level 0, there of course needs to be a spec for MathSON
Level 1 listing the kinds of commands accepted, {numer, denom} for fractions,
{$left, group, $right} for paren/bracket/brace groups, {sup, sub}, etc.
First of all, syntax matters. Syntax is a UI, and shapes every interaction that people have with something.
Secondly, syntax isn't even that big a part of MathSON Level 0, as described. I've talked enough about semantics that it's more like XML + DOM (including DOM Ranges, kinda analogous to cursor positions).
Thirdly, by relegating e.g. Unicode escaping and most well-formedness concerns (matching braces etc) to the "lower level" JSON spec, JSON + MathSON Level 0 are better organized than XML which deals with all of that in one monolithic spec.
Finally, you know what's crazy? Even with all that, JSON + MathSON Level 0 combined is still simpler than XML alone. There are no intricate whitespace semantics, no Text vs CharacterData vs Comments vs Processing Instructions, no custom character entity references, no self-closing tags. Hell, the only consideration we have to make that XML doesn't (that isn't because XML is missing a feature we need), as far as I can think of, is that JSON inherited JS's UTF-16 surrogate pairs for "astral plane" Unicode characters, and I dunno how our indices should treat those.
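To see why the surrogate pair question matters for indices, here is a small illustration in plain JavaScript (nothing MathSON-specific, and the example string is made up): the two obvious ways of counting symbols disagree as soon as a string contains an astral-plane character.

```javascript
// Double-struck A (U+1D538) lies outside the Basic Multilingual Plane, so
// JavaScript strings (and JSON's \u escapes) store it as a surrogate pair.
var symbols = 'x\u2208' + String.fromCodePoint(0x1D538);

var codeUnits = symbols.length;              // counts UTF-16 code units
var codePoints = Array.from(symbols).length; // counts Unicode code points

console.log(codeUnits, codePoints); // 4 3
```

Whichever count the cursor indices end up using, a cursor position should presumably never land between the two halves of a pair.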
(See also "Wait so, what do you have against XML?")
- should commands have a type? Seems unnecessary to me
- full words (numerator, subscript) or abbreviations (numer, sub)?
- should the format be even more minimal? Currently arrays are required in
  more places than are strictly necessary for Level 0 to be unambiguous, for
  example one-half (1/2) is [{numer: ['1'], denom: ['2']}] when it could be
  {numer: '1', denom: '2'} instead. I prefer arrays because I think it makes it
  clearer why the cursor position rightward of the 2 is 0.denom.1, for
  instance, whereas without the outer array making the root block explicit it
  seems like it should just be denom.1 or something
- should there be a "noncanonical" variant where prohibited Unicode characters
  like subscript and superscript characters are allowed, and a canonicalization
  that'll convert them into "proper" {sub: ...} objects? (Folding them into
  nearby ones as necessary)
  - what about the goddamned Mathematical Alphanumeric Symbols? Bold, italic, serif/sans-serif/monospace clearly need to be canonicalized as a font style thing, but what about calligraphic, fraktur, and double-struck? Do we have to use a different font? MathQuill doesn't; then again, MathQuill's font, Symbola, doesn't support that full range, only the subset that's actually in the Letterlike Symbols block.
  - the "noncanonical" variant could also feature unmerged consecutive strings
    (i.e. canonicalize(['ab', 'cd']) => ['abcd'])
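The string-merging half of that canonicalization is easy to sketch. This is a hypothetical canonicalize that only merges consecutive strings, recursing into content blocks and leaving $ attributes alone; it does nothing about Unicode subscripts, superscripts or Mathematical Alphanumeric Symbols, which are still open questions.

```javascript
// Hypothetical canonicalize: merges adjacent strings at every nesting level.
function canonicalize(block) {
  var result = [];
  block.forEach(function (item) {
    if (typeof item === 'string') {
      var last = result.length - 1;
      if (typeof result[last] === 'string') result[last] += item; // merge with previous string
      else result.push(item);
    } else {
      var command = {};
      Object.keys(item).forEach(function (key) {
        // "$" attributes are arbitrary JSON; every other key holds a child block.
        command[key] = key[0] === '$' ? item[key] : canonicalize(item[key]);
      });
      result.push(command);
    }
  });
  return result;
}

console.log(canonicalize(['ab', 'cd'])); // [ 'abcd' ]
```

Because a canonical value is unique, two noncanonical values could then be compared by canonicalizing both and doing a deep comparison.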
This is a way of life.
Really though, I'm so excited about this as an API to manipulate MathQuill's tree structure, even by internal code. MathQuill's internal tree manipulation API is so prone to becoming ill-formed if you sneeze at it that there are 750 lines of 89 tests for paren typing behavior, to make sure that the tree and cursor don't become ill-formed in the course of the manipulation in all the different cases. There are intrinsically a lot of cases, don't get me wrong... but for any given case, there are 4 or 8 tests checking the same paren typing behavior in similar tree shapes. That's not cool.
One major source of bugs in particular has been that the cursor position
is represented by pointers to nodes in the tree, and that can easily
become ill-formed due to simple modifications to the tree. (#429 is an
example of this class of bugs that was fixed not that long ago.) This is
actually kind of a blocker for exposing the tree and cursor to manipulation
by external API calls: how do we ensure well-formedness without the API
feeling like moving piles of rice around with tweezers (like if all you had
was cursor.moveLeft() and cursor.moveRight() or something)? Well, how
come flat text fields don't have this problem? The answer is in the
data model.
In flat text fields, a cursor position is an index, so even if it is ill-formed (i.e. out of bounds), the right way to normalize it is obvious, just clamp it to the nearest bound. By contrast, in MathQuill's current representation where the cursor position is pointers to tree nodes, if the cursor's parent is a detached node, there's no obvious way to normalize that into where the cursor "should" be. However, if the cursor position is a path through the ancestors like proposed here, normalization is obvious, put the cursor in the deepest ancestor that still exists.
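Both normalizations fit in a few lines. The sketch below is illustrative only: normalizeTextCursor and normalizeCursor are hypothetical names, steps are assumed to strictly alternate index and key, inline_* blocks are not expanded, and the choice to clamp the index inside the deepest surviving block is my assumption about where the cursor "should" end up.

```javascript
// Flat text field: an out-of-bounds cursor is clamped to the nearest bound.
function normalizeTextCursor(text, index) {
  return Math.max(0, Math.min(index, text.length));
}

// One entry per cursor index: single symbols and commands.
// (Simplification: inline_* blocks are not expanded here.)
function flatten(block) {
  var items = [];
  block.forEach(function (item) {
    if (typeof item === 'string') items.push.apply(items, Array.from(item));
    else items.push(item);
  });
  return items;
}

// steps is a parsed cursor path such as [2, 'sup', 1, 'denom', 1].
function normalizeCursor(root, steps) {
  var block = root, result = [];
  for (var i = 0; i < steps.length; i += 2) {
    var items = flatten(block);
    var index = Math.max(0, Math.min(steps[i], items.length)); // clamp like a text field
    result.push(index);
    var key = steps[i + 1];
    var child = items[index];
    // Stop at the deepest ancestor that still exists.
    var canDescend = key !== undefined && index === steps[i] &&
      typeof child === 'object' && Array.isArray(child[key]);
    if (!canDescend) break;
    result.push(key);
    block = child[key];
  }
  return result;
}

var before = [{numer: ['1'], denom: ['2']}, 'x', {sup: ['-', {numer: ['1'], denom: ['2']}]}];
var after = [{numer: ['1'], denom: ['2', {sqrt: ['x']}]}];
console.log(normalizeCursor(before, [2, 'sup', 1, 'denom', 1]).join('.')); // 2.sup.1.denom.1
console.log(normalizeCursor(after, [2, 'sup', 1, 'denom', 1]).join('.'));  // 1
```

In the second call the superscript no longer exists, so the cursor falls back to the end of the root block instead of pointing at a detached node.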
And externally, of course, the LaTeX imported and exported by MathQuill isn't meant to be human-edited (the point of MathQuill is to edit math visually, which is more human-readable than a text format could ever be), so LaTeX compromising machine-readability for human-readability doesn't really serve MathQuill well. MathQuill needs a format where the overriding concern is being dead simple for machines to read, possibly at the expense of human-readability.
"...like they do in both MathML and KaTeX's AST?"
Because that lets you do stuff like {\frac{ \frac{1}{2} }{3} + 1 + 2}^2:
What the hell is that? How do you edit that? How do you show whether the cursor is inside or outside the base of the super/subscript? There's no analogue when writing math on a whiteboard.
Note that something like it is still possible with e.g.:
{inline_base: [{numer: ['1'], denom: ['2']}, '+1', {sup: ['2']}]}
and a special relationship between the containing thingy and the sup node.
Okay so, (Presentation) MathML is supposed to, more or less, represent the
same data (structure and content) as the relevant AMS-LaTeX subset, but more
machine-readable and amenable to the horrifying existing ecosystem of
XML tools, right? What are the other reasons people think everything should
be in XML? Tim B-L talked about "the fruits of well-formed systems" but
like, TeX and friends don't suffer from the rapidly evolving incompatibilities
that HTML had, nor the ill-formedness problems inherent to SGML descendants
like <b><i>LOL</b></i>, it's not like influential TeX tools are forgiving of
unmatched braces and handling them in undocumented, ill-understood ways.
Okay so great, MathML lets you leverage existing XML tools for parsing and stuff, maximizing your synergy for win-win solutions, etc. Which is great if you necessarily need a format that makes parsing and stuff hard. But wouldn't it be even better if you could use a format so trivially simple that parsing and stuff is easier to do by hand than it would be to configure and use giant heavyweight XML parsing tools?
Even beyond the whole XML thing, MathML is unnecessarily complex, encompassing
aspects of semantics or presentation that fundamentally are neither structure
nor content. Especially having to specify <mo> vs <mi> vs <mn>, whereas
in LaTeX that's implicit in the normal case, yet no one worries that LaTeX
isn't expressive enough compared to MathML.
This is even more apparent contrasting with MathSpeak. <mo> vs <mi> vs
<mn>? Not even representable, should belong to Content MathML or OpenMath.
Attributes like form or lspace or stretchy? Ignored, belongs solidly in
the domain of visual display styling. <mrow>? Is that meaningful to anyone?
LaTeX lets you put braces {}
anywhere, which leads to shitty situations
with super/subscripts that aren't representable in MathSpeak nor MathSON.
Well, I could attempt thoughtful, balanced reasoning of why XML's tradeoffs are a poor fit for math, but if I can't do better than this HN commenter, is it really worth it? Instead I shall present a more visceral argument.
"Is parsing XML really that hard and heavyweight?" Look, parsing XML isn't hard like parsing HTML is hard, but just look at this JSON:
[
  {
    numer: ['1'],
    denom: ['2']
  },
  'x',
  {
    sup: [
      '-',
      {
        numer: ['1'],
        denom: ['2']
      }
    ]
  }
]
In MathML, that'd be, what:
<math>
  <mfrac>
    <mn>1</mn>
    <mn>2</mn>
  </mfrac>
  <msup>
    <mi>x</mi>
    <mrow>
      <mo>-</mo>
      <mfrac>
        <mn>1</mn>
        <mn>2</mn>
      </mfrac>
    </mrow>
  </msup>
</math>
Don't worry if that's not valid MathML, this is about XML. My point is, here's how to get to the 2 in the exponent in MathSON:
mathObj[2].sup[1].denom
(That's JS but it'd be similarly straightforward in Python or Ruby or whatever.) By comparison, in MathML:
mathTree.children[1].children[1].children[1].children[1]
That gets you the <mn>, by the way, not the Text node containing the string
'2'. There's a difference. Now, which would you rather deal with? These generic
tree node things, or plain old dictionaries and arrays?
This gimmick:
Okay.
was blatantly stolen from @jneen's literary masterpiece.
Not sure if this factors into your research, but MathJSON seems to be a decent standard for Math represented in JSON. I haven't used it, but there's also a corresponding library called MathLive for rendering Math in web components. Examples on their website: https://mathlive.io/examples/. Again, not sure if this relates to your research.