Some very popular Markdown implementations (pandoc, PHP Markdown Extra, and GitHub-Flavored Markdown, to name a few) support a table syntax along these lines:

    Head cell | Another
    ----------|--------
    Cell text | Another cell
    More cells| below...
    Inlines   | __allowed__

which renders as:
Head cell | Another |
---|---|
Cell text | Another cell |
More cells | below... |
Inlines | allowed |
I am unaware of any great syntax definition for the above table syntax, and while writing support for tables in the Dart "markdown" package, I thought I'd write up a syntax definition, in the spirit of the CommonMark spec.
This spec is more useful to an author of a Markdown parser than it is to someone trying to just write some Markdown (in the same way that the CommonMark spec is more useful to Markdown parser authors). For a more terse explanation of how to write these Markdown tables, GitHub's text is very approachable.
The spec below is not 100% complete. In particular:
- Need to spell out how whitespace around `|` characters is trimmed.
- Need to spell out how tables must be separated from other block elements by blank lines.
- Need to see where exactly the rules below are different from the rules of GFM, PHP Markdown Extra, and pandoc.
A table consists of a table row, followed by a table head divider, followed by zero or more table rows. The result is a table, with a table head consisting of the parsed results of the first table row, and a table body consisting of the parsed results of all of the table rows that follow the table head divider. Cell alignment can be declared in the table head divider.
A table row consists of a line of text, containing at least one non-whitespace character, with no more than 3 spaces indentation. The line of text must be one that, were it not followed by the table head divider, would be interpreted as part of a paragraph: it cannot be interpretable as a code fence, ATX header, block quote, horizontal rule, list item, or HTML block.
The contents of a table row are the results of a three-step process:
- Any leading and trailing `|` characters at the beginning of the line and end of the line are removed. They are allowed for source readability.
- The line is then parsed as Markdown inline content.
- Any textual content is then scanned for `|` characters. Every `|` character results in a table cell boundary.
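
The three steps above can be sketched roughly in Python. This is a simplified illustration, not a full implementation: the hypothetical `split_row` helper treats backtick code spans as the only inline construct (a real parser would run a complete inline parse, so links and other inlines would be protected too), it strips at most one wrapping `|` on each side, and it trims whitespace around each cell, which this spec does not yet define.

```python
def split_row(line: str) -> list:
    """Split one table row into cell strings (simplified sketch)."""
    text = line.strip()

    # Step 1: drop one leading and one trailing wrapping pipe, if present.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]

    # Steps 2 and 3, approximated: walk the text and split on pipes that
    # are not inside a backtick code span. Only code spans are recognized
    # here; that is an assumption made to keep the sketch short.
    cells = []
    current = []
    in_code = False
    for ch in text:
        if ch == "`":
            in_code = not in_code
            current.append(ch)
        elif ch == "|" and not in_code:
            cells.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    cells.append("".join(current).strip())
    return cells
```

For example, `split_row("| in | many | cells |")` gives `["in", "many", "cells"]`, and a `|` inside a code span stays within its cell.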
A table head divider consists of an optional opening `|` character, followed by a sequence of at least two table column markers, each separated by a `|` character, followed by an optional closing `|` character, with no more than 3 spaces indentation and any number of trailing spaces. The width and number of table column markers have no effect on the contents of the table (except in the declarations of alignment).
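A recognizer for the table head divider might look something like the Python sketch below. The name `is_table_head_divider` is made up for illustration. Two choices here are assumptions rather than settled rules: the prose asks for at least two column markers, while the single-column example later in this post accepts a lone marker wrapped in `|` characters, so the sketch follows the example and requires one or more markers plus at least one `|`; and spaces around each marker are tolerated, since whitespace trimming is not yet spelled out.

```python
import re

# One column marker: optional ':', one or more '-', optional ':'.
_MARKER = re.compile(r":?-+:?")


def is_table_head_divider(line: str) -> bool:
    """Return True if `line` looks like a table head divider (sketch)."""
    stripped = line.rstrip(" ")  # any number of trailing spaces is allowed
    indent = len(stripped) - len(stripped.lstrip(" "))
    if indent > 3:
        return False
    # At least one pipe is needed to tell the line apart from a setext
    # header underline such as '---'.
    if "|" not in stripped:
        return False

    body = stripped.strip(" ")
    if body.startswith("|"):
        body = body[1:]
    if body.endswith("|"):
        body = body[:-1]

    # Assumption: spaces around each marker are ignored.
    markers = [m.strip(" ") for m in body.split("|")]
    return all(_MARKER.fullmatch(m) is not None for m in markers)
```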
A table column marker consists of an optional opening `:` character, followed by a sequence of `-` characters, followed by an optional closing `:` character. The alignment of the contents of a column is defined by the presence of the opening and closing `:` characters:
- No opening or closing `:` characters indicates no declared alignment.
- An opening `:` character without a closing `:` character indicates a declared "left" alignment.
- A closing `:` character without an opening `:` character indicates a declared "right" alignment.
- Both an opening and a closing `:` character indicates a declared "center" alignment.
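
The four alignment cases map directly onto a small function. Here is a minimal Python sketch; `column_alignment` is a hypothetical name, and returning `None` for "no declared alignment" is just one convenient representation.

```python
import re
from typing import Optional

# Optional ':', one or more '-', optional ':'.
_MARKER = re.compile(r":?-+:?")


def column_alignment(marker: str) -> Optional[str]:
    """Return 'left', 'right', 'center', or None for a column marker."""
    if _MARKER.fullmatch(marker) is None:
        raise ValueError(f"not a table column marker: {marker!r}")
    opening = marker.startswith(":")
    closing = marker.endswith(":")
    if opening and closing:
        return "center"
    if opening:
        return "left"
    if closing:
        return "right"
    return None  # no declared alignment
```

The number of `-` characters plays no part in the result, which matches the statement that marker width has no effect on the table.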
Here is a simple example:

    foo  | bar
    -----|----
    some | text
    in   | cells

    <table>
    <thead>
    <tr><th>foo</th><th>bar</th></tr>
    </thead>
    <tbody>
    <tr><td>some</td><td>text</td></tr>
    <tr><td>in</td><td>cells</td></tr>
    </tbody>
    </table>

Wrapping `|` characters do not change the results:

    | foo  | bar   |
    |------|-------|
    | some | text  |

    <table>
    <thead>
    <tr><th>foo</th><th>bar</th></tr>
    </thead>
    <tbody>
    <tr><td>some</td><td>text</td></tr>
    </tbody>
    </table>

The number of cells in each row can be variable:

    | foo  |
    |------|------|
    | some | text |
    | in | many | cells |

    <table>
    <thead>
    <tr><th>foo</th></tr>
    </thead>
    <tbody>
    <tr><td>some</td><td>text</td></tr>
    <tr><td>in</td><td>many</td><td>cells</td></tr>
    </tbody>
    </table>

Each row is parsed as inline Markdown, and cell divisions can occur only in textual content:

    | `foo`       |
    |-------------|-------------------|
    | `foo | bar` | [link](weird|url) |

    <table>
    <thead>
    <tr><th><code>foo</code></th></tr>
    </thead>
    <tbody>
    <tr><td><code>foo | bar</code></td><td><a href="weird|url">link</a></td></tr>
    </tbody>
    </table>

Table bodies are not required:

    | foo  | bar   |
    |------|-------|

    <table>
    <thead>
    <tr><th>foo</th><th>bar</th></tr>
    </thead>
    <tbody>
    </tbody>
    </table>

Single-columned tables are allowed, but the table head divider still must contain at least one `|`, to distinguish it from a setext header.

    foo
    |-----|

    <table>
    <thead>
    <tr><th>foo</th></tr>
    </thead>
    <tbody>
    </tbody>
    </table>

The `|` does not have to wrap the divider:

    foo
    -|-

    <table>
    <thead>
    <tr><th>foo</th></tr>
    </thead>
    <tbody>
    </tbody>
    </table>

Without any `|`, the second line is a setext header underline instead:

    foo
    ---

    <h2>foo</h2>

Alignment can be specified in the table head divider:

    foo | bar | baz| quux
    :---|:---:|---:|-----
    left | center | right | unspecified

    <table>
    <thead>
    <tr><th style="text-align: left;">foo</th><th style="text-align: center;">bar</th><th style="text-align: right;">baz</th><th>quux</th></tr>
    </thead>
    <tbody>
    <tr><td style="text-align: left;">left</td><td style="text-align: center;">center</td><td style="text-align: right;">right</td><td>unspecified</td></tr>
    </tbody>
    </table>

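
To show how the pieces fit the HTML in the examples above, here is a small Python sketch of a renderer. It is only an illustration under a few assumptions: `render_table` is a hypothetical name, cells arrive as plain strings and are HTML-escaped rather than parsed as inline Markdown, and alignments are given as `"left"`, `"center"`, `"right"`, or `None` per column.

```python
from html import escape
from typing import Optional, Sequence


def render_table(
    head: Sequence[str],
    body: Sequence[Sequence[str]],
    alignments: Sequence[Optional[str]] = (),
) -> str:
    """Render a parsed table as HTML in the layout used by the examples."""

    def render_row(cells: Sequence[str], tag: str) -> str:
        parts = []
        for i, cell in enumerate(cells):
            # Columns beyond the declared alignments get no style attribute.
            align = alignments[i] if i < len(alignments) else None
            attr = f' style="text-align: {align};"' if align else ""
            # Assumption: cell text is plain text, so it is escaped as-is.
            parts.append(f"<{tag}{attr}>{escape(cell)}</{tag}>")
        return "<tr>" + "".join(parts) + "</tr>"

    lines = ["<table>", "<thead>", render_row(head, "th"), "</thead>", "<tbody>"]
    lines.extend(render_row(row, "td") for row in body)
    lines.extend(["</tbody>", "</table>"])
    return "\n".join(lines)
```

Rows are rendered with however many cells they contain, so a row with more or fewer cells than the head is passed through unchanged, as in the variable-width example.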
Has anyone ever considered an extension to allow multiline cells? I find myself always yearning for those whenever I use markdown tables. Something like this:
which would then produce something like this: