Putting up a bounty didn't produce a solution to this, so I've spent a while figuring out the best one I could. I don't have any source confirming that this is the correct way to do it, so use at your own risk.
To convert any string into a SQLite identifier:
- Ensure the string can be encoded as UTF-8.
- Ensure the string does not include any NUL characters.
- Replace all
"
with""
. - Wrap the entire thing in double quotes.
My implementation:
import codecs
def quote_identifier(s, errors="strict"):
encodable = s.encode("utf-8", errors).decode("utf-8")
nul_index = encodable.find("\x00")
if nul_index >= 0:
error = UnicodeEncodeError("utf-8", encodable, nul_index, nul_index + 1, "NUL not allowed")
error_handler = codecs.lookup_error(errors)
replacement, _ = error_handler(error)
encodable = encodable.replace("\x00", replacement)
return "\"" + encodable.replace("\"", "\"\"") + "\""
Given a string single argument, it will escape and quote it correctly or raise an exception. The second argument can be used to specify any error handler registered in the codecs
module. The built-in ones are:
'strict'
: raise an exception in case of an encoding error'replace'
: replace malformed data with a suitable replacement marker, such as'?'
or'\ufffd'
'ignore'
: ignore malformed data and continue without further notice'xmlcharrefreplace'
: replace with the appropriate XML character reference (for encoding only)'backslashreplace'
: replace with backslashed escape sequences (for encoding only) This doesn't check for reserved identifiers, so if you try to create a newSQLITE_MASTER
table it won't stop you.
import sqlite3
def test_identifier(identifier):
"Tests an identifier to ensure it's handled properly."
with sqlite3.connect(":memory:") as c:
c.execute("CREATE TABLE " + quote_identifier(identifier) + " (foo)")
assert identifier == c.execute("SELECT name FROM SQLITE_MASTER").fetchone()[0]
test_identifier("'Héllo?'\\\n\r\t\"Hello!\" -☃") # works
test_identifier("北方话") # works
test_identifier(chr(0x20000)) # works
print(quote_identifier("Fo\x00o!", "replace")) # prints "Fo?o!"
print(quote_identifier("Fo\x00o!", "ignore")) # prints "Foo!"
print(quote_identifier("Fo\x00o!")) # raises UnicodeEncodeError
print(quote_identifier(chr(0xD800))) # raises UnicodeEncodeError
- SQLite identifiers are
TEXT
, not binary.- Reference:
SQLITE_MASTER
schema in the FAQ. - Reference: Python 2 SQLite API yelling at me when I gave it bytes it couldn't decode as text.
- Reference: Python 3 SQLite API requires queries be
str
s, notbytes
.
- Reference:
- SQLite identifiers are quoted using double-quotes.
- Reference: SQL as Understood by SQLite.
- Double-quotes in SQLite identifiers are escaped as two double quotes.
- SQLite identifiers preserve case, but they are case-insensitive towards ASCII letters. It is possible to enable unicode-aware case-insensitivity.
- Source: SQLite FAQ Question #18
- SQLite stops reading queries at the NUL character, but does not have any way to escape it.
- Reference: PHP came up with their own binary encoding to handle NUL; presumably if they could just store it directly they would.
sqlite3
can handle any other unicode string as long as it can be properly encoded to UTF-8. Invalid strings could cause crashes between Python 3.0 and Python 3.1.2 or thereabouts. Python 2 accepted these invalid strings, but this is considered a bug.- Reference: Python Issue #12569
- Reference: Modules/_sqlite/cursor.c
- Reference: I tested it a bunch.