The default value for `errors`, although specified as `None` in the function signature, is `surrogate_then_replace`.
The most common and recommended values for compatibility between Python 2 and Python 3 are:
- `surrogate_then_replace`
- `surrogate_or_strict`
When to use which?
`surrogate_then_replace` should be used when the data is informational only, that is, when it is ultimately just headed to a log or displayed to the user.
`surrogate_or_strict` should be used when the data makes a difference to the computer's understanding of the world, such as with file paths or database keys.
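The trade-off behind these two values can be illustrated with Python 3's built-in codec error handlers. This is a loose analogy and not Ansible's implementation: `decode_for_display` and `decode_for_operation` are hypothetical helpers, and the standard `replace` and `surrogateescape` handlers stand in for the "never fail, may lose data" and "never lose data" policies.

```python
def decode_for_display(b_value):
    """Informational data: never raise; undecodable bytes become U+FFFD."""
    return b_value.decode("utf-8", errors="replace")


def decode_for_operation(b_value):
    """Operational data: keep undecodable bytes as surrogate code points
    so the original bytes can be recovered exactly."""
    return b_value.decode("utf-8", errors="surrogateescape")


b_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

shown = decode_for_display(b_name)
kept = decode_for_operation(b_name)

# The display form is lossy: re-encoding does not give back the original bytes.
assert shown.encode("utf-8") != b_name
# The operational form round-trips byte for byte.
assert kept.encode("utf-8", errors="surrogateescape") == b_name
```

A lossy name is acceptable in a log line, but it would point at the wrong file if it were used as a path.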
This specifies the strategy to use if a nonstring is passed. The default is `simplerepr`, which returns a string representation using either `str(obj)` or `repr(obj)`, preferring `str()`.
Other values are `empty`, which returns an empty string; `passthru`, which returns the original object; and `strict`, which raises a `TypeError` exception.
An example of using `passthru` would be when passing either a string or a file-like object for use in an HTTP POST request with `to_bytes`.
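The four strategies described above can be sketched as a small dispatch function. This is a minimal illustration written from the description only, not the library's code: `handle_nonstring` is a hypothetical name, and the fallback from `str()` to `repr()` on a `UnicodeError` is an assumption about what "preferring `str()`" means.

```python
import io


def handle_nonstring(obj, nonstring="simplerepr"):
    """Decide what to do with a value that is neither text nor bytes."""
    if isinstance(obj, (str, bytes)):
        return obj  # already a string type; nothing to decide
    if nonstring == "simplerepr":
        try:
            return str(obj)
        except UnicodeError:
            return repr(obj)  # assumed fallback when str() fails
    if nonstring == "empty":
        return ""
    if nonstring == "passthru":
        return obj
    if nonstring == "strict":
        raise TypeError("obj must be a string type")
    raise TypeError("invalid value %r for nonstring" % (nonstring,))


# passthru lets a caller accept either a string body or a file-like body.
body = io.BytesIO(b"payload")
assert handle_nonstring(body, nonstring="passthru") is body
assert handle_nonstring(42) == "42"
```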
"native" in this context is meant to indicate the default string type on
Python 2 and 3, as produced by `str`.

`to_native`, on the controller, is used for a small set of functionality:
- When converting information for use in exceptions
- When the underlying Python API expects a native string type
Typically speaking, native values should not be long lived, and should be converted to native at the borders, where they are needed. If a variable must be assigned a native value, the variable should be prefixed with `n_`, such as `n_output`.
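As a rough sketch of the exception use case and the `n_` prefix, assuming Python 3 only (where the native string type is `str`): `native_sketch` and `describe_failure` are hypothetical helpers that stand in for `to_native` and for calling code, and do not reproduce the real function.

```python
def native_sketch(value, encoding="utf-8"):
    """Python 3 only: return the native string type (str) for any input."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="surrogateescape")
    return str(value)


def describe_failure(b_stderr):
    """Build an exception from raw command output."""
    # Convert right at the border where a native string is needed,
    # and mark the short-lived variable with the n_ prefix.
    n_output = native_sketch(b_stderr)
    return RuntimeError("command failed: %s" % n_output)


error = describe_failure(b"no such file")
assert str(error) == "command failed: no such file"
```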
- Typically, almost all strings on the target should use the native string type, for the easiest integration with the underlying Python APIs. However, be careful to note the information from the `errors` section, which dictates which `errors` value to use for informational versus operational values.
"bytes" in this context refers to the data type produced by bytes
on Python 2 and Python 3. On Python 2 this is `str` and on Python 3 this is `bytes`.
Values converted to `bytes` should not be long lived. Typically, values should be converted to bytes at the borders, where they are needed. If a variable must be assigned a bytes value, the variable should be prefixed with `b_`, such as `b_path`. This includes parameters in the function signature, if a function accepts a bytes value.
Use `to_bytes` when dealing with byte-oriented APIs. This is common when dealing with file paths, or with data being passed through HTTP requests.
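A minimal sketch of this border pattern for file paths, using only the Python 3 standard library: the plain `encode`/`decode` calls stand in for `to_bytes` and `to_text`, and `list_directory` is a hypothetical helper. Bytes exist only for the duration of the OS call, and the `b_` prefix marks them.

```python
import os
import tempfile


def list_directory(path):
    """Accept a text path and return text names."""
    # Border in: convert to bytes just before the byte-oriented call.
    b_path = path.encode("utf-8", errors="surrogateescape")
    b_names = os.listdir(b_path)  # a bytes path yields bytes names
    # Border out: convert back to text for the rest of the program.
    return sorted(
        b_name.decode("utf-8", errors="surrogateescape") for b_name in b_names
    )


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "data.txt"), "w").close()
    assert list_directory(tmp) == ["data.txt"]
```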
"text" in this context is meant to indicate the type produced by the unicode
function on Python 2, and `str` on Python 3.
- When data is ingested into Ansible, values should typically be cast to text for the lifetime of that data.
- All information sent to the `Display` class, such as `display.display` or `display.vvv`, should be cast to text.
NOTE: Only on the borders where the data leaves Ansible should it be converted to bytes or native.
You are unlikely to need `to_text` in many scenarios on the target. Use it only when the API you are dealing with specifically needs text types, such as in some MySQL libraries.
I would not use "unicode" and "non-unicode" as they are used here. Only use "unicode" for the Python 2 `unicode` type. Most other places should say "text" (or "text string") or "bytes" (or "byte string"). The reason is that the term "unicode" is not very clear in most programmers' minds. They associate "unicode" with one of the encodings of Unicode (typically UTF-16 or UTF-8) rather than with the abstract idea of a string of human-readable characters. That association of "unicode" with encodings means that they think of unicode as a byte string, which it most certainly is not in Python.
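A small Python 3 illustration of that distinction (the variable names are only for the example): one text string, and two different byte strings depending on which encoding is chosen.

```python
text = "\u00e9"  # one human-readable character: e with an acute accent

b_utf8 = text.encode("utf-8")
b_utf16 = text.encode("utf-16-le")

assert len(text) == 1            # text is counted in characters
assert b_utf8 == b"\xc3\xa9"     # two bytes in UTF-8
assert b_utf16 == b"\xe9\x00"    # two different bytes in UTF-16 (little endian)

# Both byte strings decode back to the same text string.
assert b_utf8.decode("utf-8") == b_utf16.decode("utf-16-le") == text
```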