This Gist features two-way interactions between an Excel spreadsheet and a SQLite database exported from the Regularly Android app (APK available from other sources), enabling synchronization and updating of chores in the app. This writeup goes in a few different directions, so feel free to poke around:
- The usage guide is brief, but links to more details for all skill levels.
- Jump into changing the sheets for a tour of the kind of end-user code that carefully-prepared SQLModel models facilitate.
- See behind the scenes in the database model deep dive, powered by SQLModel, with special handling of quirks and gotchas.
- See the Excel and CSV data models for data model philosophy, and a few cool features:
  - A `DataclassWriter` akin to `csv.DictWriter`.
  - An XLWings-powered function returning a sequence of mappings to Excel tables for row/header title addressing.
- See experiments for some other examples.
A brief description of relevant scripts and modules follows:
- `_update_db_chores.py`: Update a Regularly database with chores from an Excel spreadsheet.
- `_update_xlsx_chores.py`: Like the above, but in reverse!
- `_update_csv_chores.py`: Compute some statistics from the database and export to CSV.
- `chores_db.py`: Regularly Android app database model and helpers.
- `chores_xlsx.py`: Excel chore model and Excel table address/value helpers.
- `chores_csv.py`: CSV chore model and a dataclass CSV writer.
See also `requirements.txt` for exact version pins that have worked for me on Python 3.11.
Ensure Python is installed. If on Windows, install Python from here rather than from the more limited Microsoft Store install. If on macOS or a UNIX-like system, install the Python Launcher and obtain different Python versions as needed from deadsnakes.
Download this Gist and run `setup.ps1` in a cross-platform PowerShell terminal window (or manually run the commands below in any terminal). If you open the downloaded Gist folder in VSCode, open a new terminal (Ctrl+`` ` ``) and run `setup.ps1` there. If on Windows, you may need to complete Task 1 in this guide to allow scripts to run. The `setup.ps1` script essentially does the following:
- Creates a virtual environment by running `py -m venv .venv`.
- Activates the virtual environment with `.venv/scripts/activate` on Windows, or `.venv/bin/activate` on UNIX-like/macOS systems.
- Installs requirements with `pip install -r requirements.txt`.
- Installs some VSCode tooling configs.
I have not provided my personal Regularly tasks database, but will eventually mock one up so that you don't have to provide your own (by installing the Android app and exporting the database via settings). This Gist does represent some good practices for general database modeling with SQLModel, so you don't necessarily need to run the code to get it. See the writeup below for more detail.
See a more in-depth guide for first-time setup instructions on a new machine, or if this is your first time using Python. Now you can run Python scripts in this Gist from an activated terminal window, for example: `python _update_db_chores.py`. If you plan to modify this code or use it as a starting point for your own development, the template from which this Gist is derived details the additional VSCode tooling available.
The Regularly Android app is no longer actively supported, but it's the only interface I've found that allows for tracking of chores on a cadence, e.g. "I want to change the sheets once a month," rather than, "I will change the sheets on the 1st of every month." It orders your tasks by urgency, and allows flexible tracking of their completion.
I want to be able to bulk add, modify, and even synchronize chores to and from this app on my phone, and thankfully it allows import/export of a SQLite database to do so. Maybe I was just looking for an excuse to learn SQLModel, but that's beside the point. What follows is an illustrative example of adding a `"Change the sheets"` task to the database, and assigning it to a tag named `"chore"` (actually `"chores"` in the full code, for legacy reasons).
SQLModel facilitates setting up an object-relational mapping (ORM) that represents the database schema in an object-oriented way in our Python code. Models in `chores_db.py` represent individual database table rows, e.g. `Task` objects are drawn from the `tasks` table using SQLModel. The mapping between models and the tables from which they are drawn is as follows:
| Table      | Model         |
| ---------- | ------------- |
| `tasks`    | `Task`        |
| `tags`     | `Tag`         |
| `tasktags` | `TaskTagLink` |
| `log`      | `Done`        |
The `tasks` table has tasks tracked by the app. The `tags` table lists tags, e.g. some tasks have the tag named `"chore"` attached to them. The `tasktags` table facilitates this tagging mechanism. A `TaskTagLink` represents one such link in `tasktags`.
SQLModel (and SQLAlchemy) allows representing the `tasktags` link between `Task` and `Tag` instances by adding a computed `tags` attribute directly on `Task` instances. If that was a mouthful, it basically means that we don't have to manually cross-reference a third table in order to find connections between `Task`s and `Tag`s; we can get those attributes directly on instances with `Relationship`s. These are `Relationship` attributes, and `Tag` instances also have a `tasks` relationship attribute. The relationships defined in `chores_db.py` facilitate our goal of assigning a tag named `"chore"` to a task named `"Change the sheets"`:
```python
from sqlmodel import Session

from chores_db import Tag, Task, engine

with Session(engine) as session:
    # Create new instances
    task = Task(name="Change the sheets", period=30)
    tag = Tag(name="chore", position=0)  # `position` for UI only
    # Add the tag to the task
    task.tags.append(tag)
    # Do the database transactions, print some things
    session.add(task)
    latest = tag.tasks[-1]
    print(f"{latest.name = }, {latest.id = } (before commit)")
    session.commit()
    print(f"{latest.name = }, {latest.id = } (after commit)")
```
which produces the following:
```
latest.name = 'Change the sheets', latest.id = None (before commit)
latest.name = 'Change the sheets', latest.id = 87 (after commit)
```
where `latest` is the latest task that `tag` was attached to. Note that the `id` field is only populated after committing to the database. Now, we added `tag` to `task.tags`, not the other way around. So, how does `tag.tasks` know that this happened? This is the magic of the SQLModel `Relationship` used when setting up the models (derived from SQLAlchemy). When set up properly, it is equally valid to append to the `task.tags` or the `tag.tasks` attributes, and changes in one will back-populate to the other.
In the previous example, `tag` and `task` are created, and have no existing relationships to other tags or tasks. But we could have instead searched for a task named `"Change the sheets"` in the database:
```python
from sqlmodel import Session, select

from chores_db import Tag, Task, engine

with Session(engine) as session:
    # Search for or create the task
    name = "Change the sheets"
    query = select(Task).where(Task.name == name)
    task = session.exec(query).first() or Task(name=name, period=30)
    # Search for or create the tag
    name = "chore"
    query = select(Tag).where(Tag.name == name)
    tag = session.exec(query).first() or Tag(name=name, position=0)
    # ...the example continues as before
```
Here, we construct a `query` for an existing task named `"Change the sheets"`. When we execute that `query` in the `session`, we ask for the `first` result, which will either give us a `Task` instance or `None`. By using the Python construct `thing = this or that`, we know that `thing` becomes `this` unless `this` is falsy (here, `None`), in which case `thing` will become `that`. We do the same for the `tag`. We can enable `engine.echo` (not shown) to see the equivalent SQL for what the task query (effectively) is:
```sql
SELECT * FROM tasks WHERE tasks.name = 'Change the sheets'
```
where the binding of these attributes to a Python `Task` instance is handled by SQLModel/SQLAlchemy.
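The `thing = this or that` fallback used above is plain Python and worth a closer look, because `or` falls back on *any* falsy value, not just `None`. A small stdlib-only sketch (the dictionary stands in for a real `Task` instance):

```python
# `.first()` returns a model instance or None; `or` supplies a fallback.
first_result = None  # pretend the query found nothing
task = first_result or {"name": "Change the sheets", "period": 30}
print(task["name"])  # → Change the sheets

# Caveat: `or` falls back on ANY falsy value, not just None.
position = 0
chosen = position or 99  # 0 is falsy, so the fallback wins
print(chosen)  # → 99
```

This is safe for `.first()` results, which are either an instance or `None`, but beware the pattern with values like `0` or `""` where falsy does not mean missing.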
We now know how to find or create new tasks and tags, and attach tags to tasks. We used relationship attributes to facilitate that. The scripts in this Gist essentially do just this, but for many such tasks in an Excel spreadsheet, and assign "owners" as tags, in addition to a chores tag. Most of the hard work is done in `chores_db.py` so that our scripts read nicely, more like Python code than like SQL queries and explicit database transactions. The next thing we'll look at is how `chores_db.py` is structured.
The aforementioned ORM is defined in `chores_db.py` and used everywhere. We don't have to write it all by hand; we can in fact generate the basic skeleton of it with `sqlacodegen`, but we will have to do some work to hook up the fancy relationship attributes in the right fashion, occasionally dropping down from the higher-level SQLModel abstraction into the lower-level SQLAlchemy abstraction.
Of course, both abstractions are preferable to hand-writing query strings and hitting the database directly, at least at a certain scale of interaction with the database. In fact, in the section below detailing the Excel and CSV data models, it is shown that a similar approach may be taken for data represented in those formats. It reduces the mismatch between formats when trying to do such forbidden things as updating a database from Excel spreadsheets (gasp!).
After a discussion about automatic model generation, the anatomy of `chores_db.py` is detailed:

- Initialization
- Reusable field definitions
- Database models, relationships, and computed fields
- Task and tag helpers

The code snippets throughout are shown in a modified and reduced form for illustrative purposes. The anatomy here may differ from the exact implementation in `chores_db.py`, but suffices to outline the important concepts for defining your own database models in such a fashion.
It can be tedious to pore over the database schema (using e.g. SQLTools or DB Browser for SQLite) and code up the SQLModel classes by hand for all the tables in the database. However, a utility called `sqlacodegen` comes to the rescue. We can point it to our database from the command line, tell it the flavor of models we want generated, and it will generate the models for us. The models in `chores_db.py` were initially generated from the following command (in PowerShell, `Out-File` sometimes behaves better than the similar `>` operation):
```powershell
sqlacodegen --generator 'sqlmodels' 'sqlite:///regularly.db' | Out-File 'models.py'
```
This produces `models.py` (which has been heavily modified and renamed to `chores_db.py`), but even at this point, it should allow hitting the database, just without the fancy relationships used in the changing the sheets example. It mostly works, but since SQLAlchemy 2 and Pydantic 2 have dropped, `sqlacodegen` hasn't quite caught up yet, so we pin `sqlacodegen` to a particular commit containing a hacky fix that gets us mostly there. We just have to delete the generated `AndroidMetadata` model (and re-create it properly later), but it only represents the Android system locale anyways, so we don't really need to worry about it.
Imagine a Spongebob Squarepants "one eternity later" screen wipe between `models.py` and the finished product, `chores_db.py`. I definitely asked myself, "Should I have just handwritten all of this?" However, it is much nicer to incrementally change SQLAlchemy `Column`s to SQLModel `Field`s, each time checking the `engine` for a heartbeat, until arriving at the more ergonomic SQLModel representation.
Some SQLAlchemy fallbacks do remain, but the complete database model comes with IDE auto-completion benefits. Did you forget to specify UI `position` for that newly-created `Tag`? Your IDE will tell you so! It also minimizes boilerplate in the changing the sheets example, as it does with the rest of the examples in this Gist.
The SQLModel tutorial goes into more detail as to why the engine should be initialized in this fashion, but I define `init()` up top, and call it like `engine = init()` at the very bottom of `chores_db.py`. This pattern is necessary so that the `engine` "knows" about all the relevant models; it could also be accomplished by isolation across multiple modules, but works just as well in this fashion.
```python
from pathlib import Path

from sqlmodel import SQLModel, create_engine


def init():
    db = Path("regularly.db")
    engine = create_engine(f"sqlite:///{db}")
    SQLModel.metadata.create_all(engine)
    return engine
```
The only way to know when to drop down from the higher-level abstraction of SQLModel to the lower-level abstraction of SQLAlchemy is to try to do something with SQLModel and see if it works. All the while you are using classes that are a lot like Pydantic models, but different enough to bite you when you try to use a Pydantic concept that's not handled properly by SQLModel. Since we're working off of a functioning generated model, though, it's not so bad. But there are some quirks.
To that end, it helps to define some custom `Field`s that encapsulate these quirks, and then use them in the model definitions that follow. We have to define these here (or in another module and import them) rather than below the models, because the models use these fields at import time. `Field` is a class-like function, meaning we can't subclass it, but we follow the pattern and make our derived fields class-like functions as well. Normally, functions should be `snake_case`, but the `PascalCase` here means they behave in a "classy" sort of way, and in any case, the rules are all made up and the points don't matter!
Here are the players:
- `IdField`: We're working from an existing database schema that we don't have control over. They've gone and named their primary keys `_id` in the database, but SQLModel/Pydantic won't let us have an underscore-prefixed field in our models! SQLModel doesn't let us use the ergonomic type annotations when our field name differs from the equivalent database field, so we have to use the `sa_column` argument and pass it a raw SQLAlchemy `Column`, configured accordingly.
- `ForeignField`: Similarly, our foreign keys are also `_id`, which throws a wrench. So we have to refer to the canonical `_id` column names again.
- `DbField`: Sometimes we want different names for other fields too. For instance, the `tags` table has a `tagname` column, so without this workaround, we would have to use `Tag.tagname`. But then we say `Task.name`, and we know we'll forget that distinction, so let's use `DbField` to shift the name on the SQLModel side.
```python
from sqlalchemy import Column, Integer
from sqlmodel import Field


def IdField():
    return Field(
        default=None,
        sa_column=Column("_id", Integer, primary_key=True, autoincrement=True),
    )


def ForeignField(foreign_key: str):
    return Field(default=None, foreign_key=foreign_key, primary_key=True)


def DbField(name: str, sa_type):
    return Field(sa_column=Column(name, sa_type))
```
Now we're finally ready to define our database models. Much of the complexity to follow has to do with us inheriting an existing database schema that was not crafted specifically to play nicely with SQLModel. If you're starting a database design from scratch, following the SQLModel tutorial will lead to fewer warts and workarounds. But read on for some of the ways you can work around existing patterns in inherited databases.
The `Relationship` fields are not in the database, but rather derived from the primary and foreign key relations in the database. We can use simple type annotations for most fields, referring to the database schema (is a field nullable?) for most, but deferring to the actual manifestation of database entries (does a non-nullable field actually contain `NULL` entries in practice?) when deciding whether to union the field type with `None` and supply a `None` default.
We can derive singular types from pluralized database tables by using the singular in our class definitions, then setting `__tablename__` to the actual table related to this type. There are a few ways to accomplish this, but this approach is the cleanest, even if we have to do a `# type: ignore` due to incompatible overloading. It works fine, is explicit, and simpler than the alternatives. We prefer singular types so that the Python code reads more naturally. Here we're smoothing over one of the many object-relational impedance mismatches by linking singular types to pluralized tables.
Here are some other quirks to be aware of:
- `key`-related arguments to `Field` types use the canonical names in the database, like `tags` not `Tag`, and `_id` not `id`.
- Type annotations always refer to our Python model types, and we need to quote them, e.g. `list["Task"]`, even if we could technically omit the quotes (e.g. if the class is already defined above). This is a SQLModel quirk. We should be careful not to do `from __future__ import annotations` either, or stuff breaks.
- Arguments to `Relationship`, like `back_populates`, refer to the model attribute name on the model named by the type annotation immediately preceding the `Relationship` call.
Here are some highlights of the models that follow:
- `TaskTagLink`: Defined above the others because it's referenced in their `Relationship` fields.
- `Tag`: Our first `Relationship` appears here, one half of enabling our expressive code in the changing the sheets example. The other half is in `Task` (see below).
- `Done`: Represents the days on which tasks are completed in the `log` database table. Its usage is explored briefly in the non-chronological log entries experiment. This has a many-to-one relationship with tasks, so an intermediary link (like `TaskTagLink`) is not needed.
- `Task`: Here `Task.tags` is the mirror `Relationship` to `Tag.tasks`; both are linked through `TaskTagLink` and represent a many-to-many relationship. See also a computed field, decorated using a Pydantic construct so that SQLModel doesn't try to stuff it back in the database.
```python
from datetime import date

import sqlalchemy as sa
from pydantic import computed_field
from sqlmodel import Field, Relationship, SQLModel

# IdField, ForeignField, and DbField are the custom fields defined above.


class TaskTagLink(SQLModel, table=True):
    __tablename__ = "tasktags"  # type: ignore

    id: int | None = IdField()
    taskid: int | None = ForeignField("tasks._id")
    tagid: int | None = ForeignField("tags._id")


class Tag(SQLModel, table=True):
    __tablename__ = "tags"  # type: ignore

    tasks: list["Task"] = Relationship(back_populates="tags", link_model=TaskTagLink)

    id: int | None = IdField()
    name: str = DbField("tagname", sa.Text)
    position: int


class Done(SQLModel, table=True):
    __tablename__ = "log"  # type: ignore

    task: "Task" = Relationship(back_populates="completions")

    id: int | None = IdField()
    taskid: int | None = ForeignField("tasks._id")
    # `default_factory` defers evaluation until instantiation time
    entrydate: str = Field(default_factory=lambda: date.today().isoformat())
    note: str = ""


class Task(SQLModel, table=True):
    __tablename__ = "tasks"  # type: ignore

    tags: list["Tag"] = Relationship(back_populates="tasks", link_model=TaskTagLink)
    completions: list["Done"] = Relationship(back_populates="task")

    id: int | None = IdField()
    name: str
    period: int
    firstdue: str | None = None
    lastperformed: str | None = None
    lastnotified: str | None = None
    created: str = Field(default_factory=lambda: date.today().isoformat())
    details: str = ""
    notifications_enabled: int = 1
    notifications_time: str | None = None
    notifications_period: int | None = None

    @computed_field
    @property
    def age(self) -> int:
        return (date.today() - date.fromisoformat(self.created)).days
```
SQLModel helps us write quite a bit less boilerplate, but carries the additional complexity of needing to know when to use the SQLAlchemy escape hatch. The implementation in `chores_db.py` is more verbose than shown here, and details the rationale for some of these decisions.
The changing the sheets example featured a repeated pattern of querying for an existing task or tag, then creating it if it doesn't exist. Maybe this can be done with some combination of SQL queries (via `select()` statements), but I found it beneficial to represent these common interactions in helper functions, defined in `chores_db.py` after the models. Both `get_task` and `get_tag` are examples of this.

Also, since `session.add(new_or_changed_item)` is sometimes necessary before follow-up operations recognize the changes (e.g. queries involving the changed objects), this can be done inside the helper function to make sure it's done consistently. See `add_tag` for an example of this.
```python
from datetime import date, timedelta

from sqlmodel import Session, select


def get_task(session: Session, id: int | None, name: str, period: int) -> Task:
    query = session.exec(select(Task).where(Task.id == id))
    if task := query.first():
        # Update the existing task in place
        task.name = name
        task.period = period
        return task
    # No such task: create one, first due a full period from today
    task = Task(name=name, period=period)
    task.firstdue = (date.today() + timedelta(days=period)).isoformat()
    return task


def get_tag(session: Session, name: str, position: int | None = None) -> Tag:
    query = session.exec(select(Tag).where(Tag.name == name))
    if tag := query.first():
        if position is not None:
            tag.position = position
        return tag
    # Check `is None` explicitly: a UI position of 0 is valid but falsy
    if position is None:
        raise ValueError("Tag not found, but desired UI position not given.")
    return Tag(name=name, position=position)


def add_tag(session: Session, task: Task, tag: Tag):
    if tag not in task.tags:
        task.tags.append(tag)
        session.add(tag)
```
All the magic is done; now we finally have the superpowers vested in us by SQLModel, Pydantic, and SQLAlchemy. This unholy trinity lets us write the scripts linked in the intro. This Gist also features models of CSV and Excel spreadsheet task/chore representations, detailed briefly in the next section.
It turns out that data representations, be they in a database, a spreadsheet, or the humble comma-separated values (CSV) file, can be quite similar. The similarities that interest me most are the structural similarities when trying to represent them in Python code. We could hit the database with ad-hoc SQL query strings, or hard-code the row we want to grab from the Excel spreadsheet, but these approaches are error-prone and tedious once we're doing more than a one-off script. I've identified the important structural similarities between `chores_db.py`, `chores_csv.py`, and `chores_xlsx.py`:
- Constants and initialization: Initialize the engine, copy files, etc.
- Model definitions: Plain `dataclass`es, Pydantic `BaseModel`s, or SQLModels.
- Helpers: Functions representing repetitive domain-specific patterns.
The drivers for these models are of course SQLModel, XLWings for Excel spreadsheets, and the standard library `dataclasses` and `csv` modules for CSVs. Even if these players changed (for instance pyxll or OpenPyXL instead of XLWings), the structure of this kind of "model module" would be similar. Some notable features are:
- A `DataclassWriter` that takes ordered fields from a stock `dataclass` and writes a sequence of instances to CSV.
- A generic function (not task/chore-specific) returning a sequence of mappings to Excel "Table" cells (not bare cells), allowing getting and setting via row number and header title.
The cell address mapping is probably not particularly performant, but it does facilitate two-way interactions between Excel and Python, tucking away some of the XLWings boilerplate. If you wander by and want to scrape some of these approaches into a cohesive package, please do!
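For illustration, here is a minimal sketch of what a `DataclassWriter` in the spirit of `csv.DictWriter` might look like; the `Chore` model and the exact method names are hypothetical, not the actual implementation in `chores_csv.py`:

```python
import csv
from dataclasses import asdict, dataclass, fields
from io import StringIO
from typing import Iterable


@dataclass
class Chore:  # stand-in model; the real one lives in chores_csv.py
    name: str
    period: int


class DataclassWriter:
    """Write dataclass instances to CSV, deriving the header from field order."""

    def __init__(self, f, datacls):
        names = [field.name for field in fields(datacls)]
        self.writer = csv.DictWriter(f, fieldnames=names)

    def writeheader(self):
        self.writer.writeheader()

    def writerows(self, rows: Iterable):
        # Convert each dataclass instance to a dict for DictWriter
        self.writer.writerows(asdict(row) for row in rows)


buffer = StringIO()
writer = DataclassWriter(buffer, Chore)
writer.writeheader()
writer.writerows([Chore("Change the sheets", 30), Chore("Vacuum", 7)])
print(buffer.getvalue())
```

Deriving `fieldnames` from `dataclasses.fields()` keeps the CSV header in lockstep with the model definition, so adding a field to the dataclass automatically adds a column.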
Now that I've spun this complex web of data models in three formats, what can I do with them? Well, I've already accomplished my primary goal in `_update_db_chores.py`: to update a Regularly Android app database with chores from an Excel spreadsheet. This allows me to leverage metrics not available in the app itself, such as the actual versus target cadence of completion for tasks, how realistic the target period is, and so on. It also facilitates bulk updating and adding of tasks. I'll be able to synchronize the database with the app on my partner's phone periodically, and obviously all of this was done for the noble purpose of a cleaner house! Or was it just an excuse for some more Python practice?

Whatever the case, writeups of other experiments follow, representing my usage of these models going forward. At some point, I'll have to write some code to merge the databases on my phone and my partner's, hacking together that buzzword "eventual consistency" on a much longer timeline than is usually meant by that term (e.g. months). Maybe this warrants getting into Alembic, but by that time this Gist (is it even one anymore?) will probably have grown big enough to warrant becoming a full-fledged repository.
At first glance, the `log` table looks chronological. However, this is mostly because I usually make entries close to the current date. Let's see if the Regularly Android app can handle out-of-order log entries. In `make_log_not_chronological.py`, I find the latest `entrydate` and try appending a `Done` instance with an `entrydate` before that.
In an earlier rendition I tried this with an `order_by` clause. In the `select` statement I called on the `entrydate` class attribute of the SQLModel type `Done` to indicate that I want to do something with the `entrydate` column in the related `log` table (via `Done.__tablename__`). Our IDE completion doesn't expose all the possible methods I can call on `entrydate`, but since this is a "column", I can call any valid `ColumnElement` methods, and other expressions besides. `desc()` and `asc()` are valid chained method calls within an `order_by` clause. Copilot auto-completed this, but it seems more explicit to actually import the `desc` function from `sqlalchemy.sql.expression` or `sqlmodel.sql.expression`, and our IDE doesn't complain.
In the end, it was easier to just take an example task's `completions` relationship and sort using Python's `sorted` function and a `key` argument. SQL is expressive by itself. Every one of these expressive SQL queries may have an equivalent representation via SQLAlchemy, but it's not always worth the effort to find it. If I already have the two-way relationships set up with `back_populates`, I can just sort the Python objects until I really need performant querying.
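The `sorted`-with-`key` approach boils down to plain Python once the relationship attribute hands us a list. A stdlib-only sketch, with a hypothetical stand-in for the real `Done` model:

```python
from dataclasses import dataclass


@dataclass
class Done:  # stand-in for the SQLModel Done; only entrydate matters here
    entrydate: str  # ISO 8601 dates sort correctly as plain strings


# e.g. task.completions, as handed to us by the relationship attribute
completions = [Done("2024-03-01"), Done("2024-01-15"), Done("2024-02-20")]

# Sort in Python instead of reaching for an SQL ORDER BY
ordered = sorted(completions, key=lambda done: done.entrydate)
print([done.entrydate for done in ordered])
# → ['2024-01-15', '2024-02-20', '2024-03-01']
```

This works because ISO 8601 strings sort lexicographically in the same order as the dates they represent, so no parsing is needed for the `key`.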
In the end, the Regularly Android app handled the resulting out-of-order log just fine.