Database Data Versioning

@mneuhaus · November 12, 2013

Prologue

One issue that regularly comes up for me in development projects is how to keep the database in sync. This is usually no problem if you develop alone. But once more than one person works on a project, the generally accepted best practice is to develop locally, commit changes, and then deploy to testing/staging/production. That works well for code, of course. Keeping the database in sync, however, can be quite a hassle. Most systems provide at least schema migrations or some other means of analysing and patching the database structure. But what about the data itself? Currently there are a few scenarios you could choose from:

  • develop locally and regularly dump the production database to import it locally
    • pro: fast development, because everything is local and easy to work with
    • con: you regularly get out of sync and have to redo things manually or overwrite your database
  • open up the production/staging database and let every developer use it from their local environment
    • pro: everyone uses the same database
    • con: possibly slow performance, and possible issues because code and database get out of sync
  • develop together directly on staging
    • pro: everyone uses the same code and database
    • con: slow development, possible accidental overwriting of another developer's changes, and bad practice in general

The main issue, imho

I think the main issue is that we have no proper way to diff and merge database data. If we could diff and merge database data properly, we could simply commit our database changes to git and let it handle the diffing and merging. The problem is that an SQL dump is one of the worst formats for git to merge, because each row's data is crammed into a single line. If the data were stored as JSON/YAML/XML, with every property on its own line, git could easily spot a difference in a database row and even merge two changes to different columns of the same row.
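
To illustrate with a made-up table and values: in an SQL dump the whole row is one line, so any two changes to it conflict, while the JSON form gives git a line per column to work with:

```sql
INSERT INTO users VALUES (42, 'Alice', 'alice@example.com', 'editor');
```

```json
{
  "email": "alice@example.com",
  "id": 42,
  "name": "Alice",
  "role": "editor"
}
```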

The idea

A little command-line tool offering these features would be great:

dbgit pull

Dump all database rows into a folder structure like "Database/Tablename/RowId.json" or "Database/Tablename.json". Files for deleted rows get removed and updated rows get rewritten. This data can then easily be committed to git, diffed, and merged.
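
A minimal sketch of what this could look like, assuming SQLite, tables keyed by an `id` column, and text/numeric values; the function name and file layout are illustrative, not an existing tool:

```python
import json
import sqlite3
from pathlib import Path

def dbgit_pull(db_path: str, out_dir: str = "Database") -> None:
    """Serialize every row as Database/<table>/<id>.json."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    tables = [r["name"] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        table_dir = Path(out_dir) / table
        table_dir.mkdir(parents=True, exist_ok=True)
        seen = set()
        # assumes every table has an 'id' column
        for row in conn.execute(f"SELECT * FROM {table}"):
            data = dict(row)
            name = f"{data['id']}.json"
            seen.add(name)
            # one property per line, sorted keys: stable, git-friendly diffs
            (table_dir / name).write_text(
                json.dumps(data, indent=2, sort_keys=True) + "\n")
        # delete files for rows that no longer exist in the database
        for stale in table_dir.glob("*.json"):
            if stale.name not in seen:
                stale.unlink()
    conn.close()
```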

dbgit push

Push the data in the "Database/*" files back into the database.
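
In the same spirit, a sketch of the push direction under the same SQLite assumptions. SQLite's INSERT OR REPLACE serves as a simple upsert here; deleting database rows whose files are gone is left out:

```python
import json
import sqlite3
from pathlib import Path

def dbgit_push(db_path: str, in_dir: str = "Database") -> None:
    """Write the rows stored as Database/<table>/<id>.json back to the db."""
    conn = sqlite3.connect(db_path)
    for table_dir in (p for p in Path(in_dir).iterdir() if p.is_dir()):
        table = table_dir.name
        for row_file in table_dir.glob("*.json"):
            data = json.loads(row_file.read_text())
            cols = sorted(data)
            conn.execute(
                # INSERT OR REPLACE keyed on the primary key acts as an upsert
                f"INSERT OR REPLACE INTO {table} ({', '.join(cols)}) "
                f"VALUES ({', '.join('?' for _ in cols)})",
                [data[c] for c in cols])
    conn.commit()
    conn.close()
```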

dbgit diff

Check for changes between the database and the local files.
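
Diffing then reduces to serializing the current database state in memory and comparing it against the files on disk; a sketch, same assumptions as above:

```python
import json
import sqlite3
from pathlib import Path

def dbgit_diff(db_path: str, base_dir: str = "Database") -> None:
    """Report rows that differ between the database and the local files."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    tables = [r["name"] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        table_dir = Path(base_dir) / table
        db_rows = {str(r["id"]): dict(r)
                   for r in conn.execute(f"SELECT * FROM {table}")}
        file_rows = ({p.stem: json.loads(p.read_text())
                      for p in table_dir.glob("*.json")}
                     if table_dir.exists() else {})
        for row_id in sorted(db_rows.keys() | file_rows.keys()):
            if row_id not in file_rows:
                print(f"new in db:      {table}/{row_id}")
            elif row_id not in db_rows:
                print(f"deleted in db:  {table}/{row_id}")
            elif db_rows[row_id] != file_rows[row_id]:
                print(f"changed:        {table}/{row_id}")
    conn.close()
```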

Example

Using this, we could implement a workflow based on the first scenario from the prologue (a command sketch follows the list):

  1. The developer checks out the project's master branch and changes something in the database
  2. The developer pulls the changes from the database, commits them together with the code changes, and pushes to git
  3. The designer/content editor pulls their own local database changes into the files and commits them locally
  4. The designer/content editor pulls the developer's changes, and git merges those automatically
  5. The designer/content editor pushes the merged database changes into their database and the commits to the git remote
  6. The developer pulls from the git remote and pushes the database changes back into their database
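
In commands, the round trip might look like this (dbgit as sketched above, everything else is plain git):

```sh
# developer
dbgit pull                  # serialize local db rows into Database/
git add .
git commit -m "feature + db changes"
git push

# designer/content editor
dbgit pull                  # serialize their own local db state
git commit -am "content changes"
git pull                    # git merges the per-line JSON files
dbgit push                  # write the merged rows back into the local db
git push
```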

Exceptions

You should of course not push security-relevant data to a git remote that might not be safe, so we would need a way to ignore certain tables anyway. Versioning logs and caches doesn't make sense either ;)
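
Such an ignore mechanism could be as simple as a .gitignore-style file; the name and syntax here are entirely hypothetical:

```
# .dbgitignore (hypothetical): tables that are never serialized
sessions
cache_*
log_*
```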

@ssteinker

+1

@adriankremer

+2