Database Data Versioning

@mneuhaus · November 12, 2013

Prologue

One issue that regularly comes up for me in development projects is how to keep the database in sync. This is usually no problem if you develop alone. But once more than one person works on a project, the generally accepted best practice is to develop locally, commit changes, and then deploy to testing/staging/production. That works well for code, of course. Keeping the database in sync, however, can be quite a hassle. Most systems provide at least schema migrations or some other means of analysing and patching the database structure. But what about the data itself? Currently there are a few scenarios you could choose from:

  • develop locally and regularly dump the production database to import it locally
    • pro: fast development, because everything is local and easy to work with
    • con: you regularly get out of sync and have to redo things manually or overwrite your database
  • open up the production/staging database and let every developer use it from their local environment
    • pro: everyone uses the same database
    • con: possibly slow performance, and possible issues because code and database get out of sync
  • develop together directly on staging
    • pro: everyone uses the same code and database
    • con: slow development, possible accidental overwriting of another developer's changes, and bad practice in general

The main issue, imho

I think the main issue is that we have no proper way to diff and merge database data. If we could diff and merge database data properly, we could simply commit our database changes to git and let it handle the diffing and merging. The problem is that an SQL dump is one of the worst formats for git to merge, because each row's data is crammed into a single line. If the data were stored as JSON/YAML/XML, with every property on its own line, git could easily spot a difference in a database row and even merge two changes to different columns of the same row.
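
To illustrate with a made-up table and values: in an SQL dump the whole row is one line, so any two changes to it conflict, while the JSON form gives git a line per column to work with:

```sql
INSERT INTO users VALUES (42, 'Alice', 'alice@example.com', 'editor');
```

```json
{
  "email": "alice@example.com",
  "id": 42,
  "name": "Alice",
  "role": "editor"
}
```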

The idea

A little command-line tool offering these features would be great:

dbgit pull

Dump all database rows into a folder structure like "Database/Tablename/RowId.json" or "Database/Tablename.json". Files for deleted rows get removed and updated rows get rewritten. This data can then easily be committed to git, diffed, and merged.
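
A minimal sketch of what this could look like, assuming SQLite, tables keyed by an `id` column, and text/numeric values; the function name and file layout are illustrative, not an existing tool:

```python
import json
import sqlite3
from pathlib import Path

def dbgit_pull(db_path: str, out_dir: str = "Database") -> None:
    """Serialize every row as Database/<table>/<id>.json."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    tables = [r["name"] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        table_dir = Path(out_dir) / table
        table_dir.mkdir(parents=True, exist_ok=True)
        seen = set()
        # assumes every table has an 'id' column
        for row in conn.execute(f"SELECT * FROM {table}"):
            data = dict(row)
            name = f"{data['id']}.json"
            seen.add(name)
            # one property per line, sorted keys: stable, git-friendly diffs
            (table_dir / name).write_text(
                json.dumps(data, indent=2, sort_keys=True) + "\n")
        # delete files for rows that no longer exist in the database
        for stale in table_dir.glob("*.json"):
            if stale.name not in seen:
                stale.unlink()
    conn.close()
```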

dbgit push

Push the data in the "Database/*" files back into the database.
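
In the same spirit, a sketch of the push direction under the same SQLite assumptions. SQLite's INSERT OR REPLACE serves as a simple upsert here; deleting database rows whose files are gone is left out:

```python
import json
import sqlite3
from pathlib import Path

def dbgit_push(db_path: str, in_dir: str = "Database") -> None:
    """Write the rows stored as Database/<table>/<id>.json back to the db."""
    conn = sqlite3.connect(db_path)
    for table_dir in (p for p in Path(in_dir).iterdir() if p.is_dir()):
        table = table_dir.name
        for row_file in table_dir.glob("*.json"):
            data = json.loads(row_file.read_text())
            cols = sorted(data)
            conn.execute(
                # INSERT OR REPLACE keyed on the primary key acts as an upsert
                f"INSERT OR REPLACE INTO {table} ({', '.join(cols)}) "
                f"VALUES ({', '.join('?' for _ in cols)})",
                [data[c] for c in cols])
    conn.commit()
    conn.close()
```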

dbgit diff

Check for changes between the database and the local files.
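
Diffing then reduces to serializing the current database state in memory and comparing it against the files on disk; a sketch, same assumptions as above:

```python
import json
import sqlite3
from pathlib import Path

def dbgit_diff(db_path: str, base_dir: str = "Database") -> None:
    """Report rows that differ between the database and the local files."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    tables = [r["name"] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        table_dir = Path(base_dir) / table
        db_rows = {str(r["id"]): dict(r)
                   for r in conn.execute(f"SELECT * FROM {table}")}
        file_rows = ({p.stem: json.loads(p.read_text())
                      for p in table_dir.glob("*.json")}
                     if table_dir.exists() else {})
        for row_id in sorted(db_rows.keys() | file_rows.keys()):
            if row_id not in file_rows:
                print(f"new in db:      {table}/{row_id}")
            elif row_id not in db_rows:
                print(f"deleted in db:  {table}/{row_id}")
            elif db_rows[row_id] != file_rows[row_id]:
                print(f"changed:        {table}/{row_id}")
    conn.close()
```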

Example

Using this, we could implement a workflow based on the first scenario from the prologue (a command sketch follows the list):

  1. The developer checks out the project's master branch and changes something in the database
  2. The developer pulls the changes from the database, commits them together with the code changes, and pushes to git
  3. The designer/content editor pulls their own local database changes into the files and commits them locally
  4. The designer/content editor pulls the developer's changes, and git merges those automatically
  5. The designer/content editor pushes the merged database changes into their database and the commits to the git remote
  6. The developer pulls from the git remote and pushes the database changes back into their database
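
In commands, the round trip might look like this (dbgit as sketched above, everything else is plain git):

```sh
# developer
dbgit pull                  # serialize local db rows into Database/
git add .
git commit -m "feature + db changes"
git push

# designer/content editor
dbgit pull                  # serialize their own local db state
git commit -am "content changes"
git pull                    # git merges the per-line JSON files
dbgit push                  # write the merged rows back into the local db
git push
```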

Exceptions

You should of course not push security-relevant data to a git remote that might not be safe, so we would need a way to ignore certain tables anyway. Versioning logs and caches doesn't make sense either ;)
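
Such an ignore mechanism could be as simple as a .gitignore-style file; the name and syntax here are entirely hypothetical:

```
# .dbgitignore (hypothetical): tables that are never serialized
sessions
cache_*
log_*
```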

@ssteinker

+1

@adriankremer

+2