- Published on
Containerized Database Development Workflow
- Authors
- Name
- David Gordon
Resources:
- Postgres Docker Container Documentation
- Postgres SQL Dump
- Postgres pg_dump
- Podman Command Documentation
- though I will be using Podman, many of the same commands work identically in Docker. You'll want to refer to Docker's documentation to be sure however.
In this post I will be covering a basic workflow for developing a database using containers. The workflow is essentially just starting a container of our chosen database sever, developing as we need to, then dumping the schema to a file on the host machine that is suitable for a version control system like git. I am using PostgreSQL here and while other SQL-like databases will likely involve many of the same processes there will be some differences. As always, refer to your documentation. Document or Key-Value databases may work in a completely different manner in terms of schema development (or lack thereof) and file backup options (or lack thereof).
Starting a Container
podman run --name testdb -d -e POSTGRES_PASSWORD=root -p 5432:5432 postgres
Take note of the following options:
d
: start the container in detatched mode, this allows us to reuse the shell we are working in.--name testdb
: Podman (and other OCI runtimes like Docker) give containers randomized names when they are created, we can choose now to give our container a name. This name can be used to refer to the container in future operations (like starting, stopping, removing, etc.).-e POSTGRES_PASSWORD=root
: We are setting an environment variable for the container. You'll need to refer back to the container's documentation to see what environment variables are need, are supported, and what they do. In this case, this environment variable sets a password for the database server's admin user.-p 5432:5432
: Postgres, by default, listens to port5432
. We can tell our container runtime to allow this port to be open. Optionally we can map any host port to any internal container port using the following syntax:<HOST>:<CONTAINER>
. This is nice in the case that we are running many instances of containers that internally listen to the same port.
Open a client shell (or make any other admin connection)
If you prefer or need something other than a shell (like SSRS or something simillar), you can connect to this running instance in the same way as any other given over localhost or network. In this case, I am just using psql
, a PostgreSQL shell client. PGAdmin is a nice alternative if you need a gui DBMS for PostgreSQL.
psql -U postgres -d postgres -p 5432 -h localhost
-U postgres
: This option sets the desired username to connect as. By default, the PostgreSQL container sets apostgres
username (this is the default admin username).-d postgres
: This option sets the desired database to connect to. By default, PosgreSQL has apostgres
database created.-p 5432
: This sets the port to connect through.-h localhost
: This is the hostname used in the connection.
At this point the database can be developed and used as normal.
Restarting a Container
The state of the filesystem in our development container is saved unless we specifically remove the container. What this means is that if we stop the running container and/or shutdown the machine we're working on, the container will still be in the same state. We will simply need to restart the container to continue where we left off:
podman start testdb
Backup and Recreating the Database
We will likely want to take snapshots of the database as we develop. What I'm proposing here is a manual workflow, which has the primary disadvantage of needing to be done manually to get any benefit from it. There are many ways to automate this process and how you choose to automate will likely depend on many other factors.
Backing up the database is as simple as running a pg_dump
command from the host machine. Note that pg_dump
is essentially a PostgreSQL client that can connect over a network, and can be used in much the same way as psql
.
pg_dump -s -h localhost -p 5432 -U postgres -d postgres > my_db_dump.sql
Many of the options and commands are the same as psql
, with the following changes:
-s
: This runs the command in schema only mode, i.e. it does not include data. By default,pg_dump
will dump both schema and data. To include only the schema use the-s
option, and to include only data use the--data-only
option.> my_db_dump.sql
: By defaultpg_dump
will output the file overSTDOUT
. This allows you to use the output in any number of ways on a unix system (tee
,|
,<()
, etc.). Here I am simple creating (or overwriting if it exists) a file calledmy_db_dump.sql
.
Example Workflow: A Workout Database
Imagine we are designing a simple system to log a workout plan. We wish to keep track of sets of exercises -- like 'cardio', 'benchpress', 'deadlift' -- and link them with a set of workouts -- we can use days of the week like 'tuesday workout' or 'friday weightlifting'. At this stage, we know we need the following schema:
workouts
row | type |
---|---|
id | uuid |
name | varchar(50) |
exercises
row | type |
---|---|
id | uuid |
name | varchar(50) |
workout_exercises
row | type |
---|---|
id | serial |
workout | uuid references exercises(id) |
exercise | uuid references workouts(id) |
We can spin up our development container:
podman start testdb
(I've created this container already)
Then shell into the container and being issuing commands and schema creations:
host$: psql -U postgres -h localhost -p 5432
psql$: CREATE DATABASE workoutdb;
psql$: \c workoutdb
psql$: CREATE TABLE workouts(
id UUID PRIMARY KEY,
name VARCHAR(50));
psql$: CREATE TABLE exercises(
id UUID PRIMARY KEY,
name VARCHAR(50));
psql$: CREATE TABLE workout_exercises(
id UUID PRIMARY KEY,
workout UUID REFERENCES workouts(id),
exercise UUID REFERENCES exercises(id));
psql$: CREATE VIEW workout_exercises_v AS
SELECT
workouts.name as Workout,
exercises.name as Exercise
FROM workouts_exercises
JOIN workouts
ON workouts_exercises.workouts=workouts.id
JOIN exercises
on workout_exercises.exercise=exercises.id;
psql$: CREATE PROCEDURE workout_exercises_insert
(workoutName varchar, exerciseName varchar)
LANGUAGE SQL
AS $$
INSERT INTO
workout_exercises (workout, exercise)
SELECT workouts.id, exercises.id
FROM workouts
JOIN exercises
ON workouts.name=$1 AND excersises.name=$2;
$$;
Once we've put our simple database together, we simple exit the psql
client (or open a separate host shell), and issue a pg_dump
command. After this the file can be added to a git repository like any other bit of source code.
psql$ \q
host$ pg_dump -U postgres -d workoutdb -h localhost -p 5432 -s > workout_db_schema.sql