Database Mocking and Seeding for Ephemeral Environments

Ephemeral preview infrastructure collapses when the database layer is an afterthought. A frontend that renders correctly against a stale fixture set tells you nothing about migration safety, ORM query correctness, or referential integrity under realistic data volumes. This page details the provisioning patterns that close that gap β€” strategy selection, pipeline wiring, anonymization, benchmarks, and the failure modes that burn teams repeatedly. It assumes you already have automated preview deployments on pull requests running and are trying to harden the data layer beneath them.

Prerequisites


How the mechanism works

Every provisioning strategy sits on the same three-phase timeline: init β†’ migrate β†’ seed. Understanding where each phase can block or race is what lets you debug failures without guesswork.

Database provisioning timeline for ephemeral environments Three sequential phases β€” init (container start + health-check), migrate (schema DDL), seed (data insertion) β€” followed by a readiness gate before the application pod receives traffic. Init Container start + health-check Migrate Schema DDL applied in order Seed Synthetic or anonymised data Readiness gate App pod starts; traffic routed Phase 1 Phase 2 Phase 3 Gate Most race conditions occur when the app container starts before Phase 1 completes.

Init starts the database process and blocks until the health-check probe confirms the port is accepting connections. Migrate applies all pending DDL in version-controlled order. Seed inserts synthetic or anonymized rows. Only after all three complete does the readiness gate lift and the application pod receive traffic.

The most common failure β€” application startup before seed scripts finish β€” happens because teams model only the health-check probe and treat migrate+seed as fire-and-forget. The patterns below enforce the full three-phase gate.


Step-by-step implementation

Step 1 β€” Choose your data provisioning strategy

Pick the right approach for the test scope before writing any pipeline code. Using a snapshot where synthetic seed would do wastes 20–60 seconds per preview spin-up and adds compliance surface area.

Strategy Best for Typical init overhead Key trade-off
Application-layer mocking Unit and component tests < 1 s No real query planner; misses SQL-level bugs
Container seeding (synthetic) Integration and preview environments 10–30 s Requires maintained seed scripts and migration tooling
Anonymized production snapshot High-fidelity staging or QA sign-off 30–120 s Compliance overhead; larger storage footprint

Verification: confirm the choice matches your test contract. If tests assert on specific row counts or ORM-generated SQL, mocking is insufficient β€” use container seeding.


Step 2 β€” Wire the database into Docker Compose (CI runners and local parity)

This pattern drives local development and most GitHub Actions / GitLab CI runner setups. The volumes mount is the critical seeding hook: every .sql and .sh file in the mounted directory runs automatically on PostgreSQL’s first start.

# docker-compose.preview.yml
services:
  db:
    image: postgres:16-alpine          # pin a minor version to prevent silent breaking changes
    environment:
      POSTGRES_DB: preview_db
      POSTGRES_USER: ci_user
      POSTGRES_PASSWORD: ci_password   # inject from ${{ secrets.PREVIEW_DB_PASS }} in real pipelines
    volumes:
      - ./db/seeds:/docker-entrypoint-initdb.d   # SQL and shell scripts run in filename order
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ci_user -d preview_db"]
      interval: 5s
      timeout: 3s
      retries: 10
    tmpfs:
      - /var/lib/postgresql/data       # ephemeral storage; no volume to clean up on teardown

  migrate:
    image: flyway/flyway:10             # or your migration tool of choice
    command: migrate
    environment:
      FLYWAY_URL: jdbc:postgresql://db:5432/preview_db
      FLYWAY_USER: ci_user
      FLYWAY_PASSWORD: ci_password
      FLYWAY_LOCATIONS: filesystem:/flyway/sql
    volumes:
      - ./db/migrations:/flyway/sql
    depends_on:
      db:
        condition: service_healthy      # blocks until health-check passes

  app:
    build: .
    depends_on:
      migrate:
        condition: service_completed_successfully   # blocks until migrations finish
    environment:
      DATABASE_URL: postgres://ci_user:ci_password@db:5432/preview_db

Inline verification:

docker compose -f docker-compose.preview.yml up --wait
docker compose -f docker-compose.preview.yml exec db \
  psql -U ci_user -d preview_db -c "\dt"
# Expected: table list matches your schema

Step 3 β€” Kubernetes InitContainer pattern (cluster-native workloads)

For teams running preview namespaces in Kubernetes, the InitContainer enforces the same three-phase gate at the pod level. The main application container does not start until all init containers exit with code 0.

# preview-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: preview-app
  namespace: preview-{{ .Values.prNumber }}   # Helm value; one namespace per PR
spec:
  replicas: 1
  selector:
    matchLabels:
      app: preview-app
  template:
    spec:
      initContainers:
        - name: db-migrate
          image: ghcr.io/your-org/flyway:10   # pin digest in production
          command: ["flyway", "migrate"]
          envFrom:
            - secretRef:
                name: preview-db-creds        # injected by external-secrets-operator or vault-agent
          env:
            - name: FLYWAY_URL
              value: jdbc:postgresql://$(DB_HOST):5432/$(DB_NAME)

        - name: db-seed
          image: ghcr.io/your-org/seed-runner:latest
          command: ["sh", "-c", "psql $DATABASE_URL -f /seeds/preview.sql"]
          envFrom:
            - secretRef:
                name: preview-db-creds
          volumeMounts:
            - name: seed-scripts
              mountPath: /seeds

      containers:
        - name: app
          image: ghcr.io/your-org/app:{{ .Values.imageTag }}
          envFrom:
            - secretRef:
                name: preview-db-creds

      volumes:
        - name: seed-scripts
          configMap:
            name: preview-seed-scripts

Inline verification:

kubectl get pods -n preview-<PR_NUMBER>
# Both init containers should show "Completed" before app shows "Running"
kubectl logs -n preview-<PR_NUMBER> -c db-seed deploy/preview-app
# Expected: "psql: ... INSERT 0 <N>" lines with no error exit

This integrates cleanly with synchronizing environment variables across stages β€” the same external-secrets-operator sync that populates app credentials also populates the init container secrets, preventing credential drift between tiers.


Step 4 β€” Application-layer mocking with Prisma (unit and component tests)

When the test suite does not exercise the database engine itself, intercept at the ORM layer. This eliminates the 10–30 s container spin-up from every unit test run.

// src/__tests__/setup/prisma.mock.ts
import { PrismaClient } from '@prisma/client'
import { mockDeep, DeepMockProxy } from 'jest-mock-extended'

// Create a fully-typed recursive proxy for the Prisma client
export const prismaMock = mockDeep<PrismaClient>()

// Replace the module singleton before any test file imports it
jest.mock('../../lib/prisma', () => ({
  __esModule: true,
  prisma: prismaMock,
}))

beforeEach(() => {
  // Reset all mocks between tests; prevents bleed-across between test cases
  jest.resetAllMocks()
})
// src/__tests__/user.service.test.ts
import { prismaMock } from './setup/prisma.mock'
import { getUserById } from '../services/user.service'

test('returns user when found', async () => {
  prismaMock.user.findUnique.mockResolvedValue({
    id: 'usr_01',
    email: '[email protected]',
    name: 'Alice',
    createdAt: new Date('2024-01-01'),
  })

  const user = await getUserById('usr_01')
  expect(user?.email).toBe('[email protected]')
  // prismaMock records every call β€” assert on query shape if needed
  expect(prismaMock.user.findUnique).toHaveBeenCalledWith({
    where: { id: 'usr_01' },
  })
})

Inline verification:

npx jest --testPathPattern="user.service" --verbose
# Expected: test suite completes in < 2 s with no database connection

Step 5 β€” Sanitize data before seeding from production

If your highest-fidelity previews require production-shaped data, run every export through an anonymization pipeline before ingestion. This is non-negotiable for GDPR and SOC 2 compliance.

#!/usr/bin/env bash
# scripts/anonymize-dump.sh β€” run in a secure, network-isolated environment
set -euo pipefail

SOURCE_DUMP="$1"        # path to pg_dump output
OUTPUT_SEED="$2"        # destination path for sanitized seed file

# 1. Restore to a temporary local database
psql "$TEMP_DB_URL" < "$SOURCE_DUMP"

# 2. Apply column-level anonymization (using pganaonymize or custom SQL)
psql "$TEMP_DB_URL" <<'SQL'
  -- Replace PII with deterministic pseudonyms so foreign-key joins still work
  UPDATE users SET
    email      = 'user_' || id || '@example-preview.invalid',
    name       = 'Preview User ' || id,
    phone      = NULL,
    ip_address = '0.0.0.0';

  UPDATE payment_methods SET
    card_last_four = '0000',
    billing_name   = 'Test Account';
SQL

# 3. Dump the sanitized state for use as a seed file
pg_dump --data-only --no-owner "$TEMP_DB_URL" > "$OUTPUT_SEED"

echo "Sanitized seed written to $OUTPUT_SEED"

Inline verification:

grep -i "real.email@" "$OUTPUT_SEED" && echo "FAIL: PII detected" || echo "PASS: no PII found"

Configuration reference

Option Type Default Effect
healthcheck.interval duration 10s How often Docker polls pg_isready; reduce to 5s for faster gate lift
healthcheck.retries integer 5 Maximum probe failures before the container is marked unhealthy
tmpfs on /var/lib/postgresql/data mount off Stores DB files in RAM; eliminates teardown disk cleanup, reduces I/O overhead
depends_on.condition: service_healthy enum service_started Blocks dependent containers until health-check passes (Docker Compose v2+)
depends_on.condition: service_completed_successfully enum service_started Blocks until init container exits 0; use for migration steps
Flyway FLYWAY_OUT_OF_ORDER boolean false Allow migrations to run out of sequence; keep false in preview to match production
Prisma DATABASE_URL string none Override to point at the seeded container; must match the migration DSN
POSTGRES_INITDB_ARGS string none Pass --encoding=UTF8 --locale=en_US.utf8 to match production collation

Integration with upstream and downstream

Upstream β€” preview deployment triggers: the database provisioning pipeline fires as part of the same GitHub Actions workflow that triggers automated preview deployments on pull requests. The database readiness gate must resolve before the application image is deployed, not concurrently with it. Add an explicit job dependency in your workflow:

jobs:
  provision-db:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker compose -f docker-compose.preview.yml up -d --wait db migrate

  deploy-app:
    needs: provision-db     # enforces the three-phase gate at the workflow level
    runs-on: ubuntu-latest
    steps:
      - run: kubectl apply -f preview-deployment.yaml

Downstream β€” environment variable synchronization: database connection strings are the most common source of environment drift. Align your DATABASE_URL convention with the patterns in synchronizing environment variables across stages so every tier β€” local, preview, staging, production β€” resolves credentials from the same secret-manager hierarchy.


Performance benchmarks and cost impact

These figures are from a monorepo with 120 migrations and a 15 MB synthetic seed dataset running on GitHub-hosted ubuntu-latest runners (4 vCPU, 16 GB RAM):

Provisioning strategy Median init time p95 init time Runner cost per preview (est.)
Application-layer mock only < 1 s 2 s $0.00 (no extra runner time)
Docker Compose + pg_isready gate 14 s 28 s ~$0.01 per preview spin-up
Kubernetes InitContainer (GKE) 22 s 45 s ~$0.03 per preview spin-up
Anonymized snapshot (600 MB dump) 90 s 140 s ~$0.08 per preview spin-up

Optimization levers:

  • Pre-bake migrations into a custom image (FROM postgres:16-alpine; COPY migrations/ /docker-entrypoint-initdb.d/) β€” eliminates the migrate phase at runtime and cuts median init time to 8 s.
  • Cache the seed .sql file in CI artifact storage keyed on its SHA256. On a cache hit, volume-mount the file directly rather than generating it. Saves 5–15 s per run on large seed sets.
  • Use tmpfs storage for the PostgreSQL data directory. Eliminates I/O wait and cuts teardown to 0 s β€” no volume to delete.

Compared to using a shared staging database: per-branch isolation eliminates the 8–12 minutes of serialized test queuing that occurs when 10+ developers share one staging instance during peak hours.


Troubleshooting

Error: connection refused on first application request

Exact error: Error: connect ECONNREFUSED 127.0.0.1:5432 or PG::ConnectionBad: could not connect to server

Root cause: the application container started before the database health-check probe confirmed readiness. This happens when depends_on is configured with condition: service_started (the default) instead of condition: service_healthy.

Fix:

# docker-compose.preview.yml
services:
  app:
    depends_on:
      db:
        condition: service_healthy    # NOT service_started

Error: relation "X" does not exist after seeding

Exact error: ERROR: relation "users" does not exist (or any table name)

Root cause: seed scripts ran before the migration container completed, or migrations were not committed to version control before the preview branch was pushed.

Fix: ensure the migrate service uses condition: service_completed_successfully and that the seed script runs in a separate service that depends_on the migration service. Add a migration lint step to your PR check:

# Fail the PR if any migration file was modified without a corresponding schema snapshot update
flyway info --url="$FLYWAY_URL" | grep "Pending" && exit 1 || exit 0

Error: too many connections on parallel previews

Exact error: FATAL: remaining connection slots are reserved for non-replication superuser connections

Root cause: multiple concurrent previews each opening their own connection pool to a shared database proxy, exhausting max_connections.

Fix: deploy per-branch database instances rather than sharing a proxy. Use namespace-scoped Postgres StatefulSets in Kubernetes, or per-PR RDS instances with a small instance class (db.t4g.micro). Alternatively, front a shared instance with PgBouncer in transaction-pool mode and set pool_size to 5 per preview namespace.


Error: disk quota exceeded during snapshot seeding

Exact error: No space left on device during pg_restore or COPY FROM

Root cause: the CI runner’s ephemeral disk (typically 14 GB on GitHub-hosted runners) fills when restoring an uncompressed production dump.

Fix: compress seed files with pg_dump --compress=9 and decompress on the fly during restore. Switch to partial-dataset extraction using pg_dump --table flags to pull only the tables your tests actually need. Purge volumes at the end of every job with docker compose down -v.


Frequently Asked Questions

When should I use application-layer mocking instead of container seeding?

Use mocking for unit and component-level tests where execution speed and isolation take priority over query-planner accuracy. The jest-mock-extended / mockDeep pattern eliminates network overhead and works without a running database process. Switch to container seeding when you need to validate ORM-generated SQL, complex multi-table joins, database-level constraints, or any PostgreSQL-specific feature like JSONB operators or full-text search.

How do I prevent PII leakage when seeding from production dumps?

Implement a deterministic anonymization pipeline β€” format-preserving encryption for structured fields (phone numbers, card numbers) and column-level hashing or replacement for free-text PII (email, name, address). Never ingest raw production data into a preview tier. Route every export through the anonymization step before the seed file is written to artifact storage. Audit the output with a regex scan for known PII patterns (grep -E '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}') as a CI gate.

What is the fastest teardown strategy for ephemeral database instances?

Mount the PostgreSQL data directory as tmpfs β€” when the container stops, the data is gone with zero cleanup steps. For Kubernetes StatefulSets, delete the namespace in a post-merge CI hook: kubectl delete namespace preview-<PR_NUMBER>. Pair with a TTL-based lifecycle controller (e.g., a CronJob that deletes namespaces older than 48 hours) so orphaned previews from force-closed PRs are also cleaned up.

How can I cut database init latency without sacrificing environment parity?

Pre-bake your migration SQL into a custom PostgreSQL image so schema setup runs during the image build, not at container start. Cache the resulting image digest in your CI registry β€” on a cache hit, init time drops to the health-check poll interval (5–10 s). For seed data, store a compressed .sql file in CI artifact storage keyed on its content hash; re-use the cached version on branches where seed data has not changed.


← Back to Preview Environments & Environment Parity