Fixing Cache Poisoning Issues in Distributed CI Runners

Turborepo’s remote cache delivers major build-time savings, but shared artifact storage across parallel runners creates an integrity risk: a corrupted, architecture-mismatched, or branch-contaminated artifact can be restored silently and propagate non-deterministic failures through every downstream job.

When to use this pattern

Apply this recovery and hardening workflow when:

  • Build outputs differ between runners pulling the same cache key (same commit, different machines).
  • You see Exec format error or MODULE_NOT_FOUND on packages that ls node_modules confirms are present — a clear sign a foreign-architecture binary was restored.
  • A new runner type (e.g., ARM-based GitHub-hosted runner or a Kubernetes node with a different glibc) joined the pool after the cache was warm.

Prerequisites


Anatomy of a poisoned pipeline

Before touching config, it helps to know how poisoning happens. The diagram below shows the failure path: two runner types writing to the same hash slot, with one restoring the other’s incompatible artifact.

[Figure: Cache poisoning sequence diagram. An x86 runner writes a compiled artifact (esbuild, linux/amd64) to a shared cache key that lacks an architecture component. An ARM runner then restores the same key, receives the linux/amd64 binary, and fails at runtime with Exec format error. Fix: add runner.arch to the cache key.]
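The same failure path can be reproduced locally with a toy cache. This is a simulation only, assuming a temporary directory as a stand-in for the remote cache; the cache_put and cache_get helpers and the key names are hypothetical and not part of any real tool.

```shell
# Toy model of a shared cache: one file per key in a temporary directory.
# cache_put / cache_get are illustrative helpers, not real CLI commands.
CACHE_DIR="$(mktemp -d)"

cache_put() { printf '%s\n' "$2" > "$CACHE_DIR/$1"; }   # cache_put KEY CONTENT
cache_get() { cat "$CACHE_DIR/$1" 2>/dev/null; }        # cache_get KEY

# Key without an arch component: both runner types share one slot.
cache_put "nm-Linux-abc123" "esbuild linux/amd64"       # x86 runner saves first
poisoned="$(cache_get "nm-Linux-abc123")"               # ARM runner restores

# Key with an arch component: each runner type gets its own slot.
cache_put "nm-Linux-X64-abc123" "esbuild linux/amd64"
cache_put "nm-Linux-ARM64-abc123" "esbuild linux/arm64"
isolated="$(cache_get "nm-Linux-ARM64-abc123")"

echo "without arch, ARM restores: $poisoned"
echo "with arch, ARM restores:    $isolated"
```

With the arch-less key the ARM runner receives the amd64 entry; with the arch-aware key it only ever sees its own slot.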

Complete working example

The snippet below is a self-contained GitHub Actions job that covers the full remediation: key hardening, a post-restore integrity check, forced rebuild on poisoning detection, and scoped write permissions.

# .github/workflows/build.yml
name: Build (poison-resistant)

on:
  push:
    branches: [main, "release/**"]
  pull_request:

jobs:
  build:
    runs-on: ${{ matrix.runner }}
    strategy:
      matrix:
        runner: [ubuntu-latest, ubuntu-24.04-arm]  # test both arch classes

    steps:
      # 1. Check out with full history so Turborepo affected detection works
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-node@v4
        with:
          node-version: "20"

      # 2. Restore node_modules with a key that includes OS + arch + lockfile hash.
      #    Omitting runner.arch is a common poisoning root cause on mixed-architecture pools.
      - name: Restore node_modules cache
        id: nm-cache
        uses: actions/cache@v4
        with:
          path: |
            node_modules
            **/node_modules
          key: nm-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            nm-${{ runner.os }}-${{ runner.arch }}-

      # 3. Integrity gate: verify the SHA-256 manifest written at cache-save time.
      #    If it fails, we clear node_modules and force a clean install below.
      - name: Verify node_modules integrity
        id: integrity
        if: steps.nm-cache.outputs.cache-hit == 'true'
        run: |
          # --quiet suppresses the per-file OK lines; the exit status decides the outcome
          if [ -f node_modules/.cache-manifest.sha256 ] \
            && sha256sum --check --quiet node_modules/.cache-manifest.sha256; then
            echo "ok=true" >> "$GITHUB_OUTPUT"
          else
            # Missing manifest (cache predates this hardening) or digest mismatch: treat as a miss
            echo "ok=false" >> "$GITHUB_OUTPUT"
            rm -rf node_modules
          fi

      # 4. Install dependencies (skipped only when cache hit AND integrity passed)
      - name: Install dependencies
        if: steps.nm-cache.outputs.cache-hit != 'true' || steps.integrity.outputs.ok != 'true'
        run: npm ci --prefer-offline

      # 5. Write a fresh SHA-256 manifest after a clean install so future restores
      #    have something to verify against.
      - name: Write integrity manifest
        if: steps.nm-cache.outputs.cache-hit != 'true' || steps.integrity.outputs.ok != 'true'
        run: |
          # -L follows the symlinks npm creates in .bin; without it, -type f matches nothing
          find -L node_modules/.bin -type f -exec sha256sum {} + \
            > node_modules/.cache-manifest.sha256

      # 6. Turborepo remote cache.  Write access restricted to protected branches only.
      - name: Build via Turborepo
        env:
          TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
          TURBO_API: ${{ secrets.TURBO_API }}
          # Make PR/feature branches read-only consumers of the remote cache.
          # TURBO_REMOTE_CACHE_READ_ONLY is the variable Turborepo documents for this.
          TURBO_REMOTE_CACHE_READ_ONLY: ${{ github.ref_name == 'main' && 'false' || 'true' }}
        run: npx turbo run build --cache-dir=.turbo

Step-by-step walkthrough

Key derivation (step 2)

The runner.arch expression resolves to X64 or ARM64 depending on the GitHub-hosted runner class. Without it, the cache key is identical for both architectures and whichever runner writes first wins — poisoning the slot for any runner with a different native binary ABI.

The hashFiles('**/package-lock.json') component invalidates the cache automatically when any workspace lockfile changes, preventing silent dependency downgrades where node_modules lags behind a lockfile update.
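To make the key structure concrete, here is a minimal local sketch of the same derivation. It assumes GNU sha256sum is available; cache_key is a hypothetical helper, and its single-file digest is only a stand-in for hashFiles, which hashes every matching file and produces a different value.

```shell
# cache_key OS ARCH LOCKFILE: prints nm-<os>-<arch>-<sha256 of lockfile>.
# A single-file stand-in for hashFiles('**/package-lock.json'); the real
# function covers all matching files, so its digest will differ.
cache_key() (
  digest="$(sha256sum "$3" | cut -d' ' -f1)"
  printf 'nm-%s-%s-%s\n' "$1" "$2" "$digest"
)

lock="$(mktemp)"
printf '{"lockfileVersion": 3}\n' > "$lock"

key_x64="$(cache_key Linux X64 "$lock")"
key_arm="$(cache_key Linux ARM64 "$lock")"

# Changing the lockfile changes the digest, and with it the key.
printf '{"lockfileVersion": 3, "packages": {}}\n' > "$lock"
key_x64_after="$(cache_key Linux X64 "$lock")"

echo "$key_x64"
echo "$key_arm"
echo "$key_x64_after"
```

The first two keys differ only in the arch segment, and the third differs from the first only in the digest, which is the isolation the workflow relies on.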

Integrity gate (step 3)

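The sha256sum --check step re-hashes every file listed in .cache-manifest.sha256 and compares the result to the stored digests. A mismatch here means the tarball was corrupted in transit or a previous runner wrote content under a colliding key. The step clears node_modules and sets ok=false, which causes step 4 to run a full npm ci. Because actions/cache does not overwrite an entry that already exists under the same key, the bad entry stays in place until you delete it (for example with gh cache delete) or change the key prefix; until then every run repeats the clean install.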
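The gate's behavior is easy to confirm on a throwaway directory. The sketch below assumes GNU coreutils sha256sum; the file names and contents are made up for the demonstration and do not come from a real node_modules tree.

```shell
# Demonstrates sha256sum --check on a throwaway tree (not a real node_modules).
work="$(mktemp -d)"
cd "$work" || exit 1
mkdir -p node_modules/.bin
printf 'binary-v1\n' > node_modules/.bin/tool

# Record digests, as the save-time step would.
sha256sum node_modules/.bin/tool > node_modules/.cache-manifest.sha256

# Untouched tree: verification succeeds.
if sha256sum --check --quiet node_modules/.cache-manifest.sha256; then
  before=ok
else
  before=failed
fi

# Simulate a poisoned restore by changing the file's bytes.
printf 'binary-from-another-arch\n' > node_modules/.bin/tool
if sha256sum --check --quiet node_modules/.cache-manifest.sha256 >/dev/null 2>&1; then
  after=ok
else
  after=failed
fi

echo "before tampering: $before"
echo "after tampering:  $after"
```

Only the exit status is used, which is the same signal the workflow step branches on.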

Manifest writing (step 5)

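After a clean npm ci, we snapshot the SHA-256 fingerprints of the executables linked from node_modules/.bin. Scoping to .bin keeps the manifest small (tens of files rather than thousands) while covering the most failure-prone entry points: platform-specific CLI tools such as esbuild and turbo itself. Native addons (.node files, for example the binding that @swc/core loads) live elsewhere in node_modules and are not covered by this manifest, so add their paths to the find command if you depend on them.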
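One detail matters when hashing .bin: on Linux, npm normally creates those entries as symlinks into each package, so a find restricted to regular files sees nothing unless it follows links. The sketch below illustrates this on a fabricated tree, assuming GNU find and sha256sum; write_manifest and the pkg/cli names are hypothetical.

```shell
# Build a fake tree that mimics how npm usually links a CLI into .bin.
work="$(mktemp -d)"
cd "$work" || exit 1
mkdir -p node_modules/pkg/bin node_modules/.bin
printf '#!/bin/sh\necho hi\n' > node_modules/pkg/bin/cli
ln -s ../pkg/bin/cli node_modules/.bin/cli

# write_manifest: hash every regular file reachable through .bin, following
# symlinks, sorted by path so repeated runs produce identical manifests.
write_manifest() {
  find -L node_modules/.bin -type f -exec sha256sum {} + | sort -k2 \
    > node_modules/.cache-manifest.sha256
}

plain_count="$(find node_modules/.bin -type f | wc -l | tr -d ' ')"
write_manifest
manifest_count="$(wc -l < node_modules/.cache-manifest.sha256 | tr -d ' ')"

echo "entries found without -L: $plain_count"
echo "entries in manifest with -L: $manifest_count"
```

Sorting is optional for verification but makes manifests diffable between runs.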

Write-gate for Turborepo artifacts (step 6)

Setting TURBO_REMOTE_CACHE_READ_ONLY=true tells Turborepo to read from the remote cache but never write to it. This prevents feature branches from writing unvetted artifacts into the shared namespace, a common cross-branch contamination vector when working with Turborepo remote caching.
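The branch test behind the write gate can be sketched in isolation. cache_write_allowed is a hypothetical helper that mirrors the workflow's comparison against main; the ref names are examples, with 42/merge standing in for the ref name GitHub assigns to a pull_request run.

```shell
# cache_write_allowed REF_NAME: prints "true" only for the branch allowed to
# populate the shared cache. "main" mirrors the workflow's condition; the
# helper name is illustrative.
cache_write_allowed() {
  case "$1" in
    main) echo true ;;
    *)    echo false ;;
  esac
}

for ref in main release/1.2 feature/login 42/merge; do
  printf '%-14s write=%s\n' "$ref" "$(cache_write_allowed "$ref")"
done
```

Note that release branches are read-only under this rule even though the workflow builds them; widen the case pattern if they should also populate the cache.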


Purging a poisoned Turborepo artifact

When a poisoned artifact is already in remote storage, pull the affected hash from Turborepo’s verbose output and delete it directly:

# Re-run with verbose output to surface task hashes (Turborepo prints 16 hex characters per task)
npx turbo run build --verbosity=2 2>&1 | grep -E "cache (miss|hit)|[0-9a-f]{16}" | head -30

# Delete from S3-compatible backend (the object key layout depends on your cache server; adjust the path to match)
POISONED_HASH="<paste hash here>"
aws s3 rm "s3://${TURBO_CACHE_BUCKET}/${POISONED_HASH}.tar.gz"

# Delete from GitHub Actions cache by key (if using actions/cache for .turbo)
gh cache delete --repo "${GITHUB_REPOSITORY}" "turbo-${POISONED_HASH}"

For a self-hosted turborepo-remote-cache server, /v8/artifacts/<hash> is the HTTP route clients use to fetch the artifact, not a path on disk. Find the matching object under the server’s configured storage location, delete it, and restart the server if it keeps any in-memory reference.
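When the log is long, pulling out the candidate hashes mechanically is less error-prone than copying them by hand. The sketch below assumes task hashes are 16 lowercase hex characters and that log lines look roughly like the sample; both are assumptions about turbo's output, and extract_task_hashes is a hypothetical helper.

```shell
# extract_task_hashes: reads log text on stdin and prints the unique
# 16-character hex strings it finds, one per line.
extract_task_hashes() {
  grep -oE '\b[0-9a-f]{16}\b' | sort -u
}

# Illustrative log excerpt; real output may be formatted differently.
sample_log='web:build: cache hit, replaying logs 3f9a1c0b7d2e4a68
api:build: cache miss, executing 0b1e2d3c4f5a6978
web:build: cache hit, replaying logs 3f9a1c0b7d2e4a68'

printf '%s\n' "$sample_log" | extract_task_hashes
```

In practice, pipe the saved log of the failing run into the function and compare the hashes attached to cache hit lines against the artifacts in storage.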


Verification

After the fix is in place, confirm parity across runner architectures:

# 1. Confirm cache key includes arch — look for X64 or ARM64 in the key string
gh cache list --repo "${GITHUB_REPOSITORY}" --limit 20 | grep -E 'X64|ARM64'

# 2. Verify the restored binary matches the current runner (-L follows the .bin symlink)
file -L node_modules/.bin/esbuild
# Expected on ARM: ELF 64-bit LSB executable, ARM aarch64
# Expected on x86: ELF 64-bit LSB executable, x86-64

# 3. Confirm no phantom dependency mismatches after restore
npm ls --depth=0 2>&1 | grep -E 'UNMET|invalid' || echo "Dependency tree is consistent"

# 4. Spot-check Turborepo sees a clean cache miss after purge (not a poisoned hit)
TURBO_REMOTE_CACHE_READ_ONLY=true npx turbo run build --verbosity=2 2>&1 \
  | grep -iE "cache (miss|hit)|FULL TURBO"

Expected output after successful remediation:

node_modules/.bin/esbuild: ELF 64-bit LSB executable, ARM aarch64 ...
Dependency tree is consistent
  my-app:build: cache miss, executing ...

A cache miss on the first post-purge run is correct; the subsequent run should show cache hit from a freshly written, architecture-correct artifact.
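The architecture comparison in check 2 can be automated so a mismatch fails the job instead of relying on someone reading the output. This is a minimal sketch: binary_matches_host is a hypothetical helper, it handles only the two architectures discussed here, and the sample description string is illustrative.

```shell
# binary_matches_host FILE_DESCRIPTION MACHINE: succeeds when the `file`
# description names the architecture that `uname -m` reports. Anything other
# than x86_64 / aarch64 / arm64 is treated as a mismatch.
binary_matches_host() {
  case "$2" in
    x86_64)        pattern='x86-64' ;;
    aarch64|arm64) pattern='aarch64' ;;
    *)             return 1 ;;
  esac
  printf '%s' "$1" | grep -q "$pattern"
}

# On a runner you would call:
#   binary_matches_host "$(file -Lb node_modules/.bin/esbuild)" "$(uname -m)"
desc='ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked'
if binary_matches_host "$desc" aarch64; then echo "arm runner: match"; fi
if ! binary_matches_host "$desc" x86_64; then echo "x86 runner: MISMATCH"; fi
```

Taking the description as a string keeps the function testable without a real binary on disk.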


Common pitfalls

Pitfall 1 — Restore keys fall back to a cross-arch entry

restore-keys act as progressively broader fallbacks. If your restore key hierarchy includes a prefix that matches an entry from a different architecture, Actions will restore it:

# Dangerous: the fallback key has no arch component
restore-keys: |
  nm-${{ runner.os }}-

Fix: make every restore-key level architecture-aware:

restore-keys: |
  nm-${{ runner.os }}-${{ runner.arch }}-
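The effect of the two prefixes can be simulated with plain string matching. The sketch below is a simplification: first_prefix_match is a hypothetical helper that returns the first entry starting with the prefix, whereas the real cache service also weighs entry age and branch scope when several entries match. The entry names are invented.

```shell
# first_prefix_match PREFIX ENTRY...: prints the first entry that starts with
# PREFIX and succeeds, or prints nothing and fails.
first_prefix_match() (
  prefix="$1"; shift
  for entry in "$@"; do
    case "$entry" in
      "$prefix"*) printf '%s\n' "$entry"; exit 0 ;;
    esac
  done
  exit 1
)

# Entries already cached, seen from an ARM64 runner whose exact key missed.
loose="$(first_prefix_match 'nm-Linux-' 'nm-Linux-X64-aaa111' 'nm-Linux-X64-bbb222')" || true
strict="$(first_prefix_match 'nm-Linux-ARM64-' 'nm-Linux-X64-aaa111' 'nm-Linux-X64-bbb222')" || true

echo "arch-less fallback restores:  ${loose:-nothing}"
echo "arch-aware fallback restores: ${strict:-nothing}"
```

The arch-less prefix hands the ARM64 runner an X64 entry, while the arch-aware prefix correctly finds nothing and lets the install run.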

Pitfall 2 — Turborepo outputs globs too broad

When outputs in turbo.json capture build artifacts that include native binaries, those binaries are packed into the Turborepo artifact and restored on every cache hit, whatever the architecture of the runner. A too-broad glob such as "outputs": ["**/*"] can inadvertently pull platform-specific shared objects into a cache entry expected to be portable:

// turbo.json — scope outputs tightly
{
  "tasks": {
    "build": {
      "outputs": ["dist/**", ".next/**", "!**/*.node"]
    }
  }
}

The !**/*.node exclusion keeps native addon .node files out of the Turborepo artifact entirely; let the platform-scoped node_modules cache handle them.
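Before relying on the exclusion, it can help to check whether a build emits native addons into its output directories at all. The sketch below assumes the outputs live in directories like dist and .next; list_native_addons is a hypothetical helper, and the demo tree is fabricated.

```shell
# list_native_addons DIR...: prints any *.node files under the given output
# directories, sorted. Directories that do not exist are ignored.
list_native_addons() {
  find "$@" -type f -name '*.node' 2>/dev/null | sort
}

# Fabricated build output containing one native addon.
work="$(mktemp -d)"
mkdir -p "$work/dist/lib" "$work/.next/server"
: > "$work/dist/index.js"
: > "$work/dist/lib/binding.node"

found="$(list_native_addons "$work/dist" "$work/.next")"
if [ -n "$found" ]; then
  echo "native addons present in build outputs:"
  echo "$found"
fi
```

Run against real build output, an empty result suggests the exclusion is a safeguard rather than an active filter for that package.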

Pitfall 3 — Branch protection race at cache write time

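Two PRs merging within seconds can both attempt to write to main’s remote cache slot simultaneously, and depending on the storage backend the later upload may replace the earlier one. Turborepo’s --concurrency=1 flag only serializes tasks inside a single turbo invocation; it does not coordinate separate workflow runs. To serialize the writes themselves, route them through a dedicated job guarded by a GitHub Actions concurrency group:

# Dedicated cache-population job, runs after all parallel test jobs finish
cache-populate:
  needs: [test-matrix]
  if: github.ref_name == 'main'
  runs-on: ubuntu-latest
  # Only one run in this group executes at a time; a newer run queues behind it.
  # The group name is arbitrary.
  concurrency:
    group: turbo-cache-populate
    cancel-in-progress: false
  steps:
    - uses: actions/checkout@v4
    # Install dependencies so npx resolves the repo's pinned turbo
    - run: npm ci
    - run: npx turbo run build --cache-dir=.turbo
      env:
        TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
        TURBO_API: ${{ secrets.TURBO_API }}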
