Incremental Builds and Affected Detection in Monorepos

PR validation pipelines in a growing monorepo routinely rebuild every workspace regardless of what changed — burning runner minutes, delaying developer feedback, and masking legitimate failures in noise. Affected detection solves this by computing a directed acyclic graph (DAG) of workspace dependencies, diffing the change set against it, and dispatching only the tasks whose transitive inputs actually changed. This guide covers the full production path: graph construction, CI integration, remote cache synchronisation, and the failure modes that corrupt results at scale.


Figure: Incremental build pipeline flow. A git diff (base..HEAD) feeds the dependency graph (DAG) traversal, which produces the affected set (workspaces A and C in the example). Only those workspaces are routed through the lint, test, and build tasks in CI; unaffected workspaces are skipped.

Prerequisites

Before wiring affected detection into CI, confirm the following are in place:


How the Dependency Graph and Affected Scope Work

Both Nx and Turborepo parse workspace manifests and build a DAG where each node is a project and each directed edge is a dependencies / devDependencies declaration pointing to another workspace package. When a change set arrives (a list of file paths from git diff base..HEAD), the tool walks the graph upward — any project that owns a changed file, plus any project that depends on it transitively, is marked affected.
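The traversal described above can be sketched in a few lines. This is a minimal illustration, not the implementation either tool uses: the types, the function names (ownersOf, affectedProjects), and the directory layout in exampleRoots are all assumptions made for the example.

```typescript
// Hypothetical data shapes for illustration; not an Nx or Turborepo API.
type DependencyGraph = Record<string, string[]>; // project -> workspaces it depends on
type ProjectRoots = Record<string, string>;      // project -> root directory

// Map each changed file to the project whose root directory contains it.
function ownersOf(changedFiles: string[], roots: ProjectRoots): Set<string> {
  const owners = new Set<string>();
  for (const file of changedFiles) {
    for (const project of Object.keys(roots)) {
      if (file.startsWith(roots[project] + "/")) owners.add(project);
    }
  }
  return owners;
}

// Changed projects plus every project that depends on them, directly or transitively.
function affectedProjects(graph: DependencyGraph, changed: Set<string>): Set<string> {
  // Invert the edges so the walk goes from a dependency to its dependents.
  const dependents: Record<string, string[]> = {};
  for (const project of Object.keys(graph)) {
    for (const dep of graph[project]) {
      if (!dependents[dep]) dependents[dep] = [];
      dependents[dep].push(project);
    }
  }
  const result = new Set<string>(Array.from(changed));
  const queue = Array.from(changed);
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents[current] || []) {
      if (!result.has(dependent)) {
        result.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return result;
}

// Assumed layout: one app that consumes two packages.
const exampleGraph: DependencyGraph = {
  "@acme/web": ["@acme/ui", "@acme/api-client"],
  "@acme/ui": [],
  "@acme/api-client": [],
};
const exampleRoots: ProjectRoots = {
  "@acme/web": "apps/web",
  "@acme/ui": "packages/ui",
  "@acme/api-client": "packages/api-client",
};

const changedOwners = ownersOf(["packages/ui/src/button.tsx"], exampleRoots);
console.log(Array.from(affectedProjects(exampleGraph, changedOwners)).sort());
// [ '@acme/ui', '@acme/web' ]
```

A change inside packages/ui marks @acme/ui and its consumer @acme/web, while @acme/api-client is left alone because nothing on its side of the graph changed.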

The critical constraint: the graph reflects declared dependencies only. A dynamic import() evaluated at runtime, or a path alias that bypasses the manifest, is invisible to the static analyser. This is why teams maintain strict eslint-plugin-import rules and explicit outputs declarations in turbo.json — undeclared edges are the root cause of most false-negative affected misses.

The base commit for the diff matters enormously. In a PR pipeline the correct base is git merge-base HEAD origin/main, not origin/main directly. Diffing against the tip of origin/main also picks up every commit that landed on main after the branch diverged, producing a wider diff and triggering unnecessary rebuilds.
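To make the difference concrete, here is a toy model of a branch that forked from main two commits ago. The commit names, file paths, and the simplified mergeBase function are all hypothetical; real git uses a more careful algorithm, and this sketch only holds for small histories where no file is touched twice.

```typescript
// Toy history (hypothetical SHAs): main is m1 <- m2 <- m3; the feature branch
// forked at m1 and added f1 and f2.
type ToyCommit = { parents: string[]; files: string[] };
const toyHistory: Record<string, ToyCommit> = {
  m1: { parents: [], files: ["README.md"] },
  m2: { parents: ["m1"], files: ["packages/ui/src/button.tsx"] },
  m3: { parents: ["m2"], files: ["packages/api-client/src/index.ts"] },
  f1: { parents: ["m1"], files: ["apps/web/src/page.tsx"] },
  f2: { parents: ["f1"], files: ["apps/web/src/layout.tsx"] },
};

// All commits reachable from `start`, nearest first.
function reachable(start: string): string[] {
  const seen: string[] = [];
  const queue = [start];
  while (queue.length > 0) {
    const sha = queue.shift() as string;
    if (seen.indexOf(sha) !== -1) continue;
    seen.push(sha);
    queue.push(...toyHistory[sha].parents);
  }
  return seen;
}

// Simplified merge-base: the nearest commit reachable from both tips.
function mergeBase(a: string, b: string): string | undefined {
  const fromB = reachable(b);
  return reachable(a).find((sha) => fromB.indexOf(sha) !== -1);
}

// Files that differ between two tips: those touched by commits reachable
// from exactly one of them.
function diffFiles(base: string, head: string): string[] {
  const fromBase = reachable(base);
  const fromHead = reachable(head);
  const onlyOne = fromBase
    .filter((sha) => fromHead.indexOf(sha) === -1)
    .concat(fromHead.filter((sha) => fromBase.indexOf(sha) === -1));
  const files: string[] = [];
  for (const sha of onlyOne) files.push(...toyHistory[sha].files);
  return files.sort();
}

const forkPoint = mergeBase("f2", "m3") as string; // "m1"
console.log(diffFiles(forkPoint, "f2")); // the two apps/web files only
console.log(diffFiles("m3", "f2"));      // apps/web files plus both packages
```

With the fork point as base, only the branch's own two files appear. With the tip of main as base, the packages changed on main show up as well, so @acme/ui and @acme/api-client would be rebuilt for no reason.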


Step-by-Step Implementation

1. Initialize workspace dependency declarations

Verify that every internal dependency is declared explicitly in each workspace’s package.json:

{
  "name": "@acme/web",
  "dependencies": {
    "@acme/ui": "workspace:*",
    "@acme/api-client": "workspace:*"
  }
}

Run the graph visualiser to confirm edges are resolved:

# Nx
npx nx graph --file=graph.json && cat graph.json | jq '.graph.nodes | keys'

# Turborepo
npx turbo run build --dry=json | jq '.tasks[].package'

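These commands list the projects and packages the tool has discovered. If a workspace you expected to appear is missing, check that its directory is covered by the workspace globs and that every consumer declares it in its own manifest.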
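Listing node names confirms discovery but not edges. A small checker like the one below can assert that specific edges exist. It assumes graph.json contains a graph.dependencies map of { source, target } records, which matches recent Nx output as far as I know but should be verified against your version; missingEdges and the sample data are hypothetical.

```typescript
// Assumed shape of graph.json; verify against the file your Nx version writes.
type NxGraphFile = {
  graph: {
    nodes: Record<string, unknown>;
    dependencies: Record<string, { source: string; target: string }[]>;
  };
};

// Returns the expected "source -> target" edges that the graph does not contain.
function missingEdges(file: NxGraphFile, expected: [string, string][]): string[] {
  const missing: string[] = [];
  for (const [source, target] of expected) {
    const edges = file.graph.dependencies[source] || [];
    if (!edges.some((edge) => edge.target === target)) {
      missing.push(source + " -> " + target);
    }
  }
  return missing;
}

// Inline sample standing in for the parsed contents of graph.json.
// Here @acme/web forgot to declare @acme/api-client.
const sampleGraphFile: NxGraphFile = {
  graph: {
    nodes: { "@acme/web": {}, "@acme/ui": {}, "@acme/api-client": {} },
    dependencies: {
      "@acme/web": [{ source: "@acme/web", target: "@acme/ui" }],
      "@acme/ui": [],
      "@acme/api-client": [],
    },
  },
};

console.log(
  missingEdges(sampleGraphFile, [
    ["@acme/web", "@acme/ui"],
    ["@acme/web", "@acme/api-client"],
  ])
);
// [ '@acme/web -> @acme/api-client' ]
```

Running a check of this kind in CI turns a silently missing edge into a visible failure instead of a skipped rebuild.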

2. Configure base-commit tracking

Export the merge-base SHA as an environment variable that all subsequent steps share. This is the single most important configuration decision — an incorrect base produces unreliable results:

# Compute the true divergence point from the target branch
BASE=$(git merge-base HEAD origin/main)
echo "BASE_SHA=$BASE" >> "$GITHUB_ENV"   # GitHub Actions
# or
echo "BASE_SHA=$BASE" >> build.env       # GitLab artifact env

3. Wire GitHub Actions

name: Monorepo — Affected CI
on:
  pull_request:
    branches: [main]

jobs:
  affected:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history required for merge-base calculation

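      # pnpm must be on PATH before setup-node can locate the pnpm store for caching
      - uses: pnpm/action-setup@v4   # reads the version from "packageManager" in package.json

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm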

      - run: corepack enable && pnpm install --frozen-lockfile

      # Restore Nx / Turborepo local cache between runs
      - uses: actions/cache@v4
        with:
          path: |
            .nx/cache
            .turbo
          key: ${{ runner.os }}-nx-${{ hashFiles('**/pnpm-lock.yaml') }}-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-nx-${{ hashFiles('**/pnpm-lock.yaml') }}-
            ${{ runner.os }}-nx-

      # Nx: run lint + test + build only on affected projects
      - name: Nx affected
        run: |
          BASE=$(git merge-base HEAD origin/${{ github.base_ref }})
          npx nx affected \
            --target=lint,test,build \
            --base=$BASE \
            --head=${{ github.sha }} \
            --parallel=4
        env:
          NX_CLOUD_ACCESS_TOKEN: ${{ secrets.NX_CLOUD_ACCESS_TOKEN }}

Line-by-line explanation:

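  • fetch-depth: 0 — without full history, git merge-base cannot find a common ancestor and fails or returns nothing.
  • hashFiles('**/pnpm-lock.yaml') — scopes the task cache to one lockfile state. The trailing github.sha makes every commit save a fresh entry, and restore-keys falls back to the most recent cache saved for the same lockfile.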
  • --target=lint,test,build — a single affected invocation fans out across all three targets; Nx respects dependsOn ordering automatically.
  • --parallel=4 — caps concurrent task processes; tune to available runner vCPUs.
  • NX_CLOUD_ACCESS_TOKEN — authenticates the run against Nx Cloud for remote caching; distributed task execution additionally requires agent configuration.

Verify the step produces the expected project list before running tasks:

npx nx show projects --affected --base=$BASE --head=${{ github.sha }}

4. Wire GitLab CI

In standard merge request pipelines GitLab does not populate the target branch tip SHA (CI_MERGE_REQUEST_TARGET_BRANCH_SHA is set only for merged results pipelines). Use git merge-base against the fetched target branch:

stages: [affected, build]

.affected_base: &affected_base
  image: node:20-alpine
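  before_script:
    # node:20-alpine ships without git; install it before any git command
    - apk add --no-cache git
    # --unshallow errors on an already complete clone, so fall back to a plain fetch
    - git fetch --unshallow origin $CI_MERGE_REQUEST_TARGET_BRANCH_NAME || git fetch origin $CI_MERGE_REQUEST_TARGET_BRANCH_NAME
    - export BASE=$(git merge-base HEAD origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME)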
  cache:
    key:
      files: [pnpm-lock.yaml]
    paths: [.nx/cache, node_modules/.cache]

lint-and-test:
  <<: *affected_base
  stage: affected
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    - corepack enable && pnpm install --frozen-lockfile
    - npx nx affected --target=lint,test --base=$BASE --head=$CI_COMMIT_SHA --parallel=4

build:
  <<: *affected_base
  stage: build
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    - corepack enable && pnpm install --frozen-lockfile
    - npx nx affected --target=build --base=$BASE --head=$CI_COMMIT_SHA --parallel=4
  artifacts:
    paths: [dist/]
    expire_in: 1 day

Line-by-line explanation:

  • --unshallow — GitLab shallow clones omit ancestry required for merge-base; this restores it.
  • cache.key.files — generates a unique cache key per lockfile state, preventing cross-branch pollution.
  • CI_PIPELINE_SOURCE == "merge_request_event" — gates execution to MR pipelines only; branch pipelines on main should run a full build via a separate job.
  • --parallel=4 — GitLab shared runners provide 2 vCPUs; 4 keeps the queue saturated without OOM.

Verify affected output before committing to the full pipeline:

npx nx show projects --affected --base=$BASE --head=$CI_COMMIT_SHA

5. Integrate remote caching

For Turborepo remote caching, inject the three required environment variables and enable remote-only mode on PR builds:

env:
  TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
  TURBO_TEAM:  ${{ vars.TURBO_TEAM }}
  TURBO_API:   ${{ vars.TURBO_API }}

- name: Build affected (Turborepo)
  run: |
    BASE=$(git merge-base HEAD origin/${{ github.base_ref }})
    npx turbo run build \
      --filter="...[$BASE]" \
      --remote-only \
      --cache-dir=.turbo

The --filter="...[<ref>]" syntax is Turborepo’s native affected scope: [<ref>] selects the workspaces with changes since that ref, and the leading ... adds every workspace that depends on them. Passing the merge-base SHA instead of origin/main keeps the comparison point the same as in the Nx jobs.


Configuration Reference

| Option | Type | Default | Effect |
| --- | --- | --- | --- |
| --base (Nx) | SHA string | defaultBase from nx.json (main if unset) | Oldest commit included in the diff; set to merge-base in PRs |
| --head (Nx) | SHA string | HEAD | Newest commit included in the diff |
| --parallel (Nx) | integer | 3 | Max concurrent task processes per runner |
| --target (Nx) | string / list | none | Task(s) to run across affected projects |
| --filter (Turborepo) | glob / range | none | Workspace scope; ...[origin/main] means workspaces changed since main plus their dependents |
| --remote-only (Turborepo) | flag | off | Ignore the local cache; read and write artifacts through the remote cache only |
| remoteCache.enabled (turbo.json) | boolean | true | Set to false to disable all remote cache operations |
| NX_CLOUD_ACCESS_TOKEN | env var | none | Auth token for the Nx Cloud remote cache |

Integration with Sibling Topics

Incremental builds do not operate in isolation. They are most effective when combined with three adjacent capabilities from the Build Optimization & Caching Strategies pillar:

Remote Build Caching: Affected detection narrows which tasks run; remote caching eliminates re-running those tasks when artifacts already exist from a previous identical input hash. Without remote cache, every new runner restarts the affected tasks from scratch even if the source code is unchanged from a sibling branch.

Docker Layer Caching for Full-Stack Applications: When containerised runners produce Docker images as build artifacts, align Dockerfile COPY directives with workspace boundaries so that only the layers corresponding to affected workspaces are invalidated. Mount the .nx/cache or .turbo directory into the build container so that tasks with cached outputs are not re-executed inside the image build.

Nx affected PR gating: The sibling deep-dive covers Nx-specific configuration in detail — branch protection rules, required status check wiring, and handling the edge case where no projects are affected (the empty-affected guard).

For tracking the financial impact of incremental builds, feed affected-count metrics into CI/CD compute cost tracking to demonstrate ROI to stakeholders.


Performance Benchmarks

Field data from platform teams running Nx or Turborepo affected pipelines on mid-sized monorepos (20–60 workspaces, active team of 10–30 engineers):

| Metric | Full-rebuild baseline | Affected pipeline | Change |
| --- | --- | --- | --- |
| Median PR build time | 14 min | 3.5 min | −75 % |
| p95 PR build time | 28 min | 9 min | −68 % |
| Runner minutes / day | ~840 | ~210 | −75 % |
| Remote cache hit rate (steady state) | n/a | 72–85 % | |
| Graph init overhead | | 8–15 s added | fixed cost |
The graph initialisation overhead (8–15 s) is a fixed cost that dominates on very small PRs. For single-file hotfixes touching a standalone utility package, the overhead can exceed the build time — acceptable given the savings across the broader PR queue.
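The Change column and the overhead trade-off are simple arithmetic. The sketch below reproduces the table's percentages and checks when graph initialisation outweighs the build itself; the function names and the 6-second build time are illustrative assumptions.

```typescript
// Relative change from a baseline, as a rounded percentage.
function percentChange(baseline: number, current: number): number {
  return Math.round(((current - baseline) / baseline) * 100);
}

// True when the fixed graph overhead is larger than the build it precedes.
function overheadDominates(buildSeconds: number, graphOverheadSeconds: number): boolean {
  return graphOverheadSeconds > buildSeconds;
}

console.log(percentChange(14, 3.5));  // -75  (median PR build time)
console.log(percentChange(28, 9));    // -68  (p95 PR build time)
console.log(percentChange(840, 210)); // -75  (runner minutes per day)

// Hypothetical hotfix: a 6 s utility build behind a 15 s graph initialisation.
console.log(overheadDominates(6, 15)); // true
```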

Remote cache hit rates plateau at 72–85 % after ~2 weeks of steady-state traffic, once the most common input hashes have been populated. Hit rates below 60 % typically indicate environment drift (mismatched Node.js versions or OS between runners).
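If your cache backend reports hit and miss counts, the thresholds above can be turned into a simple health check. This is a sketch under that assumption; the labels, the function names, and the sample counts are invented for illustration.

```typescript
// Hit rate as a percentage of all cache lookups.
function hitRatePercent(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : (hits / total) * 100;
}

type CacheHealth = "possible environment drift" | "below plateau" | "steady state or better";

// Thresholds taken from the figures in this section: under 60 % suggests
// drift, and 72 % is the lower edge of the steady-state plateau.
function classifyHitRate(percent: number): CacheHealth {
  if (percent < 60) return "possible environment drift";
  if (percent < 72) return "below plateau";
  return "steady state or better";
}

console.log(classifyHitRate(hitRatePercent(780, 220))); // steady state or better
console.log(classifyHitRate(hitRatePercent(450, 550))); // possible environment drift
```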


Troubleshooting

Cannot find module '@acme/ui' after workspace install

Root cause: pnpm install --frozen-lockfile ran before internal packages were built; a consumer workspace imports the compiled output that does not yet exist.

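Fix: Add an explicit dependsOn so the dependency’s build task runs before the consumer’s. The snippet below uses the turbo.json shape (a top-level tasks key); in nx.json the same dependsOn entry goes under targetDefaults.build: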

{
  "tasks": {
    "build": {
      "dependsOn": ["^build"]
    }
  }
}

The ^ prefix means “the build tasks of every workspace this one depends on”, so dependencies are always built before their consumers.
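The ordering that "^build" implies is a topological sort of the workspace graph. The sketch below shows the idea with a depth-first walk; buildOrder and the sample graph are hypothetical, and real task runners also execute independent tasks in parallel rather than strictly in sequence.

```typescript
type WorkspaceDeps = Record<string, string[]>; // workspace -> internal dependencies

// Depth-first post-order: a workspace is emitted only after everything it depends on.
function buildOrder(deps: WorkspaceDeps): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  function visit(workspace: string): void {
    if (state[workspace] === "done") return;
    if (state[workspace] === "visiting") {
      throw new Error("Dependency cycle at " + workspace);
    }
    state[workspace] = "visiting";
    for (const dep of deps[workspace] || []) visit(dep);
    state[workspace] = "done";
    order.push(workspace);
  }

  for (const workspace of Object.keys(deps)) visit(workspace);
  return order;
}

const sampleDeps: WorkspaceDeps = {
  "@acme/web": ["@acme/ui", "@acme/api-client"],
  "@acme/ui": [],
  "@acme/api-client": [],
};

console.log(buildOrder(sampleDeps));
// [ '@acme/ui', '@acme/api-client', '@acme/web' ]
```

Because @acme/web is emitted last, the compiled output of @acme/ui exists by the time the consumer imports it.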

merge-base returns empty string in GitLab

Error text: fatal: Not a valid object name ''

Root cause: The pipeline checked out a shallow clone. git merge-base requires ancestry data that shallow clones omit.

Fix:

git fetch --unshallow origin $CI_MERGE_REQUEST_TARGET_BRANCH_NAME

Add this to before_script before computing BASE.

Affected set includes every workspace on every PR

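Root cause: A root-level file (package.json, nx.json, turbo.json, .eslintrc.js) changed, causing the entire graph to be marked affected. Alternatively, --base points at the wrong commit, such as the repository's initial commit or a HEAD~1 that no longer matches the branch point after a squash merge, so the diff spans far more files than the PR touched.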

Fix: Exclude root config changes from the affected expansion using project-level tags and targetDefaults to scope impact. For release-config files, accept the full rebuild — correctness trumps speed here.
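When diagnosing this, it helps to log which root-level files in the change set triggered the full rebuild. A minimal sketch follows; the file list comes from the root cause above, and the function name is a made-up helper rather than part of either tool.

```typescript
// Root-level files named in this section; extend to match your repository.
const GLOBAL_FILES = ["package.json", "nx.json", "turbo.json", ".eslintrc.js"];

// Returns the changed files that sit at the repository root and count as global config.
// Paths are matched exactly, so apps/web/package.json is not treated as global.
function globalConfigChanges(changedFiles: string[]): string[] {
  return changedFiles.filter((file) => GLOBAL_FILES.indexOf(file) !== -1);
}

const culprits = globalConfigChanges([
  "apps/web/package.json",
  "apps/web/src/page.tsx",
  "turbo.json",
]);
console.log(culprits); // [ 'turbo.json' ]
```

An empty result combined with a fully affected graph points towards the second root cause, a wrong --base.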

Cache hit rate drops below 50 % after adding a new runner pool

Root cause: The new runners run a different OS or Node.js patch version; the cache key includes the runner OS (runner.os in GitHub Actions), which differs.

Fix: Normalise all runners to the same base image. Include OS and Node version in the cache key explicitly:

key: ${{ runner.os }}-node20-nx-${{ hashFiles('**/pnpm-lock.yaml') }}

Frequently Asked Questions

How does affected detection handle dynamic imports or runtime-only dependencies?

Static analysis in both Nx and Turborepo can miss dynamic import() calls and path aliases that bypass workspace manifests. The practical mitigation is to enforce eslint-plugin-import rules at lint time: no-extraneous-dependencies flags imports of packages that are missing from the workspace's package.json, while no-unresolved and no-cycle catch broken and circular imports. For high-risk modules (shared config, design tokens), list them under implicitDependencies in the consuming projects' Nx configuration so that any change to them always marks those consumers affected.

What is the recommended cache TTL for monorepo build artifacts?

7–14 days with LRU eviction works well for active monorepos. Shorter TTLs (3 days) suit repos with high branch churn where stale artifacts from dead branches consume disproportionate storage. Longer TTLs (30 days) make sense for release branches where the same artifact hashes recur frequently. Pair TTL with a maximum total storage budget and an automated pruning script rather than relying on TTL alone.
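A pruning script that pairs TTL with a storage budget could select victims roughly as follows. This is a sketch assuming each cache entry records its size and last access time; the CacheEntry shape, selectForPruning, and the sample numbers are hypothetical.

```typescript
type CacheEntry = { hash: string; sizeBytes: number; lastAccessedMs: number };

// Select entries to delete: everything past the TTL first, then the least
// recently used survivors until the remainder fits the storage budget.
function selectForPruning(
  entries: CacheEntry[],
  nowMs: number,
  ttlMs: number,
  budgetBytes: number
): string[] {
  const doomed: string[] = [];
  const kept: CacheEntry[] = [];
  for (const entry of entries) {
    if (nowMs - entry.lastAccessedMs > ttlMs) doomed.push(entry.hash);
    else kept.push(entry);
  }
  kept.sort((a, b) => a.lastAccessedMs - b.lastAccessedMs); // oldest access first
  let total = kept.reduce((sum, entry) => sum + entry.sizeBytes, 0);
  for (const entry of kept) {
    if (total <= budgetBytes) break;
    doomed.push(entry.hash);
    total -= entry.sizeBytes;
  }
  return doomed;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const sampleEntries: CacheEntry[] = [
  { hash: "a", sizeBytes: 400, lastAccessedMs: 29 * DAY_MS }, // used 1 day ago
  { hash: "b", sizeBytes: 400, lastAccessedMs: 20 * DAY_MS }, // used 10 days ago
  { hash: "c", sizeBytes: 400, lastAccessedMs: 10 * DAY_MS }, // used 20 days ago
];

// 14-day TTL and a 500-byte budget, evaluated on day 30.
console.log(selectForPruning(sampleEntries, 30 * DAY_MS, 14 * DAY_MS, 500));
// [ 'c', 'b' ]
```

Entry c is dropped by TTL, and b is dropped because a and b together still exceed the budget and b was used less recently.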

When should I disable incremental builds in CI?

Disable for merge commits to main or release/*: run a full build to guarantee artifact consistency and exercise all integration tests. Also disable when the dependency graph integrity is in question: after a major dependency upgrade, after a significant workspace restructure, or when you see divergence between local and CI build outputs. A scheduled full rebuild of main (weekly, for example) also serves as a canary for infrastructure drift.

How do I ensure environment parity between local and CI runners?

Pin Node.js via .nvmrc or .tool-versions and commit the file. Use a containerised runner image that hard-codes the same Node version. Include process.env.NODE_VERSION and process.env.npm_config_cache in the cache key fingerprint so a mismatch produces a cache miss rather than a poisoned hit. Validate parity regularly by comparing node --version output across local, PR, and release runner logs.
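The fingerprint idea reduces to building the cache key from every input that can change the build output. The sketch below takes those inputs as plain arguments so it stays self-contained; the RunnerEnv shape, the cacheKey helper, and the sample values are assumptions, and in a real pipeline the values would come from the runner environment and a lockfile hash.

```typescript
type RunnerEnv = { os: string; nodeVersion: string; lockfileHash: string };

// Compose a key in which any environment difference changes the result,
// so mismatched runners miss the cache instead of sharing artifacts.
function cacheKey(env: RunnerEnv): string {
  return [env.os, "node" + env.nodeVersion, "nx", env.lockfileHash].join("-");
}

const ciRunner: RunnerEnv = { os: "Linux", nodeVersion: "20.11.1", lockfileHash: "abc123" };
const driftedRunner: RunnerEnv = { os: "Linux", nodeVersion: "20.12.0", lockfileHash: "abc123" };

console.log(cacheKey(ciRunner));      // Linux-node20.11.1-nx-abc123
console.log(cacheKey(driftedRunner)); // Linux-node20.12.0-nx-abc123
console.log(cacheKey(ciRunner) === cacheKey(driftedRunner)); // false
```

Including the full Node version trades some hit rate for safety: a patch-level difference produces a miss rather than a potentially poisoned hit.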

