dbt at Scale: Lessons From a 5,000-Model Project

Oct 08, 2025 5 min

dbt makes the first hundred models easy. The next four thousand are software engineering.

dbt is the de facto transformation tool for the modern data warehouse. A project's first hundred models are friction-free. Past a thousand, the project starts behaving like the software project it always was. We recently helped scale a dbt codebase to more than 5,000 models. Here is what survived.

Project structure: layered, not flat

The standard dbt structure (staging / intermediate / marts) holds up to about 200 models. Past that, you need a deeper hierarchy. Separate raw, cleaned, conformed, and presentation directories. Inside each, add sub-folders by source system, and inside those, sub-folders by entity. The goal is that anyone can navigate to a model from its layer, source, and entity alone.
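A layout like this is only useful if it is enforced. Below is a minimal sketch of a check you could run in CI. It assumes every model lives at exactly models/<layer>/<source_system>/<entity>/<model>.sql, using the four layer names above; the function names are hypothetical, and a real project would probably relax the depth rule for conformed or presentation models that combine several sources.

```python
from pathlib import Path, PurePosixPath

# Layer directories named in the text; the strict 5-part depth rule is an assumption.
LAYERS = ("raw", "cleaned", "conformed", "presentation")


def misplaced_models(paths: list[str]) -> list[str]:
    """Return model paths that do not match models/<layer>/<source>/<entity>/<model>.sql."""
    bad = []
    for p in paths:
        parts = PurePosixPath(p).parts
        ok = (
            len(parts) == 5
            and parts[0] == "models"
            and parts[1] in LAYERS
            and parts[-1].endswith(".sql")
        )
        if not ok:
            bad.append(p)
    return bad


def find_model_paths(project_root: str = ".") -> list[str]:
    """Collect every .sql file under models/, relative to the project root."""
    root = Path(project_root)
    return sorted(p.relative_to(root).as_posix() for p in (root / "models").rglob("*.sql"))
```

In CI, fail the job when `misplaced_models(find_model_paths())` returns anything.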

Naming conventions earn their keep

Every model name follows a fixed pattern that starts with its layer prefix: stg, int, fact, or dim. Engineers who have never seen the project can find what they need in seconds. Make this a CI rule, not a wiki suggestion.
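Here is a minimal sketch of that CI rule. The exact regex is an assumption for illustration: a layer prefix (stg, int, fact, dim), then lowercase snake_case parts, with single or double underscores allowed as separators. Model names are taken from the .sql file stems, which is how dbt names models by default.

```python
import re
from pathlib import Path

# Hypothetical naming rule: <prefix>_<part>[_ or __<part>]...
NAME_RULE = re.compile(r"^(stg|int|fact|dim)_[a-z0-9]+(?:_{1,2}[a-z0-9]+)*$")


def bad_model_names(names: list[str]) -> list[str]:
    """Return the model names that do not match the naming rule."""
    return [n for n in names if not NAME_RULE.match(n)]


def check_project(models_dir: str = "models") -> int:
    """Print violations and return a process exit code (1 if any name is bad)."""
    names = [p.stem for p in Path(models_dir).rglob("*.sql")]
    bad = bad_model_names(names)
    for name in bad:
        print(f"model name violates convention: {name}")
    return 1 if bad else 0
```

A CI step would call `sys.exit(check_project())` so that a badly named model blocks the merge.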

Test coverage matters more than line coverage

dbt tests (the built-in generic tests unique, not_null, accepted_values, and relationships, plus your own singular tests) are your main defense against silently broken pipelines. Aim for 100% coverage of primary keys, foreign keys, and known invariants. Schedule the full test suite every hour in production.
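Coverage targets only help if you measure them. The sketch below reads the manifest that dbt writes to target/manifest.json and reports which models lack a primary-key test, approximated here as "some column has both a unique and a not_null test." It assumes the manifest layout of recent dbt-core versions (nodes keyed by unique_id, test nodes carrying resource_type, test_metadata.name, column_name, and depends_on.nodes); field names can shift between versions. Composite keys tested with package macros will not be counted.

```python
import json
from collections import defaultdict
from pathlib import Path

PK_TESTS = {"unique", "not_null"}


def pk_coverage(manifest: dict) -> tuple[float, list[str]]:
    """Return (share of models with a unique+not_null column, uncovered model ids)."""
    # model unique_id -> column name -> set of generic test names on that column
    tests_by_model = defaultdict(lambda: defaultdict(set))
    models = []
    for uid, node in manifest["nodes"].items():
        rtype = node.get("resource_type")
        if rtype == "model":
            models.append(uid)
        elif rtype == "test":
            meta = node.get("test_metadata") or {}  # singular tests have no metadata
            name = meta.get("name")
            column = node.get("column_name") or meta.get("kwargs", {}).get("column_name")
            if name and column:
                for dep in node.get("depends_on", {}).get("nodes", []):
                    tests_by_model[dep][column].add(name)
    uncovered = [
        m for m in models
        if not any(PK_TESTS <= names for names in tests_by_model[m].values())
    ]
    share = (len(models) - len(uncovered)) / len(models) if models else 1.0
    return share, sorted(uncovered)


def load_manifest(path: str = "target/manifest.json") -> dict:
    return json.loads(Path(path).read_text())
```

Run it after `dbt compile` or `dbt build` with `pk_coverage(load_manifest())`, and fail CI when the share drops below your target.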

Macros: the abstractions that pay back

The first hundred models do not need macros. The next thousand do. Pull repeated logic (date-dimension joins, cohort calculations, audit columns) into macros. The payoff from keeping things DRY compounds as the project grows.
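Finding the repeated logic is the first step. One rough way, sketched below under our own assumptions, is to normalize each model's SQL (lowercase, collapsed whitespace, no blank or full-line comment lines) and count how many files share the same run of consecutive lines. The window size, file threshold, and function names are all hypothetical; fragments that show up in many models are candidates for a macro, not automatic ones.

```python
from collections import defaultdict
from pathlib import Path


def normalize(sql: str) -> list[str]:
    """Lowercase, collapse whitespace, and drop blank and '--' comment lines."""
    lines = []
    for raw in sql.splitlines():
        line = " ".join(raw.lower().split())
        if line and not line.startswith("--"):
            lines.append(line)
    return lines


def repeated_fragments(
    sources: dict[str, str], window: int = 3, min_files: int = 3
) -> dict[tuple[str, ...], set[str]]:
    """Map each run of `window` normalized lines to the files containing it,
    keeping only runs that appear in at least `min_files` files."""
    seen = defaultdict(set)
    for name, sql in sources.items():
        lines = normalize(sql)
        for i in range(len(lines) - window + 1):
            seen[tuple(lines[i:i + window])].add(name)
    return {frag: files for frag, files in seen.items() if len(files) >= min_files}


def load_models(models_dir: str = "models") -> dict[str, str]:
    return {str(p): p.read_text() for p in Path(models_dir).rglob("*.sql")}
```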

Performance: incremental everything

By the time you have a thousand models, full refreshes take hours and cost real money. Convert downstream models to incremental materializations. Use partition predicates so each run scans only new or changed partitions. Profile slow models with --debug and your warehouse's query planner. dbt is software, and software needs profiling.
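Before reaching for the query planner, it helps to know which models to look at. After each invocation dbt writes target/run_results.json, whose results entries include a unique_id and an execution_time in seconds. The sketch below ranks models by that time; the function names are our own, and the slowest full-refresh models are the natural first candidates for conversion to incremental.

```python
import json
from pathlib import Path


def slowest_models(run_results: dict, top_n: int = 10) -> list[tuple[str, float]]:
    """Return (unique_id, seconds) for the slowest models, slowest first."""
    rows = [
        (r["unique_id"], float(r.get("execution_time", 0.0)))
        for r in run_results.get("results", [])
        if r.get("unique_id", "").startswith("model.")  # skip tests, seeds, snapshots
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_n]


def load_run_results(path: str = "target/run_results.json") -> dict:
    return json.loads(Path(path).read_text())
```

For example, `for uid, secs in slowest_models(load_run_results()): print(f"{secs:8.1f}s  {uid}")` after a production run.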

The deployment shape

Run on dbt Cloud, or deploy dbt-core yourself and orchestrate it with Airflow, Dagster, or GitHub Actions. CI runs on every PR (build the affected models, run their tests). Production deploys go through a release branch, with manual approval for risky changes. Everything is source-controlled, code-reviewed, and observable.
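For the "build affected models" step, dbt's state selection does the heavy lifting: `--select state:modified+` picks models changed relative to a saved production manifest plus everything downstream, and `--defer` resolves unbuilt upstream refs to the production objects. The sketch below wraps that in a CI helper. It assumes the production manifest.json has been downloaded into a local directory and that profiles.yml defines a target named "ci"; both names are hypothetical.

```python
import subprocess


def ci_build_command(prod_state_dir: str, target: str = "ci") -> list[str]:
    """Build only modified models and their descendants, deferring to prod for the rest."""
    return [
        "dbt", "build",
        "--select", "state:modified+",
        "--defer",
        "--state", prod_state_dir,  # directory holding the production manifest.json
        "--target", target,
    ]


def run_ci(prod_state_dir: str) -> int:
    """Run the CI build and return dbt's exit code (non-zero fails the PR check)."""
    return subprocess.run(ci_build_command(prod_state_dir), check=False).returncode
```

Because `dbt build` runs tests alongside models, a single command covers both halves of the PR check.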

The biggest pitfall

Treating dbt as a SQL editor instead of a software project. Once the codebase passes 500 models, it needs the same engineering discipline as your application code: code review, testing, documentation, refactoring. Teams that skip this end up with a 5,000-model spaghetti graph nobody understands.