3 Common dbt Mistakes And How to Avoid Them.

Mustafa Idris
4 min readMar 10, 2024

A Guide to Avoiding Common Mistakes in Data Transformation and Modeling with dbt.

source: ThoughtSpot

Imagine, you’ve carefully collected your data, designed your models, and are ready to present a compelling set of insightful reports. But then, disaster strikes!
Corrupted data, broken pipelines, and a wave of strong frustration haunt you long after the analysis.

Lol…You are not alone, and in this article, we will explore the three most common dbt mistakes you might encounter, offering clear solutions to help you navigate them confidently.
Whether you’re a seasoned dbt user or just starting out, understanding these mistakes and how to avoid them, can significantly boost your data transformation efficiency.

So, lesssgo.

1. Unclear Naming Convention:

Picture walking into a messy storage room with stuff scattered everywhere. You’re looking for a particular tool, but all you can find are boxes without labels and random bits and pieces.
Annoying, right? Well, that’s pretty much how it feels when you’re dealing with unclear names in dbt.

Instead of using names that clearly describe what things are, like models, sources, and columns, you find names with confusing short forms or general words. For example, if you see “emp_type” in your data. What does it mean? Is it “employee type,” “employment type,” or something else entirely? These shortened names can be confusing and take time to figure out.

So what should we do?
- Use descriptive names that reflect the purpose of the model, source, or column.

-- Instead of unclear names
SELECT id, name, salary
FROM employee_data
WHERE emp_type = 'FT';

-- Use a descriptive name
SELECT id, name, salary
FROM employee_data
WHERE employment_type = 'Full-Time';

- Establish a consistent naming convention across your dbt project.

-- Instead of inconsistent naming
SELECT name, city, order_type
FROM users AS u
JOIN orders AS o ON u.id = o.user_id;

-- Use standardized naming
SELECT name, city, order_type
FROM users AS usr
JOIN orders AS ord ON usr.id = ord.user_id;

- Document your chosen naming conventions and make them easily accessible for the team.

## Naming Conventions
- **Sources:** Use snake_case with a consistent prefix (e.g., 'my_') for custom sources.
- **Columns:** Opt for snake_case for consistency and clarity, using descriptive names to convey content.
- **Models:** Choose clear, meaningful names reflecting the transformation's purpose.

By adopting clear naming conventions and adequate documentation, you transform your dbt code from a messy storeroom into a well-organized workshop. Everything has its designated place, making it easy to find exactly what you need and collaborate effectively with others.

2. Unoptimized SQL Queries:

Ever feel like your computer is taking forever to fetch the data you need? In dbt, one reason for this could be not-so-smart instructions, known as poorly written SQL queries. These instructions guide your computer on what data to grab, and if they’re not written well, it’s like sending your computer on a confusing treasure hunt.

Let’s say you have a table called “orders” with details like id, status, and other things. To get the id and status, you might be tempted to do something like:

--Unoptimized: Selecting all columns when only a few are needed
SELECT *
FROM orders
WHERE status = 'pending';

The above query can lead to;

  • Long wait time
  • Resource overload
  • Subpar performance

Instead of grabbing everything from the table, we can be more specific and select only relevant columns.

-- Optimized: Selecting only the required columns
SELECT id,
status
FROM orders
WHERE status = 'pending';

3. Overlooking Testing:

We’ve covered other mistakes people make in dbt, but let’s face it, even the most skilled tightrope walker needs a safety net. In dbt, that net is testing. Without it, a small mistake in your queries can bring down your entire data analysis.

How do You Test in dbt?

  • By checking row counts: test that the expected number of rows is generated by a model.
-- tests/my_model_test.sql

{{ config(
materialized='table',
post-hook=
'SELECT COUNT(*) AS row_count
FROM my_model'
)}}
  • By validating Data Type: validate the data types of specific columns
-- tests/my_model_test.sql

{{ config(
materialized='table',
post-hook= 'SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = ''my_model''
AND column_name = ''expected_column'' AND data_type = ''expected_data_type'''
)}}
  • By checking for Null Values: ensure that certain columns do not contain null values.
-- tests/my_model_test.sql

{{ config(
materialized='table',
post-hook=
'SELECT COUNT(*) AS null_count
FROM my_model
WHERE nullable_column IS NULL'
)}}
  • Custom Tests: write custom tests tailored to your specific requirements.
-- tests/my_model_test.sql

{{ config(
materialized='view',
post-hook='SELECT custom_test_function()'
)}}

Why Testing Matters So Much:

Testing acts as your watchful protector, catching errors early in the process. Here’s how it saves the day;

  • Catch Data Mistakes Early: Errors can sneak into your data silently, like an unexpected guest on a ship. Testing reveals these hidden intruders before they cause chaos in your downstream analyses.
  • Handle Surprises in Your Data: Data can sometimes act unexpectedly — with scenarios you might not have thought about. Testing helps identify these quirks and ensures your models handle them gracefully.
  • Build Trust and Confidence: When your team can trust the accuracy of your dbt models, confidence grows. Testing is the foundation of this trust, making sure everyone works with reliable data.

Conclusion:

By following these simple tips, you can avoid these common mistakes and significantly enhance the efficiency and reliability of your dbt workflows. Embrace clear naming conventions and documentation, optimize your SQL queries, and leverage dbt’s testing features for robust data transformations.

--

--