Insights

The following insights are available in DataPilot:

1. Modelling Insights

| Name | Description | Files Required | Overrides |
| --- | --- | --- | --- |
| source_staging_model_integrity | Ensures each source has a dedicated staging model and is not directly joined to downstream models. | Manifest | None |
| downstream_source_dependence | Evaluates whether downstream models (marts or intermediates) depend directly on a source. This check ensures that all downstream models depend on staging models, not directly on source nodes. | Manifest | None |
| Duplicate_Sources | Identifies cases where multiple source nodes in a dbt project refer to the same database object. Ensures that each database object is represented by a single, unique source node. | Manifest | None |
| hard_coded_references | Identifies instances where SQL code within models contains hard-coded references, which can obscure data lineage and complicate project maintenance. | Manifest | None |
| rejoining_upstream_concepts | Detects scenarios where a parent’s direct child is also a direct child of another one of the parent’s direct children, indicating potential loops or unnecessary complexity in the DAG. | Manifest | None |
| model_fanout | Assesses parent models to identify high-fanout scenarios, which may indicate opportunities for more efficient transformations in the BI layer or better positioning of common business logic upstream in the data pipeline. | Manifest | max_fanout |
| multiple_sources_joined | Checks if a model directly joins multiple source tables, encouraging the use of a single staging model per source for downstream models to enhance data consistency and maintainability. | Manifest | None |
| root_model | Identifies models without direct parents, either sources or other models within the dbt project. Ensures all models can be traced back to a source or are interconnected within the project, which is crucial for clear data lineage and project integrity. | Manifest | None |
| source_fanout | Evaluates sources for high fanout, identifying when a single source has a large number of direct child models. High fanout may indicate an overly complex or source-reliant data model, potentially introducing risks and complicating maintenance and scalability. | Manifest | max_fanout |
| staging_models_dependency | Checks whether staging models depend on downstream models rather than on source or raw data models. Staging models should ideally depend on upstream data sources to maintain a clear and logical data flow. | Manifest | None |
| staging_models_on_staging | Checks if staging models depend on other staging models instead of on source or raw data models, ensuring that staging models are used appropriately to maintain a clear and logical data flow from sources to staging. | Manifest | None |
| unused_sources | Identifies sources that are defined in the project’s YML files but not used in any models or sources. They may have become redundant due to model deprecation, contributing to unnecessary complexity and clutter in the dbt project. | Manifest | None |
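
The fanout insights listed above can be pictured with a small sketch. This is not DataPilot's implementation; it only illustrates the idea of counting direct model children against a `max_fanout` threshold. The `child_map` dictionary mimics the `child_map` section of a dbt `manifest.json`. The node names, the `find_high_fanout` helper, its `parent_prefix` parameter, and the "at least `max_fanout`" comparison are assumptions made for illustration.

```python
from typing import Dict, List


def find_high_fanout(
    child_map: Dict[str, List[str]],
    max_fanout: int = 3,
    parent_prefix: str = "model.",
) -> Dict[str, int]:
    """Return parents with at least `max_fanout` direct model children.

    Assumption: the threshold is inclusive; the real check may differ.
    """
    flagged: Dict[str, int] = {}
    for parent, children in child_map.items():
        if not parent.startswith(parent_prefix):
            continue
        # Tests and exposures also appear as children, so count models only.
        model_children = [c for c in children if c.startswith("model.")]
        if len(model_children) >= max_fanout:
            flagged[parent] = len(model_children)
    return flagged


# Hypothetical child_map in the shape dbt writes to manifest.json.
child_map = {
    "source.shop.raw.orders": ["model.shop.stg_orders"],
    "model.shop.stg_orders": [
        "model.shop.int_orders",
        "model.shop.fct_orders",
        "model.shop.fct_revenue",
        "test.shop.not_null_stg_orders_id",
    ],
    "model.shop.int_orders": ["model.shop.fct_orders"],
}

high_fanout = find_high_fanout(child_map, max_fanout=3)
print(high_fanout)
```

Passing `parent_prefix="source."` applies the same counting to source nodes, which is roughly what a source-level fanout check would need.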

2. Performance Insights

| Name | Description | Files Required | Overrides |
| --- | --- | --- | --- |
| chain_view_linking | Analyzes the dbt project to identify long chains of non-materialized models (views and ephemerals). Such long chains can result in increased runtime for models built on top of them due to extended computation and memory usage. | Manifest | None |
| exposure_parent_bad_materialization | Evaluates the materialization types of parent models of exposures to ensure they rely on transformed dbt models or metrics rather than raw sources, and checks if these parent models are materialized efficiently for performance. | Manifest | None |
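
To make the idea behind `chain_view_linking` concrete, the sketch below computes the longest unbroken run of views and ephemeral models ending at each model. It is an illustration under stated assumptions, not DataPilot's code: the `materialized` and `parents` dictionaries, the `chain_length` helper, and the `max_chain_length` threshold are all hypothetical.

```python
from functools import lru_cache

# Hypothetical project: materialization and direct parents for each model.
materialized = {"a": "view", "b": "ephemeral", "c": "view", "d": "table", "e": "view"}
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}

NON_MATERIALIZED = {"view", "ephemeral"}


@lru_cache(maxsize=None)
def chain_length(model: str) -> int:
    """Length of the longest run of non-materialized models ending at `model`."""
    if materialized.get(model) not in NON_MATERIALIZED:
        # A table or incremental model breaks the chain.
        return 0
    return 1 + max((chain_length(p) for p in parents.get(model, [])), default=0)


# Assumed threshold, for illustration only.
max_chain_length = 3
long_chains = sorted(m for m in materialized if chain_length(m) >= max_chain_length)
print(long_chains)
```

Here model `d` is a table, so the chain restarts at `e`; only `c` sits at the end of three consecutive non-materialized models.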

3. Governance Insights

| Name | Description | Files Required | Overrides |
| --- | --- | --- | --- |
| documentation_on_stale_columns | Checks for columns that are documented in the dbt project but have been removed from their respective models. | Manifest, Catalog | None |
| exposures_dependent_on_private_models | Detects if exposures in the dbt project depend on private models. Recommends using public, well-documented, and contracted models as trusted data sources for downstream consumption. | Manifest | None |
| public_models_without_contracts | Identifies public models in the dbt project that are accessible to all downstream consumers but lack contracts specifying data types and columns. | Manifest | None |
| missing_documentation | Detects columns and models that don’t have documentation. | Manifest, Catalog | None |
| undocumented_public_models | Identifies models in the dbt project that are marked as public but don’t have documentation. | Manifest | None |
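
The two public-model insights above amount to filtering manifest nodes by access level and then checking for a contract or a description. The sketch below shows that filtering on a hand-written dictionary. It assumes the node fields `access`, `contract.enforced`, and `description` as they appear in recent dbt manifests; the node names and the `is_public_model` helper are hypothetical, and DataPilot's own logic may differ.

```python
# Hypothetical excerpt of manifest nodes (field names assumed from dbt 1.5+).
nodes = {
    "model.shop.fct_orders": {
        "resource_type": "model",
        "access": "public",
        "contract": {"enforced": True},
        "description": "One row per order.",
    },
    "model.shop.dim_customers": {
        "resource_type": "model",
        "access": "public",
        "contract": {"enforced": False},
        "description": "",
    },
    "model.shop.int_orders": {
        "resource_type": "model",
        "access": "private",
        "contract": {"enforced": False},
        "description": "",
    },
}


def is_public_model(node: dict) -> bool:
    """True for model nodes whose access level is 'public'."""
    return node.get("resource_type") == "model" and node.get("access") == "public"


# Public models with no enforced contract.
without_contracts = sorted(
    uid
    for uid, node in nodes.items()
    if is_public_model(node) and not node.get("contract", {}).get("enforced", False)
)

# Public models with an empty or missing description.
undocumented_public = sorted(
    uid
    for uid, node in nodes.items()
    if is_public_model(node) and not node.get("description", "").strip()
)

print(without_contracts, undocumented_public)
```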

4. Testing Insights

| Name | Description | Files Required | Overrides |
| --- | --- | --- | --- |
| missing_primary_key_tests | Identifies dbt models in the project that lack primary key tests, which are crucial for ensuring data integrity and correctness. | Manifest | None |
| dbt_low_test_coverage | Identifies dbt models in the project whose test coverage percentage is below the required threshold. | Manifest | min_test_coverage_percent |
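
The text does not say how the coverage percentage is computed, so the following is only one plausible reading: the share of a model's columns that have at least one test, compared against `min_test_coverage_percent`. The model data and the `coverage_percent` helper are hypothetical, and DataPilot may define coverage differently.

```python
from typing import Iterable, Set


def coverage_percent(columns: Iterable[str], tested_columns: Set[str]) -> float:
    """Percentage of columns with at least one test (assumed definition)."""
    columns = list(columns)
    if not columns:
        # No declared columns: treated as zero coverage in this sketch.
        return 0.0
    covered = sum(1 for column in columns if column in tested_columns)
    return 100.0 * covered / len(columns)


# Hypothetical models with their columns and the columns that carry tests.
models = {
    "stg_orders": {
        "columns": ["order_id", "customer_id", "amount", "created_at"],
        "tested": {"order_id", "customer_id"},
    },
    "stg_customers": {
        "columns": ["customer_id", "name"],
        "tested": {"customer_id", "name"},
    },
}

min_test_coverage_percent = 75  # example override value

low_coverage = {}
for name, info in models.items():
    percent = coverage_percent(info["columns"], info["tested"])
    if percent < min_test_coverage_percent:
        low_coverage[name] = percent

print(low_coverage)
```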

5. Project Structure Insights

| Name | Description | Files Required | Overrides |
| --- | --- | --- | --- |
| model_directory_structure | Checks for correct placement of models in their designated directories. Proper directory structure is essential for organization, discoverability, and maintenance within the dbt project. | Manifest | None |
| model_naming_convention_check | Ensures all models adhere to a predefined naming convention. A consistent naming convention is crucial for clarity, understanding of the model’s purpose, and enhancing navigation within the dbt project. | Manifest | None |
| source_directory_structure | Verifies if sources are correctly placed in their designated directories. Proper directory placement for sources is important for organization and easy searchability. | Manifest | None |
| test_directory_structure | Checks if tests are correctly placed in the same directories as their corresponding models. Co-locating tests with models aids in maintainability and clarity. | Manifest | None |
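
As a rough picture of what `test_directory_structure` looks for, the sketch below compares the directory of each test's defining file with the directory of the model it tests. The file paths, the two input dictionaries, and the `misplaced_tests` helper are hypothetical; this is a simplified illustration rather than the actual check.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def misplaced_tests(
    model_paths: Dict[str, str],
    test_files: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Return tests defined in a different directory than their model.

    `test_files` maps a test name to (model name, path of the defining file).
    """
    misplaced = []
    for test_name, (model_name, test_path) in test_files.items():
        model_dir = PurePosixPath(model_paths[model_name]).parent
        test_dir = PurePosixPath(test_path).parent
        if model_dir != test_dir:
            misplaced.append(test_name)
    return sorted(misplaced)


# Hypothetical project layout.
model_paths = {
    "stg_orders": "models/staging/stg_orders.sql",
    "fct_orders": "models/marts/fct_orders.sql",
}
test_files = {
    "not_null_stg_orders_id": ("stg_orders", "models/staging/stg_orders.yml"),
    "unique_fct_orders_id": ("fct_orders", "models/schema.yml"),
}

flagged_tests = misplaced_tests(model_paths, test_files)
print(flagged_tests)
```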

6. Check Insights

| Name | Description | Files Required | Overrides |
| --- | --- | --- | --- |
| column_descriptions_are_same | Checks if the column descriptions in the dbt project are consistent across the project. | Manifest | None |
| column_name_contract | Checks if the column names in the dbt project abide by the column name contract, which consists of a regex pattern and a series of data types. | Manifest, Catalog | None |
| check_macro_args_have_desc | Checks if the macro arguments in the dbt project have descriptions. | Manifest | None |
| check_macro_has_desc | Checks if the macros in the dbt project have descriptions. | Manifest | None |
| check_model_has_all_columns | Checks if the models in the dbt project have all the columns that are present in the data catalog. | Manifest, Catalog | None |
| check_model_has_valid_meta_keys | Checks if the models in the dbt project have meta keys. | Manifest | None |
| check_model_has_properties_file | Checks if the models in the dbt project have a properties file. | Manifest | None |
| check_model_has_tests_by_name | Checks if the models in the dbt project have tests by name. | Manifest | None |
| check_model_has_tests_by_type | Checks if the models in the dbt project have tests by type. | Manifest | None |
| check_model_has_tests_by_group | Checks if the models in the dbt project have tests by group. | Manifest | None |
| check_model_materialization_by_childs | Checks model materialization in the dbt project against a given threshold of child models. | Manifest | None |
| model_name_by_folder | Checks if the models in the dbt project abide by the model name contract, which consists of a regex pattern. | Manifest | None |
| check_model_parents_and_childs | Checks if the model has the min/max number of parents and children. | Manifest | None |
| check_model_parents_database | Checks if the models in the dbt project have a parent database that is in the whitelist and not in the blacklist. | Manifest | None |
| check_model_parents_schema | Checks if the models in the dbt project have a parent schema that is in the whitelist and not in the blacklist. | Manifest | None |
| check_model_tags | Checks if the models in the dbt project have tags from the provided list of tags. | Manifest | None |
| check_source_childs | Checks if the source has the min/max number of children. | Manifest | None |
| check_source_columns_have_desc | Checks if the source columns have descriptions in the dbt project. | Manifest, Catalog | None |
| check_source_has_all_columns | Checks if the source has all columns present in the data catalog. | Manifest, Catalog | None |
| check_source_has_freshness | Checks if the source has freshness options. | Manifest | None |
| check_source_has_loader | Checks if the source has a loader. | Manifest | None |
| check_source_has_meta_keys | Checks if the source has meta keys. | Manifest | None |
| check_source_has_tests_by_name | Checks if the source has tests by name. | Manifest | None |
| check_source_has_tests_by_type | Checks if the source has tests by type. | Manifest | None |
| check_source_has_tests_by_group | Checks if the source has tests by group. | Manifest | None |
| check_source_has_tests | Checks if the source has tests. | Manifest | None |
| check_source_table_has_desc | Checks if the source table has a description. | Manifest | None |
| check_source_tags | Checks if the source has tags. | Manifest | None |
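
The `column_name_contract` description mentions a regex pattern paired with a series of data types. One way to read that, shown in the sketch below, is that any column whose name matches the pattern must have one of the listed types. This reading, the contract layout, the example patterns, and the `contract_violations` helper are all assumptions for illustration; the check's real configuration format is not described here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical contracts: a name pattern and the data types it permits.
contracts = [
    {"pattern": r".*_at", "data_types": {"timestamp", "timestamp_ntz"}},
    {"pattern": r"is_.*", "data_types": {"boolean"}},
]

# Hypothetical catalog columns for one model: column name -> data type.
columns = {
    "created_at": "TIMESTAMP",
    "updated_at": "VARCHAR",
    "is_active": "BOOLEAN",
    "amount": "NUMBER",
}


def contract_violations(
    columns: Dict[str, str], contracts: List[dict]
) -> List[Tuple[str, str]]:
    """Return (column, type) pairs whose name matches a pattern but whose
    data type is not among the types that pattern allows."""
    violations = []
    for name, data_type in columns.items():
        for contract in contracts:
            name_matches = re.fullmatch(contract["pattern"], name) is not None
            if name_matches and data_type.lower() not in contract["data_types"]:
                violations.append((name, data_type))
    return violations


violations = contract_violations(columns, contracts)
print(violations)
```

Columns such as `amount` that match no pattern are left alone in this sketch; a stricter contract could instead require every column to match some pattern.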