Insights
The following insights are available in DataPilot:
1. Modelling Insights
Name | Description | Files Required | Overrides |
---|---|---|---|
source_staging_model_integrity | Ensures each source has a dedicated staging model and is not directly joined to downstream models. | Manifest | None |
downstream_source_dependence | Evaluates whether downstream models (marts or intermediates) depend directly on a source. This check ensures that all downstream models depend on staging models, not directly on the source nodes. | Manifest | None |
Duplicate_Sources | Identifies cases where multiple source nodes in a dbt project refer to the same database object. Ensures that each database object is represented by a single, unique source node. | Manifest | None |
hard_coded_references | Identifies instances where SQL code within models contains hard-coded references, which can obscure data lineage and complicate project maintenance. | Manifest | None |
rejoining_upstream_concepts | Detects scenarios where a parent’s direct child is also a direct child of another one of the parent’s direct children, indicating potential loops or unnecessary complexity in the DAG. | Manifest | None |
model_fanout | Assesses parent models to identify high-fanout scenarios, which may indicate opportunities for more efficient transformations in the BI layer or better positioning of common business logic upstream in the data pipeline. | Manifest | max_fanout |
multiple_sources_joined | Checks if a model directly joins multiple source tables, encouraging the use of a single staging model per source for downstream models to enhance data consistency and maintainability. | Manifest | None |
root_model | Identifies models without direct parents, either sources or other models within the dbt project. Ensures all models can be traced back to a source or are interconnected within the project, which is crucial for clear data lineage and project integrity. | Manifest | None |
source_fanout | Evaluates sources for high fanout, identifying when a single source has a large number of direct child models. High fanout may indicate an overly complex or source-reliant data model, potentially introducing risks and complicating maintenance and scalability. | Manifest | max_fanout |
staging_models_dependency | Checks whether staging models depend on downstream models rather than on source or raw data models. Staging models should ideally depend on upstream data sources to maintain a clear and logical data flow. | Manifest | None |
staging_models_on_staging | Checks if staging models depend on other staging models instead of on source or raw data models, ensuring that staging models are used appropriately to maintain a clear and logical data flow from sources to staging. | Manifest | None |
unused_sources | Identifies sources that are defined in the project’s YML files but not used in any models or sources. They may have become redundant due to model deprecation, contributing to unnecessary complexity and clutter in the dbt project. | Manifest | None |
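
The graph-based checks in the table above, such as model_fanout and rejoining_upstream_concepts, can be pictured as simple traversals of the parent-to-children mapping that dbt writes to the manifest. The following is a minimal sketch, not DataPilot's implementation: the `CHILD_MAP` data, the function names, and the choice to flag a node only when its child count strictly exceeds `max_fanout` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical, trimmed-down stand-in for the `child_map` section of
# manifest.json: each key is a node's unique_id, each value lists its
# direct children.
CHILD_MAP: Dict[str, List[str]] = {
    "source.shop.raw.orders": ["model.shop.stg_orders"],
    "model.shop.stg_orders": [
        "model.shop.int_orders",
        "model.shop.fct_orders",
        "model.shop.orders_daily",
        "model.shop.orders_weekly",
    ],
    "model.shop.int_orders": ["model.shop.fct_orders"],
}


def high_fanout(
    child_map: Dict[str, List[str]], prefix: str, max_fanout: int
) -> Dict[str, int]:
    """Return {unique_id: child_count} for nodes with more than max_fanout children.

    `prefix` selects the node kind, e.g. "model." or "source.".
    """
    return {
        node: len(children)
        for node, children in child_map.items()
        if node.startswith(prefix) and len(children) > max_fanout
    }


def rejoined_concepts(
    child_map: Dict[str, List[str]]
) -> List[Tuple[str, str, str]]:
    """Find (parent, child, grandchild) triples where the grandchild is a
    direct child of both the parent and the child."""
    findings = []
    for parent, children in child_map.items():
        direct = set(children)
        for child in children:
            for grandchild in child_map.get(child, []):
                if grandchild in direct:
                    findings.append((parent, child, grandchild))
    return findings


if __name__ == "__main__":
    print(high_fanout(CHILD_MAP, "model.", max_fanout=3))
    print(rejoined_concepts(CHILD_MAP))
```

In this toy graph, `stg_orders` feeds `fct_orders` both directly and through `int_orders`, which is the pattern the rejoining check describes. Passing `"source."` as the prefix gives the analogous sketch for source_fanout.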
2. Performance Insights
Name | Description | Files Required | Overrides |
---|---|---|---|
chain_view_linking | Analyzes the dbt project to identify long chains of non-materialized models (views and ephemerals). Such long chains can increase the runtime of models built on top of them due to extended computation and memory usage. | Manifest | None |
exposure_parent_bad_materialization | Evaluates the materialization types of parent models of exposures to ensure they rely on transformed dbt models or metrics rather than raw sources, and checks if these parent models are materialized efficiently for performance. | Manifest | None |
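
To make the idea behind chain_view_linking concrete, the sketch below counts, for each model, the longest run of consecutive views or ephemeral models sitting directly upstream of it. It is an illustration under stated assumptions, not DataPilot's algorithm: `MATERIALIZED` and `PARENT_MAP` are hypothetical miniatures of what the manifest records, and the exact chain definition and any threshold used by the real check are not specified in this document.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical miniature of manifest data: materialization per model and
# the direct parents of each model.
MATERIALIZED: Dict[str, str] = {
    "model.shop.stg_orders": "view",
    "model.shop.int_orders": "ephemeral",
    "model.shop.int_orders_enriched": "view",
    "model.shop.fct_orders": "table",
}
PARENT_MAP: Dict[str, List[str]] = {
    "model.shop.stg_orders": ["source.shop.raw.orders"],
    "model.shop.int_orders": ["model.shop.stg_orders"],
    "model.shop.int_orders_enriched": ["model.shop.int_orders"],
    "model.shop.fct_orders": ["model.shop.int_orders_enriched"],
}

NON_MATERIALIZED = {"view", "ephemeral"}


def chain_lengths(
    parent_map: Dict[str, List[str]], materialized: Dict[str, str]
) -> Dict[str, int]:
    """For each model, return the length of the longest chain of
    non-materialized models immediately upstream of it."""

    @lru_cache(maxsize=None)
    def upstream_run(node: str) -> int:
        # Longest chain of consecutive non-materialized models that ends
        # at, and includes, `node`. Sources and tables break the chain.
        if materialized.get(node) not in NON_MATERIALIZED:
            return 0
        parents = parent_map.get(node, [])
        return 1 + max((upstream_run(p) for p in parents), default=0)

    return {
        node: max((upstream_run(p) for p in parent_map.get(node, [])), default=0)
        for node in materialized
    }


if __name__ == "__main__":
    for model, length in chain_lengths(PARENT_MAP, MATERIALIZED).items():
        print(model, length)
```

Here `fct_orders` sits on top of three non-materialized models, so every query against it has to expand all three at run time. The recursion assumes the graph is acyclic, which holds for a valid dbt DAG.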
3. Governance Insights
Name | Description | Files Required | Overrides |
---|---|---|---|
documentation_on_stale_columns | Checks for columns that are documented in the dbt project but have been removed from their respective models. | Manifest, Catalog | None |
exposures_dependent_on_private_models | Detects if exposures in the dbt project depend on private models. Recommends using public, well-documented, and contracted models as trusted data sources for downstream consumption. | Manifest | None |
public_models_without_contracts | Identifies public models in the dbt project that are accessible to all downstream consumers but lack contracts specifying data types and columns. | Manifest | None |
missing_documentation | Detects columns and models that don’t have documentation. | Manifest, Catalog | None |
undocumented_public_models | Identifies models in the dbt project that are marked as public but don’t have documentation. | Manifest | None |
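
As an illustration of what undocumented_public_models and public_models_without_contracts look for, the sketch below filters a small dictionary of model nodes. It is a rough approximation only: `NODES` is hypothetical sample data, and reading access from an `access` key and contract status from `contract.enforced` is an assumption about the manifest layout that may differ between dbt versions.

```python
from typing import Any, Dict, List

# Hypothetical excerpt of the "nodes" section of manifest.json. The field
# locations (`access`, `contract.enforced`) are assumptions for this sketch.
NODES: Dict[str, Dict[str, Any]] = {
    "model.shop.fct_orders": {
        "resource_type": "model",
        "access": "public",
        "description": "",
        "contract": {"enforced": False},
    },
    "model.shop.dim_customers": {
        "resource_type": "model",
        "access": "public",
        "description": "One row per customer.",
        "contract": {"enforced": True},
    },
    "model.shop.int_orders": {
        "resource_type": "model",
        "access": "private",
        "description": "",
        "contract": {"enforced": False},
    },
}


def public_models(nodes: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep only model nodes whose access level is public."""
    return {
        uid: node
        for uid, node in nodes.items()
        if node.get("resource_type") == "model" and node.get("access") == "public"
    }


def undocumented_public_models(nodes: Dict[str, Dict[str, Any]]) -> List[str]:
    """Public models with an empty or whitespace-only description."""
    return sorted(
        uid
        for uid, node in public_models(nodes).items()
        if not node.get("description", "").strip()
    )


def public_models_without_contracts(nodes: Dict[str, Dict[str, Any]]) -> List[str]:
    """Public models whose contract is not enforced."""
    return sorted(
        uid
        for uid, node in public_models(nodes).items()
        if not node.get("contract", {}).get("enforced", False)
    )


if __name__ == "__main__":
    print(undocumented_public_models(NODES))
    print(public_models_without_contracts(NODES))
```

The private `int_orders` model is skipped by both functions even though it has no description, because these two insights only concern models exposed to downstream consumers.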
4. Testing Insights
Name | Description | Files Required | Overrides |
---|---|---|---|
missing_primary_key_tests | Identifies dbt models in the project that lack primary key tests, which are crucial for ensuring data integrity and correctness. | Manifest | None |
dbt_low_test_coverage | Identifies dbt models in the project that have a test coverage percentage below the required threshold. | Manifest | min_test_coverage_percent |
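
This document does not define how the coverage percentage behind dbt_low_test_coverage is calculated. The sketch below uses one plausible definition, stated here as an assumption: the share of a model's declared columns that have at least one column-level test attached. `NODES` is hypothetical sample data shaped like manifest nodes, and only the `min_test_coverage_percent` name comes from the table above.

```python
from typing import Any, Dict, List, Set

# Hypothetical excerpt of the "nodes" section of manifest.json, containing
# two models and three column-level tests.
NODES: Dict[str, Dict[str, Any]] = {
    "model.shop.stg_orders": {
        "resource_type": "model",
        "columns": {"order_id": {}, "status": {}, "amount": {}, "created_at": {}},
    },
    "model.shop.stg_customers": {
        "resource_type": "model",
        "columns": {"customer_id": {}, "email": {}},
    },
    "test.shop.unique_stg_orders_order_id": {
        "resource_type": "test",
        "column_name": "order_id",
        "depends_on": {"nodes": ["model.shop.stg_orders"]},
    },
    "test.shop.not_null_stg_customers_customer_id": {
        "resource_type": "test",
        "column_name": "customer_id",
        "depends_on": {"nodes": ["model.shop.stg_customers"]},
    },
    "test.shop.unique_stg_customers_email": {
        "resource_type": "test",
        "column_name": "email",
        "depends_on": {"nodes": ["model.shop.stg_customers"]},
    },
}


def coverage_percent(nodes: Dict[str, Dict[str, Any]]) -> Dict[str, float]:
    """Assumed definition: tested columns / declared columns * 100, per model."""
    tested: Dict[str, Set[str]] = {}
    for node in nodes.values():
        if node.get("resource_type") != "test" or not node.get("column_name"):
            continue
        for parent in node.get("depends_on", {}).get("nodes", []):
            tested.setdefault(parent, set()).add(node["column_name"])

    result: Dict[str, float] = {}
    for uid, node in nodes.items():
        if node.get("resource_type") != "model":
            continue
        columns = set(node.get("columns", {}))
        covered = columns & tested.get(uid, set())
        # A model with no declared columns is treated as 0% covered.
        result[uid] = 100.0 * len(covered) / len(columns) if columns else 0.0
    return result


def low_coverage(
    nodes: Dict[str, Dict[str, Any]], min_test_coverage_percent: float
) -> List[str]:
    """Models whose coverage falls below the threshold."""
    return sorted(
        uid
        for uid, pct in coverage_percent(nodes).items()
        if pct < min_test_coverage_percent
    )


if __name__ == "__main__":
    print(coverage_percent(NODES))
    print(low_coverage(NODES, min_test_coverage_percent=50))
```

With a threshold of 50, only `stg_orders` is flagged, since one of its four columns is tested while both columns of `stg_customers` are.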
5. Project Structure Insights
Name | Description | Files Required | Overrides |
---|---|---|---|
model_directory_structure | Checks for correct placement of models in their designated directories. Proper directory structure is essential for organization, discoverability, and maintenance within the dbt project. | Manifest | None |
model_naming_convention_check | Ensures all models adhere to a predefined naming convention. A consistent naming convention is crucial for clarity, understanding of the model’s purpose, and easier navigation within the dbt project. | Manifest | None |
source_directory_structure | Verifies if sources are correctly placed in their designated directories. Proper directory placement for sources is important for organization and easy searchability. | Manifest | None |
test_directory_structure | Checks if tests are correctly placed in the same directories as their corresponding models. Co-locating tests with models aids in maintainability and clarity. | Manifest | None |
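
The naming and directory checks above boil down to comparing a model's name and file path against a convention. The following sketch shows one way that comparison could work. The folder names, the `stg_`/`int_`/`fct_`/`dim_` prefixes, and the sample `MODELS` list are hypothetical conventions chosen for illustration; the document does not state which convention DataPilot enforces by default.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Hypothetical convention: folder name -> regex the model name must match.
NAMING_RULES: Dict[str, str] = {
    "staging": r"stg_[a-z0-9_]+",
    "intermediate": r"int_[a-z0-9_]+",
    "marts": r"(fct|dim)_[a-z0-9_]+",
}

# Hypothetical model entries carrying the two manifest fields this sketch uses.
MODELS: List[Dict[str, str]] = [
    {"name": "stg_orders", "original_file_path": "models/staging/stg_orders.sql"},
    {
        "name": "orders_enriched",
        "original_file_path": "models/intermediate/orders_enriched.sql",
    },
    {"name": "fct_orders", "original_file_path": "models/marts/fct_orders.sql"},
]


def naming_violations(
    models: List[Dict[str, str]], rules: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (model_name, folder) pairs where the name breaks the folder's rule."""
    violations = []
    for model in models:
        # Every directory on the model's path is checked against the rules.
        folders = PurePosixPath(model["original_file_path"]).parts[:-1]
        for folder in folders:
            pattern = rules.get(folder)
            if pattern and not re.fullmatch(pattern, model["name"]):
                violations.append((model["name"], folder))
    return violations


if __name__ == "__main__":
    print(naming_violations(MODELS, NAMING_RULES))
```

Only `orders_enriched` is reported, because it lives under `intermediate` without the assumed `int_` prefix.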
6. Check Insights
Name | Description | Files Required | Overrides |
---|---|---|---|
column_descriptions_are_same | Checks if the descriptions of same-named columns are consistent across the dbt project. | Manifest | None |
column_name_contract | Checks if the column names in the dbt project abide by the column name contract, which consists of a regex pattern and a series of data types. | Manifest, Catalog | None |
check_macro_args_have_desc | Checks if the macro arguments in the dbt project have descriptions. | Manifest | None |
check_macro_has_desc | Checks if the macros in the dbt project have descriptions. | Manifest | None |
check_model_has_all_columns | Checks if the models in the dbt project have all the columns that are present in the data catalog. | Manifest, Catalog | None |
check_model_has_valid_meta_keys | Checks if the models in the dbt project have meta keys. | Manifest | None |
check_model_has_properties_file | Checks if the models in the dbt project have a properties file. | Manifest | None |
check_model_has_tests_by_name | Checks if the models in the dbt project have the required tests, matched by test name. | Manifest | None |
check_model_has_tests_by_type | Checks if the models in the dbt project have the required tests, matched by test type. | Manifest | None |
check_model_has_tests_by_group | Checks if the models in the dbt project have the required tests, matched by test group. | Manifest | None |
check_model_materialization_by_childs | Checks whether each model’s materialization matches what is expected for its number of child models, given a threshold. | Manifest | None |
model_name_by_folder | Checks if the models in the dbt project abide by the model name contract, which consists of a regex pattern. | Manifest | None |
check_model_parents_and_childs | Checks if the number of parents and children of each model is within the min/max limits. | Manifest | None |
check_model_parents_database | Checks if the parent databases of the models in the dbt project are in the whitelist and not in the blacklist. | Manifest | None |
check_model_parents_schema | Checks if the parent schemas of the models in the dbt project are in the whitelist and not in the blacklist. | Manifest | None |
check_model_tags | Checks if the models in the dbt project have tags from the provided list of tags. | Manifest | None |
check_source_childs | Checks if the number of children of each source is within the min/max limits. | Manifest | None |
check_source_columns_have_desc | Checks if the source columns have descriptions in the dbt project. | Manifest, Catalog | None |
check_source_has_all_columns | Checks if the source has all columns present in the data catalog. | Manifest, Catalog | None |
check_source_has_freshness | Checks if the source has freshness options. | Manifest | None |
check_source_has_loader | Checks if the source has a loader. | Manifest | None |
check_source_has_meta_keys | Checks if the source has meta keys. | Manifest | None |
check_source_has_tests_by_name | Checks if the source has the required tests, matched by test name. | Manifest | None |
check_source_has_tests_by_type | Checks if the source has the required tests, matched by test type. | Manifest | None |
check_source_has_tests_by_group | Checks if the source has the required tests, matched by test group. | Manifest | None |
check_source_has_tests | Checks if the source has tests. | Manifest | None |
check_source_table_has_desc | Checks if the source table has a description. | Manifest | None |
check_source_tags | Checks if the source has tags. | Manifest | None |
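
The column_name_contract check pairs a regex pattern with a series of data types. A minimal sketch of how such a contract might be evaluated against catalog column types follows. It assumes a one-directional rule (a column whose name matches the pattern must have one of the listed types); the `CONTRACTS` structure, its key names, and the `CATALOG_COLUMNS` sample are hypothetical and do not reflect DataPilot's actual configuration format.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical contracts: columns whose name matches `pattern` must have
# one of the types in `dtypes`.
CONTRACTS: List[Dict[str, Any]] = [
    {"pattern": r"is_.*", "dtypes": {"BOOLEAN"}},
    {"pattern": r".*_at$", "dtypes": {"TIMESTAMP", "TIMESTAMP_NTZ"}},
]

# Hypothetical excerpt of catalog.json: per node, column name -> metadata.
CATALOG_COLUMNS: Dict[str, Dict[str, Dict[str, str]]] = {
    "model.shop.fct_orders": {
        "order_id": {"type": "NUMBER"},
        "is_refunded": {"type": "VARCHAR"},
        "created_at": {"type": "TIMESTAMP_NTZ"},
    }
}


def contract_violations(
    catalog_columns: Dict[str, Dict[str, Dict[str, str]]],
    contracts: List[Dict[str, Any]],
) -> List[Tuple[str, str, str]]:
    """Return (unique_id, column, actual_type) for columns that match a
    contract's pattern but carry a type outside its allowed set."""
    violations = []
    for uid, columns in catalog_columns.items():
        for name, meta in columns.items():
            for contract in contracts:
                matches = re.match(contract["pattern"], name, re.IGNORECASE)
                if matches and meta["type"].upper() not in contract["dtypes"]:
                    violations.append((uid, name, meta["type"]))
    return violations


if __name__ == "__main__":
    print(contract_violations(CATALOG_COLUMNS, CONTRACTS))
```

In the sample data, `is_refunded` is named like a boolean flag but stored as `VARCHAR`, so it is the only column reported. Column types come from the catalog rather than the manifest, which is why this insight lists both files as required.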