# Teradata configurations

## General
**Set `quote_columns`** - to prevent a warning, make sure to explicitly set a value for `quote_columns` in your `dbt_project.yml`. See the doc on `quote_columns` for more information.

```yaml
seeds:
  +quote_columns: false # or `true` if you have CSV column headers with spaces
```

**Enable view column types in docs** - Teradata Vantage has a `dbscontrol` configuration flag called `DisableQVCI`. This flag instructs the database to create `DBC.ColumnsJQV` with view column type definitions. To enable this functionality you need to:

1. Enable QVCI mode in Vantage. Use the `dbscontrol` utility and then restart Teradata. Run these commands as a privileged user on a Teradata node:

   ```bash
   # option 551 is DisableQVCI. Setting it to false enables QVCI.
   dbscontrol << EOF
   M internal 551=false
   W
   EOF

   # restart Teradata
   tpareset -y Enable QVCI
   ```

2. Instruct dbt to use QVCI mode. Include the following variable in your `dbt_project.yml`:

   ```yaml
   vars:
     use_qvci: true
   ```

   For an example configuration, see `dbt_project.yml` in the `dbt-teradata` QVCI tests.
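Putting the seed quoting and QVCI settings above side by side, a minimal `dbt_project.yml` fragment might look like this (the project name `my_project` is a placeholder):

```yaml
# dbt_project.yml (fragment) -- project name is a placeholder
name: my_project

seeds:
  my_project:
    +quote_columns: false

vars:
  use_qvci: true
```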
## Models

### table
`table_kind` - defines the table kind. Legal values are `MULTISET` (default for the ANSI transaction mode required by `dbt-teradata`) and `SET`, e.g.:

```sql
{{
  config(
      materialized="table",
      table_kind="SET"
  )
}}
```

For details, see CREATE TABLE documentation.
`table_option` - defines table options. The config supports multiple statements. The definition below uses the Teradata syntax definition to explain what statements are allowed. Square brackets `[]` denote optional parameters. The pipe symbol `|` separates statements. Use commas to combine multiple statements, as shown in the examples below:

```
{ MAP = map_name [COLOCATE USING colocation_name] |
  [NO] FALLBACK [PROTECTION] |
  WITH JOURNAL TABLE = table_specification |
  [NO] LOG |
  [ NO | DUAL ] [BEFORE] JOURNAL |
  [ NO | DUAL | LOCAL | NOT LOCAL ] AFTER JOURNAL |
  CHECKSUM = { DEFAULT | ON | OFF } |
  FREESPACE = integer [PERCENT] |
  mergeblockratio |
  datablocksize |
  blockcompression |
  isolated_loading
}
```

where:

- mergeblockratio:

  ```
  { DEFAULT MERGEBLOCKRATIO |
    MERGEBLOCKRATIO = integer [PERCENT] |
    NO MERGEBLOCKRATIO
  }
  ```

- datablocksize:

  ```
  DATABLOCKSIZE = {
    data_block_size [ BYTES | KBYTES | KILOBYTES ] |
    { MINIMUM | MAXIMUM | DEFAULT } DATABLOCKSIZE
  }
  ```

- blockcompression:

  ```
  BLOCKCOMPRESSION = { AUTOTEMP | MANUAL | ALWAYS | NEVER | DEFAULT }
    [, BLOCKCOMPRESSIONALGORITHM = { ZLIB | ELZS_H | DEFAULT } ]
    [, BLOCKCOMPRESSIONLEVEL = { value | DEFAULT } ]
  ```

- isolated_loading:

  ```
  WITH [NO] [CONCURRENT] ISOLATED LOADING [ FOR { ALL | INSERT | NONE } ]
  ```

Examples:
Note the commas that separate statements in the `table_option` config.

```sql
{{
  config(
      materialized="table",
      table_option="NO FALLBACK"
  )
}}
```

```sql
{{
  config(
      materialized="table",
      table_option="NO FALLBACK, NO JOURNAL"
  )
}}
```

```sql
{{
  config(
      materialized="table",
      table_option="NO FALLBACK, NO JOURNAL, CHECKSUM = ON,
      NO MERGEBLOCKRATIO,
      WITH CONCURRENT ISOLATED LOADING FOR ALL"
  )
}}
```

For details, see CREATE TABLE documentation.
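For intuition, the `table_option` string ends up between the table name and the column or query definition in the generated DDL. A rough sketch of what a statement built from the `"NO FALLBACK, NO JOURNAL"` config could look like (database, table, and query are placeholders, and the exact DDL `dbt-teradata` emits may differ):

```sql
CREATE MULTISET TABLE my_db.my_model,
  NO FALLBACK, NO JOURNAL
AS (
  SELECT ...
) WITH DATA;
```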
`with_statistics` - should statistics be copied from the base table, e.g.:

```sql
{{
  config(
      materialized="table",
      with_statistics="true"
  )
}}
```

For details, see CREATE TABLE documentation.
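With `with_statistics="true"` set, the create-table-as statement is expected to carry the statistics clause. A sketch under that assumption (names are placeholders; the actual SQL generated by the adapter may differ):

```sql
CREATE MULTISET TABLE my_db.my_model AS (
  SELECT ...
) WITH DATA AND STATISTICS;
```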
`index` - defines table indices:

```
[UNIQUE] PRIMARY INDEX [index_name] ( index_column_name [,...] ) |
NO PRIMARY INDEX |
PRIMARY AMP [INDEX] [index_name] ( index_column_name [,...] ) |
PARTITION BY { partitioning_level | ( partitioning_level [,...] ) } |
UNIQUE INDEX [ index_name ] [ ( index_column_name [,...] ) ] [loading] |
INDEX [index_name] [ALL] ( index_column_name [,...] ) [ordering] [loading]
[,...]
```

where:

- partitioning_level:

  ```
  { partitioning_expression |
    COLUMN [ [NO] AUTO COMPRESS |
    COLUMN [ [NO] AUTO COMPRESS ] [ ALL BUT ] column_partition ]
  } [ ADD constant ]
  ```

- ordering:

  ```
  ORDER BY [ VALUES | HASH ] [ ( order_column_name ) ]
  ```

- loading:

  ```
  WITH [NO] LOAD IDENTITY
  ```

Examples:
Note, unlike with `table_option` statements, there are no commas between statements in the `index` config.

```sql
{{
  config(
      materialized="table",
      index="UNIQUE PRIMARY INDEX ( GlobalID )"
  )
}}
```

```sql
{{
  config(
      materialized="table",
      index="PRIMARY INDEX(id)
      PARTITION BY RANGE_N(create_date
                  BETWEEN DATE '2020-01-01'
                  AND     DATE '2021-01-01'
                  EACH INTERVAL '1' MONTH)"
  )
}}
```

```sql
{{
  config(
      materialized="table",
      index="PRIMARY INDEX(id)
      PARTITION BY RANGE_N(create_date
                  BETWEEN DATE '2020-01-01'
                  AND     DATE '2021-01-01'
                  EACH INTERVAL '1' MONTH)
      INDEX index_attrA (attrA) WITH LOAD IDENTITY"
  )
}}
```
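The `table_kind`, `table_option`, and `index` configs can also be combined in a single `config()` block; a sketch with placeholder column names:

```sql
{{
  config(
      materialized="table",
      table_kind="SET",
      table_option="NO FALLBACK, NO JOURNAL",
      index="UNIQUE PRIMARY INDEX ( GlobalID )"
  )
}}
```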
## Seeds
As explained in dbt seeds documentation, seeds should not be used to load raw data (for example, large CSV exports from a production database).
Since seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees.
Loading CSVs using dbt's seed functionality is not performant for large files. Consider using a different tool to load these CSVs into your data warehouse.
`use_fastload` - use fastload when handling the `dbt seed` command. The option will likely speed up loading when your seed files have hundreds of thousands of rows. You can set this seed configuration option in your `dbt_project.yml` file, e.g.:

```yaml
seeds:
  <project-name>:
    +use_fastload: true
```
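Because seed configs follow dbt's usual config hierarchy, `use_fastload` can be enabled project-wide and switched off for individual seeds where fastload overhead is not worth it; a sketch with placeholder project and seed names:

```yaml
seeds:
  my_project:
    +use_fastload: true
    country_codes: # a small lookup seed; plain inserts are fine here
      +use_fastload: false
```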
## Common Teradata-specific tasks
collect statistics - when a table is created or modified significantly, you might need to tell Teradata to collect statistics for the optimizer. This can be done using the `COLLECT STATISTICS` command. You can perform this step using dbt's `post-hook`s, e.g.:

```sql
{{ config(
  post_hook=[
    "COLLECT STATISTICS ON {{ this }} COLUMN (column_1, column_2 ...);"
  ]
)}}
```

See the Collecting Statistics documentation for more information.
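Multiple statements can be passed in the `post_hook` list and they run in order after the model builds; for example, collecting statistics on two columns in separate statements (column names are placeholders):

```sql
{{ config(
  post_hook=[
    "COLLECT STATISTICS ON {{ this }} COLUMN (column_1);",
    "COLLECT STATISTICS ON {{ this }} COLUMN (column_2);"
  ]
)}}
```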