08-30-2022 09:17 AM
Snowflake is one of the most popular destinations for Fivetran, also with the use of the HVR Technology. When Fivetran syncs data into Snowflake we stage data using files.
This post covers considerations between using external and internal staging.
Why Staging?
Snowflake points out that bulk load is the most efficient way to load large volumes of data. Naturally for the insert-only initial sync we leverage staging. Data from files in the staging area is loaded directly into the target tables using copy.
For increments following the initial sync, Fivetran - including HVR - processes changes using controlled routines. With the many different data sources we support - SaaS applications, databases and others - we rarely come across an insert-only workload. Almost all use cases include updates, and deletes. We use staging to bulk load incremental changes into staging tables, followed by merge statements to apply the changes to the target tables. With data already residing in Snowflake the merges are very efficient, allowing for high throughput (averaging a high number of changes per second).
Internal vs. external staging
The Fivetran managed service always uses external staging in our staging environment. Data is always encrypted, and we automatically generate temporary credentials to load the data. Data is deleted when it was successfully loaded. This is all part of the managed service, and you don't have to worry about the management of the data in the staging area.
With the use of the HVR solution you have a choice between the use of external or internal staging. In either case, HVR will manage the files in the staging area (i.e. delete the files when they are no longer needed). Keeping the files HVR uses to apply changes to the target in the staging area would lead to inconsistencies. You can configure an alternative (file) destination to keep data in files, for example in a data lake.
Given the options internal versus external staging, here are a number of considerations:
Given these considerations, what option should you use? With HVR I recommend you use internal staging unless you see a strong reason not to.