10-25-2023 01:51 AM
Note: The purpose of this post is to outline all the transformation methods that Fivetran supports. While Integrated Scheduling, Airflow, and Webhooks stands out as the most popular options, effectively serving their intended purposes, it's crucial to take into account Fivetran's seamless integrations with External Logging Services, particularly when customers are considering alternative data modeling solutions powered by AWS, Azure, and GCP.
Transformations play a vital role in automating the data process by preparing and modeling raw data for analytics and reporting. Orchestration of transformations is essential because it allows you to define, schedule, and monitor these processes, leading to increased efficiency, reduced human error, and better resource utilization.
In a rapidly changing business landscape, being able to process and analyze data in real-time or near-real-time provides a competitive edge. It allows organizations to make informed decisions promptly, respond to market shifts, identify trends, and capitalize on emerging opportunities.
Without orchestration, manual data transformations are time-consuming, error-prone, and lack consistency. Scaling becomes challenging, timeliness of insights is compromised, and complex transformations are difficult to manage. The risk of human error is high, leading to inaccurate analyses and missed opportunities. Orchestration streamlines processes, ensures consistency, and enables timely, scalable, and accurate insights for agile decision-making.
Let’s discuss how Fivetran enables you to orchestrate the ELT process.
Fivetran provides a broad spectrum of orchestration solutions tailored to accommodate customers' preferences in data platform architecture. In this section, we will offer a comprehensive breakdown of these available options.
As we’ve previously discussed in the blog "Data transformations: what’s the T in ELT?", Fivetran Transformations for dbt Core empower data analysts to automate and seamlessly orchestrate the transformation process alongside their data load. This comprehensive solution enables teams to efficiently manage the entire ELT process within a single platform: Fivetran.
As we’ve also discussed in the blog "Orchestrating ELT in Airflow", Fivetran partnered with Astronomer to develop a provider that allows users to start Fivetran data syncs, monitor the completion of Fivetran data syncs and trigger subsequent processes such as data transformations.
A webhook is a generic way for applications to communicate with each other in real-time by sending instant notifications or data updates.
Fivetran has the ability to send webhooks to external applications (HTTPS) and trigger processes at the time pre-defined processes have completed. For instance, it is possible to trigger a webhook upon data sync completion to an external application which will execute transformations.
Given the generic design of Webhooks, the consuming application needs to adapt to its specification (authorisation, headers, json content). This typically requires a fair amount of customisation on the customer end therefore it requires a higher level of effort during the implementation phase to have a robust and secure solution in place, but can also lead to a higher number of maintenance tasks. Therefore it should only be considered when neither Integrated Scheduling, nor Airflow nor External Logging can be implemented.
External Logs Services offer a vital advantage in cloud and hybrid environments by enhancing visibility and control across the distributed systems and solutions in the customer landscape. This approach also aids in meeting regulatory compliance requirements, detecting and responding to issues, enhancing problem-solving efficiency and reducing downtime. Additionally, it can also be leveraged to implement trigger-based processes.
Fivetran has developed integrations with key External Logs Services like AWS Cloud Watch, Azure Monitor Log and Google Cloud Logging (full list available here). With these integrations, Fivetran has the ability to log the completion of data syncs to the external logging system. When that process is complete, the external log can trigger relevant downstream data transformations.
This has the following benefits:
Once you have connected Fivetran to your External Logs Services, all Fivetran account-level and connector-level events will securely flow towards the Cloud Platform of your choice (AWS, Azure or GCP). You can refer to the Fivetran documentation to better understand the logs produced during common scenarios and find below some of the main events triggered during a Fivetran Sync:
Event Name |
Description |
Step |
sync_start |
Connector started syncing data |
Extract |
api_call |
API call made to a source service |
Extract |
sql_query |
SQL query executed on a source database |
Extract |
write_to_table_start |
Started writing records to destination table |
Load |
create_schema |
Schema created in destination |
Load |
create_table |
Table created in destination |
Load |
drop_table |
Table dropped from destination |
Load |
alter_table |
Table columns added to, modified in or dropped from destination table |
Load |
change_schema_config_via_sync |
Schema configuration updated during a sync |
Process |
copy_rows |
Data copied to staging table |
Load |
delete_rows |
Stale rows deleted from main table |
Load |
insert_rows |
Updated rows inserted in main table |
Load |
update_rows |
Existing rows in main table updated with new values |
Load |
write_to_table_end |
Finished writing records to destination table |
Load |
records_modified |
Number of records modified during sync |
Load |
sync_end |
Data sync completed |
Load |
Centralized logging services offer diverse functionalities, encompassing efficient monitoring of organizational systems for proactive issue management or the establishment of a resilient alert hub. Beyond these functions, logging services hold the potential for various other event-driven applications, including orchestrating EL and T processes.
After enabling the integration between Fivetran and an external logging system, there are three additional steps that need to be implemented:
AWS, Azure and GCP have service offerings which allow the execution of a full end-to-end ELT process so you can follow a similar implementation concept (filter, trigger & execute) with these hyperscalers. Below are implementation suggestions for this pattern in Fivetran’s hosting data centers:
Cloud Provider |
Extract & Load |
External Logs Service |
Filter |
Trigger |
Execute |
GCP |
Fivetran |
Logs Service |
Sink |
Pub/Sub |
DataForm |
AWS |
Fivetran |
Cloud Watch |
Event Rules |
Glue Job |
Glue |
Azure |
Fivetran |
Monitor Log |
Alert Rules |
Logic Apps |
Synapse Data Flow |
Let’s deep dive into the GCP use case where a customer uses Fivetran for data automation and DataForm for Transformations.
Google Cloud Platform offers many services which can be easily integrated. In order to orchestrate EL in Fivetran and T in DataForm, we are considering the following services:
Using Fivetran dashboard, you can easily schedule your data pipeline so it runs automatically on the predefined frequency. When Fivetran extracts, processes and loads the data, it will generate logs and start sending these logs to Google Cloud Logging. Based on rules defined in Logging Sinks, only a subset of these events (successful sync end events) are published a Google PubSub. Google workflows which subscribe the Google PubSub are then notified of the successful completion of the sync and execute the DataForm models.
Below is a representation of the flow (refer to Fivetran Documentation site for a full list of supported sources)
main: steps: - init: assign: - repository: projects/internal-sales/locations/<>/repositories/<> - createCompilationResult: call: http.post args: url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/compilationResults"} auth: type: OAuth2 body: gitCommitish: main result: compilationResult - createWorkflowInvocation: call: http.post args: url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/workflowInvocations"} auth: type: OAuth2 body: compilationResult: ${compilationResult.body.name} result: workflowInvocation - complete: return: ${workflowInvocation.body.name}
Orchestration of the ELT ensures streamlined, consistent, and timely insights and overcoming human errors and driving agile decisions. This is vital for competitive advantage, aiding swift decisions, market responses, trend identification, and capitalisation on opportunities.
Fivetran boasts a top-tier EL solution and acknowledges the vital role of seamless integration with leading T solutions in ensuring customer success. While Fivetran's Integrated Scheduling and Airflow solutions are established, they may not align with your business needs. Orchestrating ELT with external logging services could then offer the ideal way to attain timely insights while maintaining implementation and maintenance simplicity.
10-25-2023 05:24 AM
Thank you for sharing this @Julien
Fivetran Community: please ask your questions and share your comments below.
Cheers,
Toby