cancel
Showing results for 
Search instead for 
Did you mean: 

Orchestrate E, L and T with Fivetran - Spotlight on External Logging Services

Julien
Fivetranner
Fivetranner

Note: The purpose of this post is to outline all the transformation methods that Fivetran supports. While Integrated Scheduling, Airflow, and Webhooks stands out as the most popular options, effectively serving their intended purposes, it's crucial to take into account Fivetran's seamless integrations with External Logging Services, particularly when customers are considering alternative data modeling solutions powered by AWS, Azure, and GCP.


Introduction

Transformations play a vital role in automating the data process by preparing and modeling raw data for analytics and reporting. Orchestration of transformations is essential because it allows you to define, schedule, and monitor these processes, leading to increased efficiency, reduced human error, and better resource utilization.

In a rapidly changing business landscape, being able to process and analyze data in real-time or near-real-time provides a competitive edge. It allows organizations to make informed decisions promptly, respond to market shifts, identify trends, and capitalize on emerging opportunities.

Without orchestration, manual data transformations are time-consuming, error-prone, and lack consistency. Scaling becomes challenging, timeliness of insights is compromised, and complex transformations are difficult to manage. The risk of human error is high, leading to inaccurate analyses and missed opportunities. Orchestration streamlines processes, ensures consistency, and enables timely, scalable, and accurate insights for agile decision-making.

Let’s discuss how Fivetran enables you to orchestrate the ELT process.

 

Orchestrate EL & T with Fivetran

Fivetran provides a broad spectrum of orchestration solutions tailored to accommodate customers' preferences in data platform architecture. In this section, we will offer a comprehensive breakdown of these available options.

Integrated Scheduling with dbt Core

As we’ve previously discussed in the blog "Data transformations: what’s the T in ELT?", Fivetran Transformations for dbt Core empower data analysts to automate and seamlessly orchestrate the transformation process alongside their data load. This comprehensive solution enables teams to efficiently manage the entire ELT process within a single platform: Fivetran.

Orchestrating EL & T in Airflow

As we’ve also discussed in the blog "Orchestrating ELT in Airflow", Fivetran partnered with Astronomer to develop a provider that allows users to start Fivetran data syncs, monitor the completion of Fivetran data syncs and trigger subsequent processes such as data transformations.

Orchestrating EL & T with Webhook

A webhook is a generic way for applications to communicate with each other in real-time by sending instant notifications or data updates.

Fivetran has the ability to send webhooks to external applications (HTTPS) and trigger processes at the time pre-defined processes have completed. For instance, it is possible to trigger a webhook upon data sync completion to an external application which will execute transformations.

Given the generic design of Webhooks, the consuming application needs to adapt to its specification (authorisation, headers, json content). This typically requires a fair amount of customisation on the customer end therefore it requires a higher level of effort during the implementation phase to have a robust and secure solution in place, but can also lead to a higher number of maintenance tasks. Therefore it should only be considered when neither Integrated Scheduling, nor Airflow nor External Logging can be implemented.

Orchestrating EL & T with External Logs Services

External Logs Services offer a vital advantage in cloud and hybrid environments by enhancing visibility and control across the distributed systems and solutions in the customer landscape. This approach also aids in meeting regulatory compliance requirements, detecting and responding to issues, enhancing problem-solving efficiency and reducing downtime. Additionally, it can also be leveraged to implement trigger-based processes.

Fivetran has developed integrations with key External Logs Services like AWS Cloud Watch, Azure Monitor Log and Google Cloud Logging (full list available here). With these integrations, Fivetran has the ability to log the completion of data syncs to the external logging system. When that process is complete, the external log can trigger relevant downstream data transformations.

This has the following benefits:

  • No technical expertise is required to start getting logs into the external logs service
  • The connectivity to the external services is secure by design
  • Anyone with access to the Fivetran Dashboard can authorize the connection to the external logs service

Once you have connected Fivetran to your External Logs Services, all Fivetran account-level and connector-level events will securely flow towards the Cloud Platform of your choice (AWS, Azure or GCP). You can refer to the Fivetran documentation to better understand the logs produced during common scenarios and find below some of the main events triggered during a Fivetran Sync:

Event Name

Description

Step

sync_start

Connector started syncing data

Extract

api_call

API call made to a source service

Extract

sql_query

SQL query executed on a source database

Extract

write_to_table_start

Started writing records to destination table

Load

create_schema

Schema created in destination

Load

create_table

Table created in destination

Load

drop_table

Table dropped from destination

Load

alter_table

Table columns added to, modified in or dropped from destination table

Load

change_schema_config_via_sync

Schema configuration updated during a sync

Process

copy_rows

Data copied to staging table

Load

delete_rows

Stale rows deleted from main table

Load

insert_rows

Updated rows inserted in main table

Load

update_rows

Existing rows in main table updated with new values

Load

write_to_table_end

Finished writing records to destination table

Load

records_modified

Number of records modified during sync

Load

sync_end

Data sync completed

Load

Centralized logging services offer diverse functionalities, encompassing efficient monitoring of organizational systems for proactive issue management or the establishment of a resilient alert hub. Beyond these functions, logging services hold the potential for various other event-driven applications, including orchestrating EL and T processes.

After enabling the integration between Fivetran and an external logging system, there are three additional steps that need to be implemented:

  • Filter: Only listen to the successful sync_end events which mark the successful completion of data extraction from the source and its subsequent loading to the destination
  • Trigger: Notify the transformation service
  • Execute: Run the transformations

AWS, Azure and GCP have service offerings which allow the execution of a full end-to-end ELT process so you can follow a similar implementation concept (filter, trigger & execute) with these hyperscalers. Below are implementation suggestions for this pattern in Fivetran’s hosting data centers:

Cloud Provider

Extract & Load

External Logs Service

Filter

Trigger

Execute

GCP

Fivetran

Logs Service

Sink

Pub/Sub

DataForm

AWS

Fivetran

Cloud Watch 

Event Rules

Glue Job

Glue

Azure

Fivetran

Monitor Log

Alert Rules

Logic Apps

Synapse Data Flow

Let’s deep dive into the GCP use case where a customer uses Fivetran for data automation and DataForm for Transformations.

Orchestrate EL & T with Google Cloud Logging

Overview

Google Cloud Platform offers many services which can be easily integrated. In order to orchestrate EL in Fivetran and T in DataForm, we are considering the following services:

  • Google Cloud Logging: To track all logs centrally
  • Google Cloud Logging Sink: To filter sync_end events out from all log entries
  • Google Pub/Sub: To trigger the execution of the transformation
  • Google Workflow: To execute DataForm transformation models

Using Fivetran dashboard, you can easily schedule your data pipeline so it runs automatically on the predefined frequency. When Fivetran extracts, processes and loads the data, it will generate logs and start sending these logs to Google Cloud Logging. Based on rules defined in Logging Sinks, only a subset of these events (successful sync end events) are published a Google PubSub. Google workflows which subscribe the Google PubSub are then notified of the successful completion of the sync and execute the DataForm models.

Below is a representation of the flow (refer to Fivetran Documentation site for a full list of supported sources)

Julien_0-1698222544321.png

Implementation

  1. Enable Google Cloud Logging for your Fivetran Group as documented here
  2. Create a new PubSubJulien_1-1698159287436.png
  3. Create a logging sink which collects successful syncs and publish them to the PubSub from Step 2Julien_2-1698159287411.png
  4. Create a new Workflow which executes the DataForm model and is triggered from PubSub events (including the script executed by the Workflow)
main:
  steps:
  - init:
    assign:
    - repository: projects/internal-sales/locations/<>/repositories/<>
  - createCompilationResult:
    call: http.post
    args:
      url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/compilationResults"}
      auth:
        type: OAuth2
      body:
        gitCommitish: main
    result: compilationResult
  - createWorkflowInvocation:
    call: http.post
    args:
      url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/workflowInvocations"}
      auth:
        type: OAuth2
      body:
        compilationResult: ${compilationResult.body.name}
    result: workflowInvocation
  - complete:
    return: ${workflowInvocation.body.name}

Julien_3-1698159287453.png

 

Julien_4-1698159287418.png

Summary

Orchestration of the ELT ensures streamlined, consistent, and timely insights and overcoming human errors and driving agile decisions. This is vital for competitive advantage, aiding swift decisions, market responses, trend identification, and capitalisation on opportunities.

Fivetran boasts a top-tier EL solution and acknowledges the vital role of seamless integration with leading T solutions in ensuring customer success. While Fivetran's Integrated Scheduling and Airflow solutions are established, they may not align with your business needs. Orchestrating ELT with external logging services could then offer the ideal way to attain timely insights while maintaining implementation and maintenance simplicity.

1 REPLY 1

Toby-Metcalf
Community Admin
Community Admin

Thank you for sharing this @Julien 
Fivetran Community: please ask your questions and share your comments below.

Cheers,
Toby