Why Fivetran’s Terraform provider simplifies the rollout of database pipelines with strict security

lpoulmarck
Fivetranner

The networking challenge of SaaS

The biggest advantages of adopting SaaS products - speed, scalability and reduced infrastructure management - are often overshadowed by what, in Fivetran’s experience, is the most common concern our customers raise:

“How can we integrate with externally run software in a secure manner?”

This rings especially true for larger enterprises, where stricter security and networking requirements are essential. In Fivetran’s experience, three things drive companies to build their own solutions instead: limited customisation, inflexible implementation options, and security concerns about relying on a third-party provider.

The most common enterprise use case we work on with customers is integrating their highest-volume, most valuable and most security-sensitive data in a way that is efficient on both the source and the destination. This data is usually stored in ERPs like SAP, in other databases or in data lakes; these are all systems that benefit from the lightweight and performant integrations Fivetran provides to incrementally synchronize changed data.

Fivetran tackles this problem by offering a wide array of network integration options:

  1. Direct connections
  2. SSH/reverse-SSH
  3. VPN tunnels
  4. Private Networking

Aside from the direct connection approach - which security professionals often argue is the simplest but least secure - each of the other options requires our customers to stand up and configure infrastructure, trading ease of deployment for security.

Easing the complexity of networking infrastructure with Fivetran's DevOps integrations

As an example, if you were rolling out database pipelines with VPN connections across multiple cloud providers or cloud regions, you would need to:

  1. Have a VPN gateway in each region for which you add a database pipeline
  2. Create tunnels connecting the SaaS network to yours
  3. Modify firewall rules for your VPNs after adding tunnels
  4. Add a new route every time you add a new database connection

These manual steps can delay the integration of new sources and create further overhead for both your data and networking teams. Two issues are common: the requested networking changes sit in a long backlog, and errors from manual configuration take additional effort to troubleshoot.

Fivetran makes it easy to spin up a data pipeline with the click of a button or an API call. It is the role of DevOps to eliminate the networking roadblocks in front of a data team trying to integrate source systems within their network.

The benefits of infrastructure as code for your data team

With Fivetran’s Terraform provider now in general availability (GA), you can create data pipelines - and their associated networking infrastructure - in one simple Terraform file edit, without the delays and issues that manual configuration brings. This has the following benefits:

  • Risk reduction: less manual input reduces the chance of human error
  • Cost optimisation: programmatically configuring your resources the same way every time reduces the chance of over-provisioning instances and paying more than required

Rolling out new database pipelines with networking configurations simultaneously

Out of the box, Fivetran’s Terraform provider allows you to fully configure your target data warehouse/data lake/database and your source connectors:

 

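resource "fivetran_connector" "sql_server" {
  group_id = fivetran_group.group.id
  service  = "sql_server"

  destination_schema {
    name = "sqlserver_example"
  }

  # All values below are placeholders; replace them with your own settings
  config {
    public_key       = "string"
    connection_type  = "SSH"
    update_method    = "NATIVE_UPDATE"
    always_encrypted = true
    tunnel_user      = "string"
    database         = "string"
    password         = "pa$$word" # Placeholder; pass real credentials via a sensitive variable
    tunnel_port      = 0          # Placeholder; SSH listens on 22 by default
    port             = 0          # Placeholder; SQL Server listens on 1433 by default
    host             = "string"
    tunnel_host      = "string"
    user             = "string"
  }
}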

 

Using the AWS, GCP or Azure Terraform provider, you can also provision any cloud networking infrastructure needed to configure Fivetran’s connection to your cloud or on-premises databases.
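
Both providers can be declared side by side in the same configuration. The following is a minimal sketch, assuming Google Cloud as the cloud provider; the variable names (fivetran_api_key, fivetran_api_secret, gcp_project, gcp_region) and the group name are illustrative assumptions and not part of the original example. It also defines the fivetran_group.group resource that the connector above refers to.

```hcl
terraform {
  required_providers {
    fivetran = {
      source = "fivetran/fivetran"
    }
    google = {
      source = "hashicorp/google"
    }
  }
}

# Hypothetical variable names; credentials are passed in rather than hardcoded
variable "fivetran_api_key" {
  type      = string
  sensitive = true
}

variable "fivetran_api_secret" {
  type      = string
  sensitive = true
}

variable "gcp_project" {
  type = string
}

variable "gcp_region" {
  type = string
}

provider "fivetran" {
  api_key    = var.fivetran_api_key
  api_secret = var.fivetran_api_secret
}

# Regional resources such as addresses and forwarding rules
# fall back to this region when none is set on the resource
provider "google" {
  project = var.gcp_project
  region  = var.gcp_region
}

# The group that the connector's group_id points to (name is a placeholder)
resource "fivetran_group" "group" {
  name = "example_group"
}
```

Keeping both providers in one configuration is what lets a single plan and apply cover the pipeline and its networking together.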

If opting for VPN connections, taking Google Cloud as the example, you will need to:

  1. Define a Google Cloud VPN Gateway.
  2. Identify the VPN gateway on the Fivetran side. In the Classic VPN example below this is done through the tunnel's peer_ip rather than a separate external VPN gateway resource.
  3. Create a VPN Tunnel connecting the two gateways.
  4. Specify the peer's IP address and pre-shared key for authentication.
  5. Create a route using google_compute_route that directs traffic for the specific server's address range (passed as var.target_subnet below, for example "10.0.0.2/32") via the VPN tunnel.
  6. Create inbound and outbound firewall rules using google_compute_firewall to control traffic to and from the specific server. Replace "your-source-tags" and "your-target-tags" with the appropriate tags for your server and security group configurations.

 

resource "google_compute_vpn_tunnel" "tunnel_sqlserver" {
  name          = "tunnel-sqlserver"
  peer_ip       = var.fivetran_gateway_public_ip # Pass the public gateway IP fivetran provides
  shared_secret = var.shared_secret # Pass your shared secret

  target_vpn_gateway = google_compute_vpn_gateway.target_gateway.id

  depends_on = [
    google_compute_forwarding_rule.fr_esp,
    google_compute_forwarding_rule.fr_udp500,
    google_compute_forwarding_rule.fr_udp4500,
  ]
}

resource "google_compute_vpn_gateway" "target_gateway" {
  name    = "vpn-example"
  network = google_compute_network.network1.id
}

resource "google_compute_network" "network1" {
  name = "network-example"
}

resource "google_compute_address" "vpn_static_ip" {
  name = "vpn-static-ip"
}

resource "google_compute_forwarding_rule" "fr_esp" {
  name        = "fr-esp"
  ip_protocol = "ESP"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
}

resource "google_compute_forwarding_rule" "fr_udp500" {
  name        = "fr-udp500"
  ip_protocol = "UDP"
  port_range  = "500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
}

resource "google_compute_forwarding_rule" "fr_udp4500" {
  name        = "fr-udp4500"
  ip_protocol = "UDP"
  port_range  = "4500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
}

resource "google_compute_route" "route1" {
  name       = "route-example"
  network    = google_compute_network.network1.name
  dest_range = var.target_subnet # Pass your SQL server's CIDR
  priority   = 1000

  next_hop_vpn_tunnel = google_compute_vpn_tunnel.tunnel_sqlserver.id
}

resource "google_compute_firewall" "example_inbound_firewall" {
  name        = "example-inbound-firewall"
  network     = google_compute_network.network1.name
  source_tags = ["your-source-tags"]  # Replace with the appropriate source tags
  allow {
    protocol = "tcp"
    ports    = ["22", "80", "443"]  # Replace with the desired ports
  }
  target_tags = ["your-target-tags"]  # Replace with the appropriate target tags
}

resource "google_compute_firewall" "example_outbound_firewall" {
  name      = "example-outbound-firewall"
  network   = google_compute_network.network1.name
  direction = "EGRESS"  # Without this, the rule defaults to INGRESS
  # source_tags only apply to ingress rules, so they are omitted here
  allow {
    protocol = "tcp"
    ports    = ["80", "443"]  # Replace with the desired outbound ports
  }
  target_tags = ["your-target-tags"]  # Replace with the appropriate target tags
}

 

Fivetran will send you the peer IP and the subnet used to connect to the host. You will use these to provision the resources above.
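
The resources above reference three input variables that are never declared. One way to declare them is sketched below; the descriptions and the CIDR validation rule are assumptions added for illustration, not something Fivetran prescribes.

```hcl
variable "fivetran_gateway_public_ip" {
  description = "Public IP of the VPN gateway on the Fivetran side, as provided by Fivetran"
  type        = string
}

variable "shared_secret" {
  description = "Pre-shared key used to authenticate the VPN tunnel"
  type        = string
  sensitive   = true # Keeps the value out of plan and apply output
}

variable "target_subnet" {
  description = "CIDR range that is routed through the VPN tunnel"
  type        = string

  # cidrhost() fails on malformed input, so can() turns it into a validity check
  validation {
    condition     = can(cidrhost(var.target_subnet, 0))
    error_message = "The target_subnet value must be a valid CIDR range, for example 10.0.0.0/24."
  }
}
```

Marking the shared secret as sensitive and validating the CIDR up front catches the most likely copy-and-paste mistakes before any infrastructure is changed.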

With Terraform you can neatly package all of these resources into a single module, so that the information given by Fivetran is re-used automatically. Your end users will not have to provision any of this themselves: they only supply the necessary information in a locals file, which your module uses to generate everything.
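
As a rough sketch of what that could look like, assuming a hypothetical local module at ./modules/fivetran_vpn_pipeline that wraps the connector and networking resources shown earlier (the module path, its input names and the sample values are all illustrative):

```hcl
locals {
  # One entry per database pipeline; keys and values are placeholders
  database_pipelines = {
    sqlserver_sales = {
      host          = "10.0.0.2"
      port          = 1433
      database      = "sales"
      target_subnet = "10.0.0.2/32"
    }
  }
}

module "database_pipeline" {
  source   = "./modules/fivetran_vpn_pipeline" # Hypothetical module path
  for_each = local.database_pipelines

  # Per-pipeline settings supplied by the end user
  name          = each.key
  host          = each.value.host
  port          = each.value.port
  database      = each.value.database
  target_subnet = each.value.target_subnet

  # Values provided by Fivetran, shared across all pipelines
  fivetran_gateway_public_ip = var.fivetran_gateway_public_ip
  shared_secret              = var.shared_secret
}
```

With this layout, adding a pipeline means adding one entry to the locals map; for_each then creates the connector, route and firewall rules for it.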

Summary

Fivetran’s Terraform provider, in combination with your cloud provider’s Terraform provider, allows you to create a packaged module that provisions both your data pipelines and the networking infrastructure they depend on from a single entry. This removes the human errors that come with manually keeping the infrastructure on each side in sync, and clears networking blockers for your data teams.

 

You can have a look at how to use the Fivetran Terraform provider here.
