11-07-2023 01:23 AM
The biggest advantages of adopting SaaS products - speed, scalability and reduced infrastructure management - are often held back by what, in Fivetran’s experience, is the most common concern raised by our customers:
“How can we integrate with externally run software in a secure manner?”
This rings especially true for larger enterprises, where stricter security and networking requirements are essential. In Fivetran’s experience, the limited customisation and implementation flexibility of SaaS, together with the security concerns of using a third-party provider, are the main drivers behind companies opting to build their own solutions.
The most common enterprise use case we work on with customers is integrating their highest-volume, most valuable and most security-sensitive data in a way that is efficient on both the source and the destination. This data is usually stored in ERPs like SAP, in other databases or in data lakes; all of these systems benefit from the lightweight, performant integrations Fivetran provides to incrementally synchronize changed data.
Fivetran tackles this problem by offering a wide array of network integration options:
Aside from the direct connection approach - which security professionals often argue is the simplest but least secure - each of the other options requires infrastructure to be stood up and configured by our customers, trading ease of deployment for security.
As an example, if you were rolling out database pipelines with VPN connections across multiple cloud providers/cloud regions, you would need to
These manual steps can delay the integration of new sources and create further overhead for both your data and networking teams. Common issues include requested networking changes sitting in a long backlog, and errors from manual configuration that take additional effort to troubleshoot.
Fivetran makes it easy to spin up a data pipeline with the click of a button or an API call. It is the role of DevOps to remove the networking roadblocks facing a data team that is trying to integrate source systems within its network.
The release of Fivetran’s Terraform provider in general availability (GA) now also makes it possible to automate the creation of data pipelines - and their associated networking infrastructure - in one simple Terraform file edit, without incurring the delays or encountering the issues that manual configuration brings. This has the following benefits:
Out of the box, Fivetran’s Terraform provider allows you to fully configure your target data warehouse/data lake/database and your source connectors:
resource "fivetran_connector" "sql_server" {
  group_id = fivetran_group.group.id
  service  = "sql_server"

  destination_schema {
    name = "sqlserver_example"
  }

  config {
    public_key       = "string"
    connection_type  = "SSH"
    update_method    = "NATIVE_UPDATE"
    always_encrypted = true
    tunnel_user      = "string"
    database         = "string"
    password         = "pa$$word" # Placeholder; in practice pass a sensitive variable rather than a literal
    tunnel_port      = 0 # Placeholder; set to your SSH tunnel port
    port             = 0 # Placeholder; set to your database port
    host             = "string"
    tunnel_host      = "string"
    user             = "string"
  }
}
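The connector above references `fivetran_group.group`, which is not shown. A minimal sketch of the surrounding configuration might look like the following. It assumes the provider is installed from the `fivetran/fivetran` registry source and that the API key and secret are passed in through variables; the variable names and the group name are illustrative placeholders, not anything prescribed by Fivetran.

```hcl
terraform {
  required_providers {
    fivetran = {
      source = "fivetran/fivetran"
    }
  }
}

# Assumed variable names; supply the values through your usual secrets workflow.
variable "fivetran_api_key" {
  type      = string
  sensitive = true
}

variable "fivetran_api_secret" {
  type      = string
  sensitive = true
}

provider "fivetran" {
  api_key    = var.fivetran_api_key
  api_secret = var.fivetran_api_secret
}

# The group that the connector's group_id points at.
resource "fivetran_group" "group" {
  name = "example_group" # Placeholder name
}
```

Marking the credentials as `sensitive` keeps them out of plan output, although they will still be present in the Terraform state.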
Using the AWS, GCP or Azure Terraform provider, you can also provision any cloud networking infrastructure needed to configure Fivetran’s connection to your cloud or on-premises databases.
If opting for VPN connections, you will need to:
resource "google_compute_vpn_tunnel" "tunnel_sqlserver" {
  name               = "tunnel-sqlserver"
  peer_ip            = var.fivetran_gateway_public_ip # Pass the public gateway IP Fivetran provides
  shared_secret      = var.shared_secret              # Pass your shared secret
  target_vpn_gateway = google_compute_vpn_gateway.target_gateway.id
  depends_on = [
    google_compute_forwarding_rule.fr_esp,
    google_compute_forwarding_rule.fr_udp500,
    google_compute_forwarding_rule.fr_udp4500,
  ]
}
resource "google_compute_vpn_gateway" "target_gateway" {
  name    = "vpn-example"
  network = google_compute_network.network1.id
}
resource "google_compute_network" "network1" {
  name = "network-example"
}
resource "google_compute_address" "vpn_static_ip" {
  name = "vpn-static-ip"
}
resource "google_compute_forwarding_rule" "fr_esp" {
  name        = "fr-esp"
  ip_protocol = "ESP"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
}
resource "google_compute_forwarding_rule" "fr_udp500" {
  name        = "fr-udp500"
  ip_protocol = "UDP"
  port_range  = "500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
}
resource "google_compute_forwarding_rule" "fr_udp4500" {
  name        = "fr-udp4500"
  ip_protocol = "UDP"
  port_range  = "4500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.target_gateway.id
}
resource "google_compute_route" "route1" {
  name                = "route-example"
  network             = google_compute_network.network1.name
  dest_range          = var.target_subnet # Pass your SQL server's CIDR
  priority            = 1000
  next_hop_vpn_tunnel = google_compute_vpn_tunnel.tunnel_sqlserver.id
}
resource "google_compute_firewall" "example_inbound_firewall" {
  name        = "example-inbound-firewall"
  network     = google_compute_network.network1.name
  source_tags = ["your-source-tags"] # Replace with the appropriate source tags
  allow {
    protocol = "tcp"
    ports    = ["22", "80", "443"] # Replace with the desired ports
  }
  target_tags = ["your-target-tags"] # Replace with the appropriate target tags
}
resource "google_compute_firewall" "example_outbound_firewall" {
  name               = "example-outbound-firewall"
  network            = google_compute_network.network1.name
  direction          = "EGRESS"            # Without this, the rule defaults to INGRESS
  destination_ranges = [var.target_subnet] # Egress rules match on destination ranges, not source tags
  allow {
    protocol = "tcp"
    ports    = ["80", "443"] # Replace with the desired outbound ports
  }
  target_tags = ["your-target-tags"] # Replace with the appropriate target tags
}
Fivetran will send you the peer IP and the subnet used to connect to the host. You will use these to provision the resources above.
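The resources above read three input variables: `var.fivetran_gateway_public_ip`, `var.shared_secret` and `var.target_subnet`. One way to declare them is sketched below. The descriptions and the CIDR validation are assumptions added for illustration, so adjust them to match what Fivetran actually sends you.

```hcl
variable "fivetran_gateway_public_ip" {
  description = "Public IP of the Fivetran VPN gateway, as provided by Fivetran"
  type        = string
}

variable "shared_secret" {
  description = "Pre-shared key for the VPN tunnel"
  type        = string
  sensitive   = true # Keeps the value out of plan and apply output
}

variable "target_subnet" {
  description = "CIDR range to route through the VPN tunnel"
  type        = string

  # Optional guard: cidrhost() fails on malformed input, so can() returns false.
  validation {
    condition     = can(cidrhost(var.target_subnet, 0))
    error_message = "The target_subnet value must be a valid CIDR range, such as 10.0.0.0/24."
  }
}
```

Validating the CIDR at plan time surfaces a mistyped subnet before any networking resources are created.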
With Terraform you can neatly package all of these resources into a single module so that the information given by Fivetran is re-used automatically. Your end users will not have to provision any of this; they will just supply the necessary information in a locals file that your module uses to generate everything automatically.
Fivetran’s Terraform provider, in combination with your cloud service provider’s Terraform provider, allows you to create a packaged module that provisions both your data pipelines and their dependent networking infrastructure from a single entry. This reduces the human errors that come with manually configuring infrastructure on each side and removes networking blockers for your data teams.
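As a rough sketch of that packaging, the root configuration could hold one locals entry per pipeline and instantiate a module for each. Everything named here is hypothetical: the module path `./modules/fivetran_vpn_pipeline`, its input names and the example values are placeholders, and the module itself would contain the connector and networking resources shown earlier. Module-level `for_each` requires Terraform 0.13 or later.

```hcl
# locals.tf: the only file an end user would edit in this sketch.
locals {
  pipelines = {
    sqlserver_example = {
      fivetran_gateway_public_ip = "203.0.113.10" # Placeholder from a documentation IP range
      target_subnet              = "10.10.0.0/24" # Placeholder CIDR
    }
  }
}

# main.tf: one module instance per entry in local.pipelines.
module "pipeline" {
  source   = "./modules/fivetran_vpn_pipeline" # Hypothetical local module
  for_each = local.pipelines

  name                       = each.key
  fivetran_gateway_public_ip = each.value.fivetran_gateway_public_ip
  target_subnet              = each.value.target_subnet
  shared_secret              = var.shared_secret # Kept out of locals so it is not committed in plain text
}
```

Adding a new pipeline then becomes a matter of appending one more entry to `local.pipelines`.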
You can have a look at how to use the Fivetran Terraform provider here.