❄️ Latest: Snowflake customers — stream your data to Postgres! Learn more! 🐘

follow or visit us on
Learning

Real-Time Data Ingestion from Kafka to Snowflake

Glenn Gillen
Glenn Gillen
VP of Product, GTM
LearningReal-Time Data Ingestion from Kafka to Snowflake

In today's data-driven world, organizations are constantly seeking ways to harness the power of real-time data for analytics and decision-making. Apache Kafka has emerged as a powerhouse for handling high-volume, real-time data streams, while Snowflake offers unparalleled capabilities for data warehousing and analytics.

What if you could seamlessly combine the strengths of both platforms in less than 15 minutes, without the headaches of managing IP allow lists, opening firewall ports, or navigating the complexities of services like PrivateLink?

Today, we're excited to introduce a solution that makes this vision a reality: the Pull from Kafka Connector!

Snowflake 💙 Apache Kafka

Apache Kafka (and Kafka-compatible alternatives) is the system of choice for building flexible, scalable, and reliable streaming platforms to connect data producers and consumers. It's often used as the central nervous system for real-time data pipelines and streaming applications.

Snowflake is The Data Cloud and the place to support workloads such as data warehouses, data lakes, data science / ML / AI, and even cybersecurity. This centralization brings a huge amount of convenience through breaking down data silos and allowing teams to make smart data-informed decisions.

Connecting your Kafka broker to Snowflake can be problematic depending on your network topology. It would be convenient to give the broker a public address, but that's a significant increase in risk for a system that handles a lot of important data. Managing IP allow lists and updating firewall ingress rules improves security but can be cumbersome to manage. Alternatives like PrivateLink are better, but they too can be cumbersome to setup and require your systems to be on the same public cloud and in the same region.

In this post I'm going to show you how to securely connect Snowflake to your private Apache Kafka broker, in just a few minutes. We will:

  • Setup a managed Kafka cluster in AWS
  • Prepare Snowflake to receive data from Kafka
  • Connect Snowflake to Kafka with a private encrypted connection - without needing to expose either system to the public internet!

The final architecture diagram will look like this:

Snowflake push to Kafka

Amazon Managed Streaming for Kafka (MSK)

We're going to provision an MSK cluster so we can see an end-to-end experience of data moving from Snowflake to Kafka. If you have an existing Kafka broker you're able to use you can skip this step.

Create an MSK cluster

Within your AWS Console search for MSK in the search field at the top and select the matching result. Visit the Clusters screen, and then click Create Cluster.

The Quick Create option provides a good set of defaults for creating a Kafka cluster, so unless you've previous knowledge or experience to know you might want something different I'd suggest just confirming the details and then clicking Create cluster at the bottom of the screen.

Once you've started the cluster creation it may take about 15 minutes for provisioning to complete and for your broker to be available.

Select a warehouse

Connect Snowflake to Kafka

Create a table to capture data

Run the code here in a Snowflake worksheet. It will create a new table.


_10
CREATE OR REPLACE CUSTOMERS (
_10
key STRING,
_10
id INTEGER,
_10
name STRING,
_10
email STRING
_10
);

We've created a table, Kafka is running. It's now time to connect the two! The next stage is going to complete the picture below, by creating a point-to-point connection between the two systems — without the need to expose any systems to the public internet!

Snowflake push to Amazon AWS Kafka

Get the app

The Snowflake Pull from Kafka Connector by Ockam is invite only, to get an invite please drop us an email at hello@ockam.io. Once invited you'll receive an email with a link to get the app.

Select a warehouse

The first screen you're presented with will ask you to select the warehouse to utilize to activate the app.

Grant account privileges

Click the Grant button to the right of this screen. The app will then be automatically granted permissions to create a warehouse and create a compute pool.

Activate app

Once the permissions grants complete, an Activate button will appear. Click it and the activation process will begin.

Launch app

After the app activates you'll see a page that summarizes the privileges that the application now has. There's nothing we need to review or update on these screens yet, so proceed by clicking the Launch app button.

Download Ockam Command

Run the following command on your local workstation:


_10
curl --proto '=https' --tlsv1.2 -sSfL \
_10
https://install.command.ockam.io \
_10
| bash && source "$HOME/.ockam/env"

This will install the Ockam Command and source in the required environment settings. If you've installed Ockam Command before you can skip this step.

Setup admin account

Once Ockam Command installation is complete you can run:


_10
ockam enroll

Wrapped up in this single ockam enroll command are several steps that will bootstrap your first project and get you ready to go. It will:

  • Generate an Ockam Identity and store its secret keys in a file system based Ockam Vault.
  • Create an account with Ockam Orchestrator.
  • Provision a trial Space and Project in the Orchestrator.
  • Make your Ockam Identity the administrator of your new Project.

Generate enrollment ticket for Kafka

In this section we're going to provision an Ockam node that will run alongside our Kafka broker, and provide one of the ends of our point-to-point connection.

Amazon MSK setup screen

We need to generate an enrollment ticket to allow a new Ockam Node to join the project that was just created. This node will run alongside the Kafka broker, in the network where the Kafka is running:


_10
ockam project ticket --usage-count 1 \
_10
--expires-in 24h --attribute kafka \
_10
--relay kafka > kafka.ticket

Get the appSelect a warehouseGrant account privilegesActivate appLaunch app

Launch Ockam node for Amazon MSK

The Ockam Node for Amazon MSK is a streamlined way to provision a managed Ockam Node within your private AWS VPC.

To deploy the node that will allow Snowflake to reach your Kafka broker visit the Ockam Node for Amazon MSK listing in the AWS Marketplace, and click the Continue to Subscribe button, and then Continue to Configuration.

On the configuration page choose the region that your Amazon MSK cluster is running in, and then click Continue to Launch followed by Launch.

Enter stack details

The initial Create Stack screen pre-fills the fields with the correct information for your node, so you can press Next to proceed.

Enter node configuration

This screen has important details to you need to fill in:

  • Stack name: Give this stack a recognisable name, you'll see this in various locations in the AWS Console. It'll also make it easier to clean these resources up later if you wish to remove them.
  • VPC ID: The ID of the Virtual Private Cloud network to deploy the node in. Make sure it's the same VPC where you've deployed your Amazon MSK cluster.
  • Subnet ID: Choose one of the subnets within your VPC, ensure MSK has a broker address available in that subnet.
  • Enrollment ticket: Copy the contents of the kafka.ticket file we created earlier and paste it in here.
  • Amazon MSK Bootstrap Server with Port: In the client configuration details for your Amazon MSK cluster you will have a list of bootstrap servers. Copy the hostname:port value for the Private endpoint that's in the same subnet you chose above.
  • JSON Node Configuration: Copy the contents of the kafka.json file we created earlier and paste it in here.

We've now completed the highlighted part of the diagram below, and our Kafka broker node is waiting for another node to connect.

Amazon MSK setup

Generate enrollment ticket for Snowflake

Setting up Ockam Node as native Snowflake app

One end of our connection is now setup, it's time to connect the Snowflake end. We need to generate an enrollment ticket to allow another Ockam Node to join our project. This node will run in our Snowflake warehouse:


_10
ockam project ticket \
_10
--usage-count 1 --expires-in 2h \
_10
--attribute snowflake > snowflake.ticket

Ockam node for Amazon MSKOckam node for Amazon MSK - create stack screenOckam node for Amazon MSK - node configuration screenCreate Snowflake ticket

Configure connection details

Take the contents of the file snowflake.ticket that we just created and paste it into "Provide the above Enrollment Ticket" form field in the "Configure app" setup screen in Snowflake.

Grant privileges

To be able to authenticate with Ockam Orchestrator and then discover the route to our outlet, the Snowflake app needs to allow outbound connections to your Ockam project. Toggle the Grant access to egress and reach your Project and approve the connection by pressing Connect.

Toggle the Grant access to the tables where Kafka messages must be inserted and select the CUSTOMERS table.

Map Kafka topics to Snowflake tables

Enter the name of the Kafka topic you want to map to the Snowflake table. In this example, the topic is customers.

Check "Messages are encoded with a schema" if you have a schema registry and the messages are encoded with a schema. The configuration to use to Launch Ockam node for Amazon MSK will need to be updated to include the schema registry details. Update $SCHEMA_REGISTRY_ADDRESS with the address of the schema registry. Make sure the ockam node has access to the schema registry.


_13
{
_13
"http-server-port": 23345,
_13
"relay": "kafka",
_13
"kafka-outlet": {
_13
"bootstrap-server": "$BOOTSTRAP_SERVER_WITH_PORT",
_13
"allow": "snowflake"
_13
},
_13
"tcp-outlet": {
_13
"to": "$SCHEMA_REGISTRY_ADDRESS:9081",
_13
"from": "schema_registry",
_13
"allow": "snowflake"
_13
}
_13
}

Update other options from default values if needed.

Create Snowflake ticketGrant egressGrant egress

With that, we've completed the last step in the setup. We've now got a complete point-to-point connection that allows our Snowflake warehouse to securely pull data through to our private Kafka broker.

Snowflake push to Kafka setup complete

Next steps

Any updates to your data in your Kafka topic will now create a new row in your Snowflake table.

Post the below message to the Kafka topic to verify the setup.

Replace $BROKER_ADDRESS with your actual Kafka broker address, and ensure the topic name (customers in this example) matches the one you've configured in your Snowflake Pull from Kafka Connector setup


_10
echo '{"key": "customer123", "id": 1001, "name": "John Doe", "email": "john.doe@example.com"}' | \
_10
kafka-console-producer --broker-list $BROKER_ADDRESS:9092 --topic customers

The Snowflake connector will then pull these messages from Kafka and insert them into your CUSTOMERS table, mapping the JSON fields to the corresponding columns.

If you'd like to explore some other capabilities of Ockam I'd recommend:

Previous Article

Access a Snowflake stage with WebDAV

Next Article

Build completely private APIs in Snowflake

Edit on Github

Build Trust

Learn

Get Started

Ockam Command

Programming Libraries

Cryptographic & Messaging Protocols

Documentation

Blog

© 2024 Ockam.io All Rights Reserved