Setup Event Syncs from Rudderstack to Snowflake
Overview
This guide describes the process of connecting your Rudderstack instance with Houseware. This connection is required to ensure all your events and user data with all its properties is available to you on your custom Houseware instance to start analyzing.
The entire setup is divided into two parts:
Part I: π οΈ Setting up events to connect with Houseware
Part I helps you setup Rudderstack in the ideal format, sync the events data in a Snowflake instance, and share access to the new DB with the Houseware application. Details about this part are covered in the guide below.
Part II: π Going live with your Houseware instance
Part II will help you connect Houseware with the transformed event data created and stored in Part I above and to a new Snowflake warehouse to run Houseware queries on. Detailed scripts to set up the warehouse, and grant necessary permissions to the Houseware application are given in a separate doc here: How to Connect Snowflake to Houseware
Important Info: Both Part I and Part II are sequential. Hence, you will first have to complete Part I before moving on to Part II or to Secure Data Share setup to successfully setup your Houseware instance.
How will this guide help
The primary intent of this guide is to lay out the entire process, step by step, for your team so that we are aware of all requirements and can preempt any bottlenecks.
The exact code snippets will help your IT and data engineering team configure the setup. For any further queries, doubts, or questions that are not covered here, our Houseware team will be available to support you!
Step-by-Step Guide
Follow the steps listed below to prepare your raw events data in the schema compatible with the Houseware application.
Step 1: Finalize the events to connect with Houseware
Based on the product analytics use cases you want to enable on Houseware, identify the list of events that you will need to drive them.
This step will ensure that you start connecting with the most important events for you and your team.
Also, ensure that your Rudderstack sources are ingesting events accurately, and you have a Snowflake instance connected as your destination. You can refer to Rudderstack's documentation for the same.
Step 2: Transform events in Rudderstack Transformations
To make all events, users, and their properties available on your Houseware instance to analyze and visualize, your events shall be transformed with the following JS snippet in Rudderstack's Control Plane
export function transformEvent(event, metadata) {
if (!event.integrations) event.integrations = {};
event.integrations["SNOWFLAKE"] = {
options: {
jsonPaths: ["track.context.nested_properties"]
}
}
event.context["nested_properties"] = event.properties;
event.context["rudder_id"] = event.rudderId;
return event;
}
The above snippet will help sync the events in the prescribed schema.
Note: If your events are already being synced to the warehouse, then the historic stream of events in your data warehouse needs to be transformed as well. This can be easily done using Housewareβs open-sourced dbt transformations, converting Rudderstack event streams into a format accepted by Houseware.
Step 3: Create a separate Snowflake DB
A new database will be needed to store your transformed events data in the required schema described in detail in the next step.
Here is a code snippet that will help you create a new database:
-- creating variables for database, schema
set db_name='HOUSEWARE_ANALYTICS_DB';
set schema_name='PRODUCT_ANALYTICS';
create database if not exists identifier($db_name);
use database identifier($db_name);
create schema if not exists identifier($schema_name);
Here is how a new DB will help:
- A new database (DB) will be cleaner and easier for your team to give read-write or only read access in case of Secure Data Share to the Houseware application.
- It will also help in the separation of concerns and troubleshooting.
Step 4: Setup Views in your Data Warehouse
To make all events, users, and their properties available on your Houseware instance to analyze and visualize, setup a Snowflake view into the schema given below and name it -allevents
.
Column Name | Data type | Details |
---|---|---|
*event_id | String (UUID) | Unique identifier for the event |
*device_id | String (UUID) | Identifier for the device from which the event was captured. Can be a cookie/ anonymous user id |
*user_id | String (UUID) | Unique identifier of the user, generated after the user has logged in. Null values for this field indicate that the user has not logged in |
*device_ts | Timestamp in UTC | The time when the event was captured on the device. Used for ordering events |
*server_ts | Timestamp in UTC | The time when the event was received at the server. Used for restricting events to certain periods of interest |
*event_name | String | Name of the event |
*properties | JSON | Free-flowing properties associated with the event. Try and stick to event-specific properties only here. Any common event properties or user properties can be kept as a column of their own (as shown below as event_dimension1). Useful for filters or breakdown. |
<event_dimension1> | String | Event dimension that is applicable for all the events. Useful for filters or breakdown. |
<event_dimension2> | String | Event dimension that is applicable for all the events. Useful for filters or breakdown. |
<user_dimension1> | String | User dimension that is applicable for all the events. Useful for filters or breakdown. |
<user_dimension2> | String | User dimension that is applicable for all the events. Useful for filters or breakdown. |
Note : All properties common across all events should be added to the
event_dimension
columns. This helps us leverage Snowflake's columnar storage and run your analysis more efficiently. All properties specific to an event should be part of the properties JSON.
Important:
- The table/view should be named as
allevents
only.- All columns marked with an
*
are necessary and not optional to include in the table.- The column names should be exactly the same as given in the schema above. However, event dimension columns like
event_dimension1
and user dimension columns likeuser_dimension_1
can be named as per your choice.- The properties column in the schema should be a JSON with Object data type. The data type for the
key
will be a string, andvalue
can be a boolean, integer, float, date, or string. Nested JSON object/array are not supported.- The time columns
device_ts
andserver_ts
in the above schema should be in UTC in the following format:YYYY-MM-DD HH24:MI:SS
, for example2023-08-06 15:04:06
βOptimized Queries: To ensure efficient queries run on top of the allevents
table/view make sure to create a cluster key using the columns server_ts
and event_name
. Follow the code snippet given below to add the clustering key to theallevents
table.
ALTER TABLE HOUSEWARE_ANALYTICS_DB.PRODUCT_ANALYTICS.ALLEVENTS CLUSTER BY
(SERVER_TS, EVENT_NAME)
To know more about clustering and micro-partitions in Snowflake, read here π What is clustering.
Note : Houseware's application will need read-and-write access to the new DB where this transformed event schema is stored. The Houseware application periodically generates and writes metadata to this DB to power your Houseware instance.
However, if you are using the secure data share model, no write access will be needed. Secure data share only allows read access. The metdata tables generated will be written to Houseware's Snowflake account only.
As the Note above mentions, your transformed event schema is also used to auto-generate additional metadata tables.
They are primarily used to generate event and user details like their names, properties, timestamps, and other information to pre-fill this information on your Houseware instance. Data from these tables eases your visualization creation process, as specific event names, properties, etc., will automatically show up in the breakdown and filter drop-down menus.
Note : The metadata tables will also be stored in the same DB as the transformed events schema and accessible to you.
Great start! π You are already done with Part I of the setup process. To connect your transformed event data with Houseware and complete your setup, head over to the Part II guide here : π How to Connect Snowflake to Houseware