Azure Synapse Analytics, the next evolution of Azure SQL Data Warehouse, combines enterprise data warehousing and big data analytics into a single analytics service. StreamSets Cloud’s new Azure SQL Data Warehouse destination, released today, loads data into Azure Synapse.
Loading data into Azure SQL Data Warehouse is a two-stage process: first, data is written to Azure Storage, then it is loaded into staging tables in Azure SQL Data Warehouse. The Azure SQL Data Warehouse destination automates this process – all you need to do is configure the data warehouse and ADLS locations and credentials. The destination can even automatically create a table for you based on the data you are loading.
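To give a sense of what the destination automates, here’s a minimal sketch of the manual equivalent: bulk loading staged CSV files from Azure Storage into a data warehouse table with a COPY statement. The storage path, table name, and credential are placeholders, and the destination may use a different loading mechanism internally – this just shows the shape of the two-stage load.

```sql
-- Illustrative only: a manual bulk load from staged CSV files in Azure Storage.
-- The storage account, container path, table, and key below are hypothetical.
COPY INTO dbo.taxi_transactions
FROM 'https://mystorageacct.blob.core.windows.net/staging/taxi/'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ',',
    FIRSTROW = 2,   -- skip the CSV header row
    CREDENTIAL = (IDENTITY = 'Storage Account Key', SECRET = '<storage-account-key>')
);
```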
Ingesting Data into Azure SQL Data Warehouse
Let’s look at a simple use case: loading transactional data into Azure SQL Data Warehouse from Amazon S3. We’ll use New York City taxi data; in our input data set, each transaction contains an accounting of the fare and its components, pickup and dropoff timestamps and coordinates, payment type, and, optionally, a credit card number. Let’s assume we only want to load credit card transactions into Azure SQL Data Warehouse, and we don’t want the actual credit card numbers.
Here’s the StreamSets Cloud pipeline:
You can download the pipeline here and import it into StreamSets Cloud. Sign up for your free trial, if you haven’t already done so!
This short video walks through the pipeline and shows StreamSets Cloud writing data to Azure SQL Data Warehouse.
In this example we’re using the Amazon S3 origin, but we could just as easily read data from Google Cloud Storage, Oracle, Salesforce, or any other data source supported by StreamSets Cloud.
The origin is configured to read CSV data from an S3 bucket. As you saw in the video, for this use case I configured the origin with a single filename and set the pipeline to stop when it finishes reading data from that file. I could instead have used a wildcard, say *.csv, and let the pipeline run continuously, streaming data from S3 to Azure SQL Data Warehouse as it becomes available.
The Stream Selector processor filters records based on some set of conditions. The expression ${record:value('/payment_type')=='CRD'} matches records with payment_type set to CRD; all other records are discarded. StreamSets Cloud’s Expression Language includes a rich set of functions, allowing you to model a wide variety of business logic in your pipelines.
Next, a Field Remover processor removes the credit_card field – we don’t want this sensitive data in our data warehouse!
Fields read from CSV files are interpreted as strings; we need the Field Type Converter to convert them to float, integer, or datetime values as appropriate, so that the Azure SQL Data Warehouse destination uses the appropriate data types when it creates the table – we don’t want every column to be simply VARCHAR!
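For instance, with the converted types the auto-created table might look something like this (a hypothetical sketch – the actual column names and types depend on your data and on the converter configuration):

```sql
-- Hypothetical sketch of the kind of table the destination could auto-create
-- once fields have been converted from strings to typed values.
CREATE TABLE dbo.taxi_transactions (
    pickup_datetime    DATETIME2,
    dropoff_datetime   DATETIME2,
    pickup_longitude   FLOAT,
    pickup_latitude    FLOAT,
    fare_amount        FLOAT,
    total_amount       FLOAT,
    payment_type       VARCHAR(8)
);
```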
Finally, the destination writes the data to an Azure SQL Data Warehouse table. As mentioned above, you need to configure both the data warehouse and the staging location. Note that the data warehouse user will need INSERT and ADMINISTER DATABASE BULK OPERATIONS permissions, plus, optionally, CREATE TABLE if ‘auto create table’ is enabled.
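If you need to grant those permissions yourself, the T-SQL looks roughly like this (streamsets_user is a placeholder for your own database user; note that creating tables typically also requires ALTER on the target schema):

```sql
-- Grant the data warehouse user the permissions the destination needs.
-- 'streamsets_user' is a placeholder; substitute your own database user.
GRANT INSERT TO streamsets_user;
GRANT ADMINISTER DATABASE BULK OPERATIONS TO streamsets_user;

-- Only needed if 'auto create table' is enabled:
GRANT CREATE TABLE TO streamsets_user;
GRANT ALTER ON SCHEMA::dbo TO streamsets_user;  -- creating a table in a schema also needs ALTER on it
```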
You can configure the destination to leave the staging files in place, which can be useful for debugging purposes, or purge them automatically after they are loaded into the data warehouse.
In this simple example, we define the schema and table directly, but we could use the expression language to set the values dynamically. For example, if we were reading files from a hierarchy of paths in S3, we could use the path to set the table name with an expression such as ${file:pathElement(record:attribute('Name'), 0)}.
StreamSets Cloud’s Azure SQL Data Warehouse destination manages the process of writing data to staging and then loading it into data warehouse tables. The destination uses the ‘hash’ distribution strategy, which is recommended for taking advantage of Azure SQL Data Warehouse’s massively parallel architecture.
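To illustrate, a hash-distributed table is declared in T-SQL with a distribution clause along these lines (the table and the trip_id column are made up for this example; the destination picks these details for you):

```sql
-- Illustrative only: how a hash-distributed table is declared in T-SQL.
CREATE TABLE dbo.taxi_transactions_distributed (
    trip_id        BIGINT,
    fare_amount    FLOAT,
    payment_type   VARCHAR(8)
)
WITH (
    DISTRIBUTION = HASH(trip_id),   -- rows are assigned to distributions by hashing this column
    CLUSTERED COLUMNSTORE INDEX
);
```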
Conclusion
StreamSets Cloud’s new Azure SQL Data Warehouse destination makes it easy to load data into tables in Azure Synapse Analytics (formerly SQL Data Warehouse). Configure the data warehouse and staging location and credentials, and the destination does the rest. You can build pipelines for one-time batch operations, or continuously stream data as it arrives. Sign up for a free trial of StreamSets Cloud and try it for yourself!