Data Collecting for Snowflake
Mike Fuller, a consultant at Red Pill Analytics, has been working on ingesting data into Snowflake's cloud data warehouse using StreamSets for Snowflake. In this guest blog post, Mike explains how he...
Ingesting Data from Relational Databases to Cassandra with StreamSets
This post is summarized content from a full tutorial at https://academy.datastax.com/content/ingesting-data-relational-databases-cassandra-streamsets How do you ingest from an existing...
Creating Dataflow Pipelines with Amazon Kinesis
Although the recent public preview of Amazon Managed Streaming for Kafka (MSK) certainly made headlines, Kinesis remains Amazon's supported, production, real-time streaming service. In this blog post,...
Execute Machine Learning Jobs in MS Azure Databricks from StreamSets
In my previous blog post, I demonstrated how to achieve low-latency inference using Databricks ML models in StreamSets. Now let's say you have a dataflow pipeline that is ingesting data, enriching it,...
Scaling Data Collectors on Azure Kubernetes Service
In this blog post, I will present a step-by-step guide on how to scale Data Collector instances on Azure Kubernetes Service (AKS) using provisioning agents—which help automate upgrading and scaling...
Solving Data Quality in Streaming Data Flows
Vinu Kumar is Chief Technologist at HorizonX, based in Sydney, Australia. Vinu helps businesses unify data, focusing on a centralized data architecture. In this guest post, reposted from the...
Join Us at the First Annual DataOps Summit
StreamSets is proud to be hosting the first annual DataOps Summit in San Francisco, California at the Hilton San Francisco Financial District on September 3rd-5th. The summit will feature a full day...
Building a Slack Slash Command as a Microservice Pipeline
One of the drivers behind Slack’s rise as an enterprise collaboration tool is its rich set of integration mechanisms. Bots can monitor channels for messages and reply as if they were users, apps can...
Ingestion for a Cyber Security Data Lake with Oracle and StreamSets
If you were lucky enough to get the gift of replacing your existing security event and incident system (SEIM) this year, then there is a chance your organization has considered building a cybersecurity...
A Cost Comparison of a Cloudera Hadoop Cluster with StreamSets Ingestion...
Introduction: It should come as no surprise that a Hadoop cluster and the public cloud go together like peanut butter and jelly because of scale, agility, and economy. It should come as even less of a...
Sensor data from Azure Event Hub to ADLS Gen2 and Azure SQL DWH
Data that’s in flight is perceived to be more challenging to work with than “landed” data sitting quietly in some storage platform. With the high volume and variety of data constantly streaming in...
Binary Classification of Streaming Data using TensorFlow to ADLS Gen1 and...
Over the past decade, digital transformation has evolved such that every system and device has a digital trail: from IT servers, to factory equipment, to consumer electronics, to buildings, to cars....
Field Mapper Processor: The Swiss Army Knife of Bulk Field Manipulation
Guest post by Jeff Evans, Senior Software Engineer, StreamSets. The Field Mapper processor, introduced in Data Collector version 3.8.0, provides a flexible and powerful way to manipulate fields en...
Creating the OmniSci F1 Demo: Real-Time Data Ingestion With StreamSets
Randy Zwitch is a Senior Director of Developer Advocacy at OmniSci, enabling customers and community users alike to utilize OmniSci to its fullest potential. With broad industry experience in energy,...
Replicating Oracle to MySQL and JSON
Yannick Jaquier is a Database Technical Architect at STMicroelectronics in Geneva, Switzerland. Recently, Yannick started experimenting with StreamSets Data Collector's Oracle CDC Client origin,...
Ingesting Data from Apache Kafka to TimescaleDB
The Glue Conference (better known as GlueCon) is always a treat for me. I've been speaking there since 2012, and this year I presented a session explaining how I use StreamSets Data Collector to ingest...
Announcing StreamSets Data Collector 3.9.0 and StreamSets Data Collector Edge...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.9.0 and StreamSets Data Collector Edge 3.9.0. StreamSets Data Collector is open source under Apache License...
From Zero to Production ETL in 30 minutes with StreamSets
Jeff Schmitz has been working with big data for over a decade: at Shell, Sanchez Energy, MapR and, currently, as a senior solutions architect at MongoDB. Here, in a guest post reposted with permission...
Enhanced Error Diagnostics in StreamSets Data Collector 3.9.0
StreamSets Data Collector reads from and writes to a wide variety of data stores and messaging platforms. Any interaction with an external system brings with it the risk of an error, and error messages...
A New Definition of DataOps
This is a short post, but a relevant one. Ever since DataOps got started (about 5 years ago), it hasn’t had a well-adopted, common definition. Wikipedia is partially OK at: DataOps is an automated,...