Creating a Custom Origin for StreamSets Data Collector
Since writing tutorials for creating custom destinations and processors for StreamSets Data Collector (SDC), I’ve been looking for a good use case for a custom origin tutorial. It’s been trickier than...
Continuous Data Integration with StreamSets Data Collector and Spark...
I’m frequently asked, ‘How does StreamSets Data Collector (SDC) integrate with Spark Streaming? How about on Databricks?’ In this blog entry, I’ll explain how to use SDC to ingest data into a Spark...
Building an Amazon SQS Custom Origin for StreamSets Data Collector
As I explained in my recent tutorial, Creating a Custom Origin for StreamSets Data Collector, it’s straightforward to extend StreamSets Data Collector (SDC) to ingest data from pretty much any source....
Calling External Java Code from Script Evaluators
When you’re building a pipeline with StreamSets Data Collector (SDC), you can often implement the data transformations you require using a combination of ‘off-the-shelf’ processors. Sometimes, though,...
Data in Motion Evolution: Where We’ve Been…Where We Need to Go
Today we hear a lot about streaming data, fast data, and data in motion. But the truth is that we have always needed ways to move our data. Historically, the industry has been pretty inventive about...
Ingest Data into Splunk with StreamSets Data Collector
Splunk indexes and correlates log and machine data, providing a rich set of search, analysis and visualization capabilities. In this blog post, I’ll explain how to efficiently send high volumes of data...
Ingesting data into Couchbase using StreamSets Data Collector
Nick Cadenhead, a Senior Consultant at 9th BIT Consulting in Johannesburg, South Africa, uses Couchbase Server to power analytics solutions for his clients. In this blog entry, reposted from his...
Announcing Data Collector ver 2.3.0.0
We’re excited to release the next version of the StreamSets Data Collector. This release has 80+ new features and improvements, and 150+ bug fixes. Multithreaded Pipelines We’ve updated the SDC...
Replicating Relational Databases with StreamSets Data Collector
StreamSets Data Collector has long supported both reading and writing data from and to relational databases via Java Database Connectivity (JDBC). While it was straightforward to configure pipelines to...
Ingest Data into Azure Data Lake Store with StreamSets Data Collector
Azure Data Lake Store (ADLS) is Microsoft's cloud repository for big data analytic workloads, designed to capture data for operational and exploratory analytics. StreamSets Data Collector (SDC) version...
Running Scala Code in StreamSets Data Collector
The Spark Evaluator, introduced in StreamSets Data Collector (SDC) version 2.2.0.0, lets you run an Apache Spark application, termed a Spark Transformer, as part of an SDC pipeline. Back in December,...
Announcing StreamSets Data Collector ver 2.4.0.0
We are happy to announce the newest version of StreamSets Data Collector is available for download. This short release has over 25 new features and improvements and over 50 bug fixes. This is an...
Read and Write JSON to MapR DB with StreamSets Data Collector
MapR-DB is an enterprise-grade, high performance, NoSQL database management system. As a multi-model NoSQL database, it supports both JSON document models and wide column data models. MapR-DB stores...
Drift Synchronization with StreamSets Data Collector and Azure Data Lake
One of the great things about StreamSets Data Collector is that its record-oriented architecture allows great flexibility in creating data pipelines – you can plug together pretty much any combination...
Transform Data in StreamSets Data Collector
I've written quite a bit over the past few months about the more advanced aspects of data manipulation in StreamSets Data Collector (SDC) – writing custom processors, calling Java libraries from...
Installing StreamSets Data Collector on Amazon Web Services EC2
Mike Fuller, a consultant at Red Pill Analytics, recently wrote Stream Me Up (to the Cloud), Scotty, a tutorial on installing StreamSets Data Collector (SDC) on Amazon Web Services EC2. Mike's article...
StreamSets Data Collector v2.5 Adds IoT, Spark, Performance and Scale
We’re thrilled to announce version 2.5 of StreamSets Data Collector, a major release which includes important functionality related to the Internet of Things (IoT), high-performance database ingest,...
Making Sense of Stream Processing
There has been an explosion of innovation in open source stream processing over the past few years. Frameworks such as Apache Spark and Apache Storm give developers stream abstractions on which they...
Creating a Custom Multithreaded Origin for StreamSets Data Collector
Multithreaded Pipelines, introduced a couple of releases back, in StreamSets Data Collector (SDC) 2.3.0.0, enable a single pipeline instance to process high volumes of data, taking full advantage of...
Create a Custom Expression Language Function for StreamSets Data Collector
One of the most powerful features in StreamSets Data Collector (SDC) is support for Expression Language, or ‘EL’ for short. EL was introduced in JavaServer Pages (JSP) 2.0 as a mechanism for accessing...