Channel: StreamSets
Browsing all 475 articles

Creating a Custom Origin for StreamSets Data Collector

Since writing tutorials for creating custom destinations and processors for StreamSets Data Collector (SDC), I’ve been looking for a good use case for a custom origin tutorial. It’s been trickier than...


Continuous Data Integration with StreamSets Data Collector and Spark...

I’m frequently asked, ‘How does StreamSets Data Collector (SDC) integrate with Spark Streaming? How about on Databricks?’ In this blog entry, I’ll explain how to use SDC to ingest data into a Spark...


Building an Amazon SQS Custom Origin for StreamSets Data Collector

As I explained in my recent tutorial, Creating a Custom Origin for StreamSets Data Collector, it’s straightforward to extend StreamSets Data Collector (SDC) to ingest data from pretty much any source....

Calling External Java Code from Script Evaluators

When you’re building a pipeline with StreamSets Data Collector (SDC), you can often implement the data transformations you require using a combination of ‘off-the-shelf’ processors. Sometimes, though,...

Data in Motion Evolution: Where We’ve Been…Where We Need to Go

Today we hear a lot about streaming data, fast data, and data in motion. But the truth is that we have always needed ways to move our data. Historically, the industry has been pretty inventive about...


Ingest Data into Splunk with StreamSets Data Collector

Splunk indexes and correlates log and machine data, providing a rich set of search, analysis and visualization capabilities. In this blog post, I’ll explain how to efficiently send high volumes of data...

Ingesting data into Couchbase using StreamSets Data Collector

Nick Cadenhead, a Senior Consultant at 9th BIT Consulting in Johannesburg, South Africa, uses Couchbase Server to power analytics solutions for his clients. In this blog entry, reposted from his...

Announcing Data Collector ver 2.3.0.0

We’re excited to release the next version of the StreamSets Data Collector. This release has 80+ new features and improvements, and 150+ bug fixes. Multithreaded Pipelines: We’ve updated the SDC...


Replicating Relational Databases with StreamSets Data Collector

StreamSets Data Collector has long supported both reading and writing data from and to relational databases via Java Database Connectivity (JDBC). While it was straightforward to configure pipelines to...


Ingest Data into Azure Data Lake Store with StreamSets Data Collector

Azure Data Lake Store (ADLS) is Microsoft's cloud repository for big data analytic workloads, designed to capture data for operational and exploratory analytics. StreamSets Data Collector (SDC) version...

Running Scala Code in StreamSets Data Collector

The Spark Evaluator, introduced in StreamSets Data Collector (SDC) version 2.2.0.0, lets you run an Apache Spark application, termed a Spark Transformer, as part of an SDC pipeline. Back in December,...

Announcing StreamSets Data Collector ver 2.4.0.0

We are happy to announce the newest version of StreamSets Data Collector is available for download. This short release has over 25 new features and improvements and over 50 bug fixes. This is an...

Read and Write JSON to MapR DB with StreamSets Data Collector

MapR-DB is an enterprise-grade, high performance, NoSQL database management system. As a multi-model NoSQL database, it supports both JSON document models and wide column data models. MapR-DB stores...


Drift Synchronization with StreamSets Data Collector and Azure Data Lake

One of the great things about StreamSets Data Collector is that its record-oriented architecture allows great flexibility in creating data pipelines – you can plug together pretty much any combination...

Transform Data in StreamSets Data Collector

I've written quite a bit over the past few months about the more advanced aspects of data manipulation in StreamSets Data Collector (SDC) – writing custom processors, calling Java libraries from...


Installing StreamSets Data Collector on Amazon Web Services EC2

Mike Fuller, a consultant at Red Pill Analytics, recently wrote Stream Me Up (to the Cloud), Scotty, a tutorial on installing StreamSets Data Collector (SDC) on Amazon Web Services EC2. Mike's article...

StreamSets Data Collector v2.5 Adds IoT, Spark, Performance and Scale

We’re thrilled to announce version 2.5 of StreamSets Data Collector, a major release which includes important functionality related to the Internet of Things (IoT), high-performance database ingest,...


Making Sense of Stream Processing

There has been an explosion of innovation in open source stream processing over the past few years. Frameworks such as Apache Spark and Apache Storm give developers stream abstractions on which they...

Creating a Custom Multithreaded Origin for StreamSets Data Collector

Multithreaded Pipelines, introduced a couple of releases back, in StreamSets Data Collector (SDC) 2.3.0.0, enable a single pipeline instance to process high volumes of data, taking full advantage of...

Create a Custom Expression Language Function for StreamSets Data Collector

One of the most powerful features in StreamSets Data Collector (SDC) is support for Expression Language, or ‘EL’ for short. EL was introduced in JavaServer Pages (JSP) 2.0 as a mechanism for accessing...
