StreamSets

Announcing Data Collector ver 1.5.1.0

July 22, 2016, 9:56 am

We’re happy to announce a new release of StreamSets Data Collector. This is a relatively minor mid-term update with a number of important bug fixes, yet it packs in a couple of fun features. Support...

View Article

Standard Deviations on Cassandra – Rolling Your Own Aggregate Function

July 28, 2016, 6:18 am

If you’ve been following the StreamSets blog over the past few weeks, you’ll know that I’ve been building an Internet of Things testbed on the Raspberry Pi. First, I got StreamSets Data Collector (SDC)...

View Article

Dynamic Outlier Detection with StreamSets and Cassandra

August 19, 2016, 6:11 am

This blog post concludes a short series building up an IoT sensor testbed with StreamSets Data Collector (SDC), a Raspberry Pi and Apache Cassandra. Previously, I covered: Part 1: Ingesting Sensor Data...

View Article

Whole File Transfer with StreamSets Data Collector

August 22, 2016, 6:14 am

A key aspect of StreamSets Data Collector (SDC) is its ability to parse incoming data, giving you unprecedented flexibility in processing data flows. Sometimes, though, you don’t need to see ‘inside’...

View Article

Announcing Data Collector ver 1.6.0.0

August 31, 2016, 8:56 pm

It’s been a busy summer here at StreamSets: we’ve been enabling some exciting use cases for our customers, partners, and the community of open-source users all over the world. We are excited to announce...

View Article

Ingesting Drifting Data into Hive and Impala

September 8, 2016, 6:38 am

Importing data into Apache Hive is one of the most common use cases in big data ingest, but gets tricky when data sources ‘drift’, changing the schema or semantics of incoming data. Introduced in...

View Article

StreamSets Data Collector in Action at IBM Ireland

September 12, 2016, 5:32 am

After Guglielmo Iozzia, a big data infrastructure engineer on the Ethical Hacking Team at IBM Ireland, recently spoke about building data pipelines using StreamSets Data Collector at Hadoop User Group...

View Article

Introducing StreamSets DPM – Operational Control of Your Data in Motion

September 12, 2016, 7:58 am

Friends of StreamSets, Today I am delighted to announce our new product, StreamSets Dataflow Performance Manager, or DPM, the industry’s first solution for managing operations of a company’s end-to-end...

View Article

Creating a Post-Lambda World with Apache Kudu

September 23, 2016, 8:59 am

Apache Kudu and Open Source StreamSets Data Collector Simplify Batch and Real-Time Processing As originally posted on the Cloudera VISION Blog. At StreamSets, we come across dataflow challenges for a...

View Article

MySQL Database Change Capture with MapR Streams, Apache Drill, and StreamSets

September 26, 2016, 6:14 am

Today’s post is from Raphaël Velfre, a senior data engineer at MapR. Raphaël has spent some time working with StreamSets Data Collector (SDC) and MapR’s Converged Data Platform. In this blog entry,...

View Article

Announcing StreamSets Data Collector version 2.0

September 27, 2016, 1:55 am

Last October, we publicly announced StreamSets Data Collector version 1.0. Over the last 12 months we have seen an awesome (a word we don’t use lightly) amount of adoption of our first product – from...

View Article

Visualizing NetFlow Data with StreamSets Data Collector, Kudu, Impala and D3

October 13, 2016, 6:31 am

Sandish Kumar, a Solutions Engineer at phData, builds and manages solutions for phData customers. In this article, reposted from the phData blog, he explains how to generate simulated NetFlow data,...

View Article

Announcing Data Collector ver 2.1.0.0

October 13, 2016, 10:53 am

We’re happy to announce a new release of the Data Collector. This minor release has more than 30 bug fixes, a number of improvements, and a few new features: A Package Manager that allows you to...

View Article

Creating a Custom Processor for StreamSets Data Collector

October 19, 2016, 6:25 am

Back in March, I wrote a tutorial showing how to create a custom destination for StreamSets Data Collector (SDC). Since then I’ve been looking for a good sample use case for a custom processor. It’s...

View Article

The Challenge of Fetching Data for Apache Spot (incubating)

October 19, 2016, 8:35 am

Reposted from the Cloudera Vision blog. What do Sony, Target and the Democratic Party have in common? Besides being well-respected brands, they’ve all been subject to some very public and embarrassing...

View Article

Contributing to the StreamSets Data Collector Community

October 25, 2016, 6:20 am

As you likely already know, StreamSets Data Collector (SDC) is open source, made available via the Apache 2.0 license. The entire source code for the product is hosted in a GitHub project and the...

View Article

More Than One Third of the Fortune 100 Have Downloaded StreamSets Data Collector

November 9, 2016, 4:29 pm

It’s been a little over a year (9/24/15) since we launched StreamSets Data Collector as an open source project. For those of you unfamiliar with the product, it’s any-to-any big data ingestion software...

View Article

Upgrading From Apache Flume to StreamSets Data Collector

November 30, 2016, 9:05 pm

Apache Flume “is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data”. The typical use case is collecting log data and pushing...

View Article

Announcing Data Collector ver 2.2.0.0

December 1, 2016, 7:03 pm

And here it is folks, the last release of 2016 – StreamSets Data Collector version 2.2.0.0. We’ve put in a host of important new features and resolved 120+ bugs. We’re gearing up for a solid roadmap in...

View Article

Running Apache Spark Code in StreamSets Data Collector

December 8, 2016, 11:14 am

New in StreamSets Data Collector (SDC) 2.2.0.0 is the Spark Evaluator, a processor stage that allows you to run an Apache Spark application, termed a Spark Transformer, as part of an SDC pipeline. With...

View Article