Channel: StreamSets
Browsing all 475 articles

Start with Why: Data Drift

Today, after a year of working in stealth mode with a number of enterprise charter customers, we are excited to launch StreamSets. Arvind and I started StreamSets in June 2014 because, as they say in...

State of the Art Data Ingestion

Forward-looking, data-driven enterprises increasingly leverage Big Data platforms, such as Hadoop, Elasticsearch and Amazon Web Services, to derive insights from non-transactional, machine-generated...

What is StreamSets?

StreamSets is an open source, enterprise-grade, continuous big data ingest infrastructure that accelerates time to analysis by bringing unprecedented transparency and processing to data in motion....

Introducing the StreamSets Data Collector (video)

Wondering how the StreamSets Data Collector works? Have a look at this quick 4-minute introduction to the software.

Using Open Source StreamSets to Tackle Data Drift (video)

Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle painful “data drift” – the inevitable evolution of...

Ingesting Streaming Data from JMS into HDFS and Solr using StreamSets

A step-by-step walkthrough of how Mac Noland implemented StreamSets to move away from hand-coded ETL and scale out an increasingly complex ingestion pipeline. Mac is a Solution Architect for phData, a...

Elasticsearch plus StreamSets for Reliable Data Ingestion

StreamSets Data Collector is open source software that lets you easily build continuous data ingestion pipelines for Elasticsearch. By being resistant to “data drift”, StreamSets minimizes...

Use Cloudera Manager to Install StreamSets Data Collector in Minutes (video)

You can now install StreamSets Data Collector in minutes using Cloudera Manager. Watch this short video clip to see how easy it is. Download Open Source StreamSets Data Collector at...

StreamSets Monitoring with Grafana, InfluxDB, and jmxtrans

The ability to monitor your critical infrastructure is a must, and we designed the StreamSets Data Collector (SDC) with this in mind: metrics are exposed through both the REST API and JMX. While there...

Announcing StreamSets Data Collector 1.2.0.0

We are very excited to announce the next version of the StreamSets Data Collector. This version has seen over 250 JIRAs with a host of new features, performance enhancements and bug fixes. Without...

Continuous Ingest in the Face of Data Drift (reposted from the Cloudera...

Big data has come a long way, with adoption accelerating as CIOs recognize the business value of extracting insights from the troves of data collected by their companies and business partners. But, as...

Continuous Ingest in the Face of Data Drift – Part 2 (from the Cloudera...

In my previous post I discussed the causes and impacts of data drift, a natural consequence of Big Data which creates serious data quality and data pipeline operational issues. Now I will describe the...

Continuous Ingest to Elasticsearch (video)

You can send log files to Elasticsearch using StreamSets Data Collector. Watch this short step-by-step video to see how. https://vimeo.com/152097120

How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and...

The following was re-posted from the Cloudera Engineering Blog. Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector—open source,...

Announcing StreamSets Data Collector ver 1.2.1.0

We’re happy to announce a new version of the StreamSets Data Collector. This version has a number of bug fixes and – most importantly – support for Elasticsearch 2.x. Updates to the Elasticsearch...

Simple Kafka Enablement Using StreamSets (video)

You can simplify use of Kafka within your infrastructure using StreamSets Data Collector. Watch this short step-by-step tutorial to learn how.

Binlog Processing Using Maxwell, Kafka & StreamSets

This is a nice example of Kafka enablement using Maxwell (a mysql-to-kafka binlog processor) and StreamSets Data Collector from the folks at B23. It includes a schema change listener for handling...

Building a Real-Time Retail Analytics Solution with StreamSets, MapR Streams...

Today’s complex retail applications have changed dramatically and in order to compete, enterprises must adopt new strategies for working with data. Big data and Hadoop enable retailers to connect with...

Announcing StreamSets Data Collector ver 1.2.2.0

We’re happy to announce a new version of the StreamSets Data Collector. This release marks the first in a series of integrations with the MapR Converged Data Platform. We’ve also added a few...

Getting Started with StreamSets Data Collector

Hi, I’m Pat Patterson, newly minted ‘community champion’ here at StreamSets. As I get up to speed with big data in general and StreamSets Data Collector (SDC) in particular, I’ll write up my exploits...
