Start with Why: Data Drift
Today, after a year of working in stealth mode with a number of enterprise charter customers, we are excited to launch StreamSets. Arvind and I started StreamSets in June 2014 because, as they say in...
State of the Art Data Ingestion
Forward-looking, data-driven enterprises increasingly leverage Big Data platforms, such as Hadoop, Elasticsearch and Amazon Web Services, to derive insights from non-transactional, machine-generated...
What is StreamSets?
StreamSets is an open source, enterprise-grade, continuous big data ingest infrastructure that accelerates time to analysis by bringing unprecedented transparency and processing to data in motion....
Introducing the StreamSets Data Collector (video)
Wondering how the StreamSets Data Collector works? Have a look at this quick 4-minute introduction to the software.
Using Open Source StreamSets to Tackle Data Drift (video)
Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle painful “data drift” – the inevitable evolution of...
Ingesting Streaming Data from JMS into HDFS and Solr using StreamSets
A step-by-step walkthrough of how Mac Noland implemented StreamSets to move away from hand-coded ETL and scale out an increasingly complex ingestion pipeline. Mac is a Solution Architect for phData, a...
Elasticsearch plus StreamSets for Reliable Data Ingestion
StreamSets Data Collector is open source software that lets you easily build continuous data ingestion pipelines for Elasticsearch. By being resistant to “data drift”, StreamSets minimizes...
Use Cloudera Manager to Install StreamSets Data Collector in Minutes (video)
You can now install StreamSets Data Collector in minutes using Cloudera Manager. Watch this short video clip to see how easy it is. Download Open Source StreamSets Data Collector at...
StreamSets Monitoring with Grafana, InfluxDB, and jmxtrans
The ability to monitor your critical infrastructure is a must, and we designed the StreamSets Data Collector (SDC) with this in mind: metrics are exposed through both the REST API and JMX. While there...
Announcing StreamSets Data Collector 1.2.0.0
We are very excited to announce the next version of the StreamSets Data Collector. This version includes over 250 JIRA issues covering a host of new features, performance enhancements and bug fixes. Without...
Continuous Ingest in the Face of Data Drift (reposted from the Cloudera...
Big data has come a long way, with adoption accelerating as CIOs recognize the business value of extracting insights from the troves of data collected by their companies and business partners. But, as...
Continuous Ingest in the Face of Data Drift – Part 2 (from the Cloudera...
In my previous post, I discussed the causes and impacts of data drift, a natural consequence of Big Data that creates serious data quality and data pipeline operational issues. Now I will describe the...
Continuous Ingest to Elasticsearch (video)
You can send log files to Elasticsearch using StreamSets Data Collector. Watch this short step-by-step video to see how. https://vimeo.com/152097120
How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and...
The following was re-posted from the Cloudera Engineering Blog. Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector—open source,...
Announcing StreamSets Data Collector ver 1.2.1.0
We’re happy to announce a new version of the StreamSets Data Collector. This version has a number of bug fixes and – most importantly – support for Elasticsearch 2.x. Updates to the Elasticsearch...
Simple Kafka Enablement Using StreamSets (video)
You can simplify use of Kafka within your infrastructure using StreamSets Data Collector. Watch this short step-by-step tutorial to learn how.
Binlog Processing Using Maxwell, Kafka & StreamSets
This is a nice example of Kafka enablement using Maxwell (a MySQL-to-Kafka binlog processor) and StreamSets Data Collector from the folks at B23. It includes a schema change listener for handling...
Building a Real-Time Retail Analytics Solution with StreamSets, MapR Streams...
Today’s complex retail applications have changed dramatically, and to compete, enterprises must adopt new strategies for working with data. Big data and Hadoop enable retailers to connect with...
Announcing StreamSets Data Collector ver 1.2.2.0
We’re happy to announce a new version of the StreamSets Data Collector. This release marks the first in a series of integrations with the MapR Converged Data Platform. We’ve also added a few...
Getting Started with StreamSets Data Collector
Hi, I’m Pat Patterson, newly minted ‘community champion’ here at StreamSets. As I get up to speed with big data in general and StreamSets Data Collector (SDC) in particular, I’ll write up my exploits...