What’s the Biggest Lot in the City of San Francisco?
After building my first pipeline with StreamSets Data Collector, I wanted to give the framework more of a workout. I’ve spent a lot of time working with JSON data over the past few years, and the...
Visualizing Apache Log Data… with Minecraft!
A key differentiator of StreamSets Data Collector (SDC) is that it operates in continuous mode – set a pipeline running and it will continue to read files from a directory or take messages from a...
How Trend Micro Uses StreamSets – An Interview with the Threat Research Team
The Forward-Looking Threat Research team at Trend Micro were early adopters of StreamSets Data Collector. They use StreamSets to ingest data from a wide variety of sources to create a Threat...
New Tutorial: Creating a Custom StreamSets Destination
One of the first things I hear after I explain the basics of StreamSets Data Collector is, “Cool, so can I ingest data from/send data to X?”, for varying values of X. The short answer is, “Yes, you...
Integrating StreamSets with Salesforce Wave Analytics
In my last blog entry I explained how you can write custom destinations to send data to systems not currently supported by StreamSets Data Collector. As you might know, my last gig was as a developer...
Data in Motion: Simplifying Security & Building Custom Integrations
At the Strata+Hadoop World conference last week, I met with Pratik Verma, Chief Product Officer at BlueTalon, a Bay Area startup focused on big data security. As Pratik and I were talking, he explained...
Announcing Data Collector ver 1.3.0.0
This release brings a number of exciting new features and integrations. And as usual, we’ve fixed a number of bugs. Integrations: Want to send data to Amazon Redshift? Use the new...
Ingesting JSON Data Into Apache Kudu with StreamSets Data Collector
At the Hadoop Summit in Dublin this week, Ted Malaska, Principal Solutions Architect at Cloudera, and I presented Ingest and Stream Processing – What Will You Choose?, looking at the big data...
Using StreamSets to Ingest Salesforce Data for Analysis
As I’ve mentioned a couple of times, my previous gig was as a developer evangelist at Salesforce, with particular focus on integration. A few weeks ago, I wrote a custom destination allowing StreamSets...
May the 4th Be With You – Analyzing Star Wars Twitter Mentions in Minecraft
A couple of weeks ago, as May the 4th approached, a lively Star Wars debate brewed at StreamSets: “Do new school characters get as much play as old favorites like Darth Vader, Yoda and Han Solo?” “Does...
The Complementary Nature of Data Ingestion and Data Preparation
I am always eager to learn about new architectures and best big data practices. Recently I came across a paper from Trifacta discussing the role of data preparation and it got me thinking about the...
Announcing Data Collector ver 1.4.0.0
We are excited to announce the release of the next version of StreamSets Data Collector. This release includes a number of new features and enhancements and 60+ bug fixes. New Features: SFTP/FTP...
Analyzing Salesforce Data with StreamSets, Elasticsearch, and Kibana
After I published a proof-of-concept Salesforce Origin for StreamSets Data Collector (SDC), I noticed an article on the Elastic blog, Analyzing Salesforce Data with Logstash, Elasticsearch, and Kibana....
Ingesting MQTT Traffic into Riak TS via RabbitMQ and StreamSets
Riak KV is an open source, distributed, NoSQL key-value data store oriented towards high availability, fault tolerance and scalability. With its initial release in 2009, Riak KV is in use at companies...
Visualize StreamSets Data Collector Metrics with Datadog
Back in January, Adam blogged about StreamSets Monitoring with Grafana, InfluxDB, and jmxtrans. While the Grafana/InfluxDB/jmxtrans open source stack works great, there’s quite a lot of setup and...
Ingesting Sensor Data on the Raspberry Pi with StreamSets Data Collector
In the unlikely event you’re not familiar with the Raspberry Pi, it’s an ARM-based computer about the same size as a deck of playing cards. The latest iteration, Raspberry Pi 3, has a 1.2GHz ARMv8 CPU,...
Survey Shows Enterprises Struggling with Bad Data
Last week we announced the results of a survey of over 300 enterprise data professionals conducted by Dimensional Research and sponsored by StreamSets. We were trying to understand the market’s state...
Announcing Data Collector ver 1.5.0.0
We are excited to announce the release of the next version of StreamSets Data Collector. This release includes a number of new features and enhancements and 40+ bug fixes. Automatic creation and...
Retrieving Metrics via the StreamSets Data Collector REST API
Last week, I explained how I was able to run StreamSets Data Collector (SDC) on a Raspberry Pi 3, ingesting sensor data and writing it to Cassandra. With that working, I wanted to show pipeline metrics...
Chat with the StreamSets Team via Slack!
Since its inception last October, the sdc-user Google group has been the primary medium for you, our community, to communicate with us, the StreamSets Team. We’ve seen over a thousand messages, and...