Visualizing and Analyzing Salesforce Data with Neo4j
Graph databases represent and store data in terms of nodes, edges and properties, allowing quick, easy retrieval of complex hierarchical structures that may be difficult to model in traditional...
Embrace Diversity in Your Data Architecture
Many Roads Lead to Rome Over the last ten years, the data management landscape has changed dramatically — on that, I think we can all agree. The rise of big data and the new data management ecosystem...
Announcing Data Collector ver 2.6.0.0
We are excited to announce version 2.6 of StreamSets Data Collector. This release has important functionality focused on helping customers to modernize their enterprise data warehouses on Hadoop,...
Introducing the Data Collector Support Bundle
Hi, my name is Wagner Camarao and I'm a Software Engineer at StreamSets focusing on the user-facing aspects of our products. Today I'm going to talk about a new feature in the StreamSets Data Collector...
Triggering Databricks Notebook Jobs from StreamSets Data Collector
Last December, I covered Continuous Data Integration with StreamSets Data Collector and Spark Streaming on Databricks. In StreamSets Data Collector (SDC) version 2.5.0.0 we added the Spark Executor,...
Cache Salesforce Data in Redis with StreamSets Data Collector
Redis is an open-source, in-memory, NoSQL database implementing a networked key-value store with optional persistence to disk. Perhaps the most popular key-value database, Redis is widely used for...
Scaling out StreamSets with Kubernetes
In today’s microservice revolution, where software applications are designed as independent services that work together, two technologies stand out. Docker, the de facto standard for containerization,...
Announcing Data Collector v2.7.0.0
We have discovered a regression with the 2.7.0.0 release build and have decided to remove the build from distribution. The bug has been fixed and we are in the process of releasing an update as soon...
Fast, Easy Access to Secure Kafka Clusters
It’s simple to connect StreamSets Data Collector (SDC) to Apache Kafka through the Kafka Consumer Origin and Kafka Producer Destination connectors. And because those connectors support all Kafka Client...
Announcing Data Collector v2.7.1.0
We are happy to announce version 2.7.1.0 of StreamSets Data Collector. This release has a number of new features, improvements and bug fixes. For a list of all our new features, please see What's New....
Getting Started with StreamSets Data Collector on Docker
‘Simplicity is the ultimate sophistication.’ – Leonardo da Vinci As a recent hire on the Engineering Productivity team here at StreamSets, my early days at the company were marked by efforts to dive...
Ask StreamSets: Questions and Answers for the StreamSets Community
It's fair to say that most developers are familiar with Stack Overflow and the Stack Exchange network of question and answer sites. Q&A sites such as Stack Overflow serve communities of users...
Straight from Our Customers: The Benefits of Modern Ingestion
Three months into my journey here at StreamSets and I’ve had a chance to talk with many of our customers and prospects to understand how they are using the open source StreamSets Data Collector (SDC)...
Getting Started with Cloudera’s Cybersecurity Solution (feat. StreamSets,...
This post was originally published on the Cloudera VISION blog by Sam Heywood. StreamSets configurations and images of Apache Spot Open Data Model ingest pipelines can be found here on GitHub. A...
Evolving Avro Schemas with Apache Kafka and StreamSets Data Collector
Apache Avro is widely used in the Hadoop ecosystem for efficiently serializing data so that it may be exchanged between applications written in a variety of programming languages. Avro allows data to...
How to Convert Apache Sqoop™ Commands Into StreamSets Data Collector Pipelines
When it comes to loading data into Apache Hadoop™, the de facto choice for bulk loads of data from leading relational databases is Apache Sqoop™. After initially entering Apache Incubator status in...
Fun with FileRefs – Manipulating Whole File Data
As well as parsing incoming data into records, many StreamSets Data Collector (SDC) origins can be configured to ingest Whole Files. The blog entry Whole File Transfer with StreamSets Data Collector...
Bulk Loading Data into Snowflake Data Warehouse
Mike Fuller, a consultant at Red Pill Analytics, has been busy integrating an Oracle RDS database with Snowflake's cloud data warehouse via StreamSets Data Collector. His blog post on bulk loading data...
Announcing StreamSets Data Collector version 3.0
Version 3.0 marks an important new milestone for StreamSets. With close to a million downloads and a strong community and customer base, we are very excited to offer a host of powerful new capabilities...
Announcing StreamSets Data Collector Edge
Today an increasing amount of data is being generated from outside the data center or cloud – it isn’t always easy to get this data out of source systems or perform analytics right where it’s...