How DataOps is Adding Value to Data Lakes
Those of you who joined us on June 6th dialed into a forward-thinking conversation between three industry experts. They waxed poetic about topics including big data, DataOps, governance, data...
Announcing StreamSets Data Collector 3.10.0 and StreamSets Data Collector...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.10.0 and StreamSets Data Collector Edge 3.10.0. StreamSets Data Collector is open source under Apache License...
Automating Kerberos KeyTab Generation for Kubernetes-based Deployments
A major challenge when deploying dataflow pipelines to run on Kubernetes is how to handle Kerberos principals and keytabs needed when pipelines write to secure Hadoop. One approach, of using Kerberos...
StreamSets Transformer is Here
The emerging practice of DataOps encompasses many activities that an enterprise may execute today including ingestion, ETL, and real-time stream processing. The trick to DataOps is to execute these...
StreamSets Transformer Extensibility: Spark and Machine Learning
Apache Spark has been on the rise for the past few years and it continues to dominate the landscape when it comes to in-memory and distributed computing, real-time analysis and machine learning use...
StreamSets Transformer Extensibility — Part 2: Spark MLeap Bundles to S3
In part 1, you learned how to extend StreamSets Transformer to train a Spark ML RandomForestRegressor model. In part 2, you will learn how to create a Spark MLeap bundle to serialize the...
The StreamSets Cloud Beta is Open for Participation!
Today we are opening the StreamSets Cloud Beta program, inviting you to experience and give feedback on the latest addition to the StreamSets product family. StreamSets Cloud is a cloud service for...
StreamSets Cloud Unlocking Insights: Amazon S3 to Snowflake
StreamSets Cloud is a cloud service for designing, deploying and operating smart data pipelines, combining ease and scalability with the flexibility to execute pipelines anywhere – on-premise, or in a...
Announcing StreamSets Data Collector 3.11.0 and StreamSets Data Collector...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.11.0 and StreamSets Data Collector Edge 3.11.0. StreamSets Data Collector is open source under Apache License...
StreamSets Transformer: Your Questions Answered
StreamSets Transformer, a powerful tool for creating highly instrumented Apache Spark applications for modern ETL, is the newest addition to the StreamSets DataOps Platform. StreamSets enables...
StreamSets Data Collector: Simple Network Management Protocol And Management...
This is a guest post by Clark Bradley, Solutions Engineer, StreamSets. SNMP stands for Simple Network Management Protocol and allows network devices to share information. SNMP is supported across a...
Ingest data into Azure Synapse Analytics (formerly SQL DW) with StreamSets Cloud
Azure Synapse Analytics, the next evolution of Azure SQL Data Warehouse, combines enterprise data warehousing and big data analytics into a single analytics service. StreamSets Cloud's new Azure SQL...
StreamSets Transformer: Design Patterns For Slowly Changing Dimensions
In this blog, we will look at a few design patterns for Slowly Changing Dimensions (SCD) Type 2 and see how StreamSets Transformer, the newest addition to the StreamSets DataOps Platform, makes it easy...
Announcing StreamSets Data Collector 3.12.0 and StreamSets Data Collector...
StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.12.0 and StreamSets Data Collector Edge 3.12.0. StreamSets Data Collector is open source under Apache License...
StreamSets Transformer: Natural Language Processing in PySpark
In two of my previous blogs I illustrated how easily you can extend StreamSets Transformer using Scala: 1) to train a Spark ML RandomForestRegressor model, and 2) to serialize the trained model and save...