Selvaraaju ('Selva') Murugesan is Senior Manager for Innovation and Data Analytics in the Australian Capital Territory (ACT) Government. Selva focuses on data management practices and data analytics, using StreamSets Data Collector to extract data from different databases, perform data cleansing on the fly and push data to the ACT Government's Open Data Portal. Over the past few months, Selva has assembled a short playlist of videos demonstrating various aspects of Data Collector. From the basics of installation to advanced topics such as configuring impersonation for MapR-FS, Selva's videos provide a great introduction to Data Collector. We're excited to feature them in this blog post!
Installing StreamSets with MapR
In this first video, Selva installs Data Collector on Red Hat Enterprise Linux 7 via the full RPM package, configures Data Collector to work with MapR, and sets up an admin user.
Documentation
Installing MapR Libraries for StreamSets Data Collector
Selva installs the necessary libraries for Data Collector to integrate with MapR 6.0.0.
Documentation
Configuring Impersonation for MapR-FS
By default, Data Collector writes to MapR-FS as the currently logged-in Data Collector user. However, it is possible to configure MapR-FS impersonation so that data is written as the user configured in the MapR-FS destination settings.
Documentation
Reading and Writing Data to the Local File System
Selva creates a simple pipeline to read CSV data from a local file, remove most of the fields, and write the result to another local file.
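To make the shape of that pipeline concrete, here is a minimal plain-Python sketch of the same read, trim, and write steps. It is not how Data Collector works internally, and the function name `keep_fields` and the sample column names are made up for illustration.

```python
import csv
import io


def keep_fields(src, dst, fields_to_keep):
    """Copy CSV rows from src to dst, keeping only the named columns.

    src and dst are text file objects; the first row of src is the header.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not in fields_to_keep.
    writer = csv.DictWriter(
        dst,
        fieldnames=fields_to_keep,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # In-memory stand-ins for the local input and output files.
    source = io.StringIO(
        "id,name,email,phone\n"
        "1,Ann,ann@example.com,555-0100\n"
        "2,Bob,bob@example.com,555-0101\n"
    )
    target = io.StringIO()
    keep_fields(source, target, ["id", "name"])
    print(target.getvalue())
```

With real files, you would pass handles opened with `open(path, newline="")` instead of the `StringIO` objects.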
Documentation
Masking Fields in the Pipeline
Data engineers often need to mask sensitive data when moving it between systems. Here, Selva shows how to use Data Collector's Field Masker processor.
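If you have not seen field masking before, the idea can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the Field Masker processor itself; the helper names, the `x` mask character, and the sample record are all assumptions made for this example.

```python
def mask_all(value, mask_char="x"):
    """Replace every character of value with mask_char."""
    return mask_char * len(value)


def mask_keep_last(value, visible=4, mask_char="x"):
    """Mask everything except the last `visible` characters.

    Values no longer than `visible` are masked completely, so short
    values are never left fully readable.
    """
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_record(record, sensitive_fields, masker=mask_all):
    """Return a copy of record with the sensitive fields masked."""
    return {
        key: masker(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    record = {"name": "Ann", "card": "4111111111111111"}
    print(mask_record(record, {"card"}, mask_keep_last))
```

Returning a new dictionary rather than editing the record in place keeps the original data available to any earlier step that still needs it.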
Documentation
Ingesting Data from a Web Service
In what is currently the last video in the series, Selva shows how Data Collector can read CSV data from a web service and write it to a local file.
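For readers who want a feel for the data flow before watching, the following is a rough plain-Python equivalent: fetch CSV over a URL, parse it, and write it to a local file. The video does this with a Data Collector pipeline, so treat this as a sketch only. The function name `fetch_csv_to_file` is hypothetical, and the demo uses a `data:` URL so that it runs without a network connection.

```python
import csv
import io
import urllib.request


def fetch_csv_to_file(url, out_path, encoding="utf-8"):
    """Download CSV text from url and write it to out_path.

    Returns the number of rows written, including the header row.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(encoding)

    # Parse before writing so malformed quoting fails here, not downstream.
    rows = list(csv.reader(io.StringIO(text)))

    with open(out_path, "w", newline="", encoding=encoding) as out_file:
        csv.writer(out_file, lineterminator="\n").writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # A data: URL stands in for a real web service endpoint.
    demo_url = "data:text/csv,id,name%0A1,Ann%0A2,Bob%0A"
    print(fetch_csv_to_file(demo_url, "web_service_demo.csv"))
```

This version reads the whole response into memory, which is fine for small files; a large feed would be better processed in chunks.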
Documentation
Conclusion
Many thanks to Selva for his permission to share these videos!
Is there an aspect of StreamSets Data Collector, or any of the StreamSets products, that you would like to see demonstrated in a video? Let us know in the comments!
The post How the ACT Government Uses Data Collector w/ MapR (videos) appeared first on Continuous Dataflows Built with StreamSets DataOps Platform.