Ingesting data into Couchbase using StreamSets Data Collector

Nick Cadenhead, a Senior Consultant at 9th BIT Consulting in Johannesburg, South Africa, uses Couchbase Server to power analytics solutions for his clients. In this blog entry, reposted from his article at LinkedIn, Nick explains why he selected StreamSets Data Collector for data ingest, and how he extended it with a custom destination to write data to Couchbase.

For some time, I have been working with the Couchbase NoSQL database solution and it’s been an interesting journey so far.

Historically, I’m not a database guy: designing, building and maintaining databases has never been my full-time job, although I do know the basics. That background has made it easier to get into the “mindset” of NoSQL concepts like no structures, no transactions, denormalizing of data and more, without much conflict with the paradigms of the structured world of SQL and relational databases.

During my sales engineering activities supporting Couchbase proof-of-concept (POC) engagements, there is always a requirement to ingest data into a Couchbase bucket (think of a bucket as a relational database) in order to demonstrate and highlight the features and capabilities of Couchbase.

Usually, ingesting data into Couchbase means writing some code. Couchbase provides quite a few SDKs (Java, .NET, Node.js and more) that let developers use Couchbase from their applications.

So this got me thinking. Why can’t there be a standard way, or tool for that matter, to ingest data into Couchbase instead of writing code all the time?

Don’t get me wrong. There’s nothing wrong with writing code!

Then I came across StreamSets Data Collector (SDC).

SDC is an open source platform for the ingestion of streaming and batch data into big data stores. It features a graphical web-based console for configuring data “pipelines” to handle data flows from origins to destinations, monitoring runtime dataflow metrics and automating the handling of data drift.

Data pipelines are constructed in the web-based console via a drag and drop process. Pipelines connect to origins (sources) and ingest data into destinations (targets). Between origins and destinations are processor steps which are essentially data transformation steps for doing field masking, field evaluating, looking up data in a database or external cloud services such as Salesforce, evaluating expressions on fields, routing data, evaluating/manipulating data using JavaScript, Jython or Groovy, and many more.
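To make the origin, processor and destination flow concrete, here is a minimal sketch in plain Python. It is not SDC’s API, and SDC pipelines are built in the console rather than in code; the names `origin`, `mask_field` and `run_pipeline` are hypothetical and exist only for this illustration.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def origin(lines: Iterable[str]) -> Iterator[Record]:
    """Origin: turn each JSON line into a record."""
    for line in lines:
        yield json.loads(line)


def mask_field(field: str) -> Callable[[Record], Record]:
    """Processor factory: replace a field's value with asterisks."""
    def process(record: Record) -> Record:
        masked = dict(record)  # copy so the incoming record is not mutated
        if field in masked:
            masked[field] = "*" * len(str(masked[field]))
        return masked
    return process


def run_pipeline(
    source: Iterable[Record],
    processors: List[Callable[[Record], Record]],
    destination: List[Record],
) -> None:
    """Pass every record from the origin through each processor, in order,
    then hand it to the destination (here just a list)."""
    for record in source:
        for process in processors:
            record = process(record)
        destination.append(record)


lines = ['{"id": 1, "email": "a@b.co"}', '{"id": 2, "email": "x@y.org"}']
sink: List[Record] = []
run_pipeline(origin(lines), [mask_field("email")], sink)
print(sink)
```

Each stage only sees records, so a processor can be added, removed or reordered without touching the origin or the destination, which is the property that makes drag-and-drop pipeline construction workable.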

Couchbase integration with StreamSets
A StreamSets Data Collector pipeline ingesting data into a Couchbase Bucket

Thus SDC is a great option for my data ingestion needs. It’s open source and available to download immediately. There are a large number of technologies supported for data ingestion ranging from databases to flat files, logs, HTTP services and big data platforms like Hadoop, MongoDB and cloud platforms like Salesforce. But there was one problem. Couchbase was not on the list of technology data connectors available for SDC. No problem! I decided to write my own connector for Couchbase.

Using the Java-based Data Connector API that is open to the community for extending SDC’s integration capabilities, together with the online documentation and guides, I was able to implement a data connector for Couchbase very quickly. The initial build of the connector is very simple: it just ingests JSON data into a Couchbase bucket. Over time the connector will be expanded to query a Couchbase bucket, offer better ingestion capabilities and more. For now, it serves my needs.
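The core job of such a destination can be sketched as follows: take a batch of records, serialize each one to JSON, and store it in the bucket under a document key. This is an illustration in plain Python, not the connector’s Java code and not the Couchbase SDK. `InMemoryBucket` is a hypothetical stand-in for a bucket, and using an `id` field as the document key is an assumption, since the article does not say how keys are chosen.

```python
import json
from typing import Dict, List, Tuple


class InMemoryBucket:
    """Hypothetical stand-in for a Couchbase bucket: key -> JSON document."""

    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}

    def upsert(self, key: str, document: str) -> None:
        # Insert the document, or overwrite it if the key already exists.
        self.documents[key] = document


def write_batch(
    bucket: InMemoryBucket,
    records: List[dict],
    key_field: str = "id",  # assumed key field, not taken from the article
) -> List[Tuple[dict, str]]:
    """Write each record as a JSON document; return the records that failed
    together with a reason, instead of aborting the whole batch."""
    errors: List[Tuple[dict, str]] = []
    for record in records:
        if key_field not in record:
            errors.append((record, f"missing key field '{key_field}'"))
            continue
        bucket.upsert(str(record[key_field]), json.dumps(record))
    return errors


bucket = InMemoryBucket()
failed = write_batch(bucket, [{"id": 7, "name": "Nick"}, {"name": "no id"}])
print(bucket.documents)
print(failed)
```

Collecting failed records rather than raising on the first one mirrors the idea of a pipeline that keeps running while surfacing problem records for inspection.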

One of the added benefits with SDC is data pipeline analytics. The analytics features in the SDC console give users an insight into how data is flowing from origins to destinations. The standard visualizations in the SDC console give detailed analysis on the performance of the data pipeline. The analysis of the pipeline showed me very quickly how my data was being ingested into the Couchbase Buckets and highlighted any errors which occurred throughout the stages of the data pipeline.

So data pipelines in SDC allow me to ingest data into Couchbase very quickly, with little or no code to write.

The data connector is, of course, open source and available on GitHub.

Please feel free to contact me if you have any questions on StreamSets Data Collector, Couchbase or the code of the Couchbase connector.

Nick’s Couchbase destination currently targets StreamSets Data Collector API version 1.2.2.0. We at StreamSets are working with Nick to update it for the upcoming 2.3.0.0 release and ultimately add it to StreamSets Data Collector as a supported integration.

The post Ingesting data into Couchbase using StreamSets Data Collector appeared first on StreamSets.
