Binary Classification of Streaming Data using TensorFlow to ADLS Gen1 and ADLS Gen2

Over the past decade, digital transformation has reached the point where every system and device leaves a digital trail: IT servers, factory equipment, consumer electronics, buildings, and cars. Growing data volumes, rates, and variety have added complexity, and these new datasets often must be analyzed in real time. Fit-for-purpose data platforms allow for storing large volumes of raw data and applying advanced analytics to it. Analytics can occur on edge systems, in data centers, or across cloud providers, and streaming compute platforms enable the processing of real-time data. Given the rapid growth of data rates and the time-critical demand for responses, we must consider robust approaches to analyze data and deliver predictions, inferences, and classifications in near real time.

The StreamSets DataOps Platform enables companies to build, execute, operate, and protect continuous dataflows that unleash pervasive analytics. It combines award-winning open source software featuring Intelligent Pipelines with a cloud-native control plane that helps enterprises manage their data movement as a continuous ingestion practice.

To provide time-critical responses when analyzing datasets, StreamSets lets you create pipelines that ingest datasets or dimensions and generate predictions or classifications within a contained environment, without having to make HTTP or REST API calls to ML models served and exposed as web services. For example, StreamSets pipelines can detect fraudulent transactions or perform natural language processing on text in real time, as data passes through the pipeline stages and before it is stored in the final destination for further processing or decision making.

Consider the use case of classifying a breast cancer tumor as malignant or benign. The Wisconsin breast cancer dataset is a classic dataset and is available as part of scikit-learn.

Note: Detailed instructions on how to train and export a TensorFlow model using this dataset, and on how to use it in your StreamSets dataflow pipelines, are provided in this blog post.

Once the model is trained and exported using TensorFlow SavedModelBuilder, using it in your StreamSets dataflow pipelines for prediction or classification is straightforward. When you preview (or execute) the pipeline, the input breast cancer records pass through the pipeline stages, including the TensorFlow model.

The final output records are sent to both Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2* (as shown above). The output includes the breast cancer features used by the model for classification, the model output value of 0 or 1 in the user-defined field TF_Model_Classification, and the respective cancer condition, Benign or Malignant, in the field Condition created by the Expression Evaluator.
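To make the shape of the output record concrete, here is a minimal Python sketch of what the Expression Evaluator step does conceptually. It is not StreamSets code: the function name add_condition, the dictionary-based record, and the sample feature names and values are all hypothetical. The mapping 0 to Benign and 1 to Malignant is an assumption read from the order in which the values are listed above; it depends on how the labels were encoded when the model was trained (scikit-learn's own copy of the dataset, for instance, encodes 0 as malignant and 1 as benign).

```python
# Hypothetical stand-in for the Expression Evaluator stage: derive the
# Condition field from the model output stored in TF_Model_Classification.

# Assumed mapping, read from the order in which the text lists the values.
CONDITION_BY_CLASS = {0: "Benign", 1: "Malignant"}


def add_condition(record):
    """Return a copy of the record with a Condition field added."""
    label = int(record["TF_Model_Classification"])
    if label not in CONDITION_BY_CLASS:
        raise ValueError(f"unexpected model output: {label!r}")
    enriched = dict(record)  # leave the input record untouched
    enriched["Condition"] = CONDITION_BY_CLASS[label]
    return enriched


# Illustrative records only; the feature names and values are made up.
records = [
    {"mean_radius": 12.3, "mean_texture": 17.8, "TF_Model_Classification": 0},
    {"mean_radius": 20.6, "mean_texture": 25.1, "TF_Model_Classification": 1},
]

for rec in map(add_condition, records):
    print(rec["TF_Model_Classification"], rec["Condition"])
```

Rejecting any value other than 0 or 1 is a design choice in this sketch, so that an unexpected model output fails loudly instead of being silently labeled.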

For details on the data preparation stages in this pipeline, check out this detailed blog post.

Here’s a screenshot of the pipeline continuously reading patient data and classifying breast cancer tumors as Benign or Malignant in real time:

Demo Video

https://youtu.be/65K97KFj108

Summary

The example above illustrates the use of Machine Learning Evaluators in StreamSets. These evaluators enable you to discover useful information in streaming data with pre-trained ML models. You can generate predictions and classifications without writing any custom code, while still getting time-critical responses when analyzing streaming datasets.

*Note: Connectivity with Azure Data Lake Storage Gen2 will be available in the upcoming StreamSets Data Collector release.

The post Binary Classification of Streaming Data using TensorFlow to ADLS Gen1 and ADLS Gen2 appeared first on StreamSets: Where DevOps Meets Data Integration.
