After I published a proof-of-concept Salesforce Origin for StreamSets Data Collector (SDC), I noticed an article on the Elastic blog, Analyzing Salesforce Data with Logstash, Elasticsearch, and Kibana. In the blog entry, Elastic systems architect Russ Savage (now at Cask Data) explains the motivation for ingesting Salesforce data into Elasticsearch:
Working directly with sales and marketing operations, we outlined a number of challenges they had that might be solved with this solution. Those included:
- Interactive time-series snapshot analysis across a number of dimensions. By sales rep, by region, by campaign and more.
- Which sales reps moved the most pipeline the day before the end of month/quarter? What was the progression of Stage 1 opportunities over time?
- Correlating data outside of Salesforce (like web traffic) to pipeline building and demand. By region/country/state/city and associated pipeline.
It’s very challenging to look back in time and see trends in the data. Many companies have configured Salesforce to save reporting snapshots, but if you’re like me, you want to see the data behind the aggregate report. I want the ability to drill down to any level of detail, for any timeframe, and find any metric. We found that Salesforce snapshots just aren’t flexible enough for that.
Since we have first-class support for Elasticsearch as a destination in SDC, I decided to recreate the use case with the Salesforce Origin and see if we could fulfill those same requirements while taking advantage of StreamSets’ interactive pipeline IDE and ability to continuously monitor origins for new data.
Querying Salesforce from StreamSets Data Collector
Currently, the Salesforce Origin is available via a GitHub repo, but the plan is to move it into SDC proper at some point in the near future. To install the origin, extract the tarball into the SDC user-libs directory:
$ cd path-to-sdc/user-libs
$ tar xvfz force-lib-1.0-SNAPSHOT.jar
Since the origin needs to connect to the Salesforce API, we must edit the SDC security policy file. Add a section to path-to-sdc/etc/sdc-security.policy:
grant codebase "file://${sdc.dist.dir}/user-libs/force-lib/-" {
  permission java.net.SocketPermission "*", "connect, resolve";
};
(Curious about why we need the * wildcard in the policy? Here’s the answer.)
Once you’ve saved sdc-security.policy, restart SDC, and you should see the new origin in the processor palette:
Drop it into a new pipeline and you’ll be able to configure it with your Salesforce credentials, the SOQL query you wish to use, and so on:
For this example, we want to continuously retrieve existing records from Salesforce, updating their state in Elasticsearch, so we’ll run the origin in ‘full’ mode, rather than ‘incremental’, and poll Salesforce every few minutes. We’ll also specify a subset of fields to retrieve from Salesforce in the SOQL query:
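A SOQL query along the following lines does the job. This is just a sketch: the field list mirrors the Elasticsearch mapping we create below, and you may want to add a WHERE clause to restrict the opportunities retrieved from your own org:
SELECT Id, AccountId, Amount, CloseDate, LeadSource, Name, OwnerId, StageName, TotalOpportunityQuantity, Type FROM Opportunity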
Pushing the Data to Elasticsearch
To keep things simple for this example, we’ll just read from Salesforce and write to Elasticsearch, though we could drop any number of processors into the pipeline to filter or enrich the data:
Before we run the pipeline, we need to send index and mapping metadata to Elasticsearch, so it knows how to work with our opportunity records. Here’s the metadata for our example:
curl -XPUT 'http://localhost:9200/opportunities' -d '{
  "mappings": {
    "opportunities": {
      "properties": {
        "Id": { "type": "string", "index": "not_analyzed" },
        "AccountId": { "type": "string", "index": "not_analyzed" },
        "Amount": { "type": "double" },
        "CloseDate": { "type": "date" },
        "LeadSource": { "type": "string", "index": "not_analyzed" },
        "Name": { "type": "string" },
        "OwnerId": { "type": "string", "index": "not_analyzed" },
        "StageName": { "type": "string", "index": "not_analyzed" },
        "TotalOpportunityQuantity": { "type": "double" },
        "Type": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
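If you want to verify that the index and mapping were created as expected, you can read the mapping back with a quick GET:
curl -XGET 'http://localhost:9200/opportunities/_mapping?pretty'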
Note that we set picklist fields such as StageName and LeadSource to not_analyzed. We want to segment the data on these fields, but we don’t want to index the text in them; if StageName were analyzed, for example, a value like “Closed Won” would be tokenized into “closed” and “won”, and we’d lose the ability to aggregate on the full stage value.
We configure the Elasticsearch Destination with opportunities as its index and mapping, matching the metadata we uploaded, and set the Document ID field to ${record:value('/Id')}. Notice that Document ID is an expression, not a field name. If you set it to /Id, you’ll find that you end up with a single record in Elasticsearch, with the literal document ID /Id. I know this from first-hand experience!
Finally, we check Enable Upsert, so records will automatically be updated or inserted into Elasticsearch based on the Document ID:
Now we can run the pipeline, reading data from Salesforce and writing it to Elasticsearch:
As a quick check, we can run a count query on Elasticsearch and see that all of our 483 opportunities have indeed been received:
$ curl -X GET 'http://localhost:9200/opportunities/_count'
{"count":483,"_shards":{"total":5,"successful":5,"failed":0}}
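We can also pull back a single document to spot-check the field values; each hit’s _source should contain the fields from our SOQL query:
$ curl -X GET 'http://localhost:9200/opportunities/_search?size=1&pretty'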
Visualizing the Data with Kibana
Once the data is in Elasticsearch, we can use Kibana for visualization. Here’s a vertical bar chart of opportunities from the last six months, segmented by stage:
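Kibana builds this kind of chart from an Elasticsearch aggregation. The equivalent query run directly against the index looks roughly like the sketch below; the monthly interval and six-month range are assumptions chosen to mirror the chart, and CloseDate/StageName come from our mapping:
# Sketch only: a date histogram on CloseDate with a terms sub-aggregation on StageName.
curl -X GET 'http://localhost:9200/opportunities/_search?size=0' -d '{
  "query": { "range": { "CloseDate": { "gte": "now-6M" } } },
  "aggs": {
    "per_month": {
      "date_histogram": { "field": "CloseDate", "interval": "month" },
      "aggs": {
        "by_stage": { "terms": { "field": "StageName" } }
      }
    }
  }
}'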
Since my pipeline is running continuously, I can add a new opportunity in Salesforce and quickly see the results in Kibana. The new record shows up as the purple bar on the far right:
Conclusion
StreamSets Data Collector and its Salesforce Origin allow you to quickly extract data from Salesforce and ingest it into a wide range of destinations. In this example, we focused on writing opportunity data to Elasticsearch, but we could have filtered or enriched the data in the pipeline, or even sent it to multiple destinations, all from the drag-and-drop UI. Download StreamSets Data Collector today and build your next big data ingest pipeline in minutes!