Welcome to Sparklens Report

Understand the scalability limits of your Spark application through profiling, and predict its performance with Sparklens.

Check us out on GitHub

What is Sparklens?

Sparklens is a profiling and performance prediction tool for Spark with a built-in Spark scheduler simulator. Its primary goal is to make it easy to understand the scalability limits of Spark applications. It helps you understand how efficiently a given Spark application uses the compute resources provided to it. Maybe your application will run faster with more executors, and maybe it won’t. Sparklens can answer this question by looking at a single run of your application.

Sparklens reporting takes just 3 steps

Extract JSON

Extract the Sparklens JSON from your Spark application.

Upload JSON

Upload your Sparklens JSON file here.

Get report

Have the file analyzed by Qubole Sparklens and get your Sparklens report.

This is how you do it!

To create a Sparklens report for your Spark application, pass the following additional arguments to spark-submit:

--packages qubole:sparklens:0.3.1-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
--conf spark.sparklens.reporting.disabled=true
--conf spark.sparklens.data.dir=/dir/for/saving/sparklens.json
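As a rough illustration, the arguments above can be combined with an application's own options into a single spark-submit invocation. The sketch below only prints the command so it can be reviewed before running. The helper name build_submit_command, the main class com.example.MyApp, the JAR path, and the /tmp/sparklens directory are all hypothetical placeholders, not part of Sparklens.

```shell
#!/bin/sh
# Print (not run) a spark-submit command with the Sparklens arguments added.
# $1: directory for the Sparklens JSON, $2: main class, $3: application JAR.
build_submit_command() {
  echo "./bin/spark-submit" \
    "--packages qubole:sparklens:0.3.1-s_2.11" \
    "--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener" \
    "--conf spark.sparklens.reporting.disabled=true" \
    "--conf spark.sparklens.data.dir=$1" \
    "--class $2" \
    "$3"
}

# Hypothetical application details; replace them with your own.
build_submit_command /tmp/sparklens com.example.MyApp /path/to/my-app.jar
```

Once the printed command looks right, run it from your Spark installation directory. The Sparklens JSON file should then be written to the directory given as the first argument.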

You can upload the resulting JSON file here to get a performance report for your Spark application. Alternatively, you can print the report to the console using the ReporterApp, which is part of the Sparklens package.

./bin/spark-submit --packages qubole:sparklens:0.3.1-s_2.11 \
        --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg "filename"

Get your Spark report now!

How to get the JSON?