Welcome to Sparklens Report
Understand the scalability limits of your Spark application by profiling it, and predict its performance with Sparklens.
What is Sparklens?
Sparklens is a profiling and performance prediction tool for Spark with a built-in Spark scheduler simulator. Its primary goal is to make it easy to understand the scalability limits of Spark applications. It helps you understand how efficiently a given Spark application uses the compute resources provided to it. Maybe your application will run faster with more executors, and maybe it won't. Sparklens can answer this question by looking at a single run of your application.
Sparklens reporting takes just 3 steps
Extract JSON
Extract the Sparklens JSON from your Spark application.
Upload JSON
Upload your Sparklens JSON file here.
Get report
Have the file analyzed by Qubole Sparklens and get your Sparklens report.
This is how you do it!
To create a Sparklens report for your Spark application, pass the following additional arguments to spark-submit:
--packages qubole:sparklens:0.3.1-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
--conf spark.sparklens.reporting.disabled=true
--conf spark.sparklens.data.dir=/dir/for/saving/sparklens.json
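The arguments above can be combined into a single spark-submit invocation. The following is a minimal sketch, not an official script: the application class `com.example.MyApp` and the jar path `/path/to/my-app.jar` are hypothetical placeholders, and the script only prints the assembled command so you can review it before running it yourself.

```shell
#!/bin/sh
# Hypothetical placeholders (assumptions): replace with your own application.
APP_CLASS="com.example.MyApp"
APP_JAR="/path/to/my-app.jar"

# Directory where the Sparklens JSON is saved (value taken from the text above).
SPARKLENS_DATA_DIR="/dir/for/saving/sparklens.json"

# Print the full spark-submit command with the Sparklens arguments added.
# Nothing is executed here; the command is only echoed for review.
sparklens_submit_cmd() {
  printf '%s ' \
    ./bin/spark-submit \
    --packages qubole:sparklens:0.3.1-s_2.11 \
    --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
    --conf spark.sparklens.reporting.disabled=true \
    --conf "spark.sparklens.data.dir=${SPARKLENS_DATA_DIR}" \
    --class "${APP_CLASS}" \
    "${APP_JAR}"
  printf '\n'
}

sparklens_submit_cmd
```

Once the placeholders are replaced and the printed command is run, the application saves its Sparklens JSON under the directory given by spark.sparklens.data.dir.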
You can upload this file here to get a performance report for your Spark application. Alternatively, you can print the report to the console using the ReporterApp, which is part of the Sparklens package.
./bin/spark-submit --packages qubole:sparklens:0.3.1-s_2.11 \
--class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg "filename"