This month’s IntelliJ HDInsight Tools release delivers a robust remote debugging engine for Spark running in the Azure cloud. With the Azure Toolkit for IntelliJ, Spark users can now perform interactive remote debugging directly against code running in HDInsight.
Debugging big data applications is a longstanding pain point. The data-intensive, distributed, scalable computing environment in which big data apps run is inherently difficult to troubleshoot, and Spark is no exception. There is little tooling support for debugging in such environments, leaving developers with manual, brute-force approaches that are cumbersome and limited. Common approaches include local debugging against sample data, which restricts the size of the data that can be tested; analysis of log files after the app has completed, which requires manually parsing unwieldy logs; and use of a Spark shell for line-by-line execution, which does not support breakpoints.