At the Strata + Hadoop World 2017 Conference in San Jose, we have announced the Spark to DocumentDB Connector. It enables real-time data science, machine learning, and exploration over globally distributed data in Azure DocumentDB. Connecting Apache Spark to Azure DocumentDB accelerates our customer’s ability to solve fast-moving data science problems, where data can be quickly persisted and queried using DocumentDB. The Spark to DocumentDB connector efficiently exploits the native DocumentDB managed indexes and enables updateable columns when performing analytics, push-down predicate filtering against fast-changing globally-distributed data, ranging from IoT, data science, and analytics scenarios. The Spark to DocumentDB connector uses the Azure DocumentDB Java SDK. You can get started today and download the Spark connector from GitHub!
Read about it on the Azure blog.