ApacheCon is coming up, and within that massive conference there will be a glimmering gem: a forum dedicated to Spark. Reynold Xin is organizing it, and he shared some…
When we first open-sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark…
In October 2014, Databricks participated in the Sort Benchmark and set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records. The team used…