This question already has an answer here:
Converting CSV to ORC with Spark
Persisting in the ORC format to durable storage (like HDFS) is only available with the HiveContext.
As a workaround, you can register the DataFrame as a temporary table. Something like this:
```scala
dataFrame.write.mode("overwrite").orc("myDF.orc")
val orcDF = sqlCtx.read.orc("myDF.orc")
orcDF.registerTempTable("<Table Name>")
```
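Once registered, the temporary table can be queried with Spark SQL. A minimal sketch, assuming the `sqlCtx` and `orcDF` from the snippet above, with `my_table` standing in for the `<Table Name>` placeholder:

```scala
// Register under an illustrative name and query it via Spark SQL.
// `sqlCtx` and `orcDF` are assumed from the snippet above.
orcDF.registerTempTable("my_table")
sqlCtx.sql("SELECT * FROM my_table LIMIT 10").show()
```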
As of now, saving as ORC can only be done with a HiveContext, so the approach looks like this:
```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

val data: RDD[MyObject] = createMyData()
data.toDF.write.format("orc").save(outputPath)
```
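The saved files can later be read back through the same HiveContext. A short sketch, assuming the `sqlContext` and `outputPath` from the snippet above:

```scala
// Load the ORC files written above back into a DataFrame.
// `sqlContext` and `outputPath` are assumed from the previous snippet.
val orcData = sqlContext.read.format("orc").load(outputPath)
orcData.printSchema()
orcData.show(10)
```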