
scala - saving spark rdd in ORC format

Problem description:

This question already has an answer here:

  • Converting CSV to ORC with Spark

    1 answer

Answer:

Writing the ORC format to persistent storage (such as HDFS) is only available through the HiveContext.

As a workaround, you can write the DataFrame out as ORC, read it back, and register the result as a temporary table. Something like this:

df.write.mode("overwrite").orc("myDF.orc")
val orcDF = sqlCtx.read.orc("myDF.orc")
orcDF.registerTempTable("<Table Name>")
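Once registered, the temporary table can be queried through the same SQL context. A minimal sketch, assuming the table was registered as `myTable` (a placeholder name, as are the columns):

```scala
// Register the ORC-backed DataFrame under a placeholder name,
// then query it with Spark SQL via the same context that read it.
orcDF.registerTempTable("myTable")
val result = sqlCtx.sql("SELECT * FROM myTable LIMIT 10")
result.show()
```

Note that a temp table is only visible within the context (and application) that registered it; it is not persisted anywhere.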
Answer:

As of now, saving as ORC can only be done with the HiveContext.

So the approach will be like this:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._
val data: RDD[MyObject] = createMyData()
data.toDF.write.format("orc").save(outputPath)