apache pig - Loading data into Hive and then analysing it from Pig using HCatalog. Does this seem like a good idea?

Question:

Let's say we have JSON data and we want to generate some results for business users. Does the following seem like a good approach?

Load the data from HDFS into Hive, and then analyse it from Pig using HCatalog. I have the following question about this.

Q. Is it OK to load the data through HCatalog and analyse it in Pig? Will this add performance overhead compared to keeping the data in HDFS and reading it directly from Pig?

Answer:

I would personally prefer to do the ETL in Pig. In your case, the JSON data can be loaded with JsonLoader and stored with JsonStorage. So I would load the data with JsonLoader, store it as CSV, and then use Hive to analyze that data.
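A minimal sketch of that ETL step, assuming one JSON object per line. The paths and the schema (user, action, ts) are hypothetical placeholders, not taken from the question:

```pig
-- Hypothetical input path and schema; adjust both to the real data.
-- The built-in JsonLoader expects one JSON record per line and a schema string.
events = LOAD '/data/raw/events.json'
         USING JsonLoader('user:chararray, action:chararray, ts:long');

-- Write the records back out as comma-delimited text.
-- Note: PigStorage does not quote fields, so values that themselves
-- contain commas would need a different delimiter or extra cleaning.
STORE events INTO '/data/clean/events_csv' USING PigStorage(',');
```

The output directory could then back a Hive table declared with a comma field delimiter, which keeps the analysis in Hive as suggested above.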

Loading JSON in Pig:

http://joshualande.com/read-write-json-apache-pig/

Alternatively, we can use the JSON loader from Twitter's Elephant Bird library:

http://eric.lubow.org/2011/hadoop/pig-queries-parsing-json-on-amazons-elastic-map-reduce-using-s3-data/