当前位置: 动力学知识库 > 问答 > 编程问答 >

java - Hive / Hadoop / Flatfile: What is an efficient way to combine and concat rows

问题描述:

id col1 col2 ... coln

---------------------

foo barA barB ...

foo barD barX

boo barA barC

foo barC barC

I'd like to combine this into 'collapsed' rows which look like this:

foo barA;barD;barC barB;barX;barC

boo barD barC

At the moment the source document is a hive 'table', [which is essentially the same as a flat text file I suppose] - and I am wondering what is the most efficient way to accomplish this?

EDIT: related earlier question (for SQL, alas not hive) Combine multiple rows into one space separated string

网友答案:

If you are loading the data into hive from a mapreduce job you may be able to adjust that MR to do the transformation on the data for you, and load it into the table as you want it (array or ; delimited, ect)

If you are looking to be able to update/adjust the data, HIVE probably isn't the best option for that. You may want to look at HBase and do an 'aggregation' to generate the data as you want it to load into HBase. Anytime the same Key/ColumnFamily/Column value is generated, it overwrites it if it exists so it will 'update' the value. I use this in production to generate data throughout the day that constantly updates.

In either case, to manipulate the structure of large quantities of data, you will probably want to use a mapreduce job and have it do the restructuring for you.

分享给朋友:
您可能感兴趣的文章:
随机阅读: