We want to analyze the following kind of data using Hive. These are the challenges:
Source data are flat files from different sources, with multiple source files arriving daily.
There are no fixed columns (each file has different columns).
Each file has a very large number of rows.
The number of columns and the order of the columns differ from file to file.
Fields are comma-separated, but field values might be enclosed in quotes ("").
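To make the data shape concrete, here is a minimal Python sketch of the problem. The two sample files, their column names, and the helper rows_as_maps are made up for illustration and are not our real data. It assumes each file starts with a header row, which may not hold for every source. It only shows that rows from files with different columns and column orders can be read as column-name-to-value maps, with quoted fields handled by the standard csv module.

```python
import csv
import io

# Hypothetical sample files: column names and values are invented for illustration.
# FILE_A and FILE_B have different columns in a different order.
FILE_A = 'id,name,comment\n1,alice,"hello, world"\n'
FILE_B = 'comment,id,city,name\n"said ""hi""",2,Pune,bob\n'


def rows_as_maps(text):
    """Yield each data row as a {column_name: value} dict,
    keyed by that file's own header row (assumed to be present)."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield dict(row)


# Rows from both files, each carrying only the columns its file had.
records = [r for f in (FILE_A, FILE_B) for r in rows_as_maps(f)]

# Union of every column name seen across the files.
all_columns = sorted({c for r in records for c in r})
```

Each record keeps only the columns its own file had, so the set of columns is not known until the files are read.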
Please suggest the ideal approach for this. Should we load the data into HBase and create a Hive table on top of it? Or is it possible to create a Hive table with a dynamic schema?