
hadoop - how to use header(first row) as field names in Pig

Question:

Given a CSV file whose first row can be taken as the header, how can one load the field names dynamically in Pig from that header? For example:

id,year,total
1,1999,190
2,1998,20

a = LOAD '/path/to/file.csv' USING PigStorage() AS --use first row as field names

> describe a;

> id:bytearray,year:bytearray,total:bytearray

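Answer:

Since this is a CSV file, you can load it with the piggybank CSVLoader(). Note that the field names are still declared in the AS clause; as far as I know, CSVLoader does not read them from the first row, so the header line is parsed like any other record. Your script would look like this.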

-- Register the piggybank jar
REGISTER piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

-- Field names and types are declared here, not taken from the file
A = LOAD '/path/to/file.csv' USING CSVLoader AS (id:int, year:chararray, total:int);
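
If the goal is mainly to keep the header row out of the data, one option is piggybank's CSVExcelStorage, which accepts a header-handling argument. The following is a minimal sketch, not a tested script: it assumes a piggybank version that includes the 'SKIP_INPUT_HEADER' option, the alias CSVSkipHeader is a made-up name for illustration, and the jar path is the same placeholder used above. The field names are still written by hand in the AS clause, so this does not load them dynamically from the file.

```pig
-- Sketch only: assumes piggybank.jar is reachable at this path.
REGISTER piggybank.jar;

-- Constructor arguments: delimiter, multiline handling, end-of-line handling, header handling.
-- CSVSkipHeader is a hypothetical alias chosen for this example.
DEFINE CSVSkipHeader org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER');

-- The schema is declared manually; the header row is skipped rather than interpreted.
A = LOAD '/path/to/file.csv' USING CSVSkipHeader AS (id:int, year:int, total:int);

DESCRIBE A;
```

If the names really must come from the file, a common workaround is to read the first line outside Pig and pass the generated schema into the script, since Pig resolves the AS clause before it reads any data.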