I have a CSV file full of float numbers written with a comma as the decimal separator rather than a dot.
I wrote a Pig load script that specifies the float type, but Pig can't convert values containing a comma to float (it expects floats to use a dot).
How can I replace the commas with dots during the loading phase?
I understand a UDF could do the trick, but is there a simpler way?
Okay, I tested this just for fun with some simple data (semicolon-delimited, with every field loaded as chararray first):
data = load 'commatest.csv' using PigStorage(';') as (f1:chararray, f2:chararray, f3:chararray);
replaced = foreach data generate
    REPLACE(f1, ',', '.') as f1dot,
    REPLACE(f2, ',', '.') as f2dot,
    REPLACE(f3, ',', '.') as f3dot;
fdata = foreach replaced generate
    (float)f1dot as f1,
    (float)f2dot as f2,
    (float)f3dot as f3;
dump fdata;
To test if it is really converted to float:
test = foreach fdata generate f1*f2*f3;
dump test;
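The replace and cast steps can probably be folded into a single foreach, assuming the cast can be applied directly to the result of REPLACE. This is an untested sketch that reuses the same commatest.csv and field names as above:

```pig
-- Load every field as chararray, then replace the comma and cast in one pass.
-- Untested sketch: assumes (float) can be applied directly to REPLACE's result.
data = load 'commatest.csv' using PigStorage(';') as (f1:chararray, f2:chararray, f3:chararray);
fdata = foreach data generate
    (float)REPLACE(f1, ',', '.') as f1,
    (float)REPLACE(f2, ',', '.') as f2,
    (float)REPLACE(f3, ',', '.') as f3;
dump fdata;
```

Note that the first argument of REPLACE is treated as a regular expression, which is harmless for a plain comma. As far as I know, a value that still cannot be parsed after the replacement comes out as null rather than failing the job, so it is worth checking the output for nulls.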