
Hadoop Streaming doesn't untar archives

Question:

When writing a Hadoop Streaming task, I used -archives to upload a tgz from the local machine to the task's working directory, but it was not untarred as the documentation says. I've searched a lot without any luck.

Here is the command that launches the Hadoop Streaming task (hadoop-2.5.2); it is very simple:

hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
    -files mapper.sh \
    -archives /home/hadoop/tmp/test.tgz#test \
    -D mapreduce.job.maps=1 \
    -D mapreduce.job.reduces=1 \
    -input "/test/test.txt" \
    -output "/res/" \
    -mapper "sh mapper.sh" \
    -reducer "cat"

and here is "mapper.sh":

#!/bin/sh
# Drain stdin so the streaming framework does not fail on unconsumed input.
cat > /dev/null
# List the localized archive to see what the "test" symlink points at.
ls -l test
exit 0

"test.tgz" contains two files, "test.1.txt" and "test.2.txt", created with:

echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt
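As a sanity check before submitting the job, the archive contents can be listed locally without extracting anything (a quick verification sketch, not part of the original question; it recreates the archive from scratch):

```shell
# Recreate the test archive exactly as in the question.
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt

# -t lists the entries without extracting; both files should appear.
tar ztf test.tgz
```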

The output from the task above is:

lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

but the desired output would be something like this:

-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt

So why hasn't test.tgz been untarred automatically as the documentation says, and is there another way to get the tgz untarred?

Any help appreciated, thanks.

Answer:

My mistake. After submitting an issue to hadoop.apache.org, I was told that Hadoop had in fact already untarred test.tgz.

Although the symlink target is still named test.tgz, it is the unpacked directory, so the files can be read like "cat test/test.1.txt".
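So the fix is just to read the files through the symlink. The following sketch simulates locally what the NodeManager does on the cluster (the directory and symlink names here are illustrative; on a real task the symlink name "test" comes from the #test fragment of the -archives option):

```shell
# Build the same archive as in the question.
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt

# Simulate archive localization: unpack into a directory and point a
# symlink named "test" at it, as the #test fragment requests.
mkdir -p unpacked
tar zxf test.tgz -C unpacked
ln -sfn unpacked test

# Inside mapper.sh the files are then readable through the symlink:
cat test/test.1.txt   # prints: abcd
cat test/test.2.txt   # prints: efgh
```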

Answer:

Alternatively, you can untar the archive manually inside the mapper: tar -zxvf test.tgz
