
Hadoop Streaming - Perl module dependency

Question:

When using a Perl script as the mapper and reducer in Hadoop Streaming, how can we manage Perl module dependencies?

I want to use Net::RabbitMQ in my Perl mapper and reducer scripts.

Is there a standard way in Perl/Hadoop Streaming to handle dependencies, similar to the DistributedCache for Hadoop Java MapReduce?

Answer:

There are a couple of ways to handle dependencies including specifying a custom library path or creating a packed binary of your Perl application with PAR::Packer. There are some examples of how to accomplish these tasks in the Examples section of the Hadoop::Streaming POD, and the author includes a good description of the process, as well as some considerations for the different ways to handle dependencies. Note that the suggestions provided in the Hadoop::Streaming documentation about handling Perl dependencies are not specific to that module.
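As a rough sketch of the PAR::Packer route (all paths, the jar location, and the input/output directories here are hypothetical placeholders, not taken from the question):

```shell
# Pack each script plus its CPAN dependencies (e.g. Net::RabbitMQ)
# into a single self-contained executable with pp (from PAR::Packer).
pp -o mapper mapper.pl
pp -o reducer reducer.pl

# Stage the packed binaries in HDFS so every task node can fetch them.
hdfs dfs -put mapper reducer /apps/bin/

# Reference them with -cacheFile rather than shipping them inside the
# job jar with -file; the "#name" suffix sets the symlink name created
# in each task's working directory.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input  /data/in \
    -output /data/out \
    -cacheFile hdfs:///apps/bin/mapper#mapper \
    -cacheFile hdfs:///apps/bin/reducer#reducer \
    -mapper  ./mapper \
    -reducer ./reducer
```

The trade-off noted below is size: packed binaries are large, which is why staging them in HDFS via -cacheFile beats resubmitting them with -file on every job.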

Here is an excerpt from the documentation for Hadoop::Streaming (there are detailed examples therein, as previously mentioned):

All perl modules must be installed on each hadoop cluster machine. This proves to be a challenge for large installations. I have a local::lib controlled perl directory that I push out to a fixed location on all of my hadoop boxes (/apps/perl5) that is kept up-to-date and included in my system image. Previously I was producing stand-alone perl files with PAR::Packer (pp), which worked quite well except for the size of the jar with the -file option. The standalone files can be put into hdfs and then included with the jar via the -cacheFile option.
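The local::lib approach from the excerpt could be wired up along these lines (the /apps/perl5 path comes from the excerpt; the cpanm command, jar location, and data paths are illustrative assumptions):

```shell
# Build a self-contained local::lib tree once, then push it to the same
# fixed path (/apps/perl5) on every Hadoop node, e.g. via rsync or by
# baking it into the system image.
cpanm --local-lib=/apps/perl5 Net::RabbitMQ

# Point the streaming tasks at that tree with -cmdenv, so perl can find
# the modules at run time without any per-job packaging.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input  /data/in \
    -output /data/out \
    -cmdenv PERL5LIB=/apps/perl5/lib/perl5 \
    -file   mapper.pl \
    -file   reducer.pl \
    -mapper  'perl mapper.pl' \
    -reducer 'perl reducer.pl'
```

This keeps job submission cheap (only the small scripts are shipped), at the cost of having to keep the shared perl5 directory in sync across the cluster.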
