
java - How to put a serialized object into the Hadoop DFS and get it back inside the map function?

Question:

I'm new to Hadoop, and recently I was asked to do a test project using it.

While I was reading Big Data, I happened to come across Pail. What I want to do is something like this: first create a simple object, serialize it using Thrift, and put it into HDFS using Pail. Then I want to get that object back inside the map function and do whatever I want with it. But I have no idea how to get that object inside the map function.
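
To make it concrete, the write side I have in mind looks roughly like this. This is only a sketch: it serializes with Thrift's TSerializer and writes straight to HDFS with the FileSystem API instead of going through Pail, and MyThriftObject and the path are placeholders for my generated Thrift class and real location:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TBinaryProtocol;

public class WriteObjectToHdfs {
    public static void main(String[] args) throws Exception {
        // MyThriftObject is a placeholder for a Thrift-generated class
        MyThriftObject obj = new MyThriftObject();

        // serialize the object to a byte array using Thrift's binary protocol
        TSerializer serializer = new TSerializer(new TBinaryProtocol.Factory());
        byte[] bytes = serializer.serialize(obj);

        // write the bytes to a file in HDFS (path is made up)
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/data/filename.dat"));
        out.write(bytes);
        out.close();
    }
}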

Can someone please point me to some references or explain how to do that?

Thanks

Answer:

I can think of three options:

  1. Use the -files option and name the file in HDFS (preferable as the task tracker will download the file once for all jobs running on that node)
  2. Use the DistributedCache (similar logic to the above), but you configure the file via some API calls rather than through the command line (see the driver sketch after this list)
  3. Load the file directly from HDFS (less efficient as you're pulling the file over HDFS for each task)
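
For options 1 and 2, the driver side could look something like the sketch below. It is only an illustration under a few assumptions: MyDriver and MyMapper are placeholder class names, the HDFS path is made up, and the DistributedCache call shown is the classic (pre-Hadoop-2) way of doing option 2, while newer releases expose the same idea through Job.addCacheFile(...). Running the driver through ToolRunner is what makes the generic -files flag of option 1 work.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "load object example");
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMapper.class); // the mapper whose setup() is shown below

        // option 2: register the HDFS file programmatically; the "#filename.dat"
        // fragment makes it appear under that name in the task's working directory
        DistributedCache.addCacheFile(
                new URI("/path/to/file/in/hdfs/filename.dat#filename.dat"),
                job.getConfiguration());

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // option 1: ToolRunner parses the generic options, so you could instead run
        //   hadoop jar myjob.jar MyDriver -files /path/to/file/in/hdfs/filename.dat in out
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}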

As for some code, put the load logic into your mapper's setup(...) or configure(...) method (depending on whether you're using the new or old API), as follows:

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

protected void setup(Context context) throws IOException, InterruptedException {
    // the -files option (or the DistributedCache) makes the named file
    // available in the task's local working directory
    File file = new File("filename.dat");
    // open file and load contents ...

    // alternatively, load the file directly from HDFS
    FileSystem fs = FileSystem.get(context.getConfiguration());
    InputStream hdfsInputStream = fs.open(new Path("/path/to/file/in/hdfs/filename.dat"));
    // load file contents from the stream, then close it ...
}
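
If, as in the question, the file holds a Thrift-serialized object, the loaded bytes can be turned back into an object with Thrift's TDeserializer. A minimal sketch, again using MyThriftObject as a placeholder for the generated class:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import org.apache.thrift.TDeserializer;
import org.apache.thrift.protocol.TBinaryProtocol;

private MyThriftObject deserialize(InputStream in) throws Exception {
    // read the whole stream into a byte array
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int n;
    while ((n = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, n);
    }

    // rebuild the object from the bytes using the same binary protocol it was written with
    TDeserializer deserializer = new TDeserializer(new TBinaryProtocol.Factory());
    MyThriftObject obj = new MyThriftObject();
    deserializer.deserialize(obj, buffer.toByteArray());
    return obj;
}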

DistributedCache has some example code in its Javadocs.
