amazon web services - AWS EMR Hadoop Administration

Question:

We are currently using Apache Hadoop (vanilla version) in our org and are planning to migrate to AWS EMR. I'm trying to understand how Hadoop on AWS EMR works internally (not how to use it). I'm mainly interested in the Hadoop administration steps, how the master and slave nodes communicate, and the various configuration settings. I have already checked the AWS EMR documentation, but I don't see a detailed comparison.

Can someone recommend a link or tutorial for migrating from Apache Hadoop to AWS EMR?

Answer:

Amazon Elastic MapReduce uses a mostly standard implementation of Hadoop and associated tools.

See: AMI Versions Supported in Amazon EMR

The main benefit of using EMR is the automated deployment of instances. For example, launching a cluster with an appropriate AMI means that the software is already installed on each instance and HDFS is already configured across the core nodes.

The Master and Slave (Core/Task) nodes communicate in the same way as in any Hadoop cluster. However, only one Master is supported (with no backup Master).
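To make the node roles concrete, here is a minimal sketch of a cluster definition with one Master group plus optional Core and Task groups. The function name `build_cluster_request` is hypothetical, and the key names are an assumption based on the request shape of boto3's EMR `run_job_flow` call; the release label and instance types are placeholders, not recommendations. The sketch only builds a dictionary and does not contact AWS. A real request would need more fields (for example IAM roles).

```python
def build_cluster_request(name, release_label, master_type,
                          core_type, core_count,
                          task_type=None, task_count=0):
    """Build a dict describing an EMR cluster (hypothetical helper).

    Key names are assumed to match boto3's EMR run_job_flow request;
    nothing here talks to AWS.
    """
    # Exactly one Master instance, as described in the answer above.
    groups = [{
        "Name": "Master",
        "InstanceRole": "MASTER",
        "InstanceType": master_type,
        "InstanceCount": 1,
    }]
    # Core nodes run tasks and also store HDFS blocks.
    if core_count > 0:
        groups.append({
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": core_type,
            "InstanceCount": core_count,
        })
    # Task nodes only run tasks; they do not hold HDFS data.
    if task_count > 0:
        groups.append({
            "Name": "Task",
            "InstanceRole": "TASK",
            "InstanceType": task_type or core_type,
            "InstanceCount": task_count,
        })
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # placeholder supplied by the caller
        "Instances": {
            "InstanceGroups": groups,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    }


if __name__ == "__main__":
    # "emr-x.y.z" and "m5.xlarge" are placeholder values for illustration.
    request = build_cluster_request("demo", "emr-x.y.z",
                                    "m5.xlarge", "m5.xlarge", 2)
    for group in request["Instances"]["InstanceGroups"]:
        print(group["InstanceRole"], group["InstanceCount"])
```

Keeping the request as plain data makes it easy to review the cluster layout before anything is launched.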

When migrating to EMR, check that you are using compatible versions of software (e.g. Hadoop, Hive, Pig, Impala). Also consider using Amazon S3 instead of HDFS for data storage, especially for source data, since data on S3 persists even after the EMR cluster is terminated.
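If you do move source data to S3, job input paths that point at HDFS need to be rewritten. A minimal sketch of such a mapping follows; the helper name `hdfs_to_s3`, the bucket name, and the prefix are all hypothetical, and it only rewrites the path string (it does not copy any data).

```python
from urllib.parse import urlsplit


def hdfs_to_s3(hdfs_uri, bucket, prefix=""):
    """Map an hdfs:// URI (or a bare absolute path) to an s3:// URI.

    Hypothetical helper: the NameNode host/port is dropped and the
    path is reused as the S3 key, optionally under a prefix.
    """
    parts = urlsplit(hdfs_uri)
    if parts.scheme not in ("", "hdfs"):
        raise ValueError(f"not an HDFS location: {hdfs_uri!r}")
    path = parts.path.lstrip("/")
    # Join prefix and path, skipping whichever is empty.
    key = "/".join(p for p in (prefix.strip("/"), path) if p)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    # "example-bucket" and "namenode:8020" are placeholder values.
    print(hdfs_to_s3("hdfs://namenode:8020/data/logs", "example-bucket"))
    print(hdfs_to_s3("/data/logs", "example-bucket", prefix="raw"))
```

The bare-path case is handled because many existing jobs refer to HDFS locations without an explicit scheme.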