
hadoop - Apache Zeppelin running on Spark Cluster and YARN

Problem description:

I have created and run a %pyspark program in Apache Zeppelin running on a Spark cluster with yarn-client. The program reads a file from HDFS into a DataFrame, performs a simple groupBy, and prints the output successfully. I am using Zeppelin version 0.6.2 and Spark 2.0.0.
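For reference, the %pyspark paragraph is essentially of the following shape (a minimal sketch; the HDFS path, file format, and column name are placeholders, not the actual ones used):

%pyspark
# Read a file from HDFS into a DataFrame (path and format are placeholders)
df = spark.read.csv("hdfs:///path/to/file.csv", header=True, inferSchema=True)
# Simple groupBy followed by a count, then print the result
df.groupBy("some_column").count().show()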

I can see the job running in YARN (see application_1480590511892_0007):

But when I check the Spark UI at the same time, there is nothing at all for this job:

Question 1: Shouldn't this job appear in both of these windows?

Also, the completed applications in the Spark UI image just above were Zeppelin jobs with the %python interpreter, which simply initialize a SparkSession and stop it:

1st Zeppelin block:

%python
from pyspark.sql import SparkSession
from pyspark.sql import Row
import collections

# Create (or reuse) a SparkSession named "SparkSQL"
spark = SparkSession.builder.appName("SparkSQL").getOrCreate()

2nd Zeppelin block:

%python
# Stop the SparkSession created in the previous block
spark.stop()
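One way to see which cluster manager such a session is actually using (and hence which UI it should show up in) is to inspect the master URL of its underlying SparkContext; a minimal sketch, assuming the session created in the block above:

%python
# Prints the configured master, e.g. "yarn" / "yarn-client" for YARN,
# "local[*]" for a local session, or "spark://..." for a standalone master
print(spark.sparkContext.master)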

Question 2: This job, in turn, has not appeared in the YARN UI. Does the fact that a job appears in the Spark UI mean that it is running with the Spark resource manager?

Any insights into these questions are highly appreciated.
