
sql - Does SparkSQL support subquery?

Question:

I am running this query in the Spark shell, but it gives me an error:

sqlContext.sql(
  "select sal from samplecsv where sal < (select MAX(sal) from samplecsv)"
).collect().foreach(println)

error:

java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier MAX found

select sal from samplecsv where sal < (select MAX(sal) from samplecsv)
                                              ^
        at scala.sys.package$.error(package.scala:27)

Can anybody explain this to me? Thanks.

Answer:

Spark 2.0+

Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:

select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)

select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)

Unfortunately, as of now (Spark 2.0), it is impossible to express the same logic using the DataFrame DSL.
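For example, in Spark 2.0+ the exact query from the question should parse and run. A minimal sketch, assuming a SparkSession named spark and that samplecsv is registered as a table or view (both names taken from the question):

// the scalar subquery in the WHERE clause is accepted by the Spark 2.0+ parser
spark.sql(
  "select sal from samplecsv where sal < (select MAX(sal) from samplecsv)"
).collect().foreach(println)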

Spark < 2.0

Spark supports subqueries in the FROM clause (same as Hive <= 0.12).

SELECT col FROM (SELECT * FROM t1 WHERE bar) t2

It simply doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (correlated subqueries in particular) cannot be expressed in Spark without promoting them to a Cartesian join.

Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a JOIN, there is no loss of function here.
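To make the JOIN rewrite concrete: the query from the question can be expressed without a WHERE-clause subquery by moving the scalar aggregate into the FROM clause (which Spark < 2.0 supports, as shown above) and joining on the inequality. A minimal sketch, assuming the same sqlContext and samplecsv table as in the question; the aliases s, m, and max_sal are made up for illustration, and the non-equi join condition is planned as the Cartesian-style join mentioned above:

// compute MAX(sal) once as a one-row derived table, then join on the inequality
sqlContext.sql(
  """SELECT s.sal
    |FROM samplecsv s
    |JOIN (SELECT MAX(sal) AS max_sal FROM samplecsv) m
    |ON s.sal < m.max_sal""".stripMargin
).collect().foreach(println)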

Answer:

https://issues.apache.org/jira/browse/SPARK-4226

There is a pull request to implement that feature; my guess is that it might land in Spark 2.0.

Answer:

Spark SQL 1.6.x accepted this query:

select * from (select * from tenmin_history order by TS_TIME DESC limit 144) a order by TS_TIME