I am running this query in the Spark shell, but it gives me an error:
"select sal from samplecsv where sal < (select MAX(sal) from samplecsv)"
java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier MAX found
select sal from samplecsv where sal < (select MAX(sal) from samplecsv)
Can anybody explain this to me? Thanks.
Spark 2.0+

Spark SQL should support both correlated and uncorrelated subqueries. See
SubquerySuite for details. Some examples include:
select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)
select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)
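For reference, the failing query from the question is exactly this kind of uncorrelated scalar subquery, so on 2.0+ it should parse and run. A minimal sketch in the Spark shell, with hypothetical sample data standing in for the CSV-backed samplecsv table:

import org.apache.spark.sql.SparkSession

// Assumes Spark 2.0+; the table and column names follow the question.
val spark = SparkSession.builder()
  .appName("ScalarSubqueryExample")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data in place of the real CSV-backed table.
Seq(1000, 2000, 3000).toDF("sal").createOrReplaceTempView("samplecsv")

// The uncorrelated scalar subquery now parses and evaluates:
spark.sql(
  "SELECT sal FROM samplecsv WHERE sal < (SELECT MAX(sal) FROM samplecsv)"
).show()
// Returns the rows below the maximum: 1000 and 2000.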
Unfortunately, as of now (Spark 2.0), it is impossible to express the same logic using the DataFrame DSL.

Spark < 2.0
Spark supports subqueries in the FROM clause (same as Hive <= 0.12):
SELECT col FROM (SELECT * FROM t1 WHERE bar) t2
It simply doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (correlated subqueries in particular) can't be expressed in Spark without promoting them to a Cartesian join.
Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a JOIN, there is no loss of function here.
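Applied to the query from the question, the JOIN rewrite looks like this. A sketch for Spark < 2.0, assuming a SQLContext named sqlContext with samplecsv registered as a temporary table: the MAX is computed in a FROM-clause subquery (which 1.x does support) and attached with a join instead of a WHERE subquery:

// Spark < 2.0: compute the aggregate in a FROM-clause subquery and
// join it to the base table; the single-row join side keeps the
// Cartesian-style join cheap.
val belowMax = sqlContext.sql("""
  SELECT s.sal
  FROM samplecsv s
  JOIN (SELECT MAX(sal) AS max_sal FROM samplecsv) m
  ON s.sal < m.max_sal
""")
belowMax.show()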
There is a pull request to implement that feature; my guess is it might land in Spark 2.0.
Spark SQL 1.6.x accepted this query:
select * from (select * from tenmin_history order by TS_TIME DESC limit 144) a order by TS_TIME