I just started using Spark SQL to load data from an H2 database. Here is what I did, following the Spark SQL documentation:
>>> sqlContext = SQLContext(sc)
>>> df = sqlContext.load(source="jdbc",driver="org.h2.Driver", url="jdbc:h2:~/test", dbtable="RAWVECTOR")
But it didn't work and raised errors. I think the problem is that the username and password are not specified in the call.
These are the parameters from the Spark SQL 1.3.1 documentation:
url: The JDBC URL to connect to.

dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.

driver: The class name of the JDBC driver needed to connect to this URL. This class will be loaded on the master and workers before running any JDBC commands to allow the driver to register itself with the JDBC subsystem.

partitionColumn, lowerBound, upperBound, numPartitions: These options must all be specified if any of them is specified. They describe how to partition the table when reading in parallel from multiple workers. partitionColumn must be a numeric column from the table in question.
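To keep these options straight, it can help to assemble them into a dict before passing them to sqlContext.load. Here is a minimal sketch; the jdbc_load_options helper and its argument names are my own, not part of Spark's API, and the enforcement of the all-or-nothing rule for the partition options mirrors the documentation quoted above:

```python
def jdbc_load_options(url, dbtable, driver,
                      partition_column=None, lower_bound=None,
                      upper_bound=None, num_partitions=None):
    """Assemble keyword options for sqlContext.load(source="jdbc", ...).

    The four partition options must be given together or not at all,
    and partition_column must name a numeric column in dbtable.
    """
    opts = {"url": url, "dbtable": dbtable, "driver": driver}
    partition = (partition_column, lower_bound, upper_bound, num_partitions)
    if any(p is not None for p in partition):
        if any(p is None for p in partition):
            raise ValueError("partitionColumn, lowerBound, upperBound and "
                             "numPartitions must all be specified together")
        opts.update(partitionColumn=partition_column,
                    lowerBound=str(lower_bound),
                    upperBound=str(upper_bound),
                    numPartitions=str(num_partitions))
    return opts

# dbtable may also be a subquery, since anything valid in a FROM clause works:
opts = jdbc_load_options(
    url="jdbc:h2:tcp://localhost/~/test",
    dbtable="(SELECT ID, VEC FROM RAWVECTOR WHERE ID < 100) AS T",
    driver="org.h2.Driver",
    partition_column="ID", lower_bound=0, upper_bound=100, num_partitions=4)
# df = sqlContext.load(source="jdbc", **opts)
```

The helper only builds the dict; the actual read still goes through sqlContext.load as in the snippet at the top.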
But I couldn't find any clue about how to pass the database username and password to the sqlContext.load function.
Has anyone run into a similar case, or have any clues?
I figured it out. Just do
df = sqlContext.load(source="jdbc", driver="org.h2.Driver", url="jdbc:h2:tcp://localhost/~/test?user=sa&password=1234", dbtable="RAWVECTOR")
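If you build these URLs in more than one place, it is easy to typo the query string. A small sketch of generating it instead; the h2_url helper is mine, not part of Spark or H2, and it assumes the same ?user=...&password=... form that worked above (passwords with special characters would get percent-encoded, which H2 may not decode):

```python
try:
    from urllib import urlencode          # Python 2 (Spark 1.3 era)
except ImportError:
    from urllib.parse import urlencode    # Python 3

def h2_url(path, **params):
    """Build an H2 JDBC URL, appending credentials as ?key=value pairs."""
    base = "jdbc:h2:tcp://localhost/" + path
    return (base + "?" + urlencode(params)) if params else base

url = h2_url("~/test", user="sa", password="1234")
# df = sqlContext.load(source="jdbc", driver="org.h2.Driver",
#                      url=url, dbtable="RAWVECTOR")
```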
And when you create the database, use the same pattern:
conn = DriverManager.getConnection("jdbc:h2:tcp://localhost/~/" + dbName + "?user=sa&password=1234", null, null);
And here is a blog about how to use the API.