
dictionary - Scala RDD[String] to RDD[String,String]

Question:

I have an RDD[String] which contains the following data:

Data format: ('Movie Name','Actress Name')

('Night of the Demons (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')

('The Bad Lieutenant: Port of Call - New Orleans (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')

('"Please Like Me" (2013) {All You Can Eat (#1.4)}', '$haniqua')

('"Please Like Me" (2013) {French Toast (#1.2)}', '$haniqua')

('"Please Like Me" (2013) {Horrible Sandwiches (#1.6)}', '$haniqua')

I want to convert this to an RDD[(String, String)], so that the first element within ' ' becomes the first String of each pair and the second element within ' ' becomes the second String.

I tried this:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")

val splitRdd = rdd1.map( line => line.split(",") )

splitRdd.foreach(println)

but it's giving me an error like this:

[Ljava.lang.String;@7741fb9

[Ljava.lang.String;@225f63a5

[Ljava.lang.String;@63640bc4

[Ljava.lang.String;@1354c1de

Answer:

[Ljava.lang.String;@7741fb9 is not an error. This is what is printed when you try to print an array.

[ - a single-dimensional array

L - the array contains a class or interface

java.lang.String - the type of objects in the array

@ - separates the type name from the hash code

7741fb9 - the hash code of the object, in hexadecimal

To print a String array, you can try this code:

import scala.runtime.ScalaRunTime._
splitRdd.foreach(array => println(stringOf(array)))
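As an alternative sketch that needs no extra import, mkString from the Scala standard library joins the elements into one string. The object name ArrayPrinting and the helper render are made up for illustration, and a plain local array stands in for one element of splitRdd.

```scala
object ArrayPrinting {
  // Hypothetical helper: join the elements instead of relying on Array.toString.
  def render(fields: Array[String]): String = fields.mkString("[", " | ", "]")

  def main(args: Array[String]): Unit = {
    val fields = "a,b,c".split(",")
    // Array.toString is the JVM default: type descriptor, '@', hex hash code.
    println(fields.toString) // something like [Ljava.lang.String;@7741fb9
    println(render(fields))  // [a | b | c]
  }
}
```

With Spark, the same helper could presumably be used as splitRdd.foreach(array => println(ArrayPrinting.render(array))).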


Answer:

It's not an error. We could also use flatMap() here to avoid the confusion:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
rdd1.flatMap( line => line.split(",")).foreach(println)

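Here, the function passed to map returns a single element per line (an array), while the function passed to flatMap returns a collection of zero or more elements. flatMap then flattens those collections, so the result is an RDD of individual strings rather than an RDD of arrays.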
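The difference can be sketched without a Spark cluster, since Scala's own collections have map and flatMap with the same shape. In this minimal example a Seq stands in for the RDD, and the object name MapVsFlatMap is made up for illustration.

```scala
object MapVsFlatMap {
  val lines: Seq[String] = Seq("a,b", "c")

  // map: exactly one output element per input line, here an array of fields.
  val mapped: Seq[Array[String]] = lines.map(_.split(","))

  // flatMap: the per-line results are concatenated into one flat sequence.
  val flattened: Seq[String] = lines.flatMap(_.split(",").toSeq)

  def main(args: Array[String]): Unit = {
    println(mapped.map(_.toList)) // List(List(a, b), List(c))
    println(flattened)            // List(a, b, c)
  }
}
```

Note that the flattened result no longer records which fields came from the same line, so on its own it does not give (movie, actress) pairs.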

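Answer:

Since this is a CSV-like file whose fields are enclosed in quotes and whose rows are enclosed in parentheses, you need to parse each line with a regular expression. A simple split doesn't work, because commas also appear inside the quoted fields.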
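One way this could look is sketched below. The pattern is an assumption based only on the five sample lines in the question: each line is taken to be exactly ('...', '...'), and the first occurrence of ', ' (quote, comma, space, quote) is taken as the field boundary. The names LineParser and parseLine are hypothetical.

```scala
object LineParser {
  // Assumed line shape: ('<movie>', '<actress>'). The lazy first group
  // stops at the first "', '" so commas inside the actress name survive.
  private val Line = """^\('(.*?)', '(.*)'\)$""".r

  // Returns None for lines that do not match the assumed shape.
  def parseLine(line: String): Option[(String, String)] = line match {
    case Line(movie, actress) => Some((movie, actress))
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    val sample =
      """('Night of the Demons (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')"""
    println(parseLine(sample))
    // With Spark, something like rdd1.flatMap(line => parseLine(line))
    // should give an RDD[(String, String)] that skips malformed lines.
  }
}
```

A title or name that itself contains the sequence ', ' would be split in the wrong place, so the pattern would need adjusting for the real file.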

Answer:

Try this to convert RDD[String] to RDD[(String, String)]:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
// Split each line once, then pair up the first two fields
val splitRdd = rdd1.map { line => val parts = line.split(","); (parts(0), parts(1)) }

The above line returns a pair RDD, that is, an RDD of (key, value) tuples.
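It may be worth checking what this split produces on the sample data before relying on it. The sketch below applies the same split to the first sample line from the question, outside Spark. The names NaiveSplit and toPair are made up for illustration.

```scala
object NaiveSplit {
  // Same idea as the answer above: first two comma-separated fields.
  // Throws ArrayIndexOutOfBoundsException if the line has no comma.
  def toPair(line: String): (String, String) = {
    val parts = line.split(",")
    (parts(0), parts(1))
  }

  val sample: String =
    """('Night of the Demons (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')"""

  def main(args: Array[String]): Unit = {
    // The actress name contains a comma, so the line splits into 3 parts.
    println(sample.split(",").length) // 3
    // The pair keeps the parenthesis and quotes and cuts the name short.
    println(toPair(sample))
  }
}
```

For this line the second element comes out as '"Steff" with a leading space, so the surrounding punctuation still has to be stripped and the rest of the name is lost.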
