What do you understand by SchemaRDD?

A SchemaRDD can be registered as a table in the SQLContext that was used to create it. Once an RDD has been registered as a table, it can be used in the FROM clause of SQL statements.

import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// One method for defining the schema of an RDD is to make a case class with the desired column
// names and types.
case class Record(key: Int, value: String)

val sc: SparkContext // An existing SparkContext.
val sqlContext = new SQLContext(sc)

// Importing the SQL context gives access to all the SQL functions and implicit conversions.
import sqlContext._

val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
// Any RDD containing case classes can be registered as a table.  The schema of the table is
// automatically inferred using Scala reflection.
// (registerTempTable is the Spark 1.1+ name; Spark 1.0 calls it registerAsTable.)
rdd.registerTempTable("records")

// The registered table name can now be used in the FROM clause.
val results: SchemaRDD = sql("SELECT * FROM records")

Spark SQL allows relational queries expressed in SQL, HiveQL, or Scala to be executed using Spark. At the core of this component is a new type of RDD, the SchemaRDD. A SchemaRDD is composed of Row objects along with a schema that describes the data type of each column in the row, which makes it similar to a table in a traditional relational database. A SchemaRDD can be created from an existing RDD, a Parquet file, a JSON dataset, or by running HiveQL against data stored in Apache Hive.
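
To make the different sources concrete, here is a minimal sketch that builds a SchemaRDD from an existing RDD, from a Parquet file, and from JSON data. It assumes the Spark 1.1 or 1.2 API (SchemaRDD was renamed DataFrame in Spark 1.3, so later versions need different calls). The object name SchemaRDDSources, the local master setting, and the temporary Parquet path are illustrative choices, not part of the original example. The Hive case is left out because it needs a HiveContext and a Hive setup.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Defined at the top level so Scala reflection can infer the schema.
case class Record(key: Int, value: String)

// Hypothetical driver object, assuming Spark 1.1 or 1.2 on the classpath.
object SchemaRDDSources {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SchemaRDDSources").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Implicit conversion from an RDD of case classes to a SchemaRDD.
    import sqlContext.createSchemaRDD

    // 1. From an existing RDD: the schema comes from the Record case class.
    val fromRdd: SchemaRDD = sc.parallelize((1 to 3).map(i => Record(i, s"val_$i")))
    fromRdd.registerTempTable("records")

    // 2. From a Parquet file: write the data out, then read it back.
    //    The target path is a fresh location inside a temporary directory.
    val parquetPath =
      Files.createTempDirectory("schemardd").resolve("records.parquet").toString
    fromRdd.saveAsParquetFile(parquetPath)
    val fromParquet: SchemaRDD = sqlContext.parquetFile(parquetPath)

    // 3. From a JSON dataset: one JSON object per string, schema is inferred.
    val jsonLines = sc.parallelize(Seq(
      """{"key": 1, "value": "val_1"}""",
      """{"key": 2, "value": "val_2"}"""))
    val fromJson: SchemaRDD = sqlContext.jsonRDD(jsonLines)

    // Each SchemaRDD carries a schema describing its columns.
    fromParquet.printSchema()
    fromJson.printSchema()

    // The registered table can be queried with SQL.
    sqlContext.sql("SELECT value FROM records WHERE key > 1").collect().foreach(println)

    sc.stop()
  }
}
```

Writing the Parquet data and the JSON strings inside the program keeps the sketch independent of any existing files; with real data, sqlContext.parquetFile and sqlContext.jsonFile would point at existing paths instead.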



This article showed how to create a SchemaRDD in Spark.

This solution is provided by Shubham Mishra.

This article is contributed by the Developer Indian team. Please write comments if you find anything incorrect or want to share more information about the topic discussed above.
