@@ -461,6 +461,8 @@ name (i.e., `org.apache.spark.sql.parquet`), but for built-in sources you can also use their short
 names (`json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, `text`). DataFrames loaded from any data
 source type can be converted into other types using this syntax.
 
+To load a JSON file you can use:
+
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 {% include_example manual_load_options scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
@@ -479,6 +481,26 @@ source type can be converted into other types using this syntax.
 </div>
 </div>
 
+To load a CSV file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+</div>
+</div>
 ### Run SQL on files directly
 
 Instead of using read API to load a file into DataFrame and query it, you can also query that
@@ -573,7 +595,7 @@ Note that partition information is not gathered by default when creating externa
 
 ### Bucketing, Sorting and Partitioning
 
-For file-based data source, it is also possible to bucket and sort or partition the output. 
+For file-based data source, it is also possible to bucket and sort or partition the output.
 Bucketing and sorting are applicable only to persistent tables:
 
 <div class="codetabs">
@@ -598,7 +620,7 @@ CREATE TABLE users_bucketed_by_name(
     name STRING,
     favorite_color STRING,
     favorite_numbers array<integer>
-) USING parquet 
+) USING parquet
 CLUSTERED BY(name) INTO 42 BUCKETS;
 
 {% endhighlight %}
@@ -629,7 +651,7 @@ while partitioning can be used with both `save` and `saveAsTable` when using the
 {% highlight sql %}
 
 CREATE TABLE users_by_favorite_color(
-    name STRING, 
+    name STRING,
     favorite_color STRING,
     favorite_numbers array<integer>
 ) USING csv PARTITIONED BY(favorite_color);
@@ -664,7 +686,7 @@ CREATE TABLE users_bucketed_and_partitioned(
     name STRING,
     favorite_color STRING,
     favorite_numbers array<integer>
-) USING parquet 
+) USING parquet
 PARTITIONED BY (favorite_color)
 CLUSTERED BY(name) SORTED BY (favorite_numbers) INTO 42 BUCKETS;
 
@@ -675,7 +697,7 @@ CLUSTERED BY(name) SORTED BY (favorite_numbers) INTO 42 BUCKETS;
 </div>
 
 `partitionBy` creates a directory structure as described in the [Partition Discovery](#partition-discovery) section.
-Thus, it has limited applicability to columns with high cardinality. In contrast 
+Thus, it has limited applicability to columns with high cardinality. In contrast
 `bucketBy` distributes
 data across a fixed number of buckets and can be used when a number of unique values is unbounded.
 