diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index cf902db8709e7..9f1cd7ad8b802 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -292,7 +292,7 @@ private[spark] class SparkSubmit extends Logging {
         error("Cluster deploy mode is not applicable to Spark shells.")
       case (_, CLUSTER) if isSqlShell(args.mainClass) =>
         error("Cluster deploy mode is not applicable to Spark SQL shell.")
-      case (_, CLUSTER) if isThriftServer(args.mainClass) =>
+      case (_, CLUSTER) if (clusterManager != KUBERNETES) && isThriftServer(args.mainClass) =>
         error("Cluster deploy mode is not applicable to Spark Thrift server.")
       case _ =>
     }
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 4ae7acaae2314..0c3626e3d46a5 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -340,6 +340,43 @@ RBAC authorization and how to configure Kubernetes service accounts for pods, pl
 [Using RBAC Authorization](https://kubernetes.io/docs/admin/authorization/rbac/) and
 [Configure Service Accounts for Pods](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).
 
+## Running Spark Thrift Server
+
+Thrift JDBC/ODBC Server (aka Spark Thrift Server or STS) is Spark SQL’s port of Apache Hive’s HiveServer2, which allows
+JDBC/ODBC clients to execute SQL queries against Apache Spark.
+
+### Client Deployment Mode
+
+To start STS in client mode, execute the following command:
+
+```bash
+$ sbin/start-thriftserver.sh \
+    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>
+```
+
+### Cluster Deployment Mode
+
+To start STS in cluster mode, execute the following command:
+
+```bash
+$ sbin/start-thriftserver.sh \
+    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+    --deploy-mode cluster
+```
+
+The most basic workflow is to take the pod name (the driver pod name in cluster mode, or the name of the pod/container
+from which the STS start command was issued in client mode; either can be found with `kubectl get pods`) and run
+`kubectl port-forward spark-app-podname 31416:10000`
+([forward a local port to a port on the pod](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/#forward-a-local-port-to-a-port-on-the-pod)),
+which forwards `localhost:31416` to the pod's port 10000. Any JDBC client can then be used to query `jdbc:hive2://localhost:31416`.
+
+Alternatively, any other application on the cluster can simply use `spark-app-podname:10000`, which will be resolved
+by kube-dns. For persistent external access, one can run
+`kubectl expose pod spark-app-podname --type=NodePort --port 10000` to create a Kubernetes Service that accepts
+connections on a particular port of every node in the cluster and forwards them to the pod's port 10000.
+
+Note that STS will not work with Spark dynamic allocation, as Spark Shuffle Service support is not yet available.
+
 ## Future Work
 
 There are several Spark on Kubernetes features that are currently being worked on or planned to be worked on. Those features are expected to eventually make it into future versions of the spark-kubernetes integration.
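
The `SparkSubmit` change above narrows the existing guard: cluster deploy mode was previously rejected for the Thrift server on every cluster manager, and is now rejected only when the cluster manager is not Kubernetes. As a rough sketch of what the relaxed guard permits, `start-thriftserver.sh --deploy-mode cluster` boils down to a `spark-submit` invocation along these lines (the main class is the one `isThriftServer` matches on; the container image value is a placeholder):

```bash
# Sketch of the spark-submit call underlying start-thriftserver.sh in
# cluster mode; before this change SparkSubmit rejected it outright.
$ bin/spark-submit \
    --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=<spark-image>
```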
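
A minimal end-to-end sketch of the port-forwarding workflow from the doc addition, assuming a driver pod named `spark-app-podname` (the placeholder used in the text) and the `beeline` client shipped with Spark:

```bash
# Find the driver pod name (cluster mode) or the pod the STS start
# command was issued from (client mode).
$ kubectl get pods

# Forward local port 31416 to the pod's port 10000; leave this running.
$ kubectl port-forward spark-app-podname 31416:10000

# In another shell, query the server with any JDBC client, e.g. beeline.
$ bin/beeline -u jdbc:hive2://localhost:31416
```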
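
Likewise, a sketch of the NodePort exposure for persistent external access; the node port is assigned by Kubernetes and can be read back from the created Service:

```bash
# Create a NodePort Service forwarding to the pod's port 10000.
$ kubectl expose pod spark-app-podname --type=NodePort --port 10000

# Read back the assigned node port (shown as 10000:3XXXX/TCP).
$ kubectl get service spark-app-podname

# Connect through any cluster node's address using that port.
$ bin/beeline -u jdbc:hive2://<node-ip>:<node-port>
```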