
Commit 1611523

Add default scala template to databricks cli

1 parent 041207d

28 files changed, 820 additions & 0 deletions


acceptance/bundle/templates/default-scala/input.json

Lines changed: 7 additions & 0 deletions

```json
{
  "project_name": "my_default_scala",
  "compute_type": "serverless",
  "artifacts_dest_path": "/Volumes/test-folder",
  "default_catalog": "main",
  "personal_schemas": "yes, use a schema based on the current user name during development"
}
```

acceptance/bundle/templates/default-scala/out.test.toml

Lines changed: 5 additions & 0 deletions
(Generated file; contents are not rendered.)
Lines changed: 8 additions & 0 deletions

```
>>> [CLI] bundle init default-scala --config-file ./input.json --output-dir output

Welcome to the default-scala template for Databricks Asset Bundles!

A workspace was selected based on your current profile. For information about how to change this, see https://docs.databricks.com/dev-tools/cli/profiles.html.
workspace_host: [DATABRICKS_URL]
✨ Successfully initialized template
```
Lines changed: 93 additions & 0 deletions

````markdown
# my_default_scala

The 'my_default_scala' project was generated using the default-scala template.

## Getting started

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/install.html. The version must be v0.241.0 or later.

2. Authenticate to your Databricks workspace (if you have not done so already):
   ```
   $ databricks configure
   ```

3. To deploy a development copy of this project, type:
   ```
   $ databricks bundle deploy --target dev
   ```
   (Note that "dev" is the default target, so the `--target` parameter is optional here.)

   This deploys everything that's defined for this project. For example, the default template would deploy a job called `[dev yourname] my_default_scala` to your workspace. You can find that job by opening your workspace and clicking on **Workflows**.

4. Similarly, to deploy a production copy, type:
   ```
   $ databricks bundle deploy --target prod
   ```

5. To run a job, use the "run" command:
   ```
   $ databricks bundle run
   ```

6. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from https://docs.databricks.com/dev-tools/vscode-ext.html.

7. For documentation on the Databricks Asset Bundles format used for this project, and for CI/CD configuration, see https://docs.databricks.com/dev-tools/bundles/index.html.

## Local dev loop

### Prerequisites

Install the following tools:

- [sbt](https://www.scala-sbt.org/) v1.10.2 or later
- Java 17

### Running via sbt

1. On the terminal, navigate to the project's root directory. This is the directory where the `build.sbt` file is located.
2. Execute the project's default `Main` class by running `sbt run`.

### IntelliJ setup

Install the latest [IntelliJ IDEA](https://www.jetbrains.com/idea/) IDE; both the Community and Professional editions work. Install the [Scala plugin](https://plugins.jetbrains.com/plugin/1347-scala) from the JetBrains marketplace.

1. Import the current directory (the one where `build.sbt` is located) into IntelliJ.
2. Choose the correct Java version in IntelliJ: go to File -> Project Structure -> SDKs, then Run -> Edit Configurations and set the version to Java 17 from the dropdown.
3. You should now be able to run the code directly in the IDE via the ▶️ option.

#### JVM settings

If you see the following error message,

```
Failed to initialize MemoryUtil. You must start Java with --add-opens=java.base/java.nio=ALL-UNNAMED
```

add the following to your JVM settings: `--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED`.

See the IntelliJ instructions on how to [configure VM settings for a specific run configuration](https://www.jetbrains.com/help/idea/run-debug-configuration-java-application.html#more_options) or [configure them everywhere in your IDE](https://www.jetbrains.com/help/ide-services/configure-settings-via-profiles.html).

### Unit tests

The project comes with a sample unit test, `NycTaxiSpec.scala`, which uses the ScalaTest framework. Run the tests either directly in the IntelliJ IDE by clicking ▶️ on the tests, or via sbt by running `sbt test`.

## Customizations

### Job configuration

This project uses serverless compute. No cluster setup is required.
````
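
The Scala sources the README mentions (`Main.scala`, `NycTaxiSpec.scala`) are part of the commit but are not rendered in this view of the diff. As a rough illustration of the shape the README describes, a minimal ScalaTest spec could look like the following; the class body is an illustrative assumption, not the template's actual code:

```scala
// Hypothetical sketch: the template's real NycTaxiSpec.scala is not shown in this diff.
package com.examples

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class NycTaxiSpec extends AnyFlatSpec with Matchers {
  "A taxi fare" should "never be negative" in {
    val fares = Seq(12.5, 7.0, 0.0)
    all(fares) should be >= 0.0
  }
}
```

A spec like this runs with `sbt test`, since ScalaTest 3.2.19 is already declared as a `Test` dependency in the `build.sbt` below.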
Lines changed: 29 additions & 0 deletions

```scala
// This file is used to build the sbt project with Databricks Connect.
// It also includes instructions on how to create the jar uploaded via databricks bundle.
scalaVersion := "2.13.16"

name := "my_default_scala"
organization := "com.examples"
version := "0.1"

libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.+"
libraryDependencies += "org.slf4j" % "slf4j-simple" % "2.0.16"

libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.19" % Test

assembly / assemblyOption ~= { _.withIncludeScala(false) }
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp filter { _.data.getName.matches("scala-.*") } // remove Scala libraries
}

assemblyMergeStrategy := {
  case _ => MergeStrategy.preferProject
}

// To run with new JVM options, a fork is required; otherwise the run uses the same options as the sbt process.
fork := true
javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"

// Ensure logs are written to System.out by default and not System.err.
javaOptions += "-Dorg.slf4j.simpleLogger.logFile=System.out"
```
Lines changed: 53 additions & 0 deletions

```yaml
# This is a Databricks asset bundle definition for my_default_scala.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: my_default_scala
  uuid: [UUID]

include:
  - resources/*.yml

variables:
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

workspace:
  host: [DATABRICKS_URL]
  artifact_path: /Volumes/test-folder/${bundle.name}/${bundle.target}/${workspace.current_user.short_name}

artifacts:
  default:
    type: jar
    build: sbt package && sbt assembly
    path: .
    files:
      - source: ./target/scala-2.13/my_default_scala-assembly-0.1.jar

targets:
  dev:
    # The default target uses 'mode: development' to create a development copy.
    # - Deployed resources get prefixed with '[dev my_user_name]'
    # - Any job schedules and triggers are paused by default.
    # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
    mode: development
    default: true
    workspace:
      host: [DATABRICKS_URL]
    variables:
      catalog: main
      schema: ${workspace.current_user.short_name}

  prod:
    mode: production
    workspace:
      host: [DATABRICKS_URL]
      # We explicitly deploy to /Workspace/Users/[USERNAME] to make sure we only have a single copy.
      root_path: /Workspace/Users/[USERNAME]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [USERNAME]
        level: CAN_MANAGE
    variables:
      catalog: main
      schema: default
```
Lines changed: 1 addition & 0 deletions

```
.databricks/
```
Lines changed: 4 additions & 0 deletions

```scala
// The project folder is used to store sbt-specific project files.
// This file is used to define the plugins that are used in the sbt project.
// In particular, this includes the assembly plugin to generate an uber jar.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
```
Lines changed: 3 additions & 0 deletions

```
This folder is reserved for Databricks Asset Bundles resource definitions.
```
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# The main job for my_default_scala
2+
3+
resources:
4+
jobs:
5+
my_default_scala:
6+
name: my_default_scala
7+
parameters:
8+
- name: catalog
9+
default: ${var.catalog}
10+
- name: schema
11+
default: ${var.schema}
12+
tasks:
13+
- task_key: main_task
14+
spark_jar_task:
15+
main_class_name: com.examples.Main
16+
parameters:
17+
- "--catalog"
18+
- "{{job.parameters.catalog}}"
19+
- "--schema"
20+
- "{{job.parameters.schema}}"
21+
environment_key: default
22+
environments:
23+
- environment_key: default
24+
spec:
25+
environment_version: "4-scala-preview"
26+
java_dependencies:
27+
- ${workspace.artifact_path}/.internal/my_default_scala-assembly-0.1.jar
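
The job wires its `catalog` and `schema` parameters through to `com.examples.Main` as `--catalog`/`--schema` arguments. `Main.scala` itself is not rendered in this diff; a minimal sketch of an entry point that consumes these flags might look like the following. The argument parsing and the `DatabricksSession` usage are illustrative assumptions following the usual Databricks Connect pattern, not the template's actual code:

```scala
// Hypothetical sketch: the template's real Main.scala is not shown in this diff.
package com.examples

import com.databricks.connect.DatabricksSession

object Main {
  // Collect "--flag value" pairs from the argument list into a map.
  private def parseArgs(args: Array[String]): Map[String, String] =
    args.sliding(2, 2).collect {
      case Array(flag, value) if flag.startsWith("--") =>
        flag.stripPrefix("--") -> value
    }.toMap

  def main(args: Array[String]): Unit = {
    val opts    = parseArgs(args)
    val catalog = opts.getOrElse("catalog", "main")
    val schema  = opts.getOrElse("schema", "default")

    // With Databricks Connect on the classpath (see build.sbt), the builder
    // resolves the workspace from the environment or the current CLI profile.
    val spark = DatabricksSession.builder().getOrCreate()
    spark.sql(s"USE $catalog.$schema")
    spark.sql("SELECT current_catalog(), current_schema()").show()
  }
}
```

This is also the class that `sbt run` executes in the local dev loop described in the README above.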
