Conversation
jtcohen6
left a comment
This is a fascinating PR. Thanks for such clean and well-annotated code. I feel like I'm connecting two wayward dots in my head, between what pyspark does and what database cursors do. To that end, I feel far from qualified to review the specifics; I'd love to spend some time experimenting with one of the engineers on the Core team.
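The "two wayward dots" above can be made concrete: a session connection method exposes a Spark session through the same interface a database cursor provides. This is an illustrative sketch only, not the PR's actual implementation; the names `SessionCursor` and `FakeSession` are hypothetical, with `FakeSession` standing in for `pyspark.sql.SparkSession`.

```python
class SessionCursor:
    """A minimal DB-API-style cursor that delegates to a Spark-like session."""

    def __init__(self, session):
        self._session = session
        self._rows = None

    def execute(self, sql):
        # `session.sql` returns a DataFrame-like object; `collect` materializes rows
        self._rows = self._session.sql(sql).collect()

    def fetchall(self):
        return self._rows

    def close(self):
        self._rows = None


class _FakeResult:
    def __init__(self, rows):
        self._rows = rows

    def collect(self):
        return self._rows


class FakeSession:
    """Stands in for pyspark.sql.SparkSession in this sketch."""

    def sql(self, query):
        return _FakeResult([(1,)])


cursor = SessionCursor(FakeSession())
cursor.execute("select 1")
print(cursor.fetchall())  # [(1,)]
```

The point of the shape is that dbt's adapter machinery already speaks "cursor", so wrapping an in-process session this way avoids needing a remote server at all.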
The current "dbtspec" suite/framework for integration testing is quite unfriendly. Please don't spend too much time chiseling away at that particular wall, since we're aiming/planning to replace it over the next few months.
Force-pushed from 6ec5374 to 5be6715
JCZuurmond
left a comment
@jtcohen6 : Functionality-wise, everything is there. I am struggling with the test setup in CircleCI.
tests/specs/spark-session.dbtspec
Outdated
schema: "analytics_{{ var('_dbt_random_suffix') }}"
sequences:
  test_dbt_empty: empty
# Disabled tests require hive support
I don't know yet how to enable Hive support; this is particularly hard due to the sub-process calls to dbt inside the pytest-dbt-adapter package.
Given how difficult the .dbtspec tests are to debug, and the fact that we're moving away from this framework, I'd be happy with an alternative integration test, to validate some truly basic functionality. The setup could be similar to the ones we have in tests/integration currently. I think it would first require adding a new profile marker to the list in conftest.py.
Most of the tests are disabled due to missing Hive support, which is not easy to add due to the subprocesses in .dbtspec. @jtcohen6 : do you consider the two active tests to be sufficient? (see thread in tox.ini too)
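For context, Hive support in a local Spark session is typically enabled when the session is built. This is a sketch under the assumption that pyspark is installed locally; it is not taken from this PR's code, and running it requires a working Spark installation:

```python
from pyspark.sql import SparkSession

# Enabling Hive support gives the session a persistent metastore,
# which the disabled dbtspec tests rely on to share objects between runs.
spark = (
    SparkSession.builder
    .appName("dbt-spark-session")
    .enableHiveSupport()
    .getOrCreate()
)
```

The difficulty described above is that the dbtspec framework invokes dbt in subprocesses, so each subprocess builds its own session rather than reusing one configured like this.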
jtcohen6
left a comment
It looks like this is generally working. I'd hate to have the particularities of our current testing setup (.dbtspec + CircleCI) stand in the way of making this capability more broadly available. To that end, I'm open to being more creative about other ways to test this, knowing too that we're soon to revamp all adapter plugin testing.
@jtcohen6 : Could you provide some help with this PR? The PR is stuck on the testing image; it needs an image that has pyspark plus the standard dbt-spark dependencies. If I move away from
Force-pushed from 18f83ef to 25da4c9
Force-pushed from 1c70d74 to 85d23b8
@jtcohen6 : This PR is ready to merge! Will you review it once more? Also the accompanying docs PR, please. The PR contains tests for the session connection method. Not all dbtspec tests are enabled: I was not able to set up the connection to a metastore as is done for Thrift, though I think it is technically possible. My expectation is that once that connection is set up, all the tests that run for Thrift should also run for the session connection method. Given that dbtspec is going to be replaced, I would like to wait for that before investing more time in the test suite; I expect it will then become easier to enable more tests.
Documentation updates that go along with dbt-labs/dbt-spark#279
The failing CI steps look unrelated to this PR.
Thanks @JCZuurmond! Triggering a rerun of the failed CI steps now. Looks like it may have been a connection/timeout issue. We did just migrate to a new dedicated Databricks workspace for sandbox/integration testing.
Force-pushed from d765f7a to 37ebe30
jtcohen6
left a comment
@JCZuurmond Thank you for this very, very cool contribution!
Alongside the dbt-sqlite and dbt-duckdb adapters, this feels like a step in the direction of mocking/testing/validating dbt functionality, without first requiring a connection to a remote database.
I'm totally happy with punting on the test cases that require a metastore to persist objects between sessions. We are finally moving away from .dbtspec, and toward a new testing framework for adapters, which you can check out in #299.
In the meantime, my only change here will be to move your contributor note up in the changelog, since this will be included in v1.1.0rc1.
Documentation updates that go along with dbt-labs/dbt-spark#279
* Add session method
* Add version blocks, tweak language
* Add to v1.1 migration guide

Co-authored-by: Cor <[email protected]>
resolves #272
Description
Adds a connection method to connect to Spark session. See #272 for full discussion.
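A hedged sketch of what a profiles.yml target using the new method might look like. The profile name and schema value here are placeholders, and only `type` and `method: session` are taken from this PR's discussion; consult the accompanying docs PR for the authoritative field list:

```yaml
spark_session:
  target: local
  outputs:
    local:
      type: spark
      method: session   # connect to an in-process Spark session
      schema: my_schema # schema to use for dbt relations
```

Unlike the thrift and http methods, no remote host is involved: dbt talks to a SparkSession created inside the dbt process itself.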
Checklist
I have updated the CHANGELOG.md and added information about my change to the "dbt-spark next" section.