-
Notifications
You must be signed in to change notification settings - Fork 2.2k
sqldb/v2 as separate package
#10175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sqldb-v2
Are you sure you want to change the base?
sqldb/v2 as separate package
#10175
Conversation
sqldb/v2 as separate package
6b5e428 to
6899a39
Compare
ellemouton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass, looking really good!
go.mod
Outdated
| replace github.com/golang-migrate/migrate/v4 => github.com/lightninglabs/migrate/v4 v4.18.2-9023d66a-fork-pr-2 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok afaict, the big thing we added in the fork was the post migration call-backs logic.
Something id just like some clarity on before we push forward here is:
in our own rolled migration strategy in LND today that takes care of code migrations, we know that the migration version table will only be updated once the code migration is complete. Is the same true for the post-migration callback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed this with guggero previously.
Currently, no that is not true unfortunately, as the version table will be updated if the SQL migration succeeds, while the code migration fails. But I'm planning to address that asap 🔥!
The current plan is to submit a new PR to our migrate library fork. This PR will add a separate tracking table for code migrations. With this change, the logic will ensure that if the db is initialized and the persisted code migration version is lower than the SQL migration version, the code migration will be re-run prior to any SQL migration.
I haven't gotten to updating the migrate fork yet though. Perhaps we should hold of with merging this one until that change has been merged to the migrate fork, so that we don't need to push a new PR that updates the dependency in lnd though.
Does that sound like a good plan to you @ellemouton?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
7ca3000 to
233f1fb
Compare
|
Thanks a lot for the review @ellemouton 🔥🎉! Did one separate push to address the latest feedback (and add the contents of #10193) + one push to rebase this PR on master. I'm not re-requesting a review of you just yet until we have decided on #10175 (comment) @ellemouton. |
|
Will address failing CI checks on the next push after the next review round. |
|
Based the PR on the now updated |
GustavoStingelin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work here! I'm excited to see this moving forward and to use it in btcwallet.
I’ve opened a PR with some related work already, but this looks more polished and complete.
I’ve added a few comments as well.
sqldb/v2/migrations.go
Outdated
| // MigrateTableName is the name of the table used by golang-migrate to | ||
| // track the current schema version. | ||
| MigrateTableName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about renaming this to MigrationTableName? Or, to avoid repeating the word “migration”, maybe TrackingTableName would work better. To me, MigrateTableName sounds more like the table that is going to be migrated, rather than the one used for tracking migrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is part of the new MigrationStream functionality this PR introduces, I changed this to your suggestion without including it in a separate commit.
sqldb/v2/migrations.go
Outdated
| // Schemas is the embedded file system containing the SQL migration | ||
| // files. | ||
| Schemas embed.FS | ||
|
|
||
| // SQLFileDirectory is the directory containing the SQL migration files. | ||
| SQLFileDirectory string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we’re using two different names to describe the DDL migrations. One is Schemas, which might be confusing since it sounds like the literal database.schema relation. The other is SQLFileDirectory, which I think works well. What do you think about renaming it to something like DDLFiles and DDLFilesDirectory (or something along those lines) to make the intent clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or SQLFiles and SQLFilesDirectory as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as #10175 (comment) :)
| // SqliteStore is a database store implementation that uses a sqlite backend. | ||
| type SqliteStore struct { | ||
| cfg *SqliteConfig | ||
|
|
||
| *BaseDB | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could be clearer that it's a fake implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in a separate commit as this initial comment was copied from lnd.
| // maxSQLiteBatchSize is the maximum number of items that can be | ||
| // included in a batch query IN clause for SQLite. This was determined | ||
| // using the TestSQLSliceQueries test. | ||
| maxSQLiteBatchSize = 32766 | ||
|
|
||
| // maxPostgresBatchSize is the maximum number of items that can be | ||
| // included in a batch query IN clause for Postgres. This was determined | ||
| // using the TestSQLSliceQueries test. | ||
| maxPostgresBatchSize = 65535 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this value applies to all query parameters used in the query, not only for those used in the IN statement, so if this limit is reached when we have other parameters aside the IN, it will return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was copied from lnd, but I think you might be right. We should research this and address it in a follow-up if correct.
| // Ensure the migration tracker table exists before running migrations. | ||
| // This table tracks migration progress and ensures compatibility with | ||
| // SQLC query generation. If the table is already created by an SQLC | ||
| // migration, this operation becomes a no-op. | ||
| migrationTrackerSQL := ` | ||
| CREATE TABLE IF NOT EXISTS migration_tracker ( | ||
| version INTEGER UNIQUE NOT NULL, | ||
| migration_time TIMESTAMP NOT NULL | ||
| );` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different from the MigrationStream table name?
| // TearDown stops the underlying docker container. | ||
| func (f *TestPgFixture) TearDown(t testing.TB) { | ||
| err := f.pool.Purge(f.resource) | ||
| require.NoError(t, err, "Could not purge resource") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be passed as a test Cleanup function inside the fixture creation
t.Cleanup(func() {
fixture.TearDown(t)
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqldb/v2/sqlerrors.go
Outdated
| // ErrRetriesExceeded is returned when a transaction is retried more | ||
| // than the max allowed valued without a success. | ||
| ErrRetriesExceeded = errors.New("db tx retries exceeded") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // ErrRetriesExceeded is returned when a transaction is retried more | |
| // than the max allowed valued without a success. | |
| ErrRetriesExceeded = errors.New("db tx retries exceeded") | |
| // ErrTxRetriesExceeded is returned when a transaction is retried more | |
| // than the max allowed valued without a success. | |
| ErrTxRetriesExceeded = errors.New("db tx retries exceeded") |
Maybe ErrTxRetriesExceeded to avoid confusion with other types of retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as a separate commit.
sqldb/v2/sqlerrors.go
Outdated
| // postgresErrMsgs are strings that signify retriable errors resulting | ||
| // from serialization failures. | ||
| postgresErrMsgs = []string{ | ||
| "could not serialize access", | ||
| "current transaction is aborted", | ||
| "not enough elements in RWConflictPool", | ||
| "deadlock detected", | ||
| "commit unexpectedly resulted in rollback", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe postgresRetriableErrMsgs to be clearer.
| // postgresErrMsgs are strings that signify retriable errors resulting | |
| // from serialization failures. | |
| postgresErrMsgs = []string{ | |
| "could not serialize access", | |
| "current transaction is aborted", | |
| "not enough elements in RWConflictPool", | |
| "deadlock detected", | |
| "commit unexpectedly resulted in rollback", | |
| } | |
| // postgresRetriableErrMsgs are strings that signify retriable errors resulting | |
| // from serialization failures. | |
| postgresRetriableErrMsgs = []string{ | |
| "could not serialize access", | |
| "current transaction is aborted", | |
| "not enough elements in RWConflictPool", | |
| "deadlock detected", | |
| "commit unexpectedly resulted in rollback", | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as a separate commit.
sqldb/v2/sqlite.go
Outdated
| // NewTestSqliteDB is a helper function that creates an SQLite database for | ||
| // testing. | ||
| func NewTestSqliteDB(t testing.TB, streams []MigrationStream) *SqliteStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should separate the test helpers in another file from the functions used in "production".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as a separate commit.
sqldb/v2/test_postgres.go
Outdated
| t.Cleanup(func() { | ||
| pgFixture.TearDown(t) | ||
| }) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can move this to DB creation, like done in SQLite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as a separate commit :)
|
Anything that I can do to help put energy into this? |
|
Hey @GustavoStingelin! Thank you so much for the review above 🙏!
Apologies, I should probably have informed you this in this PR earlier: As that PR will require updates of the implementation of this PR, it makes most sense that it is finalised and merged before changing this dependent PR. In terms of areas where you can help out: Continuing to review this PR once the above is merged and I've addressed your feedback here would be very helpful. Additionally, if you want to, reviewing the above |
| testPgUser = "test" | ||
| testPgPass = "test" | ||
| testPgDBName = "test" | ||
| PostgresTag = "15" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To what extent is making the Postgres DB version 15 intentional? PostgreSQL 15 was released 4 years ago, and between that time and now (PostgreSQL 18), I heuristically bet there are significant performance optimizations that have been added to PostgreSQL. I think we could target a more recent version, like the pre-latest release (PostgreSQL 17 seems like a sweet spot)
Related:
https://endoflife.date/postgresql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version 15 was chosen as it is what we currently use in tapd.
It's really solid feedback that this should be configurable, so thanks and I definitely agree 🙏! In order to avoid scope creep, let's do that in a follow-up PR(s).
The same applies for the rest of your comments as well. Once merged, I'll open an issue that tracks all of your feedback points, so that we can tackle those in follow-up PRs :).
| testPgUser = "test" | ||
| testPgPass = "test" | ||
| testPgDBName = "test" | ||
| PostgresTag = "15" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relatedly, it looks like the PostgresTag version is not configurable. I understand the intent may be to enforce a specific version for all consumers, but this may not always be feasible now or in the future. Each project may have evolving dynamics that require a specific PostgreSQL version. I think it's safe to make it configurable with a default value that can be overridden when needed
| func NewTestPgFixture(t testing.TB, expiry time.Duration) *TestPgFixture { | ||
| // Use a sensible default on Windows (tcp/http) and linux/osx (socket) | ||
| // by specifying an empty endpoint. | ||
| pool, err := dockertest.NewPool("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are gonna use docker container anyway here, Why not using testcontainers which is out of the box, more mature, and maintainable solution. Seeing https://testcontainers.com/modules/postgresql/?language=go
Credits:
Originally inspired by @GustavoStingelin's PR: btcsuite/btcwallet#1027
| fmt.Sprintf("POSTGRES_PASSWORD=%v", testPgPass), | ||
| fmt.Sprintf("POSTGRES_DB=%v", testPgDBName), | ||
| "listen_addresses='*'", | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those environment variables need to be configurable. It is up to the API consumer to pass the intended parameters. We can have some sort of base arguments and additional arguments.
- If no additional args provided, the base args would be used
- If additional args provided and no overlapping with base args, the base args would be augmented
- If additional args provided and there is overlapping with base args, the base args would be overriden by the corresponding values of the additional args and augmented with the new ones if any
| Cmd: []string{ | ||
| "postgres", | ||
| "-c", "log_statement=all", | ||
| "-c", "log_destination=stderr", | ||
| "-c", "max_connections=5000", | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the above comment
| pool *dockertest.Pool | ||
| resource *dockertest.Resource | ||
| host string | ||
| port int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we making sure here there is no port collision in case of running multiple PostgreSQL DB containers?
|
|
||
| // Tell docker to hard kill the container in "expiry" seconds. | ||
| require.NoError(t, resource.Expire(uint(expiry.Seconds()))) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relatedly, That all docker containers "management" lifecycle part here and there across this PR is managed out of the box by testcontainers
|
|
||
| // MapSQLError attempts to interpret a given error as a database agnostic SQL | ||
| // error. | ||
| func MapSQLError(err error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to agnostically map that SQL error? How would doing that be helpful? It looks like at the application layer there is a state to signal whether we are dealing with SQLite or PostgreSQL. We could have separate functions for this: one for MapPostgreSQLError and another for MapSQLiteError. That way the errors would be extendable in the future, given that there is almost a major release version for PostgreSQL every year (hence, heuristically, new errors)
| "INTEGER PRIMARY KEY": "BIGSERIAL PRIMARY KEY", | ||
| "TIMESTAMP": "TIMESTAMP WITHOUT TIME ZONE", | ||
| "UNHEX": "DECODE", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is having an intermediary schema patching approach helpful and reliable over time?
Why not make the schema creation process redundant for each target DB (e.g., one for PostgreSQL and one for SQLite)? That way we can have flexibility and reliability in how we define schemas according to the target DB.
To boil down my arguments for why this intermediary schema patching design approach is unreliable long-term and even with some reasonable time in short-term:
-
PostgreSQL has frequent version releases (approximately one per year) with EOL releases every ~5 years. These uncontrolled changes naturally require updates to the replacement mappings, which may necessitate more migrations or make the process increasingly complex.
-
There was an internal issue discussed in LL where an explicit
AUTO_INCREMENTwas required for intended behavior (which, as observed, required a migration in the next release). Issues like this could be completely avoided by not using patching and instead following a "Write Everything Twice" approach for schema creation. Note: this wouldn't apply to queries, since queries are easier to write in a universally-agnostic manner than schema definitions. -
Future database support concerns: What if there's a plan to support another database? Would the schema patching map become a multi-type hierarchy? We'd eventually end up solving a simple problem with a complex solution.
For example: What's the problem with having the entire graph schema creation written twice—once for SQLite and once for PostgreSQL?
cc: @yyforyongyu for input need – since the btcwallet SQL project will eventually depend on this sqldb package, having this intermediary patching approach could potentially impact the reliability and criticality of the btcwallet sqlization project given the vision for btcwallet sqlization to stand for a decade, with that patching, and time constraint those parameters don't look like align with that vision
In the upcoming commits, we will introduce a new sqldb module, sqldb version 2. The intention of the new sqldb module, is to make it generalizable so that it contains no `lnd` specific code, to ensure that it can be reused in other projects. This commit adds the base of the new module, but does not include any implementation yet, as that will be done in the upcoming commits.
This commit moves all non lnd-specific code of sqldb/v1 to the new sqldb/v2 module. Note however, that without additional changes, this package still needs to reference lnd, as references to the lnd `sqlc` package is required without further changes. Those changes will be introduced in the upcoming commits, to fully decouple the new sqldb/v2 module from lnd.
233f1fb to
20f91ed
Compare
This commit introduces a new struct named `MigrationStream`, which defines a structure for migrations SQL migrations. The `MigrationStream` struct contains the SQL migrations which will be applied, as well as corresponding post-migration code migrations which will be executed afterwards. The struct also contains fields which define how the execution of the migrations are tracked. Importantly, it is also possible to define multiple different `MigrationStream`s which are executed, to for example define one `prod` and one `dev` migration stream.
This commit updates the `sqldb/v2` package to utilize the new `MigrationStream` type for executing migrations, instead of passing `[]MigrationConfig`'s directly.
This commit updates the definition of the `BaseDB` struct to decouple it from lnd`s `sqlc` package. We also introduce new fields to the struct to make it possible to track the database type used at runtime.
In order to make it possible to replace `tapd`'s internal `sqldb` package with the new generic `sqldb/v2` package, we need to make sure that all features and functionality that currently exist in the `tapd` package are also present in the new `sqldb/v2` package. This commit adds such additional missing features to the `sqldb/v2` package.
Previously, the test db helper files were suffixed with "_test", which would indicate that the files specifically contained tests. However, these files actually contain helper functions to be used in tests, and are not tests themselves. To better reflect their purpose, the files have been renamed to instead be prefixed with "test_".
The docs of the `SqliteStore` for no sqlite build environments previously didn't clarify that the actual `SqliteStore` implementation under such build tag environments, didn't actually implement a real sqlite store. This commit clarifies that in the docs.
rename the `ErrRetriesExceeded` error to `ErrTxRetriesExceeded`.
Rename the `postgresErrMsgs` list to `postgresRetriableErrMsgs`, in order to clarify its usage.
Move the tear down of the postgres db fixture on the test cleanup to the constructor, instead of requiring the callsites to implement that.
20f91ed to
d2bc562
Compare
|
Thanks a lot for the reviews @GustavoStingelin & @mohamedawnallah 🎉! Since the Even though the feedback you've both submitted is really solid and should definitely be added in the future, I've been quite restrictive in which feedback I've addressed so far. The reasoning behind this is that I think I need to clarify exactly what the intent of this initial version of the Therefore, what this PR currently implements, is ONLY mainly a combination of the non repo specific db code that's already present in those repos. Additionally in order to make that code useable to execute migrations, this PR also introduces the concept of the I.e., this repo mainly only carries over code that already is live in the other repos, in order to ensure that this PR is quite easy to review, and that the scope of the PR remains quite small. In order to avoid scope creep, I suggest that we add the additional functionality that you've suggested in your feedback in follow-up PRs, just so that we can get the Therefore, I'll only address feedback in this PR which is points out bugs, or if you notice currently live functionality in our repos which I've missed including in this PR. For the additional feedback, once this PR is merged, I'll open an issue which includes the additional functionality you've suggested, so that we can track it and tackle those in follow-up PRs. One final note: Since some of your suggestions were basically only renaming of fields which are already present & live in the DB code in our other repos, I addressed those in separate commits at the end of the PR. Since such commits should be super easy to review and should stall this PR, I opted to including those. |
GustavoStingelin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just left a comment for a potential issue, otherwise LGTM!
| // Report the current version of the database before the migration. | ||
| currentDbVersion, _, err := driver.Version() | ||
| if err != nil { | ||
| return fmt.Errorf("unable to get current db version: %w", err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we are missing an !errors.Is(err, migrate.ErrNilVersion) check for fresh databases? Since no migrations have been applied yet.
Also, what is the difference between this value currentDbVersion and migrationVersion? They look equivalent to me.
Finally, how do we ensure this logic is actually exercised and working as expected?
|
@ziggie1984: review reminder |
This PR addresses Phase 1 of the release plan of comment #9762 (comment) in #9762.
This PR extracts the non-specific
lndspecific logic of thesqldbpackage into a newsqldb/v2package, and introduces the extra functionality thatsqldb/v2adds.