@Azmi-84
Contributor

@Azmi-84 Azmi-84 commented Mar 28, 2025

📝 Summary

This commit introduces a new Python script that serves as a comprehensive guide to getting started with DuckDB. It includes interactive examples for database connections, table creation, data insertion, basic queries, and integration with Polars. The guide aims to facilitate learning and experimentation with DuckDB's features in a user-friendly manner.

@Haleshot
Collaborator

Haleshot commented Mar 30, 2025

Thanks a lot for your first PR contribution and for helping kickstart the duckdb series. Just wondering if you checked the existing issue thread for the duckdb series here: #48; I believe @prrao87 wanted to work on this notebook. In the future, I would recommend checking out the respective course issues to get an idea of the notebook topics/people working on them and also commenting on it accordingly if you want a certain topic assigned to you/added to the outline proposed.

Just waiting for a confirmation from @prrao87; if he's fine with having this notebook or if he'd like to collaborate, etc. (leaving the options free). Hope that's fine by you as well @Azmi-84 😅

@Azmi-84
Contributor Author

Azmi-84 commented Mar 30, 2025

> Thanks a lot for your first PR contribution and for helping kickstart the duckdb series. Just wondering if you checked the existing issue thread for the duckdb series here: #48; I believe @prrao87 wanted to work on this notebook. In the future, I would recommend checking out the respective course issues to get an idea of the notebook topics/people working on them and also commenting on it accordingly if you want a certain topic assigned to you/added to the outline proposed.
>
> Just waiting for a confirmation from @prrao87; if he's fine with having this notebook or if he'd like to collaborate, etc. (leaving the options free). Hope that's fine by you as well @Azmi-84 😅

Oh, thanks. Actually, I didn't notice it. Yes, I would love to contribute to this course. Maybe I'll choose a different and available topic then.

@Haleshot
Collaborator

Haleshot commented Mar 30, 2025

> Maybe I'll choose a different and available topic then.

I would wait for an update from Prashanth before making any changes in this PR (can let it be for now). In the meantime, feel free to browse through the DuckDB course issue (or others) and let me know which ones you want to add/work on!

@prrao87

prrao87 commented Mar 31, 2025

Hi @Haleshot, I took a look at the notebook and I think the goals of this notebook are different from the original notebook we discussed "why duckdb?". I think this notebook by @Azmi-84 is more of an "intro to duckdb's features" notebook that sits in between the first lesson "why duckdb" and the third lesson "querying dataframes", as this summarizes a list of features from duckdb's documentation and gives a gentle introduction to the user on these features.

My thoughts for the "why duckdb?" introductory notebook were to add a high-level overview of embedded databases and discuss their main characteristics. It's along the lines of what Hannes describes in this talk, and explains why someone should care about duckdb in a world where you have polars, daft and a whole suite of other query engines (databases are still useful in their own right, and duckdb makes you rethink the meaning of a "database" in the classical sense).

In the "why duckdb?" notebook, we'd ideally cover how/why duckdb is:

  • easy to use
  • fast
  • persistent (when needed)
  • interoperable (can function as a universal data connector while providing a SQL interface)

IMO these characteristics are important to highlight before going into the features, as it offers a contrast to the other tutorials in the Marimo learn series and shows the power of embedded databases.

Having an "intro to duckdb's features" notebook like this one, however, is a good idea, and this could be an additional notebook in the series that provides a basic introduction to the features of duckdb to help people explore it further.
I'd still like to contribute the "why duckdb?" notebook, but I've been swamped with release stuff on my end with kuzu and will do my best to get to it at the earliest (in the coming days) - apologies for the delay!
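
To make the "persistent (when needed)" characteristic listed above concrete, here is a minimal sketch of the in-memory versus file-backed choice that an embedded database offers. It deliberately uses Python's standard-library `sqlite3` as a stand-in embedded database so that it runs without extra dependencies; it is not DuckDB code. DuckDB's Python API follows a similar pattern (`duckdb.connect()` for in-memory, `duckdb.connect("some_file.db")` for a file). The `lessons` table and its rows are hypothetical and exist only for illustration.

```python
import os
import sqlite3
import tempfile


def create_and_fill(conn: sqlite3.Connection) -> None:
    """Create a small, hypothetical table and insert two rows."""
    conn.execute("CREATE TABLE lessons (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO lessons (title) VALUES (?)",
        [("why duckdb",), ("intro to duckdb",)],
    )
    conn.commit()


def count_rows(conn: sqlite3.Connection) -> int:
    """Return the number of rows in the lessons table."""
    return conn.execute("SELECT COUNT(*) FROM lessons").fetchone()[0]


# In-memory database: no server, no file, data lives only in this connection.
mem = sqlite3.connect(":memory:")
create_and_fill(mem)
in_memory_rows = count_rows(mem)
mem.close()

# A fresh in-memory connection starts empty; nothing was persisted.
fresh = sqlite3.connect(":memory:")
fresh_memory_tables = fresh.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
fresh.close()

# File-backed database: the same code, but the data survives a reconnect.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lessons.db")
    first = sqlite3.connect(path)
    create_and_fill(first)
    first.close()

    second = sqlite3.connect(path)
    persisted_rows = count_rows(second)
    second.close()

print(in_memory_rows, fresh_memory_tables, persisted_rows)
```

The only difference between the two modes is the argument passed to `connect`, which is the sense in which persistence is opt-in for an embedded database.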

@Haleshot
Collaborator

Haleshot commented Apr 1, 2025

> I think this notebook by @Azmi-84 is more of an "intro to duckdb's features" notebook that sits in between the first lesson "why duckdb" and the third lesson "querying dataframes", as this summarizes a list of features from duckdb's documentation and gives a gentle introduction to the user on these features.
> My thoughts for the "why duckdb?" introductory notebook were to add a high-level overview of embedded databases and discuss their main characteristics. It's along the lines of what Hannes describes in this talk, and explains why someone should care about duckdb in a world where you have polars, daft and a whole suite of other query engines (databases are still useful in their own right, and duckdb makes you rethink the meaning of a "database" in the classical sense).

Hey, thanks for taking a look at this!
Makes sense about the different focus — that's exactly why I wanted your input on it.

> Having an "intro to duckdb's features" notebook like this one, however, is a good idea, and this could be an additional notebook in the series that provides a basic introduction to the features of duckdb to help people explore it further. I'd still like to contribute the "why duckdb?" notebook, but I've been swamped with release stuff on my end with kuzu and will do my best to get to it at the earliest (in the coming days) - apologies for the delay!

Happy to include this as another notebook in the series. Those outlines in the issues are just starting points anyway — always flexible.

No rush at all on the "why duckdb?" notebook. Release work always takes priority, so whenever you can get to it works for me.

PS: That talk you linked was super nice btw!

@Haleshot
Collaborator

Haleshot commented Apr 2, 2025

Also, came across an interesting DuckDB resource which you both might find relevant.

@Haleshot
Collaborator

Haleshot commented Apr 6, 2025

@Azmi-84 Let me know if you want me to include a new notebook topic in the #48 issue under which this notebook can fall (as also recommended by Prashanth above).

@Azmi-84
Contributor Author

Azmi-84 commented Apr 6, 2025

> @Azmi-84 Let me know if you want me to include a new notebook topic in the #48 issue under which this notebook can fall (as also recommended by Prashanth above).

Yes, sure, I have no problem with that. And if it helps someone get a quick look at the features DuckDB provides, why not.

@prrao87

prrao87 commented Apr 7, 2025

Hi @Azmi-84, I think this notebook can be merged as an intro notebook. Looks fine to me to start with.

@Haleshot, I realize I'm holding the "why duckdb" notebook back due to my extremely busy schedule these days (the task list keeps piling up!). If someone else has expressed interest in creating that example, I'd be happy to step aside and take a look at their work as and when my schedule opens up. If not, I'll get to it when I can. Rest assured, I am interested in following along with Marimo's progress, and I'm sure I'll have some cleanup PRs to add in the future as I look through these intro DuckDB examples :).

@Haleshot
Collaborator

> @Haleshot, I realize I'm holding the "why duckdb" notebook back due to my extremely busy schedule these days (the task list keeps piling up!). If someone else has expressed interest in creating that example, I'd be happy to step aside and take a look at their work as and when my schedule opens up. If not, I'll get to it when I can. Rest assured, I am interested in following along with Marimo's progress, and I'm sure I'll have some cleanup PRs to add in the future as I look through these intro DuckDB examples :).

No worries at all! I’ll take a look at this PR and merge it as appropriate (intro notebook). Appreciate you staying tuned in.

Collaborator

@Haleshot Haleshot left a comment

Hi Azmi; just went through your notebook and left a couple of review comments. It would also be helpful if you could take a look at our "Working with SQL in marimo" page in the docs.

Made a new topic in the DuckDB issue #48 called "Intro to DuckDB" and assigned the topic to you there.

Also, would recommend renaming the notebook to match the contents (intro to duckdb, etc.), along with the first header tag in the notebook.

Collaborator

@Haleshot Haleshot left a comment

First markdown block may not be relevant here.

@Azmi-84
Contributor Author

Azmi-84 commented Apr 11, 2025

> > @Haleshot, I realize I'm holding the "why duckdb" notebook back due to my extremely busy schedule these days (the task list keeps piling up!). If someone else has expressed interest in creating that example, I'd be happy to step aside and take a look at their work as and when my schedule opens up. If not, I'll get to it when I can. Rest assured, I am interested in following along with Marimo's progress, and I'm sure I'll have some cleanup PRs to add in the future as I look through these intro DuckDB examples :).
>
> No worries at all! I’ll take a look at this PR and merge it as appropriate (intro notebook). Appreciate you staying tuned in.

@Haleshot Thanks for your suggestions. I'm busy with academic work and need to focus on that until 24 April. Would you mind if I apply these modifications after that? I'd be really grateful if you could wait until 24 April; I have a 12-day vacation after that.

@Haleshot
Collaborator

> @Haleshot Thanks for your suggestions. I'm busy with academic work and need to focus on that until 24 April. Would you mind if I apply these modifications after that? I'd be really grateful if you could wait until 24 April; I have a 12-day vacation after that.

Oh, absolutely no issues! Take your time and don't feel any pressure to get back to this before your commitments! Also, wishing you all the best 😃

This commit addresses and resolves the suggestions provided in the review, including:

- Ensuring the notebook follows the best practices outlined in the contribution guidelines.
- Removing irrelevant markdown blocks and using marimo features.

Additionally, the notebook has been completely redesigned with:
- Improved structure and flow for better readability and learning experience.
- Enhanced examples and interactive content for database connections, table creation, and data manipulation.
- Better integration of visuals using Plotly and Marimo for basic interactive analysis.
- Updated dependency management using  for reproducibility.

The notebook now provides a polished and user-friendly guide to DuckDB, ensuring a high-quality learning experience for users.
Comment on lines 461 to 465
@app.cell
def _():
    # Prepare the data
    user_data = [
        (
Collaborator

I believe cells like these (which carry helper value and not necessarily direct "topic" value) can be moved to the end of the notebook under an "Appendix" markdown block that collects all the relevant helper functions (especially the functions that create visualizations, etc.). You can refer to other notebooks where this has been followed.

@Haleshot
Collaborator

Haleshot commented May 5, 2025

Great changes on the whole! Really nice that you revisited the PR after your academic commitments. Hope you got to have a break :)

This commit updates the DuckDB getting started script by hiding code cells to streamline the user interface. The changes enhance readability and focus on the interactive components, making it easier for users to engage with the content without being distracted by the underlying code.
# "pyarrow==19.0.1",
# "pandas==2.2.3",
# "sqlglot==26.12.1",
# "plotly==5.23.1",
Collaborator

For some reason, this version of plotly doesn't seem to exist (I tried looking it up on GH releases and PyPI). It gives me:

  × No solution found when resolving `--with` dependencies:
  ╰─▶ Because there is no version of plotly==5.23.1 and you require plotly==5.23.1, we can conclude that your requirements are
      unsatisfiable.

when running uvx marimo edit --sandbox .\01_getting_started.py
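
A pin like this can be caught before opening the notebook by comparing the pinned versions in the script header against the versions a package index actually publishes. The following is a rough sketch only: `parse_pins` and `unavailable_pins` are hypothetical helper names, and the `available` mapping is illustrative data typed in by hand, not something queried from PyPI. In practice the resolver error shown above remains the authoritative check.

```python
import re

# Matches commented dependency lines such as:  # "plotly==5.23.1",
PIN_PATTERN = re.compile(
    r'^#\s*"(?P<name>[A-Za-z0-9_.-]+)==(?P<version>[^"]+)",?\s*$'
)


def parse_pins(header: str) -> dict[str, str]:
    """Collect exact `name==version` pins from commented dependency lines."""
    pins = {}
    for line in header.splitlines():
        match = PIN_PATTERN.match(line.strip())
        if match:
            pins[match.group("name")] = match.group("version")
    return pins


def unavailable_pins(pins: dict[str, str], available: dict[str, set[str]]) -> list[str]:
    """Return the names whose pinned version is not in the available set."""
    return sorted(
        name
        for name, version in pins.items()
        if version not in available.get(name, set())
    )


# The dependency lines quoted in the review comment above.
header = '''\
# "pyarrow==19.0.1",
# "pandas==2.2.3",
# "sqlglot==26.12.1",
# "plotly==5.23.1",
'''

# Hypothetical index contents, for illustration only (not fetched from PyPI).
available = {
    "pyarrow": {"19.0.1"},
    "pandas": {"2.2.3"},
    "sqlglot": {"26.12.1"},
    "plotly": {"5.23.0", "5.24.0"},
}

pins = parse_pins(header)
missing = unavailable_pins(pins, available)
print(missing)
```

With the illustrative data above, only the plotly pin is reported, which mirrors the resolver's complaint.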

Collaborator

@Haleshot Haleshot left a comment

Thanks a lot for this notebook ❤️; really enjoyed reviewing it. Believe it explains the topic in a nice way.

@Haleshot Haleshot merged commit 68f7784 into marimo-team:main May 19, 2025
1 check passed
@Haleshot
Collaborator

@Azmi-84 Curious whether you're on our Discord server; would love to give you a shoutout on our courses channel. Let me know if you've joined (and your username).

PS: Thanks for your patience in this PR.

@Azmi-84
Contributor Author

Azmi-84 commented May 19, 2025

Yes, I'm on the server (alazmi_80756). Appreciate the shoutout and happy to contribute. Also, no worries at all about the PR; glad to be part of it.
