Add DuckDB getting started guide with interactive examples. #83
…it introduces a new Python script that serves as a comprehensive guide to getting started with DuckDB. It includes interactive examples for database connections, table creation, data insertion, basic queries, and integration with Polars. The guide aims to facilitate learning and experimentation with DuckDB's features in a user-friendly manner.
Thanks a lot for your first PR contribution and for helping kickstart the duckdb series. Just wondering if you checked the existing issue thread for the duckdb series here: #48; I believe @prrao87 wanted to work on this notebook. In the future, I would recommend checking out the respective course issues to get an idea of the notebook topics/people working on them and also commenting on it accordingly if you want a certain topic assigned to you/added to the outline proposed. Just waiting for a confirmation from @prrao87; if he's fine with having this notebook or if he'd like to collaborate, etc. (leaving the options free). Hope that's fine by you as well @Azmi-84 😅 |
Oh, thanks. I actually didn't notice it. Yes, I would love to contribute to this course. Maybe I'll choose a different, available topic then.
I would wait for an update from Prashanth before making any changes in this PR (can let it be for now). In the meantime, feel free to browse through the DuckDB course issue (or others) and let me know which ones you want to add/work on! |
Hi @Haleshot, I took a look at the notebook and I think the goals of this notebook are different from the original notebook we discussed "why duckdb?". I think this notebook by @Azmi-84 is more of an "intro to duckdb's features" notebook that sits in between the first lesson "why duckdb" and the third lesson "querying dataframes", as this summarizes a list of features from duckdb's documentation and gives a gentle introduction to the user on these features. My thoughts for the "why duckdb?" introductory notebook were to add a high-level overview of embedded databases and discuss their main characteristics. It's along the lines of what Hannes describes in this talk, and explains why someone should care about duckdb in a world where you have polars, daft and a whole suite of other query engines (databases are still useful in their own right, and duckdb makes you rethink the meaning of a "database" in the classical sense). In the "why duckdb?" notebook, we'd ideally cover how/why duckdb is:
IMO these characteristics are important to highlight before going into the features, as they offer a contrast to the other tutorials in the Marimo learn series and show the power of embedded databases. That said, I like the idea of having an "intro to duckdb's features" notebook like this one; it could be an additional notebook in the series that provides a basic introduction to the features of duckdb to help people explore it further.
Hey, thanks for taking a look at this!
Happy to include this as another notebook in the series. Those outlines in the issues are just starting points anyway — always flexible. No rush at all on the "why duckdb?" notebook. Release work always takes priority, so whenever you can get to it works for me. PS: That talk you linked was super nice btw! |
Also, came across an interesting DuckDB resource which you both might find relevant. |
Yes, sure, I have no problem with that. And if it helps someone get a quick look at the features that DuckDB provides, why not?
Hi @Azmi-84, I think this notebook can be merged as an intro notebook. Looks fine to me to start with. @Haleshot, I realize I'm holding the "why duckdb" notebook back due to my extremely busy schedule these days (the task list keeps piling up!). If there's someone else who's expressed interest in creating that example, I'd be happy to step aside and take a look at their work as and when my schedule opens up. If not, I'll get to it when I can. Rest assured, I am interested in following along with Marimo's progress, and I'm sure I'll have some cleanup PRs to add in the future as I look through these intro DuckDB examples :).
No worries at all! I’ll take a look at this PR and merge it as appropriate (intro notebook). Appreciate you staying tuned in. |
Hi Azmi; just went through your notebook and left a couple of review comments; it would also be helpful if you could take a look at our "Working with SQL in marimo" page in the docs.
Made a new topic in the DuckDB issue #48 called "Intro to DuckDB" and assigned the topic to you there.
Also, would recommend renaming the notebook to match the contents (intro to duckdb, etc.), along with the first header tag in the notebook.
First markdown block may not be relevant here.
@Haleshot Thanks for your suggestions. I'm busy with my academic work and need to focus on that until 24 April. Would you mind if I apply these modifications after that? I'd be really grateful if you could wait until then; I have a 12-day vacation afterwards.
Oh, absolutely no issues! Take your time and don't feel any pressure to get back to this before your commitments! Also, wishing you all the best 😃 |
This commit addresses and resolves the suggestions provided in the review, including:
- Ensuring the notebook follows the best practices outlined in the contribution guidelines.
- Removing irrelevant markdown blocks and using marimo features.
Additionally, the notebook has been completely redesigned with:
- Improved structure and flow for better readability and learning experience.
- Enhanced examples and interactive content for database connections, table creation, and data manipulation.
- Better integration of visuals using Plotly and Marimo for basic interactive analysis.
- Updated dependency management using for reproducibility.
The notebook now provides a polished and user-friendly guide to DuckDB, ensuring a high-quality learning experience for users.
duckdb/01_getting_started.py
@app.cell
def _():
    # Prepare the data
    user_data = [
        (
I believe cells like these (which carry helping value and not necessarily direct "topic" value) can be moved to the end of the notebook under an "Appendix" markdown block that collects all the relevant helper functions (especially the ones that create visualizations, etc.). You can refer to other notebooks where this has been followed.
Great changes on the whole! Really nice that you revisited the PR after your academic commitments. Hope you got to have a break :) |
This commit updates the DuckDB getting started script by hiding code cells to streamline the user interface. The changes enhance readability and focus on the interactive components, making it easier for users to engage with the content without being distracted by the underlying code.
# "pyarrow==19.0.1",
# "pandas==2.2.3",
# "sqlglot==26.12.1",
# "plotly==5.23.1",
For some reason, this version of plotly doesn't seem to exist (tried looking it up on GH releases and PyPI). It gives me:
× No solution found when resolving `--with` dependencies:
╰─▶ Because there is no version of plotly==5.23.1 and you require plotly==5.23.1, we can conclude that your requirements are
unsatisfiable.
when running uvx marimo edit --sandbox .\01_getting_started.py
Thanks a lot for this notebook ❤️; really enjoyed reviewing it. Believe it explains the topic in a nice way.
@Azmi-84 Curious on whether you're on our discord server; would love to give you a shoutout on our courses channel. Let me know if you've joined (your username). PS: Thanks for your patience in this PR. |
yes i'm on the server(alazmi_80756). appreciate the shoutout and happy to contribute. also no worries at all about the PR, glad to be part of it. |
📝 Summary
This commit introduces a new Python script that serves as a comprehensive guide to getting started with DuckDB. It includes interactive examples for database connections, table creation, data insertion, basic queries, and integration with Polars. The guide aims to facilitate learning and experimentation with DuckDB's features in a user-friendly manner.