-
Notifications
You must be signed in to change notification settings - Fork 72
Feature/improve cli #231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/improve cli #231
Conversation
Increase Speed (dask-contrib#29)
Author: rajagurunath <[email protected]> Date: Mon May 24 02:37:40 2021 +0530
1. SHOW MODELS 2. DESCRIBE MODEL 3. EXPORT MODEL
Feature/export model
Fetch upstream changes to resolve conflicts
Fix a failing build, as ciso8601 is currently not pip-installable (dask-contrib#192)
1. Autocompletion 2. AutoSuggestion 3. Added progressBar 4. Added metacommands
Codecov Report
@@ Coverage Diff @@
## main #231 +/- ##
==========================================
+ Coverage 95.74% 95.81% +0.07%
==========================================
Files 64 64
Lines 2750 2799 +49
Branches 412 424 +12
==========================================
+ Hits 2633 2682 +49
Misses 75 75
Partials 42 42
Continue to review full report at Codecov.
|
nils-braun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! That is very nice, I like it. That will make the CLI very useful.
As we (or: you) are adding more and more functionality, I think we should also add some tests now. Do you have experience with testing CLI applications? I would probably use something like this. If you have never done this before: I am happy to add the first few tests to this PR (and you can take it from there if you want)
dask_sql/cmd.py
Outdated
| from dask_sql.context import Context | ||
|
|
||
| meta_command_completer = WordCompleter( | ||
| ["\l", "\d?", "\dt", "\df", "\de", "\dm", "\conninfo", "quit"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have the \d? command, but how about a \help¸ or ?` command also (with the same semantics)?
Later (not now), we can also add comments for connecting to another dask cluster or changing the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added two metacommands
- \dss Switch schema
- \dsc Switch Dask cluster
let me know if this metacommands- name, and function make sense!
dask_sql/cmd.py
Outdated
| self.prompt = partial(prompt, **kwargs) | ||
|
|
||
|
|
||
| def display_markdown(content, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def display_markdown(content, **kwargs): | |
| def _display_markdown(content, **kwargs): |
Let's keep it "private", users should not use this function. Can you also add a very small docstring?
Same also applies to parse_meta_command and meta_commands`.
|
|
||
| def display_markdown(content, **kwargs): | ||
| df = pd.DataFrame(content, **kwargs) | ||
| print(df.to_markdown(tablefmt="fancy_grid")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I love this - looks really good.
Unfortunately, it needs an additional package (tabulate). I see three possibilities:
- try it, and on failure do a normal print
- try it, and on failure do a normal print and print a warning
- add the package to
setup.pyand make it a dependency.
I would favor 2 or 3 (with a small tendency towards 3), but its your choice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tabulate in setup.py basically picked third option
dask_sql/cmd.py
Outdated
| def display_markdown(content, **kwargs): | ||
| df = pd.DataFrame(content, **kwargs) | ||
| print(df.to_markdown(tablefmt="fancy_grid")) | ||
| return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why you remove True (because you need it later as the result of meta_commands), but in principle this has nothing to do with display_markdown. So I would separate the concerns, do not have any return value here and return true in meta_commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes perfect sense, !!!
dask_sql/cmd.py
Outdated
|
|
||
| def parse_meta_command(sql): | ||
| command, _, arg = sql.partition(" ") | ||
| command = command.strip().replace("+", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why replacing "+"? Did I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using pgcli to make the meta-commands, pgcli has support for specifying +, (if specified it will show some extra information like Description and size, etc ) so thought some people use the same notion as \dt+ etc. I have removed that now !!!
dask_sql/cmd.py
Outdated
| elif cmd == "\l": | ||
| return display_markdown(context.schema.keys()) | ||
| elif cmd == "\dt": | ||
| return display_markdown(context.schema[schema_name].tables.keys()) | ||
| elif cmd == "\df": | ||
| return display_markdown(context.schema[schema_name].functions.keys()) | ||
| elif cmd == "\de": | ||
| return display_markdown(context.schema[schema_name].experiments.keys()) | ||
| elif cmd == "\dm": | ||
| return display_markdown(context.schema[schema_name].models.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woa! That is really nice. I like this very clean code. Well done
dask_sql/cmd.py
Outdated
| elif cmd == "\dm": | ||
| return display_markdown(context.schema[schema_name].models.keys()) | ||
| elif cmd == "\conninfo": | ||
| print("Dask cluster info") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the only command with a "title". Do we need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah removed it !!!
dask_sql/cmd.py
Outdated
| text = session.prompt("(dask-sql) > ") | ||
| except KeyboardInterrupt: | ||
| continue | ||
| sys.exit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually like that Ctrl+C does not kill the program, but just gives me a new line. Is there a specific reason to change this? If people want to quit, they can use your new quit or just Ctrl+D.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I don't know the command ctrl+D closes the dask-sql-cli, so I thought of adding sys.exit at KeyboardInterrupt but now I removed that !!
dask_sql/cmd.py
Outdated
| print(df) | ||
| except Exception: | ||
| traceback.print_exc() | ||
| meta_command_detected = meta_commands(text, context=context, client=client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
One thing that I tried, was to enter an invalid meta-command (like \x). The tools will try to understand it as SQL command (which will give a not-so-useful error message). I think it would be nice to classify all lines starting with "" as meta-commands - and if they are now found show a useful error message and the help again. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I haven't thought about those inputs, now improved the error message!
dask_sql/cmd.py
Outdated
| if not meta_command_detected: | ||
| try: | ||
| with ProgressBar() as pb: | ||
| for df in pb(sqlCaller(text), total=1, label="Executing"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good idea, to add a progress bar. But I think we can do even better here - having a progress bar that actually depends on the tasks on the cluster, similar to Dask's progress bar.
Two ideas here. In both cases you need to call context.sql with return_futures=True, as we need to interact with the Dask tasks. To turn the dataframe into wait-able tasks, call .persist() (because a dataframe is a lazy action - nothing to wait on).
Either we then actually use dask's progress bar.
Or you use a combination of dask.distributed.as_competed and client.futures_of, something like (untested):
df = context.sql(text, return_futures=True)
# (include None handling here)
# Turn df into something awaitable:
df = df.persist()
# Now turn it into a list of futures
futures = client.futures_of(df)
# and finally show a progress bar:
for df in pb(as_completed(futures), total=len(futures), label="Executing"):
continue
# In the end, show the actual result (which is now calculated, so that will be fast):
df = df.compute()
print(df.to_markdown(tablefmt="fancy_grid"))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the Suggestion 👍 really great idea, copy-pasted the above solution and it worked as expected. !!
|
I wanted to mention that @jacobtomlinson recently started using rich for the cli of https://github.com/dask-contrib/dask-ctl . Rich has fancy progressbars and colors among other attributes. |
Thanks for the great suggestion @quasiben, will look into it and try to add those changes in upcoming PRs! |
Thanks, @nils-braun for the quick review and suggestions. I have added test cases but I think I kind of cheated 😅 , please review the test cases and let me know if something needs to be added or modified. (And I haven't tested the dask-sql-prompt Session as a whole but only tested |
|
rerun tests |
charlesbluca
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really cool @rajagurunath 🙂 just some minor suggestions for typos/casing stuff, and a request to add the new tabulate dependency to another file:
rajagurunath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Thanks a lot, @charlesbluca for reviewing this PR, Added a fix for the suggested changes, Please let me know if this change looks good! 🙂
charlesbluca
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rajagurunath! This looks good to me 😄
| - prompt_toolkit >=3.0.8 | ||
| - pygments >=2.7.3 | ||
| - nest-asyncio >=1.0.0 | ||
| - tabulate >=0.8.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that we might change this pinning in a follow up once we do our next release - I want to match the recipe on conda-forge, so once this is added to the auto-generated recipe there I'll grab whatever pinning it has there (which may be the same). This should be good for now though 🙂
Improved dask-sql CLI:
improves CLI as mentioned in one of the issues here: #57