Skip to content

Conversation

@rajagurunath
Copy link
Collaborator

Improved dask-sql CLI:

  1. Autocompletion (for metacommands)
  2. AutoSuggestion (from the previous history)
  3. Added progress bar (Not sure if this is really needed)
  4. Added metacommands (somewhat similar to pgcli)

improves CLI as mentioned in one of the issues here: #57

rajagurunath and others added 30 commits September 22, 2020 22:43
Author:    rajagurunath <[email protected]>
Date:      Mon May 24 02:37:40 2021 +0530
1. SHOW MODELS
2. DESCRIBE MODEL
3. EXPORT MODEL
Fetch upstream changes to resolve conflicts
Fix a failing build, as ciso8601 is currently not pip-installable (dask-contrib#192)
@codecov-commenter
Copy link

codecov-commenter commented Aug 29, 2021

Codecov Report

Merging #231 (a73eb8c) into main (5d8bb74) will increase coverage by 0.07%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #231      +/-   ##
==========================================
+ Coverage   95.74%   95.81%   +0.07%     
==========================================
  Files          64       64              
  Lines        2750     2799      +49     
  Branches      412      424      +12     
==========================================
+ Hits         2633     2682      +49     
  Misses         75       75              
  Partials       42       42              
Impacted Files Coverage Δ
dask_sql/cmd.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d8bb74...a73eb8c. Read the comment docs.

Copy link
Collaborator

@nils-braun nils-braun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! That is very nice, I like it. That will make the CLI very useful.
As we (or: you) are adding more and more functionality, I think we should also add some tests now. Do you have experience with testing CLI applications? I would probably use something like this. If you have never done this before: I am happy to add the first few tests to this PR (and you can take it from there if you want)

dask_sql/cmd.py Outdated
from dask_sql.context import Context

meta_command_completer = WordCompleter(
["\l", "\d?", "\dt", "\df", "\de", "\dm", "\conninfo", "quit"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have the \d? command, but how about a \help¸ or ?` command also (with the same semantics)?

Later (not now), we can also add comments for connecting to another dask cluster or changing the schema.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added two metacommands

  1. \dss Switch schema
  2. \dsc Switch Dask cluster

let me know if this metacommands- name, and function make sense!

dask_sql/cmd.py Outdated
self.prompt = partial(prompt, **kwargs)


def display_markdown(content, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def display_markdown(content, **kwargs):
def _display_markdown(content, **kwargs):

Let's keep it "private", users should not use this function. Can you also add a very small docstring?

Same also applies to parse_meta_command and meta_commands`.


def display_markdown(content, **kwargs):
df = pd.DataFrame(content, **kwargs)
print(df.to_markdown(tablefmt="fancy_grid"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I love this - looks really good.
Unfortunately, it needs an additional package (tabulate). I see three possibilities:

  • try it, and on failure do a normal print
  • try it, and on failure do a normal print and print a warning
  • add the package to setup.py and make it a dependency.

I would favor 2 or 3 (with a small tendency towards 3), but its your choice!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tabulate in setup.py basically picked third option

dask_sql/cmd.py Outdated
def display_markdown(content, **kwargs):
df = pd.DataFrame(content, **kwargs)
print(df.to_markdown(tablefmt="fancy_grid"))
return True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why you remove True (because you need it later as the result of meta_commands), but in principle this has nothing to do with display_markdown. So I would separate the concerns, do not have any return value here and return true in meta_commands.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes perfect sense, !!!

dask_sql/cmd.py Outdated

def parse_meta_command(sql):
command, _, arg = sql.partition(" ")
command = command.strip().replace("+", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why replacing "+"? Did I miss something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using pgcli to make the meta-commands, pgcli has support for specifying +, (if specified it will show some extra information like Description and size, etc ) so thought some people use the same notion as \dt+ etc. I have removed that now !!!

dask_sql/cmd.py Outdated
Comment on lines 95 to 104
elif cmd == "\l":
return display_markdown(context.schema.keys())
elif cmd == "\dt":
return display_markdown(context.schema[schema_name].tables.keys())
elif cmd == "\df":
return display_markdown(context.schema[schema_name].functions.keys())
elif cmd == "\de":
return display_markdown(context.schema[schema_name].experiments.keys())
elif cmd == "\dm":
return display_markdown(context.schema[schema_name].models.keys())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woa! That is really nice. I like this very clean code. Well done

dask_sql/cmd.py Outdated
elif cmd == "\dm":
return display_markdown(context.schema[schema_name].models.keys())
elif cmd == "\conninfo":
print("Dask cluster info")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the only command with a "title". Do we need it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah removed it !!!

dask_sql/cmd.py Outdated
text = session.prompt("(dask-sql) > ")
except KeyboardInterrupt:
continue
sys.exit()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually like that Ctrl+C does not kill the program, but just gives me a new line. Is there a specific reason to change this? If people want to quit, they can use your new quit or just Ctrl+D.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I don't know the command ctrl+D closes the dask-sql-cli, so I thought of adding sys.exit at KeyboardInterrupt but now I removed that !!

dask_sql/cmd.py Outdated
print(df)
except Exception:
traceback.print_exc()
meta_command_detected = meta_commands(text, context=context, client=client)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

One thing that I tried, was to enter an invalid meta-command (like \x). The tools will try to understand it as SQL command (which will give a not-so-useful error message). I think it would be nice to classify all lines starting with "" as meta-commands - and if they are now found show a useful error message and the help again. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I haven't thought about those inputs, now improved the error message!

dask_sql/cmd.py Outdated
if not meta_command_detected:
try:
with ProgressBar() as pb:
for df in pb(sqlCaller(text), total=1, label="Executing"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good idea, to add a progress bar. But I think we can do even better here - having a progress bar that actually depends on the tasks on the cluster, similar to Dask's progress bar.
Two ideas here. In both cases you need to call context.sql with return_futures=True, as we need to interact with the Dask tasks. To turn the dataframe into wait-able tasks, call .persist() (because a dataframe is a lazy action - nothing to wait on).
Either we then actually use dask's progress bar.
Or you use a combination of dask.distributed.as_competed and client.futures_of, something like (untested):

df = context.sql(text, return_futures=True)
# (include None handling here)
# Turn df into something awaitable:
df = df.persist()
# Now turn it into a list of futures
futures = client.futures_of(df)
# and finally show a progress bar:
for df in pb(as_completed(futures), total=len(futures), label="Executing"):
    continue
# In the end, show the actual result (which is now calculated, so that will be fast):
df = df.compute()
print(df.to_markdown(tablefmt="fancy_grid"))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the Suggestion 👍 really great idea, copy-pasted the above solution and it worked as expected. !!

@quasiben
Copy link
Contributor

I wanted to mention that @jacobtomlinson recently started using rich for the cli of https://github.com/dask-contrib/dask-ctl . Rich has fancy progressbars and colors among other attributes.

@rajagurunath
Copy link
Collaborator Author

I wanted to mention that @jacobtomlinson recently started using rich for the cli of https://github.com/dask-contrib/dask-ctl . Rich has fancy progressbars and colors among other attributes.

Thanks for the great suggestion @quasiben, will look into it and try to add those changes in upcoming PRs!

1.test cases for dask-sql-cli
2. Switch schema
3. Switch Dask cluster
@rajagurunath
Copy link
Collaborator Author

As we (or: you) are adding more and more functionality, I think we should also add some tests now. Do you have experience with testing CLI applications? I would probably use something like this. If you have never done this before: I am happy to add the first few tests to this PR (and you can take it from there if you want)

Thanks, @nils-braun for the quick review and suggestions.

I have added test cases but I think I kind of cheated 😅 , please review the test cases and let me know if something needs to be added or modified. (And I haven't tested the dask-sql-prompt Session as a whole but only tested _meta_commands function as of now, may need your help to test the prompt_toolkit session )

@rajagurunath
Copy link
Collaborator Author

rerun tests

@quasiben quasiben mentioned this pull request Oct 13, 2021
Copy link
Collaborator

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really cool @rajagurunath 🙂 just some minor suggestions for typos/casing stuff, and a request to add the new tabulate dependency to another file:

Copy link
Collaborator Author

@rajagurunath rajagurunath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Thanks a lot, @charlesbluca for reviewing this PR, Added a fix for the suggested changes, Please let me know if this change looks good! 🙂

Copy link
Collaborator

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rajagurunath! This looks good to me 😄

- prompt_toolkit >=3.0.8
- pygments >=2.7.3
- nest-asyncio >=1.0.0
- tabulate >=0.8.9
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that we might change this pinning in a follow up once we do our next release - I want to match the recipe on conda-forge, so once this is added to the auto-generated recipe there I'll grab whatever pinning it has there (which may be the same). This should be good for now though 🙂

@charlesbluca charlesbluca merged commit b42db23 into dask-contrib:main Oct 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants