@rwpenney
Contributor

@rwpenney rwpenney commented Oct 3, 2020

This patch is a small extension to change-request SPARK-28133, which added inverse hyperbolic functions to the SQL interpreter, but did not include those methods within the Scala sql.functions._ API. This patch makes acosh, asinh and atanh functions available through the Scala API.

Unit-tests have been added to sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala. Manual testing has been done via spark-shell, using the following recipe:

val df = spark.range(0, 11)
              .toDF("x")
              .withColumn("x", ($"x" - 5) / 2.0)
val hyps = df.withColumn("tanh", tanh($"x"))
             .withColumn("sinh", sinh($"x"))
             .withColumn("cosh", cosh($"x"))
val invhyps = hyps.withColumn("atanh", atanh($"tanh"))
                  .withColumn("asinh", asinh($"sinh"))
                  .withColumn("acosh", acosh($"cosh"))
invhyps.show

which produces the following output:

+----+--------------------+-------------------+------------------+-------------------+-------------------+------------------+
|   x|                tanh|               sinh|              cosh|              atanh|              asinh|             acosh|
+----+--------------------+-------------------+------------------+-------------------+-------------------+------------------+
|-2.5| -0.9866142981514303|-6.0502044810397875| 6.132289479663686| -2.500000000000001|-2.4999999999999956|               2.5|
|-2.0| -0.9640275800758169| -3.626860407847019|3.7621956910836314|-2.0000000000000004|-1.9999999999999991|               2.0|
|-1.5| -0.9051482536448664|-2.1292794550948173| 2.352409615243247|-1.4999999999999998|-1.4999999999999998|               1.5|
|-1.0| -0.7615941559557649|-1.1752011936438014| 1.543080634815244|               -1.0|               -1.0|               1.0|
|-0.5|-0.46211715726000974|-0.5210953054937474|1.1276259652063807|               -0.5|-0.5000000000000002|0.4999999999999998|
| 0.0|                 0.0|                0.0|               1.0|                0.0|                0.0|               0.0|
| 0.5| 0.46211715726000974| 0.5210953054937474|1.1276259652063807|                0.5|                0.5|0.4999999999999998|
| 1.0|  0.7615941559557649| 1.1752011936438014| 1.543080634815244|                1.0|                1.0|               1.0|
| 1.5|  0.9051482536448664| 2.1292794550948173| 2.352409615243247| 1.4999999999999998|                1.5|               1.5|
| 2.0|  0.9640275800758169|  3.626860407847019|3.7621956910836314| 2.0000000000000004|                2.0|               2.0|
| 2.5|  0.9866142981514303| 6.0502044810397875| 6.132289479663686|  2.500000000000001|                2.5|               2.5|
+----+--------------------+-------------------+------------------+-------------------+-------------------+------------------+

@zero323
Member

zero323 commented Oct 3, 2020

This might deserve a separate JIRA ticket. Also, you might want to add a ticket ID and a [SQL] tag to the title, i.e. [SPARK-xxxx][SQL].

I also suspect that omitting this from the language API might be intentional, as pointed out here.

@rwpenney rwpenney changed the title Expose inverse hyperbolic trig functions through sql.functions API [SQL] Expose inverse hyperbolic trig functions through sql.functions API Oct 3, 2020
@rwpenney rwpenney changed the title [SQL] Expose inverse hyperbolic trig functions through sql.functions API {SPARK-33061][SQL] Expose inverse hyperbolic trig functions through sql.functions API Oct 3, 2020
@rwpenney
Contributor Author

rwpenney commented Oct 3, 2020

Thanks @zero323 - I've added tags to the title, as suggested.

I'm sorry, I'm not sure how your link to #28593 (which seems to be about timestamp processing) might explain why inverse hyperbolics weren't originally included in sql.functions._. Could you clarify, thanks?

@zero323
Member

zero323 commented Oct 3, 2020

Sure. I asked there why we provide wrappers around certain functions and not others, and @HyukjinKwon pointed out this:

* Spark also includes more built-in functions that are less common and are not defined here.
* You can still access them (and all the functions defined here) using the `functions.expr()` API
* and calling them through a SQL expression string. You can find the entire list of functions
* at SQL API documentation.
*
* As an example, `isnan` is a function that is defined here. You can use `isnan(col("myCol"))`
* to invoke the `isnan` function. This way the programming language's compiler ensures `isnan`
* exists and is of the proper form. You can also use `expr("isnan(myCol)")` function to invoke the
* same function. In this case, Spark itself will ensure `isnan` exists when it analyzes the query.
*
* `regr_count` is an example of a function that is built-in but not defined here, because it is
* less commonly used. To invoke it, use `expr("regr_count(yCol, xCol)")`.

I don't insist that it necessarily applies here (though arguably, the average user doesn't need hyperbolic trig functions on a daily basis), just giving some context :)
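
For illustration, the `functions.expr()` route described in that doc comment might look like the sketch below for an inverse hyperbolic function. It assumes a Spark build in which the SQL `asinh` function from SPARK-28133 is available; the application name and the column names `x` and `asinh_x` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("expr-sketch") // hypothetical name
  .getOrCreate()
import spark.implicits._

// Hypothetical single-column input.
val df = Seq(-2.0, 0.0, 2.0).toDF("x")

// The function name lives inside a SQL string, so it is resolved by
// Spark when the query is analyzed, not by the Scala compiler.
val viaExpr = df.withColumn("asinh_x", expr("asinh(x)"))
viaExpr.show()
```

A misspelled function name in the string only fails at analysis time, which is the trade-off the quoted comment describes.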

@rwpenney
Contributor Author

rwpenney commented Oct 3, 2020

Thanks for the context.

I agree that hyperbolic trig functions aren't everyone's cup of tea, but I think it would be better to have symmetry: cos & acos, sin & asin, tan & atan are available but we currently only have cosh, sinh & tanh without their corresponding inverse functions.

* @return inverse hyperbolic cosine of `e`
*
* @group math_funcs
*/
Member

We should probably have @since annotation here and for the remaining ones.

Contributor Author

I wasn't sure whether this is something I should put in during the pull-request, or whether it gets added at a later stage.

Shall I presume that this is destined for Spark-3.1, or maybe 3.0.2?

Member

It would be 3.1, as new functions are not added in maintenance releases.

@HyukjinKwon HyukjinKwon changed the title {SPARK-33061][SQL] Expose inverse hyperbolic trig functions through sql.functions API [SPARK-33061][SQL] Expose inverse hyperbolic trig functions through sql.functions API Oct 5, 2020
@HyukjinKwon
Member

HyukjinKwon commented Oct 5, 2020

There have been some discussions about which functions to add. Basically some expressions exist in SparkSQL core just for the sake of other DBMS compatibility. In other language APIs, some expressions make less sense.

It's better to simplify the decision of what to add and when, so it ended up being written as quoted at #29938 (comment). Let's avoid adding functions there just for the sake of matching.

How often are they used? I don't know enough about that. cc @WeichenXu123 or @srowen. Are they used commonly in data science?

@srowen
Member

srowen commented Oct 5, 2020

I imagine these are quite rarely used - more in engineering than anything I can think of in data science. Still for consistency I would not mind adding them, to make languages consistent.

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Oct 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34021/

@SparkQA

SparkQA commented Oct 5, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34021/

@rwpenney
Contributor Author

rwpenney commented Oct 5, 2020

As per my previous comments - I'm not suggesting that these inverse-hyperbolic functions will be widely used, but if sql.functions makes ordinary trig-functions available with their inverses, it ought to do the same for hyperbolic trig-functions. If there are concerns about sql.functions becoming rather bloated, perhaps at some point we may need a sql.math.functions or sql.functions.math?

I have, personally, found use-cases where an inverse-sinh is quite handy in data-science applications, for compressing the dynamic range of signed quantities, in the same way log is often used for unsigned quantities. Hyperbolic-tangents appear sufficiently frequently in logistic regression models that having an inverse-tanh is also likely to be helpful.
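
As a rough sketch of both points, in plain Scala rather than Spark: `scala.math` has no `asinh` or `atanh`, so the helpers below are defined here from their closed forms, and `logit` and the sample values are hypothetical names and data chosen for illustration.

```scala
// Inverse hyperbolic sine from its closed form, kept odd-symmetric
// so that large negative inputs do not lose precision.
def asinh(x: Double): Double = {
  val a = math.abs(x)
  math.signum(x) * math.log(a + math.sqrt(a * a + 1.0))
}

// Inverse hyperbolic tangent, defined for -1 < x < 1.
def atanh(x: Double): Double = 0.5 * math.log((1.0 + x) / (1.0 - x))

// Since sigmoid(z) = (1 + tanh(z / 2)) / 2, its inverse (the logit)
// can be written in terms of atanh.
def logit(p: Double): Double = 2.0 * atanh(2.0 * p - 1.0)

// Signed values spanning many orders of magnitude: asinh is close to
// the identity near zero and grows like sign(x) * log(2|x|) far from it.
val signed = Seq(-1.0e6, -10.0, -0.1, 0.0, 0.1, 10.0, 1.0e6)
signed.foreach { x => println(f"$x%12.1f -> ${asinh(x)}%8.4f") }
```

Unlike log, asinh is defined at zero and for negative inputs, which is what makes it usable on signed quantities.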

@SparkQA

SparkQA commented Oct 5, 2020

Test build #129414 has finished for PR 29938 at commit 7aed453.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Member

zero323 commented Oct 5, 2020

Still for consistency I would not mind adding them, to make languages consistent.

I agree with that, but I wonder if consistency requires (String) => ... variants.

@HyukjinKwon
Member

Okay, I am fine with this.

@srowen srowen closed this in d8c4a47 Oct 14, 2020
@srowen
Member

srowen commented Oct 14, 2020

Merged to master
