
Commit e028fd3

cloud-fan authored and HyukjinKwon committed
[SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-column count
## What changes were proposed in this pull request?

AFAIK multi-column count is not widely supported by the mainstream databases (Postgres doesn't support it), and the SQL standard doesn't define it clearly, as near as I can tell.

Since Spark supports it, we should clearly document the current behavior and add tests to verify it.

## How was this patch tested?

N/A

Closes #22728 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
1 parent 5c7f6b6 commit e028fd3

3 files changed

Lines changed: 83 additions & 1 deletion


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ abstract class CountLike extends DeclarativeAggregate {
 usage = """
 _FUNC_(*) - Returns the total number of retrieved rows, including rows containing null.
 
-_FUNC_(expr) - Returns the number of rows for which the supplied expression is non-null.
+_FUNC_(expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are all non-null.
 
 _FUNC_(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.
 """)
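The revised usage text states two rules: `count(expr[, expr...])` counts a row only when every argument is non-null, and the `DISTINCT` form additionally deduplicates the argument tuples before counting. Below is a minimal plain-Scala model of those two rules. It is a sketch for illustration, not Spark code. The object and method names (`MultiColumnCountSketch`, `countAll`, `countDistinct`) are hypothetical, `None` stands in for SQL NULL, and the rows mirror the `testData` view used by the new tests.

```scala
// Hypothetical model of multi-column count semantics; this is not Spark code.
object MultiColumnCountSketch {
  // Each row is the list of argument values; None plays the role of SQL NULL.
  type Row = Seq[Option[Int]]

  // count(e1, e2, ...): a row counts only if every argument is non-null.
  def countAll(rows: Seq[Row]): Long =
    rows.count(_.forall(_.isDefined)).toLong

  // count(DISTINCT e1, e2, ...): drop rows with any null argument,
  // then count the distinct argument tuples that remain.
  def countDistinct(rows: Seq[Row]): Long =
    rows.filter(_.forall(_.isDefined)).distinct.size.toLong

  // The same seven (a, b) rows as the testData view in the new tests.
  val testData: Seq[Row] = Seq(
    Seq(Some(1), Some(1)), Seq(Some(1), Some(2)), Seq(Some(2), Some(1)),
    Seq(Some(1), Some(1)), Seq(None, Some(2)), Seq(Some(1), None),
    Seq(None, None))

  def main(args: Array[String]): Unit = {
    println(countAll(testData))      // 4, matching count(a, b)
    println(countDistinct(testData)) // 3, matching count(DISTINCT a, b)
  }
}
```

Because null filtering happens before deduplication, the duplicated `(1, 1)` row and all rows with a null field both drop out of the DISTINCT count. This leaves `(1, 1)`, `(1, 2)` and `(2, 1)`.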
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+-- Test data.
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (1, 1), (null, 2), (1, null), (null, null)
+AS testData(a, b);
+
+-- count with single expression
+SELECT
+count(*), count(1), count(null), count(a), count(b), count(a + b), count((a, b))
+FROM testData;
+
+-- distinct count with single expression
+SELECT
+count(DISTINCT 1),
+count(DISTINCT null),
+count(DISTINCT a),
+count(DISTINCT b),
+count(DISTINCT (a + b)),
+count(DISTINCT (a, b))
+FROM testData;
+
+-- count with multiple expressions
+SELECT count(a, b), count(b, a), count(testData.*) FROM testData;
+
+-- distinct count with multiple expressions
+SELECT
+count(DISTINCT a, b), count(DISTINCT b, a), count(DISTINCT *), count(DISTINCT testData.*)
+FROM testData;
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (1, 1), (null, 2), (1, null), (null, null)
+AS testData(a, b)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+SELECT
+count(*), count(1), count(null), count(a), count(b), count(a + b), count((a, b))
+FROM testData
+-- !query 1 schema
+struct<count(1):bigint,count(1):bigint,count(NULL):bigint,count(a):bigint,count(b):bigint,count((a + b)):bigint,count(named_struct(a, a, b, b)):bigint>
+-- !query 1 output
+7 7 0 5 5 4 7
+
+
+-- !query 2
+SELECT
+count(DISTINCT 1),
+count(DISTINCT null),
+count(DISTINCT a),
+count(DISTINCT b),
+count(DISTINCT (a + b)),
+count(DISTINCT (a, b))
+FROM testData
+-- !query 2 schema
+struct<count(DISTINCT 1):bigint,count(DISTINCT NULL):bigint,count(DISTINCT a):bigint,count(DISTINCT b):bigint,count(DISTINCT (a + b)):bigint,count(DISTINCT named_struct(a, a, b, b)):bigint>
+-- !query 2 output
+1 0 2 2 2 6
+
+
+-- !query 3
+SELECT count(a, b), count(b, a), count(testData.*) FROM testData
+-- !query 3 schema
+struct<count(a, b):bigint,count(b, a):bigint,count(a, b):bigint>
+-- !query 3 output
+4 4 4
+
+
+-- !query 4
+SELECT
+count(DISTINCT a, b), count(DISTINCT b, a), count(DISTINCT *), count(DISTINCT testData.*)
+FROM testData
+-- !query 4 schema
+struct<count(DISTINCT a, b):bigint,count(DISTINCT b, a):bigint,count(DISTINCT a, b):bigint,count(DISTINCT a, b):bigint>
+-- !query 4 output
+3 3 3 3
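The golden results show that parentheses change the answer. `count((a, b))` returns 7 and `count(DISTINCT (a, b))` returns 6, while `count(a, b)` returns 4 and `count(DISTINCT a, b)` returns 3. The schema lines explain the difference: `(a, b)` becomes a single `named_struct(a, a, b, b)` argument, and that struct value is not null even when its fields are. Below is a plain-Scala sketch of the contrast, not Spark code. It assumes none of the structs built from `testData` is itself null. `StructVsArgsSketch` and its members are hypothetical names.

```scala
// Hypothetical plain-Scala model contrasting count(a, b) with count((a, b)).
object StructVsArgsSketch {
  // The seven (a, b) rows of testData; None stands for SQL NULL.
  val rows: Seq[(Option[Int], Option[Int])] = Seq(
    (Some(1), Some(1)), (Some(1), Some(2)), (Some(2), Some(1)), (Some(1), Some(1)),
    (None, Some(2)), (Some(1), None), (None, None))

  // count(a, b) and count(DISTINCT a, b): both arguments must be non-null.
  val bothNonNull = rows.filter { case (a, b) => a.isDefined && b.isDefined }
  val countArgs: Long = bothNonNull.size.toLong
  val countDistinctArgs: Long = bothNonNull.distinct.size.toLong

  // count((a, b)): the single argument is a struct, modeled as always present,
  // so every row counts; DISTINCT compares whole structs, null fields included.
  val structs: Seq[Option[(Option[Int], Option[Int])]] = rows.map(Some(_))
  val countStruct: Long = structs.count(_.isDefined).toLong
  val countDistinctStruct: Long = structs.flatten.distinct.size.toLong

  def main(args: Array[String]): Unit = {
    println(s"count(a, b) = $countArgs")                      // 4
    println(s"count(DISTINCT a, b) = $countDistinctArgs")     // 3
    println(s"count((a, b)) = $countStruct")                  // 7
    println(s"count(DISTINCT (a, b)) = $countDistinctStruct") // 6
  }
}
```

Only the duplicate `(1, 1)` row collapses under `DISTINCT (a, b)`. The struct form therefore yields 6 rather than the 3 produced when rows with null arguments are excluded first.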
