Documentation and tests for the `first` and `firstOrNull` functions #1547


Open

Allex-Nik wants to merge 2 commits into master from first-firstOrNull-documentation-tests

+519 −0

Collaborator

Allex-Nik commented Nov 6, 2025 (edited)

Fixes #1279

Allex-Nik added 2 commits

November 6, 2025 14:21

Add documentation for the first and firstOrNull functions
87ca3b7

Add tests for the first and firstOrNull functions
f58f12c

Allex-Nik requested review from AndreiKingsley, Jolanrensen and zaleslaw

November 6, 2025 13:38

Allex-Nik commented

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

                      df.drop(df.nrow).firstOrNull { isHappy } shouldBe null
                  }

                  @Test

Collaborator Author

Allex-Nik Nov 6, 2025

For now, I haven't added a test on an empty DataFrame, because `emptyDf.groupBy { age }.first()` does not return the group column. This is a known issue; Jolan has fixed it, but the fix is not merged yet.
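For reference, a minimal sketch of the plain `DataFrame` behavior the `firstOrNull` tests pin down. It assumes the string-based column API (`row["name"]`) instead of generated properties and reuses the small three-row frame quoted later in this thread; `sampleDf` is a hypothetical helper, not part of the test file.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: the same small frame that is quoted later in this thread.
fun sampleDf(): AnyFrame = dataFrameOf(
    "name" to columnOf("Alice", "Bob", "Charlie"),
    "age" to columnOf(15, 20, 25),
)

fun main() {
    val df = sampleDf()

    // drop(nrow) keeps the columns but removes every row.
    val empty = df.drop(df.nrow)

    // firstOrNull() signals "no rows" with null...
    println(empty.firstOrNull())

    // ...while first() fails, so it is wrapped in runCatching here.
    println(runCatching { empty.first() }.isFailure)

    // On a non-empty frame, first() returns the first row; values are read by column name.
    println(df.first()["name"])
}
```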

Allex-Nik commented

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

                      val firstHappy = reducedGrouped.values()[0]
                      val firstUnhappy = reducedGrouped.values()[1]
                      firstHappy shouldBe dataFrameOf(

Collaborator Author

Allex-Nik Nov 6, 2025 (edited)

`grouped.first().values()[0].name.lastName shouldBe "Cooper"` also worked (as did a similar approach in some later tests), but I decided not to use generated properties: they are only generated for `df` itself, not for the results of transformations, and accessing them after further transformations causes issues. I think it is more correct not to rely on the compiler plugin in the rest of the file.

Allex-Nik commented

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

                  }

                  @Test
                  fun `first on GroupBy with predicate`() {

Collaborator Author

Allex-Nik Nov 6, 2025 (edited)

This might be an issue:

`df.groupBy { isHappy }.first { age > 10 }`

works fine: the resulting `ReducedGroupBy` contains the columns `isHappy`, `name` (`firstName`, `lastName`), `age`, `city`, and `weight`. However,

`df.groupBy { isHappy }.first { age > 100 }`

returns a `ReducedGroupBy` without the `name` column.

For example, this test passes, although it seems to me that it should not:

`grouped.first { age > 100 }.values()[0] shouldBe dataFrameOf("isHappy", "age", "city", "weight")(true, null, null, null)[0]`

That's why, for now, I haven't added a test for a predicate that doesn't match any row.
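For comparison, a sketch of the case that does behave as expected: a predicate that matches at least one row in every group. It uses a hypothetical four-row frame and the string API (`groupBy("isHappy")`, `it["age"]`) instead of the test file's generated properties, and it deliberately leaves out the no-match case described above.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical frame: two "happy" and two "unhappy" rows.
fun people(): AnyFrame = dataFrameOf(
    "name" to columnOf("Alice", "Bob", "Charlie", "Dave"),
    "age" to columnOf(15, 20, 25, 30),
    "isHappy" to columnOf(true, false, true, false),
)

// For each isHappy group, keep the first row whose age is above 18,
// then flatten the ReducedGroupBy into a plain DataFrame with values().
fun firstAdultPerGroup(): AnyFrame =
    people()
        .groupBy("isHappy")
        .first { (it["age"] as Int) > 18 }
        .values()

fun main() {
    // Every group has a match here, so the name column is kept.
    println(firstAdultPerGroup())
}
```

Groups keep the order in which their keys first appear, so the `true` group (first adult: Charlie) comes before the `false` group (first adult: Bob).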

Allex-Nik commented

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

Collaborator Author

Allex-Nik Nov 6, 2025

I ran into an issue here:

`ReducedPivot` (i.e., `pivot.first()`) and `ReducedPivotGroupBy` are not rendered correctly in notebooks when any row produced by `first()` contains null.

If I replace the null in such a row with some value, it is displayed correctly.

Allex-Nik commented

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

Collaborator Author

Allex-Nik Nov 6, 2025

I used a simpler DataFrame here for readability; with the full one, it was harder for a reader to verify the result.

Allex-Nik commented

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

                      ).shouldAllBeEqual()
                  }

                  @Test

Collaborator Author

Allex-Nik Nov 6, 2025

Is it fine to put these tests in the same class as the `FirstColumnsSelectionDsl` tests, or is it better to put them in a separate class?
If a separate class is better, is it worth inheriting from `TestBase` to reuse `df`?

Collaborator

koperagen Nov 6, 2025

It's OK to put them here. It's common to have functions with the same name in the same file, even if one belongs to the Columns Selection DSL and the other to the DataFrame API.

koperagen reviewed

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

              /**
               * Returns the first value in this [DataColumn].
               *
               * @param T The type of the values in the [DataColumn].

Collaborator

koperagen Nov 6, 2025 (edited)

This line seems redundant to me because the type parameter `T` is always inferred from the `DataColumn`.

Collaborator

AndreiKingsley Nov 7, 2025

Yes, I'd omit the `@param` for the type parameter everywhere and add `@return`!

koperagen reviewed

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

              /**
               * Selects the first row from each group of the given [GroupBy]
               * and returns a [ReducedGroupBy] containing these rows
               * (one row per group, each row is the first row in its group).

Collaborator

koperagen Nov 6, 2025

or null if the group is empty

Collaborator Author

Allex-Nik Nov 7, 2025

I have run into an issue in this case that might be unexpected behavior.

I started with `df.take(0).groupBy { age }.first()`. It doesn't return null (it returns a `ReducedGroupBy`), which is probably fine because there are no groups at all.

Then I tried to make the first group empty:

val grouped = df.groupBy { age }
grouped.updateGroups { if (it == grouped.groups[0]) it.take(0) else it }.first()

Applying `first` in this case throws an exception:
The problem is found in one of the loaded libraries: check library renderers java.lang.IllegalStateException: Can not insert column age because column with this path already exists in DataFrame.

This problem does not occur if every group has at least one row, or if I remove the `age` column from every group. For example, this works:

grouped.updateGroups {
    val new = it.remove { age }
    if (it == grouped.groups[0]) new.take(0) else new
}.first()

Here we get null for the empty group and the first row for the others.

But is this error about conflicting columns expected behavior? Or maybe I am just creating the empty group incorrectly.

The df in this case is:
val df = dataFrameOf("name" to columnOf("Alice", "Bob", "Charlie"), "age" to columnOf(15, 20, 25))

As a more natural example, we can `fullJoin` `df` with `val ages = dataFrameOf("age" to columnOf(30))`, group by `age`, and filter out the row with the null value in the group for key 30. Applying `first` in this case causes the same exception.

I'm reporting this just in case it is not a known issue :)

koperagen reviewed

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

               * employees.groupBy { jobTitle }.first()
               * ```
               *
               * @param T The type of the values in the [GroupBy].

Collaborator

koperagen Nov 6, 2025

I think these lines are redundant because they are always inferred.

koperagen reviewed

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

               * ```kotlin
               * // Select the first row for each city.
               * // Returns a ReducedPivot with one column per city and the first row from the group in each column.
               * df.pivot { city }.first()

Collaborator

koperagen Nov 6, 2025

Please see if you can come up with a representative example. In what situation would you use this function? What kind of df would it typically be applied to, and what conclusions can one draw from the result? It would be good if the example conveyed this.

koperagen reviewed

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

               * the structure remains unchanged — only the contents of each group
               * are replaced with the first row from that group.
               *
               * Equivalent to `reduce { firstOrNull() }`.

Collaborator

koperagen Nov 6, 2025 (edited)

`reduce` is an internal function, so users won't be able to call it like this.
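Since `reduce` is internal, the same semantics can be stated with public calls only. A sketch, assuming a hypothetical `employees` frame (mirroring the `employees.groupBy { jobTitle }.first()` KDoc example) and string column access: `GroupBy.first()` should pick the same rows as taking `firstOrNull()` of every frame in `groups`.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical frame with two job titles, two employees each.
fun employees(): AnyFrame = dataFrameOf(
    "name" to columnOf("Ann", "Ben", "Cid", "Dan"),
    "jobTitle" to columnOf("dev", "qa", "dev", "qa"),
)

// Names selected by the public GroupBy.first() overload.
fun namesViaFirst(): List<Any?> =
    employees().groupBy("jobTitle").first().values()["name"].values().toList()

// The same selection spelled out by hand: the first row of each group,
// or null if a group has no rows.
fun namesViaGroups(): List<Any?> =
    employees().groupBy("jobTitle").groups.values().map { it.firstOrNull()?.get("name") }

fun main() {
    println(namesViaFirst())
    println(namesViaGroups())
}
```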

koperagen reviewed

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

               * Reduces this [Pivot] by selecting the first row from each group.
               *
               * Returns a [ReducedPivot] where:
               * - each column corresponds to a [pivot] group — if multiple pivot keys were used,

Collaborator

koperagen Nov 6, 2025

I think text explanations of pivot make it look scarier than it is. For `first`, I suggest not including the common pivot logic and referring to the `pivot` KDoc instead.
A reference to the website, with HTML or ASCII tables, might do a better job of conveying what's going on.

AndreiKingsley approved these changes

Collaborator

AndreiKingsley left a comment

Great job!
Regarding the overloads for `GroupBy` and `Pivot`: I'm working on a general KDoc system for these operations and will rework them in the future anyway, so you can leave them as they are.

AndreiKingsley self-requested a review

November 7, 2025 14:19

AndreiKingsley requested changes

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/first.kt

               * Returns the first value in this [DataColumn].
               *
               * @param T The type of the values in the [DataColumn].
               *

Collaborator

AndreiKingsley Nov 7, 2025

Please add a "See also" section with related operations here and in all other places. For example:

See also [firstOrNull], [last], [take].
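A sketch of how a KDoc block could look with the review suggestions applied (no `@param T`, an `@return` line, and a "See also" line). To compile without the library, it documents hypothetical `List` extensions `firstValue` and `firstValueOrNull`; only the doc layout is meant to carry over.

```kotlin
// Hypothetical helpers used only to demonstrate the KDoc layout.
// The type parameter T gets no @param entry because it is always inferred from the receiver.

/**
 * Returns the first element of this list.
 *
 * See also [firstValueOrNull], [last], [take].
 *
 * @return the first element of this list.
 */
fun <T> List<T>.firstValue(): T = first()

/**
 * Returns the first element of this list, or `null` if the list is empty.
 *
 * See also [firstValue], [last], [take].
 *
 * @return the first element, or `null` if the list has no elements.
 */
fun <T> List<T>.firstValueOrNull(): T? = firstOrNull()

fun main() {
    println(listOf(1, 2, 3).firstValue())
    println(emptyList<Int>().firstValueOrNull())
}
```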

Labels

None yet