@Explorer09
Contributor

@Explorer09 Explorer09 commented Oct 31, 2025

Introduce the new Vector_sort() function and obsolete the old Vector_quickSortCustomCompare() and Vector_insertionSort() APIs.

This new sort function is a natural, in-place merge sort. I.e. it takes advantage of partially sorted data, and it's stable.

Space complexity: O(log(n)) worst case
Time complexity: O(n) best case, O(n*log(n)*log(n)) average & worst case
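
For readers unfamiliar with the term, here is a minimal sketch of a natural, in-place merge sort on a plain int array. It is not the code in this PR: the names (naturalMergeSort, mergeInPlace, and so on) are made up for illustration, and it merges runs pass by pass rather than following whatever run-tracking scheme Vector_sort() uses. It only shows the general shape: detect already-sorted runs, then merge neighbouring runs with binary search plus rotation instead of a temporary buffer.

```c
#include <stddef.h>

/* Reverse the elements of a[lo..hi). */
static void reverseRange(int* a, size_t lo, size_t hi) {
   while (lo + 1 < hi) {
      hi--;
      int tmp = a[lo];
      a[lo] = a[hi];
      a[hi] = tmp;
      lo++;
   }
}

/* Swap the adjacent blocks a[lo..mid) and a[mid..hi) by triple reversal. */
static void rotateRange(int* a, size_t lo, size_t mid, size_t hi) {
   reverseRange(a, lo, mid);
   reverseRange(a, mid, hi);
   reverseRange(a, lo, hi);
}

/* First index in the sorted range a[lo..hi) whose value is >= key. */
static size_t lowerBound(const int* a, size_t lo, size_t hi, int key) {
   while (lo < hi) {
      size_t m = lo + (hi - lo) / 2;
      if (a[m] < key) lo = m + 1; else hi = m;
   }
   return lo;
}

/* First index in the sorted range a[lo..hi) whose value is > key. */
static size_t upperBound(const int* a, size_t lo, size_t hi, int key) {
   while (lo < hi) {
      size_t m = lo + (hi - lo) / 2;
      if (key < a[m]) hi = m; else lo = m + 1;
   }
   return lo;
}

/* Stable merge of the sorted runs a[lo..mid) and a[mid..hi), no buffer. */
static void mergeInPlace(int* a, size_t lo, size_t mid, size_t hi) {
   if (lo == mid || mid == hi)
      return;
   if (hi - lo == 2) {
      if (a[mid] < a[lo])
         rotateRange(a, lo, mid, hi);
      return;
   }
   size_t cutL, cutR;
   if (mid - lo >= hi - mid) {
      /* Split the longer left run in half, find the matching cut on the right. */
      cutL = lo + (mid - lo) / 2;
      cutR = lowerBound(a, mid, hi, a[cutL]);
   } else {
      cutR = mid + (hi - mid) / 2;
      cutL = upperBound(a, lo, mid, a[cutR]);
   }
   rotateRange(a, cutL, mid, cutR);
   size_t newMid = cutL + (cutR - mid);
   mergeInPlace(a, lo, cutL, newMid);
   mergeInPlace(a, newMid, cutR, hi);
}

/* End (exclusive) of the non-decreasing run that starts at a[lo]; lo < n. */
static size_t runEnd(const int* a, size_t lo, size_t n) {
   size_t i = lo + 1;
   while (i < n && !(a[i] < a[i - 1]))
      i++;
   return i;
}

/* Merge neighbouring runs pass by pass until a single run remains. */
void naturalMergeSort(int* a, size_t n) {
   for (;;) {
      size_t lo = 0;
      size_t merges = 0;
      while (lo < n) {
         size_t mid = runEnd(a, lo, n);
         if (mid == n)
            break;
         size_t hi = runEnd(a, mid, n);
         mergeInPlace(a, lo, mid, hi);
         merges++;
         lo = hi;
      }
      if (merges == 0)
         return;
   }
}
```

On already sorted input the first scan finds a single run and returns after n-1 comparisons, which is where an O(n) best case comes from. Equal elements are never moved past each other because the run detection, the two-element case and both bound searches are built on strict comparisons, so the merge stays stable.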

@BenBE, I have copied some of the changes from your d3cd557 commit, but I didn't base my branch on it, because there is some data type naming that I didn't like, so I named my "context" structure in my own way. Also, to keep the changes small, I didn't upgrade the Object_Compare function type to take a new "context" parameter.

@Explorer09 Explorer09 force-pushed the vector-sort branch 2 times, most recently from 0ec958a to 74c2a2d Compare November 1, 2025 09:29
@Explorer09 Explorer09 force-pushed the vector-sort branch 2 times, most recently from bbd8ecd to aba90b7 Compare November 1, 2025 09:44
@Explorer09 Explorer09 marked this pull request as draft November 1, 2025 12:10
swap(array, i, storeIndex);
storeIndex++;
ATTR_NONNULL
static void rotate(void* buffer, size_t leftSize, size_t rightSize) {
Contributor Author

A note to leave to myself:
There are two possible interfaces for this rotate function. To minimize the chances that the callers pass invalid values, I picked the sizes as the second and third parameters. When it comes to optimization, this would be more dependent on the compiler's capability in simplifying additions and subtraction than the version that passes mid and end pointers. I didn't check the code sizes produced yet, so this remains a TODO for me.

void rotate_a(void* buffer, size_t leftSize, size_t rightSize);
void rotate_b(void* start, void* mid, const void* end);
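
To make the two candidate interfaces concrete, here is a small sketch in which both are implemented with the classic triple-reversal rotation over raw bytes. This is an assumption for illustration only: the PR's rotate() may use a different rotation algorithm, and reverseBytes is a made-up helper. Sizes are byte counts, matching how the call site passes pointer differences.

```c
#include <stddef.h>

/* Reverse the bytes in [first, last). */
static void reverseBytes(char* first, char* last) {
   while (last - first > 1) {
      last--;
      char tmp = *first;
      *first = *last;
      *last = tmp;
      first++;
   }
}

/* Interface A: start pointer plus the byte sizes of the two blocks. */
void rotate_a(void* buffer, size_t leftSize, size_t rightSize) {
   char* start = buffer;
   char* mid = start + leftSize;
   char* end = mid + rightSize;
   /* Reversing each block and then the whole range swaps the two blocks. */
   reverseBytes(start, mid);
   reverseBytes(mid, end);
   reverseBytes(start, end);
}

/* Interface B: three pointers; the sizes are derived by subtraction. */
void rotate_b(void* start, void* mid, const void* end) {
   size_t leftSize = (size_t)((char*)mid - (char*)start);
   size_t rightSize = (size_t)((const char*)end - (const char*)mid);
   rotate_a(start, leftSize, rightSize);
}
```

With interface A a caller cannot pass a mid pointer that lies outside the range, while interface B spares callers that already hold pointers from computing the sizes themselves. Which one compiles to less code would still need to be measured.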

Introduce the new Vector_sort() function and obsolete the old
Vector_quickSortCustomCompare() and Vector_insertionSort() APIs.

This new sort function is a natural, in-place merge sort. I.e. it takes
advantage of partially sorted data, and it's stable.

Space complexity: O(log(n)) worst case (on stack, no malloc())
Time complexity: O(n) best case, O(n*(log(n))^2) average & worst case

Signed-off-by: Kang-Che Sung <[email protected]>
Update the prototypes of all Object_Compare functions and Vector_sort()
function to accept the third argument.

The third argument allows passing in extra information or states that
could be useful in sorting. The definition of Object_Compare now matches
the prototype of the compare function in libc qsort_r().

(Previously many programmers would use global variables to read or store
extra information needed in a compare operation. This made the sort
function not reentrant. It was the primary motivation for the qsort_r()
function.)

Currently all Object_Compare functions in htop simply ignore the third
argument. No functions are using it yet, but this may change in the
future.

The callers of Vector_sort() will now by default pass in the reference
of the container object (usually named "this") as the third argument to
the sort compare functions.

Signed-off-by: Kang-Che Sung <[email protected]>
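
As a rough illustration of what a third "context" argument enables, the sketch below passes the sort direction through the context pointer instead of a global variable. All names here (CompareWithContext, SortSettings, insertionSortInts) are hypothetical and are not htop's actual types; the only thing taken from the commit message is the argument order, which matches the glibc-style qsort_r() comparator.

```c
#include <stddef.h>

/* Comparator shape with a trailing context pointer. */
typedef int (*CompareWithContext)(const void* a, const void* b, void* context);

/* Hypothetical per-call state that a comparator may want to read. */
typedef struct SortSettings_ {
   int descending;
} SortSettings;

int compareInts(const void* a, const void* b, void* context) {
   const SortSettings* settings = context;
   int x = *(const int*)a;
   int y = *(const int*)b;
   int result = (x > y) - (x < y);
   return settings->descending ? -result : result;
}

/* A tiny stable sort that forwards the context to every comparison. */
void insertionSortInts(int* a, size_t n, CompareWithContext compare, void* context) {
   for (size_t i = 1; i < n; i++) {
      int value = a[i];
      size_t j = i;
      while (j > 0 && compare(&a[j - 1], &value, context) > 0) {
         a[j] = a[j - 1];
         j--;
      }
      a[j] = value;
   }
}
```

Because the settings travel with the call, two sorts with different settings can run without touching shared state, which is the reentrancy point made above.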
@Explorer09 Explorer09 marked this pull request as ready for review November 1, 2025 18:03
rotate(p1, (size_t)(mid - p1), (size_t)(p2 - mid));

leftLen = (size_t)(p1 - (char*)array) / size;
rightLen = (size_t)(p2 - mid) / size;
Contributor Author

Another TODO for myself (micro-optimization opportunity):
Test if the compiled assembly code can be smaller if I let mergeRuns accept pointers instead of lengths as arguments. There are multiplications and divisions by size when I convert from lengths to pointers and back to lengths, and the compiler surely cannot cancel * size and / size operations together due to the type being unsigned.
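
A small sketch of why the round trip cannot be simplified in general: unsigned multiplication wraps around by definition, so (len * size) / size is not guaranteed to equal len, and the compiler has to keep both operations unless it can prove the product does not wrap. The function name roundTrip is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an element count to a byte count and back again. */
size_t roundTrip(size_t len, size_t size) {
   return (len * size) / size;
}

/* A count whose doubled value wraps to zero in size_t arithmetic. */
size_t wrappingCount(void) {
   return SIZE_MAX / 2 + 1;
}
```

For ordinary inputs such as roundTrip(10, 8) the result is 10, but roundTrip(wrappingCount(), 2) yields 0 because the multiplication wraps. Real array lengths never get that large, yet the compiler cannot assume so from the types alone.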

@natoscott
Member

Has any performance comparison been done that is sufficiently compelling that we'd want to keep all this extra code, instead of just using qsort? Thanks.

@Explorer09
Contributor Author

@natoscott qsort is out for two reasons: (1) It's not a stable sort (important when the user sorts the table by multiple keys). (2) It does not take a third argument that would make the sort function reentrant. With qsort we would need global variables to pass in extra state that affects sorting (thus unsafe in potential multithreaded use).

Note that my current implementation is not fully async-signal-safe, though. The swap operation is not atomic, and it can corrupt the data when two sort calls contend for the same data. The expectation is that sort calls made in asynchronous contexts either operate on different data or have the caller guard the data with a mutex.

@natoscott
Member

Sort stability is about what happens when two items compare equally (which can happen with/without multiple sort keys, so I don't follow your argument there). In htop, we always have the PID we can use as a tie-breaker to guarantee stability. Can you explain how this is insufficient?

Regarding threads and the extra parameter, htop is single threaded and is very likely to remain so. So again this is not a valid technical argument against the use of qsort here.

qsort is highly optimised and widely reviewed. Any custom home-grown sorting code we have in htop simply isn't, to anywhere near the same degree, and appears to be unnecessary code bloat. So unless there is a valid technical reason otherwise (and none have been put forward so far, AFAICT), I think we should switch to qsort and delete all this code.

@Explorer09
Contributor Author

@natoscott I assume you have read #1784 and #1785 for the discussion I had with @BenBE before.

Basically I would vote for using qsort() (#1784) if there isn't a stability requirement. I suggested that the old insertionSort API is there because there would be some data that need to be stable sorted. It's also for future usefulness (or utility?) for the API.

And no, I don't think tiebreaking by PID is enough. htop's process table can be sorted with multiple keys, and there will be a need for the user to sort the table multiple times without explicitly specifying the second sort key. There can also be tables where there isn't an ID field for tiebreaking.

And there's a technical thing: The Vector object in htop stores an array of pointers rather than an array of the object data themselves. This means we cannot pass the Object_Compare function pointers directly to qsort(), but need to make a wrapper for the compare function. That's why the third argument of the compare function is needed (and qsort() will not be the best API for the task). There's qsort_r(), but checking for its availability in configure is also a pain.
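
To show the wrapper problem in isolation, here is a sketch that sorts an array of pointers with plain qsort(). Since qsort() hands the comparator pointers to the array slots (that is, pointers to pointers), a wrapper has to dereference them first, and with no context argument the real comparator has to be smuggled in through a global. The names ItemCompare, activeCompare and sortPointerArray are hypothetical, not htop's.

```c
#include <stdlib.h>

/* Comparator that receives the pointed-to items directly. */
typedef int (*ItemCompare)(const void* a, const void* b);

/* Global state: the only way to tell the wrapper which comparator to call. */
static ItemCompare activeCompare;

/* qsort() passes pointers to the array slots, so dereference once. */
static int comparePointees(const void* a, const void* b) {
   const void* const* slotA = a;
   const void* const* slotB = b;
   return activeCompare(*slotA, *slotB);
}

void sortPointerArray(void** items, size_t count, ItemCompare compare) {
   activeCompare = compare;
   qsort(items, count, sizeof(void*), comparePointees);
}

/* Example item comparator: the items are ints. */
int compareIntItems(const void* a, const void* b) {
   int x = *(const int*)a;
   int y = *(const int*)b;
   return (x > y) - (x < y);
}
```

With a context-taking sort function the global disappears: the wrapper can read the real comparator from the context pointer instead.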

Therefore the most future-proof way is to implement a stable sorting function by ourselves. Note that I've developed this algorithm for about half a month, and it would be sad for me if it's ditched finally.

@natoscott
Member

I have read those, yes - there's little/no justification there for custom sort implementations. It seemed like there was consensus that qsort would be fine based on Hisham's reasons for the original implementation, but then it somehow became an exercise in writing new code with insufficient justification IMO.

| There can also be tables where there isn't an ID field for tiebreaking.

What tables are you referring to here? I don't see them in the code today (something else I've missed?) - if they exist today, do these things require sort stability? Thanks.

The main (only?) data we need to sort is the process section (lower part) of the htop display, of course, where we always have a unique identifier (PID) for every entry. The PID provides stability with any number of secondary sort keys, it's not clear what problem you are talking about there. And the user never needs to be involved or know it is being used, again that's a tangent that just doesn't make any sense to me. Just do all the current sort comparisons, including with multiple keys, and only use the PID as a final, stable, unique differentiator when everything else compares as equal. Sure, some code refactoring may be required to ensure the PID is available within comparison routines, but it's a solvable problem.
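
A minimal sketch of the tie-breaker approach described here, assuming a simplified row type: ProcRow and compareByCpuThenPid are made-up names, not htop's Process structure or its comparison code. The comparator orders by CPU usage (descending) and falls back to the PID only when the primary key compares equal, so the result is fully determined even with an unstable qsort().

```c
#include <stdlib.h>

/* Simplified stand-in for a process table row. */
typedef struct ProcRow_ {
   int pid;
   int cpuPercent;
} ProcRow;

int compareByCpuThenPid(const void* a, const void* b) {
   const ProcRow* rowA = a;
   const ProcRow* rowB = b;
   /* Primary key: higher CPU usage first. */
   if (rowA->cpuPercent != rowB->cpuPercent)
      return (rowA->cpuPercent < rowB->cpuPercent) ? 1 : -1;
   /* Tie-breaker: unique PID, ascending. */
   return (rowA->pid > rowB->pid) - (rowA->pid < rowB->pid);
}

void sortRows(ProcRow* rows, size_t count) {
   qsort(rows, count, sizeof(ProcRow), compareByCpuThenPid);
}
```

Note the difference from a stable sort: rows with equal CPU usage always end up in PID order, whereas a stable sort would keep them in whatever order the previous sort left them.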

I understand the limitations of the qsort API (these are solvable for htop though). I also understand the problems of creating custom sort implementations. It's relatively straightforward to sort things, it's more difficult to perform well in general and particularly with very large process counts.

| Therefore the most future-proof way [...]

It sounds like you are designing for problems htop does not currently have. Can you explain specifically what future features you foresee that cannot use qsort with a stable identifier tie-breaker? Thanks.

Either way, it will be interesting to see performance numbers comparing performance at both very large and small numbers of processes. If a different algorithm significantly outperforms qsort for sorting processes in htop, we should use it.

@Explorer09
Contributor Author

| The main (only?) data we need to sort is the process section (lower part) of the htop display, of course, where we always have a unique identifier (PID) for every entry. The PID provides stability with any number of secondary sort keys, it's not clear what problem you are talking about there. And the user never needs to be involved or know it is being used, again that's a tangent that just doesn't make any sense to me. Just do all the current sort comparisons, including with multiple keys, and only use the PID as a final, stable, unique differentiator when everything else compares as equal. Sure, some code refactoring may be required to ensure the PID is available within comparison routines, but it's a solvable problem.

The current process compare functions always sort by PID in ascending order for tiebreaking. There is no way to make it descending. Sorry, I just don't like this. (If we were sorting a table in a SQL database, a fallback key for tiebreaking would be reasonable. But htop is not a DB.)

| I understand the limitations of the qsort API (these are solvable for htop though). I also understand the problems of creating custom sort implementations. It's relatively straightforward to sort things, it's more difficult to perform well in general and particularly with very large process counts.

Global variables, I know. When I proposed that, @BenBE didn't seem to like it. So this part is not my decision.

qsort() uses introsort in glibc, and smoothsort in musl.
I'm sure introsort requires O(log(n)) space in the worst case. I don't understand smoothsort yet, so I can't judge it. For me, the memory space requirement means even the qsort() algorithm can crash htop if the system is very low on memory and there's a very large process list to sort. The space complexity is not better than the "natural, in-place merge sort" that I propose here.

So, on the question of how the algorithm performs well, I think we're judging quite subjectively. And it would be better if we have some benchmarks.

(By the way, the old PR is there if you want to test the use of qsort() in htop.)

@Explorer09
Contributor Author

Explorer09 commented Nov 6, 2025

| If a different algorithm significantly outperforms qsort for sorting processes in htop, we should use it.

@natoscott @BenBE

Just a minor note. Sorting algorithms' performance can be compared with multiple kinds of data (use cases), and so it won't be a single metric that could decide which algorithm would win.

What I am saying is that the data to be sorted can be uniformly random as well as partially sorted. An algorithm like heapsort (as an example) would work best with random data but relatively poorly with data that's half sorted. Quicksort with a naïve pivot selection can easily hit its worst case (O(n²)) on data that's already sorted or reverse-sorted, you know. I'm not criticising either algorithm or the qsort() API; it's just a note that the default qsort() algorithm is not adaptive in most libc implementations, thus not as "universally useful" as you think.
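
The quicksort remark can be checked with a toy implementation. The sketch below is a textbook quicksort that always picks the last element as pivot (Lomuto partition) and counts comparisons; it is not any libc's qsort(), and the names naiveQuickSort and countComparisons are made up.

```c
#include <stddef.h>

static size_t comparisons;

/* Sort a[lo..hi) with the last element as pivot, counting comparisons. */
static void naiveQuickSort(int* a, size_t lo, size_t hi) {
   if (hi - lo < 2)
      return;
   int pivot = a[hi - 1];
   size_t store = lo;
   for (size_t i = lo; i + 1 < hi; i++) {
      comparisons++;
      if (a[i] < pivot) {
         int tmp = a[i];
         a[i] = a[store];
         a[store] = tmp;
         store++;
      }
   }
   /* Move the pivot between the two partitions. */
   int tmp = a[store];
   a[store] = a[hi - 1];
   a[hi - 1] = tmp;
   naiveQuickSort(a, lo, store);
   naiveQuickSort(a, store + 1, hi);
}

/* Sort the array and report how many element comparisons were made. */
size_t countComparisons(int* a, size_t n) {
   comparisons = 0;
   naiveQuickSort(a, 0, n);
   return comparisons;
}
```

For 100 distinct elements that are already sorted or reverse-sorted, every partition is maximally lopsided and the count is 100*99/2 = 4950 comparisons, the quadratic worst case. An adaptive sort would need only 99 comparisons to confirm that the sorted input is a single run.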

And there is no "perfect" sorting algorithm. Each algorithm comes with a trade-off. Even my algorithm does. If you are looking at the block sort family, which looks "perfect" enough in the stats, I realised there is still a tradeoff (adaptive sort, i.e. O(n) best-case time complexity, or O(1) space complexity; pick one). So really no algorithm is perfect.

@natoscott
Member

@Explorer09 I did a little more digging and have found all supported htop platforms have a variant of qsort_r - IIRC you mentioned earlier that it's missing somewhere? (even modern Solaris has it, and Windows has qsort_s). Which platforms were you thinking do not have it?

There are differences in the calling conventions between platforms, but they are trivially resolved via wrapper.

| And there is no "perfect" sorting algorithm.

From a htop POV, when we can delete substantial code it's a good win - so qsort[_r] would be perfect in that sense IMO. I honestly cannot fathom why anyone would be implementing custom sorting code for a system performance tool, that just blows my mind.

If we did need qsort_r for some unusual / new platform, there are plenty of existing, tested, robust implementations to pick from that are license-compatible with htop - choose one, put it below generic/ and add some configure magic. Done.

@Explorer09
Contributor Author

| There are differences in the calling conventions between platforms, but they are trivially resolved via wrapper.

The catch is how to detect the calling conventions.

// Older BSD
void qsort_r(void*, size_t, size_t, void*, int(*)(void*, const void*, const void*));
// POSIX, GNU
void qsort_r(void*, size_t, size_t, int(*)(const void*, const void*, void*), void*);

Arguments 4 and 5 are both pointers, so only the const qualifier can tell which argument does what. At this point I am no longer interested in writing a configure code for this.

@natoscott
Member

| The catch is how to detect the calling conventions.

Each platform will know the conventions of its own qsort_r, and a xxx/Platform.h wrapper can map that to whichever convention is chosen for use throughout htop. The only requirement for configure is to detect whether there is a qsort_r at all ... which AIUI you said some platforms are missing? If none are missing it, then no configure changes are needed. If it is missing somewhere, configure would need to enable a generic/qsort.c implementation (with contents from somewhere like https://github.com/freebsd/freebsd-src/blob/main/lib/libc/stdlib/qsort.c for example).
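
The mapping idea can be sketched without calling any real qsort_r(). In the example below, bsdStyleSort is a made-up stand-in (a simple insertion sort) that only mimics the older BSD argument order, and sortCtxLast is a hypothetical project-wide entry point that uses the context-last comparator order. A small trampoline struct carries the caller's comparator and context through the context-first interface.

```c
#include <stddef.h>
#include <string.h>

typedef int (*CompareCtxLast)(const void* a, const void* b, void* ctx);
typedef int (*CompareCtxFirst)(void* ctx, const void* a, const void* b);

/* Stand-in for a platform sort that uses the older BSD argument order. */
static void bsdStyleSort(void* base, size_t n, size_t size, void* ctx, CompareCtxFirst cmp) {
   char* a = base;
   char tmp[64]; /* sketch only: handles elements of up to 64 bytes */
   if (size > sizeof(tmp))
      return;
   for (size_t i = 1; i < n; i++) {
      memcpy(tmp, a + i * size, size);
      size_t j = i;
      while (j > 0 && cmp(ctx, a + (j - 1) * size, tmp) > 0) {
         memcpy(a + j * size, a + (j - 1) * size, size);
         j--;
      }
      memcpy(a + j * size, tmp, size);
   }
}

/* Carries the caller's comparator and context through the other convention. */
typedef struct Trampoline_ {
   CompareCtxLast compare;
   void* ctx;
} Trampoline;

static int trampolineCompare(void* thunk, const void* a, const void* b) {
   const Trampoline* t = thunk;
   return t->compare(a, b, t->ctx);
}

/* Project-wide entry point: context-last order on every platform. */
void sortCtxLast(void* base, size_t n, size_t size, CompareCtxLast compare, void* ctx) {
   Trampoline t = { compare, ctx };
   bsdStyleSort(base, n, size, &t, trampolineCompare);
}

/* Example comparator: the context holds +1 (ascending) or -1 (descending). */
int compareIntsSigned(const void* a, const void* b, void* ctx) {
   int sign = *(const int*)ctx;
   int x = *(const int*)a;
   int y = *(const int*)b;
   return sign * ((x > y) - (x < y));
}
```

On a platform whose native function already takes the context last, the wrapper would reduce to a direct call. The sketch does not address how the build decides which variant a platform has.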

@Explorer09
Contributor Author

| If we did need qsort_r for some unusual / new platform, there are plenty of existing, tested, robust implementations to pick from that are license-compatible with htop - choose one, put it below generic/ and add some configure magic. Done.

I just heard on DistroWatch that htop is being ported to Redox OS! And no, I don't know them. It's just something I heard unintentionally.

And FYI, Redox OS's libc doesn't have qsort_r at the moment, although I believe it's trivial for them to implement one.

@Explorer09
Contributor Author

Explorer09 commented Nov 7, 2025

| | The catch is how to detect the calling conventions.

| Each platform will know the conventions of its own qsort_r, and a xxx/Platform.h wrapper can map that to whichever convention is chosen for use throughout htop. The only requirement for configure is to detect whether there is a qsort_r at all ...

FreeBSD provides both versions of the qsort_r API, selected by a feature macro (it's in the code you cited). And to test the availability of qsort_r, AC_SEARCH_LIBS won't work (the macro is limited to checking unmangled symbols only). A custom test is needed in configure, really.
