New Vector_sort function, replacing insertion and quick sort #1798
swap(array, i, storeIndex);
storeIndex++;
ATTR_NONNULL
static void rotate(void* buffer, size_t leftSize, size_t rightSize) {
A note to leave to myself:
There are two possible interfaces for this rotate function. To minimize the chances that the callers pass invalid values, I picked the sizes as the second and third parameters. When it comes to optimization, this would be more dependent on the compiler's capability in simplifying additions and subtraction than the version that passes mid and end pointers. I didn't check the code sizes produced yet, so this remains a TODO for me.
rotate_a(void* buffer, size_t leftSize, size_t rightSize);
rotate_b(void* start, void* mid, const void* end);
Introduce the new Vector_sort() function and obsolete the old Vector_quickSortCustomCompare() and Vector_insertionSort() APIs.
This new sort function is a natural, in-place merge sort. I.e. it takes advantage of partially sorted data, and it's stable.
Space complexity: O(log(n)) worst case (on stack, no malloc())
Time complexity: O(n) best case, O(n*(log(n))^2) average & worst case
Signed-off-by: Kang-Che Sung <[email protected]>
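For readers comparing the two `rotate` interfaces from the note above, here is a minimal sketch of how they relate. It is not the PR's implementation: it uses the textbook three-reversal rotation on bytes, and `reverseBytes` is a hypothetical helper introduced only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: reverse the bytes in the half-open range [first, last). */
static void reverseBytes(char* first, char* last) {
   while (first < last) {
      --last;
      char tmp = *first;
      *first = *last;
      *last = tmp;
      ++first;
   }
}

/* Size-based interface: the buffer holds leftSize bytes followed by
   rightSize bytes; afterwards the right part comes first. */
static void rotate_a(void* buffer, size_t leftSize, size_t rightSize) {
   char* start = buffer;
   char* mid = start + leftSize;
   char* end = mid + rightSize;
   /* Reversing each part and then the whole range swaps the two parts
      while restoring the byte order inside each part. */
   reverseBytes(start, mid);
   reverseBytes(mid, end);
   reverseBytes(start, end);
}

/* Pointer-based interface, expressed on top of the size-based one.
   The caller must guarantee start <= mid <= end. */
static void rotate_b(void* start, void* mid, const void* end) {
   rotate_a(start,
            (size_t)((char*)mid - (char*)start),
            (size_t)((const char*)end - (char*)mid));
}
```

With the size-based form a caller cannot pass pointers in the wrong order, at the cost of the pointer arithmetic shown in `rotate_b` happening somewhere in the call chain either way.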
Update the prototypes of all Object_Compare functions and Vector_sort() function to accept the third argument. The third argument allows passing in extra information or states that could be useful in sorting.
The definition of Object_Compare now matches the prototype of the compare function in libc qsort_r(). (Previously many programmers would use global variables to read or store extra information needed in a compare operation. This made the sort function not reentrant. It was the primary motivation for the qsort_r() function.)
Currently all Object_Compare functions in htop simply ignore the third argument. No functions are using it yet, but this may be changed in the future.
The callers of Vector_sort() will now by default pass in the reference of the container object (usually named "this") as the third argument to the sort compare functions.
Signed-off-by: Kang-Che Sung <[email protected]>
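To illustrate what a third "context" argument buys, here is a small sketch under stated assumptions. `Compare3`, `SortContext`, `compareInts` and `sortInts` are hypothetical names, not htop's; only the callback shape (two const element pointers followed by a context pointer) follows the description in the commit message above.

```c
#include <stddef.h>

/* Assumed callback shape: two elements plus a caller-supplied context. */
typedef int (*Compare3)(const void* a, const void* b, void* context);

/* Hypothetical context carrying state that would otherwise be a global. */
typedef struct SortContext_ {
   int descending;
} SortContext;

/* Compare two ints; the direction comes from the context, not a global. */
static int compareInts(const void* a, const void* b, void* context) {
   const SortContext* ctx = context;
   int x = *(const int*)a;
   int y = *(const int*)b;
   int result = (x > y) - (x < y);
   return (ctx && ctx->descending) ? -result : result;
}

/* Tiny stable insertion sort on ints, standing in for a real sort routine.
   It only forwards the context pointer to the compare callback. */
static void sortInts(int* items, size_t count, Compare3 compare, void* context) {
   for (size_t i = 1; i < count; i++) {
      int key = items[i];
      size_t j = i;
      while (j > 0 && compare(&items[j - 1], &key, context) > 0) {
         items[j] = items[j - 1];
         j--;
      }
      items[j] = key;
   }
}
```

Because the direction lives in the `SortContext` passed by the caller, two sorts with different settings can run without touching shared global state.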
rotate(p1, (size_t)(mid - p1), (size_t)(p2 - mid));
leftLen = (size_t)(p1 - (char*)array) / size;
rightLen = (size_t)(p2 - mid) / size;
Another TODO for myself (micro-optimization opportunity):
Test if the compiled assembly code can be smaller if I let mergeRuns accept pointers instead of lengths as arguments. There are multiplications and divisions by size when I convert from lengths to pointers and back to lengths, and the compiler surely cannot cancel * size and / size operations together due to the type being unsigned.
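One way to read the remark about `* size` and `/ size` not cancelling: with `size_t` arithmetic neither round trip is an identity in general, so a compiler cannot simply drop the pair. The sketch below uses hypothetical helper names and is only meant to show the two cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Length -> byte count -> length. The multiplication may wrap around
   modulo SIZE_MAX + 1, so the result is not always the original length. */
static size_t roundTripThroughBytes(size_t length, size_t size) {
   size_t bytes = length * size;
   return bytes / size;
}

/* Byte offset -> length -> byte offset. The division rounds down, so the
   result equals the input only when bytes is a multiple of size. */
static size_t roundTripThroughLength(size_t bytes, size_t size) {
   return (bytes / size) * size;
}
```

In the sort itself the offsets are presumably always multiples of `size`, but the compiler has no way of knowing that from the types alone.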
Has any performance comparison been done that is sufficiently compelling that we'd want to keep all this extra code, instead of just using qsort? Thanks.
@natoscott The Note my current implementation is not fully async-signal-safe though. The swap operation is not atomic, and can corrupt the data when two sort function calls contend for the same data. It's expected that the sort function, when called in asynchronous contexts, operates on different data, or that the caller guards the data with a mutex.
Sort stability is about what happens when two items compare equally (which can happen with/without multiple sort keys, so I don't follow your argument there). In htop, we always have the PID we can use as a tie-breaker to guarantee stability. Can you explain how this is insufficient?
Regarding threads and the extra parameter, htop is single threaded and is very likely to remain so. So again this is not a valid technical argument against the use of qsort here.
qsort is highly optimised and widely reviewed. Any custom home-grown sorting code we have in htop simply isn't, to anywhere near the same degree, and appears to be unnecessary code bloat. So unless there is a valid technical reason otherwise (and none have been put forward so far, AFAICT), I think we should switch to qsort and delete all this code.
@natoscott I assume you have read #1784 and #1785 for the discussion I had with @BenBE before. Basically I would vote for using qsort() (#1784) if there isn't a stability requirement. I suggested that the old
And no, I don't think tiebreaking by PID is enough. htop's process table can be sorted with multiple keys, and there will be a need to sort the table multiple times by the user without explicitly specifying the second sort key. There can be also tables where there isn't an ID field for tiebreaking.
And there's a technical thing: The
Therefore the most future-proof way is to implement a stable sorting function by ourselves. Note that I've developed this algorithm for about half a month, and it would be sad for me if it's ditched finally.
I have read those, yes - there's little/no justification there for custom sort implementations. It seemed like there was consensus that qsort would be fine based on Hisham's reasons for the original implementation, but then it somehow became an exercise in writing new code with insufficient justification IMO.
| There can be also tables where there isn't an ID field for tiebreaking.
What tables are you referring to here? I don't see them in the code today (something else I've missed?) - if they exist today, do these things require sort stability? Thanks.
The main (only?) data we need to sort is the process section (lower part) of the htop display, of course, where we always have a unique identifier (PID) for every entry. The PID provides stability with any number of secondary sort keys; it's not clear what problem you are talking about there. And the user never needs to be involved or know it is being used; again, that's a tangent that just doesn't make any sense to me. Just do all the current sort comparisons, including with multiple keys, and only use the PID as a final, stable, unique differentiator when everything else compares as equal. Sure, some code refactoring may be required to ensure the PID is available within comparison routines, but it's a solvable problem.
I understand the limitations of the qsort API (these are solvable for htop though). I also understand the problems of creating custom sort implementations. It's relatively straightforward to sort things; it's more difficult to perform well in general and particularly with very large process counts.
| Therefore the most future-proof way [...]
It sounds like you are designing for problems htop does not currently have. Can you explain specifically what future features you foresee that cannot use qsort with a stable identifier tie-breaker? Thanks.
Either way, it will be interesting to see performance numbers comparing performance at both very large and small numbers of processes. If a different algorithm significantly outperforms qsort for sorting processes in htop, we should use it.
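The tie-breaker approach described in the comment above can be sketched as follows. `ProcRow`, its fields, and the function names are hypothetical and chosen only for illustration; they are not htop's process structures. The sketch sorts by a primary key in descending order and falls back to the PID when the primary keys are equal, so plain `qsort()` yields a deterministic order.

```c
#include <stdlib.h>

/* Hypothetical row type, standing in for a process entry. */
typedef struct ProcRow_ {
   int pid;
   int cpuPercent;
} ProcRow;

/* Primary key: cpuPercent, descending. Final tie-breaker: pid, ascending. */
static int compareByCpuThenPid(const void* a, const void* b) {
   const ProcRow* p = a;
   const ProcRow* q = b;
   if (p->cpuPercent != q->cpuPercent)
      return (p->cpuPercent < q->cpuPercent) - (p->cpuPercent > q->cpuPercent);
   return (p->pid > q->pid) - (p->pid < q->pid);
}

/* Sort rows in place; no two distinct rows compare equal, so the result
   does not depend on whether the underlying algorithm is stable. */
static void sortRows(ProcRow* rows, size_t count) {
   qsort(rows, count, sizeof(ProcRow), compareByCpuThenPid);
}
```

Note that rows with equal primary keys always come out in ascending PID order with this scheme, regardless of how they were ordered before the sort.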
The current process compare functions always sort PID in the ascending order for tiebreaking. No way to make it into descending order. Sorry I just don't like this. (If we were sorting a table in a SQL database, a fallback key for tiebreaking is reasonable. But htop is not a DB.)
Global variables, I know. When I proposed that, @BenBE didn't seem to like it. So it's not my decision on this part.
qsort() uses introsort in glibc, and smoothsort in musl. So, on the question of how well the algorithm performs, I think we're judging quite subjectively. And it would be better if we had some benchmarks. (By the way, the old PR is there if you want to test the use of qsort() in htop.)
Just a minor note. Sorting algorithms' performance can be compared with multiple kinds of data (use cases), and so it won't be a single metric that could decide which algorithm would win. What I am saying is there can be uniformly random data as well as partially sorted data that need to be sorted. And for an algorithm like heapsort (as an example) it would work best with random data but relatively poorly with data that's half sorted. Quicksort with a naïve pivot selection can easily hit its worst case (O(n²)) for data that's already sorted or reverse-sorted, you know.
I'm not criticising either algorithm or the qsort() API, just noting that the qsort() default algorithm is not adaptive in most libc implementations, and thus not as "universally useful" as you think.
And there is no "perfect" sorting algorithm. Each algorithm comes with a trade-off. Even my algorithm does. If you are looking at the block sort family that looks "perfect" enough in the stats, I did realise there is a trade-off (adaptive sort, i.e. O(n) best case time complexity, or O(1) space complexity, pick one). So really no algorithm is perfect.
@Explorer09 I did a little more digging and have found all supported htop platforms have a variant of qsort_r - IIRC you mentioned earlier that it's missing somewhere? (even modern Solaris has it, and Windows has qsort_s). Which platforms were you thinking do not have it? There are differences in the calling conventions between platforms, but they are trivially resolved via wrapper.
| And there is no "perfect" sorting algorithm.
From a htop POV, when we can delete substantial code it's a good win - so qsort[_r] would be perfect in that sense IMO. I honestly cannot fathom why anyone would be implementing custom sorting code for a system performance tool, that just blows my mind.
If we did need qsort_r for some unusual / new platform, there are plenty of existing, tested, robust implementations to pick from that are license-compatible with htop - choose one, put it below generic/ and add some configure magic. Done.
The catch is how to detect the calling conventions.
// Older BSD
void qsort_r(void*, size_t, size_t, void*, int(*)(void*, const void*, const void*));
// POSIX, GNU
void qsort_r(void*, size_t, size_t, int(*)(const void*, const void*, void*), void*);
Arguments 4 and 5 are both pointers, so only the const qualifier can tell which argument does what. At this point I am no longer interested in writing a
| The catch is how to detect the calling conventions.
Each platform will know the conventions of its own qsort_r, and a xxx/Platform.h wrapper can map that to whichever convention is chosen for use throughout htop. The only requirement for configure is to detect whether there is a qsort_r at all ... which AIUI you said some platforms are missing? If none are missing it, then no configure changes are needed. If it is missing somewhere, configure would need to enable a generic/qsort.c implementation (with contents from somewhere like https://github.com/freebsd/freebsd-src/blob/main/lib/libc/stdlib/qsort.c for example).
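A per-platform wrapper of the kind described above could look roughly like this. Everything here is a hedged sketch: `bsdStyleSort` is a hypothetical stand-in for a platform routine with the older BSD argument order (implemented as a simple insertion sort so the example is self-contained), and `Platform_sort`, `Adapter` and `compareIntWithSign` are made-up names, not htop APIs.

```c
#include <stddef.h>

/* Swap two elements of the given size byte by byte. */
static void swapBytes(char* a, char* b, size_t size) {
   while (size--) {
      char tmp = *a;
      *a++ = *b;
      *b++ = tmp;
   }
}

/* Stand-in for a platform sort using the older BSD convention:
   context ("thunk") before the callback, and first in the callback. */
static void bsdStyleSort(void* base, size_t count, size_t size, void* thunk,
                         int (*compare)(void*, const void*, const void*)) {
   char* items = base;
   for (size_t i = 1; i < count; i++) {
      for (size_t j = i; j > 0; j--) {
         char* left = items + (j - 1) * size;
         char* right = items + j * size;
         if (compare(thunk, left, right) <= 0)
            break;
         swapBytes(left, right, size);
      }
   }
}

/* Convention chosen for the rest of the program: context comes last. */
typedef int (*CompareWithContext)(const void*, const void*, void*);

/* Bundles the caller's callback and context so both fit into one thunk. */
typedef struct Adapter_ {
   CompareWithContext compare;
   void* context;
} Adapter;

/* BSD-order callback that forwards to the context-last callback. */
static int adapterCompare(void* thunk, const void* a, const void* b) {
   const Adapter* adapter = thunk;
   return adapter->compare(a, b, adapter->context);
}

/* Hypothetical platform wrapper exposing the context-last convention. */
static void Platform_sort(void* base, size_t count, size_t size,
                          CompareWithContext compare, void* context) {
   Adapter adapter = { compare, context };
   bsdStyleSort(base, count, size, &adapter, adapterCompare);
}

/* Example callback: context points to +1 (ascending) or -1 (descending). */
static int compareIntWithSign(const void* a, const void* b, void* context) {
   int sign = *(const int*)context;
   int x = *(const int*)a;
   int y = *(const int*)b;
   return sign * ((x > y) - (x < y));
}
```

On a platform whose native routine already uses the context-last order, the wrapper would simply forward its arguments unchanged, so callers only ever see one convention.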
I just heard on DistroWatch that htop is being ported to Redox OS! And no, I don't know them. It's just something I heard unintentionally. And FYI, Redox OS's libc doesn't have qsort_r at the moment, although I believe it's trivial for them to implement one.
FreeBSD provides both versions of the |
Introduce the new Vector_sort() function and obsolete the old Vector_quickSortCustomCompare() and Vector_insertionSort() APIs.
This new sort function is a natural, in-place merge sort. I.e. it takes advantage of partially sorted data, and it's stable.
Space complexity: O(log(n)) worst case
Time complexity: O(n) best case, O(n*log(n)*log(n)) average & worst case
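As a rough illustration of where the O(n) best case of a natural merge sort comes from: such a sort starts from the runs that already exist in the input, and fully sorted input is a single run that one linear scan can recognize. The sketch below only counts runs in an `int` array; `countRuns` is a hypothetical helper and says nothing about how the PR's code detects or merges runs.

```c
#include <stddef.h>

/* Count maximal non-descending runs with one pass over the data.
   Uses count - 1 comparisons for a non-empty array. */
static size_t countRuns(const int* items, size_t count) {
   if (count == 0)
      return 0;
   size_t runs = 1;
   for (size_t i = 1; i < count; i++) {
      if (items[i] < items[i - 1])
         runs++; /* a descent ends the current run and starts a new one */
   }
   return runs;
}
```

When the count is 1 there is nothing to merge, so the whole sort costs one scan; the more runs there are, the more merge passes are needed.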
@BenBE, I have copied some of the changes from your d3cd557 commit but I didn't base my work on it, because there is some data type naming that I didn't like, and so I named my "context" structure in my own way. Also, to keep changes small I didn't upgrade the Object_Compare function type to take a new "context" parameter.