Conversation
Won't this indefinitely append nodes to the
I think it would be the right option, to avoid the boilerplate.
This reverts commit 3d9ed8e.
I applied a different approach to the code, and in my case it has been working very well for days now. I know that, ideally, a single vector of unique_ptr would be the best solution, but this is all my limited time allows me to do right now. Feel free to change whatever you want.
Isn't the real solution deleting all nodes from
You should move the unique_ptr to the nodes vector instead of making a new one.
The vector can have duplicates, so unique_ptr cannot be used, since a unique_ptr cannot be copied. You could also use a temporary std::set in the destructor to store the nodes without duplicates, but it looks ugly.
Then I will dub this a temp fix, and I will modify getBestNodes to handle the unique_ptr properly. Just a quick example: I believe this would allow the memory to be freed when needed. Obviously, I would need to spend more time on it to be sure, though.
There would still be changes needed for this:
NRH-AA
left a comment
If you want you can revert the changes you made and follow these changes:

```cpp
struct AStarNode
{
	AStarNode* parent;
	uint16_t g;
	uint16_t f;
	uint16_t x, y;
	bool closed;
};

~AStarNodes()
{
	for (auto node : nodes) {
		delete node;
	}
}
```
Modify just under:

```cpp
// Calculate cost of neighbor.
const uint16_t g = n->g + AStarNodes::getMapWalkCost(n, pos) + AStarNodes::getTileWalkCost(creature, tile);
const uint16_t h = calculateHeuristic(pos, targetPos);
const uint16_t newf = h + g;
```
Replace the code under it with:
```cpp
if (neighborNode) {
	if (neighborNode->closed) {
		if (neighborNode->f <= newf) {
			continue;
		}
		neighborNode->closed = false;
	} else {
		if (neighborNode->f <= newf) {
			continue;
		}
		neighborNode->g = g;
		neighborNode->f = newf;
		neighborNode->parent = n;
	}
} else {
	nodes.createNewNode(n, pos.x, pos.y, g, newf);
}
```
Replace getBestNode():

```cpp
AStarNode* AStarNodes::getBestNode()
{
	if (nodes.empty()) {
		return nullptr;
	}
	std::vector<AStarNode*> openNodes;
	for (auto node : nodes) {
		if (!node->closed) {
			openNodes.push_back(node);
		}
	}
	if (openNodes.empty()) {
		// All nodes are closed; without this guard, nth_element and back()
		// would operate on an empty vector (undefined behavior).
		return nullptr;
	}
	std::nth_element(openNodes.begin(), openNodes.end() - 1, openNodes.end(),
	                 [](AStarNode* left, AStarNode* right) { return left->f > right->f; });
	AStarNode* retNode = openNodes.back();
	retNode->closed = true;
	return retNode;
}
```
Then add:

```cpp
firstNode->closed = false;
newNode->closed = false;
```

when the first and new nodes are created.
It seems that the cost of constructing a vector on every call to getBestNode degrades performance dramatically compared to the unique_ptr map; the solution this PR proposes looks like the more solid one.
Honestly, I prefer the simple vector of unique_ptr rather than adding all that new logic, which includes allocating a temporary vector and manual filtering with a range-based for. If you change the logic of getBestNode to avoid that pop_back, then we could keep using RAII, which has always worked well and optimally.
I don't think this system really follows RAII in general; the nodes are created on the fly, not during construction. The main thing here is that we deallocate the memory correctly. The unique_ptr is a cleaner method, but I think it is actually not as good a solution because it is barely used. The extra vector inside getBestNodes is redundant for performance. While it may not be as pretty, I think it is a better solution than throwing in a unique_ptr just for the sake of looks. I guess from this point maybe some other people can throw in their insights. Either solution is fine imo.
Yes, RAII can be implemented without any issues. The problem lies in the fact that once the nodes are deleted, their management becomes considerably complicated. Additionally, another complication arises: when the vector is reallocated, all pointers contained in the map are invalidated. Consequently, using a vector becomes counterproductive if the design does not account for dynamic space reallocation. Currently, the limit is set at 50 elements, but when this threshold is exceeded, the system fails for obvious reasons. The solution would be to increase the node limit, make sure getBestNode does not remove nodes from the vector, and ensure that the node limit is not exceeded, to avoid memory reallocation. For node filtering, ranges can be used: std::views::filter + std::ranges::min_element.
@NRH-AA Can this:

```cpp
AStarNode firstNode;
firstNode.parent = nullptr;
firstNode.x = x;
firstNode.y = y;
firstNode.g = 0;
firstNode.f = 0;
```

and this:

```cpp
AStarNode newNode;
newNode.parent = parent;
newNode.x = x;
newNode.y = y;
newNode.g = g;
newNode.f = f;
```

be moved to
Or just set default values in the struct, so it will be obvious that any of these values may not be set by a constructor [struct initialization] (but it looks like):

```cpp
struct AStarNode
{
	AStarNode* parent = nullptr;
	uint16_t g = 0;
	uint16_t f = 0;
	uint16_t x = 0;
	uint16_t y = 0;
};
```

and just modify the values we want to change? Other question: what are
Having default values during initialization is a waste of time, since we are going to change their values manually during creation anyway. The fields x and y are important because the current logic requires using these node values; at least for this PR, that is irrelevant. However, another PR that trims those two fields from the node struct to save a bit more memory and speed is always welcome. f and g are just reference values for the node weights and the total sum of weights. They are called that by convention, but they could perfectly well be called cost and total cost. But who cares about this? Anyone working with the A* algorithm will surely know what they mean, because it is practically a standard.
I benchmarked

Summary

What and how I tested: tested on Ubuntu 22.04, engine compiled with g++ 13 in Release mode, core affinity set to the first 4 'P' cores of an Intel CPU (

Results:

With the memory leak fixed, the new algorithm is 5% faster than the old one, but the new algorithm did not find a path to the target position in over 10k tests (8% fewer than the old algorithm). The total number of steps in the paths also dropped from 160kk to 120kk. Either the new algorithm finds shorter paths to the target, or those 8% of paths for which it could not find a path were the longest ones.

EDIT: Even with the custom hash algorithm proposed by @ranisalt, execution time grew to 9957 ms, which is 27% more than the old algorithm.
```cpp
std::vector<AStarNode*> nodes;
std::map<uint16_t, std::map<uint16_t, AStarNode*>> nodeMap;
```

```cpp
std::map<uint16_t, std::map<uint16_t, AStarNode>> nodeMap;
```

To avoid this massive indirection, and probably improve the speed a little bit, this should be:

```cpp
std::map<std::pair<uint16_t, uint16_t>, AStarNode> nodeMap;
```
But we probably need to use Boost.ContainerHash or implement our own hash:

```cpp
template<>
struct std::hash<std::pair<uint16_t, uint16_t>>
{
	std::size_t operator()(const std::pair<uint16_t, uint16_t>& s) const noexcept
	{
		return std::hash<uint32_t>{}(static_cast<uint32_t>(s.first) << 16 | s.second);
	}
};
```

(Throw this in tools.h.)

This will avoid navigating nested binary trees, thus reducing complexity from O(log² n) to O(log n), if I understood this correctly. If this is in the hot path, it may be better than the current results.
EDIT:
In the post above I benchmarked the new vs. old pathfinding algorithm, and it looks like the new algorithm is slower, even with the optimization you proposed. The new algorithm executes faster, but it returns fewer 'found path' results. After tweaking the algorithm params to make it return a similar number of 'found' paths, it's slower than the old algorithm.

- Just the std::pair change is 7% slower than the old algorithm. Execution time increased from 7808 ms to 8361 ms.
  `std::map<std::pair<uint16_t, uint16_t>, AStarNode> nodeMap;`
- The custom hash algorithm with std::unordered_map (required to make it work) is 25% faster than the old algorithm! Execution time decreased from 7808 ms to 5843 ms. It's also 21% faster than the current fixed code from this PR (7416 ms vs 5843 ms).
```cpp
// tools.h
template<>
struct std::hash<std::pair<uint16_t, uint16_t>>
{
	std::size_t operator()(const std::pair<uint16_t, uint16_t>& s) const noexcept
	{
		return std::hash<uint32_t>{}(static_cast<uint32_t>(s.first) << 16 | s.second);
	}
};

// map.h
std::unordered_map<std::pair<uint16_t, uint16_t>, AStarNode> nodeMap;

// map.cpp
nodeMap[std::make_pair(x, y)] = std::move(firstNode);
nodes.reserve(50);
nodes.push_back(&nodeMap[std::make_pair(x, y)]);
// (...)
nodeMap[std::make_pair(x, y)] = std::move(newNode);
nodes.push_back(&nodeMap[std::make_pair(x, y)]);
// (...)
AStarNode* AStarNodes::getNodeByPosition(uint16_t x, uint16_t y)
{
	// Check if the node exists in the map
	auto it = nodeMap.find(std::make_pair(x, y));
	if (it != nodeMap.end()) {
		return &it->second;
	}
	return nullptr;
}
```
Ah, that's very neat. What's the length of the paths you are testing, @gesior? Or are you using random paths with varying lengths? I would assume unordered_map excels if the paths are longer, as searching for the node would dominate the complexity.
> Ah that's very neat. What's the length of the paths you are testing @gesior? Or are you using random paths with varying lengths?

Official TFS map. I ran a script for all monsters spawned on the map and calculated paths 1-10 SQMs (monster walk distance) in each direction (N/W/S/E) ( https://paste.ots.me/564224/text ). So it's like 'walk to every tile on the "monster screen"' - trying to replicate common monster behavior.
Maps/vectors used by A* are often pretty small - monsters are often pretty close to the target. In my benchmarks related to 'auto loot', it's faster to use std::vector than std::map for 'arrays' (maps) with fewer than 30 elements.
> I would assume unordered_map excels if the paths are longer as searching the node would dominate the complexity

IDK. I just tested the current PR code vs. the new code with the 'custom hash algorithm'. The 'custom hash algorithm' does not work for std::map at all (it's ignored), but it's required for, and works with, std::unordered_map.
Maybe there is some combination of a boost 'map' (ordered) and a custom hash algorithm, but I did not test it.
All these tests take too much time.
This benchmark you ran was very good, Gesior, thank you for your attention. But it is worth mentioning that with this pathfinding optimization we had significant improvements in "walk" and "turn". I will attach a video with the old pathfinding and another with the new pathfinding (with the memory leak fixed).
Old Pathfind - https://youtu.be/fCvdSqIN-C4
New Pathfind - https://youtu.be/nyOd4Pq9Xms
It would take time to make unit tests, for sure. I don't think there is an issue finding the "correct path". There are many instances where 2 different paths can be found that are both equal; A* will always find the "correct path", it just may not be the exact same one Dijkstra's does. All of which also depends on how the algorithms are designed. For example, storing the neighbors from top left to bottom right vs. the opposite will change which of the 2 paths it chooses.
The only issue finding the correct path that could happen is if the path is outside the bounds of what we allow. For instance, Tibia doesn't allow the following path to occur:
If the distance being checked is too far, the path is rejected, which is why there is the iterations part of the code. In the original TFS algorithm before my change, there was no distance check, so paths that were outside the bounds of maxClientViewport (which is what monsters can see) were allowed to be checked. The only thing stopping the original was the 250-iteration limit, which means a path that is 30 sqm away could be checked when it shouldn't be allowed.
There were many cases where the old algorithm would look for paths and find them when it shouldn't. That would explain the discrepancy between my algorithm not finding the path (ignoring it) and the original finding it. All of those cases were accounted for in my implementation, which is why we can call the pathfinding as much as we want without serious CPU and memory problems (excluding the memory leak). We couldn't do that before.
We could do unit tests, but if the algorithm weren't working there would be an infinite loop 99.9% of the time.
I am kind of curious how gesior tested all this, because when I tested the original TFS pathfinding I would get times as bad as 1.5m nanoseconds on paths where mine maxed out at around 100k nanoseconds on the longest paths.
I will also state that instead of changing:

```cpp
if (iterations >= 120) {
```

changing this would be what fixes the differences in finding the path:

```cpp
// Don't update path. The target is too far away.
int32_t maxDistanceX = fpp.maxSearchDist ? fpp.maxSearchDist : Map::maxClientViewportX + 1;
int32_t maxDistanceY = fpp.maxSearchDist ? fpp.maxSearchDist : Map::maxClientViewportY + 1;
if (distanceX > maxDistanceX || distanceY > maxDistanceY) {
	return false;
}
```

It isn't that we can't find the right path; it's that we don't allow this path to happen. We want this behavior. Deleting all of the pre-path checks should result in a 1:1 even with iterations at only 120, unless you are checking really big paths that we don't need in Tibia.
@NRH-AA I agree with your analysis!
Gesior's tests were very accurate, and analyzing the old code we can see that there really weren't as many checks, and because of that it generated more possible paths. Therefore, between tests A, B and C, the best option is option B, limited to 120 iterations!
@ranisalt We can do tests based on the north, east, south, west and diagonal positions by adding 3 path variations for each tested position, and then verify whether the monster is really following the best path...
I am really not sure how to do tests with the TFS src. There are many things that make it difficult, but a makeshift version could be something like this (check the link). I guess we could remove any code that uses Creature, etc., and just make a bare-bones algorithm. At that point, can we really trust the results to be the same, though?
Yeah, the thing is our code is way too tightly coupled, and in order to test pathfinding you would need to craft a map in memory, which would be a PITA. I guess we leave it as is.
Can we finally merge this and remove the huge memory leak?
@gesior My goal was exactly that: to eliminate the memory leak. My initial solution, although not perfect, solves the problem, and I've been using it for weeks without any issues. I know a second vector called 'toReleaseNodes' may not be ideal, but at least it doesn't leak 70 GB of memory in 20 hours.
g, h and f are well-known values when working with pathfinding algorithms. It is probably best to leave them as they are, but for an explanation: the higher any of the values are, the worse the node is in the path. The h value is the main difference between Dijkstra's and A* pathfinding. The old algorithm used a ton of memory, so trying to implement it with the changes I have made for this algorithm will not work well. You can check this website to get a visual of the difference between the two algorithms: https://qiao.github.io/PathFinding.js/visual/
Def sounds good, and great job on a quick solution that is more than good enough. I will create a new PR with optimizations to the efficiency of the algorithm.
Changes in #4934 are much better. Even with all optimizations proposed by @ranisalt (
@gesior Nice!


Pull Request Prelude
Changes Proposed
Issues addressed:
How to test: