
Fix astar node memory leak#4924

Closed
jacksonie wants to merge 6 commits into otland:master from jacksonie:fix-astar-node-mem-leak

Conversation

@jacksonie

Pull Request Prelude

Changes Proposed

Issues addressed:

How to test:

@jacksonie jacksonie changed the title fix astar node mem leak Fix astar node memory leak May 24, 2025
@MillhioreBT MillhioreBT requested review from nekiro and ranisalt May 24, 2025 05:18
@ranisalt
Member

Won't this indefinitely append nodes to the nodeStore vector? Why not store unique pointers in nodes? It seems that every emplace_back is done on both

@MillhioreBT
Contributor

Won't this indefinitely append nodes to the nodeStore vector? Why not store unique pointers in nodes? It seems that every emplace_back is done on both

I think it would be the right option, to avoid the boilerplate.
Besides, building a single vector is better than building two.
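A minimal sketch of the single owning vector being discussed (the struct layout and `createNewNode` name are assumptions for illustration, not taken from the PR):

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch: one vector owns every node; raw pointers elsewhere
// (open list, parent links) are non-owning views into it.
struct AStarNode {
    AStarNode* parent;
    uint16_t g, f;
    uint16_t x, y;
};

struct AStarNodes {
    std::vector<std::unique_ptr<AStarNode>> nodes; // sole owner of all nodes

    AStarNode* createNewNode(AStarNode* parent, uint16_t x, uint16_t y,
                             uint16_t g, uint16_t f) {
        nodes.push_back(std::make_unique<AStarNode>(AStarNode{parent, g, f, x, y}));
        return nodes.back().get(); // hand out a non-owning pointer
    }
    // No manual delete needed: ~AStarNodes() destroys every unique_ptr.
};
```

With this layout there is nothing to leak, since destroying the `AStarNodes` instance releases every node it ever created.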

@jacksonie
Author

Won't this indefinitely append nodes to the nodeStore vector? Why not store unique pointers in nodes? It seems that every emplace_back is done on both

I applied a different approach to the code, and in my case, it has been working very well for days now. I know that, ideally, a single vector with unique_ptr would be the best solution, but this is all that my limited time allows me to do right now. Feel free to change whatever you want.

@gesior
Contributor

gesior commented May 31, 2025

Isn't the real solution deleting all nodes from nodes in the AStarNodes destructor?

@MillhioreBT
Contributor

MillhioreBT commented Jun 1, 2025

Isn't the real solution deleting all nodes from nodes in the AStarNodes destructor?

That's right, it should, as always.

It should be as simple as this, but it's not.
[image]

MillhioreBT previously approved these changes Jun 1, 2025
@NRH-AA
Contributor

NRH-AA commented Jun 1, 2025

You should move the unique_ptr to the nodes vector instead of making a new one.

@MillhioreBT
Contributor

MillhioreBT commented Jun 1, 2025

You should move the unique_ptr to the nodes vector instead of making a new one.

The vector can have duplicates, so unique_ptr cannot be used since a unique_ptr cannot be copied.
Using shared_ptr would solve it but sounds much worse.

You could also use a temporary std::set to store without duplicates in the destructor, but it looks ugly.
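The "temporary std::set in the destructor" workaround mentioned above could be sketched like this (the `Node` struct and `destroyAll` name are assumptions for illustration):

```cpp
#include <set>
#include <vector>

struct Node { int value; };

// Sketch: if the raw-pointer vector may hold the same node more than once,
// deduplicate through a set before deleting so each node is freed exactly once.
inline void destroyAll(std::vector<Node*>& nodes) {
    std::set<Node*> unique(nodes.begin(), nodes.end());
    for (Node* n : unique) {
        delete n;
    }
    nodes.clear();
}
```

It is indeed uglier than single ownership, but it avoids both the double-free from duplicates and the leak from never deleting.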

@NRH-AA
Contributor

NRH-AA commented Jun 1, 2025

Then I will dub this a temp fix and I will modify the getBestNodes to handle the unique_ptr properly. Just a quick example is:

std::unique_ptr<AStarNode> AStarNodes::getBestNode()
{
    if (nodes.empty()) {
        return nullptr;
    }

    std::nth_element(nodes.begin(), nodes.end() - 1, nodes.end(),
        [](const std::unique_ptr<AStarNode>& left, const std::unique_ptr<AStarNode>& right) {
            return left->f > right->f;
        });

    std::unique_ptr<AStarNode> best = std::move(nodes.back());
    nodes.pop_back();
    return best; // Caller now owns the node safely
}

I believe this would let the memory be freed automatically when needed. Obviously, I would need to spend more time on it to be sure though.

@MillhioreBT
Contributor

Then I will dub this a temp fix and I will modify the getBestNodes to handle the unique_ptr properly. [...]

There would still be changes needed for this: neighborNode->parent = n; because in this line, two things can happen:

  1. A raw pointer managed by unique_ptr on vector can be assigned.
  2. A raw pointer that will be destroyed when the unique_ptr is destroyed in the next iteration can be assigned (which would cause a crash).


Contributor

@NRH-AA NRH-AA left a comment

If you want you can revert the changes you made and follow these changes:

struct AStarNode
{
	AStarNode* parent;
	uint16_t g;
	uint16_t f;
	uint16_t x, y;
	bool closed;
};
~AStarNodes()
{
	for (auto node : nodes) {
		delete node;
	}
}

Modify just under

// Calculate cost of neighbor.
const uint16_t g = n->g + AStarNodes::getMapWalkCost(n, pos) + AStarNodes::getTileWalkCost(creature, tile);
const uint16_t h = calculateHeuristic(pos, targetPos);
const uint16_t newf = h + g;

Replace the code under it with:

if (neighborNode) {
	if (neighborNode->closed) {
		if (neighborNode->f <= newf) {
			continue;
		}

		neighborNode->closed = false;
	} else {
		if (neighborNode->f <= newf) {
			continue;
		}

		neighborNode->g = g;
		neighborNode->f = newf;
		neighborNode->parent = n;
	}
} else {
	nodes.createNewNode(n, pos.x, pos.y, g, newf);
}

Replace getBestNodes()

AStarNode* AStarNodes::getBestNode()
{
	if (nodes.size() == 0) {
		return nullptr;
	}

	std::vector<AStarNode*> openNodes;
	for (auto node : nodes) {
		if (!node->closed) {
			openNodes.push_back(node);
		}
	}

	std::nth_element(openNodes.begin(), openNodes.end() - 1, openNodes.end(),
	                 [](AStarNode* left, AStarNode* right) { return left->f > right->f; });
	AStarNode* retNode = openNodes.back();
	retNode->closed = true;
	return retNode;
}

Then add:

firstNode->closed = false;
newNode->closed = false;

when the first and new nodes are created.

@MillhioreBT
Contributor

MillhioreBT commented Jun 2, 2025

It seems that the cost of constructing vectors on every call to getBestNode degrades performance dramatically compared to the map of unique_ptr; the solution this PR proposes seems more solid.

If you want you can revert the changes you made and follow these changes: [...]

Honestly, I prefer the simple vector of unique_ptr over adding all that new logic, which includes allocating a temporary vector and manual filtering with a range-for.

If you change the logic of getBestNode to avoid that pop_back, then we could keep using RAII as it has always worked well and optimally.

@NRH-AA
Contributor

NRH-AA commented Jun 2, 2025

Honestly, I prefer the simple vector of unique_ptr, rather than adding all that new logic that includes allocations of a temporary vector and manual filtering with for range.

If you change the logic of getBestNode to avoid that pop_back, then we could keep using RAII as it has always worked well and optimally.

I don't think this system in general really follows RAII. The nodes are created on the fly, not during construction. The main thing here is that we deallocate the memory correctly. The unique_ptr is a cleaner method, but I think it is actually not as good a solution because it is barely used. The extra vector inside getBestNodes is redundant for performance. While it may not be as pretty, I think it is a better solution than throwing in a unique_ptr just for the sake of looks. I guess from this point maybe some other people can throw in their insights. Either solution is fine imo.

@MillhioreBT
Contributor

MillhioreBT commented Jun 2, 2025

I don't think this system in general really follows RAII. [...] Either solution is fine imo.

Yes, RAII can be implemented without any issues. The problem lies in the fact that once the nodes are deleted, their management becomes considerably complicated. Additionally, another complication arises: when the vector is reallocated, all pointers contained in the map are invalidated. Consequently, using a vector becomes counterproductive if the design does not contemplate dynamic space reallocation. Currently, the limit is set at 50 elements, but when this threshold is exceeded, the system fails for obvious reasons.

The solution would be to increase the node limit, make sure getBestNode does not remove nodes from the vector, and ensure that the node limit is not exceeded to avoid memory reallocation. For node filtering, ranges can be used: std::views::filter + std::ranges::min_element

@MillhioreBT MillhioreBT requested review from MillhioreBT and NRH-AA June 3, 2025 02:23
@gesior gesior mentioned this pull request Jun 4, 2025
@gesior
Contributor

gesior commented Jun 4, 2025

@NRH-AA
@MillhioreBT
Since we are fixing not only the memory leak but also the code style that let it go unnoticed in the PR, I have a question before it gets merged.

Can this:

	AStarNode firstNode;
	firstNode.parent = nullptr;
	firstNode.x = x;
	firstNode.y = y;
	firstNode.g = 0;
	firstNode.f = 0;

and this:

	AStarNode newNode;
	newNode.parent = parent;
	newNode.x = x;
	newNode.y = y;
	newNode.g = g;
	newNode.f = f;

be moved to AStarNode class constructor with default values for parent, g and f?

Or just set default values in struct, so it will be obvious that any of these values may be not set by constructor [struct initialization] (but it looks like x and y must be set for nodeMap, so constructor with 3 optional parameters [parent, g, f] would be better):

struct AStarNode
{
	AStarNode* parent = nullptr;
	uint16_t g = 0;
	uint16_t f = 0;
	uint16_t x = 0;
	uint16_t y = 0;
};

and just modify values we want to change?


Other question: what are the g and f values? Of course I can google AStarNode and find what these values mean in the A* algorithm, but can't we name them something that would tell anyone what these values mean for the algorithm? Like, whether higher/lower makes them better.

@MillhioreBT
Contributor

@NRH-AA @MillhioreBT As we are fixing memory leak, but also code style that made this memory leak appear unnoticed in PR. I got a question before it gets merged. [...]

Having default values during initialization is a waste of time since we are going to manually change their values during creation anyway.

The fields x and y are important because the current logic requires using these node values.

At least for this PR, that is irrelevant. However, another PR that could trim those two fields from the node struct to save a bit more memory and speed is always welcome.

f and g are just reference values for the node weights and the total sum of weights. They are called that by convention, but they could perfectly well be called cost and total cost. But who cares about this? Anyone working with the A* algorithm will surely easily know what they mean because it is like a standard.

@gesior
Contributor

gesior commented Jun 5, 2025

I benchmarked the getPathTo function without the "Optimize pathfinding" commit (1 commit before), with "Optimize pathfinding", and with this PR's fix.

Summary

  • the new algorithm uses 5% less CPU than the old one
  • it finds 8% fewer paths to the target: in some cases it fails to find a path where the old algorithm returns one
  • it also returns 30% fewer steps in the paths it finds: maybe because it fails on long paths, so the 8% extra paths of the old algorithm account for 30% more steps

What and how I tested
I wrote a simple script that calculates paths to positions 1-10 steps in each direction for each monster in the game (added this code in Game::checkDecay(): https://paste.ots.me/564224/text ).

Tested on Ubuntu 22.04, engine compiled with g++ 13 in Release mode, core affinity set to first 4 'P' cores of Intel CPU (taskset -c 0-3 ./tfs).

Results:

  • old algorithm: 7808 ms (100%)
  • new algorithm with memory leak: 8289 ms (106%)
  • new algorithm without memory leak: 7416 ms (95%)

With the memory leak fixed, the new algorithm is 5% faster than the old one, but the getPathTo results differ.
Old algorithm:

Found: 139.091 Not found: 146.029 Total steps: 160.303.989

New algorithm:

Found: 129.553 Not found: 155.567 Total steps: 121.928.695

The new algorithm did not find a path to the target position in 10k+ more tests (8% fewer than the old algorithm). The total number of steps in paths also dropped from 160kk to 120kk. Either the new algorithm finds shorter paths to the target, or the 8% of paths it could not find were the longest ones.
So we got 8% fewer paths found, 30% fewer steps in the paths found, and 5% less CPU usage.
Without comparing the generated paths (especially the ones the new algorithm fails to find), I'm not sure if the new algorithm is really 5% faster or if it just gives up earlier by checking fewer tiles.

EDIT:
I increased the iterations limit from 120 to 250 to make it find as many paths as the old algorithm (139k):

Found: 139.036 Not found: 146.084 Total steps: 160.639.857

Even with the custom hash algorithm proposed by @ranisalt, execution time grew to 9957 ms, which is 27% more than the old algorithm.
It looks like the new algorithm isn't faster. It just calculates fewer paths and more often returns "There is no way." :(

std::vector<AStarNode*> nodes;
std::map<uint16_t, std::map<uint16_t, AStarNode*>> nodeMap;

std::map<uint16_t, std::map<uint16_t, AStarNode>> nodeMap;
Member

@ranisalt ranisalt Jun 6, 2025

To avoid this massive indirection, and probably improve the speed a little bit, this should be

Suggested change
std::map<uint16_t, std::map<uint16_t, AStarNode>> nodeMap;
std::map<std::pair<uint16_t, uint16_t>, AStarNode> nodeMap;

But we probably need to use Boost.ContainerHash or implement our own hash:

template<>
struct std::hash<std::pair<uint16_t, uint16_t>>
{
    std::size_t operator()(const std::pair<uint16_t, uint16_t>& s) const noexcept
    {
        return std::hash<uint32_t>{}(static_cast<uint32_t>(s.first) << 16 | s.second);
    }
};

(throw this in tools.h)

This will avoid navigating nested binary trees, thus reducing complexity from O(log² n) to O(log n) if I understood this correctly. If this is in the hot path it may be better than the current results
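The idea in the hash above is to pack the two 16-bit coordinates into one 32-bit value and reuse std::hash<uint32_t>. A small sketch of the packing (function names are illustrative):

```cpp
#include <cstdint>
#include <functional>

// Sketch: combine (x, y) into a single 32-bit key so one hash-table lookup
// replaces two nested binary-tree searches.
inline uint32_t packCoords(uint16_t x, uint16_t y) noexcept {
    return static_cast<uint32_t>(x) << 16 | y;
}

inline std::size_t hashCoords(uint16_t x, uint16_t y) noexcept {
    return std::hash<uint32_t>{}(packCoords(x, y));
}
```

Since x occupies the high 16 bits and y the low 16, the packing is collision-free over the full coordinate range, so hash quality reduces to that of std::hash<uint32_t>.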

Contributor

@gesior gesior Jun 6, 2025

EDIT:
In the post above I benchmarked the new vs. the old pathfinding algorithm, and it looks like the new algorithm is slower, even with the optimization you proposed. The new algorithm executes faster, but it returns fewer 'found path' results. After tweaking the algorithm parameters to make it return a similar number of 'found' paths, it's slower than the old algorithm.

  1. Just the std::pair change is 7% slower than the old algorithm. Execution time increased from 7808 ms to 8361 ms.
std::map<std::pair<uint16_t, uint16_t>, AStarNode> nodeMap;
  2. The custom hash algorithm and std::unordered_map (required to make it work) is 25% faster than the old algorithm! Execution time decreased from 7808 ms to 5843 ms. It's also 21% faster than the current fixed code from this PR (7416 ms vs 5843 ms).
// tools.h

template<>
struct std::hash<std::pair<uint16_t, uint16_t>>
{
	std::size_t operator()(const std::pair<uint16_t, uint16_t>& s) const noexcept
	{
		return std::hash<uint32_t>{}(static_cast<uint32_t>(s.first) << 16 | s.second);
	}
};

// map.h

std::unordered_map<std::pair<uint16_t, uint16_t>, AStarNode> nodeMap;

// map.cpp

	nodeMap[std::make_pair(x, y)] = std::move(firstNode);
	nodes.reserve(50);
	nodes.push_back(&nodeMap[std::make_pair(x, y)]);

// (...)

	nodeMap[std::make_pair(x, y)] = std::move(newNode);
	nodes.push_back(&nodeMap[std::make_pair(x, y)]);

// (...)

AStarNode* AStarNodes::getNodeByPosition(uint16_t x, uint16_t y)
{
	// Check if the node exists in the map
	auto it = nodeMap.find(std::make_pair(x, y));
	if (it != nodeMap.end()) {
		return &it->second;
	}
	return nullptr;
}

Member

Ah, that's very neat. What's the length of the paths you are testing @gesior? Or are you using random paths with varying lengths? I would assume unordered_map excels if the paths are longer, as searching for the node would dominate the complexity.

Contributor

Ah, that's very neat. What's the length of the paths you are testing @gesior? Or are you using random paths with varying lengths?

The official TFS map. I run the script for all monsters spawned on the map and calculate paths 1-10 SQMs (monster walk distance) in each direction (N/W/S/E) ( https://paste.ots.me/564224/text ). So it's like "walk to every tile on the monster's screen", trying to replicate common monster behavior.

Maps/vectors used by A* are often pretty small, since monsters are often pretty close to the target. In my benchmarks related to 'auto loot', it's faster to use std::vector than std::map for 'arrays' (maps) with fewer than 30 elements.

I would assume unordered_map excels if the paths are longer as searching the node would dominate the complexity

IDK. I just tested the current PR code vs. the new code with the custom hash algorithm. The custom hash algorithm does not work for std::map at all (it's ignored), but it's required for, and works with, std::unordered_map.
Maybe there is some combination of a Boost ordered 'map' and a custom hash algorithm, but I did not test it.
All these tests take too much time.

Contributor

@alysonjacomin alysonjacomin Jun 8, 2025

This benchmark you ran was very good, Gesior, thank you for your attention. But it is worth mentioning that with this pathfinding optimization we had significant improvements in "walk" and "turn". I am attaching a video with the old pathfinding and another with the new pathfinding (with the memory leak fix).

Old Pathfind - https://youtu.be/fCvdSqIN-C4

New Pathfind - https://youtu.be/nyOd4Pq9Xms

Contributor

@NRH-AA NRH-AA Jun 21, 2025

It would take time to make unit tests, for sure. I don't think there is an issue finding the "correct path", meaning there are many instances where 2 different paths can be found that are both equal. A* will always find the "correct path"; it just may not be the exact same one Dijkstra's does. All of this is also based on how the algorithms are designed. For example, storing the neighbors from top left to bottom right vs. the opposite will change which of the 2 paths it chooses.

The only issue finding the correct path that could happen is if the path is outside the bounds of what we allow. For instance, Tibia doesn't allow the following path to occur:

[image]

If the distance being checked is too far, the path is rejected, which is why the iterations part of the code exists. In the original TFS algorithm, before my change, there was no distance check, so paths outside the bounds of maxClientViewport (which is what monsters can see) were allowed to be checked. The only thing stopping the original was the 250-iteration limit, which means a path 30 sqm away could be checked when it shouldn't be allowed.

There were many cases where the old algorithm would look for paths and find them when it shouldn't. That would explain the discrepancy of my algorithm not finding the path (ignoring it) and the original finding it. All of those types of things were accounted for in my implementation, which is why we can call the pathfinding as much as we want without serious CPU and memory problems (excluding the memory leak). We couldn't do that before.

We could do unit tests but if the algorithm wasn't working there would be an infinite loop 99.9% of the time.

I am kind of curious how gesior tested all this, because when I tested the original TFS pathfinding I would get times as bad as 1.5M nanoseconds on paths where mine maxed out at around 100k nanoseconds on the longest paths.

Contributor

@NRH-AA NRH-AA Jun 21, 2025

I will also state that instead of changing
if (iterations >= 120) {

Changing this would be what fixes the differences in finding the path:

// Don't update path. The target is too far away.
int32_t maxDistanceX = fpp.maxSearchDist ? fpp.maxSearchDist : Map::maxClientViewportX + 1;
int32_t maxDistanceY = fpp.maxSearchDist ? fpp.maxSearchDist : Map::maxClientViewportY + 1;
if (distanceX > maxDistanceX || distanceY > maxDistanceY) {
	return false;
}

It isn't that we can't find the right path; it's that we don't allow this path to happen. We want this behavior. Deleting all of the pre-path checks should result in a 1:1 match even with iterations at only 120, unless you are checking really big paths that we don't need in Tibia.

Contributor

@alysonjacomin alysonjacomin Jun 21, 2025

@NRH-AA I agree with your analysis!

Gesior's tests were very accurate, and analyzing the old code we can see that there really weren't as many checks; because of that, it generated more possible paths. Therefore, among tests A, B and C, the best option is option B, limited to 120 iterations!

@ranisalt We can do tests based on the north, east, south, west and diagonal positions by adding 3 path variations for each tested position, and then verify whether the monster is really following the best path...

Contributor

I am really not sure how to do tests with the TFS src. There are many things that make it difficult, but a makeshift version could be something like this (check the link). I guess we could remove any code that uses creature, etc. and just make a bare-bones algorithm. At that point, can we really trust the results to be the same, though?

https://pastebin.com/H0v40A8X

Member

Yeah, the thing is our code is way too tightly coupled, and in order to test pathfinding you would need to craft a map in memory, which would be a PITA. I guess we leave it as is.

@gesior
Contributor

gesior commented Jun 19, 2025

Can we finally merge this and remove the huge memory leak?
We can decide later if we want to revert 'optimized getPathTo', as it's slower once we tune it to return the same results as the old algorithm.

@jacksonie
Author

jacksonie commented Jun 19, 2025

Can we finally merge this and remove the huge memory leak? We can decide later if we want to revert 'optimized getPathTo', as it's slower once we tune it to return the same results as the old algorithm.

@gesior My goal was exactly that: to eliminate the memory leak.

My initial solution, although not perfect, solves the problem, and I've been using it for weeks without any issues.

I know a second vector called 'toReleaseNodes' may not be ideal, but at least it doesn't leak 70 GB of memory in 20 hours.

@NRH-AA
Contributor

NRH-AA commented Jun 20, 2025

Other question: what are g and f values? Of course I can google AStarNode and search what these values mean in A* algorithm, but can't we name them with something that would tell anyone what these values mean for algorithm? Like, if getting them higher/lower makes them better.

g, h and f are well-known values when working with pathfinding algorithms. It is probably best to leave them as they are, but for an explanation:
g -> the cost so far to get to the node, plus the cost to walk from the last node to the current node.
h -> the estimated cost from the current node to the target/end point.
f -> the combination g+h, which must be stored in order to decide if a certain path is better than a different one already calculated for the node.

The higher any of these values are, the worse the node is in the path. The h value is the main difference between Dijkstra's and A* pathfinding.
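The g/h/f relation described above can be sketched in a few lines. The Chebyshev heuristic here (diagonal steps cost one) is an assumption for illustration, not necessarily what TFS uses:

```cpp
#include <cstdint>
#include <cstdlib>
#include <algorithm>

// Estimated remaining cost (h) from (x, y) to the target (tx, ty).
// Chebyshev distance: a diagonal step covers one unit on both axes.
inline uint16_t heuristic(uint16_t x, uint16_t y, uint16_t tx, uint16_t ty) {
    return static_cast<uint16_t>(std::max(std::abs(tx - x), std::abs(ty - y)));
}

// f is the cost paid so far (g) plus the estimated cost remaining (h);
// a lower f marks a more promising node.
inline uint16_t fScore(uint16_t g, uint16_t h) {
    return g + h;
}
```

With h = 0 for every node, this degenerates into Dijkstra's algorithm, which is exactly the difference described above.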

The old algorithm used a ton of memory so trying to implement it with the changes I have made for this algorithm will not work well.

You can check this website to get a visual of the difference between the two algorithms.
Old pathfinding: Dijkstra's
New: the A* (AStar) algorithm

https://qiao.github.io/PathFinding.js/visual/

@gesior My goal was exactly that: to eliminate the memory leak.
My initial solution, although not perfect, solves the problem, and I've been using it for weeks without any issues.
I know, a second vector called 'toReleaseNodes' may not be ideal, but at least it doesn't leak 70 GB of memory in 20 hours.

Def sounds good and great job on a quick solution that is more than good enough.

I will create a new PR with optimizations to the efficiency of the algorithm.

Contributor

@NRH-AA NRH-AA left a comment

These changes look fine to me, as long as it is tested and the memory leak is addressed. Just check out #4934 first, because it improves performance and takes care of the memory problem as well.

@gesior gesior mentioned this pull request Jun 28, 2025
@gesior
Contributor

gesior commented Jun 28, 2025

Changes in #4934 are much better.

Even with all the optimizations proposed by @ranisalt (struct std::hash<std::pair<uint16_t, uint16_t>>) in this PR, it takes 5843 ms to process all monsters on the TFS .otbm.
With PR #4934 it's 4350 ms (25% less than this PR and 45% less than the old TFS algorithm). There is no memory leak, no crashes, and no reports from the address sanitizer (I did not test this PR with the address sanitizer).

@alysonjacomin
Contributor

@gesior Nice !

@ranisalt
Member

ranisalt commented Jul 1, 2025

@NRH-AA is this still relevant after #4934 ?

@gesior
Contributor

gesior commented Jul 1, 2025

@NRH-AA is this still relevant after #4934 ?

No. The memory leak is fixed by #4934, and we also get a 60% CPU usage reduction vs. the old algorithm.
We can close this PR.

@jacksonie jacksonie closed this Jul 1, 2025
@jacksonie jacksonie deleted the fix-astar-node-mem-leak branch July 1, 2025 23:38