Skip to content

Increase supported Y-level from 127 to 255#20

Merged
babbaj merged 46 commits intobabbaj:masterfrom
underscore-zi:master
Apr 3, 2025
Merged

Increase supported Y-level from 127 to 255#20
babbaj merged 46 commits intobabbaj:masterfrom
underscore-zi:master

Conversation

@underscore-zi
Copy link
Copy Markdown
Contributor

Key thing is expanding the Chunk structure to store 16 sections expanding the range of possible support y-values from 128 to 256.

In the nether this only really adds the ability to pathfind on the nether roof. The expanded range makes it possible to pathfind across the full positive-y range of the overworld and end dimension chunks. Which is what I've been working on adding into Baritone.

@leijurv
Copy link
Copy Markdown
Collaborator

leijurv commented Feb 11, 2025

In my opinion there is no need to pathfind in the overworld or end. If you want auto elytra fly, that would be better added to the actual mod, meaning baritone (or https://github.com/babbaj/nether-pathfinder-mod). Simply because the player can fly to above y=320, then point straight at their destination. There is no pathfinding required. Same goes for the nether roof - if you're on the nether roof, no need to pathfind, simply point at the destination and use fireworks periodically.

So I think that if you want to auto elytra fly in overworld, in the end, or above the nether roof, that's better added to baritone as a simple "point straight at the destination at all times" option. And it should tell you in chat to get up above the build limit before you can turn it on.

@underscore-zi
Copy link
Copy Markdown
Contributor Author

So I agree that the pathfinder isn't necessary if you want to fly at altitude.

It is useful for getting to that altitude in the first place though, which was my initial take on adding it into Baritone. Pathfinder just handled the launch and landing. Though it ran into some issues when terrain had too much prominence, its pretty fun to watch it find its way out of large caves.

I think there are reasons to fly at lower altitudes though, sometimes the trip is short and it makes more sense to just fly direct to it. I kinda get a kick out of every time I just point and click on the Xaero Plus map and have it fly me right there, or just some timed flights between AFK platforms.

The API use-case has been my primary motivation as Baritone has a good elytra control system in place but its only usable for nether travel. I can have my own mods path to a given destination and let Baritone worry about the navigation. I currently use it along side a stash scanner so it orbits close enough for me to freecam the area and between findings flies at a level I can still see terrain from.

@leijurv
Copy link
Copy Markdown
Collaborator

leijurv commented Feb 11, 2025

@babbaj what do you think

Another issue: I don't think that following the nether-pathfinder path makes sense above y=320. Doesn't it do all the "straight" flying first, then all the diagonal? e.g. flying from 0,0 to 5000,6000, wouldn't it like fly 0,0 -> 5000,5000 -> 5000,6000? or maybe 0,0 -> 0,1000 -> 5000,6000? not sure which. Anyway I don't think nether-pathfinder will achieve the desired result of pointing straight at the goal once it's up above the build limit? Therefore, wouldn't you want it to "take over" and point straight at the goal? (where "it" means the actual elytra control mod, baritone in this case)

In this PR's code, why 256 and not 384?

If there were configurable height I'd definitely want it to be a template parameter, so that the nether 0-128 behavior is unchanged with respect to memory use / performance etc. (I recognize that that is a much larger change, and it would have to be passed all over the place)

@underscore-zi
Copy link
Copy Markdown
Contributor Author

Another issue: I don't think that following the nether-pathfinder path makes sense above y=320.

Yeah, when you mentioned pointing straight at the target you meant along the lines of yaw-locking rather than the pathfinder generating the straight route. Or, as I was doing, just generating a path independent of the pathfinder.

In this PR's code, why 256 and not 384?

Looks like two things:

  1. My oversight, I thought the build limit was y=255, that explains why its not 320. Its a pretty easy change though.
  2. Supporting negative-y positions means needing to translate the 0-based start of all the segments from the pathfinder to start at -64. Not impossible but it felt like a messy solution compared to just using an index offset when reading the chunk to start at y=0.

I'm not sure this pathfinder is useful for underground flying as even with x4min off it tended to struggle in the tighter caves.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 12, 2025

just because elytra flying out of caves sounds cool I think this is worth considering. Only thing i want to make sure is that the performance in the nether isn't made worse (pretty sure it won't). The increased memory usage can be fixed with some allocation/page table tomfoolery (not templates).

@underscore-zi
Copy link
Copy Markdown
Contributor Author

I played around a bit today to get the negative-y levels working and correct my mistake regarding the positive-y range.

It does outright fail in the tighter caves, but I could see myself using this in those larger caverns. Using #thisway it'll eventually pathfind out to keep making progress. And its pretty cool to watch: https://github.com/user-attachments/assets/c97bc129-ae79-4151-bf4b-a53a1451bf09

@babbaj When it comes to memory usage what type page-table tomfoolery are you thinking?

My initial thought would basically be setting up either a page to reflect an empty section or an entire empty chunk. Then copy on write, since the majority of a chunk's section's are empty they won't need to be written. Do I have the right idea or were you thinking something else?

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 13, 2025

When it comes to memory usage what type page-table tomfoolery are you thinking?

allocate a huge range of virtual memory and make a bump allocator. empty sections that are never written to will never get assigned a physical page and reads will just return 0

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Feb 20, 2025

At a high-level there are two changes in the first commit:

  1. Allocations are grouped by Region, this still avoids calls for every chunk and makes cleanup easy using the Region position.

  2. Chunk States are stored out of band as part of the Region instead of the Chunk.

The chunk data fits exactly inside 3x4kb pages, so the extra byte would offset each following chunk adding up across an entire region potentially pushing terrain across a page boundary that wouldn't have been crossed otherwise. This is most noticeable in the nether since it naturally fits in 1 page and has terrain right to the page boundary so it basically doubled usage. In the overworld the impact depends more on the terrain.

Some memory usage statistics from trying to load 25 regions which resulted in loading 5269 chunks from my Baritone the_nether_128 cache.

Original nether-only implementation:

  • Commit Size: 22992 KB (22 MB)
  • Working Set: 26776 KB (26 MB)

New Implementation:

  • Commit Size: 161328 KB (157 MB)
  • Working Set: 27112 KB (26 MB)

For comparison the new implementation loading 5975 chunks (from 6 regions) from the overworld cache.

  • Commit Size: 75108 KB (73 MB)
  • Working Set: 48004 KB (47 MB)

The second commit, is adapting the public interface to support the MC coords which may be negative. I was doing this on the Java side but I think it makes more sense to do it in the library.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 21, 2025

Regions are an unnecessary abstraction that i will get rid of when i get around to merging this and implementing my idea.

The second commit, is adapting the public interface to support the MC coords which may be negative. I was doing this on the Java side but I think it makes more sense to do it in the library.

To keep the interface consistent and simple nether pathfinder chunks should always be 384 blocks tall and y coordinates should always start at 0. We do not need to know the dimension or be told what its height is. If the caller chooses to pathfind starting above the nether roof or has created a big hole in their roof there's no reason that shouldn't work. For the typical use case A* will simply choose not to pathfind through the bedrock layer.

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Feb 21, 2025

We do not need to know the dimension

parseBaritoneRegion needs to know the dimension to properly parse the file. It can be derived from the path if you want to require that thought.

To keep the interface consistent and simple nether pathfinder chunks should always be 384 blocks tall and y coordinates should always start at 0.

My thinking was that it simplifies the interface as the caller doesn't need to be aware of how the data is internally structured, just make a request about A->B.

If the caller chooses to pathfind starting above the nether roof or has created a big hole in their roof there's no reason that shouldn't work.

Just to be clear, that does already work, but yeah I agree, no point in an artificial limit of 256 when it works to 384.

Regions are an unnecessary abstraction that i will get rid of when i get around to merging this and implementing my idea.

So, I took the main part of your idea being to use the demand-zero page which Regions are clearly taking good advantage of by the working set stats. Is it the bump allocator you want or some other idea you have?

I experimented with a bump allocator which worked fine with straight line flight where the chunks were effectively FIFO. They'd fill a pool up, and then the whole pool could be freed at once fairly cleanly. In caves and the nether usage would increase because of backtracking. Any given pool would contain a mix of chunks from the good/bad paths so the pool couldn't be cleanly freed.

That led to thinking about a block allocator so I could reclaim memory from the middle of the pool without waiting for the entire pool to be free at which point I considered Regions to be a natural grouping, would be mostly zero filled and conveniently could all be unloaded at once; simplifying management.

I'm fine with providing a different implementation if you want.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 21, 2025

parseBaritoneRegion needs to know the dimension to properly parse the file

I'm pretty sure it does not and the caller is already expected to pass the dimension specific directory

They'd fill a pool up, and then the whole pool could be freed at once fairly cleanly

This is not what i had in mind, what i was thinking is allocating a single huge range of memory terabytes in size and bump allocating from that then individual chunks can be freed with munmap or madvise

@leijurv
Copy link
Copy Markdown
Collaborator

leijurv commented Feb 21, 2025

munmap

🤯

@leijurv
Copy link
Copy Markdown
Collaborator

leijurv commented Feb 21, 2025

allocating a single huge range of memory terabytes in size and bump allocating from that then individual chunks can be freed with munmap or madvise

Why on earth? I don't believe there is any need to implement our own memory allocator. System malloc/free will be way more reliable and portable

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 21, 2025

malloc is inefficient and uncool

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Feb 21, 2025

I'm pretty sure it does not

The number of bytes following a set CHUNK_PRESENT byte varies depending on dimension and is not indicated in the file itself.

the caller is already expected to pass the dimension specific directory

Yes, but requiring it also have the correct name is a new restriction. It'll mostly be used in cases where that doesn't matter. I just lean towards letting intent be explicit rather than trying to derive it.

what i was thinking is allocating a single huge range of memory terabytes in size and bump allocating from that then individual chunks can be freed with munmap or madvise

Okay, I can do that, not like it needs to support 32bit builds anyhow.

Do you have any thoughts regarding managing chunk state? Since for this the chunk data needs to be page aligned in order to unload them on demand, or wait until a page aligned section can be coalesced.

Bit of a hack but could utilize the bit for (0,0,0).

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 21, 2025

From what i’ve read windows actually doesn’t allow reserving more memory than could potentially be physically allocated (cringe) so that exact idea won’t work but some sort of arena allocator would be nice but i don’t care about windows so not my problem lol. Also to make leijurv happy there could be a configurable malloc based allocator.

Do you have any thoughts regarding managing chunk state?

a separate array

not like it needs to support 32bit builds anyhow.

having a malloc based allocator would also fix this

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Feb 21, 2025

a separate array

Associating a chunk to an index could be pretty annoying when talking about the TB+ storage block, and will become sparse overtime as most chunks get freed. So would need extra bookkeeping to manage the indices.

but some sort of arena allocator would be nice

An arena could more practically store a state array and use relative offset. Though if going the area route the benefit of demand zero paging is lost when the software starts reusing chunks out of the arena. Unless the pages are unloaded and reloaded for each chunk, at which point why bother with the arena.

If the arena is treated kinda like a bump allocator and don't reuse anything then chunks could be freed on-demand until none are left. I feel like this kinda comes back around to the Region system I already implemented though. Which is a 1024 chunk arena with a state array, but has the benefit of simplified bookkeeping and look-ups. Downside is the less efficient memory usage compared to the size of the chunks I don't think its all that meaningful.

having a malloc based allocator would also fix this

Meh, I don't think the build even does a 32bit build right now does it?

I do want to push back on using malloc though There is a significant memory saving by taking advantage of the demand zero page, its about a 2/3 reduction in the nether and 1/3 in the overworld on average compared to what malloc would use.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 22, 2025

So would need extra bookkeeping to manage the indices.

i was thinking array because it would be easy with one big range of memory but we can't do that on windows so we can just put the extra state in the hashmap along with the pointers to chunks

I also think I have changed my mind and think windows peasants do not deserve to overcomplicate this codebase with their own allocator

your region abstraction also isn't just a simple allocate/deallocate interface so that aint it

@underscore-zi
Copy link
Copy Markdown
Contributor Author

your region abstraction also isn't just a simple allocate/deallocate interface so that aint it

Is it not? It just allocs a new region if a chunk request comes in for a chunk in a region not yet in the map. And the free logic is basically unchanged from the chunk cache logic, loop over the region map comparing distancesq to find regions that can entirely be freed.

Couple utility methods for accessing the array to do the modulo on position.

I do see I left in a dimension field, that's unused and should have been removed before committing.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 22, 2025

Is it not?

it changes the code from a simple map<chunkpos, chunk*> to needing to fetch a region then fetch chunks out of the region

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Feb 22, 2025

Could a solution be to generate a map<chunkpos, chunk*> update it as regions are allocated/freed?

That would be aligned with just treating the regions as arenas and keeping the same public interface. Though need to provide chunkstate* also.

@leijurv
Copy link
Copy Markdown
Collaborator

leijurv commented Feb 22, 2025

malloc is inefficient and uncool

100% fake news

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 22, 2025

malloc is inefficient and uncool

100% fake news

malloc has to handle allocations of any size, any lifetime, from any thread

i do not believe you can do that anywhere close to as fast and efficiently as a bump allocator

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Feb 22, 2025

Could a solution be to generate a map<chunkpos, chunk*> update it as regions are allocated/freed?

That would be aligned with just treating the regions as arenas and keeping the same public interface. Though need to provide chunkstate* also.

id rather this pr just go back to being just its original purpose and i play with different allocation strategies myself. realistically i don't think the memory usage matters much anyways.

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Mar 12, 2025

Looks like there already is a check to override some blocks like tall-grass as air in the chunk packer. I feel that adding mushrooms there could make sense. It wouldn't rewrite already cached chunks but would work for the future.

Alternatively, on the Java side two settings could be introduced:

  1. A useCache flag to indicate if the cache should be passed into the pathfinder at all. Currently Baritone doesn't make use of loading from the cache so it wouldn't be a change from the current state of things.
  2. An allowNetherRoof flag to tell Java if it should load y>127 in the nether.

Those two settings on the Baritone side should lead to a similar performance in the nether.

Though I guess thats also outside the scope of this PR and project

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 12, 2025

I just added an argument that will allow baritone to limit the height to 128

@underscore-zi
Copy link
Copy Markdown
Contributor Author

Cool, that works. Results are a pretty clear drop in commit and usage.

2025-03-12-020741497-f6e44

I am running into another issue though. Every so often it looks like it tries to path through a wall. (2025-03-12-022657795-915d4.webm).

My first through was that it could be a bug in loading the chunks but that fact that it can recalculate a path around it after moving in spectator mode indicates to me that the chunk data itself is okay.

I suspect that this is due to callback in findPathSegment no longer considering if the chunk is from java. Resulting some segments being generated that pass through air chunks and given the right terrain, it never replaces that with a better route.

I never experienced this prior to that change but it might have existed prior and my flights just were not long enough to trigger it. So I figured I'd mention it now before I dig into it (tomorrow) in case you've seen it.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 12, 2025

setChunkState wasn't holding a lock to the mutex and I fixed the chunk state not being used in findPathSegment.

I also want to get rid of setChunkState because jni is expensive and it can be changed to just writing to a pointer.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 13, 2025

I also noticed that no locks are held when calling raytrace besides the culling lock so raytracing might be happening at the same time as chunks are being added to the cache.

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Mar 13, 2025

Your commits have do seem to be preventing the specific bug I was running into.

Regarding raytracing though, since Java is repeating those raytraces I think any incorrect trace would be self-corrected pretty quickly. I wonder though if that might explain some of those occasion clips that happens into newly loaded terrain. (afaik not a new bug with this work)

I also want to get rid of setChunkState because jni is expensive and it can be changed to just writing to a pointer.

I imagine just something like returning both the data and state pointers as a long[] and let Java do an unsafe write?

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 13, 2025

I looked at how it's used and it makes more sense to just to make getOrCreateChunk set the state or add a new function that does

@underscore-zi
Copy link
Copy Markdown
Contributor Author

getOrCreateChunk does get used during passable checks on the Java side which are usually read-only so it might falsely flag some chunks as being from Java early.

If going with a second function, perhaps renaming the original and having a pair of functions: getOrCreateFakeChunk and getOrCreateJavaChunk would make sense. Though that could be replaced with a bool argument at the cost of extra branching.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 14, 2025

that's probably a bug, i don't passable should return true for chunks that we don't have data for.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 14, 2025

Along with some api changes I removed the mutex locking from the jni because I think the thread safety should be all done in java and needs to be fixed

also can you push your baritone changes 🥺 👉 👈

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Mar 15, 2025

I've just pushed some of my baritone work onto my fork at https://github.com/underscore-zi/baritone I haven't made it work with your most recent commits though and there are still some TODOs in there.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 16, 2025

Pushed my changes to https://github.com/babbaj/baritone/tree/nether-elytra-update

Seems to work but there's a few bugs:

  • There's a bug in baritone region parsing in the overworld. Some call to decomp fails because "not enough data"
  • Reading baritone region files can not be disabled
  • Trying to start a path above the roof with elytraAllowAboveRoof disabled causes nether-pathfinder to exit the process with the message "retard"
  • When flying above the roof, if the goal is below the roof (goalxz will default to 64) baritone will keep trying to path through unloaded chunks to get below the roof.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 16, 2025

can't reproduce the baritone region crash. probably just a corrupt file or wasn't fully written to.

@underscore-zi
Copy link
Copy Markdown
Contributor Author

Yeah, I was going to comment later after I addressed some of the other things asking if it was a corrupt file. I've encountered that with a corrupted region file before.

A try/catch around parseBaritoneRegion has worked on my end and I think makes sense since that failure isn't really a non-recoverable issue.

@underscore-zi
Copy link
Copy Markdown
Contributor Author

There's a bug in baritone region parsing in the overworld. Some call to decomp fails because "not enough data"

I believe this is just if there are corrupted region files. Would make sense to not crash though, the try/catch mostly works for me, but I managed to get one terminate with a crafted bad region (access violation) so I want to figure that out.

Reading baritone region files can not be disabled

Pushed a setting to allow that to be disabled.

Trying to start a path above the roof with elytraAllowAboveRoof disabled causes nether-pathfinder to exit the process with the message "retard"

I also pushed some checks when starting a flight to ensure the player and goal are within appropriate bounds and prints a message letting the user know the limits.

When flying above the roof, if the goal is below the roof (goalxz will default to 64) baritone will keep trying to path through unloaded chunks to get below the roof.

This could be fixed by defaulting y to player.y or using a max(player.y, 64) to allow it to be used for potential cave escapes. The current landing code doesn't like goals high in the air though, it'll zoom in to the landing air column often overshooting and dying.

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Mar 17, 2025

This could be fixed by defaulting y to player.y or using a max(player.y, 64) to allow it to be used for potential cave escapes. The current landing code doesn't like goals high in the air though, it'll zoom in to the landing air column often overshooting and dying.

I would prefer changing the goal only when the player is close to it

@underscore-zi
Copy link
Copy Markdown
Contributor Author

underscore-zi commented Mar 26, 2025

I've pushed some changes over on the baritone fork, when setting a Y for a GoalXZ it will check if the player is on the nether roof and use 128 as default the goal Y if they are.

I did try for a more generic approach first of using the current Player Y as the default Y value. It fixed the immediate issue but flying in the lower part of the nether was a worse experience when it came to trying to get near enough to start landing. Similar issues with caves in the overworld.


The GoalXZ change is the main one for this PR. I made some landing code changes also but I imagine discussion around that should take place on a Baritone PR?

@0-x-2-2
Copy link
Copy Markdown
Collaborator

0-x-2-2 commented Apr 1, 2025

🔥

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Apr 2, 2025

gonna make sure this works on m1 then probably just merge it

@babbaj
Copy link
Copy Markdown
Owner

babbaj commented Apr 2, 2025

This could be fixed by defaulting y to player.y or using a max(player.y, 64) to allow it to be used for potential cave escapes. The current landing code doesn't like goals high in the air though, it'll zoom in to the landing air column often overshooting and dying.

I would prefer changing the goal only when the player is close to it

implemented here by changing the destination y if the player is on the wrong side of the rood
babbaj/baritone@cf3260c

I think we can also make a draft pr in baritone now

@babbaj babbaj merged commit b331813 into babbaj:master Apr 3, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants