Conversation

@franz1981
Contributor

@franz1981 franz1981 commented Jul 2, 2024

Motivation:
If a cache's FastThreadLocalThread owner wins the race to remove the cache, its finalizer (kept for debugging capabilities) will still retain a strong reference to it, causing a few classes to leak (and eventually, their ClassLoader). Although we cannot stop finalizers from waiting for the finalization pass, we can reduce the memory footprint of "leaked" instances before finalization happens.

Modification:
Non-debug early cache removal can clear the cache's strong reference within FreeOnFinalize, making it an empty shell, eligible for GC.

Result:
Smaller memory footprint while waiting for finalization to happen.
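The mechanism described above can be sketched roughly as follows. This is an illustrative model only, not Netty's actual FreeOnFinalize implementation: the class and method names here are invented, and the finalizer is modeled as an ordinary method so the sketch stays deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch (hypothetical names, not Netty's real classes): the
// holder keeps a strong reference to the cache as a finalization safety net,
// but when the owning thread frees the cache early, the reference is nulled
// out so the holder becomes an "empty shell" and the cache (plus the classes
// it pins) is GC-eligible before the finalization pass ever runs.
public final class FreeOnFinalizeSketch {

    static final class Cache {
        final AtomicBoolean freed = new AtomicBoolean();
        void free() { freed.compareAndSet(false, true); }
    }

    private volatile Cache cache;

    FreeOnFinalizeSketch(Cache cache) { this.cache = cache; }

    // Early (non-debug) removal by the owning thread: free the cache and
    // drop the strong reference held for the finalization safety net.
    void freeEarly() {
        Cache c = cache;
        cache = null; // the "empty shell" state described in the PR
        if (c != null) {
            c.free();
        }
    }

    // Stand-in for what finalize() would do if the thread never cleaned up;
    // after freeEarly() it finds nothing to free.
    void onFinalize() {
        Cache c = cache;
        if (c != null) {
            c.free();
        }
    }

    boolean retainsCache() { return cache != null; }
}
```

The key point is that the holder object itself may still sit in the finalization queue, but it no longer transitively retains the cache or its ClassLoader.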

@franz1981
Contributor Author

franz1981 commented Jul 2, 2024

@normanmaurer @chrisvest It would be great if we could just let FastThreadLocalThreads with cleanupFastThreadLocals == true skip the finalizer entirely, trusting them, but maybe I'm making this too simple and there's still a reason to use finalizers for those, wdyt?

@franz1981
Contributor Author

franz1981 commented Jul 2, 2024

FYI @normanmaurer @chrisvest I've added 72c4731b0b767cfb1864c60a757944fd1bb2450a which pushes the approach further by enabling an opt-in to NOT use finalizers for FastThreadLocalThreads that take care of cleaning up their FastThreadLocal instances.

Member

@chrisvest chrisvest left a comment

I ran the non-broken NettyLeak1 reproducer from #12749, and it looks like the issue is resolved. So… controversial suggestion: maybe we don't need to make the leak check and can always clear the FreeOnFinalize.cache field.

@chrisvest
Member

Here's a fixed version of NettyLeak0 from #12749 which also shows no leaks:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PoolChunkMetric;
import io.netty.buffer.PooledByteBufAllocator;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class NettyLeak0 {

    public static void main(String[] args) throws Exception {
        System.setProperty("io.netty.allocator.useCacheForAllThreads", "true");
        System.setProperty("io.netty.allocator.maxOrder", "9");
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
            e.printStackTrace();
            Runtime.getRuntime().halt(42);
        });

        int bufSize = 32 * 1024;

        AtomicInteger longLiving = new AtomicInteger();
        List<ByteBuf> longLivingSink = Collections.synchronizedList(new ArrayList<ByteBuf>());
        ArrayBlockingQueue<Thread> threads = new ArrayBlockingQueue<>(64);
        for (int i = 0; true; i++) {
            Thread thread = new Thread(() -> {
                for (int x = 0; x < 256; x++) {
                    if (x != 0) {
                        // allocate different size classes to fill more thread caches
                        for (int j = 0; j <= 10; j++) {
                            ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(bufSize >> j);
                            buf.release();
                        }
                    } else {
                        // only one buf is alive after thread termination
                        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(1);
                        longLiving.incrementAndGet();
                        longLivingSink.add(buf);
                    }
                }
            });
            if (!threads.offer(thread)) {
                threads.take().join();
                threads.put(thread);
            }
            thread.start();

            if (i % 1000 == 0) {
                System.gc(); // to get a chance for io.netty.buffer.PoolThreadCache.finalize
                System.out.println(PooledByteBufAllocator.DEFAULT.metric());
                System.out.println(
                        "chunks count: " +
                                PooledByteBufAllocator.DEFAULT.metric().directArenas().stream()
                                        .flatMap(m -> m.chunkLists().stream())
                                        .mapToInt(m -> {
                                            int count = 0;
                                            for (PoolChunkMetric __ : m) {
                                                count += 1;
                                            }
                                            return count;
                                        })
                                        .sum()
                );
                int ll = longLiving.get();
                System.out.println("long living: " + ll);
                // real size of a buf will be > 1b & < 1kb, but just estimate it as 1kb
                System.out.println("long living size (MB): " + ll / 1024);
                System.out.println();
                longLivingSink.removeIf(buf -> buf.release() && decrement(longLiving));
            }
        }
    }

    private static boolean decrement(AtomicInteger longLiving) {
        longLiving.addAndGet(-1);
        return true;
    }
}

@franz1981
Contributor Author

@normanmaurer wdyt about the @chrisvest suggestion?

> maybe we don't need to make the leak check and can always clear the FreeOnFinalize.cache field.

@normanmaurer
Member

> @normanmaurer wdyt about the @chrisvest suggestion?
>
> > maybe we don't need to make the leak check and can always clear the FreeOnFinalize.cache field.

Sounds good to me

@franz1981 franz1981 marked this pull request as ready for review July 4, 2024 09:58
@franz1981
Contributor Author

PTAL @chrisvest @normanmaurer

Not very happy that I've added a new sys prop here, but I want to keep it safe for existing users...
Anyone who needs to keep creating/stopping event loop threads (e.g. the dev mode in Quarkus) can decide to avoid finalization and just set it to speed up dropping the previous classloader (@gsmet please try this for your issue)

@franz1981 franz1981 force-pushed the 4.1_classloader_leak branch from 14cbf9d to 9375b45 Compare July 4, 2024 10:01
@normanmaurer
Member

> PTAL @chrisvest @normanmaurer
>
> Not very happy that I've added a new sys prop here, but I want to keep it safe for existing users... Anyone who needs to keep creating/stopping event loop threads (e.g. the dev mode in Quarkus) can decide to avoid finalization and just set it to speed up dropping the previous classloader (@gsmet please try this for your issue)

That's good to me

@franz1981 franz1981 force-pushed the 4.1_classloader_leak branch from 9375b45 to f618686 Compare July 4, 2024 10:53
@franz1981
Contributor Author

Let's see what the CI thinks @normanmaurer and thanks again!

@gsmet

gsmet commented Jul 4, 2024

Let me have a look for my use case before merging. Thanks!

@gsmet

gsmet commented Jul 4, 2024

I can confirm that I don't see the finalizers trying to load classes after my CL is closed, which looks promising!

gsmet added a commit to gsmet/quarkus that referenced this pull request Jul 4, 2024
…eads

This is going to be useful once
netty/netty#14155 lands in Quarkus.
@franz1981
Contributor Author

So @normanmaurer, now we just need CI and @chrisvest to be happy about this

@franz1981
Contributor Author

Windows seems to be the only unhappy one, likely for unrelated reasons; if you are good to go, I am done here @normanmaurer

@normanmaurer
Member

I am happy if @chrisvest is happy... @chrisvest PTAL again

Member

@chrisvest chrisvest left a comment

Looks good.

@franz1981
Contributor Author

@normanmaurer I can merge it myself, ok?
My first time merging a PR here (no jokes!)

@normanmaurer
Member

@franz1981 sure... let's go. Please also cherry-pick it to the other related branches.

@normanmaurer normanmaurer added this to the 4.1.112.Final milestone Jul 9, 2024
@franz1981 franz1981 merged commit 8039232 into netty:4.1 Jul 9, 2024
franz1981 added a commit that referenced this pull request Jul 9, 2024
…ect (#14155)

@huibinliupush
Contributor

huibinliupush commented Sep 6, 2024

> @normanmaurer @chrisvest It would be great if we could just let FastThreadLocalThreads with cleanupFastThreadLocals == true skip the finalizer entirely, trusting them, but maybe I'm making this too simple and there's still a reason to use finalizers for those, wdyt?

hi @franz1981, could you please tell me why a FastThreadLocalThread (cleanupFastThreadLocals == true) must use a finalizer to clean up its PoolThreadCache?

    private static boolean useCacheFinalizers(Thread current) {
        if (!defaultDisableCacheFinalizersForFastThreadLocalThreads()) {
            return true;
        }
        // why  FastThreadLocalThread(cleanupFastThreadLocals == true) must use finalizer ?
        return current instanceof FastThreadLocalThread &&
               ((FastThreadLocalThread) current).willCleanupFastThreadLocals();
    }

In my opinion, when a FastThreadLocalThread (cleanupFastThreadLocals == true) exits, FastThreadLocal.removeAll() will be called, and in the onRemoval method of PoolThreadLocalCache, the PoolThreadCache will be freed, so there is no need for any finalizer.

        @Override
        protected void onRemoval(PoolThreadCache threadCache) {
            threadCache.free(false);
        }

Why can't we just do this?

    private static boolean useCacheFinalizers() {
        return !defaultDisableCacheFinalizersForFastThreadLocalThreads();
    }

gsmet added a commit to gsmet/quarkus that referenced this pull request Jan 7, 2025
…eads

This is going to be useful once
netty/netty#14155 lands in Quarkus.
gsmet added a commit to gsmet/quarkus that referenced this pull request Jan 16, 2025
…eads

This is going to be useful once
netty/netty#14155 lands in Quarkus.
carlesarnal pushed a commit to carlesarnal/quarkus that referenced this pull request Feb 6, 2025
lasteris pushed a commit to lasteris/quarkus that referenced this pull request Sep 12, 2025