Skip to content

The HikariPool intermittently drops to zero, because of single thread connection-adder it's causing HikariCP to fail in creating new connections #2274

@001mertiya

Description

@001mertiya

Hi @brettwooldridge,

We are encountering issues where the total number of connections gradually decreases from the maximum pool size to zero. This situation is exacerbated by a single-threaded connection adder, which is causing HikariCP to fail in renewing or creating new connections.

2024-11-19 10:53:55.674 ERROR [MGQL Discovery Service,d27a993960a2d692,d27a993960a2d692] 1 --- [   scheduling-1] o.h.engine.jdbc.spi.SqlExceptionHelper   : wm-jdbc-pool-1 - Connection is not available, request timed out after 60000ms.
2024-11-19 10:53:55.677 ERROR [MGQL Discovery Service,d27a993960a2d692,d27a993960a2d692] 1 --- [   scheduling-1] c.w.s.k.kstar.services.DBMonitorService  : DBMonitor failed

org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:467) ~[spring-orm-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:595) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:382) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.data.jpa.repository.support.CrudMethodMetadataPostProcessor$CrudMethodMetadataPopulatingMethodInterceptor.invoke(CrudMethodMetadataPostProcessor.java:174) ~[spring-data-jpa-2.7.18.jar!/:2.7.18]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:241) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at jdk.proxy2/jdk.proxy2.$Proxy301.count(Unknown Source) ~[na:na]
at com.walmart.store.kgraph.kstar.services.DBMonitorService.check(DBMonitorService.java:52) ~[mgql-registry-core-0.524.0.jar!/:0.524.0]
	at jdk.internal.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:798) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:89) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.cloud.sleuth.instrument.scheduling.TraceSchedulingAspect.traceBackgroundThread(TraceSchedulingAspect.java:70) ~[spring-cloud-sleuth-instrumentation-3.1.10.jar!/:3.1.10]
	at jdk.internal.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:634) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:624) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:72) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:707) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at com.walmart.store.kgraph.kstar.services.DBMonitorService$$EnhancerBySpringCGLIB$$154d6a6b.check(<generated>) ~[mgql-registry-core-0.524.0.jar!/:0.524.0]
	at jdk.internal.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-5.3.31.jar!/:5.3.31]
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.3.31.jar!/:5.3.31]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:37) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.internal.SessionImpl.connection(SessionImpl.java:516) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:152) ~[spring-orm-5.3.31.jar!/:5.3.31]
	at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:421) ~[spring-orm-5.3.31.jar!/:5.3.31]
	... 49 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: wm-jdbc-pool-1 - Connection is not available, request timed out after 60000ms.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) ~[HikariCP-4.0.3.jar!/:na]
	at com.walmart.platform.data.walmartjdbc.jdbc.WmtDataSource.getConnection(WmtDataSource.java:36) ~[walmart-jdbc-connector-1.0.42.jar!/:na]
	at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	... 53 common frames omitted

When the total connections reach 0, all requests to the connection pool fail due to connection not available and the following exception is thrown:

org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 60002ms.

Here's the details for our application:

socket timeout: 77 seconds
login timeout: 85 seconds
hikari connection timeout: 60 seconds

Hikari version: 4.0.3
mssql-jdbc version: 12.6.3.jre8
msal4j version: 1.15.1

Issue Details:

We have observed multiple instances where connections are gradually drained, resulting in no available connections in the pool to handle incoming traffic. Consequently, requests are left waiting for connections and ultimately time out. Ideally, Hikari should maintain a minimum number of connections, with the default minimum set at 10. This issue has been noted in a limited number of pods and may be triggered by network problems or server degradation. Total, idle and actrive connections are zero for hours, as you can see in the below graph. To address this situation, we had to terminate the affected pods.

Screenshot 2024-12-17 at 5 22 41 PM

Captured a thread dump from one of the impacted pods for analysis. It appears that a single-threaded connection adder is responsible for adding connections to the pool when the count falls below the minimum idle threshold. However, this adder became unresponsive while waiting to establish a connection. Below is the complete stack trace for reference.

"wm-jdbc-pool-1 connection adder" #130175 daemon prio=5 os_prio=0 cpu=12.23ms elapsed=13998.58s tid=0x000070cf721939f0 nid=0x1fc8a waiting on condition [0x000070cf68dfc000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.8.1.101/Native Method)
- parking to wait for <0x00000006d5c03260> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.8.1.101/LockSupport.java:211)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.8.1.101/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.8.1.101/ForkJoinPool.java:3465)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.8.1.101/ForkJoinPool.java:3436)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.8.1.101/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get(java.base@17.0.8.1.101/CompletableFuture.java:2072)
at com.microsoft.sqlserver.jdbc.SQLServerMSAL4JUtils.getSqlFedAuthToken(SQLServerMSAL4JUtils.java:96)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:6022)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:5989)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:5823)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:322)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:130)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:6881)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:5428)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:5360)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7748)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4410)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:3847)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:3372)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:3197)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1971)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1263)
at com.walmart.platform.data.walmartjdbc.jdbc.WalmartJDBCDriver.connect(WalmartJDBCDriver.java:73)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:121)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
at com.zaxxer.hikari.pool.HikariPool.access$100(HikariPool.java:71)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:726)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:712)
at java.util.concurrent.FutureTask.run(java.base@17.0.8.1.101/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1.101/ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1.101/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17.0.8.1.101/Thread.java:833)

The call to future.get() is a blocking call that waits for the result and if it takes too long or fails to complete, it blocks the current thread indefinitely.

We had a call with mssql-jdbc developers and they agreed that calls to MSAL authentication library done from the mssql-jdbc java driver code use future.get(), potentially leading to hangs in case the library does not receive timely response from the server. All these calls done as part of getFedAuthToken need to have timeouts imposed. And they are working on it as part of this PR.

I was considering that Hikari should be resilient enough to handle these situations, this can happen with any other underlying JDBC driver. It would be beneficial for Hikari to execute the newPoolEntry in a new thread and make the operation time bound. @brettwooldridge What are your thoughts on this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions