[QUESTION] deadlock/starvation with MSI #2601

@ecki

Question

Our product runs in the Karaf OSGi container (and therefore has a somewhat more complex class loader structure) and uses a custom connection pool. We regularly see hangs on startup on a low-CPU Kubernetes pod. The hang sometimes resolves after hours, but startup does not work reliably.

I haven't found the real issue yet; however, we do see an MSI authentication call hanging in the common worker pool.

Question: it is a known issue with MSAL that the common pool can cause problems, and some code in mssql-jdbc already uses a dedicated executor (it looks like the vault/client encryption code does). Would it make sense to also specify a dedicated pool for the authentication?
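To make the idea concrete, here is a minimal sketch of what "a dedicated pool for the auth" could look like. It is not the driver's or MSAL's actual code: `DedicatedAuthExecutor`, `acquire`, and the thread name `auth-token-worker` are hypothetical names, and the token call is stood in for by a plain `Supplier<String>`. The point is only that the blocking work runs on its own threads and the caller's wait is bounded.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, not part of mssql-jdbc or MSAL.
final class DedicatedAuthExecutor implements AutoCloseable {
    // Small fixed pool that is independent of ForkJoinPool.commonPool().
    private final ExecutorService executor = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "auth-token-worker");
        t.setDaemon(true);
        return t;
    });

    // Runs the (possibly blocking) token acquisition on the dedicated pool
    // and gives up with a TimeoutException if it takes longer than the limit.
    String acquire(Supplier<String> tokenSupplier, long timeout, TimeUnit unit) throws Exception {
        return CompletableFuture.supplyAsync(tokenSupplier, executor).get(timeout, unit);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (DedicatedAuthExecutor auth = new DedicatedAuthExecutor()) {
            // Stand-in for the real token call: report which thread did the work.
            String thread = auth.acquire(() -> Thread.currentThread().getName(), 5, TimeUnit.SECONDS);
            System.out.println("token acquisition ran on: " + thread);
        }
    }
}
```

With something like this, a stuck token request would occupy one of the dedicated threads instead of a common pool worker, and the caller would see a timeout rather than waiting indefinitely.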

URL

jdbc:sqlserver://<redacted>.database.windows.net:1433;database=<redacted>;
    authentication=ActiveDirectoryManagedIdentity;user=<redacted>;
    sendStringParametersAsUnicode=false;encrypt=true;trustServerCertificate=false;
    hostNameInCertificate=*.database.windows.net;loginTimeout=30

The loginTimeout does not help.

SQLServerSecurityUtility.getManagedIdentityCredAuthToken

Here is a partial stack trace of the "hanging" thread.

"ForkJoinPool.commonPool-worker-1" #122 [709] daemon prio=5 os_prio=0 cpu=272.54ms elapsed=936.64s tid=0x00007f955392b160 nid=709 waiting on condition  [0x00007f94e6fec000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@<version>/Native Method)
	- parking to wait for  <0x0000000690a0c728> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@<version>/LockSupport.java:221)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@<version>/AbstractQueuedSynchronizer.java:754)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@<version>/AbstractQueuedSynchronizer.java:1099)
	at java.util.concurrent.CountDownLatch.await(java.base@<version>/CountDownLatch.java:230)
	at reactor.core.publisher.BlockingOptionalMonoSubscriber.blockingGet(BlockingOptionalMonoSubscriber.java:112)
	at reactor.core.publisher.Mono.blockOptional(Mono.java:1831)
	at com.microsoft.sqlserver.jdbc.SQLServerSecurityUtility.getManagedIdentityCredAuthToken(SQLServerSecurityUtility.java:382)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:6041)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:5995)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:5829)
	at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:335)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:130)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:6888)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:5434)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:5366)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7745)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4391)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:3828)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:3385)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:3194)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1971)
	at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1263)
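The trace shows a pool worker parked on a plain `CountDownLatch`. As an illustration of how that pattern can starve a small pool (not a confirmed diagnosis of this hang), the sketch below blocks a worker on a latch that can only be released by another task queued to the same pool. `PoolStarvationSketch` and `innerRanWhileOuterBlocked` are hypothetical names, and a private `ForkJoinPool` stands in for the common pool so the parallelism can be chosen.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

final class PoolStarvationSketch {
    // Returns true if the inner task completed while the outer task was still blocked.
    static boolean innerRanWhileOuterBlocked(int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> {
                CountDownLatch done = new CountDownLatch(1);
                // The inner task needs a free worker in the same pool.
                pool.execute(done::countDown);
                // Plain latch wait: the pool is not told that this worker is blocked,
                // so it does not add a compensating thread. Bounded here so the demo ends.
                return done.await(500, TimeUnit.MILLISECONDS);
            }).get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parallelism=1: " + innerRanWhileOuterBlocked(1));
        System.out.println("parallelism=2: " + innerRanWhileOuterBlocked(2));
    }
}
```

With a parallelism of 1 the only worker is the one waiting, so the inner task should not get to run before the wait times out. With a second worker available it can be picked up. An unbounded `await()` in the same situation would never return.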

I would open a bug report, but our analysis is still early/murky.

We think that, because the common pool is sized dynamically from the number of available processors, the issue is more prominent on pods with 1-2 vCores.
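A quick way to check what the JVM actually sees inside such a pod is to print the processor count next to the common pool parallelism. This is only a diagnostic sketch; `CommonPoolInfo` and `describe` are hypothetical names, while the two JDK calls are standard.

```java
import java.util.concurrent.ForkJoinPool;

final class CommonPoolInfo {
    // Summarizes the CPU count the JVM detected and the resulting common pool size.
    static String describe() {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallelism = ForkJoinPool.getCommonPoolParallelism();
        return "availableProcessors=" + cpus + " commonPoolParallelism=" + parallelism;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported parallelism is very low, one experiment is to start the JVM with the system property `java.util.concurrent.ForkJoinPool.common.parallelism` set to a larger value and see whether the hang still occurs.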
