Skip to content

[Feature] Decouple SeaTunnel Client and Server SEATUNNEL_HOME#10431

Open
LiJie20190102 wants to merge 6 commits intoapache:devfrom
LiJie20190102:feature_decouple_st_home
Open

[Feature] Decouple SeaTunnel Client and Server SEATUNNEL_HOME#10431
LiJie20190102 wants to merge 6 commits intoapache:devfrom
LiJie20190102:feature_decouple_st_home

Conversation

@LiJie20190102
Copy link
Collaborator

@LiJie20190102 LiJie20190102 commented Feb 1, 2026

Purpose of this pull request

Implement feature #10167

Does this PR introduce any user-facing change?

How was this patch tested?

1、Deploy the master node and the slave node on the same machine, set ST_HOME to /opt/soft/st0201/server, and configure the network settings in hazelcast-master.yaml and hazelcast-worker.yaml
2、Deploy the client on another server, with ST_HOME set to /opt/soft/st0201/client, and configure the network settings in hazelcast-client.yaml. This ensures that the package paths for the server and client are different.
3、Submit the task on the client side, such as bin/seatunnel.sh --config config/v2.batch.config.template . After submitting the task, check whether it is executed successfully, and ensure that the server logs do not contain any client file resources

Self-test results

  • Before

    1 client log:

image 2 server master log: 14B051F8C4C1DF729C55470778425E5E
  • After
  1. client log:
AA9CD784258FF87D3C18D08FCA0B2C6C 2. server master log 23FAECAA699F5B0839B9E896EFDEDB6D 3. server worker log 06A7B960FF59587217B06CA80B8D38D7

Check list

@davidzollo
Copy link
Contributor

Thanks for the contribution! This feature is very valuable for flexible deployment scenarios where client and server paths differ.

However, after a detailed review, I found a critical bug in the ClassLoader cache management that will cause memory leaks, and a robustness issue in the path resolver.

1. Critical: Key Mismatch in ClassLoader Cache (Memory Leak)

There is an inconsistency in how the cache key is calculated between getting and releasing the classloader.

  • In getClassLoader (Line 71), the key is calculated using the primitive jars (which contain the logical path with $SEATUNNEL_HOME).
  • In releaseClassLoader (Lines 112-123), you resolve the paths to absolute paths before calculating the key.
// DefaultClassLoaderService.java

// In getClassLoader:
String key = covertJarsToKey(jars); // Key is based on "file:$SEATUNNEL_HOME/..."

// In releaseClassLoader:
Collection<URL> resolvedJars = new ArrayList<>(jars);
PathResolver.resolvePathEnv(resolvedJars); 
// ...
String key = covertJarsToKey(resolvedJars); // Key is based on "file:/opt/seatunnel/..."

Consequence: The keys will never match. releaseClassLoader will fail to look up the entry in classLoaderCache. The reference count will never decrement, and the ClassLoader will never be released, leading to a Permanent Generation memory leak for long-running processes.

Fix Suggestion: In releaseClassLoader, do not use resolvedJars to generate the key. Use the original jars collection to ensure the key matches the one put into the map during getClassLoader.

2. NPE Risk in PathResolver

In PathResolver.java, Common.getSeaTunnelHome() can return null if the environment is not set up correctly.

// PathResolver.java
String home = Common.getSeaTunnelHome();
// ...
String normalizedHome = new File(home).getAbsolutePath(); // This throws NPE if home is null

Fix Suggestion: Please add a null check for home. If SEATUNNEL_HOME is not found, it should either skip the replacement logic safely or throw a descriptive exception rather than a raw NullPointerException.

3. Testing

I highly recommend adding a test case (perhaps in PathResolverTest or an IT) that explicitly sets SEATUNNEL_HOME to null to verify robustness.

@DanielCarter-stack
Copy link

Issue 1: Inconsistent URL Construction Leading to Path Format Mismatch

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/PathResolver.java:75 and :132

Problem Description:
replacePathWithEnv and resolvePathEnv use different methods to construct URLs, resulting in inconsistent URL formats:

  • replacePathWithEnv: new URL(url.getProtocol(), url.getHost(), url.getPort(), newPath)
    • For file: protocol, generates file:$SEATUNNEL_HOME/lib/xxx.jar
  • resolvePathEnv: fullPath.toUri().toURL()
    • Generates file:///opt/seatunnel/lib/xxx.jar (three slashes)

Potential Risks:

  • When URLs travel between client and server multiple times (e.g., task retry, recovery), path formats may be inconsistent
  • Some code that depends on URL format (such as jar.toURI().getPath()) may behave inconsistently under different formats

Impact Scope:

  • Direct impact: Two methods of PathResolver
  • Indirect impact: All code using action.getJarUrls() and DefaultClassLoaderService
  • Affected area: Core framework (ClassLoader)

Severity: MAJOR

Improvement Suggestions:

// In replacePathWithEnv, keep URL format consistent
public static URL replacePathWithEnv(URL url) {
    // ... existing logic ...
    
    if (normalizedPath.startsWith(normalizedHome)) {
        String relativePath = normalizedPath.substring(normalizedHome.length());
        // Consistent with resolvePathEnv: use URI constructor
        String newPath = SEATUNNEL_HOME_VAR + "/" + relativePath.replace(File.separatorChar, '/');
        try {
            // Use URI constructor to build URL, ensure format consistency
            return new java.net.URI(url.getProtocol(), url.getHost(), newPath, null).toURL();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new RuntimeException("Failed to create logical URL for: " + url, e);
        }
    }
    return url;
}

Rationale: Uniformly use URI to construct URLs, ensuring URL formats generated by replacePathWithEnv and resolvePathEnv are consistent.


Issue 2: Redundant Path Prefix Handling Logic in resolvePathEnv

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/PathResolver.java:116-128

Problem Description:
The code removes leading slashes from paths twice, resulting in redundant logic and prone to errors:

  1. First time: Remove leading slash from path
  2. After replacing $SEATUNNEL_HOME
  3. Second time: Remove leading slash from relativePath

Potential Risks:

  • If the input path format does not meet expectations (e.g., file:/$SEATUNNEL_HOME/lib/xxx.jar), it may produce incorrect results
  • Poor code readability and difficult to maintain

Impact Scope:

  • Direct impact: PathResolver.resolvePathEnv()
  • Indirect impact: Server-side ClassLoader creation
  • Affected area: Core framework

Severity: MINOR

Improvement Suggestions:

public static URL resolvePathEnv(URL url) {
    String path = url.getPath();
    if (!path.contains(SEATUNNEL_HOME_VAR)) {
        return url;
    }

    String home = Common.getSeaTunnelHome();
    if (StringUtils.isBlank(home)) {
        return url;
    }

    // Handle path prefix uniformly
    // Normalize path: remove leading slashes, replace variables
    String cleanPath = path.startsWith("/") ? path.substring(1) : path;
    String relativePath = cleanPath.replace(SEATUNNEL_HOME_VAR, "");
    relativePath = relativePath.replaceAll("^/+", "");  // Remove all leading slashes
    
    Path fullPath = Paths.get(home, relativePath);
    try {
        return fullPath.toUri().toURL();
    } catch (MalformedURLException e) {
        throw new RuntimeException("Failed to resolve logical URL for: " + url, e);
    }
}

Rationale: Simplify logic, unify path prefix handling, improve code readability and maintainability.


Issue 3: PathResolver's Collection Modification Methods May Cause Concurrency Issues

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/PathResolver.java:43-51 and :89-97

Problem Description:
replacePathWithEnv(Collection<URL>) and resolvePathEnv(Collection<URL>) modify the passed-in collections:

urls.clear();
urls.addAll(replaced);

Potential Risks:

  1. If the caller passes a shared collection, it may lead to unexpected modifications
  2. If called in a multi-threaded environment, it may lead to ConcurrentModificationException
  3. In current code, ClientJobExecutionEnvironment uses new HashSet<>() to create a new collection, so it is safe
  4. But replaceActionJarUrls(action.getJarUrls()) directly modifies the internal collection of Action

Impact Scope:

  • Direct impact: Collection methods of PathResolver
  • Indirect impact: ClientJobExecutionEnvironment and DAGUtils
  • Affected area: Client and server

Severity: MAJOR

Improvement Suggestions:

/**
 * Replaces the absolute path of SEATUNNEL_HOME in the given URLs with a logical variable.
 * Note: This method modifies the collection in-place. If you need to preserve the original
 * collection, create a copy before calling this method.
 *
 * @param urls The collection of absolute URLs (will be modified)
 */
public static void replacePathWithEnv(Collection<URL> urls) {
    if (urls == null || urls.isEmpty()) {
        return;
    }
    List<URL> replaced = urls.stream().map(PathResolver::replacePathWithEnv).collect(Collectors.toList());
    urls.clear();
    urls.addAll(replaced);
}

// Or, provide a version that does not modify the collection
public static Collection<URL> replacePathWithEnvCopy(Collection<URL> urls) {
    if (urls == null || urls.isEmpty()) {
        return urls;
    }
    return urls.stream()
               .map(PathResolver::replacePathWithEnv)
               .collect(Collectors.toList());
}

Rationale:

  1. Clearly state in JavaDoc that the method modifies the collection
  2. Provide alternative methods that do not modify the collection, allowing the caller to choose
  3. Or, change to not modify the collection, letting the caller handle the return value

Issue 4: DefaultClassLoaderService's Error Message Is Misleading

Location: seatunnel-engine/seatunnel-engine-core/src/main/java/org/apache/seatunnel/engine/core/classloader/DefaultClassLoaderService.java:78-86

Problem Description:
When a JAR file does not exist, the error message mentions:

please ensure that the deployment paths of SeaTunnel on different nodes are consistent.

But in the new design, client and server paths can be different (decoupled through SEATUNNEL_HOME).

Potential Risks:

  • Users seeing the error message will try to make all nodes' paths consistent
  • But the root cause of the problem may be:
    1. Server-side SEATUNNEL_HOME is configured incorrectly
    2. Server is missing certain JAR files
    3. Path resolution failure

Impact Scope:

  • Direct impact: DefaultClassLoaderService
  • Indirect impact: User efficiency in troubleshooting issues
  • Affected area: Operations and debugging

Severity: MINOR

Improvement Suggestions:

if (!file.exists()) {
    String host = ((NodeEngineImpl) nodeEngine).getNode().getThisAddress().getHost();
    String jarPath = jar.toString();
    String errorMsg = String.format(
        "The jar file %s cannot be found on node %s. " +
        "Current SEATUNNEL_HOME: %s. " +
        "Please ensure: " +
        "1) SEATUNNEL_HOME is correctly configured on this node, " +
        "2) The connector JARs are installed in $SEATUNNEL_HOME/connectors or $SEATUNNEL_HOME/plugins, " +
        "3) Run 'bin/install-plugin.sh' to install connectors if needed.",
        jarPath, host, Common.getSeaTunnelHome());
    throw new ClassLoaderException(ClassLoaderErrorCode.NOT_FOUND_JAR, errorMsg);
}

Rationale: Provide more accurate and helpful error messages to help users quickly locate problems.


Issue 5: PathResolver's Exception Handling Is Too Broad

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/PathResolver.java:77 and :134

Problem Description:
Wrapping MalformedURLException as RuntimeException, error messages are not detailed enough.

Potential Risks:

  • Callers cannot obtain sufficient context information to diagnose problems
  • Cannot distinguish which URL processing failed
  • Cannot provide targeted fix suggestions

Impact Scope:

  • Direct impact: PathResolver
  • Indirect impact: All callers
  • Affected area: Core framework

Severity: MINOR

Improvement Suggestions:

public static URL replacePathWithEnv(URL url) {
    // ... existing logic ...
    
    try {
        return new URL(url.getProtocol(), url.getHost(), url.getPort(), newPath);
    } catch (MalformedURLException e) {
        throw new RuntimeException(
            String.format(
                "Failed to create logical URL. Original URL: %s, New path: %s, " +
                "SEATUNNEL_HOME: %s. Please check if the URL format is valid.",
                url, newPath, Common.getSeaTunnelHome()),
            e);
    }
}

public static URL resolvePathEnv(URL url) {
    // ... existing logic ...
    
    try {
        return fullPath.toUri().toURL();
    } catch (MalformedURLException e) {
        throw new RuntimeException(
            String.format(
                "Failed to resolve logical URL. Logical URL: %s, Resolved path: %s, " +
                "SEATUNNEL_HOME: %s. Please check if SEATUNNEL_HOME is correctly configured.",
                url, fullPath, Common.getSeaTunnelHome()),
            e);
    }
}

Rationale: Provide more detailed error information, including original URL, processed path, SEATUNNEL_HOME value, etc., to help diagnose problems.


Issue 6: Missing Handling for SEATUNNEL_HOME Being Empty or Containing Special Characters

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/PathResolver.java

Problem Description:
The code checks for StringUtils.isBlank(home), but does not handle the following cases:

  1. SEATUNNEL_HOME contains spaces (e.g., C:/Program Files/SeaTunnel)
  2. SEATUNNEL_HOME contains special characters (e.g., Chinese, URL-encoded characters)
  3. SEATUNNEL_HOME is a relative path (e.g., ../seatunnel)

Potential Risks:

  • On Windows systems, if SEATUNNEL_HOME contains spaces, the constructed URL may be invalid
  • If the path contains non-ASCII characters, it may cause serialization/deserialization failures

Impact Scope:

  • Direct impact: PathResolver
  • Indirect impact: Cross-platform compatibility
  • Affected area: Windows users and non-English environment users

Severity: MINOR

Improvement Suggestions:

public static URL resolvePathEnv(URL url) {
    String path = url.getPath();
    if (!path.contains(SEATUNNEL_HOME_VAR)) {
        return url;
    }

    String home = Common.getSeaTunnelHome();
    if (StringUtils.isBlank(home)) {
        return url;
    }
    
    // Normalize home path
    try {
        Path normalizedHome = Paths.get(home).normalize();
        if (!Files.exists(normalizedHome)) {
            throw new RuntimeException(
                String.format("SEATUNNEL_HOME does not exist: %s", home));
        }
        home = normalizedHome.toAbsolutePath().toString();
    } catch (InvalidPathException e) {
        throw new RuntimeException(
            String.format("Invalid SEATUNNEL_HOME: %s. Error: %s", home, e.getMessage()), e);
    }

    // ... continue processing ...
}

Rationale: Validate and normalize SEATUNNEL_HOME before processing paths, detecting configuration issues early.


Issue 7: Incomplete Test Coverage

Location: seatunnel-common/src/test/java/org/apache/seatunnel/common/utils/PathResolverTest.java

Problem Description:
Although existing tests cover basic functionality, the following edge case tests are missing:

  1. Path exactly equals SEATUNNEL_HOME (relativePath is empty)
  2. JAR file is in SEATUNNEL_HOME root directory (e.g., $SEATUNNEL_HOME/xxx.jar)
  3. SEATUNNEL_HOME contains special characters (spaces, Chinese, etc.)
  4. Abnormal URL format (e.g., file:/ followed by no path)
  5. Windows long paths (e.g., \\?\C:\...)

Potential Risks:

  • Under certain edge conditions, code may produce unexpected results
  • Users may encounter issues in production when facing untested scenarios

Impact Scope:

  • Direct impact: Test coverage
  • Indirect impact: Code quality
  • Affected area: All code using PathResolver

Severity: MINOR

Improvement Suggestions:

@Test
public void testJarInRootOfSeaTunnelHome() throws MalformedURLException {
    String home = "/opt/seatunnel";
    System.setProperty("SEATUNNEL_HOME", home);
    Common.setSeaTunnelHome(home);

    String jarPath = home + "/plugin.jar";
    URL absoluteUrl = new File(jarPath).toURI().toURL();
    URL logicalUrl = PathResolver.replacePathWithEnv(absoluteUrl);

    Assertions.assertEquals("$SEATUNNEL_HOME/plugin.jar", logicalUrl.getPath());
}

@Test
public void testSeaTunnelHomeWithSpaces() throws MalformedURLException {
    String home = "/opt/sea tunnel/home";
    System.setProperty("SEATUNNEL_HOME", home);
    Common.setSeaTunnelHome(home);

    String jarPath = home + "/lib/test.jar";
    URL absoluteUrl = new File(jarPath).toURI().toURL();
    URL logicalUrl = PathResolver.replacePathWithEnv(absoluteUrl);

    Assertions.assertTrue(logicalUrl.getPath().contains("$SEATUNNEL_HOME"));
}

@Test
public void testEmptyRelativePath() throws MalformedURLException {
    // Test when path exactly equals SEATUNNEL_HOME
    String home = "/opt/seatunnel";
    System.setProperty("SEATUNNEL_HOME", home);
    Common.setSeaTunnelHome(home);

    // This case is not very practical, but test boundary conditions
    URL absoluteUrl = new File(home).toURI().toURL();
    URL logicalUrl = PathResolver.replacePathWithEnv(absoluteUrl);

    // Should return the original URL, because JAR file should not be a directory
    Assertions.assertEquals(absoluteUrl, logicalUrl);
}

Rationale: Improve test coverage to ensure code works correctly under various edge conditions.


Issue 8: Missing Documentation

Location: PR overall

Problem Description:

  1. PathResolver class lacks class-level JavaDoc, does not explain its purpose and usage scenarios
  2. PR does not include user documentation updates (e.g., configuration instructions for separate deployment)
  3. Code comments mention "Handle Windows backslashes", but do not explain the complete cross-platform support solution

Potential Risks:

  • Developers do not know when to use replacePathWithEnv and when to use resolvePathEnv
  • Users do not know how to configure SEATUNNEL_HOME for separate deployment
  • Maintainers find it difficult to understand the design intent of the code

Impact Scope:

  • Direct impact: Maintainability
  • Indirect impact: User adoption rate
  • Affected area: All users and developers

Severity: MINOR

Improvement Suggestions:

/**
 * Utility class for resolving SeaTunnel paths between client and server.
 * 
 * <p>In a distributed deployment where the client and server have different
 * SEATUNNEL_HOME paths, JAR file paths must be translated between the two environments.
 * This class provides methods to:
 * <ul>
 *   <li>Replace absolute paths with logical variables (on the client side)</li>
 *   <li>Resolve logical variables back to absolute paths (on the server side)</li>
 * </ul>
 * 
 * <h3>Usage Example</h3>
 * <pre>
 * // Client side (before serialization)
 * URL clientJar = new URL("file:/opt/client/lib/connector.jar");
 * URL logicalJar = PathResolver.replacePathWithEnv(clientJar);
 * // logicalJar is now: file:$SEATUNNEL_HOME/lib/connector.jar
 * 
 * // Server side (after deserialization)
 * URL serverJar = PathResolver.resolvePathEnv(logicalJar);
 * // serverJar is now: file:/opt/server/lib/connector.jar
 * </pre>
 * 
 * @since 2.3.5
 */
public class PathResolver {
    // ...
}

Rationale: Provide clear class-level documentation explaining purpose, usage scenarios, and examples.


davidzollo
davidzollo previously approved these changes Feb 2, 2026
Copy link
Contributor

@davidzollo davidzollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
LGTM

@LiJie20190102
Copy link
Collaborator Author

The latest code test results:
image
image
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants