@LiJie20190102
Contributor

LiJie20190102 commented Nov 9, 2025

Purpose of this pull request

Fixes #10012. I previously submitted PR #10014 for this, but I didn't handle it properly, so I am resubmitting it here.

Does this PR introduce any user-facing change?

How was this patch tested?

If the path is /data/seatunnel, and the file structure is:

/data/seatunnel/20241001/report.txt
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
/data/seatunnel/20241012/logo.png

If you only want to filter on file names, write a regular expression for the file name alone. If you also want to filter on the file's directory, the expression must start with path.

Example 1: Match all .txt files. Regular expression:

.*.txt

The result of this example matching is:

/data/seatunnel/20241001/report.txt

Example 2: Match files ending with .csv in third-level folders whose names start with 202410. Regular expression:

/data/seatunnel/202410\d*/.*.csv

The result of this example matching is:

/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
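
The two examples above can be checked with plain java.util.regex. This is only a sketch under assumptions, not the connector's code: the class FilterPatternExamples and its helpers are hypothetical, the file name is taken to be the text after the last `/`, and each pattern is applied with `Matcher.matches()`, which requires the whole string to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of SeaTunnel.
class FilterPatternExamples {
    // The file structure from the example above.
    static final List<String> FILES = Arrays.asList(
            "/data/seatunnel/20241001/report.txt",
            "/data/seatunnel/20241007/abch202410.csv",
            "/data/seatunnel/20241002/abcg202410.csv",
            "/data/seatunnel/20241005/old_data.csv",
            "/data/seatunnel/20241012/logo.png");

    // Assumption: a name-only pattern is matched against the last path segment.
    static List<String> matchByFileName(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String file : FILES) {
            String name = file.substring(file.lastIndexOf('/') + 1);
            if (pattern.matcher(name).matches()) {
                matched.add(file);
            }
        }
        return matched;
    }

    // Assumption: a pattern that starts with the path is matched against the full path.
    static List<String> matchByFullPath(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String file : FILES) {
            if (pattern.matcher(file).matches()) {
                matched.add(file);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(matchByFileName(".*.txt"));
        System.out.println(matchByFullPath("/data/seatunnel/202410\\d*/.*.csv"));
    }
}
```

Note that the unescaped `.` before `txt` and `csv` matches any character, so these patterns are slightly looser than they look; writing `\.` would restrict the match to a literal dot.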

Check list

@LiJie20190102
Contributor Author

@davidzollo Could you please review the code again?

davidzollo previously approved these changes Nov 10, 2025
Contributor

@davidzollo left a comment

+1 if CI passed

There are some examples.

File Structure Example:
If the `path` is `/data/setunnel`, and the file structure example is:
Member

Suggested change
If the `path` is `/data/setunnel`, and the file structure example is:
If the `path` is `/data/seatunnel`, and the file structure example is:

There are some examples.

File Structure Example:
If the `path` is `/data/setunnel`, and the file structure example is:
Member

ditto

There are some examples.

File Structure Example:
If the `path` is `/data/setunnel`, and the file structure example is:
Member

ditto

Comment on lines 412 to 417
if (pattern.pattern().startsWith(path)) {
    // filter based on the file directory at the same time
    return pattern.matcher(fileStatus.getPath().getName()).matches();
}
// filter based on file names
return pattern.matcher(fileStatus.getPath().toUri().getPath()).matches();
Member

@zhangshenghang Nov 11, 2025

Is the logic reversed?

When pattern.pattern().startsWith(path) is true, should we be filtering on directories or on files?

Contributor Author

Thank you for your review. I have made improvements, and the fileStatus in this method will only ever be a file, not a directory.
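
For reference, here is one way to read the behavior described in the PR description, sketched under assumptions: when the pattern text starts with the configured path, match it against the full file path; otherwise match it against the file name only. FileFilterSketch.accept is a hypothetical helper that works on plain strings rather than Hadoop's FileStatus, and it is not the code that ended up in this PR.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of SeaTunnel.
class FileFilterSketch {
    /**
     * @param path         the configured base path, e.g. "/data/seatunnel"
     * @param pattern      the compiled file_filter_pattern
     * @param fullFilePath the full path of a file (assumed to never be a directory)
     */
    static boolean accept(String path, Pattern pattern, String fullFilePath) {
        if (pattern.pattern().startsWith(path)) {
            // The pattern includes the directory part: match the whole path.
            return pattern.matcher(fullFilePath).matches();
        }
        // The pattern only describes a file name: match the last path segment.
        String fileName = fullFilePath.substring(fullFilePath.lastIndexOf('/') + 1);
        return pattern.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String path = "/data/seatunnel";
        Pattern withDir = Pattern.compile("/data/seatunnel/202410\\d*/.*.csv");
        Pattern nameOnly = Pattern.compile(".*.txt");
        System.out.println(accept(path, withDir, "/data/seatunnel/20241007/abch202410.csv"));
        System.out.println(accept(path, nameOnly, "/data/seatunnel/20241001/report.txt"));
    }
}
```

In this reading, the branch that tests startsWith(path) is the one that matches the full path, which is the opposite of the snippet quoted in the review comment above.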

@zhangshenghang
Member

Thanks @LiJie20190102.
Please also make sure that CI runs normally and passes.

@LiJie20190102
Contributor Author

> thanks @LiJie20190102 It is also necessary to ensure that CI can run normally and pass.

90287c6523c66007a979fdb09c474a2a Can you help me check why it failed? I modeled the test on 'org.apache.seatunnel.connectors.seatunnel.file.writer.XmlReadStrategyTest', which passes.

Comment on lines 48 to 49
@Test
public void testJsonFilterPatternWithFilePath() throws URISyntaxException, IOException {
Contributor

@dybyte Nov 11, 2025

> Can you help me check why it failed? I am referring to 'org.apache.seatunnel.connectors.seatunnel.file.writer.XmlReadStrategyTest', which can pass

What do you think about using the annotation @DisabledOnOs(OS.WINDOWS) here? cc @davidzollo @zhangshenghang

Member

> > Can you help me check why it failed? I am referring to 'org.apache.seatunnel.connectors.seatunnel.file.writer.XmlReadStrategyTest', which can pass
>
> What do you think about using the annotation @DisabledOnOs(OS.WINDOWS) here? cc @davidzollo @zhangshenghang

@dybyte I agree with your idea. @LiJie20190102, could you provide a screenshot of a successful run on Windows?

Contributor Author

@davidzollo @zhangshenghang If I provide a screenshot of a successful local run, is an e2e test still required?

Contributor Author

image

Comment on lines 88 to 89
@Test
public void testJsonFilterPatternWithFileName() throws URISyntaxException, IOException {
Contributor

ditto

@davidzollo
Contributor

Since this PR affects many file connectors, please add a related e2e test. Thanks.

github-actions bot added the e2e label Nov 12, 2025
@LiJie20190102
Contributor Author

> Considering that this PR affects lots of file connectors. Please add related e2e test, thx

The e2e test has been added. @davidzollo @zhangshenghang

corgy-w previously approved these changes Nov 17, 2025

  // skip hidden tmp directory, such as .hive-staging_hive
  if (!fileStatus.getPath().getName().startsWith(".")) {
-     fileNames.addAll(getFileNamesByPath(fileStatus.getPath().toString()));
+     fileNames.addAll(getFileNamesByPath(fileStatus.getPath().toUri().getPath()));
Member

Could this change cause any problems? For example, what happens if the file path is s3a://my-bucket/data/2025/11/report.csv?

Contributor Author

Configuration file:
image
Debugging results:
image
I think the logic is correct. What do you think?

Member

> configuration file: image Debugging results: image I think the logic is correct, what do you think?

What did the result look like before the change? Before the modification, s3a:/ was not removed; after the modification, it is removed. Could that have any other impact?

Contributor Author

Sorry, after careful consideration: this is a spot I had changed to fit my earlier modifications, and the code here can be left unchanged.
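
The difference raised in this thread can be reproduced with java.net.URI alone. The sketch below is an illustration under assumptions: it uses java.net.URI as a stand-in for Hadoop's Path (whose toUri() returns a java.net.URI), and the class UriPathSketch and its helpers are hypothetical. It only shows that taking the URI's path component drops the scheme and the bucket, while the full string form keeps them.

```java
import java.net.URI;

// Hypothetical illustration class; uses java.net.URI in place of Hadoop's Path.
class UriPathSketch {
    // Full location, including scheme and authority (the bucket).
    static String fullString(String location) {
        return URI.create(location).toString();
    }

    // Only the path component; scheme and authority are not included.
    static String pathOnly(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String location = "s3a://my-bucket/data/2025/11/report.csv";
        System.out.println(fullString(location));
        System.out.println(pathOnly(location));
    }
}
```

For the s3a example from the review, the path-only form no longer carries the bucket name, so anything downstream that needs to reopen the file from that string alone would have to get the scheme and bucket from somewhere else.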

Assertions.assertEquals(0, extraCommands.getExitCode());
};

@TestTemplate
Contributor

The current S3 tests are disabled. Could you please modify the changes to enable the S3 tests as well? It should be similar to how MinIO supports the S3 protocol.

Development

Successfully merging this pull request may close these issues.

[Bug] [Connector-v2][FTP] 'file_filterpattern' cannot filter directories

5 participants