[Feature] [Connector-V2] Add MQTT Sink Connector #10575

Merged
corgy-w merged 8 commits into apache:dev from assokhi:feature/mqtt-connector
Mar 25, 2026

@assokhi
Contributor

@assokhi assokhi commented Mar 8, 2026

Purpose of this pull request

Resolves #9566

This pull request introduces a new MQTT Sink Connector for Apache SeaTunnel V2. It enables high-performance, distributed data integration with IoT endpoints and message brokers using the MQTT 3.1.1 protocol.

Core Engineering Highlights

  • Subtask Multiplexing: Programmatically generates unique clientId values by appending the engine's subtask index. This prevents broker-side connection hijacking during parallel execution.
  • Memory Efficiency: Uses MemoryPersistence from the Eclipse Paho client to maximize throughput and avoid disk I/O overhead in containerized environments.
  • Network Resilience: Implements localized retry loops and MqttCallback triggers to handle transient network interruptions gracefully.
  • Serialization Support: Native integration with SeaTunnel's JSON and Text serialization schemas.
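
As a rough illustration of the subtask-indexed client IDs described above, the sketch below derives one ID per parallel writer. The class name `ClientIdSketch`, the `seatunnel-mqtt` prefix, and the `prefix-jobId-index` layout are assumptions made for this example and may differ from the connector's actual naming scheme.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: class name and ID layout are illustrative only.
final class ClientIdSketch {

    // MQTT 3.1.1 only obliges brokers to accept IDs of 1 to 23 alphanumeric
    // characters; longer IDs or other characters depend on the broker.
    static String buildClientId(String prefix, String jobId, int subtaskIndex) {
        if (subtaskIndex < 0) {
            throw new IllegalArgumentException(
                    "subtaskIndex must be >= 0, got: " + subtaskIndex);
        }
        return prefix + "-" + jobId + "-" + subtaskIndex;
    }

    // Returns true when every subtask of a job receives a distinct ID.
    static boolean allDistinct(String prefix, String jobId, int parallelism) {
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < parallelism; i++) {
            ids.add(buildClientId(prefix, jobId, i));
        }
        return ids.size() == parallelism;
    }
}
```

Because a broker disconnects an existing client when a second one connects with the same ID, distinct IDs per subtask are what keep parallel writers from evicting each other.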

Registry & Integration Updates
The following project configuration files were updated to properly register the new connector:

  • seatunnel-connectors-v2/pom.xml — Registered the connector-mqtt module.
  • seatunnel-e2e/seatunnel-connector-v2-e2e/pom.xml — Registered the E2E test module.
  • plugin-mapping.properties — Added seatunnel.sink.MQTT mapping.
  • seatunnel-dist/pom.xml — Added dependency with provided scope for the distribution build.
  • .github/workflows/labeler/label-scope-conf.yml — Configured automated GitHub labeling.
  • config/plugin_config — Registered the MQTT sink for startup scripts and plugin loading.

Does this PR introduce any user-facing change?

Yes. This PR introduces a new MQTT Sink Connector that allows users to publish SeaTunnel pipeline output to MQTT brokers.

Example configuration:

sink {
  MQTT {
    url = "tcp://localhost:1883"
    topic = "seatunnel/telemetry"
    username = "admin"
    password = "password"
    qos = 1
    format = "json"
  }
}

@DanielCarter-stack

Issue 1: Exception handling does not comply with project specifications

Location: MqttSinkWriter.java:85

throw new RuntimeException("Failed to connect MQTT client [" + clientId + "]", e);

Related Context:

  • Parent interface: SinkWriter.java (seatunnel-api)
  • Reference implementation: KafkaConnectorException.java (connector-kafka)
  • Caller: MqttSink.createWriter() (MqttSink.java:42-44)

Issue Description:
Using a generic RuntimeException instead of a custom connector exception class violates SeaTunnel's error handling specification. Other connectors (such as Kafka and JDBC) define specialized exception classes that inherit from SeaTunnelRuntimeException.

Potential Risks:

  • Error type is unclear, making targeted handling difficult for upper layers
  • Missing error codes, unable to internationalize error messages
  • Inconsistent with other Connectors in the project, increasing maintenance costs

Scope of Impact:

  • Direct impact: MqttSinkWriter constructor
  • Indirect impact: JobMaster's error handling logic
  • Impact scope: Single Connector

Severity: MAJOR

Improvement Suggestions:

  1. Create custom exception class:
// seatunnel-connectors-v2/connector-mqtt/src/main/java/org/apache/seatunnel/connectors/seatunnel/mqtt/exception/MqttConnectorErrorCode.java
package org.apache.seatunnel.connectors.seatunnel.mqtt.exception;

import org.apache.seatunnel.common.exception.SeaTunnelErrorCode;

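public enum MqttConnectorErrorCode implements SeaTunnelErrorCode {
    CONNECTION_FAILED("MQTT-01", "MQTT connection failed"),
    PUBLISH_FAILED("MQTT-02", "MQTT message publish failed"),
    INVALID_CONFIG("MQTT-03", "Invalid MQTT configuration");

    private final String code;
    private final String description;

    MqttConnectorErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return code;
    }

    @Override
    public String getDescription() {
        return description;
    }
}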
// seatunnel-connectors-v2/connector-mqtt/src/main/java/org/apache/seatunnel/connectors/seatunnel/mqtt/exception/MqttConnectorException.java
package org.apache.seatunnel.connectors.seatunnel.mqtt.exception;

import org.apache.seatunnel.common.exception.SeaTunnelErrorCode;
import org.apache.seatunnel.common.exception.SeaTunnelRuntimeException;

public class MqttConnectorException extends SeaTunnelRuntimeException {
    public MqttConnectorException(SeaTunnelErrorCode errorCode, String errorMessage) {
        super(errorCode, errorMessage);
    }
    
    public MqttConnectorException(SeaTunnelErrorCode errorCode, String errorMessage, Throwable cause) {
        super(errorCode, errorMessage, cause);
    }
}
  2. Modify MqttSinkWriter:
// Import new exception class
import org.apache.seatunnel.connectors.seatunnel.mqtt.exception.MqttConnectorException;
import org.apache.seatunnel.connectors.seatunnel.mqtt.exception.MqttConnectorErrorCode;

// In constructor
} catch (MqttException e) {
    throw new MqttConnectorException(
        MqttConnectorErrorCode.CONNECTION_FAILED,
        "Failed to connect MQTT client [" + clientId + "]",
        e);
}

// In write method
throw new IOException(
    new MqttConnectorException(
        MqttConnectorErrorCode.PUBLISH_FAILED,
        "Failed to publish MQTT message after " + retryTimeoutMs + "ms")
        .getMessage(),
    lastException);

Rationale: Follow project specifications to improve consistency and maintainability of error handling.


Issue 2: Missing QoS parameter validation

Location: MqttSinkOptions.java:49-53 and MqttSinkWriter.java:61

// MqttSinkOptions.java
public static final Option<Integer> QOS =
        Options.key("qos")
                .intType()
                .defaultValue(1)
                .withDescription("MQTT QoS level: 0 (at-most-once), 1 (at-least-once)");

// MqttSinkWriter.java - No validation
this.qos = pluginConfig.get(MqttSinkOptions.QOS);

Related Context:

  • Caller: MqttSinkWriter constructor
  • Reference implementation: KafkaSinkOptions has parameter validation
  • Related method: MqttSinkWriter.write() line 93 uses qos

Issue Description:
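The option description lists only QoS 0 and 1 as supported, but user input is not validated. A value such as qos=2 is passed to the client unchecked, so the sink would run at a level the connector does not document, and an out-of-range value surfaces only as a Paho error at runtime.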

Potential Risks:

  • User configuration errors are discovered at runtime instead of startup
  • Error messages are unclear (Paho Client errors may not be intuitive)
  • Violates "fail-fast" principle

Scope of Impact:

  • Direct impact: MqttSinkWriter.write()
  • Indirect impact: Job startup failure
  • Impact scope: Single Connector

Severity: MAJOR

Improvement Suggestions:

// Add validation in MqttSinkWriter constructor
public MqttSinkWriter(
        SinkWriter.Context context, SeaTunnelRowType rowType, ReadonlyConfig pluginConfig) {
    this.topic = pluginConfig.get(MqttSinkOptions.TOPIC);
    this.qos = pluginConfig.get(MqttSinkOptions.QOS);
    
    // Add QoS validation
    if (qos < 0 || qos > 1) {
        throw new IllegalArgumentException(
            "MQTT QoS must be 0 (at-most-once) or 1 (at-least-once), got: " + qos);
    }
    
    this.retryTimeoutMs = pluginConfig.get(MqttSinkOptions.RETRY_TIMEOUT);
    // ...
}

Or add validation in MqttSinkOptions:

public static final Option<Integer> QOS =
        Options.key("qos")
                .intType()
                .defaultValue(1)
                // Assumes the Option builder exposes a validator hook; verify against the SeaTunnel Option API
                .withValidator(value -> {
                    int qos = (Integer) value;
                    if (qos < 0 || qos > 1) {
                        throw new IllegalArgumentException("QoS must be 0 or 1");
                    }
                })
                .withDescription("MQTT QoS level: 0 (at-most-once), 1 (at-least-once)");

Rationale: Provide clear error messages and detect errors during configuration phase rather than runtime.


Issue 3: CleanSession=true contradicts at-least-once semantics

Location: MqttSinkWriter.java:168

options.setCleanSession(true);

Related Context:

  • Configuration location: MqttSinkWriter.buildConnectOptions() lines 165-180
  • Call chain: MqttSinkWriter constructor → buildConnectOptions() → MqttClient.connect()
  • Documentation claim: docs/en/connectors/sink/Mqtt.md line 17

Issue Description:
Code sets CleanSession=true, but documentation claims to provide "at-least-once" semantics. According to MQTT 3.1.1 protocol:

  • When CleanSession=true, session state is discarded on disconnect, including QoS 1 messages that were sent but not yet acknowledged
  • After the client reconnects, those in-flight messages are not retransmitted and are permanently lost
  • This directly contradicts "at-least-once" semantics
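
The effect of the CleanSession flag on unacknowledged QoS 1 messages can be sketched with a toy model of a publishing client's session state. This is not the Paho implementation: `SessionSketch` and its methods are hypothetical, and the model assumes that a persistent session's state survives the reconnect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a publisher's session state; illustrative only.
final class SessionSketch {
    private final boolean cleanSession;
    // QoS 1 messages that were sent but not yet acknowledged, keyed by packet id.
    private final Map<Integer, String> inFlight = new LinkedHashMap<>();
    private int nextPacketId = 1;

    SessionSketch(boolean cleanSession) {
        this.cleanSession = cleanSession;
    }

    // Records a QoS 1 publish as in flight and returns its packet id.
    int publish(String payload) {
        int id = nextPacketId++;
        inFlight.put(id, payload);
        return id;
    }

    // The broker acknowledged the message, so it no longer needs a retry.
    void onPubAck(int packetId) {
        inFlight.remove(packetId);
    }

    // Models a connection drop followed by a reconnect.
    // Returns the payloads that would be retransmitted.
    List<String> reconnect() {
        if (cleanSession) {
            inFlight.clear(); // session state is discarded, nothing is retried
        }
        return new ArrayList<>(inFlight.values());
    }
}
```

With `cleanSession=true` the unacknowledged message is silently dropped on reconnect, while the persistent variant keeps it for retransmission.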

Potential Risks:

  • Users may incorrectly believe messages will not be lost
  • Data may be lost in failure scenarios
  • Does not match documentation promises

Scope of Impact:

  • Direct impact: Reliability guarantees of MqttSinkWriter
  • Indirect impact: User expectations of data reliability
  • Impact scope: Single Connector

Severity: CRITICAL

Improvement Suggestions:

  1. Short-term solution (fix documentation):
    Clearly document limitations in documentation:
## Key features

- [ ] [exactly-once](../../introduction/concepts/connector-v2-features.md)

**Delivery Semantics Notice**: 
This connector provides **at-most-once** delivery when QoS=0, and **best-effort at-least-once** when QoS=1. 
Due to `cleanSession=true` (required for stateless operation), unacknowledged messages may be lost during 
client disconnections. For stronger guarantees, consider enabling Source replay capabilities in SeaTunnel.
  2. Long-term solution (let user choose):
    Add configuration option:
// MqttSinkOptions.java
public static final Option<Boolean> CLEAN_SESSION =
    Options.key("clean_session")
        .booleanType()
        .defaultValue(true)
        .withDescription("Whether to use clean session. false enables persistent sessions but may cause broker-side state accumulation");
// MqttSinkWriter.java
options.setCleanSession(config.get(MqttSinkOptions.CLEAN_SESSION));
if (!config.get(MqttSinkOptions.CLEAN_SESSION)) {
    log.warn("clean_session=false may cause broker-side state accumulation. Ensure proper clientId management.");
}

Rationale: Honestly inform users of limitations to avoid misleading promises. CleanSession=true is reasonable for stateless design, but should not claim to provide complete at-least-once guarantees.


Issue 4: Missing unit tests

Location: seatunnel-connectors-v2/connector-mqtt/src/test/java/

Related Context:

  • Reference implementation: connector-kafka has complete unit test suite
  • E2E tests: MqttSinkIT.java exists but insufficient
  • Classes under test: MqttSink, MqttSinkFactory, MqttSinkWriter

Issue Description:
Only E2E integration tests exist, missing unit tests. E2E tests cannot cover:

  • Boundary conditions
  • Exception paths
  • Parameter validation
  • Configuration parsing

Potential Risks:

  • Easy to introduce bugs during refactoring
  • Boundary conditions not covered (e.g., QoS=-1)
  • Error handling paths not tested
  • Low code coverage

Scope of Impact:

  • Direct impact: Code quality assurance
  • Indirect impact: Future maintenance costs
  • Impact scope: Single Connector

Severity: MAJOR

Improvement Suggestions:

Create unit test classes:

// seatunnel-connectors-v2/connector-mqtt/src/test/java/org/apache/seatunnel/connectors/seatunnel/mqtt/sink/MqttSinkWriterTest.java
package org.apache.seatunnel.connectors.seatunnel.mqtt.sink;

import org.apache.seatunnel.api.configuration.ReadonlyConfig;
import org.apache.seatunnel.api.sink.SinkWriter;
import org.apache.seatunnel.api.table.type.SeaTunnelRow;
import org.apache.seatunnel.api.table.type.SeaTunnelRowType;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;

import java.util.Map;

import static org.junit.jupiter.api.Assertions.*;
import static org.mockito.Mockito.*;

@ExtendWith(MockitoExtension.class)
class MqttSinkWriterTest {
    
    @Mock
    private SinkWriter.Context context;
    
    @Mock
    private SeaTunnelRowType rowType;
    
    @Test
    void testInvalidQosThrowsException() {
        ReadonlyConfig config = ReadonlyConfig.fromMap(Map.of(
            "url", "tcp://localhost:1883",
            "topic", "test",
            "qos", 2  // Invalid value
        ));
        
        IllegalArgumentException ex = assertThrows(
            IllegalArgumentException.class,
            () -> new MqttSinkWriter(context, rowType, config)
        );
        
        // Matches the wording of both validation variants suggested in Issue 2
        assertTrue(ex.getMessage().contains("QoS must be 0"));
    }
    
    @Test
    void testInvalidFormatThrowsException() {
        ReadonlyConfig config = ReadonlyConfig.fromMap(Map.of(
            "url", "tcp://localhost:1883",
            "topic", "test",
            "format", "xml"  // Invalid format
        ));
        
        assertThrows(
            IllegalArgumentException.class,
            () -> new MqttSinkWriter(context, rowType, config)
        );
    }
    
    @Test
    void testConnectionFailureThrowsWrappedException() {
        // Mock Paho client to throw MqttException
        // Verify that the exception is properly wrapped
    }
    
    @Test
    void testWriteWithRetrySuccess() {
        // Simulate first failure, second success
    }
    
    @Test
    void testWriteTimeoutAfterRetries() {
        // Simulate retry timeout
    }
}
// seatunnel-connectors-v2/connector-mqtt/src/test/java/org/apache/seatunnel/connectors/seatunnel/mqtt/sink/MqttSinkFactoryTest.java
@Test
void testOptionRule() {
    MqttSinkFactory factory = new MqttSinkFactory();
    OptionRule rule = factory.optionRule();
    
    Set<Option<?>> required = rule.getRequiredOptions();
    assertTrue(required.contains(MqttSinkOptions.URL));
    assertTrue(required.contains(MqttSinkOptions.TOPIC));
    
    Set<Option<?>> optional = rule.getOptionalOptions();
    assertTrue(optional.contains(MqttSinkOptions.QOS));
}

Rationale: Improve code quality, ensure safe refactoring, meet Apache project standards.
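
The retry cases left as stubs above ("first failure, second success" and "retry timeout") become easier to test when the retry loop does not depend on a live broker. The following is a minimal sketch under that assumption: `RetrySketch` and its `Publisher` interface are hypothetical stand-ins for the writer and the MQTT client, not code from this PR.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins: Publisher replaces the real MQTT client so the
// retry logic can be exercised in isolation.
final class RetrySketch {

    interface Publisher {
        void publish(byte[] payload) throws IOException;
    }

    // Retries until the publish succeeds or timeoutMs has elapsed.
    // Returns the number of attempts that were made.
    static int publishWithRetry(
            Publisher publisher, byte[] payload, long timeoutMs, long backoffMs)
            throws IOException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        IOException last = null;
        int attempts = 0;
        do {
            attempts++;
            try {
                publisher.publish(payload);
                return attempts;
            } catch (IOException e) {
                last = e; // remember the failure and retry after the backoff
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while retrying publish", ie);
            }
        } while (deadline - System.nanoTime() > 0);
        throw new IOException(
                "Publish failed after " + attempts + " attempts within " + timeoutMs + "ms",
                last);
    }
}
```

A test can then pass a lambda that fails on its first call and succeeds on its second, and assert that exactly two attempts were made.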


Issue 5: Text format delimiter hardcoded

Location: MqttSinkWriter.java:189-192

case "text":
    return TextSerializationSchema.builder()
            .seaTunnelRowType(rowType)
            .delimiter(",")
            .build();

Related Context:

  • Reference implementation: KafkaSink supports custom delimiter
  • Configuration option: MqttSinkOptions.FORMAT definition
  • Related class: TextSerializationSchema

Issue Description:
Text format field delimiter is hardcoded to comma, users cannot customize. For certain data scenarios, other delimiters may be needed (e.g., tab, pipe, etc.).

Potential Risks:

  • Limits user flexibility
  • Inconsistent with Kafka Sink (Kafka supports field_delimiter configuration)
  • May cause parsing issues

Scope of Impact:

  • Direct impact: Users using Text format
  • Indirect impact: Data format compatibility
  • Impact scope: Single Connector

Severity: MINOR

Improvement Suggestions:

  1. Add configuration option:
// MqttSinkOptions.java
public static final Option<String> FIELD_DELIMITER =
    Options.key("field_delimiter")
        .stringType()
        .defaultValue(",")
        .withDescription("Field delimiter for text format. Only used when format=text");
  2. Register in MqttSinkFactory:
.optional(
    MqttSinkOptions.USERNAME,
    MqttSinkOptions.PASSWORD,
    MqttSinkOptions.QOS,
    MqttSinkOptions.FORMAT,
    MqttSinkOptions.FIELD_DELIMITER,  // Add
    MqttSinkOptions.RETRY_TIMEOUT,
    MqttSinkOptions.CONNECTION_TIMEOUT)
  3. Use in MqttSinkWriter:
case "text":
    String delimiter = config.get(MqttSinkOptions.FIELD_DELIMITER);
    return TextSerializationSchema.builder()
            .seaTunnelRowType(rowType)
            .delimiter(delimiter)
            .build();

Rationale: Improve flexibility, stay consistent with Kafka Sink.
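
To make the parsing concern concrete, here is a small sketch of delimiter-based text encoding. It is not the TextSerializationSchema implementation; `DelimiterSketch.toText` is a hypothetical helper that only joins field values with a configurable delimiter.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: shows the effect of the delimiter choice only.
final class DelimiterSketch {

    // Joins field values with the given delimiter; null fields become empty strings.
    static String toText(List<Object> fields, String delimiter) {
        return fields.stream()
                .map(f -> f == null ? "" : String.valueOf(f))
                .collect(Collectors.joining(delimiter));
    }
}
```

A row whose first field is the string `a,b` and whose second field is `1` encodes to `a,b,1` with a comma, which a consumer would split into three fields. With `|` as the delimiter the same row encodes to `a,b|1` and stays unambiguous.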


Issue 6: Changelog placeholder not updated

Location: docs/en/connectors/changelog/connector-mqtt.md:7

- Add MQTT Sink Connector ([#XXXX](https://github.com/apache/seatunnel/pull/XXXX))

Related Context:

Issue Description:
Changelog contains placeholder #XXXX, should be replaced with actual PR number before submission.

Potential Risks:

  • Incomplete changelog
  • Automation tools may not link correctly
  • Users cannot trace change sources

Scope of Impact:

  • Direct impact: Changelog quality
  • Indirect impact: User experience
  • Impact scope: Single Connector

Severity: MINOR

Improvement Suggestions:

## next version

### Sink

- Add MQTT Sink Connector ([#10575](https://github.com/apache/seatunnel/pull/10575))

Also recommend linking Issue #9566:

- Add MQTT Sink Connector ([#10575](https://github.com/apache/seatunnel/pull/10575))
  Resolves [#9566](https://github.com/apache/seatunnel/issues/9566)

Rationale: Maintain documentation integrity, facilitate user tracing.


Issue 7: Performance bottleneck - synchronous blocking send

Location: MqttSinkWriter.java:90-118

public void write(SeaTunnelRow element) throws IOException {
    byte[] payload = serializationSchema.serialize(element);
    MqttMessage message = new MqttMessage(payload);
    message.setQos(qos);
    
    // Synchronous retry loop
    while (System.currentTimeMillis() < deadline) {
        if (mqttClient.isConnected()) {
            mqttClient.publish(topic, message);  // Blocking call
            return;
        }
        Thread.sleep(RETRY_BACKOFF_MS);
    }
}

Related Context:

  • Paho Client: MqttClient.publish() is synchronous blocking method
  • Comparison: KafkaSink uses asynchronous Producer
  • Performance impact: Each message requires waiting for network round-trip

Issue Description:
Each message is sent with a synchronous, blocking call. Paho's MqttClient.publish() returns only once delivery has completed, which for QoS 1 includes waiting for the broker's acknowledgement. This becomes a bottleneck in high-throughput scenarios.

Potential Risks:

  • Limited throughput (on the order of thousands of messages per second)
  • Increased latency
  • Does not align with SeaTunnel streaming engine's high-performance goals

Scope of Impact:

  • Direct impact: Users in high-performance scenarios
  • Indirect impact: Overall pipeline throughput
  • Impact scope: Single Connector

Severity: MAJOR (if positioned as high-performance Connector)
Severity: MINOR (if positioned as lightweight IoT Connector, current performance acceptable)

Improvement Suggestions:

Solution 1: Batch send (recommended)

// Add batch configuration
public static final Option<Integer> BATCH_SIZE =
    Options.key("batch_size")
        .intType()
        .defaultValue(1)
        .withDescription("Number of messages to batch before sending");

// Implement batch sending logic
private final List<MqttMessage> messageBuffer = new ArrayList<>(batchSize);

@Override
public void write(SeaTunnelRow element) throws IOException {
    byte[] payload = serializationSchema.serialize(element);
    MqttMessage message = new MqttMessage(payload);
    message.setQos(qos);
    
    synchronized (messageBuffer) {
        messageBuffer.add(message);
        if (messageBuffer.size() >= batchSize) {
            flushBatch();
        }
    }
}

private void flushBatch() throws IOException {
    // Paho's MqttClient has no batch publish call, so publish each buffered message
    // in a loop and clear the buffer once all of them have been sent
}

@Override
public Optional<Void> prepareCommit() throws IOException {
    synchronized (messageBuffer) {
        flushBatch();  // Flush before checkpoint
    }
    return Optional.empty();
}
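
One possible shape for the buffering logic is sketched below. `BatchBufferSketch` and its `Publisher` interface are hypothetical stand-ins for the writer and the MQTT client. Since each message is still published individually, the buffer mainly defers work to a flush point (for example before a checkpoint) rather than reducing network round-trips.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: Publisher stands in for the MQTT client.
final class BatchBufferSketch {

    interface Publisher {
        void publish(byte[] payload) throws IOException;
    }

    private final Publisher publisher;
    private final int batchSize;
    private final Deque<byte[]> buffer = new ArrayDeque<>();

    BatchBufferSketch(Publisher publisher, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1, got: " + batchSize);
        }
        this.publisher = publisher;
        this.batchSize = batchSize;
    }

    // Buffers the payload and flushes once the batch is full.
    synchronized void write(byte[] payload) throws IOException {
        buffer.addLast(payload);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends buffered messages in order. A message leaves the buffer only after
    // it was published, so a failure midway keeps the unsent remainder.
    synchronized void flush() throws IOException {
        while (!buffer.isEmpty()) {
            publisher.publish(buffer.peekFirst());
            buffer.pollFirst();
        }
    }

    synchronized int pending() {
        return buffer.size();
    }
}
```

Calling `flush()` from the writer's `prepareCommit()` would ensure that nothing stays buffered across a checkpoint.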

Solution 2: Async send

// Use MqttAsyncClient (but requires major refactoring)
// Or use thread pool for asynchronous sending
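
As a sketch of what asynchronous sending could look like, the class below pipelines publishes behind a bounded in-flight window. `AsyncSendSketch` and `AsyncPublisher` are hypothetical: the interface stands in for an asynchronous client such as MqttAsyncClient, and the wiring to Paho is deliberately not shown.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: AsyncPublisher stands in for an asynchronous MQTT client.
final class AsyncSendSketch {

    interface AsyncPublisher {
        CompletableFuture<Void> publishAsync(byte[] payload);
    }

    private final AsyncPublisher publisher;
    private final int maxInFlight;
    private final Semaphore permits;
    private final AtomicReference<Throwable> firstError = new AtomicReference<>();

    AsyncSendSketch(AsyncPublisher publisher, int maxInFlight) {
        this.publisher = publisher;
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks only when maxInFlight sends are already pending.
    void write(byte[] payload) throws InterruptedException {
        permits.acquire();
        CompletableFuture<Void> future;
        try {
            future = publisher.publishAsync(payload);
        } catch (RuntimeException e) {
            permits.release(); // do not leak the permit if the call itself fails
            throw e;
        }
        future.whenComplete((ok, err) -> {
            if (err != null) {
                firstError.compareAndSet(null, err);
            }
            permits.release();
        });
    }

    // Waits for all pending sends, then reports the first failure, if any.
    void flush() throws IOException, InterruptedException {
        permits.acquire(maxInFlight);
        permits.release(maxInFlight);
        Throwable err = firstError.getAndSet(null);
        if (err != null) {
            throw new IOException("Async publish failed", err);
        }
    }
}
```

The semaphore provides back-pressure, and calling `flush()` before a checkpoint keeps failures from going unnoticed.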

Solution 3: Document in docs (simplest)
Document performance characteristics in documentation:

## Performance Considerations

The MQTT Sink sends messages synchronously to guarantee delivery. Typical throughput:
- QoS 0: ~10,000 messages/sec (local network)
- QoS 1: ~5,000 messages/sec (requires broker ACK)

For higher throughput requirements, consider:
- Using Kafka Sink instead
- Reducing QoS to 0
- Increasing SeaTunnel parallelism

Rationale: Current design is sufficient for IoT scenarios (low-frequency messages), but performance limitations should be clearly documented to avoid user misunderstanding.


@assokhi assokhi force-pushed the feature/mqtt-connector branch from 41ff5b5 to 42dc644 Compare March 8, 2026 19:46
@gitfortian
Contributor

Question: is there no Source connector, only a Sink connector?

@assokhi
Contributor Author

assokhi commented Mar 9, 2026

Hi @gitfortian, the scope of this PR is limited to the MQTT Sink Connector.

@assokhi
Contributor Author

assokhi commented Mar 9, 2026

@DanielCarter-stack @davidzollo

I've resolved the issues listed above:

  • Implemented the custom MqttConnectorException hierarchy and QoS IllegalArgumentException validation.
  • Made clean_session dynamically configurable.
  • Added the requested unit tests (MqttSinkWriterTest and MqttSinkFactoryTest).
  • Updated the documentation.

Still, I don't know why the CI is not passing. On my local machine, both the MQTT connector build and the global build (install -DskipTests) pass. Looking at the GitHub Actions logs, the failures seem to come from 3-hour runner timeouts on unrelated modules and from "Hazelcast instance is not active" errors, leading to memory crashes. Could anyone verify this underlying issue and perhaps re-trigger the build?

@assokhi assokhi force-pushed the feature/mqtt-connector branch 2 times, most recently from 4fd1923 to f8bc6f7 Compare March 12, 2026 09:36
@davidzollo
Contributor

davidzollo commented Mar 12, 2026

If it's not caused by your own PR, it might be a network problem or tight GitHub CI resources. First check the specific error message. If it's a CI timeout, you can raise the corresponding timeout in your PR in the file https://github.com/apache/seatunnel/blob/dev/.github/workflows/backend.yml. Sometimes CI tasks need to be retried several times before they pass.

@davidzollo
Contributor

Please add the related Chinese doc. You can write the content in English within the Chinese doc.

Contributor

@davidzollo davidzollo left a comment

Overall, it is close to being ready for merging. Keep up the good work! Note that the CI must pass successfully.

@assokhi assokhi force-pushed the feature/mqtt-connector branch from bd2cfb8 to 4176070 Compare March 16, 2026 22:34
@assokhi assokhi force-pushed the feature/mqtt-connector branch from 4176070 to a782315 Compare March 16, 2026 22:43
@assokhi
Contributor Author

assokhi commented Mar 18, 2026

I've increased the CI timeouts for it-4 and it-2 as discussed, and all checks are now passing.
Could you please review it, @davidzollo @dybyte?

Comment thread .github/workflows/backend.yml Outdated
Comment thread .github/workflows/backend.yml Outdated
@dybyte dybyte requested a review from davidzollo March 18, 2026 12:26
Contributor

@davidzollo davidzollo left a comment

+1
Good job. Considering that this is your first time contributing to the community, I give a +1 (of course, this also depends on the attitudes of other reviewers).

The current implementation is more like a first-version sink with clean_session=true, rather than the "stronger guarantee" described in the documentation.

clean_session=false is inconsistent with the actual recovery path. The documentation in Mqtt.md (line 17) clearly implies that using a stable clientId can achieve stronger delivery semantics. In the implementation, however, a new random clientId is generated each time a writer is created (MqttSinkWriter.java#L80), and only in-memory persistence is used (MqttSinkWriter.java#L91). The sink also has no custom recovery state: both SeaTunnelSink.java (line 85) and SinkWriter.java (line 82) fall back to the default empty state. During failure recovery, the engine recreates the writer (SinkFlowLifeCycle.java#L342), so a new clientId is always used after recovery and the persistent session on the broker side cannot actually be reused.

@corgy-w corgy-w merged commit 9a66fee into apache:dev Mar 25, 2026
5 checks passed
onceMisery pushed a commit to onceMisery/seatunnel that referenced this pull request Apr 14, 2026
Development

Successfully merging this pull request may close these issues.

[Feature][Sink] Want to support mqtt protocol

6 participants