
Commit d109eea

[DOCS] Improve the quick start guide for Kafka Connect Sink

1 parent 7e83f61 commit d109eea

1 file changed: hudi-kafka-connect/README.md (16 additions, 7 deletions)

````diff
@@ -36,12 +36,15 @@ After installing these dependencies, follow steps based on your requirement.
 
 ### 1 - Starting the environment
 
-For runtime dependencies, we encourage using the confluent HDFS connector jars. We have tested our setup with version `10.1.0`.
-Either use confluent-hub to install the connector or download it from [here](https://tinyurl.com/yb472f79).
+For runtime dependencies, we encourage using the confluent HDFS connector jars. We have tested our setup with
+version `10.1.0`. Either use confluent-hub to install the connector or download it
+from [here](https://tinyurl.com/yb472f79). You can install the confluent-hub command-line tool by downloading Confluent
+Platform from [here](https://tinyurl.com/s2jjby53).
 
 Copy the entire folder to the classpath that will be used by the Hudi Kafka Connector.
 
 ```bash
+# Points CONFLUENT_DIR to Confluent Platform installation
 export CONFLUENT_DIR=/path/to/confluent_install_dir
 mkdir -p /usr/local/share/kafka/plugins
 $CONFLUENT_DIR/bin/confluent-hub install confluentinc/kafka-connect-hdfs:10.1.0
````
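
The hunk above ends inside the install block, so the copy step that the surrounding prose describes is not visible here. A minimal sketch of that step, assuming confluent-hub's default component directory under `$CONFLUENT_DIR` (the README's exact path may differ):

```bash
# Copy the installed connector folder into the Kafka Connect plugin path.
# The confluent-hub-components location is an assumption based on confluent-hub defaults.
cp -r $CONFLUENT_DIR/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/* \
  /usr/local/share/kafka/plugins/
```
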
````diff
@@ -55,7 +58,7 @@ plugin path that contains all the other jars (`/usr/local/share/kafka/plugins/li
 cd $HUDI_DIR
 mvn package -DskipTests -pl packaging/hudi-kafka-connect-bundle -am
 mkdir -p /usr/local/share/kafka/plugins/lib
-cp $HUDI_DIR/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar /usr/local/share/kafka/plugins/lib
+cp $HUDI_DIR/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-0.13.0-SNAPSHOT.jar /usr/local/share/kafka/plugins/lib
 ```
 
 If the Hudi Sink Connector writes to a target Hudi table on [Amazon S3](https://aws.amazon.com/s3/), you need two
````
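
The bundle jar name tracks the repository's current development version (hence the `0.11.0` → `0.13.0` bump above), so a stale jar left on the plugin path is an easy mistake. A quick check, assuming the standard build layout:

```bash
# Confirm the freshly built bundle exists, then confirm what is actually on the plugin path.
ls $HUDI_DIR/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-*.jar
ls /usr/local/share/kafka/plugins/lib/
```
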
````diff
@@ -70,7 +73,8 @@ wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.10.1/hadoop-a
 ```
 
 Set up a Kafka broker locally. Download the latest apache kafka from [here](https://kafka.apache.org/downloads). Once
-downloaded and built, run the Zookeeper server and Kafka server using the command line tools.
+downloaded and built, run the Zookeeper server and Kafka server using the command line tools. The servers should be
+ready in one to two minutes after executing the commands.
 
 ```bash
 export KAFKA_HOME=/path/to/kafka_install_dir
````
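
The hunk is cut off before the start commands themselves. A sketch using the stock scripts shipped with Apache Kafka (the README's exact invocation may differ):

```bash
# Start Zookeeper first, then the Kafka broker, each in the background.
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
```
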
````diff
@@ -101,7 +105,8 @@ cd $CONFLUENT_DIR
 
 ### 3 - Create the Hudi Control Topic for Coordination of the transactions
 
-The control topic should only have `1` partition, since its used to coordinate the Hudi write transactions across the multiple Connect tasks.
+The control topic should only have `1` partition, since it's used to coordinate the Hudi write transactions across the
+multiple Connect tasks.
 
 ```bash
 cd $KAFKA_HOME
````
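
The topic-creation command is likewise truncated by the hunk boundary. A sketch with the standard Kafka CLI; the topic name `hudi-control-topic` is an assumption for illustration, and the single partition is the requirement stated above:

```bash
# One partition only: the sink uses this topic to serialize transaction coordination.
./bin/kafka-topics.sh --create \
  --topic hudi-control-topic \
  --partitions 1 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092
```
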
````diff
@@ -148,6 +153,8 @@ curl APIs can be used to delete and add a new Hudi Sink. Again, a default config
 that can be changed based on the desired properties.
 
 ```bash
+# The following command is expected to throw an error if the Hudi Sink Connector has not been added yet.
+# {"error_code":404,"message":"Connector hudi-sink not found"}
 curl -X DELETE http://localhost:8083/connectors/hudi-sink
 curl -X POST -H "Content-Type:application/json" -d @$HUDI_DIR/hudi-kafka-connect/demo/config-sink.json http://localhost:8083/connectors
 ```
````
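
If the expected 404 from the `DELETE` is a problem in scripts, the standard Kafka Connect REST endpoints can be queried first; for example:

```bash
# List registered connectors; fetch the sink's status once it has been added.
curl http://localhost:8083/connectors
curl http://localhost:8083/connectors/hudi-sink/status
```
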
````diff
@@ -269,7 +276,7 @@ Then you can run async compaction job with `HoodieCompactor` and `spark-submit`
 ```
 spark-submit \
 --class org.apache.hudi.utilities.HoodieCompactor \
-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.10.0-SNAPSHOT.jar \
+hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0-SNAPSHOT.jar \
 --base-path /tmp/hoodie/hudi-test-topic \
 --table-name hudi-test-topic \
 --schema-file /Users/user/repo/hudi/docker/demo/config/schema.avsc \
````
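
Because the utilities-bundle jar name embeds both the Scala and Hudi versions, the literal path above goes stale on every release (the very change this commit makes). One way to avoid that in local scripts, assuming a single matching jar in `target/`:

```bash
# Resolve the bundle jar without hard-coding the version string.
UTILITIES_JAR=$(ls hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_*.jar | head -n 1)
echo "Using $UTILITIES_JAR"
```
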
````diff
@@ -328,7 +335,7 @@ Then you can run async clustering job with `HoodieClusteringJob` and `spark-subm
 ```
 spark-submit \
 --class org.apache.hudi.utilities.HoodieClusteringJob \
-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.10.0-SNAPSHOT.jar \
+hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0-SNAPSHOT.jar \
 --props clusteringjob.properties \
 --mode execute \
 --base-path /tmp/hoodie/hudi-test-topic \
````
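
The `clusteringjob.properties` file passed via `--props` is referenced but not shown in this hunk. An illustrative version, written inline; the keys are real Hudi clustering configs, but the values (and the exact set the demo uses) are assumptions:

```bash
# Write an example clustering config; tune values for your workload.
cat > clusteringjob.properties <<'EOF'
hoodie.clustering.async.enabled=true
hoodie.clustering.async.max.commits=4
hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
hoodie.clustering.plan.strategy.small.file.limit=629145600
EOF
```
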
````diff
@@ -388,6 +395,8 @@ cd $HUDI_DIR/docker
 Firstly, (re)-install a different connector that is configured to write the Hudi table to Hdfs instead of local filesystem.
 
 ```bash
+# The following command is expected to throw an error if the Hudi Sink Connector has not been added yet.
+# {"error_code":404,"message":"Connector hudi-sink not found"}
 curl -X DELETE http://localhost:8083/connectors/hudi-sink
 curl -X POST -H "Content-Type:application/json" -d @$HUDI_DIR/hudi-kafka-connect/demo/config-sink-hive.json http://localhost:8083/connectors
 ```
````
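
After re-registering the sink with the Hive-enabled config, it is worth confirming that the new configuration took effect rather than the old one; the Kafka Connect REST API exposes it directly:

```bash
# Inspect the active configuration of the re-installed sink.
curl http://localhost:8083/connectors/hudi-sink/config
```
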
