@@ -36,12 +36,15 @@ After installing these dependencies, follow steps based on your requirement.

### 1 - Starting the environment

-For runtime dependencies, we encourage using the confluent HDFS connector jars. We have tested our setup with version `10.1.0`.
-Either use confluent-hub to install the connector or download it from [here](https://tinyurl.com/yb472f79).
+For runtime dependencies, we encourage using the Confluent HDFS connector jars. We have tested our setup with
+version `10.1.0`. Either use confluent-hub to install the connector or download it
+from [here](https://tinyurl.com/yb472f79). You can install the confluent-hub command-line tool by downloading Confluent
+Platform from [here](https://tinyurl.com/s2jjby53).

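If you take the confluent-hub route, the sketch below shows one way to fetch and unpack Confluent Platform. The archive URL and version here are assumptions for illustration; take the current link from the download page above.

```bash
# Hypothetical version/URL; pick the current tarball from the Confluent downloads page.
wget https://packages.confluent.io/archive/7.0/confluent-7.0.1.tar.gz
tar -xzf confluent-7.0.1.tar.gz
export CONFLUENT_DIR=$PWD/confluent-7.0.1
```
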
Copy the entire folder to the classpath that will be used by the Hudi Kafka Connector.

```bash
+# Point CONFLUENT_DIR to your Confluent Platform installation
export CONFLUENT_DIR=/path/to/confluent_install_dir
mkdir -p /usr/local/share/kafka/plugins
$CONFLUENT_DIR/bin/confluent-hub install confluentinc/kafka-connect-hdfs:10.1.0
@@ -55,7 +58,7 @@ plugin path that contains all the other jars (`/usr/local/share/kafka/plugins/li
cd $HUDI_DIR
mvn package -DskipTests -pl packaging/hudi-kafka-connect-bundle -am
mkdir -p /usr/local/share/kafka/plugins/lib
-cp $HUDI_DIR/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar /usr/local/share/kafka/plugins/lib
+cp $HUDI_DIR/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-0.13.0-SNAPSHOT.jar /usr/local/share/kafka/plugins/lib
```
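
Before restarting Connect, a quick sanity check that the bundle jar landed on the plugin path (paths as above):

```bash
ls /usr/local/share/kafka/plugins/lib | grep hudi-kafka-connect-bundle
```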
6063
If the Hudi Sink Connector writes to a target Hudi table on [Amazon S3](https://aws.amazon.com/s3/), you need two
@@ -70,7 +73,8 @@ wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.10.1/hadoop-a
```
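
The companion jar is typically the AWS SDK bundle that hadoop-aws compiles against. The version below is an assumption; verify the exact match against the hadoop-aws 2.10.1 POM.

```bash
# Assumed companion version for hadoop-aws 2.10.1; verify against its POM.
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.271/aws-java-sdk-bundle-1.11.271.jar
```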

Set up a Kafka broker locally. Download the latest Apache Kafka from [here](https://kafka.apache.org/downloads). Once
-downloaded and built, run the Zookeeper server and Kafka server using the command line tools.
+downloaded and built, run the Zookeeper server and Kafka server using the command line tools. The servers should be
+ready within one to two minutes of executing the commands.

```bash
export KAFKA_HOME=/path/to/kafka_install_dir
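# A sketch of starting the servers, using the standard scripts shipped in the
# Kafka tarball (run each in a separate terminal, or background them):
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties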
@@ -101,7 +105,8 @@ cd $CONFLUENT_DIR

### 3 - Create the Hudi Control Topic for Coordination of the transactions

-The control topic should only have `1` partition, since its used to coordinate the Hudi write transactions across the multiple Connect tasks.
+The control topic should only have `1` partition, since it's used to coordinate the Hudi write transactions across the
+multiple Connect tasks.

```bash
cd $KAFKA_HOME
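# A sketch of creating the single-partition control topic; the topic name is
# assumed to match the demo configs.
./bin/kafka-topics.sh --create --topic hudi-control-topic --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092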
@@ -148,6 +153,8 @@ curl APIs can be used to delete and add a new Hudi Sink. Again, a default config
that can be changed based on the desired properties.

```bash
+# The following command is expected to return an error if the Hudi Sink Connector has not been added yet:
+# {"error_code":404,"message":"Connector hudi-sink not found"}
curl -X DELETE http://localhost:8083/connectors/hudi-sink
curl -X POST -H "Content-Type:application/json" -d @$HUDI_DIR/hudi-kafka-connect/demo/config-sink.json http://localhost:8083/connectors
```
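
Once the POST succeeds, the standard Kafka Connect REST endpoints can confirm the sink is registered and its tasks are running:

```bash
curl http://localhost:8083/connectors/hudi-sink/status
```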
@@ -269,7 +276,7 @@ Then you can run async compaction job with `HoodieCompactor` and `spark-submit`
```
spark-submit \
  --class org.apache.hudi.utilities.HoodieCompactor \
-  hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.10.0-SNAPSHOT.jar \
+  hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0-SNAPSHOT.jar \
  --base-path /tmp/hoodie/hudi-test-topic \
  --table-name hudi-test-topic \
  --schema-file /Users/user/repo/hudi/docker/demo/config/schema.avsc \
@@ -328,7 +335,7 @@ Then you can run async clustering job with `HoodieClusteringJob` and `spark-subm
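
As a reference for the `--props` file passed below, a minimal sketch of what `clusteringjob.properties` might contain; the keys come from Hudi's clustering configs, and the values are illustrative assumptions, not tuned settings.

```
hoodie.clustering.async.enabled=true
hoodie.clustering.async.max.commits=4
hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
hoodie.clustering.plan.strategy.small.file.limit=629145600
```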
```
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
-  hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.10.0-SNAPSHOT.jar \
+  hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0-SNAPSHOT.jar \
  --props clusteringjob.properties \
  --mode execute \
  --base-path /tmp/hoodie/hudi-test-topic \
@@ -388,6 +395,8 @@ cd $HUDI_DIR/docker
First, (re)install a different connector that is configured to write the Hudi table to HDFS instead of the local filesystem.

```bash
+# The following command is expected to return an error if the Hudi Sink Connector has not been added yet:
+# {"error_code":404,"message":"Connector hudi-sink not found"}
curl -X DELETE http://localhost:8083/connectors/hudi-sink
curl -X POST -H "Content-Type:application/json" -d @$HUDI_DIR/hudi-kafka-connect/demo/config-sink-hive.json http://localhost:8083/connectors
```
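
Listing the registered connectors is a quick way to confirm the swap took effect:

```bash
curl http://localhost:8083/connectors
```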