@@ -270,7 +270,7 @@ public long getCurrentSize() throws IOException {
     if (output == null) {
       return 0;
     }
-    return output.getPos();
+    return output.getPos() + logFile.getFileSize();
Contributor (@yihua):
@loukey-lj Could you explain how this affects the size calculation? Should output.getPos() already return the size written?

Contributor Author (@loukey-lj):

@yihua When appending data to an existing log file, the output stream returned by org.apache.hudi.common.table.log.HoodieLogFormatWriter#getOutputStream always starts at position 0. After a flush, org.apache.hudi.common.table.log.HoodieLogFormatWriter#getCurrentSize therefore returns only the size of the appended data, not the total size of the entire file. You can debug it with the following code.

    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Checkpoint every 20 seconds; each checkpoint flushes a small batch
    // that is appended to the same log file.
    env.enableCheckpointing(1000 * 20);
    env.setParallelism(1);

    StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(env);

    // Emit one record for the same key every 10 seconds.
    final DataStreamSource<Tuple3<String, Long, Long>> tuple3DataStreamSource =
        env.addSource(new SourceFunction<Tuple3<String, Long, Long>>() {
          @Override
          public void run(SourceContext<Tuple3<String, Long, Long>> ctx) throws Exception {
            while (!Thread.interrupted()) {
              ctx.collect(new Tuple3<>("1", System.currentTimeMillis(), System.currentTimeMillis()));
              Thread.sleep(1000 * 10);
            }
          }

          @Override
          public void cancel() {
          }
        });

    tableEnvironment.createTemporaryView("s", tuple3DataStreamSource);

    tableEnvironment.executeSql(
        "create table if not exists h(\n"
            + "  `id` string PRIMARY KEY NOT ENFORCED,\n"
            + "  `ts` bigint,\n"
            + "  `time` bigint\n"
            + ") with (\n"
            + "  'connector' = 'hudi',\n"
            + "  'write.bucket_assign.tasks' = '1',\n"
            + "  'hoodie.datasource.write.keygenerator.class' = 'org.apache.hudi.keygen.SimpleAvroKeyGenerator',\n"
            + "  'table.type' = 'MERGE_ON_READ',\n"
            + "  'hive_sync.enable' = 'false',\n"
            + "  'write.tasks' = '1',\n"
            + "  'path' = 'hdfs://xx',\n"
            + "  'hoodie.cleaner.commits.retained' = '1'\n"
            + ")");

    tableEnvironment.executeSql("insert into h SELECT * from s");

Contributor (@yihua):
@loukey-lj Got it. This is an HDFS-specific problem.

}

/**