Fixing the Hudi Exception 'Not an Avro data file'


Preface

This post records how to resolve an exception hit while writing to Hudi. I actually found the cause and the fix last year, and the fix has been merged into the community: PR [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean. The reason for summarizing it again now: our production environment runs Hudi 0.9.0 and has not been upgraded, because upgrading may introduce compatibility issues and requires time-consuming testing, while the PR was only released in 0.11.0. This post therefore focuses on how to solve the problem on 0.9.0; the same approach applies to any other affected version before 0.11.0.

Exception message

The exception occurs during both archive and clean; the key part of the message is:

 Caused by: java.io.IOException: Not an Avro data file

Root cause

For various reasons, `.rollback`, `.clean`, `.clean.requested`, or `.clean.inflight` files end up with a size of 0, i.e. empty files. Archive and clean cannot handle empty files, so they throw the exception above. One known cause is that HDFS produces empty files once a quota is exhausted; beyond that there are causes that even the PMC has not identified, as discussed in the PR above.
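To see why the reader rejects these files: an Avro data file must begin with the 4-byte magic `Obj` plus a 0x01 byte, which a zero-byte file obviously lacks. The sketch below is illustrative only, it works on the local filesystem (for HDFS you would `hadoop fs -get` the file first), and the `check_avro` helper is a hypothetical name, not anything from Hudi:

```shell
# A valid Avro data file begins with the magic bytes "Obj" followed by 0x01.
# check_avro is a hypothetical helper: it flags empty or non-Avro files.
check_avro() {
  f="$1"
  if [ ! -s "$f" ]; then
    echo "$f: EMPTY - would trigger 'Not an Avro data file'"
  elif [ "$(head -c 3 "$f")" = "Obj" ]; then
    echo "$f: looks like an Avro data file"
  else
    echo "$f: non-empty but not Avro"
  fi
}

# Demo on files mimicking .hoodie timeline metadata
dir="$(mktemp -d)"
: > "$dir/20220726050533.rollback"                    # zero-byte file
printf 'Obj\001payload' > "$dir/20220726050534.clean" # fake Avro header
check_avro "$dir/20220726050533.rollback"
check_avro "$dir/20220726050534.clean"
rm -r "$dir"
```

Running such a check before deleting anything gives extra confidence that you are removing exactly the broken files.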

Solutions

The solutions below assume you are not upgrading Hudi. If you can upgrade, simply move to the latest release.

Solution 1

When the exception occurs, have an operator delete the corresponding empty file (the exact commands are given at the end). This works when there are few tables and the exception is rare; when many tables are affected, manual cleanup becomes tedious, which is where the second solution comes in.

Solution 2

Merge the PR mentioned at the beginning into the Hudi 0.9.0 source, install the build into your local Maven repository or your company's internal repository, and reference that repository in your Maven POM. I have already pushed a 0.9.0 branch with the fix, which you can use directly; for other versions you will need to do the merge yourself.

  • gitee: https://gitee.com/dongkelun/hudi/tree/0.9.0-fixNotAvro/
  • github: https://github.com/dongkelun/hudi/tree/0.9.0-fixNotAvro

Hudi Maven install command:

mvn clean install -DskipTests

Verification

Run the test cases testArchiveCompletedRollbackAndClean and testCleanEmptyInstants locally (e.g. from your IDE or with Maven's `-Dtest` filter); if both pass, the fix should be in place.

Detailed steps for Solution 1

Whatever caused the exception, and whatever method you use, the key is to locate the correct zero-byte files and delete exactly those. Be very careful not to delete the wrong file.

Exception message 1

ERROR [Timer-Driven Process Thread-4] o.a.hudi.table.HoodieTimelineArchiveLog Failed to archive commits, .commit file: 20220726050533.rollback
java.io.IOException: Not an Avro data file
at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:178)
at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:103)
at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:341)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:305)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:439)
at org.apache.hudi.client.HoodieJavaWriteClient.postWrite(HoodieJavaWriteClient.java:187)
at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:129)
at org.apache.nifi.processors.javaHudi.JavaHudi.write(JavaHudi.java:523)
at org.apache.nifi.processors.javaHudi.JavaHudi.onTrigger(JavaHudi.java:404)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1167)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:208)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Cause

The `.rollback` file has a size of 0.

Fix

In the table's metadata path, check whether the file named in the exception really has size 0:

hadoop fs -ls hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie/20220726050533.rollback
0 2022-07-26 07:05 hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie/20220726050533.rollback

Once confirmed to be 0, delete the file:

hadoop fs -rm -r hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie/20220726050533.rollback

Be careful not to delete the wrong file. You can also rename the file instead of deleting it, then restart the component to verify things are back to normal. If the exception persists, look for other zero-byte `.rollback` files and delete them together.

Prefer not to delete via `grep`, to avoid removing the wrong files; the grep-based commands below are only advisable when quota exhaustion has produced a large number of empty files.

Find all matching files (usually there is only one; so far, multiple matches have only been seen in the quota-exhaustion case):

hadoop fs -ls -R  hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie | grep .rollback | grep -v .rollback.inflight | awk '{ if ($5 == 0) print $8 }'

Delete all matching files:

hadoop fs -ls -R  hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie | grep .rollback | grep -v .rollback.inflight | awk '{ if ($5 == 0) print $8 }'  | xargs hadoop fs -rm
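To see exactly what this pipeline selects, here is the same `grep`/`awk` filter run against a canned listing in `hadoop fs -ls -R` output format (the paths and sizes are made up for illustration). Column 5 is the file size and column 8 the path, so only zero-byte `.rollback` files survive, while `.rollback.inflight` entries are excluded:

```shell
# Canned listing in `hadoop fs -ls -R` format: $5 = size, $8 = path (made-up paths)
printf '%s\n' \
  '-rw-r--r--   3 hudi hudi          0 2022-07-26 07:05 /tbl/.hoodie/20220726050533.rollback' \
  '-rw-r--r--   3 hudi hudi       1234 2022-07-26 07:04 /tbl/.hoodie/20220726050500.rollback' \
  '-rw-r--r--   3 hudi hudi          0 2022-07-26 07:05 /tbl/.hoodie/20220726050533.rollback.inflight' \
  | grep .rollback | grep -v .rollback.inflight | awk '{ if ($5 == 0) print $8 }'
# prints only /tbl/.hoodie/20220726050533.rollback
```

Piping this result into `xargs hadoop fs -rm`, as in the command above, then deletes exactly those files.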

Exception message 2

ERROR [Timer-Driven Process Thread-4] o.a.hudi.table.HoodieTimelineArchiveLog Failed to archive commits, .commit file: 20220726050533.rollback
java.io.IOException: Not an Avro data file
at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:178)
at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:103)
at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:341)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:305)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:439)
at org.apache.hudi.client.HoodieJavaWriteClient.postWrite(HoodieJavaWriteClient.java:187)
at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:129)
at org.apache.nifi.processors.javaHudi.JavaHudi.write(JavaHudi.java:523)
at org.apache.nifi.processors.javaHudi.JavaHudi.onTrigger(JavaHudi.java:404)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1167)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:208)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Cause

The `.clean` file has a size of 0.

Fix

Find the zero-byte `.clean` file under the table's metadata path and delete it. In the cases seen so far there is only one such file, and it is the latest `.clean` file. As before, prefer not to delete via `grep`, to avoid removing the wrong files. Check the size and delete the same way as in the `.rollback` case above, for example:

hadoop fs -ls hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie/20220726050533.rollback
0 2022-07-26 07:05 hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie/20220726050533.rollback

Exception message 3

o.a.h.t.a.clean.BaseCleanActionExecutor Failed to perform previous clean operation, instant: [==>20211011143809__clean__REQUESTED]
org.apache.hudi.exception.HoodieIOException: Not an Avro data file
at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.runPendingClean(BaseCleanActionExecutor.java:87)
at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.lambda$execute$0(BaseCleanActionExecutor.java:137)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.execute(BaseCleanActionExecutor.java:134)
at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.clean(HoodieJavaCopyOnWriteTable.java:188)
at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:660)
at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:641)
at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:672)
at org.apache.hudi.client.AbstractHoodieWriteClient.autoCleanOnCommit(AbstractHoodieWriteClient.java:505)
at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:440)
at org.apache.hudi.client.HoodieJavaWriteClient.postWrite(HoodieJavaWriteClient.java:187)
at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:129)
at org.apache.nifi.processors.javaHudi.JavaHudi.write(JavaHudi.java:401)
at org.apache.nifi.processors.javaHudi.JavaHudi.onTrigger(JavaHudi.java:305)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1166)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:208)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Not an Avro data file
at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:178)
at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:106)
at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.runPendingClean(BaseCleanActionExecutor.java:84)
... 24 common frames omitted

Cause

The `.clean.requested` or `.clean.inflight` file has a size of 0.

Fix

Delete the corresponding zero-byte file; its name is already given in the exception message.

Prefer not to delete via `grep`, to avoid removing the wrong files; the command below is only advisable when quota exhaustion has produced a large number of empty files.

hadoop fs -ls -R  hdfs://cluster1/apps/hive/tenant/zxqzk_smzt_mztgx/sam_exp/.hoodie | grep .clean.requested | awk '{ if ($5 == 0) print $8 }'  | xargs hadoop fs -rm

Editor: 武晓燕. Source: 伦少的博客 (dongkelun's blog).