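I've been load testing a service over the past few days, and it ran out of Metaspace and died with an OOM. This article follows the problem from the first symptom through the investigation to the fix.
Symptoms
During the load test the service exhausted its Metaspace and hit an OOM. Online analysis showed that a piece of task-creation logic was generating and loading many identical classes while it ran.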
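The growth described above can be watched from inside the JVM. Below is a minimal sketch that reads the standard java.lang.management counters. It is not the tooling used in this investigation. It assumes a HotSpot JVM, where the memory pool is named "Metaspace", and MetaspaceProbe is a hypothetical class name. If identical classes keep being generated, the total loaded-class count and the Metaspace usage keep climbing under load.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

/** Hypothetical helper: snapshot of class-loading counters and Metaspace usage. */
public class MetaspaceProbe {

    /** Used bytes of the "Metaspace" pool, or -1 if the JVM exposes no pool with that name. */
    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) { // HotSpot pool name (assumption for other JVMs)
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    /** Total number of classes loaded since JVM start; this counter never decreases. */
    public static long totalLoadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getTotalLoadedClassCount();
    }

    public static void main(String[] args) {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        // Sample this periodically during a load test and watch the trend.
        System.out.printf("loaded now=%d, total=%d, unloaded=%d, metaspace used=%d bytes%n",
                cl.getLoadedClassCount(), cl.getTotalLoadedClassCount(),
                cl.getUnloadedClassCount(), metaspaceUsedBytes());
    }
}
```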
One of the thread stacks:
"dynamic-kafka-worker-pool-sync_staging-async_topic-async_topic_c-3" Id=575 BLOCKED on org.springframework.boot.loader.LaunchedURLClassLoader@43e9089 owned by "dynamic-kafka-worker-pool-sync_staging-async_topic-async_topic_c-1" Id=572
at org.springframework.cglib.core.AbstractClassGenerator.generate(AbstractClassGenerator.java:344)
- blocked on org.springframework.boot.loader.LaunchedURLClassLoader@43e9089
at org.springframework.cglib.proxy.Enhancer.generate(Enhancer.java:582)
at org.springframework.cglib.core.AbstractClassGenerator$ClassLoaderData.get(AbstractClassGenerator.java:131)
at org.springframework.cglib.core.AbstractClassGenerator.create(AbstractClassGenerator.java:319)
at org.springframework.cglib.proxy.Enhancer.createHelper(Enhancer.java:569)
at org.springframework.cglib.proxy.Enhancer.createClass(Enhancer.java:416)
at org.springframework.aop.framework.ObjenesisCglibAopProxy.createProxyClassAndInstance(ObjenesisCglibAopProxy.java:57)
at org.springframework.aop.framework.CglibAopProxy.getProxy(CglibAopProxy.java:205)
at org.springframework.aop.framework.ProxyFactory.getProxy(ProxyFactory.java:110)
at org.springframework.context.annotation.ContextAnnotationAutowireCandidateResolver.buildLazyResolutionProxy(ContextAnnotationAutowireCandidateResolver.java:117)
at org.springframework.context.annotation.ContextAnnotationAutowireCandidateResolver.getLazyResolutionProxyIfNecessary(ContextAnnotationAutowireCandidateResolver.java:52)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1172)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:521)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:497)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$1.getTarget(CommonAnnotationBeanPostProcessor.java:461)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673)
at cn.howarliu.AsyncTaskProducer$$EnhancerBySpringCGLIB$$ebe8dcee.sendMessage(<generated>)
at cn.howarliu.service.impl.AsyncTaskServiceImpl.create(AsyncTaskServiceImpl.java:88)
at cn.howarliu.service.impl.AsyncTaskServiceImpl$$FastClassBySpringCGLIB$$e8965ed3_3.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:750)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.transaction.interceptor.TransactionInterceptor$$Lambda$3136/0x0000000801913040.proceedWithInvocation(Unknown Source)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689)
at cn.howarliu.service.impl.AsyncTaskServiceImpl$$EnhancerBySpringCGLIB$$f9101846.create(<generated>)
at cn.howarliu.AsyncTaskSupporter.create(AsyncTaskSupporter.java:27)
at cn.howarliu.AsyncTaskSupporter.create(AsyncTaskSupporter.java:35)
Analyzing the problem
Finding the code
Following the thread stack leads through this code:
- cn.howardliu.AsyncTaskSupporter
public static Long create(AsyncTaskCreateRequestDto request) {
    return AsyncTaskSupporter.getBean(AsyncTaskService.class)
            .create(request);
}
- cn.howardliu.service.impl.AsyncTaskServiceImpl
@Resource
@Lazy
private AsyncTaskProducer asyncTaskProducer;
@Override
@Transactional(rollbackFor = Exception.class)
public Long create(AsyncTaskCreateRequestDto request) {
    // other logic…
    asyncTaskProducer.sendMessage(dataContract);
    // other logic…
}
- cn.howardliu.AsyncTaskKafkaProducer
@Override
@Transactional(rollbackFor = Exception.class)
public void sendMessage(AsyncTaskDataContract contract) {
    // the logic of this method is not shown here
}
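Analyzing the cause
If classes are being generated at runtime, the only candidates here are the @Lazy-annotated AsyncTaskProducer field and @Transactional.
With @Lazy, Spring does not inject the real bean at startup; it injects a proxy and looks the bean up at runtime, when the proxy's methods are called. In this service both that lazy-resolution proxy and the proxy for a bean whose methods use @Transactional are CGLIB dynamic proxy classes, as the $$EnhancerBySpringCGLIB frames show. In the stack above, the lookup inside CommonAnnotationBeanPostProcessor$1.getTarget goes through ContextAnnotationAutowireCandidateResolver.buildLazyResolutionProxy and ProxyFactory.getProxy into CglibAopProxy, so a call to sendMessage ends up generating a CGLIB proxy class.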
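A JDK-only sketch of the idea behind a lazy-resolution proxy: the injected object is a proxy, and each method call first asks a target source for the real object, then delegates to it. It uses java.lang.reflect.Proxy instead of CGLIB, and Producer and lazyProxy are hypothetical names, so this only illustrates the mechanism and is not Spring's implementation. The point is that whatever the lookup does is paid on the call path; in the stack above, that lookup itself ends in CGLIB proxy generation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyProxySketch {

    /** Hypothetical stand-in for the injected dependency type. */
    public interface Producer {
        String sendMessage(String msg);
    }

    /**
     * Returns a proxy that calls targetSource.get() on EVERY invocation and then
     * delegates to the result, instead of resolving the target once up front.
     */
    @SuppressWarnings("unchecked")
    public static <T> T lazyProxy(Class<T> type, Supplier<? extends T> targetSource) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object target = targetSource.get(); // resolved per call
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        AtomicInteger resolutions = new AtomicInteger();
        Producer real = msg -> "sent:" + msg;
        Producer injected = lazyProxy(Producer.class, () -> {
            resolutions.incrementAndGet(); // counts every lookup of the real bean
            return real;
        });
        injected.sendMessage("a");
        injected.sendMessage("b");
        System.out.println("target resolved " + resolutions.get() + " times");
    }
}
```

Running main reports that the target was resolved twice, once per call.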
The core code for generating the dynamic proxy class is org.springframework.aop.framework.CglibAopProxy. The relevant methods are below (pay particular attention to createEnhancer()):
@Override
public Object getProxy(@Nullable ClassLoader classLoader) {
    if (logger.isTraceEnabled()) {
        logger.trace("Creating CGLIB proxy: " + this.advised.getTargetSource());
    }
    try {
        Class<?> rootClass = this.advised.getTargetClass();
        Assert.state(rootClass != null, "Target class must be available for creating a CGLIB proxy");
        Class<?> proxySuperClass = rootClass;
        if (ClassUtils.isCglibProxyClass(rootClass)) {
            proxySuperClass = rootClass.getSuperclass();
            Class<?>[] additionalInterfaces = rootClass.getInterfaces();
            for (Class<?> additionalInterface : additionalInterfaces) {
                this.advised.addInterface(additionalInterface);
            }
        }
        // Validate the class, writing log messages as necessary.
        validateClassIfNecessary(proxySuperClass, classLoader);
        // Configure CGLIB Enhancer...
        Enhancer enhancer = createEnhancer();
        if (classLoader != null) {
            enhancer.setClassLoader(classLoader);
            if (classLoader instanceof SmartClassLoader &&
                    ((SmartClassLoader) classLoader).isClassReloadable(proxySuperClass)) {
                enhancer.setUseCache(false);
            }
        }
        enhancer.setSuperclass(proxySuperClass);
        enhancer.setInterfaces(AopProxyUtils.completeProxiedInterfaces(this.advised));
        enhancer.setNamingPolicy(SpringNamingPolicy.INSTANCE);
        enhancer.setStrategy(new ClassLoaderAwareUndeclaredThrowableStrategy(classLoader));
        Callback[] callbacks = getCallbacks(rootClass);
        Class<?>[] types = new Class<?>[callbacks.length];
        for (int x = 0; x < types.length; x++) {
            types[x] = callbacks[x].getClass();
        }
        // fixedInterceptorMap only populated at this point, after getCallbacks call above
        enhancer.setCallbackFilter(new ProxyCallbackFilter(
                this.advised.getConfigurationOnlyCopy(), this.fixedInterceptorMap, this.fixedInterceptorOffset));
        enhancer.setCallbackTypes(types);
        // Generate the proxy class and create a proxy instance.
        return createProxyClassAndInstance(enhancer, callbacks);
    }
    catch (CodeGenerationException | IllegalArgumentException ex) {
        throw new AopConfigException("Could not generate CGLIB subclass of " + this.advised.getTargetClass() +
                ": Common causes of this problem include using a final class or a non-visible class",
                ex);
    }
    catch (Throwable ex) {
        // TargetSource.getTarget() failed
        throw new AopConfigException("Unexpected AOP exception", ex);
    }
}

protected Enhancer createEnhancer() {
    return new Enhancer();
}
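The source code looks fine
According to the source, Spring decides whether CGLIB may use its class cache: with the cache, a proxy class that has already been generated is reused; without it, the class is generated again every time.
The cache is on by default. Only two code paths turn it off:
- the JVM was started with the system property cglib.useCache set to false;
- the class loader is a SmartClassLoader and reports the proxied class as reloadable (the setUseCache(false) branch in getProxy above).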
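The difference between the two modes can be modeled without CGLIB. In this hypothetical GeneratedClassCache, a plain Object stands in for a generated proxy class, and defaultUseCache() follows the cglib.useCache rule described above (cache on unless the property is "false"). With the cache, many requests for the same key generate once; without it, every request generates again, which is what fills Metaspace when the generated things are real classes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical model of a generator that either reuses a cached result per key or regenerates it. */
public class GeneratedClassCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> generator;
    private final boolean useCache;
    private final AtomicInteger generated = new AtomicInteger();

    public GeneratedClassCache(Function<K, V> generator, boolean useCache) {
        this.generator = Objects.requireNonNull(generator);
        this.useCache = useCache;
    }

    /** Cache on unless -Dcglib.useCache=false (this sketch re-reads the property on each call). */
    public static boolean defaultUseCache() {
        return Boolean.parseBoolean(System.getProperty("cglib.useCache", "true"));
    }

    public V get(K key) {
        if (useCache) {
            return cache.computeIfAbsent(key, this::generate); // generate once per key
        }
        return generate(key); // no cache: every request pays for a new generation
    }

    private V generate(K key) {
        generated.incrementAndGet();
        return generator.apply(key);
    }

    public int generatedCount() {
        return generated.get();
    }

    public static void main(String[] args) {
        for (boolean useCache : new boolean[] {true, false}) {
            GeneratedClassCache<String, Object> g =
                    new GeneratedClassCache<>(key -> new Object(), useCache);
            for (int i = 0; i < 1000; i++) {
                g.get("AsyncTaskProducer"); // the same key on every simulated request
            }
            System.out.println("useCache=" + useCache + " -> generated " + g.generatedCount());
        }
    }
}
```

Running main prints "useCache=true -> generated 1" followed by "useCache=false -> generated 1000".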
Checking the first case
First, check the startup parameters for this property with jinfo $pid | grep cglib. No related setting turned up.
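jinfo inspects the process from the outside. The same check can also run inside the application, for example behind a diagnostic endpoint. The sketch below (CglibCacheFlagCheck is a hypothetical name) reads the current value of the cglib.useCache system property and lists any JVM startup arguments that mention cglib.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class CglibCacheFlagCheck {

    /** Current value of the cglib.useCache system property, or null if it was never set. */
    public static String useCacheProperty() {
        return System.getProperty("cglib.useCache");
    }

    /** JVM startup arguments that mention cglib, e.g. "-Dcglib.useCache=false". */
    public static List<String> cglibJvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.contains("cglib"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("cglib.useCache = " + useCacheProperty());
        System.out.println("JVM args mentioning cglib: " + cglibJvmArgs());
    }
}
```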
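Checking the second case
The second case was a blind spot for me; I didn't know how to check it.
Nine years of compulsory schooling taught me to skip the questions I can't answer and come back to them later.
Checking other possibilities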
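If the source code is fine, the problem must be at runtime. So the next step is to check on the live machines whether the CglibAopProxy class actually loaded at runtime matches the source we read.
Arthas's jad command decompiles a loaded class and accepts a method name to narrow the output (for example, jad org.springframework.aop.framework.CglibAopProxy createEnhancer):
[Figure: createEnhancer() as decompiled on the problematic machine]
[Figure: createEnhancer() as decompiled on a healthy machine]
Sure enough, the classes differ: on the problematic machine, createEnhancer() explicitly disables the cache.
That explains the symptom, but why is the class different?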
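Possibility 1: mismatched jar versions
Unzipping the fat jar and locating the spring-aop jar shows that the version is the same.
That already rules this one out, but I still wasn't fully convinced: what if the versions match but the contents don't?
So I unzipped the spring-aop jar as well, found CglibAopProxy.class, and disassembled it with javap -c -v: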
  protected org.springframework.cglib.proxy.Enhancer createEnhancer();
    descriptor: ()Lorg/springframework/cglib/proxy/Enhancer;
    flags: (0x0004) ACC_PROTECTED
    Code:
      stack=2, locals=1, args_size=1
         0: new           #77                 // class org/springframework/cglib/proxy/Enhancer
         3: dup
         4: invokespecial #78                 // Method org/springframework/cglib/proxy/Enhancer."<init>":()V
         7: areturn
      LineNumberTable:
        line 231: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       8     0  this   Lorg/springframework/aop/framework/CglibAopProxy;
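The bytecode allocates an Enhancer with new, duplicates the reference with dup, calls Enhancer's <init> method (the constructor) with invokespecial, and then returns the object with areturn.
So both the jar version and the jar contents match the original return new Enhancer();.
Possibility 2: CglibAopProxy loaded from another jar
Perhaps the problematic service picked up the class from a different jar. Arthas's sc command shows where it came from: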
[arthas@]$ sc -d org.springframework.aop.framework.CglibAopProxy
class-info        org.springframework.aop.framework.CglibAopProxy
code-source       file:/path/to/runner.jar!/BOOT-INF/lib/spring-aop-5.1.10.jar!/
name              org.springframework.aop.framework.CglibAopProxy
isInterface       false
isAnnotation      false
isEnum            false
isAnonymousClass  false
isArray           false
isLocalClass      false
isMemberClass     false
isPrimitive       false
isSynthetic       false
simple-name       CglibAopProxy
modifier
annotation
interfaces        org.springframework.aop.framework.AopProxy,java.io.Serializable
super-class       +-java.lang.Object
class-loader      +-org.springframework.boot.loader.LaunchedURLClassLoader@4aa21f9d
                    +-jdk.internal.loader.ClassLoaders$AppClassLoader@6a6824be
                      +-jdk.internal.loader.ClassLoaders$PlatformClassLoader@5ee0a0ae
classLoaderHash   4aa21f9d
In both cases the class comes from spring-aop-5.1.10.jar.
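sc -d reports a class's code source and class-loader chain. Plain Java can report the same two facts through Class.getProtectionDomain().getCodeSource() and ClassLoader.getParent(). WhereLoaded is a hypothetical helper; for JDK classes loaded by the bootstrap loader, the code source is null and the loader chain is empty.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhereLoaded {

    /** Location the class was loaded from, or null when no code source is recorded (e.g. JDK classes). */
    public static URL codeSource(Class<?> type) {
        CodeSource cs = type.getProtectionDomain().getCodeSource();
        return cs == null ? null : cs.getLocation();
    }

    /** Class-loader chain, innermost first; "<bootstrap>" if the class has no loader object. */
    public static String loaderChain(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(cl);
        }
        return sb.length() == 0 ? "<bootstrap>" : sb.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to this class; pass a class name such as
        // org.springframework.aop.framework.CglibAopProxy when it is on the class path.
        Class<?> type = args.length > 0 ? Class.forName(args[0]) : WhereLoaded.class;
        System.out.println("code-source:  " + codeSource(type));
        System.out.println("class-loader: " + loaderChain(type));
    }
}
```

Run inside the application with org.springframework.aop.framework.CglibAopProxy, it should give information comparable to the code-source and class-loader lines of the sc output.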
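Possibility 3: the code was modified while the class was being loaded
With the first two possibilities ruled out, this is the only one left.
The problematic service was running HotswapAgent. Going through the HotswapAgent source, I finally found the SpringPlugin class, which contains code that rewrites the createEnhancer method: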
// Runs when CglibAopProxy is loaded and replaces the body of createEnhancer()
// with one that creates an Enhancer and disables the CGLIB class cache.
@OnClassLoadEvent(classNameRegexp = "org.springframework.aop.framework.CglibAopProxy")
public static void cglibAopProxyDisableCache(CtClass ctClass) throws NotFoundException, CannotCompileException {
    CtMethod method = ctClass.getDeclaredMethod("createEnhancer");
    method.setBody("{org.springframework.cglib.proxy.Enhancer enhancer = new org.springframework.cglib.proxy.Enhancer();enhancer.setUseCache(false);return enhancer;}");
    LOGGER.debug("org.springframework.aop.framework.CglibAopProxy - cglib Enhancer cache disabled", new Object[0]);
}
At this point it was essentially certain that this plugin was the cause.
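When a running class does not match its jar, it is worth checking which agents the JVM was started with. A minimal sketch, assuming HotswapAgent is enabled through -javaagent or a JVM flag whose name contains "hotswap"; AgentCheck is a hypothetical name. Agents attached at runtime through the Attach API (as Arthas is) do not appear in the startup arguments, so an empty result does not prove that no agent is present.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AgentCheck {

    /** Returns the arguments that look like agent settings or mention hotswap (case-insensitive). */
    public static List<String> suspiciousArgs(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-javaagent:")
                        || arg.startsWith("-agentlib:")
                        || arg.startsWith("-agentpath:")
                        || arg.toLowerCase(Locale.ROOT).contains("hotswap"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Startup arguments only; dynamically attached agents are not listed here.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("agent-related JVM args: " + suspiciousArgs(jvmArgs));
    }
}
```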
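Fixing the problem
Often, raising, locating, and analyzing a problem are the hard parts, and the fix itself is comparatively simple. For the problem above there are two options:
- Option 1: since the duplicate CGLIB proxy classes are created while the bean is resolved at runtime, simply remove the @Lazy annotation. The bean is then created and injected at startup, and nothing has to be resolved at runtime;
- Option 2: remove the offending component (HotswapAgent) and add it back once load testing is over.
That is also what I did in practice: I removed the @Lazy annotation first, then tracked down the root cause at my own pace.
This is a useful habit: unblock first so you don't become the bottleneck, then find the root cause and learn from it. In other words, do what needs to be done first, then what you want to do.
Summary
HotswapAgent rewrites the createEnhancer() method of CglibAopProxy when the class is loaded at Spring startup, so CGLIB no longer reuses cached proxy classes and generates a new class every time. Under high concurrency, if requests need a proxy class, the same class is generated over and over until Metaspace overflows and the service hits an OOM.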