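This article is reprinted from the WeChat official account 「程序员jinjunzhu」, written by jinjunzhu. To reprint it, please contact the 程序员jinjunzhu official account.
The saga pattern is one of the more widely used patterns in distributed transactions. It is mainly applied to long-running flows that span multiple nodes: within a global transaction, if a node throws an exception, the transactions are compensated one by one, starting from the current node and moving backwards. Both the phase-one forward services and the phase-two compensation services must be implemented in business code. Today we look at how it is implemented in source code.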
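To make the idea concrete before reading the source, here is a minimal sketch of forward execution with reverse-order compensation. It is not Seata code: the `Step` interface, the `run` method and the step names are illustrative assumptions, and it makes the simplifying assumption that only steps that completed successfully are compensated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration of the saga idea; this is not Seata code. */
final class SagaSketch {

    /** One saga step: a forward action plus the compensation that undoes it. */
    interface Step {
        String name();
        void forward() throws Exception;
        void compensate();
    }

    /** Runs the steps in order; on failure, compensates completed steps in reverse order. */
    static boolean run(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.forward();
                log.add("forward:" + step.name());
                completed.push(step); // most recent step ends up on top
            } catch (Exception e) {
                log.add("failed:" + step.name());
                while (!completed.isEmpty()) {
                    Step done = completed.pop();
                    done.compensate();
                    log.add("compensate:" + done.name());
                }
                return false;
            }
        }
        return true;
    }

    /** Builds a step that does nothing except optionally fail in forward(). */
    static Step step(String name, boolean fail) {
        return new Step() {
            public String name() { return name; }
            public void forward() throws Exception {
                if (fail) throw new IllegalStateException(name + " failed");
            }
            public void compensate() { }
        };
    }

    /** Demo: the first two steps succeed, the third one fails. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        run(List.of(step("SaveOrder", false), step("ReduceAccount", false),
                step("ReduceStorage", true)), log);
        return log;
    }
}
```

Calling `SagaSketch.demo()` yields a log in which the two completed steps are compensated in the opposite order to the one they ran in.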
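State machine definition
Take a typical e-commerce purchase flow as an example. We define three services: the order service (OrderServer), the account service (AccountService) and the storage service (StorageService). Here the order service acts as the aggregating service, that is, the TM.
When an external order is placed, the order service first creates an order, then calls the account service to deduct the amount, and finally calls the storage service to deduct stock. The flow is shown in the figure below:
Seata's saga mode is implemented on top of a state machine. The state machine needs a JSON file to control its states, defined as follows: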
- {
- "Name": "buyGoodsOnline",
- "Comment": "buy a goods on line, add order, deduct account, deduct storage ",
- "StartState": "SaveOrder",
- "Version": "0.0.1",
- "States": {
- "SaveOrder": {
- "Type": "ServiceTask",
- "ServiceName": "orderSave",
- "ServiceMethod": "saveOrder",
- "CompensateState": "DeleteOrder",
- "Next": "ChoiceAccountState",
- "Input": [
- "$.[businessKey]",
- "$.[order]"
- ],
- "Output": {
- "SaveOrderResult": "$.#root"
- },
- "Status": {
- "#root == true": "SU",
- "#root == false": "FA",
- "$Exception{java.lang.Throwable}": "UN"
- }
- },
- "ChoiceAccountState":{
- "Type": "Choice",
- "Choices":[
- {
- "Expression":"[SaveOrderResult] == true",
- "Next":"ReduceAccount"
- }
- ],
- "Default":"Fail"
- },
- "ReduceAccount": {
- "Type": "ServiceTask",
- "ServiceName": "accountService",
- "ServiceMethod": "decrease",
- "CompensateState": "CompensateReduceAccount",
- "Next": "ChoiceStorageState",
- "Input": [
- "$.[businessKey]",
- "$.[userId]",
- "$.[money]",
- {
- "throwException" : "$.[mockReduceAccountFail]"
- }
- ],
- "Output": {
- "ReduceAccountResult": "$.#root"
- },
- "Status": {
- "#root == true": "SU",
- "#root == false": "FA",
- "$Exception{java.lang.Throwable}": "UN"
- },
- "Catch": [
- {
- "Exceptions": [
- "java.lang.Throwable"
- ],
- "Next": "CompensationTrigger"
- }
- ]
- },
- "ChoiceStorageState":{
- "Type": "Choice",
- "Choices":[
- {
- "Expression":"[ReduceAccountResult] == true",
- "Next":"ReduceStorage"
- }
- ],
- "Default":"Fail"
- },
- "ReduceStorage": {
- "Type": "ServiceTask",
- "ServiceName": "storageService",
- "ServiceMethod": "decrease",
- "CompensateState": "CompensateReduceStorage",
- "Input": [
- "$.[businessKey]",
- "$.[productId]",
- "$.[count]",
- {
- "throwException" : "$.[mockReduceStorageFail]"
- }
- ],
- "Output": {
- "ReduceStorageResult": "$.#root"
- },
- "Status": {
- "#root == true": "SU",
- "#root == false": "FA",
- "$Exception{java.lang.Throwable}": "UN"
- },
- "Catch": [
- {
- "Exceptions": [
- "java.lang.Throwable"
- ],
- "Next": "CompensationTrigger"
- }
- ],
- "Next": "Succeed"
- },
- "DeleteOrder": {
- "Type": "ServiceTask",
- "ServiceName": "orderSave",
- "ServiceMethod": "deleteOrder",
- "Input": [
- "$.[businessKey]",
- "$.[order]"
- ]
- },
- "CompensateReduceAccount": {
- "Type": "ServiceTask",
- "ServiceName": "accountService",
- "ServiceMethod": "compensateDecrease",
- "Input": [
- "$.[businessKey]",
- "$.[userId]",
- "$.[money]"
- ]
- },
- "CompensateReduceStorage": {
- "Type": "ServiceTask",
- "ServiceName": "storageService",
- "ServiceMethod": "compensateDecrease",
- "Input": [
- "$.[businessKey]",
- "$.[productId]",
- "$.[count]"
- ]
- },
- "CompensationTrigger": {
- "Type": "CompensationTrigger",
- "Next": "Fail"
- },
- "Succeed": {
- "Type":"Succeed"
- },
- "Fail": {
- "Type":"Fail",
- "ErrorCode": "PURCHASE_FAILED",
- "Message": "purchase failed"
- }
- }
- }
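The happy path and the Choice fallbacks in the JSON above can be traced with a small walker. This is a sketch over a hand-copied, simplified version of the definition: it ignores exceptions, `Catch` and `CompensationTrigger`, and the `StateWalkSketch` class and its maps are assumptions made for illustration, not how Seata parses the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical walker over a simplified copy of buyGoodsOnline; not Seata code. */
final class StateWalkSketch {

    /** Returns the states visited, given the boolean result of each ServiceTask. */
    static List<String> walk(Map<String, Boolean> taskResults) {
        // ServiceTask -> its "Next" attribute
        Map<String, String> next = Map.of(
            "SaveOrder", "ChoiceAccountState",
            "ReduceAccount", "ChoiceStorageState",
            "ReduceStorage", "Succeed");
        // Choice -> {task whose result is tested, target when the result is true}
        Map<String, String[]> choices = Map.of(
            "ChoiceAccountState", new String[] {"SaveOrder", "ReduceAccount"},
            "ChoiceStorageState", new String[] {"ReduceAccount", "ReduceStorage"});

        List<String> path = new ArrayList<>();
        String current = "SaveOrder"; // "StartState"
        while (current != null) {
            path.add(current);
            if (choices.containsKey(current)) {
                String[] choice = choices.get(current);
                boolean ok = taskResults.getOrDefault(choice[0], false);
                current = ok ? choice[1] : "Fail"; // "Default" is Fail in both Choice states
            } else {
                current = next.get(current); // null for Succeed and Fail, which ends the walk
            }
        }
        return path;
    }
}
```

When every task returns true the walk ends in `Succeed`; when `ReduceAccount` returns false, `ChoiceStorageState` falls through to its `Default` and the walk ends in `Fail`.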
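The state machine runs in the TM, that is, the order service defined above. When the order service creates an order it needs to begin a global transaction, which means starting the state machine. The code is as follows: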
- StateMachineEngine stateMachineEngine = (StateMachineEngine) ApplicationContextUtils.getApplicationContext().getBean("stateMachineEngine");
- Map<String, Object> startParams = new HashMap<>(3);
- String businessKey = String.valueOf(System.currentTimeMillis());
- startParams.put("businessKey", businessKey);
- startParams.put("order", order);
- startParams.put("mockReduceAccountFail", "true");
- startParams.put("userId", order.getUserId());
- startParams.put("money", order.getPayAmount());
- startParams.put("productId", order.getProductId());
- startParams.put("count", order.getCount());
- //sync test
- StateMachineInstance inst = stateMachineEngine.startWithBusinessKey("buyGoodsOnline", null, businessKey, startParams);
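As you can see, buyGoodsOnline in the code above is exactly the value of the Name attribute in the JSON file.
State machine initialization
So where is the stateMachineEngine bean used in the order-creation code defined? The order service demo has a class, StateMachineConfiguration, that defines it: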
- public class StateMachineConfiguration {
- @Bean
- public ThreadPoolExecutorFactoryBean threadExecutor(){
- ThreadPoolExecutorFactoryBean threadExecutor = new ThreadPoolExecutorFactoryBean();
- threadExecutor.setThreadNamePrefix("SAGA_ASYNC_EXE_");
- threadExecutor.setCorePoolSize(1);
- threadExecutor.setMaxPoolSize(20);
- return threadExecutor;
- }
- @Bean
- public DbStateMachineConfig dbStateMachineConfig(ThreadPoolExecutorFactoryBean threadExecutor, DataSource hikariDataSource) throws IOException {
- DbStateMachineConfig dbStateMachineConfig = new DbStateMachineConfig();
- dbStateMachineConfig.setDataSource(hikariDataSource);
- dbStateMachineConfig.setThreadPoolExecutor((ThreadPoolExecutor) threadExecutor.getObject());
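- /**
- * The path of the JSON file is configured here. When the TM initializes, it parses the JSON file into a StateMachineImpl. If the database does not yet hold this state machine, it is stored in the seata_state_machine_def table;
- * if the database already has records, the latest one is used. The state machine is then registered in StateMachineRepositoryImpl.
- * Two Maps are used for registration: stateMachineMapByNameAndTenant, whose key has the form (stateMachineName + "_" + tenantId),
- * and stateMachineMapById, whose key is stateMachine.getId().
- * See the registryStateMachine method of StateMachineRepositoryImpl for the code.
- * Registration is triggered from init(), the initialization method of DefaultStateMachineConfig, which is the parent class of DbStateMachineConfig.
- */
- dbStateMachineConfig.setResources(new PathMatchingResourcePatternResolver().getResources("classpath*:statelang/*.json"));//JSON files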
- dbStateMachineConfig.setEnableAsync(true);
- dbStateMachineConfig.setApplicationId("order-server");
- dbStateMachineConfig.setTxServiceGroup("my_test_tx_group");
- return dbStateMachineConfig;
- }
- @Bean
- public ProcessCtrlStateMachineEngine stateMachineEngine(DbStateMachineConfig dbStateMachineConfig){
- ProcessCtrlStateMachineEngine stateMachineEngine = new ProcessCtrlStateMachineEngine();
- stateMachineEngine.setStateMachineConfig(dbStateMachineConfig);
- return stateMachineEngine;
- }
- @Bean
- public StateMachineEngineHolder stateMachineEngineHolder(ProcessCtrlStateMachineEngine stateMachineEngine){
- StateMachineEngineHolder stateMachineEngineHolder = new StateMachineEngineHolder();
- stateMachineEngineHolder.setStateMachineEngine(stateMachineEngine);
- return stateMachineEngineHolder;
- }
- }
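As you can see, we configure the state machine's JSON file in DbStateMachineConfig, along with applicationId and txServiceGroup. When DbStateMachineConfig is initialized, the init method of its parent class DefaultStateMachineConfig parses the JSON file into a state machine and registers it.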
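The two-map registration described in the configuration comment can be pictured as follows. This is only a sketch of the indexing scheme (name plus "_" plus tenant, and id); `RegistrySketch` and `Definition` are hypothetical names, and the real StateMachineRepositoryImpl also deals with database loading, which is left out here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical two-index registry modeled on the comment above; not the Seata class. */
final class RegistrySketch {

    /** Minimal stand-in for a parsed state machine definition. */
    static final class Definition {
        final String id;
        final String name;
        final String tenantId;

        Definition(String id, String name, String tenantId) {
            this.id = id;
            this.name = name;
            this.tenantId = tenantId;
        }
    }

    private final Map<String, Definition> byNameAndTenant = new ConcurrentHashMap<>();
    private final Map<String, Definition> byId = new ConcurrentHashMap<>();

    /** Registers the definition under both keys. */
    void register(Definition def) {
        byNameAndTenant.put(def.name + "_" + def.tenantId, def);
        byId.put(def.id, def);
    }

    Definition findByName(String name, String tenantId) {
        return byNameAndTenant.get(name + "_" + tenantId);
    }

    Definition findById(String id) {
        return byId.get(id);
    }
}
```

Keeping two indexes lets a caller start a state machine by name and tenant, while a component that only knows the stored machine id can still resolve the same definition.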
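During registration, one record is inserted into the seata_state_machine_def table. Its content column stores the content of our JSON file; the values of the other columns are shown in the figure below:
Appendix: for the JSON file above, the StateMachineImpl content we see when debugging is as follows: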
- id = null
- tenantId = null
- appName = "SEATA"
- name = "buyGoodsOnline"
- comment = "buy a goods on line, add order, deduct account, deduct storage "
- version = "0.0.1"
- startState = "SaveOrder"
- status = {StateMachine$Status@9135} "AC"
- recoverStrategy = null
- isPersist = true
- type = "STATE_LANG"
- content = null
- gmtCreate = null
- states = {LinkedHashMap@9137} size = 11
- "SaveOrder" -> {ServiceTaskStateImpl@9153}
- "ChoiceAccountState" -> {ChoiceStateImpl@9155}
- "ReduceAccount" -> {ServiceTaskStateImpl@9157}
- "ChoiceStorageState" -> {ChoiceStateImpl@9159}
- "ReduceStorage" -> {ServiceTaskStateImpl@9161}
- "DeleteOrder" -> {ServiceTaskStateImpl@9163}
- "CompensateReduceAccount" -> {ServiceTaskStateImpl@9165}
- "CompensateReduceStorage" -> {ServiceTaskStateImpl@9167}
- "CompensationTrigger" -> {CompensationTriggerStateImpl@9169}
- "Succeed" -> {SucceedEndStateImpl@9171}
- "Fail" -> {FailEndStateImpl@9173}
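Starting the state machine
In the order-creation code in the first section, the startWithBusinessKey method starts the whole transaction. There is also an asynchronous variant, startWithBusinessKeyAsync; here we only analyze the synchronous mode. The source code is as follows: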
- public StateMachineInstance startWithBusinessKey(String stateMachineName, String tenantId, String businessKey,
- Map<String, Object> startParams) throws EngineExecutionException {
- return startInternal(stateMachineName, tenantId, businessKey, startParams, false, null);
- }
- private StateMachineInstance startInternal(String stateMachineName, String tenantId, String businessKey,
- Map<String, Object> startParams, boolean async, AsyncCallback callback)
- throws EngineExecutionException {
- //part of the source code omitted
- //create a state machine instance
- //tenantId defaults to "000001"
- StateMachineInstance instance = createMachineInstance(stateMachineName, tenantId, businessKey, startParams);
- /**
- * The ProcessType enum has only one element, STATE_LANG
- * OPERATION_NAME_START = "start"
- * callback is null
- * getStateMachineConfig() returns the DbStateMachineConfig
- */
- ProcessContextBuilder contextBuilder = ProcessContextBuilder.create().withProcessType(ProcessType.STATE_LANG)
- .withOperationName(DomainConstants.OPERATION_NAME_START).withAsyncCallback(callback).withInstruction(
- new StateInstruction(stateMachineName, tenantId)).withStateMachineInstance(instance)
- .withStateMachineConfig(getStateMachineConfig()).withStateMachineEngine(this);
- Map<String, Object> contextVariables;
- if (startParams != null) {
- contextVariables = new ConcurrentHashMap<>(startParams.size());
- nullSafeCopy(startParams, contextVariables);
- } else {
- contextVariables = new ConcurrentHashMap<>();
- }
- instance.setContext(contextVariables);//assign the start parameters to the context of the state machine instance
- //add the parameters to the variables of ProcessContextImpl
- contextBuilder.withStateMachineContextVariables(contextVariables);
- contextBuilder.withIsAsyncExecution(async);
- //the builder defined above creates a ProcessContextImpl
- ProcessContext processContext = contextBuilder.build();
- //this condition is true
- if (instance.getStateMachine().isPersist() && stateMachineConfig.getStateLogStore() != null) {
- //record the started status of the state machine
- stateMachineConfig.getStateLogStore().recordStateMachineStarted(instance, processContext);
- }
- if (StringUtils.isEmpty(instance.getId())) {
- instance.setId(
- stateMachineConfig.getSeqGenerator().generate(DomainConstants.SEQ_ENTITY_STATE_MACHINE_INST));
- }
- if (async) {
- stateMachineConfig.getAsyncProcessCtrlEventPublisher().publish(processContext);
- } else {
- //publish the message to the EventBus; the consumer here is ProcessCtrlEventConsumer, which is set when DefaultStateMachineConfig is initialized
- stateMachineConfig.getProcessCtrlEventPublisher().publish(processContext);
- }
- return instance;
- }
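From the code above we can see that starting the state machine mainly does two things: it records the started status of the state machine, and it publishes a message to the EventBus. Let's look at both in detail.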
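One detail in the listing above is worth a note: the start parameters are copied into a ConcurrentHashMap through nullSafeCopy rather than putAll. The body of nullSafeCopy is not shown in this article; a plausible reading, sketched below as an assumption, is that it skips null values, because ConcurrentHashMap rejects null keys and values with a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a null-safe copy; the real nullSafeCopy body is not shown in the article. */
final class NullSafeCopySketch {

    /** Copies only entries with non-null values, since ConcurrentHashMap does not accept nulls. */
    static Map<String, Object> copy(Map<String, Object> src) {
        Map<String, Object> dest = new ConcurrentHashMap<>(src.size());
        src.forEach((key, value) -> {
            if (value != null) {
                dest.put(key, value);
            }
        });
        return dest;
    }

    /** Demo: one real value and one null value; only the real value survives the copy. */
    static int demoSize() {
        Map<String, Object> startParams = new HashMap<>();
        startParams.put("businessKey", "1604135904773");
        startParams.put("mockReduceStorageFail", null); // HashMap allows this, ConcurrentHashMap would not
        return copy(startParams).size();
    }
}
```

With this reading, a start parameter that was never set (such as mockReduceStorageFail in the demo) is simply absent from the context instead of breaking the start call.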
Beginning the global transaction
The code analyzed above contains a call that records the started status of the state machine:
- stateMachineConfig.getStateLogStore().recordStateMachineStarted(instance, processContext);
This calls the recordStateMachineStarted method of the DbAndReportTcStateLogStore class. Let's take a look:
- public void recordStateMachineStarted(StateMachineInstance machineInstance, ProcessContext context) {
- if (machineInstance != null) {
- //if parentId is not null, machineInstance is a SubStateMachine, do not start a new global transaction,
- //use parent transaction instead.
- String parentId = machineInstance.getParentId();
- if (StringUtils.hasLength(parentId)) {
- if (StringUtils.isEmpty(machineInstance.getId())) {
- machineInstance.setId(parentId);
- }
- } else {
- //this branch is taken, because no sub state machine is configured
- /**
- * beginTransaction here begins the global transaction
- * by calling the TC
- */
- beginTransaction(machineInstance, context);
- }
- if (StringUtils.isEmpty(machineInstance.getId()) && seqGenerator != null) {
- machineInstance.setId(seqGenerator.generate(DomainConstants.SEQ_ENTITY_STATE_MACHINE_INST));
- }
- // save to db
- //dbType = "MySQL"
- machineInstance.setSerializedStartParams(paramsSerializer.serialize(machineInstance.getStartParams()));
- executeUpdate(stateLogStoreSqls.getRecordStateMachineStartedSql(dbType),
- STATE_MACHINE_INSTANCE_TO_STATEMENT_FOR_INSERT, machineInstance);
- }
- }
The executeUpdate method above lives in the parent class AbstractStore. Debugging into executeUpdate shows that the SQL executed here is:
- INSERT INTO seata_state_machine_inst
- (id, machine_id, tenant_id, parent_id, gmt_started, business_key, start_params, is_running, status, gmt_updated)
- VALUES ('192.168.59.146:8091:65853497147990016', '06a098cab53241ca7ed09433342e9f07', '000001', null, '2020-10-31 17:18:24.773',
- '1604135904773', '{"@type":"java.util.HashMap","money":50.,"productId":1L,"_business_key_":"1604135904773","businessKey":"1604135904773",
- "count":1,"mockReduceAccountFail":"true","userId":1L,"order":{"@type":"io.seata.sample.entity.Order","count":1,"payAmount":50,
- "productId":1,"userId":1}}', 1, 'RU', '2020-10-31 17:18:24.773')
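As you can see, the global transaction is recorded in the seata_state_machine_inst table. What is recorded is the set of parameters we started the state machine with, and the status column holds "RU", that is, RUNNING.
Branch transaction processing
In the previous section we mentioned that after the state machine starts, a message is published to the EventBus. The consumer of this message is ProcessCtrlEventConsumer. Its code is: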
- public class ProcessCtrlEventConsumer implements EventConsumer<ProcessContext> {
- private ProcessController processController;
- @Override
- public void process(ProcessContext event) throws FrameworkException {
- //processController here is a ProcessControllerImpl
- processController.process(event);
- }
- @Override
- public boolean accept(Class<ProcessContext> clazz) {
- return ProcessContext.class.isAssignableFrom(clazz);
- }
- public void setProcessController(ProcessController processController) {
- this.processController = processController;
- }
- }
The process method of the ProcessControllerImpl class contains two pieces of logic, process and route:
- public void process(ProcessContext context) throws FrameworkException {
- try {
- //businessProcessor here is a CustomizeBusinessProcessor
- businessProcessor.process(context);
- businessProcessor.route(context);
- } catch (FrameworkException fex) {
- throw fex;
- } catch (Exception ex) {
- LOGGER.error("Unknown exception occurred, context = {}", context, ex);
- throw new FrameworkException(ex, "Unknown exception occurred", FrameworkErrorCode.UnknownAppError);
- }
- }
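The processing logic here is somewhat complex. We start with a UML class diagram; following it makes the call chain easier to untangle:
Let's first look at the process method in CustomizeBusinessProcessor: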
- public void process(ProcessContext context) throws FrameworkException {
- /**
- *processType = {ProcessType@10310} "STATE_LANG"
- *code = "STATE_LANG"
- *message = "SEATA State Language"
- *name = "STATE_LANG"
- *ordinal = 0
- */
- ProcessType processType = matchProcessType(context);
- if (processType == null) {
- if (LOGGER.isWarnEnabled()) {
- LOGGER.warn("Process type not found, context= {}", context);
- }
- throw new FrameworkException(FrameworkErrorCode.ProcessTypeNotFound);
- }
- ProcessHandler processor = processHandlers.get(processType.getCode());
- if (processor == null) {
- LOGGER.error("Cannot find process handler by type {}, context= {}", processType.getCode(), context);
- throw new FrameworkException(FrameworkErrorCode.ProcessHandlerNotFound);
- }
- //this is a StateMachineProcessHandler
- processor.process(context);
- }
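This code is not easy to follow, so we study it in four steps.
Step one: look at the process method of the StateMachineProcessHandler class. This method acts as a proxy for the process method of ServiceTaskStateHandler: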
- public void process(ProcessContext context) throws FrameworkException {
- /**
- * instruction = {StateInstruction@11057}
- * stateName = null
- * stateMachineName = "buyGoodsOnline"
- * tenantId = "000001"
- * end = false
- * temporaryState = null
- */
- StateInstruction instruction = context.getInstruction(StateInstruction.class);
- //the State implementation here is ServiceTaskStateImpl
- State state = instruction.getState(context);
- String stateType = state.getType();
- //the stateHandler implementation here is ServiceTaskStateHandler
- StateHandler stateHandler = stateHandlers.get(stateType);
- List<StateHandlerInterceptor> interceptors = null;
- if (stateHandler instanceof InterceptableStateHandler) {
- //the list holds one element, ServiceTaskHandlerInterceptor
- interceptors = ((InterceptableStateHandler)stateHandler).getInterceptors();
- }
- List<StateHandlerInterceptor> executedInterceptors = null;
- Exception exception = null;
- try {
- if (interceptors != null && interceptors.size() > 0) {
- executedInterceptors = new ArrayList<>(interceptors.size());
- for (StateHandlerInterceptor interceptor : interceptors) {
- executedInterceptors.add(interceptor);
- interceptor.preProcess(context);
- }
- }
- stateHandler.process(context);
- } catch (Exception e) {
- exception = e;
- throw e;
- } finally {
- if (executedInterceptors != null && executedInterceptors.size() > 0) {
- for (int i = executedInterceptors.size() - 1; i >= 0; i--) {
- StateHandlerInterceptor interceptor = executedInterceptors.get(i);
- interceptor.postProcess(context, exception);
- }
- }
- }
- }
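From this method we can see that the proxy wraps stateHandler.process with before and after advice. The advice class is ServiceTaskHandlerInterceptor; the before and after advice call the interceptor's preProcess and postProcess respectively.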
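The control flow of that proxy (pre hooks in order, the handler, then post hooks in reverse order inside finally, with the exception handed to each post hook) can be isolated in a small sketch. The `Interceptor` and `Handler` interfaces below are simplified stand-ins invented for illustration; they are not the Seata StateHandlerInterceptor API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the pre/post interceptor flow; not the Seata classes. */
final class InterceptorChainSketch {

    interface Interceptor {
        void pre(List<String> log) throws Exception;
        void post(List<String> log, Exception error);
    }

    interface Handler {
        void handle(List<String> log) throws Exception;
    }

    static void process(List<Interceptor> interceptors, Handler handler, List<String> log)
            throws Exception {
        List<Interceptor> executed = new ArrayList<>();
        Exception error = null;
        try {
            for (Interceptor interceptor : interceptors) {
                executed.add(interceptor); // recorded first, so post() still runs if pre() throws
                interceptor.pre(log);
            }
            handler.handle(log);
        } catch (Exception e) {
            error = e;
            throw e;
        } finally {
            // post hooks run in reverse order and receive the exception, if any
            for (int i = executed.size() - 1; i >= 0; i--) {
                executed.get(i).post(log, error);
            }
        }
    }

    static Interceptor named(String name) {
        return new Interceptor() {
            public void pre(List<String> log) { log.add("pre:" + name); }
            public void post(List<String> log, Exception error) {
                log.add("post:" + name + (error == null ? "" : ":error"));
            }
        };
    }

    /** Demo with two interceptors; the handler optionally throws. */
    static List<String> demo(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            process(List.of(named("A"), named("B")), l -> {
                l.add("handle");
                if (fail) throw new IllegalStateException("boom");
            }, log);
        } catch (Exception e) {
            log.add("caught");
        }
        return log;
    }
}
```

Because the post hooks sit in finally, they run whether or not the handler throws, which is what lets postProcess record a failed branch as well as a successful one.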
Step two: look at the advice logic, the preProcess and postProcess methods of ServiceTaskHandlerInterceptor:
- public class ServiceTaskHandlerInterceptor implements StateHandlerInterceptor {
- //part of the code omitted
- @Override
- public void preProcess(ProcessContext context) throws EngineExecutionException {
- StateInstruction instruction = context.getInstruction(StateInstruction.class);
- StateMachineInstance stateMachineInstance = (StateMachineInstance)context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_INST);
- StateMachineConfig stateMachineConfig = (StateMachineConfig)context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_CONFIG);
- //on timeout, set the state machine status to FA
- if (EngineUtils.isTimeout(stateMachineInstance.getGmtUpdated(), stateMachineConfig.getTransOperationTimeout())) {
- String message = "Saga Transaction [stateMachineInstanceId:" + stateMachineInstance.getId()
- + "] has timed out, stop execution now.";
- //the construction of `exception` (an engine exception carrying `message`) is omitted in this excerpt
- EngineUtils.failStateMachine(context, exception);
- throw exception;
- }
- StateInstanceImpl stateInstance = new StateInstanceImpl();
- Map<String, Object> contextVariables = (Map<String, Object>)context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_CONTEXT);
- ServiceTaskStateImpl state = (ServiceTaskStateImpl)instruction.getState(context);
- List<Object> serviceInputParams = null;
- Object isForCompensation = state.isForCompensation();
- if (isForCompensation != null && (Boolean)isForCompensation) {
- CompensationHolder compensationHolder = CompensationHolder.getCurrent(context, true);
- StateInstance stateToBeCompensated = compensationHolder.getStatesNeedCompensation().get(state.getName());
- if (stateToBeCompensated != null) {
- stateToBeCompensated.setCompensationState(stateInstance);
- stateInstance.setStateIdCompensatedFor(stateToBeCompensated.getId());
- } else {
- LOGGER.error("Compensation State[{}] has no state to compensate, maybe this is a bug.",
- state.getName());
- }
- //add to the compensation set
- CompensationHolder.getCurrent(context, true).addForCompensationState(stateInstance.getName(),
- stateInstance);
- }
- //part of the code omitted
- stateInstance.setInputParams(serviceInputParams);
- if (stateMachineInstance.getStateMachine().isPersist() && state.isPersist()
- && stateMachineConfig.getStateLogStore() != null) {
- try {
- //record the branch transaction in the database with status RU
- /**
- *INSERT INTO seata_state_inst (id, machine_inst_id, name, type, gmt_started, service_name, service_method, service_type, is_for_update, input_params, status, business_key, state_id_compensated_for, state_id_retried_for)
- *VALUES ('4fe5f602452c84ba5e88fd2ee9c13b35', '192.168.59.146:8091:65853497147990016', 'SaveOrder', 'ServiceTask', '2020-10-31 17:18:40.84', 'orderSave',
- *'saveOrder', null, 1, '["1604135904773",{"@type":"io.seata.sample.entity.Order","count":1,"payAmount":50,"productId":1,"userId":1}]', 'RU', null, null, null)
- */
- stateMachineConfig.getStateLogStore().recordStateStarted(stateInstance, context);
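- } //the catch block is omitted in this excerpt
- }
- //part of the code omitted
- stateMachineInstance.putStateInstance(stateInstance.getId(), stateInstance);//put into the stateMap of StateMachineInstanceImpl, used for retry or compensation
- ((HierarchicalProcessContext)context).setVariableLocally(DomainConstants.VAR_NAME_STATE_INST, stateInstance);//record the state instance; it is later passed to TaskStateRouter to decide whether the global transaction has ended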
- }
- @Override
- public void postProcess(ProcessContext context, Exception exp) throws EngineExecutionException {
- StateInstruction instruction = context.getInstruction(StateInstruction.class);
- ServiceTaskStateImpl state = (ServiceTaskStateImpl)instruction.getState(context);
- StateMachineInstance stateMachineInstance = (StateMachineInstance)context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_INST);
- StateInstance stateInstance = (StateInstance)context.getVariable(DomainConstants.VAR_NAME_STATE_INST);
- if (stateInstance == null || !stateMachineInstance.isRunning()) {
- LOGGER.warn("StateMachineInstance[id:" + stateMachineInstance.getId() + "] is end. stop running");
- return;
- }
- StateMachineConfig stateMachineConfig = (StateMachineConfig)context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_CONFIG);
- if (exp == null) {
- exp = (Exception)context.getVariable(DomainConstants.VAR_NAME_CURRENT_EXCEPTION);
- }
- stateInstance.setException(exp);
- //set the transaction status
- decideExecutionStatus(context, stateInstance, state, exp);
- //part of the code omitted
- Map<String, Object> contextVariables = (Map<String, Object>)context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_CONTEXT);
- //part of the code omitted
- context.removeVariable(DomainConstants.VAR_NAME_OUTPUT_PARAMS);
- context.removeVariable(DomainConstants.VAR_NAME_INPUT_PARAMS);
- stateInstance.setGmtEnd(new Date());
- if (stateMachineInstance.getStateMachine().isPersist() && state.isPersist()
- && stateMachineConfig.getStateLogStore() != null) {
- //update the branch transaction status to success
- /**
- * UPDATE seata_state_inst SET gmt_end = '2020-10-31 17:18:49.919', excep = null, status = 'SU',
- * output_params = 'true' WHERE id = '4fe5f602452c84ba5e88fd2ee9c13b35' AND
- * machine_inst_id = '192.168.59.146:8091:65853497147990016'
- */
- stateMachineConfig.getStateLogStore().recordStateFinished(stateInstance, context);
- }
- //part of the code omitted
- }
- }
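From this code we can see that before the branch transaction runs, a StateInstanceImpl is created and stored in the ProcessContext; after the branch transaction runs, this StateInstanceImpl is updated. The StateInstanceImpl serves three purposes:
- It is put into the stateMap of StateMachineInstanceImpl for retry or compensation.
- It records how the branch transaction executed, and can be persisted to the seata_state_inst table.
- It is passed to TaskStateRouter to decide whether the global transaction has ended.
Step three: look at the proxied method, stateHandler.process(context). In the normal execution path the stateHandler implementation is ServiceTaskStateHandler: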
- public void process(ProcessContext context) throws EngineExecutionException {
- StateInstruction instruction = context.getInstruction(StateInstruction.class);
- ServiceTaskStateImpl state = (ServiceTaskStateImpl) instruction.getState(context);
- StateInstance stateInstance = (StateInstance) context.getVariable(DomainConstants.VAR_NAME_STATE_INST);
- Object result;
- try {
- /**
- * The input here is what we defined in the JSON. For the orderSave ServiceTask, for example, input is:
- * 0 = "1608714480316"
- * 1 = {Order@11271} "Order(id=null, userId=1, productId=1, count=1, payAmount=50, status=null)"
- * The JSON definition is:
- * "Input": [
- * "$.[businessKey]",
- * "$.[order]"
- * ]
- */
- List<Object> input = (List<Object>) context.getVariable(DomainConstants.VAR_NAME_INPUT_PARAMS);
- //Set the current task execution status to RU (Running)
- stateInstance.setStatus(ExecutionStatus.RU);//set the status
- if (state instanceof CompensateSubStateMachineState) {
- //sub state machines are not covered here
- } else {
- StateMachineConfig stateMachineConfig = (StateMachineConfig) context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_CONFIG);
- //state.getServiceType here is springBean
- ServiceInvoker serviceInvoker = stateMachineConfig.getServiceInvokerManager().getServiceInvoker(
- state.getServiceType());
- if (serviceInvoker == null) {
- throw new EngineExecutionException("No such ServiceInvoker[" + state.getServiceType() + "]",
- FrameworkErrorCode.ObjectNotExists);
- }
- if (serviceInvoker instanceof ApplicationContextAware) {
- ((ApplicationContextAware) serviceInvoker).setApplicationContext(
- stateMachineConfig.getApplicationContext());
- }
- //this invokes the ServiceTask method defined in the JSON, for example the saveOrder method of orderSave
- result = serviceInvoker.invoke(state, input.toArray());
- }
- if (LOGGER.isDebugEnabled()) {
- LOGGER.debug("<<<<<<<<<<<<<<<<<<<<<< State[{}], ServiceName[{}], Method[{}] Execute finish. result: {}",
- state.getName(), serviceName, methodName, result);
- }
- //part of the code omitted
- }
- //exception handling code omitted
- }
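As you can see, process is the core of the business handling: it uses reflection to invoke the ServiceTask method defined in the JSON. Which state runs next (the Next attribute, that is, the next ServiceTask in the flow) is then determined from the resulting status by the routing step.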
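As a rough picture of what invoking a ServiceTask by reflection involves, the sketch below resolves a bean by its ServiceName and a method by its ServiceMethod, then calls it with the input list. Everything here is an assumption for illustration: a plain Map stands in for the Spring application context, `OrderService` is a made-up bean, and matching by name and argument count is a simplification of whatever the real ServiceInvoker does.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical sketch of bean lookup plus reflective invocation; not the Seata ServiceInvoker. */
final class ReflectiveInvokeSketch {

    /** Made-up stand-in for the order service bean. */
    public static final class OrderService {
        public boolean saveOrder(String businessKey, Integer count) {
            return businessKey != null && count > 0;
        }
    }

    /** Finds a public method by name and argument count on the named bean and invokes it. */
    static Object invoke(Map<String, Object> beans, String serviceName, String methodName,
                         Object... args) throws Exception {
        Object bean = beans.get(serviceName);
        if (bean == null) {
            throw new IllegalArgumentException("No such bean: " + serviceName);
        }
        for (Method method : bean.getClass().getMethods()) {
            if (method.getName().equals(methodName) && method.getParameterCount() == args.length) {
                return method.invoke(bean, args);
            }
        }
        throw new NoSuchMethodException(serviceName + "." + methodName);
    }

    /** Demo: "orderSave" / "saveOrder" mirror ServiceName / ServiceMethod in the JSON. */
    static Object demo() {
        Map<String, Object> beans = Map.of("orderSave", new OrderService());
        try {
            return invoke(beans, "orderSave", "saveOrder", "1604135904773", 1);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The returned value plays the role of `#root` in the JSON's Status and Output expressions: here it is a Boolean, which is what expressions such as "#root == true" test.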
Step four: look at the route method of CustomizeBusinessProcessor:
- public void route(ProcessContext context) throws FrameworkException {
- //code = "STATE_LANG"
- //message = "SEATA State Language"
- //name = "STATE_LANG"
- //ordinal = 0
- ProcessType processType = matchProcessType(context);
- RouterHandler router = routerHandlers.get(processType.getCode());
- //the route method of DefaultRouterHandler
- router.route(context);
- }
Now the route method of DefaultRouterHandler:
- public void route(ProcessContext context) throws FrameworkException {
- try {
- ProcessType processType = matchProcessType(context);
- //processRouter here is a StateMachineProcessRouter
- ProcessRouter processRouter = processRouters.get(processType.getCode());
- Instruction instruction = processRouter.route(context);
- if (instruction == null) {
- LOGGER.info("route instruction is null, process end");
- } else {
- context.setInstruction(instruction);
- eventPublisher.publish(context);
- }
- } catch (FrameworkException e) {
- throw e;
- } catch (Exception ex) {
- throw new FrameworkException(ex, ex.getMessage(), FrameworkErrorCode.UnknownAppError);
- }
- }
Next, the route method of StateMachineProcessRouter, which also uses the proxy pattern:
- public Instruction route(ProcessContext context) throws FrameworkException {
- StateInstruction stateInstruction = context.getInstruction(StateInstruction.class);
- State state;
- if (stateInstruction.getTemporaryState() != null) {
- state = stateInstruction.getTemporaryState();
- stateInstruction.setTemporaryState(null);
- } else {
- //this branch is taken
- StateMachineConfig stateMachineConfig = (StateMachineConfig)context.getVariable(
- DomainConstants.VAR_NAME_STATEMACHINE_CONFIG);
- StateMachine stateMachine = stateMachineConfig.getStateMachineRepository().getStateMachine(
- stateInstruction.getStateMachineName(), stateInstruction.getTenantId());
- state = stateMachine.getStates().get(stateInstruction.getStateName());
- }
- String stateType = state.getType();
- StateRouter router = stateRouters.get(stateType);
- Instruction instruction = null;
- List<StateRouterInterceptor> interceptors = null;
- if (router instanceof InterceptableStateRouter) {
- //only EndStateRouter is interceptable here
- interceptors = ((InterceptableStateRouter)router).getInterceptors();//EndStateRouterInterceptor
- }
- List<StateRouterInterceptor> executedInterceptors = null;
- Exception exception = null;
- try {
- //the before advice is implemented as an empty method, so that code is omitted here
- instruction = router.route(context, state);
- } catch (Exception e) {
- exception = e;
- throw e;
- } finally {
- if (executedInterceptors != null && executedInterceptors.size() > 0) {
- for (int i = executedInterceptors.size() - 1; i >= 0; i--) {
- StateRouterInterceptor interceptor = executedInterceptors.get(i);
- interceptor.postRoute(context, state, instruction, exception);//end the state machine
- }
- }
- //if 'Succeed' or 'Fail' State did not configured, we must end the state machine
- if (instruction == null && !stateInstruction.isEnd()) {
- EngineUtils.endStateMachine(context);
- }
- }
- return instruction;
- }
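The proxy here implements only an after advice, whose job is to end the state machine.
Now look at StateRouter. Its UML class diagram is as follows:
From the UML class diagram we can see that besides EndStateRouter there is only TaskStateRouter. EndStateRouter itself does not do much, because the logic that ends the state machine is already handled by the proxy. Let's look at TaskStateRouter: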
- public Instruction route(ProcessContext context, State state) throws EngineExecutionException {
- StateInstruction stateInstruction = context.getInstruction(StateInstruction.class);
- if (stateInstruction.isEnd()) {
- //if already ended, return directly
- //code omitted
- }
- //The current CompensationTriggerState can mark the compensation process is started and perform compensation
- // route processing.
- State compensationTriggerState = (State)context.getVariable(
- DomainConstants.VAR_NAME_CURRENT_COMPEN_TRIGGER_STATE);
- if (compensationTriggerState != null) {
- //add to the compensation set, run compensation, and return
- return compensateRoute(context, compensationTriggerState);
- }
- //There is an exception route, indicating that an exception is thrown, and the exception route is prioritized.
- String next = (String)context.getVariable(DomainConstants.VAR_NAME_CURRENT_EXCEPTION_ROUTE);
- if (StringUtils.hasLength(next)) {
- context.removeVariable(DomainConstants.VAR_NAME_CURRENT_EXCEPTION_ROUTE);
- } else {
- next = state.getNext();
- }
- //If next is empty, the state selected by the Choice state was taken.
- if (!StringUtils.hasLength(next) && context.hasVariable(DomainConstants.VAR_NAME_CURRENT_CHOICE)) {
- next = (String)context.getVariable(DomainConstants.VAR_NAME_CURRENT_CHOICE);
- context.removeVariable(DomainConstants.VAR_NAME_CURRENT_CHOICE);
- }
- //no next node can be obtained from the current context, so return directly
- if (!StringUtils.hasLength(next)) {
- return null;
- }
- StateMachine stateMachine = state.getStateMachine();
- State nextState = stateMachine.getState(next);
- if (nextState == null) {
- throw new EngineExecutionException("Next state[" + next + "] is not exits",
- FrameworkErrorCode.ObjectNotExists);
- }
- //take the next state to move to and assign it to stateInstruction
- stateInstruction.setStateName(next);
- return stateInstruction;
- }
As you can see, route determines the next node in the flow for the state machine and stores it in the stateInstruction of the current context.
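The order in which TaskStateRouter picks the next state name can be condensed into one small function. This sketch covers only the three sources visible in the listing above (exception route, the state's own Next, the Choice result) and leaves out the compensation-trigger check that comes first; `NextStateSketch` and `resolveNext` are hypothetical names.

```java
/** Hypothetical condensation of the next-state priority in TaskStateRouter; not Seata code. */
final class NextStateSketch {

    /** Exception route wins, then the state's own Next, then the result of a Choice state. */
    static String resolveNext(String exceptionRoute, String declaredNext, String choiceResult) {
        if (hasLength(exceptionRoute)) {
            return exceptionRoute;
        }
        if (hasLength(declaredNext)) {
            return declaredNext;
        }
        if (hasLength(choiceResult)) {
            return choiceResult;
        }
        return null; // nothing left to run; the caller is expected to end the state machine
    }

    private static boolean hasLength(String value) {
        return value != null && !value.isEmpty();
    }
}
```

For the ReduceAccount state in the JSON, a caught Throwable yields "CompensationTrigger" even though Next is "ChoiceStorageState", because the exception route takes priority.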
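At this point we have finished analyzing how the state machine works inside the ProcessControllerImpl class.
Note that after the next node is obtained it is not processed directly. Instead, using the observer pattern, it is first published to the EventBus and waits for an observer to handle it. This repeats until EndStateRouter ends the state machine.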
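The publish, consume, process, route cycle described here can be simulated with a queue standing in for the EventBus. This is a sketch under stated assumptions: `Context` is a stripped-down stand-in for ProcessContext, the `nextByState` map replaces the real routers, and a plain in-memory queue is used, which is not necessarily how Seata's event bus dispatches events.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical simulation of the process/route/publish loop; not the Seata EventBus. */
final class EventLoopSketch {

    /** Stripped-down context: carries the name of the state to run (the "instruction"). */
    static final class Context {
        String stateName;
        final List<String> executed = new ArrayList<>();

        Context(String startState) {
            this.stateName = startState;
        }
    }

    /** Repeats process -> route -> publish until route yields no further state. */
    static List<String> run(String startState, Map<String, String> nextByState) {
        Queue<Context> bus = new ArrayDeque<>();
        Context context = new Context(startState);
        bus.add(context);                                 // initial publish
        while (!bus.isEmpty()) {
            Context event = bus.poll();                   // the consumer receives the event
            event.executed.add(event.stateName);          // "process": handle the current state
            String next = nextByState.get(event.stateName); // "route": pick the next state
            if (next != null) {
                event.stateName = next;
                bus.add(event);                           // publish the same context again
            }
        }
        return context.executed;
    }
}
```

For example, `EventLoopSketch.run("SaveOrder", Map.of("SaveOrder", "ReduceAccount", "ReduceAccount", "ReduceStorage", "ReduceStorage", "Succeed"))` visits the four states in order and stops once no next state is found.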
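The Event in this observer pattern is the ProcessContext. It contains an Instruction, the Instruction contains a State, and that State determines which node is processed next, until the end. The UML class diagram is as follows:
Summary
The saga mode in the Seata middleware is widely used, but the code is fairly complex. I have walked through it from the following angles:
- The JSON file we define is loaded into the StateMachineImpl class.
- Starting the state machine also begins the global transaction. This is the same as beginning a global transaction in the ordinary modes: both send a message to the TC.
- The entry class for handling state machine states and controlling state transitions is ProcessControllerImpl; you can trace the code starting from its process method.
- ProcessControllerImpl calls the process method of CustomizeBusinessProcessor to handle the current state, then calls the route method to obtain the next node and publish it to the EventBus.
The saga mode introduces three additional tables. We can also trace the code through the two tables related to the global transaction and the branch transactions. For the demo given earlier, when the transaction succeeds, the write SQL statements against these two tables, in the order the state machine executes them, are as follows: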
- INSERT INTO seata_state_machine_inst
- (id, machine_id, tenant_id, parent_id, gmt_started, business_key, start_params, is_running, status, gmt_updated)
- VALUES ('192.168.59.146:8091:65853497147990016', '06a098cab53241ca7ed09433342e9f07', '000001', null, '2020-10-31 17:18:24.773', '1604135904773', '{"@type":"java.util.HashMap","money":50.,"productId":1L,"_business_key_":"1604135904773","businessKey":"1604135904773","count":1,"mockReduceAccountFail":"true","userId":1L,"order":{"@type":"io.seata.sample.entity.Order","count":1,"payAmount":50,"productId":1,"userId":1}}', 1, 'RU', '2020-10-31 17:18:24.773')
- INSERT INTO seata_state_inst (id, machine_inst_id, name, type, gmt_started, service_name, service_method, service_type, is_for_update, input_params, status, business_key, state_id_compensated_for, state_id_retried_for)
- VALUES ('4fe5f602452c84ba5e88fd2ee9c13b35', '192.168.59.146:8091:65853497147990016', 'SaveOrder', 'ServiceTask', '2020-10-31 17:18:40.84', 'orderSave', 'saveOrder', null, 1, '["1604135904773",{"@type":"io.seata.sample.entity.Order","count":1,"payAmount":50,"productId":1,"userId":1}]', 'RU', null, null, null)
- UPDATE seata_state_inst SET gmt_end = '2020-10-31 17:18:49.919', excep = null, status = 'SU', output_params = 'true' WHERE id = '4fe5f602452c84ba5e88fd2ee9c13b35' AND machine_inst_id = '192.168.59.146:8091:65853497147990016'
- INSERT INTO seata_state_inst (id, machine_inst_id, name, type, gmt_started, service_name, service_method, service_type, is_for_update, input_params, status, business_key, state_id_compensated_for, state_id_retried_for)
- VALUES ('8371235cb2c66c8626e148f66123d3b4', '192.168.59.146:8091:65853497147990016', 'ReduceAccount', 'ServiceTask', '2020-10-31 17:19:00.441', 'accountService', 'decrease', null, 1, '["1604135904773",1L,50.,{"@type":"java.util.LinkedHashMap","throwException":"true"}]', 'RU', null, null, null)
- UPDATE seata_state_inst SET gmt_end = '2020-10-31 17:19:09.593', excep = null, status = 'SU', output_params = 'true' WHERE id = '8371235cb2c66c8626e148f66123d3b4' AND machine_inst_id = '192.168.59.146:8091:65853497147990016'
- INSERT INTO seata_state_inst (id, machine_inst_id, name, type, gmt_started, service_name, service_method, service_type, is_for_update, input_params, status, business_key, state_id_compensated_for, state_id_retried_for)
- VALUES ('e70a49f1eac72f929085f4e82c2b4de2', '192.168.59.146:8091:65853497147990016', 'ReduceStorage', 'ServiceTask', '2020-10-31 17:19:18.494', 'storageService', 'decrease', null, 1, '["1604135904773",1L,1,{"@type":"java.util.LinkedHashMap"}]', 'RU', null, null, null)
- UPDATE seata_state_inst SET gmt_end = '2020-10-31 17:19:26.613', excep = null, status = 'SU', output_params = 'true' WHERE id = 'e70a49f1eac72f929085f4e82c2b4de2' AND machine_inst_id = '192.168.59.146:8091:65853497147990016'
- UPDATE seata_state_machine_inst SET gmt_end = '2020-10-31 17:19:33.581', excep = null, end_params = '{"@type":"java.util.HashMap","productId":1L,"count":1,"ReduceAccountResult":true,"mockReduceAccountFail":"true","userId":1L,"money":50.,"SaveOrderResult":true,"_business_key_":"1604135904773","businessKey":"1604135904773","ReduceStorageResult":true,"order":{"@type":"io.seata.sample.entity.Order","count":1,"id":60,"payAmount":50,"productId":1,"userId":1}}',status = 'SU', compensation_status = null, is_running = 0, gmt_updated = '2020-10-31 17:19:33.582' WHERE id = '192.168.59.146:8091:65853497147990016' and gmt_updated = '2020-10-31 17:18:24.773'
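This article mainly studied the saga mode source code by following a normal, successful flow. Many details remain unanalyzed, such as the rollback or compensation logic after a global transaction fails. We will cover those when there is a chance.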