场景
一个调度器,两个调度任务,分别处理两个目录下的txt文件,某个调度任务应对某些复杂问题的时候会持续特别长的时间,甚至有一直阻塞的可能。我们需要一个manager来管理这些task,当这个task的上一次执行时间距离现在超过5个调度周期的时候,就直接停掉这个线程,然后再重启它,保证两个目标目录下没有待处理的txt文件堆积。
问题
直接使用java默认的线程池调度task1和task2.由于外部txt的种种不可控原因,导致task2线程阻塞。现象就是task1和线程池调度器都正常运行着,但是task2迟迟没有动作。
当然,找到具体的阻塞原因并进行针对性解决是很重要的。但是,这种措施很可能并不能完全、彻底、全面的处理好所有未知情况。我们需要保证任务线程或者调度器的健壮性!
方案计划
线程池调度器并没有原生的针对被调度线程的业务运行状态进行监控处理的API。因为task2是阻塞在我们的业务逻辑里的,所以***的方式是写一个TaskManager,所有的任务线程在执行任务前全部到这个TaskManager这里来注册自己。这个TaskManager就负责对于每个自己管辖范围内的task进行实时全程监控!
后面的重点就是如何处理超过5个执行周期的task了。
方案如下:
●一旦发现这个task线程,立即中止它,然后再次重启;
●一旦发现这个task线程,直接将整个pool清空并停止,重新放入这两个task ——【task明确的情况下】;
方案实施
中止后重启
●Task实现类
- class FileTask extends Thread {
- private long lastExecTime = 0;
- protected long interval = 10000;
- public long getLastExecTime() {
- return lastExecTime;
- }
- public void setLastExecTime(long lastExecTime) {
- this.lastExecTime = lastExecTime;
- }
- public long getInterval() {
- return interval;
- }
- public void setInterval(long interval) {
- this.interval = interval;
- }
- public File[] getFiles() {
- return null;
- }
●Override
- public void run() {
- while (!Thread.currentThread().isInterrupted()) {
- lastExecTime = System.currentTimeMillis();
- System.out.println(Thread.currentThread().getName() + " is running -> " + new Date());
- try {
- Thread.sleep(getInterval() * 6 * 1000);
- } catch (InterruptedException e) {
- Thread.currentThread().interrupt();
- e.printStackTrace(); // 当线程池shutdown之后,这里就会抛出exception了
- }
- }
- }
- }
●TaskManager
- public class TaskManager implements Runnable {
- private final static Log logger = LogFactory.getLog(TaskManager .class);
- public Set<FileTask> runners = new CopyOnWriteArraySet<FileTask>();
- ExecutorService pool = Executors.newCachedThreadPool();
- public void registerCodeRunnable(FileTask process) {
- runners.add(process);
- }
- public TaskManager (Set<FileTask> runners) {
- this.runners = runners;
- }
@Override
- public void run() {
- while (!Thread.currentThread().isInterrupted()) {
- try {
- long current = System.currentTimeMillis();
- for (FileTask wrapper : runners) {
- if (current - wrapper.getLastExecTime() > wrapper.getInterval() * 5) {
- wrapper.interrupt();
- for (File file : wrapper.getFiles()) {
- file.delete();
- }
- wrapper.start();
- }
- }
- } catch (Exception e1) {
- logger.error("Error happens when we trying to interrupt and restart a task ");
- ExceptionCollector.registerException(e1);
- }
- try {
- Thread.sleep(500);
- } catch (InterruptedException e) {
- }
- }
- }
这段代码会报错 java.lang.Thread IllegalThreadStateException。为什么呢?其实这是一个很基础的问题,您应该不会像我一样马虎。查看Thread.start()的注释, 有这样一段:
It is never legal to start a thread more than once. In particular, a thread may not be restarted once it has completed execution.
是的,一个线程不能够启动两次。那么它是怎么判断的呢?
- public synchronized void start() {
- /**
- * A zero status value corresponds to state "NEW". 0对应的是state NEW
- */
if (threadStatus != 0) //如果不是NEW state,就直接抛出异常!#p#
- throw new IllegalThreadStateException();
- group.add(this);
- boolean started = false;
- try {
- start0(); // 启动线程的native方法
- started = true;
- } finally {
- try {
- if (!started) {
- group.threadStartFailed(this);
- }
- } catch (Throwable ignore) {
- }
- }
- }
恩,只有是NEW状态才能够调用native方法启动一个线程。好吧,到这里了,就普及也自补一下jvm里的线程状态:
所有的线程状态::
●NEW —— 还没有启动过
●RUNNABLE —— 正在jvm上运行着
●BLOCKED —— 正在等待锁/信号量被释放
●WAITING —— 等待其他某个线程的某个特定动作
●TIMED_WAITING —— A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
●TERMINATED —— 退出,停止
线程在某个时间点上只可能存在一种状态,这些状态是jvm里的,并不反映操作系统线程的状态。查一下Thread的API,没有对其状态进行修改的API。那么这条路是不通的吗?
仔细考虑一下……
如果把任务做成Runnable实现类,然后在把这个实现类丢进线程池调度器之前,利用此Runnable构造一个Thread,是不是这个Thread对象就能够控制这个runnable对象,进而控制在线程池中运行着的task了呢?非也!让我们看看Thread和ThreadPoolExecutor对Runnable的处理吧。
●Thread
- /* What will be run. */
- private Runnable target;
结合上面的start()方法,很容易猜出,start0()会把target弄成一个线程来进行运行。
●ThreadPoolExecutor
- public void execute(Runnable command) {
- if (command == null)
- throw new NullPointerException();
- int c = ctl.get();
- if (workerCountOf(c) < corePoolSize) {
- if (addWorker(command, true))
- return;
- c = ctl.get();
- }
- if (isRunning(c) && workQueue.offer(command)) {
- int recheck = ctl.get();
- if (! isRunning(recheck) && remove(command))
- reject(command);
- else if (workerCountOf(recheck) == 0)
- addWorker(null, false);
- }
- else if (!addWorker(command, false))
- reject(command);
- }
- private boolean addWorker(Runnable firstTask, boolean core) {
- …
- boolean workerStarted = false;
- boolean workerAdded = false;
- Worker w = null;
- try {
- final ReentrantLock mainLock = this.mainLock;
- w = new Worker(firstTask);
- final Thread t = w.thread;
- if (t != null) {
- mainLock.lock();
- try {
- int c = ctl.get();
- int rs = runStateOf(c);
- if (rs < SHUTDOWN ||
- (rs == SHUTDOWN && firstTask == null)) {
- if (t.isAlive()) // precheck that t is startable
- throw new IllegalThreadStateException();
- workers.add(w);
- int s = workers.size();
- if (s > largestPoolSize)
- largestPoolSize = s;
- workerAdded = true;
- }
- } finally {
- mainLock.unlock();
- }
- if (workerAdded) {
- t.start();
- workerStarted = true;
- }
- }
- } finally {
- if (! workerStarted)
- addWorkerFailed(w);
- }
- return workerStarted;
- }
那么Worker又是怎样的呢?
●Worker
- private final class Worker
- extends AbstractQueuedSynchronizer
- implements Runnable
- {
- final Thread thread;
- Runnable firstTask;
- volatile long completedTasks;
- Worker(Runnable firstTask) {
- setState(-1); //调用runWorker之前不可以interrupt
- this.firstTask = firstTask;
- this.thread = getThreadFactory().newThread(this);
- }
- public void run() {
- runWorker(this);
- }
- ……
- …….
- void interruptIfStarted() {
- Thread t;
- if (getState() >= 0 && (t = thread) != null && !t.isInterrupted()) {
- try {
- t.interrupt();
- } catch (SecurityException ignore) {
- }
- }
- }
- }
可见worker里既包装了Runnable对象——task,又包装了一个Thread对象——以自己作为初始化参数,因为worker也是Runnable对象。然后对外提供了运行与停止接口,run()和interruptIfStarted()。回顾上面使用Thread的例子不禁有了新的领悟,我们把一个Thread对象交给ThreadPoolExecutor执行后,实际的调用是对Thread(FileTask())对象,我们暂时称之为workerWrapper。那么我们在池外进行FileTask.interrupt()操作影响的是FileTask对象,而不是workerWrapper。所以可能上面对于start()方法二次调用不是特别适当。更恰当的应该是在fileTask.interrupt()的时候就跑出异常,因为从来没有对fileTask对象执行过start()方法,这时候去interrupt就会出现错误。具体如下图:
分析到此,我们已经明确除了调用ThreadPoolExecutor了的interruptWorkers()方法别无其他途径操作这些worker了。
- private void interruptWorkers() {
- final ReentrantLock mainLock = this.mainLock;
- mainLock.lock();
- try {
- for (Worker w : workers)
- w.interruptIfStarted();
- } finally {
- mainLock.unlock();
- }
- }
重启线程池
●TaskManager
- public class TaskManager implements Runnable {
- …..
- public TaskManager (Set<FileTask> runners) {
- super();
- this.runners = runners;
- executeTasks(runners);
- }
- private void executeTasks(Set<FileTask> runners) {
- for (FileTask task : runners) {
- pool.execute(task);
- System.out.println(task.getClass().getSimpleName() + " has been started");
- }
- }
@Override
- public void run() {
- while (!Thread.currentThread().isInterrupted()) {
- try {
- long current = System.currentTimeMillis();
- for (FileTask wrapper : runners) {
- if (wrapper.getLastExecTime() != 0 && current - wrapper.getLastExecTime() > wrapper.getInterval() * 5 * 1000) { // 开始忘了乘以1000
- wrapper.interrupt();
- if (wrapper.getFiles() != null){
- for (File file : wrapper.getFiles()) {
- file.delete();
- }
- }
- System.out.println("Going to shutdown the thread pool");
- List<Runnable> shutdownNow = pool.shutdownNow(); // 不等当前pool里的任务执行完,直接关闭线程池
- for (Runnable run : shutdownNow) {
- System.out.println(run + " going to be shutdown");
- }
- while (pool.awaitTermination(1, TimeUnit.SECONDS)) {
- System.out.println("The thread pool has been shutdown " + new Date());
- executeTasks(runners); // 重新执行
- Thread.sleep(200);
- }
- }
- }
- } catch (Exception e1) {
- e1.printStackTrace();
- }
- try {
- Thread.sleep(500);
- } catch (InterruptedException e) {
- }
- }
- }
- public static void main(String[] args) {
- Set<FileTask> tasks = new HashSet<FileTask>();
- FileTask task = new FileTask();
- task.setInterval(1);
- task.setName("task-1");
- tasks.add(task);
- FileTask task1 = new FileTask();
- task1.setInterval(2);
- task.setName("task-2");
- tasks.add(task1);
- TaskManager codeManager = new TaskManager (tasks);
- new Thread(codeManager).start();
- }
- }
成功!把整个的ThreadPoolExector里所有的worker全部停止,之后再向其队列里重新加入要执行的两个task(注意这里并没有清空,只是停止而已)。这样做虽然能够及时处理task,但是一个很致命的缺点在于,如果不能明确的知道ThreadPoolExecutor要执行的task,就没有办法重新执行这些任务。#p#
定制线程池
好吧!停止钻研别人的东西!我们完全可以自己写一个自己的ThreadPoolExecutor,只要把worker暴露出来就可以了。这里是不是回想起前面的start问题来了,没错,我们即便能够直接针对Thread进行interrupt, 但是不能再次start它了。那么clone一个同样的Thread行不行呢?#p#
●Thread
- @Override
- protected Object clone() throws CloneNotSupportedException {
- throw new CloneNotSupportedException();
- }
答案显而易见,线程是不支持clone 的。我们需要重新new 一个Thread来重新运行。其实我们只需要将原来的Worker里的Runnable换成我们自己的task,然后将访问权限适当放开就可以了。还有,就是让我们的CustomThreadPoolExecutor继承Thread,因为它需要定时监控自己的所有的worker里Thread的运行状态。
●CustomThreadPoolExecutor
- public class CustomThreadPoolExecutor extends ThreadPoolExecutor implements Runnable {
- public void execute(Testask command) {
- ….//将执行接口改为接收我们的业务类
- }
- …
- …
- private final class Worker
- extends AbstractQueuedSynchronizer
- implements Runnable
- {
- …
- Testask firstTask; //将Runnable改为我们的业务类,方便查看状态
- …
- Worker(Testask firstTask) {
- …//同样将初始化参数改为我们的业务类
- }
- }
- public static void main(String[] args) {
- CustomThreadPoolExecutor pool = new CustomThreadPoolExecutor(0, Integer.MAX_VALUE,
- 60L, TimeUnit.SECONDS,
- new SynchronousQueue<Runnable>());
- Testask task = new Testask();
- task.setInterval(1);
- pool.execute(task);
- Testask task1 = new Testask();
- task1.setInterval(2);
- pool.execute(task1);
- new Thread(pool).start();
- }
- @Override
- public void run() {
- while (!Thread.currentThread().isInterrupted()) {
- try {
- long current = System.currentTimeMillis();
- Set<Testask> toReExecute = new HashSet<Testask>();
- System.out.println("\t number is " + number);
- for (Worker wrapper : workers) {
- Testask tt = wrapper.firstTask;
- if (tt != null) {
- if (current - tt.getLastExecTime() > tt.getInterval() * 5 * 1000) {
- wrapper.interruptIfStarted();
- remove(tt);
- if (tt.getFiles() != null) {
- for (File file : tt.getFiles()) {
- file.delete();
- }
- }
- System.out.println("THread is timeout : " + tt + " " + new Date());
- toReExecute.add(tt);
- }
- }
- }
- if (toReExecute.size() > 0) {
- mainLock.lock();
- try {
- for (Testask tt : toReExecute) {
- execute(tt); // execute this task again
- }
- } finally {
- mainLock.unlock();
- }
- }
- } catch (Exception e1) {
- System.out.println("Error happens when we trying to interrupt and restart a code task ");
- }
- try {
- Thread.sleep(500);
- } catch (InterruptedException e) {
- }
- }
- }
- }
●Testask
- class Testask implements Runnable {
- …..
- @Override
- public void run() {
- while (!Thread.currentThread().isInterrupted()) {
- lastExecTime = System.currentTimeMillis();
- System.out.println(Thread.currentThread().getName() + " is running -> " + new Date());
- try {
- CustomThreadPoolExecutor.number++;
- Thread.sleep(getInterval() * 6 * 1000);
- System.out.println(Thread.currentThread().getName() + " after sleep");
- } catch (InterruptedException e) {
- Thread.currentThread().interrupt();
- System.out.println("InterruptedException happens");
- }
- }
- System.out.println("Going to die");
- }
- }
最终方案
综上,最稳妥的就是使用JDK自带的ThreadPoolExecutor, 如果需要对池里的task进行任意时间的控制,可以考虑全面更新,全方面,360度无死角的定制自己的线程池当然是***的方案,但是一定要注意对于共享对象的处理,适当的处理好并发访问共享对象的方法。
鉴于我们的场景,由于时间紧,而且需要了解的task并不多,暂时选用全部重新更新的策略。上线后,抽时间把自己定制的ThreadPoolExecutor搞定,然后更新上去!