Channel: Planet Python

Chris Hager: Python Thread Pool


A thread pool is a group of pre-instantiated, idle threads that stand ready to be given work. Thread pools are often preferred over instantiating a new thread for each task when there is a large number of short tasks rather than a small number of long ones, because the cost of creating a thread is paid once per worker instead of once per task.
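To make the reuse idea concrete, here is a small sketch (Python 3 only, using concurrent.futures rather than the pool built later in this post) that runs 30 short tasks on a pool of 3 threads and counts how many distinct threads did the work. Note that ThreadPoolExecutor starts its threads lazily instead of pre-instantiating them, but once started they are reused from task to task. The helper names `which_thread` and `count_threads_used` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def which_thread(task_id):
    """Hypothetical short task: return the identifier of the thread running it."""
    return threading.get_ident()


def count_threads_used(num_tasks=30, num_threads=3):
    # The pool never runs more than num_threads threads; each one picks up
    # task after task instead of a new thread being created per task.
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        idents = list(executor.map(which_thread, range(num_tasks)))
    return len(set(idents))


if __name__ == "__main__":
    print("threads used:", count_threads_used())  # at most 3
```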

Suppose you want to download thousands of documents from the internet, but only have the resources to download 50 at a time. The solution is to use a thread pool: spawn a fixed number of threads (here, 50) that take URLs from a queue and download them, so that at most 50 downloads run at once.

To use thread pools, Python 3.2+ provides the concurrent.futures.ThreadPoolExecutor class, and both Python 2.x and 3.x have multiprocessing.dummy.Pool, which returns a multiprocessing.pool.ThreadPool. multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

The downside of the multiprocessing.dummy thread pool is that in Python 2.x, it is not possible to exit the program with, e.g., a KeyboardInterrupt before the threads have finished all tasks from the queue.
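A workaround that is often suggested for this Python 2.x limitation (sketched here, not taken from the original post) is to call `map_async` and wait on the result with a timeout. In Python 2, a lock wait without a timeout cannot be interrupted by Ctrl+C, while a wait with a timeout polls and lets the KeyboardInterrupt through. The `square` task, the `interruptible_map` name and the timeout value are arbitrary illustrations.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool


def square(x):
    """Hypothetical short task."""
    return x * x


def interruptible_map(func, iterable, processes=4, timeout=10 ** 6):
    """Like Pool.map, but waits with a (very long) timeout so Ctrl+C gets through."""
    pool = Pool(processes)
    try:
        # AsyncResult.get(timeout) raises multiprocessing.TimeoutError if the
        # timeout expires; a huge value effectively means "wait as long as needed".
        return pool.map_async(func, iterable).get(timeout)
    finally:
        # Runs on success and on KeyboardInterrupt: discard queued tasks,
        # let threads that are mid-task finish, and wait for them to exit.
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    print(interruptible_map(square, range(10)))
```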

To get an interruptible thread queue in Python 2.x and 3.x (for use in PDFx), I've built the following code, inspired by stackoverflow.com/a/7257510. It implements a thread pool that works with both Python 2.x and 3.x:

```python
import sys
IS_PY2 = sys.version_info < (3, 0)

if IS_PY2:
    from Queue import Queue
else:
    from queue import Queue

from threading import Thread


class Worker(Thread):
    """ Thread executing tasks from a given tasks queue """
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception as e:
                # An exception happened in this thread
                print(e)
            finally:
                # Mark this task as done, whether an exception happened or not
                self.tasks.task_done()


class ThreadPool:
    """ Pool of threads consuming tasks from a queue """
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        """ Add a task to the queue """
        self.tasks.put((func, args, kargs))

    def map(self, func, args_list):
        """ Add a list of tasks to the queue """
        for args in args_list:
            self.add_task(func, args)

    def wait_completion(self):
        """ Wait for completion of all the tasks in the queue """
        self.tasks.join()


if __name__ == "__main__":
    from random import randrange
    from time import sleep

    # Function to be executed in a thread
    def wait_delay(d):
        print("sleeping for (%d)sec" % d)
        sleep(d)

    # Generate random delays
    delays = [randrange(3, 7) for i in range(50)]

    # Instantiate a thread pool with 5 worker threads
    pool = ThreadPool(5)

    # Add the jobs in bulk to the thread pool. Alternatively you could use
    # `pool.add_task` to add single jobs. The code will block here, which
    # makes it possible to cancel the thread pool with an exception when
    # the currently running batch of workers is finished.
    pool.map(wait_delay, delays)
    pool.wait_completion()
```

The queue's maximum size equals the number of threads (see self.tasks = Queue(num_threads)), so adding tasks with pool.map(..) or pool.add_task(..) blocks until a slot in the queue becomes free.
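The backpressure comes from the bounded queue itself. The sketch below (Python 3 module name `queue`; it is `Queue` in 2.x) shows that once a queue created with `maxsize=2` holds two items, `put_nowait` raises `queue.Full` immediately, a `put` with a short timeout gives up the same way, and a plain `put()` waits until a consumer calls `get()`. The item names and the `consumer` helper are placeholders standing in for a Worker.

```python
import queue  # named Queue in Python 2.x
import threading
import time


def demo_bounded_queue():
    """Show how a bounded queue.Queue pushes back on a producer."""
    tasks = queue.Queue(maxsize=2)  # like Queue(num_threads) with num_threads=2
    tasks.put("task-1")
    tasks.put("task-2")  # the queue is now full
    events = []

    try:
        tasks.put_nowait("task-3")  # no free slot: fails immediately
    except queue.Full:
        events.append("full (nowait)")

    try:
        tasks.put("task-3", timeout=0.1)  # waits up to 0.1 s, then gives up
    except queue.Full:
        events.append("full (timeout)")

    def consumer():
        # Plays the role of a Worker: take one task after a short delay.
        time.sleep(0.1)
        tasks.get()
        tasks.task_done()

    t = threading.Thread(target=consumer)
    t.start()
    tasks.put("task-3")  # blocks until consumer() has freed a slot
    events.append("put succeeded")
    t.join()
    return events, tasks.qsize()


if __name__ == "__main__":
    print(demo_bounded_queue())
```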

When you press Ctrl+C to raise a KeyboardInterrupt, the currently running batch of workers finishes and the program exits with the exception raised at the pool.map(..) step.
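A Python 3.9+ alternative, sketched with concurrent.futures rather than taken from the post: on KeyboardInterrupt it cancels tasks that have not started, waits for the ones already running, and re-raises the exception. `run_interruptible` is a hypothetical helper name. Unlike the pool above, `submit()` never blocks (the executor's internal queue is unbounded), so the interrupt usually arrives while the main thread waits in `result()`; how promptly such a wait reacts to Ctrl+C may depend on the platform.

```python
from concurrent.futures import ThreadPoolExecutor


def run_interruptible(func, args, num_threads=5):
    """Run func(arg) for every arg on a thread pool; on Ctrl+C, drop pending work."""
    executor = ThreadPoolExecutor(max_workers=num_threads)
    try:
        futures = [executor.submit(func, arg) for arg in args]
        return [f.result() for f in futures]
    except KeyboardInterrupt:
        # Cancel queued tasks (cancel_futures needs Python 3.9+), wait for the
        # tasks that are already running, then propagate the interrupt.
        executor.shutdown(wait=True, cancel_futures=True)
        raise
    finally:
        # Normal shutdown; safe to call again after the shutdown above.
        executor.shutdown(wait=True)
```

For example, `run_interruptible(sleep, delays)` with `sleep` from the time module and the `delays` list from the demo script mirrors the original example.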


If you have suggestions or feedback, let me know via @metachris

