Splitting a Python list into chunks is a common way of distributing the workload across multiple workers that can process them in parallel for faster results. Working with smaller pieces of data at a time may be the only way to fit a large dataset into computer memory. Sometimes, the very nature of the problem requires you to split the list into chunks.
In this tutorial, you’ll explore the range of options for splitting a Python list—or another iterable—into chunks. You’ll look at using Python’s standard modules and a few third-party libraries, as well as manually looping through the list and slicing it up with custom code. Along the way, you’ll learn how to handle edge cases and apply these techniques to multidimensional data by synthesizing chunks of an image in parallel.
In this tutorial, you’ll learn how to:
- Split a Python list into fixed-size chunks
- Split a Python list into a fixed number of chunks of roughly equal size
- Split finite lists as well as infinite data streams
- Perform the splitting in a greedy or lazy manner
- Produce lightweight slices without allocating memory for the chunks
- Split multidimensional data, such as an array of pixels
Throughout the tutorial, you’ll encounter a few technical terms, such as sequence, iterable, iterator, and generator. If these are new to you, then it’s worth reading up on them before diving in. Additionally, familiarity with Python’s itertools module can help you understand some of the code snippets later on.
Split a Python List Into Fixed-Size Chunks
There are many real-world scenarios that involve splitting a long list of items into smaller pieces of equal size. The whole list may be too large to fit in your computer’s memory. Perhaps it’s more convenient or efficient to process the individual chunks separately rather than all at once. But there could be other reasons for splitting.
For example, when you search for something online, the results are usually presented to you in chunks, called pages, containing an equal number of items. This technique, known as content pagination, is common in web development because it helps improve the website’s performance by reducing the amount of data to transfer from the database at a time. It can also benefit the user by improving their browsing experience.
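To make the idea concrete, here’s a minimal sketch of offset-based pagination over an in-memory list. The `get_page()` helper and its 1-based `page_number` parameter are hypothetical. A real web framework would typically push this offset into a database query rather than slice a list that’s already loaded.

```python
def get_page(items, page_number, page_size):
    """Return the items on a 1-based page, or an empty list past the end."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    # Each page starts right where the previous one ended
    start = (page_number - 1) * page_size
    return items[start:start + page_size]

results = [f"result {i}" for i in range(1, 24)]  # 23 hypothetical search results
print(get_page(results, 1, 10))  # the first ten results
print(get_page(results, 3, 10))  # the last page holds only three results
```

Because slicing clamps out-of-range indices, asking for a page past the end simply returns an empty list instead of raising an error.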
Most computer networks use packet switching to transfer data in packets or datagrams, which can be individually routed from the source to the destination address. This approach doesn’t require a dedicated physical connection between the two points, allowing the packets to bypass a damaged part of the network. The packets can be of variable length, but some low-level protocols require the data to be split into fixed-size packets.
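As an illustration of the fixed-size case, the following sketch splits a byte string into packets of equal length and pads the last one with zero bytes. The `split_into_packets()` name, the zero-byte filler, and the 8-byte packet size are assumptions for demonstration only and don’t come from any particular protocol.

```python
def split_into_packets(data: bytes, packet_size: int, filler: bytes = b"\x00"):
    """Split data into packets of exactly packet_size bytes, padding the last one."""
    packets = []
    for start in range(0, len(data), packet_size):
        chunk = data[start:start + packet_size]
        # bytes.ljust() pads on the right with filler up to the requested width
        packets.append(chunk.ljust(packet_size, filler))
    return packets

packets = split_into_packets(b"Hello, packets!", 8)
for packet in packets:
    print(packet)
```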
Note: When splitting sequential data, you need to consider its size while keeping a few details in mind.
Specifically, if the total number of elements to split is an exact multiple of the desired chunk’s length, then you’ll end up with all the chunks having the same number of items. Otherwise, the last chunk will contain fewer items, and you may need extra padding to compensate for that.
Additionally, your data may have a known size up front when it’s loaded from a file in one go, or it can consist of an indefinite stream of bytes—while live streaming a teleconference, for example. Some solutions that you learn in this tutorial will only work when the number of elements is known before the splitting begins.
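The difference between these two cases shows up directly in code. In the sketch below, `split_sequence()` relies on `len()` and slicing, so it only works when the whole sequence is available up front. In contrast, `split_stream()` pulls items one chunk at a time with `itertools.islice()` and therefore also works with unbounded iterators. Both function names are hypothetical helpers for illustration.

```python
from itertools import count, islice

def split_sequence(sequence, size):
    """Slice a sequence of known length into chunks; the last may be shorter."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]

def split_stream(iterable, size):
    """Lazily yield lists of up to size items from any iterable, even an infinite one."""
    iterator = iter(iterable)
    # islice() returns nothing once the iterator is exhausted, ending the loop
    while chunk := list(islice(iterator, size)):
        yield chunk

data = list(range(1, 11))
# With a known length, you can tell in advance how the split will turn out
full_chunks, leftover = divmod(len(data), 4)
print(full_chunks, leftover)  # two full chunks and two leftover items
print(split_sequence(data, 4))

# A stream has no len(), so you can only request chunks as you need them
stream_chunks = split_stream(count(1), 4)
print(next(stream_chunks))
```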
Most web frameworks, such as Django, will handle content pagination for you. Also, you don’t typically have to worry about some low-level network protocols. That being said, there are times when you’ll need to have more granular control and do the splitting yourself. In this section, you’ll take a look at how to split a list into smaller lists of equal size using different tools in Python.
Standard Library in Python 3.12: itertools.batched()
Using the standard library is almost always your best choice because it requires no external dependencies. The standard library provides concise, well-documented code that’s been tested by millions of users in production, making it less likely to contain bugs. Besides that, the standard library’s code is portable across different platforms, and performance-critical modules such as itertools are implemented in C, which typically makes them much faster than a pure-Python equivalent.
Unfortunately, the Python standard library hasn’t traditionally had built-in support for splitting iterable objects like Python lists. At the time of writing, Python 3.11 is the most recent version of the interpreter. But you can put yourself on the cutting edge by downloading a pre-release version of Python 3.12, which gives you access to the new itertools.batched(). Here’s an example demonstrating its use:
>>> from itertools import batched
>>> for batch in batched("ABCDEFGHIJ", 4):
...     print(batch)
...
('A', 'B', 'C', 'D')
('E', 'F', 'G', 'H')
('I', 'J')
The function accepts any iterable object, such as a string, as its first argument. The chunk size is its second argument. Regardless of the input data type, the function always yields chunks or batches of elements as Python tuples, which you may need to convert to something else if you prefer working with a different sequence type. For example, you might want to join the characters in the resulting tuples to form strings again.
Note: The underlying implementation of itertools.batched() could’ve changed since this tutorial was published, as it was written against an alpha release of Python 3.12. For example, the function may now yield lists instead of tuples, so be sure to check the official documentation for the most up-to-date information.
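If you’re stuck on a Python version older than 3.12, then you can approximate `itertools.batched()` with `itertools.islice()`, along the lines of the recipe that appears in the `itertools` documentation. The sketch below only defines its own stand-in when the import fails, and it also shows how you might join each tuple of characters back into a string.

```python
try:
    from itertools import batched  # Built in since Python 3.12
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        """Pure-Python stand-in: yield tuples of up to n items from iterable."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        # islice() takes at most n items; an empty tuple means the input is exhausted
        while batch := tuple(islice(iterator, n)):
            yield batch

# Join each tuple of characters back into a string
words = ["".join(batch) for batch in batched("ABCDEFGHIJ", 4)]
print(words)
```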
Also, notice that the last chunk will be shorter than its predecessors unless the iterable’s length is divisible by the desired chunk size. To ensure that all the chunks have an equal length at all times, you can pad the last chunk with empty values, such as None, when necessary:
>>> def batched_with_padding(iterable, batch_size, fill_value=None):
...     for batch in batched(iterable, batch_size):
...         yield batch + (fill_value,) * (batch_size - len(batch))
...
>>> for batch in batched_with_padding("ABCDEFGHIJ", 4):
...     print(batch)
...
('A', 'B', 'C', 'D')
('E', 'F', 'G', 'H')
('I', 'J', None, None)
This adapted version of itertools.batched() takes an optional argument named fill_value, which defaults to None. If a chunk’s length happens to be less than batch_size, then the function pads the chunk by concatenating as many copies of fill_value as it takes to reach the full batch size.
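If you’d rather not depend on Python 3.12 for padded chunks, the classic grouper idiom built on `itertools.zip_longest()` produces the same result. It passes one shared iterator to `zip_longest()` several times, so each output tuple draws consecutive items, and the `fillvalue` argument pads the final tuple. The name `grouper_with_padding()` is only illustrative.

```python
from itertools import zip_longest

def grouper_with_padding(iterable, batch_size, fill_value=None):
    """Return an iterator of tuples with exactly batch_size items each."""
    # The same iterator repeated batch_size times: zip_longest() takes one item
    # from each reference in turn, which groups consecutive items together
    iterators = [iter(iterable)] * batch_size
    return zip_longest(*iterators, fillvalue=fill_value)

for batch in grouper_with_padding("ABCDEFGHIJ", 4):
    print(batch)
```

Like `batched()`, this approach is lazy, so it also works with infinite iterators.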
You can supply either a finite sequence of values to the batched() function or an infinite iterator yielding values without end:
>>> from itertools import count
>>> finite = batched([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4)
>>> infinite = batched(count(1), 4)
>>> finite
<itertools.batched object at 0x7f4e0e2ee830>
>>> infinite
<itertools.batched object at 0x7f4b4e5fbf10>
>>> list(finite)
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]
>>> next(infinite)
(1, 2, 3, 4)
>>> next(infinite)
(5, 6, 7, 8)
>>> next(infinite)
(9, 10, 11, 12)
In both cases, the function returns an iterator that consumes the input iterable using lazy evaluation by accumulating just enough elements to fill the next chunk. The finite iterator will eventually reach the end of the sequence and stop yielding chunks. Conversely, the infinite one will continue to produce chunks as long as you keep requesting them, for instance, by calling the built-in next() function on it.
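You can observe this lazy behavior by feeding `batched()` a generator that records every item it hands out. In the sketch below, the `numbers()` generator and the `consumed` list are hypothetical instrumentation, and `itertools.islice()` caps how many chunks you take from the otherwise infinite iterator. The `try` block assumes Python 3.12 and falls back to a simplified `islice()`-based stand-in on older versions.

```python
from itertools import count, islice

try:
    from itertools import batched  # Python 3.12+
except ImportError:
    def batched(iterable, n):
        # Simplified stand-in for older Pythons: tuples of up to n items
        iterator = iter(iterable)
        while batch := tuple(islice(iterator, n)):
            yield batch

consumed = []

def numbers():
    """Infinite generator that records each number it produces."""
    for number in count(1):
        consumed.append(number)
        yield number

chunks = batched(numbers(), 4)
first = next(chunks)
print(first, consumed)  # only four items have been pulled so far

# islice() stops after two more chunks, so the infinite source is never exhausted
more = list(islice(chunks, 2))
print(more, len(consumed))
```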
Read the full article at https://realpython.com/how-to-split-a-python-list-into-chunks/ »