If you use Python regularly, you might have come across the wonderful requests library. I use it almost every day to read URLs or make POST requests. In this post, we shall see how we can download a large file using the requests module with low memory consumption.
To Stream or Not to Stream
When downloading large files/data, we would probably prefer the streaming mode while making the get call. If we use the stream parameter and set it to True, the download does not start immediately. The file download starts only when we try to access the content property or iterate over the content using iter_content / iter_lines.
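To make the deferred behaviour concrete, here is a minimal sketch (the URL is just a placeholder for illustration):

import requests

url = "https://example.com/big-file.zip"  # placeholder URL

response = requests.get(url, stream=True)

# Only the headers have arrived so far; the body has not been downloaded yet.
print(response.status_code)
print(response.headers.get("Content-Length"))

# The body is fetched only when we access response.content
# or iterate with response.iter_content() / response.iter_lines().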
If we set stream to False, all the content is downloaded immediately and loaded into memory. If the file is large, this can quickly cause issues with high memory consumption. On the other hand, if we set stream to True, the content is not downloaded right away; only the headers are fetched and the connection is kept open. We can then choose to proceed with downloading the file or simply cancel it.
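For example, a rough sketch of that decision could look like this (url is assumed to point at the file of interest, and the 100 MB limit is an arbitrary value for illustration):

import requests

response = requests.get(url, stream=True)  # only the headers are fetched at this point

# Content-Length may be missing for chunked responses, hence the default of 0.
size = int(response.headers.get("Content-Length", 0))

if size > 100 * 1024 * 1024:  # arbitrary 100 MB cut-off for this example
    response.close()  # cancel: release the connection without downloading the body
else:
    data = response.content  # proceed: this downloads the entire body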
But we must also remember that if we decide to stream the file, the connection will remain open and cannot be returned to the connection pool until the content is fully consumed or the response is closed. If we are working with many large files, this might lead to some inefficiency. So we should choose carefully where we stream, and we should take proper care to close the connections and dispose of any unused resources in such scenarios.
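In recent versions of requests the response object can also be used as a context manager, which is one simple way to guarantee that cleanup happens. This is only a sketch of that pattern, again with url assumed to be defined:

import requests

# The with block releases the connection even if we stop iterating early
# or an exception is raised while processing the data.
with requests.get(url, stream=True) as response:
    for chunk in response.iter_content(chunk_size=512):
        pass  # process the chunk here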
Iterating The Content
By setting the stream parameter, we have delayed the download and avoided taking up large chunks of memory. The headers have been downloaded, but the body of the file still awaits retrieval. We can now get the data by accessing the content property or by iterating over the content. Accessing content directly would read the entire response body into memory at once, which is exactly what we want to avoid when our target file is quite large.
So we are left with iterating over the content. We can use iter_content, where the content is read chunk by chunk, or iter_lines, where the content is read line by line. Either way, the entire file is never loaded into memory at once, which keeps the memory usage down.
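As a small illustration of the second option, iter_lines is handy when the payload is text, say a large log file, that we want to process one line at a time (url is again just an assumed variable):

import requests

response = requests.get(url, stream=True)

for line in response.iter_lines():
    if line:  # iter_lines can yield empty keep-alive lines, skip them
        print(line.decode("utf-8"))  # lines arrive as bytes by default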
Code Example
import requests

response = requests.get(url, stream=True)
handle = open(target_path, "wb")

for chunk in response.iter_content(chunk_size=512):
    if chunk:  # filter out keep-alive new chunks
        handle.write(chunk)

handle.close()
The code should be self-explanatory. We open the url with stream set to True, and then open a file handle to the target_path (where we want to save our file). We iterate over the content, chunk by chunk, and write the data to the file. Finally, we close the file handle.
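If we also want the connection returned to the pool and the file handle closed automatically, a slightly more defensive variant of the same idea (wrapped here in a hypothetical download_file helper) could look like this:

import requests

def download_file(url, target_path, chunk_size=512):
    # Stream the response and ensure both the connection and the file are closed.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()  # fail early on HTTP errors
        with open(target_path, "wb") as handle:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # filter out keep-alive new chunks
                    handle.write(chunk)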
That’s it!