How much memory is your Pandas DataFrame or Series using? Pandas provides an API for measuring this, but a variety of implementation details mean the results can be confusing or misleading.
Consider the following example:
>>> import pandas as pd
>>> series = pd.Series(["abcdefhjiklmnopqrstuvwxyz" * 10
...                     for i in range(1_000_000)])
>>> series.memory_usage()
8000128
>>> series.memory_usage(deep=True)
307000128

Which is correct, is memory usage 8MB or 300MB? Neither!
In this special case, it’s actually 67MB, at least with the default Python interpreter. This is partially because I cheated, and often 300MB will actually be closer to the truth.
What’s going on? Let’s find out!
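As a first step, here is a minimal sketch in plain Python (no Pandas) of how a shallow count and a deep count can both miss the real figure. It rests on several assumptions: a column of Python strings is stored as one pointer per element, a deep measurement adds `sys.getsizeof()` for every element, and the interpreter is CPython. The helper names `shallow_bytes`, `deep_bytes`, and `unique_bytes` are made up for this illustration and are not part of the Pandas API. Whether shared string objects are the cheat mentioned above is not stated here; the sketch only shows one way a deep count can overstate memory.

```python
import struct
import sys

# Size of one object pointer: 8 bytes on a 64-bit build.
POINTER_SIZE = struct.calcsize("P")


def shallow_bytes(values):
    """Count one pointer per element and ignore the objects pointed to."""
    return len(values) * POINTER_SIZE


def deep_bytes(values):
    """Add the size of every element, counting a repeated object each time."""
    return shallow_bytes(values) + sum(sys.getsizeof(v) for v in values)


def unique_bytes(values):
    """Like deep_bytes(), but count each distinct object only once."""
    distinct = {id(v): v for v in values}
    return shallow_bytes(values) + sum(sys.getsizeof(v) for v in distinct.values())


n = 1_000
text = "abcdefhjiklmnopqrstuvwxyz" * 10

# n references to a single string object.
shared = [text for _ in range(n)]
# n separate string objects, all of the same length.
copies = [text[:-1] + "z" for _ in range(n)]

for name, values in [("shared", shared), ("copies", copies)]:
    print(name, shallow_bytes(values), deep_bytes(values), unique_bytes(values))
```

For `shared`, the deep count charges for the same string a thousand times while only one copy exists, so the unique count is far smaller. For `copies`, the deep and unique counts agree, which is the more typical situation for real data.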