Introduction
I have written several articles about using python and pandas to manipulate data and create useful Excel output. In my experience, no matter how strong the python tools are, there are times when you need to rely on more advanced Excel features to communicate your message or further analyze the data. This article will walk through some additional improvements you can make to your Excel-based output by:
- Adding Excel tables with XlsxWriter
- Inserting custom VBA into your Excel file
- Using COM for merging multiple Excel worksheets
Excel Tables
In a prior article, I discussed how pandas works very seamlessly with XlsxWriter
to format your data and present it in a more complex manner than in the
standard pandas
to_excel()
format.
For a recent project, I wanted to add some more formatting to a fairly simple table and discovered how useful this can be and how easy it is with XlsxWriter. I recommend reading the XlsxWriter documentation for more background and details on all of the options.
For this example, I’ll be using the samples sales data I have used in the past. This data is meant to show a simple data dump of sales to multiple customers over time. Let’s summarize the data to see how much each customer purchased and what their average purchase amount was:
importpandasaspdsales_df=pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true')sales_summary=sales_df.groupby(['name'])['ext price'].agg(['sum','mean'])# Reset the index for consistency when saving in Excelsales_summary.reset_index(inplace=True)writer=pd.ExcelWriter('sales_summary.xlsx',engine='xlsxwriter')sales_summary.to_excel(writer,'summary',index=False)writer.save()
The standard Excel output looks like this:
This is useful but not very impressive.
If we want to turn this into an actual Excel Table, we can do that pretty easily using the
add_table
function in XlsxWriter. I typically create a
format_excel
function to
keep the formatting in one place. Here is what the formatting function would look like:
defformat_excel(writer):""" Add Excel specific formatting to the workbook"""# Get the workbook and the summary sheet so we can add the formattingworkbook=writer.bookworksheet=writer.sheets['summary']# Add currency formatting and apply itmoney_fmt=workbook.add_format({'num_format':42,'align':'center'})worksheet.set_column('A:A',20)worksheet.set_column('B:C',15,money_fmt)worksheet.add_table('A1:C22',{'columns':[{'header':'account','total_string':'Total'},{'header':'Total Sales','total_function':'sum'},{'header':'Average Sales','total_function':'average'}],'autofilter':False,'total_row':True,'style':'Table Style Medium 20'})
Applying the function is straightforward:
sales_df=pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true')sales_summary=sales_df.groupby(['name'])['ext price'].agg(['sum','mean'])# Reset the index for consistency when saving in Excelsales_summary.reset_index(inplace=True)writer=pd.ExcelWriter('sales_summary.xlsx',engine='xlsxwriter')sales_summary.to_excel(writer,'summary',index=False)format_excel(writer)writer.save()
Here is what the new and improved output looks like:
Using tables in Excel is a really good way to add totals or other summary stats to your data. They are also a quick tool to format the output for better display. I encourage you to read through the XlsxWriter documentation to learn about all the options you have with table formatting. It is a very powerful option and easy to use with pandas.
For reference, the full script is on github.
Adding VBA to your Excel
I recently created an interactive Excel workbook via the tools I have talked about on this blog. I wanted to add a small snippet of VBA to the resulting file but was not sure exactly how to do this. Fortunately XlsxWriter saves us again with the ability to extract VBA from an existing file into a standalone binary file and insert insert in another file. The Working with VBA Macros documentation is pretty clear but here is a quick sample.
Use the
vba_extract.py
file (included with XlsxWriter) to strip out the VBA from an existing Excel file:
vba_extract.py source_file.xlsm
Extracted vbaProject.bin
Using similar code to the example above, here is how to add this file into your Excel output.
importpandasaspdsales_df=pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true')sales_summary=sales_df.groupby(['name'])['ext price'].agg(['sum','mean'])# Reset the index for consistency when saving in Excelsales_summary.reset_index(inplace=True)writer=pd.ExcelWriter('sales_summary.xlsx',engine='xlsxwriter')sales_summary.to_excel(writer,'summary',index=False)workbook=writer.bookworkbook.add_vba_project('vbaProject.bin')writer.save()
Astute readers will notice that the output is saved as a .XLSX file but Excel will need the file to have an .XLSM extension in order for it to execute the VBA code.
Unfortunately if you try to save it as an XLSM like this:
writer=pd.ExcelWriter('sales_summary.xlsm',engine='xlsxwriter')
You get this error:
ValueError:Invalidextensionforengine'xlsxwriter':'xlsm'
One solution is to rename the file using
os.rename
but another (simpler)
option is to assign the desired name to the filename attribute:
writer=pd.ExcelWriter('sales_summary.xlsx',engine='xlsxwriter')sales_summary.to_excel(writer,'summary',index=False)workbook=writer.bookworkbook.filename='sales_summary.xlsm'workbook.add_vba_project('vbaProject.bin')writer.save()
This approach feels a little hackish but is the simplest fix for this issue. Even with this minor inconvenience, this is a really powerful feature that will allow you to make really robust Excel-based solutions with your python scripts.
Copying Excel Worksheets using COM
XlsxWriter allows you to create an Excel file from scratch but it does not support copying data from an existing workbook and including it in a new file. The best way to do this is to use win32com to automate that portion of Excel. The downside to this approach is that you can only use win32com on a Windows OS but if you find yourself in the situation where you want to merge two files together, at least there are options.
One of the main reasons I would use this approach is for when I have a sheet with a lot of complex formatting or structure that is easy to change in Excel but difficult to program with XlsxWriter. You can choose to created your “template” file and merge it it with the custom work you may have done in python.
This example is based on this Stack Overflow response. The purpose of the code is to copy in a standard “Instructions” sheet into the sales_summary file we created using pandas.
fromwin32com.clientimportDispatchExexcel=DispatchEx('Excel.Application')excel.Visible=Falseworkbook_1=excel.Workbooks.Open(r'C:\full\path\to\sales_summary.xlsx')workbook_2=excel.Workbooks.Open(r'C:\full\path\to\sales_template.xlsx')workbook_2.Worksheets("Instructions").Move(Before=workbook_1.Worksheets("summary"))workbook_1.SaveAs(r'C:\full\path\to\sales_summary_complete.xlsx')excel.Application.Quit()delexcel
There are a couple of things to keep in mind with this code:
- You need to have pywin32 installed - I recommend using anaconda for your python distribution
- You need to use the full path to the Excel files
- When you save the new file, Excel may pop up a dialog box asking you to verify that it can overwrite the existing file. You should handle that appropriately in your script
I personally find that working with win32com is finicky so I try to minimize it but it is a handy tool to have in your coding arsenal.
Summary
Like any tool, Excel can be abused and can result in some unmaintainable worksheets “from hell.” However, just because Excel can be a problem, you should recognize when it is the right solution for your business situation. Excel will continue to have a dominant place in the business software ecosystem. This article should help you further improve the quality of the Excel-based solutions you develop with python and pandas.