Introduction
Using python and pandas in the business world can be a very useful alternative to the pain of manipulating Excel files. While this combination of technologies is powerful, it can be challenging to convince others to use a python script - especially when many may be intimidated by using the command line. In this article I will show an example of how to easily create an end-user-friendly GUI using the Gooey library. This interface is based on wxWindows so it looks like a “native” application on Windows, Mac and Linux. Ultimately, I believe that presenting a simple user interface to your scripts can greatly increase the adoption of python in your place of business.
The Problem
I will be basing the example in this article on my prior post - Combining Data From Multiple Excel Files. The basic concept is that there is a periodic need to combine data from multiple excel files into a “master file” and perform some additional manipulations.
Unfortunately this process is error prone and time consuming when someone tries to do a lot of manual copying and pasting in Excel. However, it is relatively straightforward to create python + pandas scripts to perform the same manipulations in a more repeatable and robust format. However, as soon as you ask the user to type something like the line below, you will lose them:
python pandas_gui.py c:\Users\cmoffitt\Documents\src c:\users\cmoffitt\Documents\dest c:\users\cmoffitt\Documents\customer-status.xlsx -d 9/6/2015
Instead, you could present them with a simple GUI that looks something like this:
The nice thing about this example is that you have standard windows directory and file chooser dialogs along with a standard date picker widget. It will be a much smoother transition for your users to use this UI than to try to remember how to use the command line with all the various arguments shown above.
The rest of this article will explain how to create this UI with very minor
changes to the base code you would build using
argparse
. If you are
not familiar with
argparse
then this article might be helpful to reference
before you go much further. As shown in the article,
argparse
(and friends)
are very sophisticated libraries but I have found that you can create very
meaningful and useful tools with the very basic options I’ll show in this post.
Install
Gooey requires wxPython for its interface. wxPython can be a little tricky to install but if you are using the Anaconda or Miniconda distribution the install is as simple as:
conda install wxpython
I highly recommend using conda for the install process - especially on Windows.
Gooey can then be installed using pip:
pip install gooey
Building The Script
This notebook shows the basic idea for this program. What I will do next is
build a simple version of this using
argparse
to pass in the source
and destination directories as well as a location for the customer-status.xlsx file.
I am going to create a
parse_args
function to set up the following required inputs:
- data_directory
- output_directory
- Customer account status file
I will add an example of an optional date argument as well but for the purposes of this example, I do not actually use the value. As they say, that is an exercise left to the reader.
The simplest example of argparse would look something like this:
fromargparseimportArgumentParserparser=ArgumentParser(description='Create Quarterly Marketing Report')parser.add_argument('data_directory',action='store',help="Source directory that contains Excel files")parser.add_argument('output_directory',action='store',help="Output directory to save summary report")parser.add_argument('cust_file',action='store',help='Customer Account Status File')parser.add_argument('-d',help='Start date to include')args=parser.parse_args()
When you are ready to access your arguments, you can get them like this:
source=args.data_directorydest=args.output_directory
One other unique aspect to this code is that I added a simple json dump of the arguments and restore them as the default next time the script is executed. I originally did this to streamline the testing process but realize that this would be helpful in the real world too. Here is the full code sample with the default values filled in based on the previous execution of the script.
fromargparseimportArgumentParserdefparse_args():""" Use ArgParser to build up the arguments we will use in our script Save the arguments in a default json file so that we can retrieve them every time we run the script."""stored_args={}# get the script name without the extension & use it to build up# the json filenamescript_name=os.path.splitext(os.path.basename(__file__))[0]args_file="{}-args.json".format(script_name)# Read in the prior arguments as a dictionaryifos.path.isfile(args_file):withopen(args_file)asdata_file:stored_args=json.load(data_file)parser=ArgumentParser(description='Create Quarterly Marketing Report')parser.add_argument('data_directory',action='store',default=stored_args.get('data_directory'),help="Source directory that contains Excel files")parser.add_argument('output_directory',action='store',default=stored_args.get('output_directory'),help="Output directory to save summary report")parser.add_argument('cust_file',action='store',default=stored_args.get('cust_file'),help='Customer Account Status File')parser.add_argument('-d',help='Start date to include',default=stored_args.get('d'))args=parser.parse_args()# Store the values of the arguments so we have them next time we runwithopen(args_file,'w')asdata_file:# Using vars(args) returns the data as a dictionaryjson.dump(vars(args),data_file)returnargs
This code allows us to do a basic command line interface that looks like this:
python pandas_gui_args.py --help
usage: pandas_gui_args.py [-h][-d D]
data_directory output_directory cust_file
Create Quarterly Marketing Report
positional arguments:
data_directory Source directory that contains Excel files
output_directory Output directory to save summary report
cust_file Customer Account Status File
optional arguments:
-h, --help show this help message and exit
-d D Start date to include
The main section of the code would look like the section below. The basic flow is:
- Get the command line inputs
- Pass the appropriate ones to the input and processing functions
- Save the data to the desired location
if__name__=='__main__':
conf= parse_args()
print("Reading sales files")sales_df= combine_files(conf.data_directory)
print("Reading customer data and combining with sales")customer_status_sales= add_customer_status(sales_df, conf.cust_file)
print("Saving sales and customer summary data")
save_results(customer_status_sales, conf.output_directory)
print("Done")
Add a Gooey GUI
The command line solution shown above is very familiar to many but I imagine there are people in your organization that would instantly turn away from a solution that looks something like what I have shown above. However, Gooey makes it as easy as two lines of code to make a UI for this script.
The most basic steps are to import Gooey and add the decorator in front of the function that processes your arguments. Here is what it would look for our example:
fromgooeyimportGooey@Gooey(program_name="Create Quarterly Marketing Report")defparse_args():""" Rest of program below"""
When you run this, you would see a simple UI like this:
I think we all agree that this is fairly intuitive and would be something you could easily explain to your most non-technical users. The other nice thing is that it runs the same on Windows, Mac or Linux (as illustrated above).
The one challenge would be that users would probably expect to have some nice
widgets to allow them to select directories and dates. If you would like to do that
then you can substitute the
GooeyParser
for your
ArgParser
and
add the widget information to the parser code.
Change
parser=ArgParser(description='Create Quarterly Marketing Report')
to
parser=GooeyParser(description='Create Quarterly Marketing Report')
And add your
widget
:
parser.add_argument('data_directory',action='store',default=stored_args.get('data_directory'),widget='DirChooser',help="Source directory that contains Excel files")
Here’s what it looks like to use the
DirChooser1
,
FileChooser
and
DateChooser
widgets:
fromgooeyimportGooey,GooeyParser@Gooey(program_name="Create Quarterly Marketing Report")defparse_args():""" Use GooeyParser to build up the arguments we will use in our script Save the arguments in a default json file so that we can retrieve them every time we run the script."""stored_args={}# get the script name without the extension & use it to build up# the json filenamescript_name=os.path.splitext(os.path.basename(__file__))[0]args_file="{}-args.json".format(script_name)# Read in the prior arguments as a dictionaryifos.path.isfile(args_file):withopen(args_file)asdata_file:stored_args=json.load(data_file)parser=GooeyParser(description='Create Quarterly Marketing Report')parser.add_argument('data_directory',action='store',default=stored_args.get('data_directory'),widget='DirChooser',help="Source directory that contains Excel files")parser.add_argument('output_directory',action='store',widget='DirChooser',default=stored_args.get('output_directory'),help="Output directory to save summary report")parser.add_argument('cust_file',action='store',default=stored_args.get('cust_file'),widget='FileChooser',help='Customer Account Status File')parser.add_argument('-d',help='Start date to include',default=stored_args.get('d'),widget='DateChooser')args=parser.parse_args()
Now you have some native widgets the look very customary for the host OS:
The other nice feature is that when you execute the program you have a simple wrapper around the display and reasonable error windows if there is an underlying error in your program.
One other handy component is that there is a “Restart” button at the bottom of the screen. If you select that button, you can go back to your input screen and adjust any variables and re-execute the program. This is really nice if you need to runt the program multiple times with different inputs.
Part of what I really like about this solution is that there is very little additional overhead in your code. A traditional GUI (tkinter, QT, wxWindows etc) would require a lot of code to show this UI. This example shows how unobtrusive the solution can be.
The Final Program
I have not gone into the example of the actual pandas code but you can see in the full sample that it is relatively straightforward to use the arguments as inputs into your various python function calls. If you would like to see the code on github, here it is
"""Simple Pandas program to combine Excel files and summarize data.This demonstrates the use of Gooey to add a simple UI on top of the script"""from__future__importprint_functionimportpandasaspdimportnumpyasnpimportglobimportosimportjsonfromargparseimportArgumentParserfromgooeyimportGooey@Gooey(program_name="Create Quarterly Marketing Report")defparse_args():""" Use GooeyParser to build up the arguments we will use in our script Save the arguments in a default json file so that we can retrieve them every time we run the script."""stored_args={}# get the script name without the extension & use it to build up# the json filenamescript_name=os.path.splitext(os.path.basename(__file__))[0]args_file="{}-args.json".format(script_name)# Read in the prior arguments as a dictionaryifos.path.isfile(args_file):withopen(args_file)asdata_file:stored_args=json.load(data_file)parser=ArgumentParser(description='Create Quarterly Marketing Report')parser.add_argument('data_directory',action='store',default=stored_args.get('data_directory'),help="Source directory that contains Excel files")parser.add_argument('output_directory',action='store',default=stored_args.get('output_directory'),help="Output directory to save summary report")parser.add_argument('cust_file',action='store',default=stored_args.get('cust_file'),help='Customer Account Status File')parser.add_argument('-d',help='Start date to include',default=stored_args.get('d'))args=parser.parse_args()# Store the values of the arguments so we have them next time we runwithopen(args_file,'w')asdata_file:# Using vars(args) returns the data as a dictionaryjson.dump(vars(args),data_file)returnargsdefcombine_files(src_directory):""" Read in all of the sales xlsx files and combine into 1 combined DataFrame"""all_data=pd.DataFrame()forfinglob.glob(os.path.join(src_directory,"sales-*.xlsx")):df=pd.read_excel(f)all_data=all_data.append(df,ignore_index=True)all_data['date']=pd.to_datetime(all_data['date'])returnall_datadefadd_customer_status(sales_data,customer_file):""" Read in the customer file and combine with the sales data Return the customer with their status as an ordered category"""df=pd.read_excel(customer_file)all_data=pd.merge(sales_data,df,how='left')# Default everyone to bronze if no data includedall_data['status'].fillna('bronze',inplace=True)# Convert the status to a category and order itall_data["status"]=all_data["status"].astype("category")all_data["status"].cat.set_categories(["gold","silver","bronze"],inplace=True)returnall_datadefsave_results(sales_data,output):""" Perform a summary of the data and save the data as an excel file"""summarized_sales=sales_data.groupby(["status"])["unit price"].agg([np.mean])output_file=os.path.join(output,"sales-report.xlsx")writer=pd.ExcelWriter(output_file,engine='xlsxwriter')summarized_sales=summarized_sales.reset_index()summarized_sales.to_excel(writer)if__name__=='__main__':conf=parse_args()print("Reading sales files")sales_df=combine_files(conf.data_directory)print("Reading customer data and combining with sales")customer_status_sales=add_customer_status(sales_df,conf.cust_file)print("Saving sales and customer summary data")save_results(customer_status_sales,conf.output_directory)print("Done")
I think you will agree that this can be a really useful solution for distributing small standalone programs to users that are not comfortable running python from the command line. I have built this example around pandas but it would work for pretty much any python script using argparse. Obviously if you need a more robust solution then you’ll need to evaluate other options but I would argue that there is a lot of power in this potential solution.