
Mike Driscoll: wxPython – Creating a PDF Merger / Splitter Utility

The Portable Document Format (PDF) is a well-known format popularized by Adobe. It purports to create a document that should render the same across platforms.

Python has several libraries that you can use to work with PDFs:

  • ReportLab – Creating PDFs
  • PyPDF2 – Manipulating preexisting PDFs
  • pdfrw – Another option for manipulating preexisting PDFs that also works with ReportLab
  • PDFMiner – Extracts text from PDFs

There are several more Python PDF-related packages, but those four are probably the most well known. One common task of working with PDFs is the need for merging or concatenating multiple PDFs into one PDF. Another common task is taking a PDF and splitting out one or more of its pages into a new PDF.

You will be creating a graphical user interface that does both of these tasks using PyPDF2.

This tutorial is from my book, Creating GUI Applications with wxPython. You can get it here:

Creating GUI Applications with wxPython

Purchase now on Leanpub or Amazon


Installing PyPDF2

The PyPDF2 package can be installed using pip:

pip install pypdf2

This package is pretty small, so the installation should be quite quick.

Now that PyPDF2 is installed, you can design your UI!


Designing the Interface

This application is basically two programs contained in one window. You need a way of displaying a merging application and a splitting application. Having an easy way to switch between the two would be nice. You can design your own panel swapping code or you can use one of wxPython’s many notebook widgets.

To keep things simpler, let’s use a wx.Notebook for this application.

Here is a mockup of the merging tab:

The PDF Merger Mockup

You will be loading up PDF files into a list control type widget. You also want a way to re-order the PDFs. And you need a way to remove items from the list. This mockup shows all the pieces you need to accomplish those goals.

Next is a mockup of the splitting tab:

The PDF Splitter Mockup

Basically what you want is a tool that shows what the input PDF is and what page(s) are to be split off. The user interface for this is pretty plain, but it should work for your needs.

Now let’s create this application!


Creating the Application

Let’s put some thought into your code’s organization. Each tab should probably be in its own module. You should also have a main entry point to run your application. That means you can reasonably have at least three Python files.

Here is what you will be creating:

  • The main module
  • The merge panel module
  • The split panel module

Let’s start with the main module!


The Main Module

As the main entry point of your application, the main module has a lot of responsibility. It will hold your other panels and could be a hub between the panels should they need to communicate. Most of the time, you would use pubsub for that though.

Let’s go ahead and write your first version of the code:

# main.py 
import wx
 
from merge_panel import MergePanel
from split_panel import SplitPanel

The imports for the main module are nice and short. All you need is wx, the MergePanel and the SplitPanel. The latter two are ones that you will write soon.

Let’s go ahead and write the MainPanel code though:

class MainPanel(wx.Panel):
 
    def __init__(self, parent):
        super().__init__(parent) 
        main_sizer = wx.BoxSizer(wx.VERTICAL)
        notebook = wx.Notebook(self)
        merge_tab = MergePanel(notebook)
        notebook.AddPage(merge_tab, 'Merge PDFs')
        split_tab = SplitPanel(notebook)
        notebook.AddPage(split_tab, 'Split PDFs')
        main_sizer.Add(notebook, 1, wx.ALL | wx.EXPAND, 5)
        self.SetSizer(main_sizer)

The MainPanel is where all the action is. Here you instantiate a wx.Notebook and add the MergePanel and the SplitPanel to it. Then you add the notebook to the sizer and you’re done!

Here’s the frame code that you will need to add:

class MainFrame(wx.Frame):
 
    def __init__(self):
        super().__init__(None, title='PDF Merger / Splitter',
                         size=(800, 600))
        self.panel = MainPanel(self)
        self.Show()

if __name__ == '__main__':
    app = wx.App(False)
    frame = MainFrame()
    app.MainLoop()

As usual, you construct your frame, add a panel and show it to the user. You also set the size of the frame. You might want to experiment with the initial size as it may be too big or too small for your setup.

Now let’s move on and learn how to merge PDFs!


The merge_panel Module

The merge_panel module contains all the code you need for creating a user interface around merging PDF files. The user interface for merging is a bit more involved than it is for splitting.

Let’s get started!

# merge_panel.py 
import os
import glob

import wx
 
from ObjectListView import ObjectListView, ColumnDefn
from PyPDF2 import PdfFileReader, PdfFileWriter
 
wildcard = "PDFs (*.pdf)|*.pdf"

Here you need to import Python’s os module for some path-related activities and the glob module for searching duty. You will also need ObjectListView for displaying PDF information and PyPDF2 for merging the PDFs together.

The last item here is the wildcard which is used when adding files to be merged as well as when you save the merged file.

To make the UI more friendly, you should add drag-and-drop support:

class DropTarget(wx.FileDropTarget):
 
    def __init__(self, window):
        super().__init__()
        self.window = window
 
    def OnDropFiles(self, x, y, filenames):
        self.window.update_on_drop(filenames)
        return True

You may recognize this code from the Archiver chapter. In fact, it’s pretty much unchanged. You still need to subclass wx.FileDropTarget and pass it the widget that you want to add drag-and-drop support to. You also need to override OnDropFiles() to have it call a method on the widget you passed in. For this example, you are passing in the panel object itself.

You will also need to create a class for holding information about the PDFs. This class will be used by your ObjectListView widget.

Here it is:

class Pdf:

    def __init__(self, pdf_path):
        self.full_path = pdf_path
        self.filename = os.path.basename(pdf_path)
        try:
            with open(pdf_path, 'rb') as f:
                pdf = PdfFileReader(f)
                number_of_pages = pdf.getNumPages()
        except:
            number_of_pages = 0
        self.number_of_pages = str(number_of_pages)

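The MergePanel class comes next. Its full listing doesn't appear above, so here is a minimal sketch of its __init__(), based on the description that follows (the exact original may differ slightly):

class MergePanel(wx.Panel):

    def __init__(self, parent):
        super().__init__(parent)
        # Holds the Pdf objects to be merged
        self.pdfs = []
        # Enable drag-and-drop onto this panel
        drop_target = DropTarget(self)
        self.SetDropTarget(drop_target)
        self.main_sizer = wx.BoxSizer(wx.VERTICAL)
        self.create_ui()
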
The __init__() is nice and short this time around. You set up a list of pdfs for holding the PDF objects to be merged. You also instantiate and add the DropTarget to the panel. Then you create the main_sizer and call create_ui(), which will add all the widgets you need.

Speaking of which, let’s add create_ui() next:

def create_ui(self):
    btn_sizer = wx.BoxSizer()
    add_btn = wx.Button(self, label='Add')
    add_btn.Bind(wx.EVT_BUTTON, self.on_add_file)
    btn_sizer.Add(add_btn, 0, wx.ALL, 5)
    remove_btn = wx.Button(self, label='Remove')
    remove_btn.Bind(wx.EVT_BUTTON, self.on_remove)
    btn_sizer.Add(remove_btn, 0, wx.ALL, 5)
    self.main_sizer.Add(btn_sizer)

The create_ui() method is a bit long. The code will be broken up to make it easier to digest. The code above will add two buttons:

  • An Add file button
  • A Remove file button

These buttons go inside of a horizontally-oriented sizer along the top of the merge panel. You also bind each of these buttons to their own event handlers.

Now let’s add the widget for displaying PDFs to be merged:

move_btn_sizer = wx.BoxSizer(wx.VERTICAL)
    row_sizer = wx.BoxSizer() 
    self.pdf_olv = ObjectListView(self, style=wx.LC_REPORT | wx.SUNKEN_BORDER)
    self.pdf_olv.SetEmptyListMsg("No PDFs Loaded")
    self.update_pdfs()
    row_sizer.Add(self.pdf_olv, 1, wx.ALL | wx.EXPAND)

Here you add the ObjectListView widget to the row_sizer and call update_pdfs() to update it so that it has column labels.

You need to add support for reordering the PDFs in the ObjectListView widget, so let’s add that next:

move_up_btn = wx.Button(self, label='Up')
    move_up_btn.Bind(wx.EVT_BUTTON, self.on_move)
    move_btn_sizer.Add(move_up_btn, 0, wx.ALL, 5)
    move_down_btn = wx.Button(self, label='Down')
    move_down_btn.Bind(wx.EVT_BUTTON, self.on_move)
    move_btn_sizer.Add(move_down_btn, 0, wx.ALL, 5)
    row_sizer.Add(move_btn_sizer)
    self.main_sizer.Add(row_sizer, 1, wx.ALL | wx.EXPAND, 5)

Here you add two more buttons. One for moving items up and one for moving items down. These two buttons are added to a vertically-oriented sizer, move_btn_sizer, which in turn is added to the row_sizer. Finally the row_sizer is added to the main_sizer.

Here’s the last few lines of the create_ui() method:

merge_pdfs = wx.Button(self, label='Merge PDFs')
    merge_pdfs.Bind(wx.EVT_BUTTON, self.on_merge)
    self.main_sizer.Add(merge_pdfs, 0, wx.ALL | wx.CENTER, 5)
    self.SetSizer(self.main_sizer)

These last four lines add the merge button and get it hooked up to an event handler. It also sets the panel’s sizer to the main_sizer.

Now let’s create add_pdf():

def add_pdf(self, path):
    self.pdfs.append(Pdf(path))

You will be calling this method with a path to a PDF that you wish to merge with another PDF. This method will create an instance of the Pdf class and append it to the pdfs list.

Now you’re ready to create load_pdfs():

def load_pdfs(self, path):
    pdf_paths = glob.glob(path + '/*.pdf')
    for path in pdf_paths:
        self.add_pdf(path)
    self.update_pdfs()

This method takes in a folder rather than a file. It then uses glob to find all the PDFs in that folder. You will loop over the list of files that glob returns and use add_pdf() to add them to the pdfs list. Then you call update_pdfs() which will update the UI with the newly added PDF files.

Let’s find out what happens when you press the merge button:

def on_merge(self, event):
    """
    TODO - Move this into a thread
    """
    objects = self.pdf_olv.GetObjects()
    if len(objects) < 2:
        with wx.MessageDialog(None,
            message='You need 2 or more files to merge!',
            caption='Error',
            style=wx.ICON_INFORMATION) as dlg:
            dlg.ShowModal()
        return
    path = None  # stays None if the user cancels the save dialog
    with wx.FileDialog(self, message="Choose a file",
        defaultDir='~',
        defaultFile="",
        wildcard=wildcard,
        style=wx.FD_SAVE | wx.FD_CHANGE_DIR) as dlg:
        if dlg.ShowModal() == wx.ID_OK:
            path = dlg.GetPath()
    if path:
        _, ext = os.path.splitext(path)
        if '.pdf' not in ext.lower():
            path = f'{path}.pdf'
        self.merge(path)

The on_merge() method is the event handler that is called by your merge button. The docstring contains a TODO message to remind you to move the merging code to a thread. Technically the code you will be moving is actually in the merge() function, but as long as you have some kind of reminder, it doesn’t matter all that much.

Anyway, you use GetObjects() to get all the PDFs in the ObjectListView widget. Then you check to make sure that there are at least two PDF files. If not, you will let the user know that they need to add more PDFs! Otherwise you will open up a wx.FileDialog and have the user choose the name and location for the merged PDF.

Finally you check if the user added the .pdf extension and add it if they did not. Then you call merge().

The merge() method is conveniently the next method you should create:

def merge(self, output_path):
    pdf_writer = PdfFileWriter() 
    objects = self.pdf_olv.GetObjects() 
    for obj in objects:
        pdf_reader = PdfFileReader(obj.full_path)
        for page in range(pdf_reader.getNumPages()):
            pdf_writer.addPage(pdf_reader.getPage(page)) 
    with open(output_path, 'wb') as fh:
        pdf_writer.write(fh) 
    with wx.MessageDialog(None, message='Save completed!',
                          caption='Save Finished',
                         style= wx.ICON_INFORMATION) as dlg:
        dlg.ShowModal()

Here you create a PdfFileWriter() object for writing out the merged PDF. Then you get the list of objects from the ObjectListView widget rather than the pdfs list. This is because you can reorder the UI so the list may not be in the correct order. The next step is to loop over each of the objects and get its full path out. You will open the path using PdfFileReader and loop over all of its pages, adding each page to the pdf_writer.

Once all the PDFs and all their respective pages are added to the pdf_writer, you can write out the merged PDF to disk. Then you open up a wx.MessageDialog that lets the user know that the PDFs have merged.

While this is happening, you may notice that your UI is frozen. That is because it can take a while to read all those pages into memory and then write them out. This is the reason why this part of your code should be done in a thread. You will be learning about that refactor later on in this chapter.

Now let’s create on_add_file():

def on_add_file(self, event):
    paths = None
    with wx.FileDialog(self, message="Choose a file",
        defaultDir='~',
        defaultFile="",
        wildcard=wildcard,
        style=wx.FD_OPEN | wx.FD_MULTIPLE) as dlg:
        if dlg.ShowModal() == wx.ID_OK:
            paths = dlg.GetPaths()
    if paths:
        for path in paths:
            self.add_pdf(path)
        self.update_pdfs()

This code will open up a wx.FileDialog and let the user choose one or more files. Then it returns them as a list of paths. You can then loop over those paths and use add_pdf() to add them to the pdfs list.

Now let’s find out how to reorder the items in the ObjectListView widget:

def on_move(self, event):
    btn = event.GetEventObject()
    label = btn.GetLabel()
    current_selection = self.pdf_olv.GetSelectedObject()
    data = self.pdf_olv.GetObjects()
    if current_selection:
        index = data.index(current_selection)
        new_index = self.get_new_index(
            label.lower(), index, data)
        data.insert(new_index, data.pop(index))
        self.pdfs = data
        self.update_pdfs()
        self.pdf_olv.Select(new_index)

Both the up and down buttons are bound to the on_move() event handler. You can get access to which button called this handler via event.GetEventObject(), which will return the button object. Then you can get the button’s label. Next you need to get the current_selection and a list of the objects, which is assigned to data. Now you can use the index attribute of the list object to find the index of the current_selection.

Once you have that information, you pass the button label, the index and the data list to get_new_index() to calculate which direction the item should go. Once you have the new_index, you can insert it and remove the old index using the pop() method. Then reset the pdfs list to the data list so they match. The last two steps are to update the widget and re-select the item that you moved.

Let’s take a look at how to get that new index now:

def get_new_index(self, direction, index, data):
    if direction == 'up':
        if index > 0:
            new_index = index - 1
        else:
            new_index = len(data) - 1
    else:
        if index < len(data) - 1:
            new_index = index + 1
        else:
            new_index = 0
    return new_index

Here you use the button label, direction, to determine which way to move the item. If it’s “up”, then you check if the index is greater than zero and subtract one. If it is zero, then you take the entire length of the list and subtract one, which should move the item back to the other end of the list.

If the user hit the “down” button, then you check to see if the index is less than the length of the data minus one. In that case, you add one to it. Otherwise you set the new_index to zero.

The code is a bit confusing to look at, so feel free to add some print functions in there and then run the code to see how it works.

The next new thing to learn is how to remove an item:

def on_remove(self, event):
    current_selection = self.pdf_olv.GetSelectedObject()
    if current_selection:
        index = self.pdfs.index(current_selection)
        self.pdfs.pop(index)
        self.pdf_olv.RemoveObject(current_selection)

This method will get the current_selection, pop() it from the pdfs list and then use the RemoveObject() method to remove it from the ObjectListView widget.

Now let’s take a look at the code that is called when you drag-and-drop items onto your application:

def update_on_drop(self, paths):
    for path in paths:
        _, ext = os.path.splitext(path)
        if os.path.isdir(path):
            self.load_pdfs(path)
        elif os.path.isfile(path) and ext.lower() == '.pdf':
            self.add_pdf(path)
    self.update_pdfs()

In this case, you loop over the paths and check to see if the path is a directory or a file. They could also be a link, but you will ignore those. If the path is a directory, then you call load_pdfs() with it. Otherwise you check to see if the file has an extension of .pdf and if it does, you call add_pdf() with it.

The last method to create is update_pdfs():

def update_pdfs(self):
    self.pdf_olv.SetColumns([
        ColumnDefn("PDF Name", "left", 200, "filename"),
        ColumnDefn("Full Path", "left", 250, "full_path"),
        ColumnDefn("Page Count", "left", 100, "number_of_pages")])self.pdf_olv.SetObjects(self.pdfs)

This method adds or resets the column names and widths. It also adds the PDF list via SetObjects().

Here is what the merge panel looks like:

The PDF Merger Tab

Now you are ready to create the split_panel!


The split_panel Module

The split_panel module is a bit simpler than the merge_panel was. You really only need a couple of text controls, some labels and a button.

Let’s see how all of that ends up laying out:

# split_panel.py 
import os
import string

import wx
 
from PyPDF2 import PdfFileReader, PdfFileWriter
 
wildcard = "PDFs (*.pdf)|*.pdf"

Here you import Python’s os and string modules. You will also be needing PyPDF2 again and the wildcard variable will be useful for opening and saving PDFs.

You will also need the CharValidator class from the calculator chapter.

It is reproduced for you again here:

class CharValidator(wx.Validator):
    '''
    Validates data as it is entered into the text controls.
    '''

    def __init__(self, flag):
        wx.Validator.__init__(self)
        self.flag = flag
        self.Bind(wx.EVT_CHAR, self.OnChar)

    def Clone(self):
        '''Required Validator method'''
        return CharValidator(self.flag)

    def Validate(self, win):
        return True

    def TransferToWindow(self):
        return True

    def TransferFromWindow(self):
        return True

    def OnChar(self, event):
        keycode = int(event.GetKeyCode())
        if keycode < 256:
            key = chr(keycode)
            if self.flag == 'no-alpha' and key in string.ascii_letters:
                return
            if self.flag == 'no-digit' and key in string.digits:
                return
        event.Skip()

The CharValidator class is useful for validating that the user is not entering any letters into a text control. You will be using it for splitting options, which will allow the user to choose which pages they want to split out of the input PDF.

But before we get to that, let’s create the SplitPanel:

class SplitPanel(wx.Panel):
 
    def __init__(self, parent):
        super().__init__(parent)
        font = wx.Font(12, wx.SWISS, wx.NORMAL, wx.NORMAL)
        main_sizer = wx.BoxSizer(wx.VERTICAL)

The first few lines of the __init__() create a wx.Font instance and the main_sizer.

Here’s the next few lines of the __init__():

row_sizer = wx.BoxSizer()
lbl = wx.StaticText(self, label='Input PDF:')
lbl.SetFont(font)
row_sizer.Add(lbl, 0, wx.ALL | wx.CENTER, 5)
self.pdf_path = wx.TextCtrl(self, style=wx.TE_READONLY)
row_sizer.Add(self.pdf_path, 1, wx.EXPAND | wx.ALL, 5)
pdf_btn = wx.Button(self, label='Open PDF')
pdf_btn.Bind(wx.EVT_BUTTON, self.on_choose)
row_sizer.Add(pdf_btn, 0, wx.ALL, 5)
main_sizer.Add(row_sizer, 0, wx.EXPAND)

This bit of code adds a row of widgets that will be contained inside of row_sizer. Here you have a nice label, a text control for holding the input PDF path and the “Open PDF” button. After adding each of these to the row_sizer, you will then add that sizer to the main_sizer.

Now let’s add a second row of widgets:

msg = 'Type page numbers and/or page ranges separated by commas.' \
    ' For example: 1, 3 or 4-10. Note you cannot use both commas ' \
    'and dashes.'
directions_txt = wx.TextCtrl(self, value=msg,
    style=wx.TE_MULTILINE | wx.NO_BORDER)
directions_txt.SetFont(font)
directions_txt.Disable()
main_sizer.Add(directions_txt, 0, wx.ALL | wx.EXPAND, 5)

These lines of code create a multi-line text control that has no border. It contains the directions of use for the pdf_split_options text control and appears beneath that widget as well. You also Disable() the directions_txt to prevent the user from changing the directions.

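Note that the on_split() handler shown later reads a self.pdf_split_options text control, whose creation isn't shown above. A minimal sketch of that row, with the label text and validator flag as assumptions, would sit between the input-PDF row and the directions text:

row_sizer = wx.BoxSizer()
lbl = wx.StaticText(self, label='Pages:')
lbl.SetFont(font)
row_sizer.Add(lbl, 0, wx.ALL | wx.CENTER, 5)
# Let the user type only digits, commas and dashes
self.pdf_split_options = wx.TextCtrl(
    self, validator=CharValidator('no-alpha'))
row_sizer.Add(self.pdf_split_options, 1, wx.EXPAND | wx.ALL, 5)
main_sizer.Add(row_sizer, 0, wx.EXPAND)
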
There are four more lines to add to the __init__():

split_btn = wx.Button(self, label='Split PDF')
split_btn.Bind(wx.EVT_BUTTON, self.on_split)
main_sizer.Add(split_btn, 0, wx.ALL | wx.CENTER, 5)
self.SetSizer(main_sizer)

These last few lines will add the “Split PDF” button, bind it to an event handler and add the button to a sizer. Then you set the sizer for the panel.

Now that you have the UI itself written, you need to start writing the other methods:

def on_choose(self, event):
    path = None
    with wx.FileDialog(self, message="Choose a file",
        defaultDir='~',
        defaultFile="",
        wildcard=wildcard,
        style=wx.FD_OPEN | wx.FD_CHANGE_DIR) as dlg:
        if dlg.ShowModal() == wx.ID_OK:
            path = dlg.GetPath()
    if path:
        self.pdf_path.SetValue(path)

The on_choose() event handler is called when the user presses the “Open PDF” button. It will load a wx.FileDialog and if the user chooses a PDF, it will set the pdf_path text control with that user’s choice.

Now let’s get to the meat of the code:

def on_split(self, event):
    output_path = None
    input_pdf = self.pdf_path.GetValue()
    split_options = self.pdf_split_options.GetValue()
    if not input_pdf:
        message = 'You must choose an input PDF!'
        self.show_message(message)
        return

When the user presses the “Split PDF” button, on_split() is called. You will start off by checking if the user has chosen a PDF to split at all. If they haven’t, tell them to do so using the show_message() method and return.

Next you need to check to see if the PDF path that the user chose still exists:

if not os.path.exists(input_pdf):
    message = f'Input PDF {input_pdf} does not exist!'
    self.show_message(message)
    return

If the PDF does not exist, let the user know of the error and don’t do anything.

Now you need to check if the user put anything into split_options:

if not split_options:
    message = 'You need to choose what page(s) to split off'
    self.show_message(message)
    return

If the user didn’t set the split_options then your application won’t know what pages to split off. So tell the user.

The next check is to make sure the user does not have both commas and dashes:

if ',' in split_options and '-' in split_options:
    message = 'You cannot have both commas and dashes in options'
    self.show_message(message)
    return

You could theoretically support both commas and dashes, but that will make the code more complex. If you want to add that, feel free. For now, it is not supported.

Another item to check is if there is more than one dash:

if split_options.count('-') > 1:
    message = 'You can only use one dash'
    self.show_message(message)
    return

Users are tricky and it is easy to bump a button twice, so make sure to let the user know that this is not allowed.

The user could also enter a single negative number:

if '-' in split_options:
    page_begin, page_end = split_options.split('-')
    if not page_begin or not page_end:
        message = 'Need both a beginning and ending page'
        self.show_message(message)
        return

In that case, you can check to make sure it splits correctly or you can try to figure out where in the string the negative number is. In this case, you use the split method to figure it out.

The last check is to make sure that the user has entered a number and not just a dash or comma:

if not any(char.isdigit() for char in split_options):
    message = 'You need to enter a page number to split off'
    self.show_message(message)
    return

You can use Python’s any builtin for this. You loop over all the characters in the string and check whether each one is a digit. If none of them are, then you show a message to the user.

Now you are ready to create the split PDF file itself:

with wx.FileDialog(self, message="Choose a file",
    defaultDir='~',
    defaultFile="",
    wildcard=wildcard,
    style=wx.FD_SAVE | wx.FD_CHANGE_DIR) as dlg:
    if dlg.ShowModal() == wx.ID_OK:
        output_path = dlg.GetPath()

This bit of code will open the save version of the wx.FileDialog and let the user pick a name and location to save the split PDF.

The last piece of code for this function is below:

if output_path:
    _, ext = os.path.splitext(output_path)
    if '.pdf' not in ext.lower():
        output_path = f'{output_path}.pdf'
    split_options = split_options.strip()
    self.split(input_pdf, output_path, split_options)

Once you have the output_path, you will check to make sure the user added the .pdf extension. If they didn’t, then you will add it for them. Then you will strip off any leading or ending white space in split_options and call split().

Now let’s create the code used to actually split a PDF:

def split(self, input_pdf, output_path, split_options):
    pdf = PdfFileReader(input_pdf)
    pdf_writer = PdfFileWriter()
    if ',' in split_options:
        pages = [page for page in split_options.split(',') if page]
        for page in pages:
            pdf_writer.addPage(pdf.getPage(int(page)))
    elif '-' in split_options:
        page_begin, page_end = split_options.split('-')
        page_begin = int(page_begin)
        page_end = int(page_end)
        page_begin = self.get_actual_beginning_page(page_begin)

        for page in range(page_begin, page_end):
            pdf_writer.addPage(pdf.getPage(page))
    else:
        # User only wants a single page
        page_begin = int(split_options)
        page_begin = self.get_actual_beginning_page(page_begin)
        pdf_writer.addPage(pdf.getPage(page_begin))

Here you create a PdfFileReader object called pdf and a PdfFileWriter object called pdf_writer. Then you check split_options to see if the user used commas or dashes. If the user went with a comma separated list, then you loop over the pages and add them to the writer.

If the user used dashes, then you need to get the beginning page and the ending page. Then you call the get_actual_beginning_page() method to do a bit of math because page one when using PyPDF is actually page zero. Once you have the normalized numbers figured out, you can loop over the range of pages using Python’s range function and add the pages to the writer object.

The else statement is only used when the user enters a single page number that they want to split off. For example, they might just want page 2 out of a 20 page document.

The last step is to write the new PDF to disk:

# Write PDF to disk
with open(output_path, 'wb') as out:
    pdf_writer.write(out) 
# Let user know that PDF is split
message = f'PDF split successfully to {output_path}'
self.show_message(message, caption='Split Finished',
                  style=wx.ICON_INFORMATION)

This code will create a new file using the path the user provided. Then it will write out the pages that were added to pdf_writer and display a dialog to the user letting them know that they now have a new PDF.

Let’s take a quick look at the logic you need to add to the get_actual_beginning_page() method:

def get_actual_beginning_page(self, page_begin):
    if page_begin < 0 or page_begin == 1:
        page_begin = 0
    if page_begin > 1:
        # Take the off-by-one error into account
        page_begin -= 1
    return page_begin

Here you take in the beginning page and check if the page number is zero, one or greater than one. Then you do a bit of math to avoid off-by-one errors and return the actual beginning page number.

Now let’s create show_message():

def show_message(self, message, caption='Error', style=wx.ICON_ERROR):
    with wx.MessageDialog(None, message=message,
                          caption=caption,
                          style=style) as dlg:
        dlg.ShowModal()

This is a helpful function for wrapping the creation and destruction of a wx.MessageDialog. It accepts the following arguments:

  • message
  • caption
  • style flag

Then it uses Python’s with statement to create an instance of the dialog and show it to the user.

Here is what the split panel looks like when you are finished coding:

The PDF Splitter Tab

Now you are ready to learn about threads and wxPython!


Using Threads in wxPython

Every GUI toolkit handles threads differently. The wxPython GUI toolkit has three thread-safe methods that you should use if you want to use threads:

  • wx.CallAfter
  • wx.CallLater
  • wx.PostEvent

You can use these methods to post information from the thread back to wxPython.

Let’s update the merge_panel so that it uses threads!


Enhancing PDF Merging with Threads

Python comes with several concurrency-related modules. You will be using the threading module here. Take the original code and copy it into a new folder called version_2_threaded or refer to the pre-made folder in the Github repository for this chapter.

Let’s start by updating the imports in merge_panel:

# merge_panel.py 
import os
import glob

import wx
 
from ObjectListView import ObjectListView, ColumnDefn
from pubsub import pub
from PyPDF2 import PdfFileReader, PdfFileWriter
from threading import Thread
 
wildcard = "PDFs (*.pdf)|*.pdf"

The only differences here are the new import line, from threading import Thread, and the addition of pubsub. That gives us the ability to subclass Thread.

Let’s do that next:

class MergeThread(Thread):
 
    def __init__(self, objects, output_path):
        super().__init__()
        self.objects = objects
        self.output_path = output_path
        self.start()

The MergeThread class will take in the list of objects from the ObjectListView widget as well as the output_path. At the end of the __init__() you tell the thread to start(), which actually causes the run() method to execute.

Let’s override that:

def run(self):
    pdf_writer = PdfFileWriter()
    page_count = 1 
    for obj in self.objects:
        pdf_reader = PdfFileReader(obj.full_path)
        for page in range(pdf_reader.getNumPages()):
            pdf_writer.addPage(pdf_reader.getPage(page))
            wx.CallAfter(pub.sendMessage, 'update',
                         msg=page_count)
            page_count += 1 
    # All pages are added, so write it to disk
    with open(self.output_path, 'wb') as fh:
        pdf_writer.write(fh) 
    wx.CallAfter(pub.sendMessage, 'close')

Here you create a PdfFileWriter class and then loop over the various PDFs, extracting their pages and adding them to the writer object as you did before. After a page is added, you use wx.CallAfter to send a message using pubsub back to the GUI thread. In this message, you send along the current page count of added pages. This will update a dialog that has a progress bar on it.

After the file is finished writing out, you send another message via pubsub to tell the progress dialog to close.

Let’s create a progress widget:

class MergeGauge(wx.Gauge):
 
    def __init__(self, parent, range):
        super().__init__(parent, range=range) 
        pub.subscribe(self.update_progress, "update") 
    def update_progress(self, msg):
        self.SetValue(msg)

To create a progress widget, you can use wxPython’s wx.Gauge. In the code above, you subclass that widget and subscribe it to the update message. Whenever it receives an update, it will change the gauge’s value accordingly.

You will need to put this gauge into a dialog, so let’s create that next:

class MergeProgressDialog(wx.Dialog):
 
    def __init__(self, objects, path):
        super().__init__(None, title='Merging Progress')
        pub.subscribe(self.close, "close") 
        sizer = wx.BoxSizer(wx.VERTICAL)
        lbl = wx.StaticText(self, label='Merging PDFS')
        sizer.Add(lbl, 0, wx.ALL | wx.CENTER, 5)
        total_page_count = sum([int(obj.number_of_pages) for obj in objects])
        gauge = MergeGauge(self, total_page_count)
        sizer.Add(gauge, 0, wx.ALL | wx.EXPAND, 5) 
        MergeThread(objects, output_path=path)
        self.SetSizer(sizer)
    def close(self):
        self.Close()

The MergeProgressDialog subscribes the dialog to the “close” message. It also adds a label and the gauge / progress bar to itself. Then it starts the MergeThread. When the “close” message gets emitted, the close() method is called and the dialog will be closed.

The other change you will need to make is in the MergePanel class, specifically the merge() method:

def merge(self, output_path, objects):
    with MergeProgressDialog(objects, output_path) as dlg:
        dlg.ShowModal() 
    with wx.MessageDialog(None, message='Save completed!',
                          caption='Save Finished',
                         style= wx.ICON_INFORMATION) as dlg:
        dlg.ShowModal()

Here you update the method to accept the objects parameter and create the MergeProgressDialog with that and the output_path. Note that you will need to change on_merge() to pass in the objects list in addition to the path to make this work. Once the merge is finished, the dialog will automatically close and destroy itself. Then you will create the same wx.MessageDialog as before and show that to the user to let them know the merged PDF is ready.

You can use the code here to update the split_panel to use threads too if you would like to. This isn’t strictly necessary unless you think you will be splitting off dozens or hundreds of pages. Most of the time, splitting should be quick enough that the user won’t notice or care much.


Wrapping Up

Splitting and merging PDFs can be done using PyPDF2. You could also use pdfrw if you wanted to. There are plenty of ways to improve this application as well.

Here are a few examples:

  • Put splitting into a thread
  • Add toolbar buttons
  • Add keyboard shortcuts
  • Add a statusbar

Regardless, you learned a lot in this chapter. You learned how to merge and split PDFs. You also learned how to use threads with wxPython. Finally, this code demonstrated how to add some error handling to your inputs, specifically in the split_panel module.

The post wxPython – Creating a PDF Merger / Splitter Utility appeared first on The Mouse Vs. The Python.


Artem Rys: Monitoring traffic of your Github repositories using Python and Google Cloud Platform — Part 1

Monitoring traffic of your Github repositories using Python and Google Cloud Platform — Part 1

Photo by Paweł Czerwiński on Unsplash

This article is about monitoring traffic to your Github open-source repositories. Unfortunately, you can see these statistics only by visiting each repository one by one, which you may not want to do at all. If that is the case, you can use this small tool instead.

Technical stack:

  • Python 3.7
  • PyGithub
  • Google Cloud Functions
  • Google Cloud Firestore

And from the perspective of $ — this solution is zero cost because of the free quota you have in Google Cloud Platform. No ads — I just like using free opportunities.

So, the main concept is to get the top referrers from Github for each of your public repositories and then store this data in Firestore, keyed by date, so that a weekly report can be built on top of it (in the next part).

In this part, we are going only to get top referrers from Github. The main and only code:

Cloud Function to get top referrers for each of your open-source Github repositories.
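
The embedded gist doesn't come through here, so below is a minimal sketch of such a Cloud Function using PyGithub and the Firestore client; the collection name and document layout are assumptions rather than the author's exact code:

import datetime
import os

from github import Github
from google.cloud import firestore


def parse_github_repos_traffic(request):
    """HTTP Cloud Function: store top referrers for every public repo."""
    github_client = Github(os.environ["GITHUB_TOKEN"])
    db = firestore.Client()
    today = datetime.date.today().isoformat()

    for repo in github_client.get_user().get_repos():
        if repo.private:
            continue
        # Top referrers over the last 14 days, as reported by Github
        referrers = {
            ref.referrer: {"count": ref.count, "uniques": ref.uniques}
            for ref in repo.get_top_referrers()
        }
        # One Firestore document per repository per day
        db.collection("github_traffic").document(
            f"{repo.name}_{today}"
        ).set(referrers)

    return "Done"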

requirements.txt file:

PyGithub==1.43.8
google-cloud-firestore==1.4.0

You are going to need a Github personal access token to be able to make requests to Github API to get all your public repositories and then get traffic data from them. You can get it here.

Github personal access token page.

Click Generate new token.

Github new personal access token page. Click only `public_repo` scope.

Click Generate token.

From the GCP perspective, you need a project with the Cloud Functions API enabled and a Cloud Firestore database created, and that’s all.

Docs to deploy Cloud Function in several ways can be found here.

I am using a gcloud tool from my local machine.

gcloud config set core/project <your-project-name>
gcloud functions deploy parse_github_repos_traffic --runtime python37 --trigger-http --set-env-vars=GITHUB_TOKEN=<your-github-token>
Newly created Cloud function.

To test the function, click on it, then go to the Testing tab.

A testing tab of your Cloud Function.

And press Test the function. And you should be able to see something like this.

A testing tab of your Cloud Function after pressing a Test the function button.

And this data is also available in Cloud Firestore (for future analytics).

Cloud Firestore with data.

That is all for this part. In the next part, we are going to set up a scheduler to call this function weekly and analyze the collected data to create a report.

Thanks for your attention to the topic; feel free to leave your questions in the comments for discussion.


Monitoring traffic of your Github repositories using Python and Google Cloud Platform — Part 1 was originally published in python4you on Medium, where people are continuing the conversation by highlighting and responding to this story.

IslandT: The weekly Python news report

What do we have here this week?

Python goodies that will make you happy…

Great prices on Python-related books on Humble Bundle, as cheap as a dollar for a few books.

Django 3.0 alpha 1 is now available. The first stage in the 3.0 release cycle is ready for you to use.

Your Guide to the CPython Source Code!

Hey Pythonista, would you like to show your love for Python? Grab a few tee-shirts.

A new game developed with Pygame!

Also, on a personal note: this website will become a 100% “Python only” website from today onward. You will find more Python code, Python projects, Python tutorials, Python products, Python development tools, and Python games on this website. Please subscribe to this website to begin your Python programming journey!

Test and Code: 87: Paths to Parametrization - from one test to many

There's a cool feature of pytest called parametrization.
It's totally one of the superpowers of pytest.

It's actually a handful of features, and there are a few ways to approach it.
Parametrization is the ability to take one test, and send lots of different input datasets into the code under test, and maybe even have different output checks, all within the same test that you developed in the simple test case.

Super powerful, but since there are a few approaches to it, it can be a tad tricky to get the hang of.

Sponsored By:

  • PyCharm Professional (https://testandcode.com/pycharm): Try PyCharm Pro for 4 months. Offer good through June 10. Try out Pro features like integrated coverage and profiling, and extended support for Django, Flask, Pyramid, Cython, and more. Promo Code: TESTNCODE2019

Support Test & Code - Python Testing & Development (https://www.patreon.com/testpodcast)

Links:

  • pytest changelog: https://docs.pytest.org/en/latest/changelog.html
  • pytest deprecations and removals: https://docs.pytest.org/en/latest/deprecations.html#deprecations
  • Python Testing with pytest (https://pragprog.com/book/bopytest/python-testing-with-pytest) — Test function parametrization is in chapter 2. Fixture parametrization is in chapter 3.
  • Parametrizing test functions — pytest documentation: https://docs.pytest.org/en/latest/parametrize.html
  • pytest fixtures — pytest documentation: https://docs.pytest.org/en/latest/fixture.html#fixture-parametrize

Python Anywhere: Our new CPU API

We received many requests from PythonAnywhere users to make it possible to programmatically monitor usage of CPU credit, so we decided to add a new endpoint to our experimental API.

The first step when using the API is to get an API token -- this is what you use to authenticate yourself with our servers when using it. To do that, log in to PythonAnywhere, and go to the "Account" page using the link at the top right. Click on the "API token" tab; if you don't already have a token, it will look like this:

Click the "Create a new API token" button to get your token, and you'll see this:

That string of letters and numbers (d870f0cac74964b27db563aeda9e418565a0d60d in the screenshot) is an API token, and anyone who has it can access your PythonAnywhere account and do stuff -- so keep it secret. If someone does somehow get hold of it, you can revoke it on this page by clicking the red button -- that stops it from working in the future, and creates a new one for you to use.

Now you can use the CPU API to track your CPU usage.

For example, you could use it at the beginning and the end of your script to learn how many CPU seconds were consumed (assuming nothing else is running).

from math import factorial
from time import sleep
from urllib.parse import urljoin

import requests

api_token = "YOUR TOKEN HERE"
username = "YOUR USERNAME HERE"
pythonanywhere_host = "www.pythonanywhere.com"

api_base = "https://{pythonanywhere_host}/api/v0/user/{username}/".format(
    pythonanywhere_host=pythonanywhere_host,
    username=username,
)

resp = requests.get(
    urljoin(api_base, "cpu/"),
    headers={"Authorization": "Token {api_token}".format(api_token=api_token)}
)

initial_usage_seconds = resp.json()["daily_cpu_total_usage_seconds"]

# we burn some cpu seconds

[factorial(x) for x in range(2000)]

# cpu usage is updated every 60 seconds, so we need to wait to be sure that usage is available for us to read.

sleep(70)

resp = requests.get(
    urljoin(api_base, "cpu/"),
    headers={"Authorization": "Token {api_token}".format(api_token=api_token)}
)

final_usage_seconds = resp.json()["daily_cpu_total_usage_seconds"]

seconds_used = final_usage_seconds - initial_usage_seconds

print("Task cost {} CPU seconds to run".format(seconds_used))

...replacing "YOUR TOKEN HERE" and "YOUR USERNAME HERE" with the appropriate stuff.
If you're on our EU-based system, you should also replace www.pythonanywhere.com with eu.pythonanywhere.com.

Let us know if you have any comments or questions -- otherwise, happy coding!

Stack Abuse: Deploying a Flask Application to Heroku

Introduction

In this tutorial you will learn how to deploy a Flask application to Heroku. The app can be anything from a simple "Hello World" app to a social media monitoring platform!

Nowadays there is hardly a business that doesn't have a web app to help it reach a greater audience, or perhaps provide its services through an online portal.

Today you are about to learn how to make an API using Flask as a case study for how to deploy your app on Heroku.

Building a REST API with Flask

In your project directory, let's start off by creating a virtualenv:

$ python -m venv venv/

And let's activate it with the source command:

$ source venv/bin/activate

Then, let's use pip to install the libraries we're going to use - flask to build the app and gunicorn as our server:

$ pip install flask
$ pip install gunicorn

Our application is going to be a simple API that receives a name and returns a welcome message:

# app.py
from flask import Flask, request, jsonify
app = Flask(__name__)

@app.route('/getmsg/', methods=['GET'])
def respond():
    # Retrieve the name from url parameter
    name = request.args.get("name", None)

    # For debugging
    print(f"got name {name}")

    response = {}

    # Check if user sent a name at all
    if not name:
        response["ERROR"] = "no name found, please send a name."
    # Check if the user entered a number not a name
    elif str(name).isdigit():
        response["ERROR"] = "name can't be numeric."
    # Now the user entered a valid name
    else:
        response["MESSAGE"] = f"Welcome {name} to our awesome platform!!"

    # Return the response in json format
    return jsonify(response)

@app.route('/post/', methods=['POST'])
def post_something():
    param = request.form.get('name')
    print(param)
    # You can add the test cases you made in the previous function, but in our case here you are just testing the POST functionality
    if param:
        return jsonify({
            "Message": f"Welcome {name} to our awesome platform!!",
            # Add this option to distinct the POST request
            "METHOD" : "POST"
        })
    else:
        return jsonify({
            "ERROR": "no name found, please send a name."
        })

# A welcome message to test our server
@app.route('/')
def index():
    return "<h1>Welcome to our server !!</h1>"

if __name__ == '__main__':
    # Threaded option to enable multiple instances for multiple user access support
    app.run(threaded=True, port=5000)

To test the application locally, run it with python app.py and then hit the http://127.0.0.1:5000/ endpoint. If everything is fine, we should be greeted with a welcome message:

welcome message

We can also send a name as a parameter, such as http://localhost:5000/getmsg/?name=Mark:

{"MESSAGE":"Welcome Mark to our awesome platform!!"}

With our application ready, let's deploy it to Heroku.

Heroku

Heroku is one of the first cloud platform-as-a-service (PaaS) providers and supports several languages - Ruby, Java, Node.js, Scala, Clojure, Python, PHP, and Go.

The first thing we need to do is define which libraries our application uses. That way, Heroku knows which ones to provide for us, similar to how we install them locally when developing the app.

To achieve this, we need to create a requirements.txt file with all of the modules:

$ pip freeze > requirements.txt

This way we end up with a requirements.txt file that contains the libraries we're using and their versions:

Click==7.0
Flask==1.1.1
gunicorn==19.9.0
itsdangerous==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
Werkzeug==0.15.6

Note: One of the common mistakes is misspelling requirements, it is a real pain when you debug your code for hours and find out that the app doesn't run because the server didn't download the modules. The only way for Heroku to know the modules that you are using is to add them to the requirements.txt file, so be careful!

For Heroku to be able to run our application like it should, we need to define a set of processes/commands that it should run beforehand. These commands live in a file named Procfile (with no file extension):

web: gunicorn app:app

The web command tells Heroku to start a web server for the application, using gunicorn. The app:app part tells gunicorn where to find the app: the module app.py and the Flask instance named app inside it.

Heroku Account

Now, we should create a Heroku account.

Once that is out of the way, on the dashboard, select New -> Create new app:

new application

Choose a name for the application and choose a region of where you'd like to host it:

app naming

Once the application is created on Heroku, we're ready to deploy it online.

Git

To upload our code, we'll use Git. First, let's make a git repository:

$ git init .

And now, let's add our files and commit:

$ git add app.py Procfile requirements.txt
$ git commit -m "first commit"

Deploying the App to Heroku

To finally deploy the application, we'll need to install the Heroku CLI with which we'll run Heroku-related commands. Let's login to our account using our credentials by running the command:

$ heroku login -i

Alternatively, we can login using the browser if we run the command:

$ heroku login

At this point, while logged in, we should add our repository to the remote one:

$ heroku git:remote -a {your-project-name}

Be sure to replace {your-project-name} with the actual name of your project you selected in the earlier step.

And with that done, let's upload the project by pushing it to Heroku:

$ git push heroku master

A lengthy progress log should come up on your terminal, ending with:

...
remote: -----> Discovering process types
remote:        Procfile declares types -> web
remote:
remote: -----> Compressing...
remote:        Done: 45.1M
remote: -----> Launching...
remote:        Released v4
remote:        https://{your-project-name}.herokuapp.com/ deployed to Heroku
remote:
remote: Verifying deploy... done.
To https://git.heroku.com/{your-project-name}.git
   ae85864..4e63b46  master -> master

Congratulations, you have successfully uploaded your first web app to Heroku! It's now time to test and verify our API.

Testing the API

In the log that has been shown in the console you will find a link for your application https://{your-project-name}.herokuapp.com/, this link can also be found under the Settings tab, in the Domains and certificates section:

application url

Visiting the link, we can reach our application, which is now online and public:

welcome message

In case there were any errors, you can access the logs and troubleshoot from there:

log link

You can manually test your app in the browser, by typing the URL and adding the path for the /getmsg/ route. Though, as applications tend to get more and more complex, it's advised to use tools like Postman.
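
If you prefer to script these checks, the requests library works as well; here is a small sketch (replace {your-project-name} with your app's actual name, and pip install requests if needed):

import requests

base_url = "https://{your-project-name}.herokuapp.com"  # your deployed app's URL

# GET with a name parameter
resp = requests.get(f"{base_url}/getmsg/", params={"name": "Mark"})
print(resp.status_code, resp.json())

# POST with form data
resp = requests.post(f"{base_url}/post/", data={"name": "Mark"})
print(resp.status_code, resp.json())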

Now let's test the GET request to our application with a name parameter:

local request with parameters

Now let's test a URL that isn't bound to any function, like for example /newurl, with a GET request:

new url

As expected, our Flask app returned a 404 response.

Note: You can change the view of the output from Pretty, Raw, and Preview, which shows you how the output would look in your browser.

Now let's test a POST request:

post request

Also, let's see what happens if we completely omit the name parameter:

{"ERROR":"no name found, please send a name."}

We've tested our app and confirmed that everything is working fine. To see the history of your server and what requests were made you can check the logs for your site via Heroku:

Heroku log

You can see here the POST request we made to our page /post/.

Also, you can see the history of building the application. Moreover, if there's any problem during building you can find it in the log page.

Conclusion

In this article we showed a simple example of building our first simple API on Heroku using the Flask micro-framework. The development process remains the same as you continue to build your application.

Heroku offers a free plan and Student plans. The free plan is limited, but it works pretty well for a starter app, POC, or a simple project. However, if you want to scale your application, then you'll want to consider one of the paid plans available on the site.

For more info on Heroku you can check the Heroku manual itself.

Real Python: Python vs C++: Selecting the Right Tool for the Job

Are you a C++ developer comparing Python vs C++? Are you looking at Python and wondering what all the fuss is about? Do you wonder how Python compares to the concepts you already know? Or perhaps you have a bet on who would win if you locked C++ and Python in a cage and let them battle it out? Then this article is for you!

In this article, you’ll learn about:

  • Differences and similarities when you’re comparing Python vs C++
  • Times when Python might be a better choice for a problem and vice versa
  • Resources to turn to as you have questions while learning Python

This article is aimed at C++ developers who are learning Python. It assumes a basic knowledge of both languages and will use concepts from Python 3.6 and up, as well as C++11 or later.

Let’s dive into looking at Python vs C++!

Free Bonus: Click here to get access to a chapter from Python Tricks: The Book that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.

Comparing Languages: Python vs C++

Frequently, you’ll find articles that extoll the virtues of one programming language over another. Quite often, they devolve into efforts to promote one language by degrading the other. This isn’t that type of article.

When you’re comparing Python vs C++, remember that they’re both tools, and they both have uses for different problems. Think about comparing a hammer and a screwdriver. You could use a screwdriver to drive in nails, and you could use a hammer to force in screws, but neither experience will be all that effective.

Using the right tool for the job is important. In this article, you’ll learn about the features of Python and C++ that make each of them the right choice for certain types of problems. So, don’t view the “vs” in Python vs C++ as meaning “against.” Rather, think of it as a comparison.

Compilation vs Virtual Machine

Let’s start with the biggest difference when you’re comparing Python vs C++. In C++, you use a compiler that converts your source code into machine code and produces an executable. The executable is a separate file that can then be run as a stand-alone program:

Compiling a C++ program for windows.

This process outputs actual machine instructions for the specific processor and operating system it’s built for. In this drawing, it’s a Windows program. This means you’d have to recompile your program separately for Windows, Mac, and Linux:

Compiling a C++ program on three operating systems.

You’ll likely need to modify your C++ code to run on those different systems as well.

Python, on the other hand, uses a different process. Now, remember that you’ll be looking at CPython which is the standard implementation for the language. Unless you’re doing something special, this is the Python you’re running.

Python runs each time you execute your program. It compiles your source just like the C++ compiler. The difference is that Python compiles to bytecode instead of native machine code. Bytecode is the native instruction code for the Python virtual machine. To speed up subsequent runs of your program, Python stores the bytecode in .pyc files:

Python compiles a py file into a pyc file.

If you’re using Python 2, then you’ll find these files next to the .py files. For Python 3, you’ll find them in a __pycache__ directory.

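If you want to see this compilation step for yourself, the standard library's py_compile module will produce the .pyc file on demand (a quick illustration; my_module.py stands in for any module of your own):

import py_compile

# Compiles my_module.py and writes the bytecode under __pycache__/ (Python 3)
py_compile.compile("my_module.py")
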
The generated bytecode doesn’t run natively on your processor. Instead, it’s run by the Python virtual machine. This is similar to the Java virtual machine or the .NET Common Runtime Environment. The initial run of your code will result in a compilation step. Then, the bytecode will be interpreted to run on your specific hardware:

Python compiles a py file into a pyc file and then executes it.

As long as the program hasn’t been changed, each subsequent run will skip the compilation step and use the previously compiled bytecode to interpret:

Python executes a pyc file.

Interpreting code is going to be slower than running native code directly on the hardware. So why does Python work that way? Well, interpreting the code in a virtual machine means that only the virtual machine needs to be compiled for a specific operating system on a specific processor. All of the Python code it runs will run on any machine that has Python.

Note: CPython is written in C, so it can run on most systems that have a C compiler.

Another feature of this cross-platform support is that Python’s extensive standard library is written to work on all operating systems.

Using pathlib, for example, will manage path separators for you whether you’re on Windows, Mac, or Linux. The developers of those libraries spent a lot of time making it portable so you don’t need to worry about it in your Python program!

Before you move on, let’s start keeping track of a Python vs C++ comparison chart. As you cover new comparisons, they’ll be added in italics:

Feature                          | Python | C++
Faster Execution                 |        |  x
Cross-Platform Execution         |   x    |

Now that you’ve seen the differences in run time when you’re comparing Python vs C++, let’s dig into the specifics of the languages’ syntax.

Syntax Differences

Python and C++ share many syntactical similarities, but there are a few areas worth discussing:

  • Whitespace
  • Boolean expressions
  • Variables and pointers
  • Comprehensions

Let’s start with the most contentious one first: whitespace.

Whitespace

The first thing most developers notice when comparing Python vs C++ is the “whitespace issue.” Python uses leading whitespace to mark scope. This means that the body of an if block or other similar structure is indicated by the level of indentation. C++ uses curly braces ({}) to indicate the same idea.

While the Python lexer will accept any whitespace as long as you’re consistent, PEP8 (the official style guide for Python) specifies 4 spaces for each level of indentation. Most editors can be configured to do this automatically.

There has been an enormous amount of writing, shouting, and ranting about Python’s whitespace rules already, so let’s just jump past that issue and on to other matters.

Instead of relying on a lexical marker like ; to end each statement, Python uses the end of the line. If you need to extend a statement beyond a single line, then you can use the backslash (\) to indicate that. (Note that if you're inside a set of parentheses, then the continuation character is not needed.)
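
For example, here's a quick sketch of both continuation styles:

total = 1 + 2 + 3 + \
        4 + 5

# Inside parentheses (or brackets or braces), no continuation character is needed
total = (1 + 2 + 3 +
         4 + 5)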

There are people who are unhappy on both sides of the whitespace issue. Some Python developers love that you don’t have to type out braces and semicolons. Some C++ developers hate the reliance on formatting. Learning to be comfortable with both is your best bet.

Now that you’ve looked at the whitespace issue, let’s move on to one that’s a bit less contentious: Boolean expressions.

Boolean Expressions

The way you’ll use Boolean expressions changes slightly in Python vs C++. In C++, you can use numeric values to indicate true or false, in addition to the built-in values. Anything that evaluates to 0 is considered false, while every other numeric value is true.

Python has a similar concept but extends it to include other cases. The basics are quite similar. The Python documentation states that the following items evaluate to False:

  • Constants defined as false:
    • None
    • False
  • Zeros of any numeric type:
    • 0
    • 0.0
    • 0j
    • Decimal(0)
    • Fraction(0, 1)
  • Empty sequences and collections:
    • ''
    • ()
    • []
    • {}
    • set()
    • range(0)

All other items are True. This means that an empty list [] is False, while a list containing only zero [0] is still True.

Most objects will evaluate to True unless the object defines __bool__() to return False or __len__() to return 0. This allows you to extend your custom classes to act as Boolean expressions.
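
Here's a small sketch of that in action (the Basket class is just illustrative):

class Basket:
    def __init__(self, items=None):
        self.items = list(items or [])

    def __len__(self):
        return len(self.items)

print(bool(Basket()))          # False, because __len__() returns 0
print(bool(Basket(['egg'])))   # True, because __len__() returns 1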

Python has a few slight changes from C++ in the Boolean operators as well. For starters, if and while statements do not require the surrounding parentheses as they do in C++. Parentheses can aid in readability, however, so use your best judgment.

Most C++ Boolean operators have similar operators in Python:

C++ Operator | Python Operator
&&           | and
||           | or
!            | not
&            | &
|            | |

Most of the operators are similar to C++, but if you want to brush up you can read Operators and Expressions in Python.

Variables and Pointers

When you first start using Python after writing in C++, you might not give variables much thought. They seem to generally work as they do in C++. However, they’re not the same. Whereas in C++ you use variables to reference values, in Python you use names.

Note: For this section, where you’re looking at variables and names in Python vs C++, you’ll use variables for C++ and names for Python. Elsewhere, they will both be called variables.

First, let’s back up a bit and take a broader look at Python’s object model.

In Python, everything is an object. Numbers are held in objects. Modules are held in objects. Both the object of a class and the class itself are objects. Functions are also objects:

>>> a_list_object = list()
>>> a_list_object
[]
>>> a_class_object = list
>>> a_class_object
<class 'list'>
>>> def sayHi(name):
...     print(f'Hello, {name}')
...
>>> a_function_object = sayHi
>>> a_function_object
<function sayHi at 0x7faa326ac048>

Calling list() creates a new list object, which you assign to a_list_object. Using the name of the class list by itself places a label on the class object. You can place a new label on a function as well. This is a powerful tool and, like all powerful tools, it can be dangerous. (I’m looking at you, Mr. Chainsaw.)

Note: The code above is shown running in a REPL, which stands for “Read, Eval, Print Loop.” This interactive environment is used frequently to try out ideas in Python and other interpreted languages.

If you type python at a command prompt, then it will bring up a REPL where you can start typing in code and trying things out for yourself!

Moving back to the Python vs C++ discussion, note that this behavior is different from what you'll see in C++. Unlike Python, C++ has variables that are assigned to a memory location, and you must indicate how much memory that variable will use:

int an_int;
float a_big_array_of_floats[REALLY_BIG_NUMBER];

In Python, all objects are created in memory, and you apply labels to them. The labels themselves don’t have types, and they can be put on any type of object:

>>> my_flexible_name = 1
>>> my_flexible_name
1
>>> my_flexible_name = 'This is a string'
>>> my_flexible_name
'This is a string'
>>> my_flexible_name = [3, 'more info', 3.26]
>>> my_flexible_name
[3, 'more info', 3.26]
>>> my_flexible_name = print
>>> my_flexible_name
<built-in function print>

You can assign my_flexible_name to any type of object, and Python will just roll with it.

When you’re comparing Python vs C++, the difference in variables vs names can be a bit confusing, but it comes with some excellent benefits. One is that in Python you don’t have pointers, and you never need to think about heap vs stack issues. You’ll dive into memory management a bit later in this article.

Comprehensions

Python has a language feature called list comprehensions. While it’s possible to emulate list comprehensions in C++, it’s fairly tricky. In Python, they’re a basic tool that’s taught to beginning programmers.

One way of thinking about list comprehensions is that they’re like a super-charged initializer for lists, dicts, or sets. Given one iterable object, you can create a list, and filter or modify the original as you do so:

>>> [x**2 for x in range(5)]
[0, 1, 4, 9, 16]

This script starts with the iterable range(5) and creates a list that contains the square for each item in the iterable.

It’s possible to add conditions to the values in the first iterable:

>>> odd_squares = [x**2 for x in range(5) if x % 2]
>>> odd_squares
[1, 9]

The if x % 2 at the end of this comprehension limits the numbers used from range(5) to only the odd ones.
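
The same pattern extends to dict and set comprehensions, as this quick sketch shows:

>>> {x: x**2 for x in range(5) if x % 2}
{1: 1, 3: 9}
>>> {x % 3 for x in range(10)}
{0, 1, 2}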

At this point you might be having two thoughts:

  1. That’s a powerful syntax trick that will simplify some parts of my code.
  2. You can do the same thing in C++.

While it’s true that you can create a vector of the squares of the odd numbers in C++, doing so usually means a little more code:

std::vector<int> odd_squares;
for (int ii = 0; ii < 5; ++ii) {
    if (ii % 2) {
        odd_squares.push_back(ii * ii);
    }
}

For developers coming from C-style languages, list comprehensions are one of the first noticeable ways they can write more Pythonic code. Many developers start writing Python with C++ structure:

odd_squares = []
for ii in range(5):
    if (ii % 2):
        odd_squares.append(ii ** 2)

This is perfectly valid Python. It will likely run more slowly, however, and it’s not as clear and concise as the list comprehension. Learning to use list comprehensions will not only speed up your code, but it will also make your code more Pythonic and easier to read!

Note: When you’re reading about Python, you’ll frequently see the word Pythonic used to describe something. This is just a term the community uses to describe code that is clean, elegant and looks like it was written by a Python Jedi.

Python’s std::algorithms

C++ has a rich set of algorithms built into the standard library. Python has a similar set of built-in functions that cover the same ground.

The first and most powerful of these is the in operator, which provides a quite readable test to see if an item is included in a list, set, or dictionary:

>>> x = [1, 3, 6, 193]
>>> 6 in x
True
>>> 7 in x
False
>>> y = {'Jim': 'gray', 'Zoe': 'blond', 'David': 'brown'}
>>> 'Jim' in y
True
>>> 'Fred' in y
False
>>> 'gray' in y
False

Note that the in operator, when used on dictionaries, only tests for keys, not values. This is shown by the final test, 'gray' in y.
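
If you do want to test against a dictionary's values, you can say so explicitly. A quick sketch using the same dictionary:

>>> 'gray' in y.values()
True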

in can be combined with not for quite readable syntax:

if name not in y:
    print(f"{name} not found")

Next up in your parade of Python built-in operators is any. This is a boolean function that returns True if any element of the given iterable evaluates to True. This can seem a little silly until you remember your list comprehensions! Combining these two can produce powerful, clear syntax for many situations:

>>> my_big_list = [10, 23, 875]
>>> my_small_list = [1, 2, 8]
>>> any([x < 3 for x in my_big_list])
False
>>> any([x < 3 for x in my_small_list])
True

Finally, you have all, which is similar to any. This returns True only if—you guessed it—all of the elements in the iterable are True. Again, combining this with list comprehensions produces a powerful language feature:

>>> list_a = [1, 2, 9]
>>> list_b = [1, 3, 9]
>>> all([x % 2 for x in list_a])
False
>>> all([x % 2 for x in list_b])
True

any and all can cover much of the same ground where C++ developers would look to std::find or std::find_if.

Note: In the any and all examples above, you can remove the brackets ([]) without any loss of functionality. This makes use of generator expressions which, while quite handy, are beyond the scope of this article.
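
For example, the earlier checks could be written without brackets like this:

>>> any(x < 3 for x in my_small_list)
True
>>> all(x % 2 for x in list_b)
True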

Before moving on to variable typing, let’s update your Python vs C++ comparison chart:

Feature                          | Python | C++
Faster Execution                 |        |  x
Cross-Platform Execution         |   x    |
Single-Type Variables            |        |  x
Multiple-Type Variables          |   x    |
Comprehensions                   |   x    |
Rich Set of Built-In Algorithms  |   x    |  x

Okay, now you’re ready to look at variable and parameter typing. Let’s go!

Static vs Dynamic Typing

Another large topic when you’re comparing Python vs C++ is the use of data types. C++ is a statically typed language, while Python is dynamically typed. Let’s explore what that means.

Static Typing

C++ is statically typed, which means that each variable you use in your code must have a specific data type like int, char, float, and so forth. You can only assign values of the correct type to a variable, unless you jump through some hoops.

This has some advantages for both the developer and the compiler. The developer gains the advantage of knowing what the type of a particular variable is ahead of time, and therefore which operations are allowed. The compiler can use the type information to optimize the code, making it smaller, faster, or both.

This advance knowledge comes at a cost, however. The parameters passed into a function must match the type expected by the function, which can reduce the flexibility and potential usefulness of the code.

Duck Typing

Dynamic typing is frequently referred to as duck typing. It’s an odd name, and you’ll read more about that in just a minute! But first, let’s start with an example. This function takes a file object and reads the first ten lines:

def read_ten(file_like_object):
    for line_number in range(10):
        x = file_like_object.readline()
        print(f"{line_number} = {x.strip()}")

To use this function, you’ll create a file object and pass it in:

with open("types.py") as f:
    read_ten(f)

This shows how the basic design of the function works. While this function was described as “reading the first ten lines from a file object,” there is nothing in Python that requires file_like_object to be a file. As long as the object passed in supports .readline(), the object can be of any type:

class Duck():
    def readline(self):
        return "quack"

my_duck = Duck()
read_ten(my_duck)

Calling read_ten() with a Duck object produces:

0 = quack
1 = quack
2 = quack
3 = quack
4 = quack
5 = quack
6 = quack
7 = quack
8 = quack
9 = quack

This is the essence of duck typing. The saying goes, “If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.”

In other words, if the object has the needed methods, then it’s acceptable to pass it in, regardless of the object’s type. Duck or dynamic typing gives you an enormous amount of flexibility, as it allows any type to be used where it meets the required interfaces.

However, there is a problem here. What happens if you pass in an object that doesn’t meet the required interface? For example, what if you pass in a number to read_ten(), like this: read_ten(3)?

This results in an exception being thrown. Unless you catch the exception, your program will blow up with a traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "duck_test.py", line 4, in read_ten
    x = file_like_object.readline()
AttributeError: 'int' object has no attribute 'readline'

Dynamic typing can be quite a powerful tool, but as you can see, you must use caution when employing it.
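
One hedged way to fail earlier and with a clearer message is to check for the interface up front (read_ten_safely() is just an illustrative name):

def read_ten_safely(file_like_object):
    # Guard against objects that don't meet the required interface
    if not hasattr(file_like_object, 'readline'):
        raise TypeError('read_ten_safely() needs an object with a readline() method')
    for line_number in range(10):
        print(f"{line_number} = {file_like_object.readline().strip()}")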

Note: Python and C++ are both considered strongly typed languages. Although C++ has a stronger type system, the details of this are generally not significant to someone learning Python.

Let’s move on to a feature that benefits from Python’s dynamic typing: templates.

Templates

Python doesn’t have templates like C++, but it generally doesn’t need them. In Python, everything is a subclass of a single base type. This is what allows you to create duck typing functions like the ones above.

The templating system in C++ is quite powerful and can save you significant time and effort. However, it can also be a source of confusion and frustration, as compiler errors in templates can leave you baffled.

Being able to use duck typing instead of templates makes some things much easier. But this, too, can cause hard-to-detect issues. As in all complex decisions, there are trade-offs when you’re comparing Python vs C++.

Type Checking

There’s been a lot of interest and discussion in the Python community lately about static type checking in Python. Projects like mypy have raised the possibility of adding pre-runtime type checking to specific spots in the language. This can be quite useful in managing interfaces between portions of large packages or specific APIs.

It helps to address one of the downsides of duck typing. For developers using a function, it helps if they can fully understand what each parameter needs to be. This can be useful on large project teams where many developers need to communicate through APIs.
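
For example, here's a hedged sketch of adding optional type hints to the earlier read_ten() function, which a tool like mypy could then check before the code ever runs:

from typing import TextIO

def read_ten(file_like_object: TextIO) -> None:
    for line_number in range(10):
        x = file_like_object.readline()
        print(f"{line_number} = {x.strip()}")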

Once again, let’s take a look at your Python vs C++ comparison chart:

Feature                          | Python | C++
Faster Execution                 |        |  x
Cross-Platform Execution         |   x    |
Single-Type Variables            |        |  x
Multiple-Type Variables          |   x    |
Comprehensions                   |   x    |
Rich Set of Built-In Algorithms  |   x    |  x
Static Typing                    |        |  x
Dynamic Typing                   |   x    |

Now you’re ready to move on to differences in object-oriented programming.

Object-Oriented Programming

Like C++, Python supports an object-oriented programming model. Many of the same concepts you learned in C++ carry over into Python. You’ll still need to make decisions about inheritance, composition, and multiple inheritance.

Similarities

Inheritance between classes works similarly in Python vs C++. A new class can inherit methods and attributes from one or more base classes, just like you’ve seen in C++. Some of the details are a bit different, however.

Base classes do not have their constructor called automatically like they do in C++. This can be confusing when you’re switching languages.
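
A minimal sketch of what that means in practice (the class names are made up):

class Base:
    def __init__(self):
        self.ready = True

class Child(Base):
    def __init__(self):
        super().__init__()  # without this call, Base.__init__() never runs
        self.extra = 42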

Multiple inheritance also works in Python, and it has just as many quirks and strange rules as it does in C++.

Similarly, you can also use composition to build classes, where you have objects of one type hold other types. Considering everything is an object in Python, this means that classes can hold anything else in the language.

Differences

There are some differences, however, when you’re comparing Python vs C++. The first two are related.

The first difference is that Python has no concept of access modifiers for classes. Everything in a class object is public. The Python community has developed a convention that any member of a class starting with a single underscore is treated as private. This is in no way enforced by the language, but it seems to work out pretty well.

The fact that every class member and method is public in Python leads to the second difference: Python has far weaker encapsulation support than C++.

As mentioned, the single underscore convention makes this far less of an issue in practical codebases than it is in a theoretical sense. In general, any user that breaks this rule and depends on the internal workings of a class is asking for trouble.
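
Here's a small sketch of the convention (Account is just an example class):

class Account:
    def __init__(self, balance):
        self._balance = balance   # treated as private by convention only

    def deposit(self, amount):
        self._balance += amount

acct = Account(100)
acct._balance = -1                # nothing stops this, but you're on your own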

Operator Overloads vs Dunder Methods

In C++, you can add operator overloads. These allow you to define the behavior of specific syntactical operators (like ==) for certain data types. Usually, this is used to add more natural usage of your classes. For the == operator, you can define exactly what it means for two objects of a class to be equal.

One difference that takes some developers a long time to grasp is how to work around the lack of operator overloads in Python. It’s great that Python’s objects all work in any of the standard containers, but what if you want the == operator to do a deep comparison between two objects of your new class? In C++, you would create an operator==() in your class and do the comparison.

Python has a similar structure that’s used quite consistently across the language: dunder methods. Dunder methods get their name because they all start and end with a double underscore, or “d-under.”

Many of the built-in functions that operate on objects in Python are handled by calls to that object’s dunder methods. For your example above, you can add __eq__() to your class to do whatever fancy comparison you like:

class MyFancyComparisonClass():
    def __eq__(self, other):
        return True

This produces a class that compares the same way as any other instance of its class. Not particularly useful, but it demonstrates the point.

There are a large number of dunder methods used in Python, and the built-in functions make use of them extensively. For example, adding __lt__() will allow Python to compare the relative order of two of your objects. This means that not only will the < operator now work, but that >, <=, and >= will work as well.

Even better, if you have several objects of your new class in a list, then you can use sorted() on the list and they’ll be sorted using __lt__().
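
As a quick sketch, a class that defines __lt__() sorts without any extra work (the Candidate class here is hypothetical):

class Candidate:
    def __init__(self, name, votes):
        self.name = name
        self.votes = votes

    def __lt__(self, other):
        return self.votes < other.votes

people = [Candidate('Ann', 7), Candidate('Bo', 3)]
print([c.name for c in sorted(people)])   # ['Bo', 'Ann']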

Once again, let’s take a look at your Python vs C++ comparison chart:

Feature                          | Python | C++
Faster Execution                 |        |  x
Cross-Platform Execution         |   x    |
Single-Type Variables            |        |  x
Multiple-Type Variables          |   x    |
Comprehensions                   |   x    |
Rich Set of Built-In Algorithms  |   x    |  x
Static Typing                    |        |  x
Dynamic Typing                   |   x    |
Strict Encapsulation             |        |  x

Now that you’ve seen object-oriented coding across both languages, let’s look at how Python and C++ manage those objects in memory.

Memory Management

One of the biggest differences, when you’re comparing Python vs C++, is how they handle memory. As you saw in the section about variables in C++ and Python’s names, Python does not have pointers, nor does it easily let you manipulate memory directly. While there are times when you want to have that level of control, most of the time it’s not necessary.

Giving up direct control of memory locations brings a few benefits. You don’t need to worry about memory ownership, or making sure that memory is freed once (and only once) after it’s been allocated. You also never have to worry about whether or not an object was allocated on the stack or the heap, which tends to trip up beginning C++ developers.

Python manages all of these issues for you. To do this, every type in Python derives from Python's base object class. This allows the Python interpreter to implement reference counting as a means of keeping track of which objects are still in use and which can be freed.

This convenience comes at a price, of course. To free allocated memory objects for you, Python will occasionally need to run what is called a garbage collector, which finds unused memory objects and frees them.

Note: CPython has a complex memory management scheme, which means that freeing memory doesn’t necessarily mean the memory gets returned to the operating system.

Python uses two tools to free memory:

  1. The reference counting collector
  2. The generational collector

Let’s look at each of these individually.

Reference Counting Collector

The reference counting collector is fundamental to the standard Python interpreter and is always running. It works by keeping track of how many times a given block of memory (which is always a Python object) has a name attached to it while your program is running. Many rules describe when the reference count is incremented or decremented, but an example of one case might clarify:

1 >>> x = 'A long string'
2 >>> y = x
3 >>> del x
4 >>> del y

In the above example, line 1 creates a new object containing the string "A long string". It then places the name x on this object, increasing the object’s reference count to 1:

A Python object with reference count of one.

On line 2 it assigns y to name the same object, which will increase the reference count to 2:

A Python object with reference count of two.

When you call del with x in line 3, you’re removing one of the references to the object, dropping the count back to 1:

Two Python objects, each with reference count of one.

Finally, when you remove y, the final reference to the object, its reference count drops to zero and it can be freed by the reference counting garbage collector. It may or may not be freed immediately at this point, but generally, that shouldn’t matter to the developer:

The Python None object with reference count of two and another Python object with reference count of zero.

While this will take care of finding and freeing many of the objects that need to be freed, there are a few situations it will not catch. For that, you need the generational garbage collector.

Generational Garbage Collector

One of the big holes in the reference counting scheme is that your program can build a cycle of references, where object A has a reference to object B, which has a reference back to object A. It’s entirely possible to hit this situation and have nothing in your code referring to either object. In this case, neither of the objects will ever hit a reference count of 0.

The generational garbage collector involves a complex algorithm that is beyond the scope of this article, but it will find some of these orphaned reference cycles and free them for you. It runs on an occasional basis, controlled by settings described in the documentation. One of those settings even lets you disable this garbage collector entirely.
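
A small sketch of the kind of cycle it cleans up (Node is a made-up class):

import gc

class Node:
    pass

a = Node()
b = Node()
a.partner = b
b.partner = a     # the two objects now refer to each other
del a
del b             # reference counts never reach zero on their own
gc.collect()      # the generational collector finds and frees the cycle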

When You Don’t Want Garbage Collection

When you’re comparing Python vs C++, as when you’re comparing any two tools, each advantage comes with a trade-off. Python doesn’t require explicit memory management, but occasionally it will spend a longer amount of time than expected on garbage collection. The inverse is true for C++: your program will have consistent response times, but you’ll need to expend more effort in managing memory.

In many programs the occasional garbage collection hit is unimportant. If you’re writing a script that only runs for 10 seconds, then you’re unlikely to notice the difference. Some situations, however, require consistent response times. Real-time systems are a great example, where responding to a piece of hardware in a fixed amount of time can be essential to the proper operation of your system.

Systems with hard real-time requirements are some of the systems for which Python is a poor language choice. Having a tightly controlled system where you’re certain of the timing is a good use of C++. These are the types of issues to consider when you’re deciding on the language for a project.

Time to update your Python vs C++ chart:

Feature                          | Python | C++
Faster Execution                 |        |  x
Cross-Platform Execution         |   x    |
Single-Type Variables            |        |  x
Multiple-Type Variables          |   x    |
Comprehensions                   |   x    |
Rich Set of Built-In Algorithms  |   x    |  x
Static Typing                    |        |  x
Dynamic Typing                   |   x    |
Strict Encapsulation             |        |  x
Direct Memory Control            |        |  x
Garbage Collection               |   x    |

Threading, Multiprocessing, and Async IO

The concurrency models in C++ and Python are similar, but they have different results and benefits. Both languages have support for threading, multiprocessing, and Async IO operations. Let’s look at each of these.

Threading

While both C++ and Python have threading built into the language, the results can be markedly different, depending on the problem you’re solving. Frequently, threading is used to address performance problems. In C++, threading can provide a general speed-up for both computationally bound and I/O bound problems, as threads can take full advantage of the cores on a multiprocessor system.

Python, on the other hand, has made a design trade-off to use the Global Interpreter Lock, or the GIL, to simplify its threading implementation. There are many benefits to the GIL, but the drawback is that only one thread will be running at a single time, even if there are multiple cores.

If your problem is I/O bound, like fetching several web pages at once, then this limitation will not bother you in the least. You’ll appreciate Python’s easier threading model and built-in methods for inter-thread communications. If your problem is CPU-bound, however, then the GIL will restrict your performance to that of a single processor. Fortunately, Python’s multiprocessing library has a similar interface to its threading library.

Multiprocessing

Multiprocessing support in Python is built into the standard library. It has a clean interface that allows you to spin up multiple processes and share information between them. You can create a pool of processes and spread work across them using several techniques.

While Python still uses similar OS primitives to create the new processes, much of the low-level complication is hidden from the developer.
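
A minimal sketch using the standard library's multiprocessing.Pool (the worker function is only illustrative):

from multiprocessing import Pool

def crunch(n):
    # CPU-bound work: each worker process has its own interpreter and GIL
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        totals = pool.map(crunch, [1_000_000] * 4)
    print(totals)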

C++ relies on fork() to provide multiprocessing support. While this gives you direct access to all of the controls and issues of spawning multiple processes, it’s also much more complex.

Async IO

While both Python and C++ support Async IO routines, they’re handled differently. In C++, the std::async methods are likely to use threading to achieve the Async IO nature of their operations. In Python, Async IO code will only run on a single thread.

There are trade-offs here as well. Using separate threads allows the C++ Async IO code to perform faster on computationally bound problems. The Python tasks used in its Async IO implementation are more lightweight, so it’s faster to spin up a large number of them to handle I/O bound issues.
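
A minimal sketch of that single-threaded style (this assumes Python 3.7+ for asyncio.run()):

import asyncio

async def greet(name, delay):
    await asyncio.sleep(delay)
    print(f'Hello, {name}')

async def main():
    # both tasks wait concurrently, so this finishes in about one second
    await asyncio.gather(greet('Ada', 1), greet('Grace', 1))

asyncio.run(main())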

Your Python vs C++ comparison chart remains unchanged for this section. Both languages support a full range of concurrency options, with varying trade-offs between speed and convenience.

Miscellaneous Issues

If you’re comparing Python vs C++ and looking at adding Python to your toolbelt, then there are a few other things to consider. While your current editor or IDE will certainly work for Python, you might want to add certain extensions or language packs. It’s also worth giving PyCharm a look, as it’s Python-specific.

Several C++ projects have Python bindings. Things like Qt, WxWidgets, and many messaging APIs have multiple-language bindings.

If you want to embed Python in C++, then you can use the Python/C API.

Finally, there are several methods for using your C++ skills to extend Python and add functionality, or to call your existing C++ libraries from within your Python code. Tools like CTypes, Cython, CFFI, Boost.Python and Swig can help you combine these languages and use each for what it’s best at.

Summary: Python vs C++

You’ve spent some time reading and thinking about the differences between Python vs C++. While Python has easier syntax and fewer sharp edges, it’s not a perfect fit for all problems. You’ve looked at the syntax, memory management, processing, and several other aspects of these two languages.

Let’s take a final look at your Python vs C++ comparison chart:

Feature                          | Python | C++
Faster Execution                 |        |  x
Cross-Platform Execution         |   x    |
Single-Type Variables            |        |  x
Multiple-Type Variables          |   x    |
Comprehensions                   |   x    |
Rich Set of Built-In Algorithms  |   x    |  x
Static Typing                    |        |  x
Dynamic Typing                   |   x    |
Strict Encapsulation             |        |  x
Direct Memory Control            |        |  x
Garbage Collection               |   x    |

If you’re comparing Python vs C++, then you can see from your chart that this is not a case where one is better than the other. Each of them is a tool that’s well crafted for various use cases. Just like you don’t use a hammer for driving screws, using the right language for the job will make your life easier!

Conclusion

Congrats! You’ve now seen some of the strengths and weaknesses of both Python and C++. You’ve learned some of the features of each language and how they are similar.

You’ve seen that C++ is great when you want:

  • Fast execution speed (potentially at the cost of development speed)
  • Complete control of memory

Conversely, Python is great when you want:

  • Fast development speed (potentially at the cost of execution speed)
  • Managed memory

You’re now ready to make a wise language choice when it comes to your next project!



PyCharm: PyCharm 2019.2.2


PyCharm 2019.2.2 is now available. This version solves regression issues and improves Jupyter Notebook configuration experience.

New in this Version

  • Some code insight fixes were implemented for Python 3.8:
    • The continue statement is now allowed inside finally clauses.
    • Support for unicode characters in the re module was added.
  • An error on the Python Console that was not showing documentation for functions was resolved.
  • Some issues were solved for IPython that were causing the debugger not to work properly.
  • Regression issues with the debugger that caused breakpoints to be ignored or exceptions to be thrown, and the data viewer not to show the proper information, were solved.
  • A problem that caused PyCharm to stall when a Docker server was configured as remote python interpreter was fixed.
  • An issue that prevented the same remote interpreter from being used from two different machines was solved as well.
  • For Jupyter Notebook:
    • Default kernel specification selection is now based on the Python version for the module where a new notebook is created. If the kernel specification is missing from the metadata, a proper error message will be shown.
    • The user-selected kernel was mistakenly reset when a notebook file was reopened; that was fixed.
    • An issue that caused PyCharm to stall when switching from an unavailable Jupyter server to another one was fixed.

Further Improvements

  • A new "compare with" action shows differences between database tables.
  • Enhanced full text search for databases now shows alphabetically ordered results.
  • Several platform issues were solved as well and much more, check out our release notes for more details.

Getting the New Version

You can update PyCharm by choosing Help | Check for Updates (or PyCharm | Check for Updates on macOS) in the IDE. PyCharm will be able to patch itself to the new version, so there should no longer be a need to run the full installer.

If you’re on Ubuntu 16.04 or later, or any other Linux distribution that supports snap, you should not need to upgrade manually, you’ll automatically receive the new version.


Catalin George Festila: Python 3.7.4 : Using the theano package.

If you want to test theano then you need to see this webpage.

[root@desk mythcat]# dnf search theano
======================== Name & Summary Matched: theano ========================
python-theano-doc.noarch : Theano documentation
============================= Name Matched: theano =============================
python3-theano.noarch : Mathematical expressions involving multidimensional

NumFOCUS: Now Hiring: Events and Marketing Intern

Python Bytes: #147 Mocking out AWS APIs

Red Hat Developers: Develop with Flask and Python 3 in a container on Red Hat Enterprise Linux


In my previous article, Run Red Hat Enterprise Linux 8 in a container on RHEL 7, I showed how to start developing with the latest versions of languages, databases, and web servers available with Red Hat Enterprise Linux 8 even if you are still running RHEL 7. In this article, I’ll build on that base to show how to get started with the Flask microframework using the current RHEL 8 application stream version of Python 3.

From my perspective, using Red Hat Enterprise Linux 8 application streams in containers is preferable to using software collections on RHEL 7. While you need to get comfortable with containers, all of the software installs in the locations you’d expect. There is no need to use scl commands to manage the selected software versions. Instead, each container gets an isolated user space. You don’t have to worry about conflicting versions.

In this article, you'll create a Red Hat Enterprise Linux 8 Flask container with Buildah and run it with Podman. The code will be stored on your local machine and mapped into the container when it runs. You'll be able to edit the code on your local machine as you would any other application. Since it is mapped via a volume mount, the changes you make to the code will be immediately visible from the container, which is convenient for dynamic languages that don't need to be compiled. While this approach isn't the way to do things for production, you get the same development inner loop as you'd have when developing locally without containers. The article also shows how to use Buildah to build a production image with your completed application.

Additionally, you'll set up the Red Hat Enterprise Linux 8 MariaDB application stream in a container that is managed by systemd. You can use systemctl to start and stop the container just as you would for a non-container installation.

Install Podman and Buildah on Red Hat Enterprise Linux 7

First, we need to install Podman, which is in the extras repo on Red Hat Enterprise Linux 7. The extras repo isn’t enabled by default. Developers should also enable the rhscl (Red Hat Software Collections), devtools, and optional repos:

$ sudo subscription-manager repos --enable rhel-7-server-extras-rpms \
    --enable rhel-7-server-optional-rpms \
    --enable rhel-server-rhscl-7-rpms \
    --enable rhel-7-server-devtools-rpms

Now install Podman and Buildah. If sudo isn’t set up on your system, see How to enable sudo on Red Hat Enterprise Linux.

$ sudo yum install podman buildah

Later, we’ll run containers with systemd. If SELinux is enabled on your system (it is by default), you must turn on the container_manage_cgroup boolean to run containers with systemd:

$ sudo setsebool -P container_manage_cgroup on

For more information, see the containers running systemd solution.

Note: The Red Hat ID created when you joined Red Hat Developer gives you access to content on the Red Hat Customer Portal.

Set up a Flask example app

We need Flask code to run. Let’s use Flaskr, the sample app in the Flask distribution’s examples/tutorial directory. Download Flask into a working directory on the host machine and extract the tutorial app:

$ sudo mkdir /opt/src
$ sudo chown $USER:$USER /opt/src
$ cd /opt/src
$ mkdir flask-app
$ curl -L https://github.com/pallets/flask/archive/1.1.1.tar.gz | tar xvzf - 
$ cp -pr flask-1.1.1/examples/tutorial flask-app

We’ve now got an example Flask app at /opt/src/flask-app.

Run Python 3.6 and Flask in a Red Hat Enterprise Linux 8 container (manually)

Now we need Python 3.6 and Flask. We’ll manually set up a container with the dependencies and then run the app to see how it’s done. Let’s start with the Red Hat Enterprise Linux 8 Universal Base Image (UBI). If you’re unfamiliar with the RHEL UBIs, see the section “Red Hat Universal Base Images.”

Red Hat has a new container registry which uses authentication: registry.redhat.io. A Red Hat account isn’t required to use UBI images, but other Red Hat images that aren’t part of UBI can only be obtained through registry.redhat.io. The Red Hat ID created when you joined Red Hat Developer gives you access to the Red Hat Container Registry, so for simplicity, I use only registry.redhat.io in this example.

If you aren’t logged in when you try to pull an image, you’ll get a verbose error message:

...unable to retrieve auth token: invalid username/password.

Log in with your Red Hat username and password:

$ sudo podman login registry.redhat.io

Note: Podman was designed to run without root. However, the support for this feature isn’t there with Red Hat Enterprise Linux 7.6. For more information, see Scott McCarty’s, A preview of running containers without root in RHEL 7.6.

Now run the container, making our source directory /opt/src available inside the container and exposing port 5000 so you can connect to the Flask app with a browser on the host system:

$ sudo podman run -v /opt/src:/opt/src:Z -it -p 5000:5000 registry.redhat.io/ubi8/ubi /bin/bash

The previous command also invoked an interactive shell for the Red Hat Enterprise Linux 8 based UBI container. From inside the container, see what application streams are available with RHEL 8:

# yum module list

You might notice an extra group of application streams labeled Universal Base Image. See the UBI section for more information about Red Hat Universal Base Images.

Next, install Python 3.6:

# yum -y module install python36

Python 3.6 is now installed in our container and is in our path as python3, not python. If you want to know why, see Petr Viktorin's article, Python in RHEL 8.

Next, use pip to install Flask:

# pip3 install flask

You’ll get a warning about running pip as root. Running pip as root on a real system is generally a bad idea. However, we’re running in a dedicated container which is isolated and disposable, so we can do pretty much whatever we want with files in /usr.

Let’s check where the Flask command-line interface (CLI) was installed:

# which flask

Pip installed it into /usr/local/bin.

Now let’s run the example app inside of the container:

# cd /opt/src/flask-app
# export FLASK_APP=flaskr
# export FLASK_ENV=development
# flask init-db
# flask run --host=0.0.0.0

Using a browser on the host system, go to http://localhost:5000/ and view the resulting page:

Now, you’ve got a container configured by hand that runs Flask applications using Red Hat Enterprise Linux 8’s Python 3.6 application stream on your RHEL 7 system. You could treat this container like a “pet,” and use podman restart -l and podman attach -l when you want to run it again—as long as you don’t delete it. We didn’t name the container, but the -l conveniently selects the last running container. Alternatively, you’d need to use podman ps -a to get the ID, or randomly generated name to pass to podman restart and podman attach.

When you restart the container, it is similar to rebooting a system. The installed files are there, but any of the other runtime state-like environment variable settings won’t persist. The life cycle for containers you’ve seen in most tutorials is “run then delete” since containers are designed to be ephemeral. However, knowing how to create and restart containers can be handy when you need to experiment.

Create a Flask container image with Buildah

To make things easier, we'll create a container image that has Flask installed and starts the Flask app anytime the container is run. The container won't have a copy of the app; we'll still map the app into the container from the host system. The code will be stored on your local machine where you can edit it as you would any other application source. Because it is mapped via a volume mount, the changes you make to the code will be immediately visible inside the container.

When creating images with Buildah, you can use Dockerfiles or Buildah command lines. For this article, we’ll use the Dockerfile approach because you’ve probably seen it before in other tutorials.

Because we are working with files that are shared between your host system and the container, we'll run the container using the same numeric user ID (UID) as your regular account. While inside the container, any files created in the source directory are owned by your user ID on the host system. Find out your UID with the id command:

$ id

Make a note of the number after UID= and GID= at the start of the line. On my system, my UID and GID are both 1000. In the Dockerfile and other examples here, change the USER line to match your UID:GID.

In /opt/src/flask-app, create Dockerfile with the following contents:

FROM registry.redhat.io/ubi8/python-36

RUN pip3 install flask

# set default flask app and environment
ENV FLASK_APP flaskr
ENV FLASK_ENV development

# This is primarily a reminder that we need access to port 5000
EXPOSE 5000

# Change this to UID that matches your username on the host
# Note: RUN commands before this line will execute as root in the container
# RUN commands after will execute under this non-privileged UID
USER 1000

# Default cmd when container is started
# Create the database if it doesn't exist, then run the app
# Use --host to make Flask listen on all networks inside the container
CMD [ -f ../var/flaskr-instance/flaskr.sqlite ] || flask init-db ; flask run --host=0.0.0.0

A note on the Dockerfile: Instead of installing Python 3.6, I used a UBI image from Red Hat that already had Python 3.6 on top of the UBI 8 image. The command that runs when the container starts will create the database if it doesn’t exist, and then run the Flask app.

Next, build the Flask container (don’t forget the trailing .):

$ sudo buildah bud -t myorg/myflaskapp .

Now we can run the Flask container containing our app:

$ sudo podman run --rm -it -p 5000:5000 -v /opt/src/flask-app:/opt/app-root/src:Z myorg/myflaskapp

The Flaskr app should now be running, which you can verify by using a browser on the host system and going to http://localhost:5000/ to view the resulting page.

You can now edit the code in /opt/src/flask-app like you would any regular source code. When you need to restart Flask, Ctrl+C the container. Note the --rm in the run command, which automatically removes the container when it exits.

To start the container again, you will need to use the above podman run command again, which creates a fresh new container, plus a new database with nothing in it. For many situations, this fresh start is desirable.

Persist the SQLite database between containers

The Flaskr example uses a SQLite database, which is stored inside the container. Containers are intended to be ephemeral, so any changes made inside the container will be lost when the container is deleted.

There are several ways you can keep the database (or other files) from containers across runs. As mentioned above, you could try to keep the container around and restart it, instead of recreating it with run every time. While that practice can be handy for experimenting and debugging, this isn't a good way to accomplish persistence. Now is a good time to mention that if you do have changed files you'd like to get out of a container that has exited but hasn't been removed, Podman and Buildah have a handy mount command that mounts the container on the host system so you can access the files through the filesystem.

Note: If you are confused about the difference between a container and a container image, see Scott McCarty’s article: A Practical Introduction to Container Terminology.

Instead of trying to keep the container around, a much cleaner solution is to arrange for the database (or other files you’d like to persist) to be stored in the host’s filesystem. You can do this by adding another volume mount with -v to the run command. Here’s the full command, which stores the database with the source code:

$ sudo podman run --rm -it -p 5000:5000 -v /opt/src/flask-app:/opt/app-root/src:Z \
    -v /opt/src/flask-app/instance:/opt/app-root/var/flaskr-instance:Z myorg/myflaskapp

Run MariaDB in a container

Another way to deal with persistence is to run a database server in another container. In a previous article, Run Red Hat Enterprise Linux 8 in a container on RHEL 7, I showed how to run MariaDB using the current Red Hat Enterprise Linux 8 application stream on a RHEL 7 system. The MariaDB container is managed by systemd, so you can use systemctl commands just like you would for a non-containerized version.

For the sake of brevity, I won’t replicate the instructions to get MariaDB running in this article, just follow the previous article’s MariaDB section to get that database running.

The one thing you’ll need to know is how to make your Flask container connect to the database container. By default, containers are designed to run with an isolated virtual network. Steps need to be taken to network containers together. I think the easiest approach for the scenario in this article—where you just want to run a few containers—is to arrange for the containers to share the host’s network.

To use the host’s network, add --net host to the run command for both your Flask and database containers. If you are using the host’s network, you won’t need to select which ports to expose. So, the full run command for the Flask container is:

$ sudo podman run --rm -it --net host -v /opt/src/flask-app:/opt/app-root/src:Z \
    -v /opt/src/flask-app/instance:/opt/app-root/var/flaskr-instance:Z myorg/myflaskapp

While using the host’s network is quick and easy for development, you’d run into port conflicts if you had a number of MariaDB containers that all wanted to use port 3306. One way to improve this setup is to use Podman’s pod capabilities to put the app and database containers in the same pod, where they share namespaces. See Brent Baude’s article, Podman: Managing pods and containers in a local container runtime.

Use Buildah to create an image with your Flask app

After you’ve developed your app, you can use Buildah to create a distributable container image with your Flask app. We’ll use Buildah command lines instead of a Dockerfile. This approach is much more flexible for complex builds and automation: You can use shell scripts or whatever other tools you use for your build environment.

In /opt/src/flask-app, create app-image-build.sh with the following contents:

#!/bin/sh
# Build our Flask app and all the dependencies into a container image
# Note: OOTB on RHEL 7.6 this needs to be run as root.

MYIMAGE=myorg/myflaskapp
FLASK_APP=flaskr
FLASK_ENV=development
USERID=1000

IMAGEID=$(buildah from ubi8/python-36)
buildah run $IMAGEID pip3 install flask

buildah config --env FLASK_APP=$FLASK_APP --env FLASK_ENV=$FLASK_ENV $IMAGEID

# any build steps above this line run as root inside the container
# any steps after run as $USERID
buildah config --user $USERID:$USERID $IMAGEID

buildah copy $IMAGEID . /opt/app-root/src
buildah config --cmd '/bin/sh run-app.sh' $IMAGEID

buildah commit $IMAGEID $MYIMAGE

This image calls a start script to launch our application. Next, create run-app.sh in the same directory, with the following contents:

#!/bin/sh

APP_DB_PATH=${APP_DB_PATH:-../var/instance/flaskr.sqlite}

if [ ! -f ${APP_DB_PATH} ]; then
    echo Creating database
    flask init-db
fi

echo Running app $FLASK_APP
flask run --host=0.0.0.0

Now, build the image:

$ sudo app-image-build.sh

Run and test the new image:

$ sudo podman run --rm -it --net host -v /opt/src/flask-app/instance:/opt/app-root/var/flaskr-instance:Z myorg/myflaskapp

When you are ready, you can distribute your application by pushing it to a container registry like Red Hat’s Quay.io.

Next steps

By now, you should see that it is easy to get the software components you need running in containers so you can focus on development. It shouldn’t feel very different from developing without containers.

The Flask container you built isn't tied to a specific app. You could reuse that container for other Flask apps by overriding the environment variables: add -e FLASK_APP=mynewapp to the podman run command.

You could also build on the Dockerfile above to install more Python modules for your app into your container image, or customize the way the app starts.

Check out what other UBI 8 images are available in the Red Hat Container Catalog. If the language, runtime, or server aren’t available as a UBI image, you can build your own beginning with the ubi8 base image. Then, you can add the application streams and other rpms you need with yum commands in a Dockerfile, or with buildah run.

Red Hat Universal Base Images

I’ve mentioned Universal Base Images (UBIs) a number of times in this article without explaining them. Red Hat provides these UBIs to use as a base for your container images. From Mike Guerette’s article, Red Hat Universal Base Image: How it works in 3 minutes or less:

“Red Hat Universal Base Images (UBI) are OCI-compliant container base operating system images with complementary runtime languages and packages that are freely redistributable. Like previous RHEL base images, they are built from portions of Red Hat Enterprise Linux. UBI images can be obtained from the Red Hat Container Catalog and be built and deployed anywhere.

“And, you don’t need to be a Red Hat customer to use or redistribute them. Really.”

With the release of Red Hat Enterprise Linux 8 in May, Red Hat announced that all RHEL 8 base images would be available under the new Universal Base Image End User License Agreement (EULA). This fact means that you can build and redistribute container images that use Red Hat’s UBI images as your base, instead of switching to images based on other distributions, like Alpine. In other words, you won’t have to switch from using yum to using apt-get when building containers.

There are three base images for Red Hat Enterprise Linux 8. The standard one is called ubi, or more precisely, ubi8/ubi. This is the image used above, and the one you will probably use most often. The other two are a minimal image, which contains little supporting software for when image size is a high priority, and a multi-service image, which allows you to run multiple processes inside the container, managed by systemd.

Note: There are also UBI images for Red Hat Enterprise Linux 7 under ubi7 if you want to build and distribute containers running on a RHEL 7 image. For this article, we’ll only use the ubi8 images.

If you are just starting out with containers, you don’t need to delve into UBI details right now. Just use the ubi8 images to build containers based off Red Hat Enterprise Linux 8. However, you will want to understand UBI details when you start distributing container images or have questions about support. For more information, see the references at the end of this article.

More information

Related articles:

Cheat sheets:

Podman and Buildah:

UBI: 


The post Develop with Flask and Python 3 in a container on Red Hat Enterprise Linux appeared first on Red Hat Developer.

Matt Layman: Python Testing 201 with pytest


For Python Frederick’s September presentation, I presented on Python testing. In the presentation, I explained more of the features of pytest that went beyond the basics that we explored in March.

The recording from the talk is available on YouTube. Check it out!

Codementor: Node.js VS Python: Which is Better?

Node.js vs Python... Node.js and Python are two of the most widely-used programming languages. It is tough to choose which is better among them. Let's try.

Wingware Blog: Presentation Mode in Wing 7


Presentation Mode, added in Wing 7, temporarily applies a selected magnification to the entire user interface, so the screen can be read more easily during meetings or talks.

To activate this mode, check Presentation Mode in the high-level configuration menu, accessed with the menu icon at the top right of Wing's window:

/images/blog/presentation-mode/menu.png

You will be presented with a confirmation dialog that also provides a link to the preference that controls the level of magnification:

/images/blog/presentation-mode/dialog.png

To apply the mode change, Wing restarts and reloads the current project in the same state as you left it, but with the contents of the window magnified.

Before

/images/blog/presentation-mode/window-1.0.png

After

/images/blog/presentation-mode/window-1.5.png

To disable Presentation Mode, uncheck the high-level configuration menu item again.



That's it for now! We'll be back soon with more Wing Tips for Wing Python IDE.


PyCharm: 2019.3 EAP 1


The first Early Access Program (EAP) for PyCharm 2019.3 is now available to be downloaded from our website!

New in PyCharm

Use macros as parameters to run scripts


We added the possibility to customize your script execution with macros, which are now available to be set as parameters in the Python run configuration. Use macros such as $ClipboardContent$ to get the content of the clipboard, $FilePath$ to retrieve the path of the open file, $ModuleSdkPath$ for the project interpreter path, or $Prompt$ for a string input dialog shown when the configuration runs.

To set them, go to the Run/Debug Configurations dialog and select the script you want to configure from the list of Python run/debug configurations. Then, click + in the Parameters field and select a macro from the list of available macros.

Scale and configure preview area for Jupyter notebook file


Scale the preview area for a Jupyter notebook file according to your needs. Either zoom in and out directly (using Ctrl (Windows) or Cmd (macOS) + mouse wheel) on the preview area or configure the default font size by opening Settings/Preferences ⌃⌥S, select Build, Execution, Deployment | Jupyter dialog, deselect Use Font Editor Size, and choose the required size in the Font Size field.

Markdown support for PyCharm Community Edition

The Markdown editor is now available for PyCharm Community Edition. Now you can expect to manage headings, apply formatting to text, use completion capabilities to add links to other project documents or images, insert code blocks for various programming languages, and visualize DOT or PlantUML diagrams all from the Community Edition.

Further improvements

  • An issue that wasn’t allowing proper autocomplete import references with same namespaces was solved.
  • Fixes were made for unexpected warnings on assignment expressions for Python 3.8.
  • Auto-imports and suggested imports now show options in an organized order.
  • We fixed an issue causing Python interpreter not to be properly set if downloaded from the Windows Store.
  • For more details on what’s new in this version, see the release notes

Interested?

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.

If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP, and stay up to date. You can find the installation instructions on our website.

Talk Python to Me: #229 Building advanced Pythonic interviews with docassemble

On this episode, we dive into Python for lawyers and a special tool for conducting legal interviews. Imagine you have to collect details for 20,000 participants in a class-action lawsuit. docassemble, a sweet Python web app, can do it for you with ease.

Python Does What?!: Welcome to the float zone...

Consider a REPL with two tuples, a and b.

>>> type(a), type(b)
(<type 'tuple'>, <type 'tuple'>)
>>> a == b
True


So far, so good.  But let's dig deeper...

>>> a[0] == b[0]
False


The tuples are equal, but their contents are not.



>>> a is b
True





In fact, there was only ever one tuple.
What is this madness?

>>> a
(nan,)


Welcome to the float zone.

Many parts of Python assume that a is b implies a == b, but floats break this assumption.  They also break the assumption that hash(a) == hash(b) implies a == b.
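
For example (a quick sketch), containment checks compare by identity before equality, so a nan can still be "found" in its own list:

>>> n = float('nan')
>>> n == n
False
>>> n in [n]
True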

>>> hash(float('nan')) == hash(float('nan'))
True


Dicts handle this pretty elegantly:

>>> n = float('nan')
>>> {n: 1}[n]
1

The lookup still works because dict lookups check identity before testing equality. But two separately constructed NaN objects are neither identical nor equal, so they can coexist as keys:
>>> a = {float('nan'): 1, float('nan'): 2}
>>> a
{nan: 1, nan: 2}

PyCon: Call for Proposals for PyCon 2020 is open!



The time is upon us again! PyCon 2020’s Call for Proposals has officially opened for talks, tutorials, posters, education summit, and charlas. PyCon is made by you, so we need you to share what you’re working on, how you’re working on it, what you’ve learned, what you’re learning, and so much more.

Please make note of important deadlines for submissions:

  • Tutorial proposals are due November 22, 2019.
  • Talk, Charlas, Poster, and Education Summit proposals are due December 20, 2019.
We need beginner, intermediate, and advanced proposals on all sorts of topics, as well as beginner, intermediate, and advanced speakers to present talks. You don't need to be a 20-year veteran who has spoken at dozens of conferences. On all fronts, we need all types of people. That's what this community is made up of, so that's what this conference's schedule should be made from.

Who can help you with your proposal

Outside of our program committee, a great source of assistance with proposals comes from your local community. User groups around the world have had sessions where people bring ideas to the table and walk away with a full-fledged proposal. These sessions are especially helpful if you’re new to the process. If you’re experienced with the process, it’s a great way for you to reach out and help people level up. We’ll be sure to share these events as we find out about them, and be sure to tell us your plans if you want to host a proposal event of your own!

We’re again going to provide a mechanism to connect willing mentors and those seeking assistance for talk proposals through our site, helping with the process of developing the proposal, slides, and presentation itself. We will be trying something new this year and are happy to help with any of the following:

  • Exploring and brainstorming your interests to help you identify hidden topics that would make great talks.
  • Connecting you with experienced speakers to help develop your proposal and talk.
  • Reviewing your outline, slide deck, or notes.
  • Anything else that’d help you be at ease and excited about bringing your ideas to our audience!
You’ll find checkboxes for both giving and receiving mentorship on the Speakers Profile page.

Where should you submit your proposal? 

After you have created an account at https://us.pycon.org/2020/accounts/signup/, you’ll want to create a speaker profile in your dashboard. While there, enter some details about yourself and check the various boxes about giving or receiving mentorship, as well as grant needs. Like proposals, you can come back and edit this later.

After that’s done, clicking on the “Submit a Proposal” button in your dashboard will give you the choice of proposal type, and from there you enter your proposal. We’ve provided some guidelines on the types of proposals you can submit, so please be sure to check out the following pages for more information: 

We look forward to seeing all of your proposals in the coming months!


Robin Wilson: I am now a freelancer in Remote Sensing, GIS, Data Science & Python


I've been doing a bit of freelancing 'on the side' for a while - but now I've made it official: I am available for freelance work. Please look at my new website or contact me if you're interested in what I can do for you, or carry on reading for more details.

Since I stopped working as an academic, and took time out to focus on my work and look after my new baby, I've been trying to find something which allows me to fit my work nicely around the rest of my life. I've done bits of short part-time work contracts, and various bits of freelance work - and I've now decided that freelancing is the way forward.

I've created a new freelance website which explains what I do and the experience I have - but to summarise here, my areas of focus are:

  • Remote Sensing - I am an expert at processing satellite and aerial imagery, and have processed time-series of thousands of images for a range of clients. I can help you produce useful information from raw satellite data, and am particularly experienced at atmospheric remote sensing and atmospheric correction.
  • GIS - I can process geographic data from a huge range of sources into a coherent data library, perform analyses and produce outputs in the form of static maps, webmaps and reports.
  • Data science - I have experience processing terabytes of data to produce insights which were used directly by the United Nations, and I can apply the same skills to processing your data: whether it is a single questionnaire or a huge automatically-generated dataset. I am particularly experienced at making research reproducible and self-documenting.
  • Python - I am an experienced Python programmer, and maintain a number of open-source modules (such as Py6S). I produce well-written, Pythonic code with high-quality tests and documentation.

The testimonials on my website show how much previous clients have valued the work I've done for them.

I've heard from various people that they were rather put off by the nature of the auction that I ran for a day's work from me - so if you were interested in working with me but wanted a standard sort of contract, and more than a day's work, then please get in touch and we can discuss how we could work together.

(I'm aware that the last few posts on the blog have been focused on the auction for work and this announcement of freelance work. Don't worry - I've got some more posts lined up which are more along my usual lines. Stay tuned for posts on Leaflet webmaps and machine learning on large raster stacks.)
