Channel: Planet Python

Mike Driscoll: Python 101 2nd Edition Kickstarter Preview


I have been kicking around the idea of updating my first book, Python 101, for over a year. After doing a lot of planning and outlining, I am ready to announce that I have started work on the book.

Python 101 2nd Ed Kickstarter

The new Python 101, 2nd Edition, will be a completely new book rather than just an update, as a lot of publishers like to do. I feel like updating a chapter or two is a disservice to my readers. This new book will cover most of the items in the original. However, I am dropping the tour of the standard library and replacing it with a “How-To” section, as I think seeing live, working code is better than talking about syntax.

You can follow my Kickstarter now if you’d like to. The Kickstarter will go live February 17th at approx 8 a.m. CST and run 30 days.

The post Python 101 2nd Edition Kickstarter Preview appeared first on The Mouse Vs. The Python.


Stack Abuse: Bubble Sort in Python


Introduction

For most people, Bubble Sort is likely the first sorting algorithm they heard of in their Computer Science course.

It's highly intuitive and easy to "translate" into code, which is important for new software developers so they can ease themselves into turning ideas into a form that can be executed on a computer. However, Bubble Sort is one of the worst-performing sorting algorithms in practically every case; the one exception is checking whether an array is already sorted, where the optimized version's single pass often outperforms more efficient sorting algorithms like Quick Sort.

Bubble Sort

The idea behind Bubble Sort is very simple: we look at pairs of adjacent elements in an array, one pair at a time, and swap their positions if the first element is larger than the second, or simply move on if it isn't. Let's look at an example and sort the array 8, 5, 3, 1, 4, 7, 9:

bubble sort

If you focus on the first number, the number 8, you can see it "bubbling up" the array into the correct place. Then, this process is repeated for the number 5 and so on.

Implementation

With the visualization out of the way, let's go ahead and implement the algorithm. Again, it's extremely simple:

def bubble_sort(our_list):
    # We go through the list as many times as there are elements
    for i in range(len(our_list)):
        # We want the last pair of adjacent elements to be (n-2, n-1)
        for j in range(len(our_list) - 1):
            if our_list[j] > our_list[j+1]:
                # Swap
                our_list[j], our_list[j+1] = our_list[j+1], our_list[j]

Now, let's populate a list and call the algorithm on it:

our_list = [19, 13, 6, 2, 18, 8]
bubble_sort(our_list)
print(our_list)

Output:

[2, 6, 8, 13, 18, 19]

Optimization

The simple implementation did its job, but there are two optimizations we can make here.

When no swaps are made, that means that the list is sorted. However, the algorithm implemented above will keep passing through the list even though it really doesn't need to. To fix this, we'll keep a boolean flag and check if any swaps were made in the previous iteration.

If no swaps are made, the algorithm should stop:

def bubble_sort(our_list):
    # We want to stop passing through the list
    # as soon as we pass through without swapping any elements
    has_swapped = True

    while(has_swapped):
        has_swapped = False
        for i in range(len(our_list) - 1):
            if our_list[i] > our_list[i+1]:
                # Swap
                our_list[i], our_list[i+1] = our_list[i+1], our_list[i]
                has_swapped = True

The other optimization we can make leverages the fact that Bubble Sort works in such a way that the largest elements in a particular iteration end up at the end of the array.

The first time we pass through the list, the largest element is guaranteed to end up in the last position; the second time we pass through, the second-largest element ends up in the second-to-last position, and so forth.

This means that with each consecutive iteration we can look at one less element than before. More precisely, in the k-th iteration, we only need to look at the first n - k + 1 elements:

def bubble_sort(our_list):
    has_swapped = True

    num_of_iterations = 0

    while(has_swapped):
        has_swapped = False
        for i in range(len(our_list) - num_of_iterations - 1):
            if our_list[i] > our_list[i+1]:
                # Swap
                our_list[i], our_list[i+1] = our_list[i+1], our_list[i]
                has_swapped = True
        num_of_iterations += 1

Time Comparison

Let's go ahead and compare the time it takes for each of these code snippets to sort the same list a thousand times using the timeit module:

Unoptimized Bubble Sort took: 0.0106407
Bubble Sort with a boolean flag took: 0.0078251
Bubble Sort with a boolean flag and shortened list took: 0.0075207
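The timing harness itself isn't shown above, but a sketch along the following lines would produce that comparison. The three function names are assumptions standing in for the three versions of bubble_sort defined earlier, saved under distinct names:

import random
import timeit

# Build one reference list so every variant sorts identical input.
random.seed(42)
base_list = [random.randint(0, 100) for _ in range(20)]

benchmarks = [
    ("Unoptimized Bubble Sort", bubble_sort_unoptimized),
    ("Bubble Sort with a boolean flag", bubble_sort_flag),
    ("Bubble Sort with a boolean flag and shortened list", bubble_sort_flag_shortened),
]

for name, sort_func in benchmarks:
    # Sort a fresh copy 1000 times so earlier runs don't pre-sort the input.
    elapsed = timeit.timeit(lambda: sort_func(base_list.copy()), number=1000)
    print(f"{name} took: {elapsed:.7f}")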

There isn't much of a difference between the latter two approaches because the list is extremely short, but on larger lists the second optimization can make a huge difference.

Conclusion

In the most inefficient approach, Bubble Sort goes through n-1 iterations, looking at n-1 pairs of adjacent elements. This gives it the time complexity of O(n²), in both best-case and average-case situations. O(n²) is considered pretty horrible for a sorting algorithm.

It does have an O(1) space complexity, but that isn't enough to compensate for its shortcomings in other fields.

However, it's still a big part of the software development community and history, and textbooks almost never fail to mention it when talking about basic sorting algorithms.

Real Python: Implementing an Interface in Python


Interfaces play an important role in software engineering. As an application grows, updates and changes to the code base become more difficult to manage. More often than not, you wind up having classes that look very similar but are unrelated, which can lead to some confusion. In this tutorial, you’ll see how you can use a Python interface to help determine what class you should use to tackle the current problem.

By the end of this tutorial, you’ll be able to:

  • Understand how interfaces work and the caveats of Python interface creation
  • Comprehend how useful interfaces are in a dynamic language like Python
  • Implement an informal Python interface
  • Use abc.ABCMeta and @abc.abstractmethod to implement a formal Python interface

Interfaces in Python are handled differently than in most other languages, and they can vary in their design complexity. By the end of this tutorial, you’ll have a better understanding of some aspects of Python’s data model, as well as how interfaces in Python compare to those in languages like Java, C++, and Go.


Python Interface Overview

At a high level, an interface acts as a blueprint for designing classes. Like classes, interfaces define methods. Unlike classes, these methods are abstract. An abstract method is one that the interface simply defines. It doesn’t implement the methods. This is done by classes, which then implement the interface and give concrete meaning to the interface’s abstract methods.

Python’s approach to interface design is somewhat different when compared to languages like Java, Go, and C++. These languages all have an interface keyword, while Python does not. Python further deviates from other languages in one other aspect. It doesn’t require the class that’s implementing the interface to define all of the interface’s abstract methods.

Informal Interfaces

In certain circumstances, you may not need the strict rules of a formal Python interface. Python’s dynamic nature allows you to implement an informal interface. An informal Python interface is a class that defines methods that can be overridden, but there’s no strict enforcement.

In the following example, you’ll take the perspective of a data engineer who needs to extract text from various different unstructured file types, like PDFs and emails. You’ll create an informal interface that defines the methods that will be in both the PdfParser and EmlParser concrete classes:

class InformalParserInterface:
    def load_data_source(self, path: str, file_name: str) -> str:
        """Load in the file for extracting text."""
        pass

    def extract_text(self, full_file_name: str) -> dict:
        """Extract text from the currently loaded file."""
        pass

InformalParserInterface defines the two methods .load_data_source() and .extract_text(). These methods are defined but not implemented. The implementation will occur once you create concrete classes that inherit from InformalParserInterface.

As you can see, InformalParserInterface looks identical to a standard Python class. You rely on duck typing to inform users that this is an interface and should be used accordingly.

Note: Haven’t heard of duck typing? This term says that if you have an object that looks like a duck, walks like a duck, and quacks like a duck, then it must be a duck! To learn more, check out Duck Typing.
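If the term is unfamiliar, here is a minimal sketch of the idea (the class and function names are invented for this illustration): the function below accepts any object that provides a .quack() method, without ever checking its type.

class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    def quack(self):
        return "Beep. Quack."


def make_it_quack(thing):
    # No isinstance() check: anything with a callable .quack() will do.
    return thing.quack()


print(make_it_quack(Duck()))   # Quack!
print(make_it_quack(Robot()))  # Beep. Quack.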

With duck typing in mind, you define two classes that implement the InformalParserInterface. To use your interface, you must create a concrete class. A concrete class is a subclass of the interface that provides an implementation of the interface’s methods. You’ll create two concrete classes to implement your interface. The first is PdfParser, which you’ll use to parse the text from PDF files:

class PdfParser(InformalParserInterface):
    """Extract text from a PDF"""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides InformalParserInterface.load_data_source()"""
        pass

    def extract_text(self, full_file_path: str) -> dict:
        """Overrides InformalParserInterface.extract_text()"""
        pass

The concrete implementation of InformalParserInterface now allows you to extract text from PDF files.

The second concrete class is EmlParser, which you’ll use to parse the text from emails:

class EmlParser(InformalParserInterface):
    """Extract text from an email"""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides InformalParserInterface.load_data_source()"""
        pass

    def extract_text_from_email(self, full_file_path: str) -> dict:
        """A method defined only in EmlParser.
        Does not override InformalParserInterface.extract_text()
        """
        pass

The concrete implementation of InformalParserInterface now allows you to extract text from email files.

So far, you’ve defined two concrete implementations of the InformalParserInterface. However, note that EmlParser fails to properly define .extract_text(). If you were to check whether EmlParser implements InformalParserInterface, then you’d get the following result:

# Check if both PdfParser and EmlParser implement InformalParserInterface
issubclass(PdfParser, InformalParserInterface)  # True
issubclass(EmlParser, InformalParserInterface)  # True

This would return True, which poses a bit of a problem since it violates the definition of an interface!

Now check the method resolution order (MRO) of PdfParser and EmlParser. This tells you the superclasses of the class in question, as well as the order in which they’re searched for executing a method. You can view a class’s MRO by using the dunder method cls.__mro__:

PdfParser.__mro__
# (__main__.PdfParser, __main__.InformalParserInterface, object)

EmlParser.__mro__
# (__main__.EmlParser, __main__.InformalParserInterface, object)

Such informal interfaces are fine for small projects where only a few developers are working on the source code. However, as projects get larger and teams grow, this could lead to developers spending countless hours looking for hard-to-find logic errors in the codebase!

Using Metaclasses

Ideally, you would want issubclass(EmlParser, InformalParserInterface) to return False when the implementing class doesn’t define all of the interface’s abstract methods. To do this, you’ll create a metaclass called ParserMeta. You’ll be overriding two dunder methods:

  1. __instancecheck__
  2. __subclasscheck__

In the code block below, you create a class called UpdatedInformalParserInterface that builds from the ParserMeta metaclass:

class ParserMeta(type):
    """A Parser metaclass that will be used for parser class creation."""
    def __instancecheck__(cls, instance):
        return cls.__subclasscheck__(type(instance))

    def __subclasscheck__(cls, subclass):
        return (hasattr(subclass, 'load_data_source') and
                callable(subclass.load_data_source) and
                hasattr(subclass, 'extract_text') and
                callable(subclass.extract_text))


class UpdatedInformalParserInterface(metaclass=ParserMeta):
    """This interface is used for concrete classes to inherit from.
    There is no need to define the ParserMeta methods, as any class
    defining the required methods is implicitly accepted via __subclasscheck__.
    """
    pass

Now that ParserMeta and UpdatedInformalParserInterface have been created, you can create your concrete implementations.

First, create a new class for parsing PDFs called PdfParserNew:

class PdfParserNew:
    """Extract text from a PDF."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides UpdatedInformalParserInterface.load_data_source()"""
        pass

    def extract_text(self, full_file_path: str) -> dict:
        """Overrides UpdatedInformalParserInterface.extract_text()"""
        pass

Here, PdfParserNew overrides .load_data_source() and .extract_text(), so issubclass(PdfParserNew, UpdatedInformalParserInterface) should return True.

In this next code block, you have a new implementation of the email parser called EmlParserNew:

class EmlParserNew:
    """Extract text from an email."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides UpdatedInformalParserInterface.load_data_source()"""
        pass

    def extract_text_from_email(self, full_file_path: str) -> dict:
        """A method defined only in EmlParserNew.
        Does not override UpdatedInformalParserInterface.extract_text()
        """
        pass

Here, you have a metaclass that’s used to create UpdatedInformalParserInterface. By using a metaclass, you don’t need to explicitly define the subclasses. Instead, the subclass must define the required methods. If it doesn’t, then issubclass(EmlParserNew, UpdatedInformalParserInterface) will return False.

Running issubclass() on your concrete classes will produce the following:

issubclass(PdfParserNew, UpdatedInformalParserInterface)  # True
issubclass(EmlParserNew, UpdatedInformalParserInterface)  # False

As expected, EmlParserNew is not a subclass of UpdatedInformalParserInterface since .extract_text() wasn’t defined in EmlParserNew.

Now, let’s have a look at the MRO:

PdfParserNew.__mro__
# (<class '__main__.PdfParserNew'>, <class 'object'>)

As you can see, UpdatedInformalParserInterface is a superclass of PdfParserNew, but it doesn’t appear in the MRO. This unusual behavior is caused by the fact that UpdatedInformalParserInterface is a virtual base class of PdfParserNew.

Using Virtual Base Classes

In the previous example, issubclass(PdfParserNew, UpdatedInformalParserInterface) returned True, even though UpdatedInformalParserInterface did not appear in the PdfParserNew MRO. That’s because UpdatedInformalParserInterface is a virtual base class of PdfParserNew.

The key difference between these and standard subclasses is that virtual base classes use the __subclasscheck__ dunder method to implicitly check if a class is a virtual subclass of the superclass. Additionally, virtual base classes don’t appear in the subclass MRO.

Take a look at this code block:

class PersonMeta(type):
    """A person metaclass"""
    def __instancecheck__(cls, instance):
        return cls.__subclasscheck__(type(instance))

    def __subclasscheck__(cls, subclass):
        return (hasattr(subclass, 'name') and
                callable(subclass.name) and
                hasattr(subclass, 'age') and
                callable(subclass.age))


class PersonSuper:
    """A person superclass"""
    def get_name(self) -> str:
        pass

    def get_age(self) -> int:
        pass


class Person(metaclass=PersonMeta):
    """Person interface built from PersonMeta metaclass."""
    pass

Here, you have the setup for creating your virtual base classes:

  1. The metaclass PersonMeta
  2. The base class PersonSuper
  3. The Python interface Person

Now that the setup for creating virtual base classes is done you’ll define two concrete classes, Employee and Friend. The Employee class inherits from PersonSuper, while Friend implicitly inherits from Person:

# Inheriting subclasses
class Employee(PersonSuper):
    """Inherits from PersonSuper
    PersonSuper will appear in Employee.__mro__
    """
    pass


class Friend:
    """Built implicitly from Person
    Friend is a virtual subclass of Person since
    both required methods exist.
    Person not in Friend.__mro__
    """
    def name(self):
        pass

    def age(self):
        pass

Although Friend does not explicitly inherit from Person, it implements .name() and .age(), so Person becomes a virtual base class of Friend. When you run issubclass(Friend, Person) it should return True, meaning that Friend is a subclass of Person.

The following UML diagram shows what happens when you call issubclass() on the Friend class:

virtual base class

Taking a look at PersonMeta, you’ll notice that there’s another dunder method called __instancecheck__. This method is used to check if instances of Friend are created from the Person interface. Your code will call __instancecheck__ whenever you use isinstance() to check an object against Person.
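Here is a short sketch of both checks in action, assuming the PersonMeta, Person, and Friend definitions above:

friend = Friend()

# isinstance() triggers PersonMeta.__instancecheck__, which delegates to
# __subclasscheck__ and finds the required .name() and .age() methods.
print(isinstance(friend, Person))  # True

# issubclass() calls PersonMeta.__subclasscheck__ directly.
print(issubclass(Friend, Person))  # True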

Formal Interfaces

Informal interfaces can be useful for projects with a small code base and a limited number of programmers. However, informal interfaces would be the wrong approach for larger applications. In order to create a formal Python interface, you’ll need a few more tools from Python’s abc module.

Using abc.ABCMeta

To enforce the subclass instantiation of abstract methods, you’ll utilize Python’s builtin ABCMeta from the abc module. Going back to your UpdatedInformalParserInterface interface, you created your own metaclass, ParserMeta, with the overridden dunder methods __instancecheck__ and __subclasscheck__.

Rather than create your own metaclass, you’ll use abc.ABCMeta as the metaclass. Then, you’ll override __subclasshook__() in place of __instancecheck__() and __subclasscheck__(), as it creates a more reliable implementation of these dunder methods.

Using __subclasshook__

Here’s the implementation of FormalParserInterface using abc.ABCMeta as your metaclass:

import abc


class FormalParserInterface(metaclass=abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, subclass):
        return (hasattr(subclass, 'load_data_source') and
                callable(subclass.load_data_source) and
                hasattr(subclass, 'extract_text') and
                callable(subclass.extract_text))


class PdfParserNew:
    """Extract text from a PDF."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides FormalParserInterface.load_data_source()"""
        pass

    def extract_text(self, full_file_path: str) -> dict:
        """Overrides FormalParserInterface.extract_text()"""
        pass


class EmlParserNew:
    """Extract text from an email."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides FormalParserInterface.load_data_source()"""
        pass

    def extract_text_from_email(self, full_file_path: str) -> dict:
        """A method defined only in EmlParserNew.
        Does not override FormalParserInterface.extract_text()
        """
        pass

If you run issubclass() on PdfParserNew and EmlParserNew, then issubclass will return True and False, respectively.

Using abc to Register a Virtual Subclass

Once you’ve imported the abc module, you can directly register a virtual subclass by using the .register() metamethod. In the next example, you register the interface Double as a virtual base class of the built-in float type:

class Double(metaclass=abc.ABCMeta):
    """Double precision floating point number."""
    pass


Double.register(float)

print(issubclass(float, Double))    # True
print(isinstance(1.2345, Double))   # True

By using the .register() meta method, you’ve successfully registered Double as a virtual subclass of float.

Once you’ve registered Double, you can use it as class decorator to set the decorated class as a virtual subclass:

@Double.register
class Double64:
    """A 64-bit double-precision floating-point number."""
    pass


print(issubclass(Double64, Double))  # True

The decorator register method helps you to create a hierarchy of custom virtual class inheritance.

Using Subclass Detection With Registration

You must be careful when you’re combining __subclasshook__ with .register(), as __subclasshook__ takes precedence over virtual subclass registration. To ensure that the registered virtual subclasses are taken into consideration, you must add NotImplemented to the __subclasshook__ dunder method. The FormalParserInterface would be updated to the following:

class FormalParserInterface(metaclass=abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, subclass):
        return (hasattr(subclass, 'load_data_source') and
                callable(subclass.load_data_source) and
                hasattr(subclass, 'extract_text') and
                callable(subclass.extract_text) or
                NotImplemented)


class PdfParserNew:
    """Extract text from a PDF."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides FormalParserInterface.load_data_source()"""
        pass

    def extract_text(self, full_file_path: str) -> dict:
        """Overrides FormalParserInterface.extract_text()"""
        pass


@FormalParserInterface.register
class EmlParserNew:
    """Extract text from an email."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides FormalParserInterface.load_data_source()"""
        pass

    def extract_text_from_email(self, full_file_path: str) -> dict:
        """A method defined only in EmlParserNew.
        Does not override FormalParserInterface.extract_text()
        """
        pass


print(issubclass(PdfParserNew, FormalParserInterface))  # True
print(issubclass(EmlParserNew, FormalParserInterface))  # True

Since you’ve used registration, you can see that EmlParserNew is considered a virtual subclass of your FormalParserInterface interface. This is not what you wanted since EmlParserNew doesn’t override .extract_text(). Please use caution with virtual subclass registration!

Using Abstract Method Declaration

An abstract method is a method that’s declared by the Python interface, but it may not have a useful implementation. The abstract method must be overridden by the concrete class that implements the interface in question.

To create abstract methods in Python, you add the @abc.abstractmethod decorator to the interface’s methods. In the next example, you update the FormalParserInterface to include the abstract methods .load_data_source() and .extract_text():

class FormalParserInterface(metaclass=abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, subclass):
        return (hasattr(subclass, 'load_data_source') and
                callable(subclass.load_data_source) and
                hasattr(subclass, 'extract_text') and
                callable(subclass.extract_text) or
                NotImplemented)

    @abc.abstractmethod
    def load_data_source(self, path: str, file_name: str):
        """Load in the data set"""
        raise NotImplementedError

    @abc.abstractmethod
    def extract_text(self, full_file_path: str):
        """Extract text from the data set"""
        raise NotImplementedError


class PdfParserNew(FormalParserInterface):
    """Extract text from a PDF."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides FormalParserInterface.load_data_source()"""
        pass

    def extract_text(self, full_file_path: str) -> dict:
        """Overrides FormalParserInterface.extract_text()"""
        pass


class EmlParserNew(FormalParserInterface):
    """Extract text from an email."""
    def load_data_source(self, path: str, file_name: str) -> str:
        """Overrides FormalParserInterface.load_data_source()"""
        pass

    def extract_text_from_email(self, full_file_path: str) -> dict:
        """A method defined only in EmlParserNew.
        Does not override FormalParserInterface.extract_text()
        """
        pass


pdf_parser = PdfParserNew()  # Won't raise any errors
eml_parser = EmlParserNew()  # Will raise an error

In the above example, you’ve finally created a formal interface that will raise errors when the abstract methods aren’t overridden. The PdfParserNew instance, pdf_parser, won’t raise any errors, as PdfParserNew is correctly overriding the FormalParserInterface abstract methods. However, EmlParserNew will raise the following error:

Traceback (most recent call last):
  File "real_python_interfaces.py", line 53, in <module>
    eml_interface = EmlParserNew()
TypeError: Can't instantiate abstract class EmlParserNew with abstract methods extract_text

As you can see, the traceback message tells you that you haven’t overridden all the abstract methods. This is the behavior you expect when building a formal Python interface.
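To see why this matters in practice, here is a sketch (the helper function below is not part of the tutorial) of client code that relies on the formal interface. Because a class missing an abstract method can’t even be instantiated, any object you receive here is guaranteed to provide both methods:

def parse_document(parser: FormalParserInterface, path: str, file_name: str) -> dict:
    # Safe to call both methods on any instantiable FormalParserInterface subclass.
    parser.load_data_source(path, file_name)
    return parser.extract_text(path + file_name)


# The file names are placeholders; the demo methods just pass, so this returns None.
pdf_parser = PdfParserNew()
result = parse_document(pdf_parser, "/tmp/", "report.pdf")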

Interfaces in Other Languages

Interfaces appear in many programming languages, and their implementation varies greatly from language to language. In the next few sections, you’ll compare interfaces in Python to Java, C++, and Go.

Java

Unlike Python, Java contains an interface keyword. Keeping with the file parser example, you declare an interface in Java like so:

public interface FileParserInterface {
    // Static fields, and abstract methods go here ...
}

Now you’ll create two concrete classes, PdfParser and EmlParser, to implement the FileParserInterface. To do so, you must use the implements keyword in the class definition, like so:

public class EmlParser implements FileParserInterface {
    public void loadDataSource() {
        // Code to load the data set
    }

    public void extractText() {
        // Code to extract the text
    }
}

Continuing with your file parsing example, a fully-functional Java interface would look something like this:

import java.util.*;
import java.io.*;

public class FileParser {
    public static void main(String[] args) throws IOException {
        System.out.println("Hello, World!");
    }

    public interface FileParserInterface {
        HashMap<String, ArrayList<String>> file_contents = null;

        public void loadDataSource();

        public void extractText();
    }

    public class PdfParser implements FileParserInterface {
        public void loadDataSource() {
            // Code to load the data set
        }

        public void extractText() {
            // Code to extract the text
        }
    }

    public class EmlParser implements FileParserInterface {
        public void loadDataSource() {
            // Code to load the data set
        }

        public void extractText() {
            // Code to extract the text
        }
    }
}

As you can see, a Python interface gives you much more flexibility during creation than a Java interface does.

C++

Like Python, C++ uses abstract base classes to create interfaces. When defining an interface in C++, you use the keyword virtual to describe a method that should be overridden in the concrete class:

class FileParserInterface {
 public:
    virtual void loadDataSource(std::string path, std::string file_name);
    virtual void extractText(std::string full_file_name);
};

When you want to implement the interface, you’ll give the concrete class name, followed by a colon (:), and then the name of the interface. The following example demonstrates C++ interface implementation:

class PdfParser : FileParserInterface {
 public:
    void loadDataSource(std::string path, std::string file_name);
    void extractText(std::string full_file_name);
};

class EmlParser : FileParserInterface {
 public:
    void loadDataSource(std::string path, std::string file_name);
    void extractText(std::string full_file_name);
};

A Python interface and a C++ interface have some similarities in that they both make use of abstract base classes to simulate interfaces.

Go

Although Go’s syntax is reminiscent of Python, the Go programming language contains an interface keyword, like Java. Let’s create the fileParserInterface in Go:

type fileParserInterface interface {
    loadDataSet(path string, filename string)
    extractText(full_file_path string)
}

A big difference between Python and Go is that Go doesn’t have classes. Rather, Go is similar to C in that it uses the struct keyword to create structures. A structure is similar to a class in that a structure contains data and methods. However, unlike a class, all of the data and methods are publicly accessed. The concrete structs in Go will be used to implement the fileParserInterface.

Here’s an example of how Go uses interfaces:

package main

import (
    "fmt"
)

type fileParserInterface interface {
    loadDataSet(path string, filename string)
    extractText(full_file_path string)
}

type pdfParser struct {
    // Data goes here ...
}

type emlParser struct {
    // Data goes here ...
}

func (p pdfParser) loadDataSet() {
    // Method definition ...
}

func (p pdfParser) extractText() {
    // Method definition ...
}

func (e emlParser) loadDataSet() {
    // Method definition ...
}

func (e emlParser) extractText() {
    // Method definition ...
}

func main() {
    fmt.Printf("Hello, World!")
}

Unlike a Python interface, a Go interface is created using structs and the explicit keyword interface.

Conclusion

Python offers great flexibility when you’re creating interfaces. An informal Python interface is useful for small projects where you’re less likely to get confused as to what the return types of the methods are. As a project grows, the need for a formal Python interface becomes more important as it becomes more difficult to infer return types. This ensures that the concrete class, which implements the interface, overwrites the abstract methods.

Now you can:

  • Understand how interfaces work and the caveats of creating a Python interface
  • Understand the usefulness of interfaces in a dynamic language like Python
  • Implement formal and informal interfaces in Python
  • Compare Python interfaces to those in languages like Java, C++, and Go

Now that you’ve become familiar with how to create a Python interface, add a Python interface to your next project to see its usefulness in action!



Learn PyQt: Display tables in PyQt5/PySide2, QTableView with conditional formatting, numpy and pandas


In the previous chapter we covered an introduction to the Model View architecture. However, we only touched on one of the model views — QListView. There are two other Model Views available in Qt5 — QTableView and QTreeView which provide tabular (Excel-like) and tree (file directory browser-like) views using the same QStandardItemModel.

In this tutorial we'll look at how to use QTableView from PyQt5, including how to model your data, format values for display and add conditional formatting.

You can use model views with any data source, as long as your model returns that data in a format that Qt can understand. Working with tabular data in Python opens up a number of possibilities for how we load and work with that data. Here we'll start with a simple nested list of lists and then move onto integrating your Qt application with the popular numpy and pandas libraries. This will provide you with a great foundation for building data-focused applications.

Introduction to QTableView

QTableView is a Qt view widget which presents data in a spreadsheet-like table view. Like all widgets in the Model View Architecture, this uses a separate model to provide data and presentation information to the view. Data in the model can be updated as required, and the view notified of these changes to redraw/display the changes. By customising the model it is possible to have a huge amount of control over how the data is presented.

To use the model we'll need a basic application structure and some dummy data. A simple working example is shown below, which defines a custom model working with a simple nested-list as a data store.

We'll go into alternative data structures in detail a bit later.

PyQt5:

import sys

from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtCore import Qt


class TableModel(QtCore.QAbstractTableModel):
    def __init__(self, data):
        super(TableModel, self).__init__()
        self._data = data

    def data(self, index, role):
        if role == Qt.DisplayRole:
            # See below for the nested-list data structure.
            # .row() indexes into the outer list,
            # .column() indexes into the sub-list
            return self._data[index.row()][index.column()]

    def rowCount(self, index):
        # The length of the outer list.
        return len(self._data)

    def columnCount(self, index):
        # The following takes the first sub-list, and returns
        # the length (only works if all rows are an equal length)
        return len(self._data[0])


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()

        self.table = QtWidgets.QTableView()

        data = [
            [4, 9, 2],
            [1, 0, 0],
            [3, 5, 0],
            [3, 3, 2],
            [7, 8, 9],
        ]

        self.model = TableModel(data)
        self.table.setModel(self.model)

        self.setCentralWidget(self.table)


app = QtWidgets.QApplication(sys.argv)
window = MainWindow()
window.show()
app.exec_()

PySide2: the code is identical apart from the imports, which become:

import sys

from PySide2 import QtCore, QtGui, QtWidgets
from PySide2.QtCore import Qt

As in our earlier model view examples, we create the QTableView widget, then create an instance of our custom model (which we've written to accept the data source as a parameter) and then we set the model on the view. That's all we need to do — the view widget now uses the model to get the data, and determine how to draw it.

Basic QTableView example

Nested list as a 2-dimensional data store

For a table you need a 2D data structure, with columns and rows. As shown in the example above you can model a simple 2D data structure using a nested Python list. We'll take a minute to look at this data structure, and its limitations, below —

table = [
    [4, 1, 3, 3, 7],
    [9, 1, 5, 3, 8],
    [2, 1, 5, 3, 9],
]

The nested list is a "list of lists of values"— an outer list containing a number of sub-lists which themselves contain the values. With this structure, to index into individual values (or "cells") you must index twice, first to return one of the inner list objects and then again to index into that list.

The typical arrangement is for the outer list to hold the rows and each nested list to contain the values for the columns. With this arrangement when you index, you index first by row, then by column— making our example table a 3 row, 5 column table. Helpfully, this matches the visual layout in the source code.

The first index into the table will return a nested sub-list —

row = 2
col = 4

>>> table[row]
[2, 1, 5, 3, 9]

Which you then index again to return the value —

>>> table[row][col]
9

Note that using this type of structure you can't easily return an entire column, you would instead need to iterate all the rows. However, you are of course free to flip things on their head and use the first index as column depending on whether accessing by column or row is more useful to you.

table = [
    [4, 9, 2],
    [1, 1, 1],
    [3, 5, 5],
    [3, 3, 2],
    [7, 8, 9],
]

row = 2  # reversed
col = 4  # reversed

>>> table[col]
[7, 8, 9]

>>> table[col][row]
9

NOTE: Nothing about this data structure enforces equal row or column lengths — one row can be 5 elements long, another 200. Inconsistencies can lead to unexpected errors on the table view. See the alternative data stores later if you're working with large or complex data tables.
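If you want to guard against that, a tiny validation helper (not part of the original tutorial) can check that the table is rectangular before you hand it to the model:

def is_rectangular(table):
    # True when every row has the same number of columns (or the table is empty).
    return len({len(row) for row in table}) <= 1


print(is_rectangular([[1, 2, 3], [4, 5, 6]]))  # True
print(is_rectangular([[1, 2, 3], [4, 5]]))     # False: rows differ in length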

Next we'll look in a bit more detail at our custom TableModel and see how it works with this simple data structure to display the values.

Writing a custom QAbstractTableModel

In the Model View Architecture the model is responsible for providing both the data and presentation metadata for display by the view. In order to interface between our data object and the view we need to write our own custom model, which understands the structure of our data.

To write our custom model we can create a subclass of QAbstractTableModel. The only required methods for a custom table model are data, rowCount and columnCount. The first returns data (or presentation information) for given locations in the table, while the latter two must return a single integer value for the dimensions of the data source.

class TableModel(QtCore.QAbstractTableModel):
    def __init__(self, data):
        super(TableModel, self).__init__()
        self._data = data

    def data(self, index, role):
        if role == Qt.DisplayRole:
            # See below for the nested-list data structure.
            # .row() indexes into the outer list,
            # .column() indexes into the sub-list
            return self._data[index.row()][index.column()]

    def rowCount(self, index):
        # The length of the outer list.
        return len(self._data)

    def columnCount(self, index):
        # The following takes the first sub-list, and returns
        # the length (only works if all rows are an equal length)
        return len(self._data[0])

QtCore.QAbstractTableModel is an abstract base class meaning it does not have implementations for the methods. If you try and use it directly, it will not work. You must sub-class it.

In the __init__ constructor we accept a single parameter data which we store as the instance attribute self._data so we can access it from our methods. The passed in data structure is stored by reference, so any external changes will be reflected here.

To notify the model of changes you need to trigger the model's layoutChanged signal, using self.model.layoutChanged.emit(). See the previous ModelView tutorial for more information.
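For example, a small helper method on the model (the method name here is illustrative, not from the tutorial) can change a value and notify any attached views in one step:

def update_cell(self, row, column, value):
    # Change the underlying nested list in place...
    self._data[row][column] = value
    # ...then tell attached views to re-query the model and redraw.
    self.layoutChanged.emit()

You would call this as self.model.update_cell(0, 0, 99) from your window code. Qt also provides the finer-grained dataChanged signal for single-cell updates, but layoutChanged is the simplest catch-all.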

The data method is called with two values index and role. The index parameter gives the location in the table for which information is currently being requested, and has two methods .row() and .column() which give the row and column number in the view respectively. In our example the data is stored as a nested list, and the row and column indices are used to index as follows data[row][column].

The view has no knowledge of the structure of the source data, and it is the responsibility of the model to translate between the view's row and column and the relevant positions in your own data store.

The role parameter describes what kind of information the method should return on this call. To get the data to display the view calls this model method with the role of Qt.DisplayRole. However, role can have many other values including Qt.BackgroundRole, Qt.CheckStateRole, Qt.DecorationRole, Qt.FontRole, Qt.TextAlignmentRole and Qt.ForegroundRole, which each expect particular values in response (see later).

Qt.DisplayRole actually expects a string to be returned, although other basic Python types including float, int and bool will also be displayed using their default string representations. However, formatting these types into strings yourself is usually preferable.

Basic QTableView example

We'll cover how to use these other role types later, for now it is only necessary to know that you must check the role type is Qt.DisplayRole before returning your data for display.

The two custom methods columnCount and rowCount return the number of columns and rows in our data structure. In the case of a nested list of lists in the arrangement we're using here, the number of rows is simply the number of elements in the outer list, and the number of columns is the number of elements in one of the inner lists — assuming they are all equal.

If these methods return values that are too high you will see out of bounds errors, if they return values that are too low, you'll see the table cut off.

Formatting numbers and dates

The data returned by the model for display is expected to be a string. While int and float values will also be displayed, using their default string representation, complex Python types will not. To display these, or to override the default formatting of float , int or bool values, you must format these to strings yourself.

You might be tempted to do this by converting your data to a table of strings in advance. However, by doing this you make it very difficult to continue working with the data in your table, whether for calculations or for updates.

Instead, you should use the model's data method to perform the string conversion on demand. By doing this you can continue to work with the original data, yet have complete control over how it is presented to the user — including changing this on the fly through configuration.
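As a sketch of what changing it on the fly could look like (the attribute and method names are assumptions, not part of the tutorial), the model can hold a format string and apply it inside data(), so switching formats only requires a redraw:

def __init__(self, data, float_format="%.2f"):
    super(TableModel, self).__init__()
    self._data = data
    self.float_format = float_format

def data(self, index, role):
    if role == Qt.DisplayRole:
        value = self._data[index.row()][index.column()]
        if isinstance(value, float):
            # Formatting happens at display time; the stored value stays a float.
            return self.float_format % value
        return value

def set_float_format(self, fmt):
    # Switch to e.g. "%.4f" at runtime; views pick it up on the next redraw.
    self.float_format = fmt
    self.layoutChanged.emit()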

Below is a simple custom formatter which looks up the values in our data table, and displays them in a number of different ways depending on the Python type of the data.

def data(self, index, role):
    if role == Qt.DisplayRole:
        # Get the raw value
        value = self._data[index.row()][index.column()]

        # Perform per-type checks and render accordingly.
        if isinstance(value, datetime):
            # Render time to YYYY-MM-DD.
            return value.strftime("%Y-%m-%d")

        if isinstance(value, float):
            # Render float to 2 dp
            return "%.2f" % value

        if isinstance(value, str):
            # Render strings with quotes
            return '"%s"' % value

        # Default (anything not captured above: e.g. int)
        return value

Use this together with the modified sample data below to see it in action.

data = [
    [4, 9, 2],
    [1, -1, 'hello'],
    [3.023, 5, -5],
    [3, 3, datetime(2017, 10, 1)],
    [7.555, 8, 9],
]

QTableView data formatting

So far we've only looked at how we can customize how the data itself is formatted. However, the model interface gives you far more control over the display of table cells including colours and icons. In the next part we'll look at how to use the model to customise QTableView appearance.

Styles & Colours with Roles

Using colours and icons to highlight cells in data tables can help make data easier to find and understand, or help users to select or mark data of interest. Qt allows for complete control of all of these from the model, by responding to the relevant role on the data method.

The types expected to be returned in response to the various role types are shown below.

Role                   Type
Qt.BackgroundRole      QBrush (also QColor)
Qt.CheckStateRole      Qt.CheckState
Qt.DecorationRole      QIcon, QPixmap, QColor
Qt.DisplayRole         QString (also int, float, bool)
Qt.FontRole            QFont
Qt.SizeHintRole        QSize
Qt.TextAlignmentRole   Qt.Alignment
Qt.ForegroundRole      QBrush (also QColor)

By responding to a particular combination of role and index we can modify the appearance of particular cells, columns or rows in the table — for example, setting a blue background for all cells in the 3rd column.

def data(self, index, role):
    # Existing `if role == Qt.DisplayRole:` block hidden for clarity.

    if role == Qt.BackgroundRole and index.column() == 2:
        # Third column (index 2). See below for the data structure.
        return QtGui.QColor('blue')

By using the index to lookup values from our own data, we can also customise appearance based on values in our data. We'll go through some of the more common use-cases below.

Text alignment

In our previous formatting examples we had used text formatting to display float down to 2 decimal places. However, it's also common when displaying numbers to right-align them, to make it easier to compare across lists of numbers. This can be accomplished by returning Qt.AlignRight in response to Qt.TextAlignmentRole for any numeric values.

The modified data method is shown below. We check for role == Qt.TextAlignmentRole and look up the value by index as before, then determine if the value is numeric. If it is we can return Qt.AlignVCenter + Qt.AlignRight to align in the middle vertically, and on the right horizontally.

def data(self, index, role):
    # Existing `if role == Qt.DisplayRole:` block hidden for clarity.

    if role == Qt.TextAlignmentRole:
        value = self._data[index.row()][index.column()]

        if isinstance(value, int) or isinstance(value, float):
            # Align right, vertical middle.
            return Qt.AlignVCenter + Qt.AlignRight

Other alignments are possible, including Qt.AlignHCenter to align centre horizontally. You can combine them together by adding them together e.g. Qt.AlignBottom + Qt.AlignRight.

QTableView cell alignment

Text colours

If you've used spreadsheets like Excel you might be familiar with the concept of conditional formatting. These are rules you can apply to cells (or rows, or columns) which change text and background colours of cells depending on their value.

This can be useful to help visualise data, for example using red for negative numbers or highlighting ranges of numbers (e.g. low … high) with a gradient of blue to red.

First, the below example implements a Qt.ForegroundRole handler which checks if the value in the indexed cell is numeric, and below zero. If it is, then the handler returns the text (foreground) colour red.

def data(self, index, role):
    # Existing `if role == Qt.DisplayRole:` block hidden for clarity.

    if role == Qt.ForegroundRole:
        value = self._data[index.row()][index.column()]

        if (
            (isinstance(value, int) or isinstance(value, float))
            and value < 0
        ):
            return QtGui.QColor('red')

If you add this to your model's data handler, all negative numbers will now appear red.

QTableView text formatting, with red negative numbers

Number range gradients

The same principle can be used to apply gradients to numeric values in a table to, for example, highlight low and high values. First we define our colour scale, which is taken from colorbrewer2.org.

COLORS = ['#053061', '#2166ac', '#4393c3', '#92c5de', '#d1e5f0',
          '#f7f7f7', '#fddbc7', '#f4a582', '#d6604d', '#b2182b',
          '#67001f']

Next we define our custom handler, this time for Qt.BackgroundRole. This takes the value at the given index, checks that this is numeric then performs a series of operations to constrain it to the range 0…10 required to index into our list.

def data(self, index, role):
    # Existing `if role == Qt.DisplayRole:` block hidden for clarity.

    if role == Qt.BackgroundRole:
        value = self._data[index.row()][index.column()]

        if isinstance(value, int) or isinstance(value, float):
            value = int(value)  # Convert to integer for indexing.

            # Limit to range -5 ... +5, then convert to 0 ... 10
            value = max(-5, value)  # values < -5 become -5
            value = min(5, value)   # values > +5 become +5
            value = value + 5       # -5 becomes 0, +5 becomes 10

            return QtGui.QColor(COLORS[value])

The logic used here for converting the value to the gradient is very basic, cutting off high/low values, and not adjusting to the range of the data. However, you can adapt this as needed, as long as the end result of your handler is to return a QColor or QBrush.

QTableView with number-range colour gradients

Icon & Image decoration

Each table cell contains a small decoration area which can be used to display icons, images or a solid block of colour, on the left hand side next to the data. This can be used to indicate data type, e.g. calendars for dates, ticks and crosses for bool values, or for a more subtle conditional-formatting for number ranges.

Below are some simple implementations of these ideas.

Indicating bool/date data types with icons

For dates we'll use Python's built-in datetime type. First, add the following import to the top of your file to import this type.

from datetime import datetime

Then, update the data (set in the MainWindow.__init__) to add datetime and bool (True or False values), for example.

data = [
    [True, 9, 2],
    [1, 0, -1],
    [3, 5, False],
    [3, 3, 2],
    [datetime(2019, 5, 4), 8, 9],
]

With these in place, you can update your model data method to show icons and formatted dates for date types, with the following.

# Icons indicating data type
def data(self, index, role):
    if role == Qt.DisplayRole:
        value = self._data[index.row()][index.column()]

        if isinstance(value, datetime):
            return value.strftime('%Y-%m-%d')

        return value

    if role == Qt.DecorationRole:
        value = self._data[index.row()][index.column()]

        if isinstance(value, datetime):
            return QtGui.QIcon('calendar.png')

QTableView formatted dates with indicator icon

The following shows how to use ticks and cross for boolean True and False values respectively.

# Ticks and crosses for `bool` values
def data(self, index, role):
    # Existing `if role == Qt.DisplayRole:` block hidden for clarity.

    if role == Qt.DecorationRole:
        value = self._data[index.row()][index.column()]

        if isinstance(value, bool):
            if value:
                return QtGui.QIcon('tick.png')

            return QtGui.QIcon('cross.png')

You can of course combine the above together, or any other mix of Qt.DecorationRole and Qt.DisplayRole handlers. It's usually simpler to keep each type grouped under the same role if branch, or as your model becomes more complex, to create sub-methods to handle each role.

QTableView boolean indicators

Colour blocks

If you return a QColor for Qt.DecorationRole a small square of colour will be displayed on the left hand side of the cell, in the icon location. This is identical to the earlier Qt.BackgroundRole conditional formatting example, except now handling and responding to Qt.DecorationRole.

# Colour blocks
if role == Qt.DecorationRole:
    value = self._data[index.row()][index.column()]

    if isinstance(value, int) or isinstance(value, float):
        value = int(value)

        # Limit to range -5 ... +5, then convert to 0 ... 10
        value = max(-5, value)  # values < -5 become -5
        value = min(5, value)   # values > +5 become +5
        value = value + 5       # -5 becomes 0, +5 becomes 10

        return QtGui.QColor(COLORS[value])

QTableView color block decorations

Alternative Python data structures

So far in our examples we've used simple nested Python lists to hold our data for display. This is fine for simple tables of data, however if you're working with large data tables there are some other better options in Python, which come with additional benefits. In the next parts we'll look at two Python data table libraries — numpy and pandas— and how to integrate these with Qt.

Numpy

Numpy is a library which provides support for large multi-dimensional arrays or matrix data structures in Python. The efficient and high-performance handling of large arrays makes numpy ideal for scientific and mathematical applications. This also makes numpy arrays a good data store for large, single-typed, data tables in PyQt.

Using numpy as a data source

To support numpy arrays we need to make a number of changes to the model, first modifying the indexing in the data method, and then changing the row and column count calculations for rowCount and columnCount.

The standard numpy API provides element-level access to 2D arrays, by passing the row and column in the same slicing operation, e.g. _data[index.row(), index.column()]. This is more efficient than indexing in two steps, as in the earlier list-of-lists examples.

In numpy the dimensions of an array are available through .shape which returns a tuple of dimensions along each axis in turn. We get the length of each axis by selecting the correct item from this tuple, e.g. _data.shape[0] gets the size of the first axis.

The following complete example shows how to display a numpy array using Qt's QTableView via a custom model.

PyQt5:

import sys

from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtCore import Qt
import numpy as np


class TableModel(QtCore.QAbstractTableModel):
    def __init__(self, data):
        super(TableModel, self).__init__()
        self._data = data

    def data(self, index, role):
        if role == Qt.DisplayRole:
            # Note: self._data[index.row()][index.column()] will also work
            value = self._data[index.row(), index.column()]
            return str(value)

    def rowCount(self, index):
        return self._data.shape[0]

    def columnCount(self, index):
        return self._data.shape[1]


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()

        self.table = QtWidgets.QTableView()

        data = np.array([
            [1, 9, 2],
            [1, 0, -1],
            [3, 5, 2],
            [3, 3, 2],
            [5, 8, 9],
        ])

        self.model = TableModel(data)
        self.table.setModel(self.model)

        self.setCentralWidget(self.table)


app = QtWidgets.QApplication(sys.argv)
window = MainWindow()
window.show()
app.exec_()

PySide2: the code is identical apart from the imports, which become:

import sys

from PySide2 import QtCore, QtGui, QtWidgets
from PySide2.QtCore import Qt
import numpy as np

While simple Python types such as int and float are displayed without converting to strings, numpy uses its own types (e.g. numpy.int32) for array values. In order for these to be displayed we must first convert them to strings.

QTableView with numpy array

With QTableView only 2D arrays can be displayed, however if you have a higher dimensional data structure you can combine the QTableView with a tabbed or scrollbar UI, to allow access to and display of these higher dimensions.
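One possible approach, sketched below (the class name and widget choice are assumptions, not from the tutorial), is to expose a single 2D slice of a 3D numpy array through the model and let a QSpinBox choose which slice is shown:

import sys

import numpy as np
from PyQt5 import QtCore, QtWidgets
from PyQt5.QtCore import Qt


class SliceModel(QtCore.QAbstractTableModel):
    def __init__(self, data):
        super().__init__()
        self._data = data   # 3D array with shape (slices, rows, columns)
        self._slice = 0

    def set_slice(self, i):
        self._slice = i
        self.layoutChanged.emit()   # Redraw with the newly selected slice.

    def data(self, index, role):
        if role == Qt.DisplayRole:
            return str(self._data[self._slice, index.row(), index.column()])

    def rowCount(self, index):
        return self._data.shape[1]

    def columnCount(self, index):
        return self._data.shape[2]


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        data = np.arange(2 * 4 * 3).reshape(2, 4, 3)
        self.model = SliceModel(data)

        self.table = QtWidgets.QTableView()
        self.table.setModel(self.model)

        spin = QtWidgets.QSpinBox()
        spin.setMaximum(data.shape[0] - 1)
        spin.valueChanged.connect(self.model.set_slice)

        container = QtWidgets.QWidget()
        layout = QtWidgets.QVBoxLayout(container)
        layout.addWidget(spin)
        layout.addWidget(self.table)
        self.setCentralWidget(container)


app = QtWidgets.QApplication(sys.argv)
window = MainWindow()
window.show()
app.exec_()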

Pandas

Pandas is a Python library commonly used for data manipulation and analysis. It provides a nice API for loading 2D tabular data from various data sources and performing data analysis on it. By using a pandas DataFrame as your QTableView model you can use these APIs to load and analyse your data from right within your application.

Using Pandas as a data source

The modifications of the model to work with pandas are fairly minor, requiring changes to the indexing in the data method and modifications to rowCount and columnCount. The changes for rowCount and columnCount are identical to numpy with pandas using a _data.shape tuple to represent the dimensions of the data.

For indexing we use the DataFrame's .iloc indexer, for indexed locations — i.e. lookup by column and/or row index. This is done by passing the row, and then the column, to the slice _data.iloc[index.row(), index.column()].

The following complete example shows how to display a pandas data frame using Qt QTableView via a custom model.

PyQt5:

import sys

from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtCore import Qt
import pandas as pd


class TableModel(QtCore.QAbstractTableModel):
    def __init__(self, data):
        super(TableModel, self).__init__()
        self._data = data

    def data(self, index, role):
        if role == Qt.DisplayRole:
            value = self._data.iloc[index.row(), index.column()]
            return str(value)

    def rowCount(self, index):
        return self._data.shape[0]

    def columnCount(self, index):
        return self._data.shape[1]

    def headerData(self, section, orientation, role):
        # section is the index of the column/row.
        if role == Qt.DisplayRole:
            if orientation == Qt.Horizontal:
                return str(self._data.columns[section])

            if orientation == Qt.Vertical:
                return str(self._data.index[section])


class MainWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()

        self.table = QtWidgets.QTableView()

        data = pd.DataFrame([
            [1, 9, 2],
            [1, 0, -1],
            [3, 5, 2],
            [3, 3, 2],
            [5, 8, 9],
        ], columns=['A', 'B', 'C'],
           index=['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5'])

        self.model = TableModel(data)
        self.table.setModel(self.model)

        self.setCentralWidget(self.table)


app = QtWidgets.QApplication(sys.argv)
window = MainWindow()
window.show()
app.exec_()

PySide2: the code is identical apart from the imports, which become:

import sys

from PySide2 import QtCore, QtGui, QtWidgets
from PySide2.QtCore import Qt
import pandas as pd

An interesting extension here is to use the table header of the QTableView to display the pandas row and column header values, which can be taken from DataFrame.index and DataFrame.columns respectively.

QTableView pandas DataFrame, with column and row headers

For this we need to implement a Qt.DisplayRole handler in a custom headerData method. This receives section, the index of the row/column (0…n), orientation which can be either Qt.Horizontal for the column headers, or Qt.Vertical for the row headers, and role which works the same as for the data method.

The headerData method also receives other roles, which can be used to customise the appearance of the headers further.
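For example, a sketch along these lines (the styling choices are assumptions, not from the tutorial) makes the column headers bold and centres all header text by handling two additional roles:

def headerData(self, section, orientation, role):
    if role == Qt.DisplayRole:
        if orientation == Qt.Horizontal:
            return str(self._data.columns[section])

        if orientation == Qt.Vertical:
            return str(self._data.index[section])

    if role == Qt.FontRole and orientation == Qt.Horizontal:
        font = QtGui.QFont()
        font.setBold(True)   # Bold column headers only.
        return font

    if role == Qt.TextAlignmentRole:
        return Qt.AlignCenter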

Conclusion

In this tutorial we've covered the basics of using QTableView and a custom model to display tabular data in your applications. This was extended to demonstrate how to format data and decorate cells with icons and colours. Finally, we demonstrated using QTableView with tabular data from numpy and pandas data structures, including displaying custom column and row headers.

Roberto Alsina: Episode 21: Quickie: pre-commit hooks

Trying new things:

A short video showing a tool that is not as well known as it should be: http://pre-commit.com -- quality control for your commits.

Daniel Roy Greenfeld: Feed Generator

Late last year I changed my blog engine yet again. I've been happy with it so far, with the exception of XML feeds. The tooling I chose doesn't have good support for feeds, certainly not with the filtering I need. Specifically, I need to have a python feed, a family feed, and so on. As much as I love my wife and daughter, non-technical posts about them probably don't belong on places where this post will show up.

After trying to work within the framework of the blog engine (Vuepress), I got tired of fighting abstraction and gave up. My blog wouldn't have an XML feed.

Solution

Last night I decided to go around the problem. In 30 minutes I coded up a solution, a Python script that bypasses the Vuepress abstraction. You can see it below:

"""
generate_feed.py

Usage:

    python generate_feed.py TAGHERE

Note:

    Works with Python 3.8, untested otherwise.
"""

from glob import glob
import sys

try:
    from feedgen.feed import FeedGenerator
    from yaml import safe_load
    from markdown2 import Markdown
except ImportError:
    print("You need to install pyyaml, feedgen, and markdown2")
    sys.exit(1)


if __name__ == "__main__":

    try:
        tag = sys.argv[1]
    except IndexError:
        print('Add a tag argument such as "python"')
        sys.exit(1)

    # TODO - convert to argument
    YEARS = [
        "2020",
    ]

    markdowner = Markdown(extras=["fenced-code-blocks", ])

    fg = FeedGenerator()
    fg.id("https://daniel.roygreenfeld.com/")
    fg.title("pydanny")
    fg.author(
        {
            "name": "Daniel Roy Greenfeld",
            "email": "daniel.roy.greenfeld@roygreenfeld.com",
        }
    )
    fg.link(href="https://daniel.roygreenfeld.com", rel="alternate")
    fg.logo("https://daniel.roygreenfeld.com/images/personalPhoto.png")
    fg.subtitle("Inside the Head of Daniel Roy Greenfeld")
    fg.link(href=f"https://daniel.roygreenfeld.com/atom.{tag}.xml", rel="self")
    fg.language("en")

    years = [f"_posts/posts{x}/*.md" for x in YEARS]
    years.sort()
    years.reverse()

    def read_post(filename):
        with open(filename) as f:
            raw = f.read()[3:]

        config = safe_load(raw[: raw.index("---")])
        content = raw[raw.index("---") + 3 :]

        return config, content

    feed = []

    for year in years:
        posts = glob(year)
        posts.sort()
        posts.reverse()
        for post in posts:
            config, content = read_post(post)
            if tag not in config["tags"]:
                continue

            # add the metadata
            print(config["title"])
            entry = fg.add_entry()
            entry.id(f'https://daniel.roygreenfeld.com/{config["slug"]}.html')
            entry.title(config["title"])
            entry.description(config["description"])
            entry.pubDate(config["date"])

            # Add the content
            content = markdowner.convert(content)
            entry.content(content, type="html")

    print(fg.atom_str(pretty=True))
    fg.atom_file(f".vuepress/public/feeds/{tag}.atom.xml")

You call this on my blog for all python tagged content by running it thus:

python generate_feed.py python

The result validates per W3C and should work everywhere. Yeah!

Summary

This is what I've always enjoyed about Python. In a very short time I can throw together a script that makes my life better.

Daniel Roy Greenfeld: Our New Django Book Has Launched!

Audrey and I wrote a new book titled Django Crash Course. You can get it right now on our website at roygreenfeld.com/products/django-crash-course. Right now it's in alpha, which means only the e-book is available. Later we'll produce it in print formats (perfect bound, spiral, and hardcover).

As the book is in alpha, you're encouraged to submit bug reports to us for errors that you find. In turn we will give you credit for your contributions in not just the e-book, but also in the print paperback and online publicly on the web. This is your opportunity to have your name in one of our books as a contributor, which you are then welcome to add to your resume and LinkedIn profile. We followed the same pattern with our Two Scoops of Django books.

Check it out!

Cover for Django Crash Course

Django Crash Course is designed to build solid foundations for any developer looking to get quickly and solidly proficient with Django 3. Once you've finished the book, you'll be able to purchase Django Crash Course extensions on topics such as deployment on various platforms, Django REST Framework (DRF), Javascript frameworks like VueJS and/or React, third-party packages, and more.

Some of My Favorite Features

Friendly to Data Scientists

We chose Conda as our Python environment and pip for dependency management. This makes getting everything right across different operating systems for students very straightforward. These tools also empower us to create data-focused extensions to the core book.

Class-Based Views

Our opinion is that beginners should be taught Class-Based Views (CBVs) from the start. Knowing CBVs from the start makes understanding critical CBV-based packages like Django REST Framework much easier. We also believe that the explicit nature of the GET, POST, and other HTTP methods is easier for beginners to grasp. Years of success at levelling up people with Django supports our opinion.
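
As a rough illustration of that explicitness (a generic sketch, not an excerpt from the book; the view name is made up), a class-based view spells out each HTTP method as its own method:

from django.http import HttpResponse
from django.views import View


class CheeseListView(View):
    def get(self, request):
        # GET requests are handled here; a POST handler would be a separate
        # post() method, making the supported HTTP methods explicit.
        return HttpResponse("A list of cheeses would go here.")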

Theme!

Tutorial themes are cheesy. That's why we use cheese as the theme for the main project in the book!

We also want to be the most successful dairy-themed tech book authors in the universe.

Conclusion

If you'd like to buy the book (or learn more about it), do so on The Django Crash Course page on roygreenfeld.com.


tryexceptpass: Episode 3 - Decoupling Database Migrations at Application Startup

  • Data models change and evolve with your application.
  • There’s plenty of tools that keep track of database schemas and automatically generate scripts to upgrade or downgrade them.
  • It’s common for developers to run a migration at the start of their app before running app code.
  • Our author explains two common problems with this approach.
    1. Modern day production deployments and horizontal scaling can get you into a race condition.
    2. You start assuming that new code will only ever run with the new schema.
  • You can decouple migrations from code changes by disabling parallelism during this time.
  • Make it a separate command or lock the database during the upgrade.
  • We can easily implement locking ourselves in any language.
    • Use Redis locks if you’re ok with something external to the DB.
    • Use the DB itself by writing to an extra table to say that you’re upgrading it (see the sketch after this list).
  • Plan your deployment appropriately so you can run old code with new by making migrations additive in the short term.
  • Using a script at startup that optionally performs the migration based on an environment variable integrates well with Docker and cloud services.
  • Upgrades of both code and data should be part of your testing BEFORE releasing to production.
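
For instance, here is a minimal sketch of that idea (an illustration only, not from the episode: it assumes PostgreSQL and psycopg2 and uses an advisory lock rather than an extra table; the lock id, environment variable names, and the run_migrations callable are all hypothetical):

import os

import psycopg2

MIGRATION_LOCK_ID = 712358  # any constant integer shared by every app instance


def migrate_if_needed(dsn, run_migrations):
    # Serialise the migration: only one instance at a time gets past the lock.
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_advisory_lock(%s)", (MIGRATION_LOCK_ID,))
            run_migrations(conn)  # your actual schema upgrade goes here
            cur.execute("SELECT pg_advisory_unlock(%s)", (MIGRATION_LOCK_ID,))
        conn.commit()
    finally:
        conn.close()


# Gate the migration behind an environment variable so the same image can start
# with or without running it, which fits Docker entrypoints and cloud services.
if os.environ.get("RUN_MIGRATIONS") == "1":
    migrate_if_needed(os.environ["DATABASE_URL"],
                      run_migrations=lambda conn: None)  # placeholder upgrade function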

Codementor: Deploy a Flask app on AWS EC2

Deploy your Flask app to AWS EC2 in simple steps

Continuum Analytics Blog: 6 Reasons Your Open-Source Data Science Pipeline Needs Attention Now

Anwesha Das: “Code is law” .. Technology is the regulator for today.

Does technology lead and the law follow, or is it vice versa? Both sides try to prove their superiority over the other, and that has led to much distance between the two. Both of these species use language (legal and programming) to attain their vision, yet neither understands what the other is saying. Instead, they stand in awe, disbelief, or fear of each other and of each other's work. Technologists have regarded law as a distant creature, a thing to be afraid of.

The end of the 20th century marked the beginning of a new world, the digital world. This world was designed for accessible communication, the sharing of knowledge, and as a home for the mind. It created a sphere where people could be treated as equals, without being discriminated against on the grounds of race, sex, color, caste, or religion. It is a world that “is both everywhere and nowhere.” It is the reality of today. We now have a digital world parallel to our physical world, and the two are too conjoined to be separated.

The early internet was designed to be anonymous, private, stateless, and borderless. In 1992, Sir Timothy John Berners-Lee seamed the world together with the World Wide Web. Nevertheless, subsequent innovations destroyed those early virtues. The internet now operates in exact opposition to its fundamental ethos: the systems which once promised anonymity and privacy have become the biggest panopticon the world has ever seen.

The early internet had a stateless architecture for interaction. It was impossible to trace the “state” of even a moment ago. Sites never asked the user to log in, so it was impossible to track someone’s digital activity, which kept the user’s privacy safe. However, it was terrible for business and commerce. Therefore Lou Montulli, a web-browser programmer at Netscape, invented the protocol for “cookies” in 1994. This protocol, carried over HTTP, enabled sites to know who visited them. It made it easy to target the user and tear apart the user’s privacy.

Similarly, the internet had a “borderless” structure. TCP/IP does not, by design, know who the user is, the user's geographic location, or the user's activity. Later, IP mapping technology was invented to answer the questions of who, what, and where.

Legal systems around the world have tried to cope with this change by enacting and amending different sets of Acts, Rules, and Regulations, but technology moves much faster than the law. It is time for the Legislature and the Judiciary to set aside their glasses of disbelief, and for technologists to get comfortable with the law. It is time for them to take a step back and think about how the innocent piece of computer program they write interjects into, and affects, human rights, lives, and society.

Thinkers from as early as the late ’90s tried to enunciate the nature of the relationship between law and technology. In February 1998, Joel R. Reidenberg laid down the path of technology being a regulator in itself in his Lex Informatica article, “The Formulation of Information Policy Rules through Technology.” In 1999, Lawrence Lessig consolidated these ideas in his book Code and Other Laws of Cyberspace. The quote from the book which stood out is “Code is Law.”

Why and how is code law?

Law is what society needs, and law is what regulates society. However, several conditions apart from law also affect that regulation: norms, commerce, and functional physical conditions. The Legislature is affected by these attributes and takes them into account while drafting laws. Thereby the abovementioned conditions affect lawmaking.

In both of the abovementioned examples, the change in the TCP/IP protocol and the introduction of the IP mapping system, commerce needed a change. The change took place through a piece of code, and it resulted in an alteration of behavior. Therefore society, and then law, both changed due to the introduction of a small piece of code.

Therefore technologists need to understand that the code they write has severe practical consequences. They do not have the luxury of ignorance, of thinking that they are only dealing with technologies and not with human lives. Ethics for programmers has become more crucial than ever, as they are the people who possess the power to make the world a better place, and at the same time to destroy it, all with an innocent piece of code. It is time for lawyers, technologists, and activists to come together to bring awareness to society.

PyBites: Introduction to Python Functions

While seemingly "simple" to comprehend and use, functions can definitely be a bit of a hurdle to overcome when you're new to Python or programming in general. In this article I'm going to break down what a function is and how you can use them to be a better coder.

What is a function?

Straight to the point, I like it.

Let's just say that a function is a reusable bit of code. It could be a single line of code or it could be many. It'll only have one purpose and it'll do it well.

Pretty vague right? Well that's kind of unavoidable. When you understand the ideology behind a function, then you'll start to get it.

Let's put the FUN in function!

Excuse the sub-header, I just had to say it.

First, let's look at why we need functions. Why do we need reusable bits of code?

The quickest answer is that it makes repetition easy.

For a moment, let's replace the word "function" with the word task. Think of a repetitive task that you have in your daily life. How about taking a shower.

If we break down taking a shower into a series of steps we could say that a shower consists of:

  1. Turning on the water.
  2. Using shampoo.
  3. Using soap.
  4. Rinsing off.
  5. Turning off the water.

If we were to write that in Python code it could look like this:

print("Turned on the water.")print("Used shampoo.")print("Used soap.")print("Rinsed off.")print("Turned off the water.")

Nice! We have the code to print out the steps or actions required for taking a shower.

Here's the thing though. What if, as part of a video game a player can choose to "take a shower" when certain things happen? Am I really going to write out the code for taking a shower every single time it's needed? That's five lines of code that I'd have to include for every scenario where they could choose to take a shower.

Bring on the fun(ction)!

This is why functions are so amazing. We can take those five repetitive steps for taking a shower and wrap them up in a nice little ball to be called whenever needed.

Here's what a function for taking a shower would look like:

def take_a_shower():
    print("Turned on the water.")
    print("Used shampoo.")
    print("Used soap.")
    print("Rinsed off.")
    print("Turned off the water.")

To break it down:

  1. The def is how we indicate we're writing a function.
  2. We give the function a nice, user friendly name.
  3. The empty ()s allow you to specify any objects/variables you want to pass into the function. We don't need any right now so don't worry about that for now.
  4. Any code that is considered to be part of our function needs to be indented by 4 spaces. Thus, all of our print statements are in our take_a_shower function.

Calling the function

We have a take_a_shower function that contains our steps. Now what?

Well, any time a player chooses to take a shower, we can just call the take_a_shower function and it'll print out our actions for taking a shower. The way to call a function is to use the function's name followed by parentheses:

take_a_shower()

It makes more sense when you use it. We could add it to a game like this:

ifplayer=="stinky":
    take_a_shower()else:
    print("You smell fantastic! Let's go eat some chocolate!")

In this scenario, if the player is stinky, then the game calls the take_a_shower function. Otherwise the player gets to go eat some chocolate. When the take_a_shower function is called, it executes the lines of code we put in the function, i.e., our 5 print statements.

Simplistic but powerful

I know this seems inane and simplistic but I wanted to demonstrate it with something relatable.

The reality here is we can break down just about anything into chunks and create functions out of them, ready to be called whenever necessary.

Rather than having a long page of code, try breaking your code up into functions that can be called at any time.

When you're starting out with Python and creating really basic programs for yourself, some easy functions to create for yourself could include:

  • Basic math, e.g.: a + b
  • A text menu system for your program
  • Collecting user input
  • Different aspects/tasks of a text based game
  • Randomisation
  • Timers
  • Date Calculation

The list is essentially endless but you get the point. As a tiny illustration, the first idea might look like the sketch below.
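
A minimal sketch of the "basic math" idea, this time using a parameter list and a return value (the names are just an example):

def add(a, b):
    # Return the sum of the two numbers passed in.
    return a + b


total = add(3, 4)
print(total)  # 7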

Take Action

If you're new to Python and functions, go look at any code you've written to date and see how you can refactor your code and use functions instead.

Once done, leave a comment below and tell us about the functions you created!

Keep Calm and Code in Python!

-- Julian

With so many avenues to pursue in Python it can be tough to know what to do. If you're looking for some direction or want to take your Python code and career to the next level, schedule a call with us now. We can help you!

Mike Driscoll: How to Check if a File is a Valid Image with Python

Python has many modules in its standard library. One that is often overlooked is imghdr which lets you identify what image type is contained in a file, byte stream or path-like object.

The imghdr module can recognize the following image types:

  • rgb
  • gif
  • pbm
  • pgm
  • ppm
  • tiff
  • rast
  • xbm
  • jpeg / jpg
  • bmp
  • png
  • webp
  • exr

Here is how you would use imghdr to detect the image type of a file:

>>> import imghdr
>>> path = 'python.jpg'
>>> imghdr.what(path)
'jpeg'
>>> path = 'python.png'
>>> imghdr.what(path)
'png'

All you need to do is pass a path to imghdr.what(path) and it will tell you what it thinks the image type is.
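
For example, you could wrap this in a small helper that only accepts certain types (a quick sketch; the function name and the allowed set are just illustrative):

import imghdr


def is_valid_image(path, allowed=("jpeg", "png", "gif")):
    # imghdr.what() returns None when it does not recognize the file.
    return imghdr.what(path) in allowed


print(is_valid_image('python.jpg'))  # True for a real JPEG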

An alternative would be to use the Pillow package, which you can install with pip if you don’t already have it.

Here is how you can use Pillow:

>>> from PIL import Image
>>> img = Image.open('/home/mdriscoll/Pictures/all_python.jpg')
>>> img.format
'JPEG'

This method is almost as easy as using imghdr. In this case, you need to create an Image object and then check its format attribute. Pillow supports more image types than imghdr, but the documentation doesn’t really say whether the format attribute will work for all of those image types.
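
If you also want to catch truncated or corrupt files, Pillow's Image.verify() can help. Here is a rough sketch (the helper name is just an example; verify() can raise different exception types depending on the format, hence the broad except):

from PIL import Image


def pillow_thinks_valid(path):
    try:
        with Image.open(path) as img:
            img.verify()  # raises an exception for broken files
        return True
    except Exception:
        return False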

Anyway, I hope this helps you in identifying the image type of your files.

The post How to Check if a File is a Valid Image with Python appeared first on The Mouse Vs. The Python.


Mike Driscoll: PyDev of the Week: Paul Sokolovsky

This week we welcome Paul Sokolovsky as our PyDev of the Week! Paul is the creator of Pycopy, which is described as “a minimalist and memory-efficient Python implementation for constrained systems, microcontrollers, and just everything”. You can check out more of his contributions to open source on Github. Let’s take a few moments to get to know Paul better!

Paul Sokolovsky

Can you tell us a little about yourself (hobbies, education, etc):

I have Computer Science as my first masters, and later got another masters in Linguistics – when I was a CS student I was interested in the Natural Language Processing subfield of AI, and wanted to get a formal degree to work in that area, perhaps in academia, but that never panned out, I got sucked up into the IT industry, a common story ;-).

Hobbies – well, nothing special, I like to travel, and even if a plane carries me far away, I like to get on my feet and explore like humans did for millennia. Though if there’s a motorbike for rent, I like to ride it to a more distant mountain before climbing it. My latest interest is history. Like, everyone took history lessons in school and might have their “favorite” history of a particular country at a particular timeframe, but trying to grasp the history of mankind across the mentioned millennia is a different matter.

Why did you start using Python?

Oh, like many students, at that age I drooled over the Lisp and Scheme programming languages. I did a few projects in them, and while they were definitely great and I could grok them, it occurred to me that I wasn’t sure about the rest of the world. Programming is an inherently social activity. And besides the power of those languages, their drawbacks were also evident, and while I was able to surmount them, other people might be not just unable, but even unwilling, to do that.

So, I started my quest for the best-compromise programming language, sifting through dozens of both mainstream and obscure languages of that time. I stopped when I found Python. I think of it as “Lisp for the real world”. Those were the times of Python 1.5.1…

What other programming languages do you know and which is your favorite?

Based on the above, it shouldn’t come as a surprise that Python is my favorite language. I know a bunch of scripting languages – Perl, PHP, Java, JavaScript, Lisp, Scheme – and more “systemish” ones like C and C++. I definitely watch the space and keep an eye on Go and Rust, which are approaching the mainstream, and niche contenders like Nim, Zig, whatever. I don’t rush into using them – again, I passed that stage of language-hopping when I was a student.

What projects are you working on now?

My biggest project currently is Pycopy, which is a lightweight and minimalist implementation of Python, with a library and software ecosystem around it. While I loved Python all along, that was one of the problems I’ve seen with it – it’s too big, so I couldn’t use it everywhere I wanted, like on small embedded devices such as routers or WiFi access points (the kind that would nowadays be called “IoT devices”). I had to leave Python and try to get involved with smaller languages like Lua, but I failed to acquire Stockholm syndrome for them, and always came back to Python, and thus made zero progress with some interesting projects requiring small and cheap devices, like Smart Home.

That’s why, when I heard about the MicroPython Kickstarter campaign and read some technical descriptions which were very solid, I got instantly hooked and urged its author to open up the source soon after the conclusion of the very successful campaign, instead of at the time of shipping the devices – all to allow open-source cooperation. I’ve contributed to MicroPython for around 4 years since, having written some 30% of the codebase. Sadly, going forward, I felt that the ideas of minimalism with which we started were being betrayed, while a conflict of interest was growing between a contributor doing work in his own free time, based on beliefs and ideals, and a small business which needs to satisfy its customers to stay afloat and generate profit. So, we parted ways.

That’s how Pycopy came to be. Unlike MicroPython, which came to concentrate on “microcontrollers” (a mistake repeated from another small Python implementation, PyMite, and in my opinion its downfall), Pycopy wants to be a full-stack language for “everything”. The ideal is to use one language on your desktop/laptop, in the cloud if you use that, on mobile devices if you care to run your software on them, and all the way down to microcontrollers – though, while cool, that last case still remains a relatively niche one, with small but Linux-based devices often more accessible and cost-effective.

So, beyond the core interpreter with a minimal set of modules, the Pycopy project offers a standard library, which strives to be compatible with CPython while still remaining small, a web micro-framework, database drivers, SDL (graphics library) bindings, FFMPEG (video/audio library) bindings, a recently developed binding for LLVM, which is a compiler backend for easily developing accelerator JIT engines – something the Python community direly needs to counter those simpleton claims that “Python is slow” – and many other things. It’s really cool, you should check it out. It’s definitely something I wanted to develop my whole life, and I’m glad I’ll be content with having done that when I grow old ;-).

One important thing is that all the above isn’t done to some formal plan or to reimplement everything that “big” Python has. It’s all done based on demand/interest from myself and contributors. For example, I got interested in playing with video surveillance, and that’s why I implemented FFMPEG bindings to access video from a camera (instead of streaming it into an unknown party’s cloud). I see this as a recent, maybe niche, but definite trend – coming up with human-scale computing – instead of corporate-scale or industry-scale – something which a mere human can understand as a whole, modify for one’s own needs, and extend. That’s exactly the idea behind the Pycopy project! And don’t get me wrong – we need corporate-scale and industry-scale projects too, that’s what I do at my day job. But in my own free time, which generally belongs to my family and fellow humans with whom I share common interests, I’d rather do something human-sized.

Any other Python projects caught your attention recently?

One great project I spotted too late last August is PPCI, Pure Python Compiler Infrastructure, by Windel Bouwman. “Too late” because I usually watch project spaces of interest to me – for example, I’m aware of 3-4 C language compilers implemented in Python. But PPCI is a true gem, living up to its promise of being not just a small ad-hoc compiler, but an entire compiler infrastructure, from parsing down to “almost industrial grade” optimization and code generation. And with all that, it’s largely the result of one man’s work in his own free time, which again shows the power Python brings. All in all, it’s the kind of project I always dreamed of working on, but was never brave enough to start. I now try to carve out time to contribute to it, and envision it becoming a go-to project for all people interested in compilers and optimization.

The second project is a recent find. As an intro, it should by now be clear that I don’t just develop my own project, but keep an eye on the wider community’s progress. In my opinion, it’s at the base of the open-source idea that duplication of effort should be avoided, but if it happens, the author of such a project should have a clear answer, for both themselves and their users/contributors, of why the duplication happens. So, I already mentioned PyMite as a “spiritual successor” of MicroPython/Pycopy.

Another similar project was TinyPy. Both of these projects implemented just a small subset of the language, so in effect they were Python-like languages that could barely run any real Python code. That’s why I jumped on the MicroPython bandwagon, which promised (and delivered) almost complete compatibility with the Python language (but not the libraries, though users could add those themselves). Despite that, TinyPy is a really cool project; e.g. it implements the bytecode compiler in Python itself (something which I’m doing now for Pycopy too). All the more interesting was to find out that some guy decided to revive the old TinyPy and build a new project called TPython++ based on it. To me personally it looks like an ad-hoc attempt to marry it with a game engine, but I can’t know it all. If anything, it shows how vibrant and wide-reaching the Python community is.

Thanks for doing the interview, Paul!

The post PyDev of the Week: Paul Sokolovsky appeared first on The Mouse Vs. The Python.


Catalin George Festila: Python 3.7.5 : Using the hug framework - part 001.

Today I will start another tutorial series about the hug framework. The hug framework is among the top three best-performing web frameworks for Python and offers one of the cleanest ways to create HTTP REST APIs in Python 3. The official webpage, with a good learning area, can be found on the hug web page. Let's install this Python package.

[mythcat@desk projects]$ mkdir hug_001
[mythcat@desk projects]$ cd hug_001/
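
The excerpt stops before the installation itself, but to give a taste of what hug code looks like, here is a minimal sketch (the file name, route, and message are my own illustration, and it assumes hug has already been installed with pip install hug):

# hello_api.py - a minimal hug endpoint (illustrative sketch)
import hug

@hug.get('/hello')
def hello(name='world'):
    """Return a small JSON payload; hug serializes the dict for us."""
    return {'message': 'Hello, {}!'.format(name)}

You could then serve it with hug -f hello_api.py and open http://localhost:8000/hello?name=Python in a browser.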

Mike Driscoll: How to Check if a File is a Valid Image with Python


Python has many modules in its standard library. One that is often overlooked is imghdr, which lets you identify what type of image is contained in a file, byte stream, or path-like object.

The imghdr module can recognize the following image types:

  • rgb
  • gif
  • pbm
  • pgm
  • ppm
  • tiff
  • rast
  • xbm
  • jpeg / jpg
  • bmp
  • png
  • webp
  • exr

Here is how you would use imghdr to detect the image type of a file:

>>> import imghdr
>>> path = 'python.jpg'
>>> imghdr.what(path)
'jpeg'
>>> path = 'python.png'
>>> imghdr.what(path)
'png'

All you need to do is pass a path to imghdr.what(path) and it will tell you what it thinks the image type is.
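
Because imghdr.what() returns None when it does not recognize the file, it also works as a quick sanity check. Here is a small sketch of that idea (the helper name and the set of allowed types are my own, not part of the imghdr API):

import imghdr

# Only accept a few common web-friendly formats (hypothetical policy)
ALLOWED_TYPES = {'jpeg', 'png', 'gif', 'bmp'}

def is_valid_image(path):
    image_type = imghdr.what(path)  # None if the file is not a recognized image
    return image_type in ALLOWED_TYPES

print(is_valid_image('python.jpg'))  # True for a real JPEG
print(is_valid_image('notes.txt'))   # False - imghdr.what() returns None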

An alternative method would be to use the Pillow package, which you can install with pip if you don’t already have it.

Here is how you can use Pillow:

>>> from PIL import Image
>>> img = Image.open('/home/mdriscoll/Pictures/all_python.jpg')
>>> img.format
'JPEG'

This method is almost as easy as using imghdr. In this case, you need to create an Image object and then check its format attribute. Pillow supports more image types than imghdr, but the documentation doesn’t really say whether the format attribute will work for all of those image types.
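
If you also want to confirm that the file contains readable image data rather than just a recognizable header, one approach (a sketch, assuming a reasonably recent Pillow; the helper name is just for illustration) is to open the file and call its verify() method, treating the exceptions Pillow raises as "not a valid image":

from PIL import Image

def is_readable_image(path):
    # Image.open() raises OSError when Pillow cannot identify the file,
    # and verify() raises an exception if the image data itself is broken.
    try:
        with Image.open(path) as img:
            img.verify()
        return True
    except (OSError, SyntaxError):
        return False

print(is_readable_image('/home/mdriscoll/Pictures/all_python.jpg'))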

Anyway, I hope this helps you in identifying the image type of your files.

The post How to Check if a File is a Valid Image with Python appeared first on The Mouse Vs. The Python.



