How to seamlessly support typing.Protocol on Python versions both older and newer than 3.8, at the same time.
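A minimal sketch of the kind of version-dependent import involved (assuming the typing_extensions backport is installed on older Pythons; this is not necessarily the post's final recommendation):

```python
import sys

# Protocol lives in typing from Python 3.8 onwards; on older
# versions it is available from the typing_extensions backport.
if sys.version_info >= (3, 8):
    from typing import Protocol
else:
    from typing_extensions import Protocol  # assumed installed

class SupportsClose(Protocol):
    def close(self) -> None: ...

class Resource:
    # No inheritance from SupportsClose needed: Protocol
    # matching is structural, not nominal.
    def close(self) -> None:
        print("closed")

def shutdown(resource: SupportsClose) -> None:
    resource.close()

shutdown(Resource())  # prints "closed"
```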
Hynek Schlawack: typing.Protocol Across Python Versions
ItsMyCode: [Solved] ImportError: No module named matplotlib.pyplot
The ImportError: No module named matplotlib.pyplot occurs when you have not installed the Matplotlib library and try to run a script that contains matplotlib-related code. Another possibility is that you are not importing matplotlib.pyplot properly in your Python code.
In this tutorial, let’s look at installing the matplotlib module correctly on different operating systems and at resolving the No module named matplotlib.pyplot error.
ImportError: No module named matplotlib.pyplot
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Matplotlib is not a built-in module (it does not come with the default Python installation), so you need to install it explicitly using the pip installer before you can use it.
If you are looking for how to install pip, or if you are getting an error while installing it, check out pip: command not found to resolve the issue.
Matplotlib releases are available as wheel packages for macOS, Windows and Linux on PyPI. Install it using pip:
Install Matplotlib in OSX/Linux
The recommended way to install the matplotlib module is using pip (or pip3 for Python 3), assuming you have pip installed already.
Using Python 2
$ sudo pip install matplotlib
Using Python 3
$ sudo pip3 install matplotlib
Alternatively, if you have easy_install in your system, you can install matplotlib using the below command.
Using easy install
$ sudo easy_install -U matplotlib
For CentOS
$ yum install python-matplotlib
For Ubuntu
To install matplotlib module on Debian/Ubuntu :
$ sudo apt-get install python3-matplotlib
Install Matplotlib in Windows
On Windows, you can use pip or pip3, depending on your Python version, to install the matplotlib module.
$ pip3 install matplotlib
If you have not added pip to the PATH environment variable, you can run the below command instead, which will install the matplotlib module.
$ py -m pip install matplotlib
Install Matplotlib in Anaconda
Matplotlib is available via the Anaconda main channel and can be installed using the following command.
$ conda install matplotlib
You can also install it via the conda-forge community channel by running the below command.
$ conda install -c conda-forge matplotlib
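Before digging into import statements, you can also check from within Python whether matplotlib is visible to your interpreter at all. A quick sketch using only the standard library — handy when you have several Python installations and pip may have installed the package into a different one:

```python
import importlib.util

# find_spec returns a ModuleSpec if the package is importable
# by this interpreter, and None otherwise.
spec = importlib.util.find_spec("matplotlib")
if spec is None:
    print("matplotlib is not installed for this interpreter")
else:
    print("matplotlib found at:", spec.origin)
```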
If you have installed it properly but it still throws an error, you need to check the import statement in your code. In order to plot the charts properly, you need to import matplotlib.pyplot as shown below.
# importing the matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
# car sales data
total_sales = [3000, 2245, 1235, 5330, 4200]
location = ['Bangalore', 'Delhi', 'Chennai', 'Mumbai', 'Kolkatta']
# Seaborn color palette to plot pie chart
colors = sns.color_palette('pastel')
# create pie chart using matplotlib
plt.pie(total_sales, labels=location, colors=colors)
plt.show()
Malthe Borch: PowerShell Remoting on Windows using Airflow
Apache Airflow is an open-source platform that allows you to programmatically author, schedule and monitor workflows. It comes with out-of-the-box integration to lots of systems, but the adage that the devil's in the details holds true with integration in general and remote execution is no exception – in particular PowerShell Remoting which comes with Windows as part of WinRM (Windows Remote Management).
In this post, I'll share some insights from a recent project on how to use Airflow to orchestrate the execution of Windows jobs without giving up on security.
Traditionally, job scheduling was done using agent software. An agent running locally as a system service would wake up and execute jobs at the scheduled time, reporting results back to a central system.
The configuration of the job schedule is either done by logging into the system itself or using a control channel. For example, the agent might connect to a central system to pull down work orders.
Meanwhile, Airflow has no such agents! Conveniently, WinRM works in push mode. It's a service running on Windows that you connect to using HTTP (or HTTPS). It's basically like connecting to a database and running a stored procedure.
From a security perspective, push mode is fundamentally different because traffic is initiated externally. While we might want to implement a thin agent to overcome this difference, such code is a liability on its own. Luckily, PowerShell Remoting comes with a framework that allows us to substantially limit the attack surface.
The aptly named Just-Enough-Administration (JEA) framework is basically sudo on steroids. It allows us to use PowerShell as an API, constraining the remote management interface to a configurable set of commands and executing as a specific user.
We can avoid running arbitrary code entirely by encapsulating the implementation details in predefined commands. In addition, we also separate the remote user that connects to the WinRM service from the user context that executes commands.
You can use PowerShell Remoting without JEA and/or constrained endpoints. But the intersection of Airflow and Windows is typically a bigger company or organization where security concerns mean that you want both of these.
As an aside, I mentioned stored procedures earlier on. Using JEA to change context to a different user is the equivalent of Definer's Rights vs Invoker's Rights. Arguably, in a system-to-system integration, using Definer's Rights helps reduce the attack surface because you can define and encapsulate the required functionality.
The steps required to register a JEA configuration are relatively straightforward, so I won't describe them in detail here. In summary, registering a JEA configuration can be as simple as defining a single role capabilities file and running a command to register the configuration.
Now, enter Airflow!
To get started, you'll need to add the PowerShell Remoting Protocol Provider to your Airflow installation.
Add a connection by providing the hostname of your Windows machine, username and password. If you're using HTTP (rather than HTTPS) then you should set up the connection to require Kerberos authentication such that credentials are not sent in clear text (in addition, WinRM will encrypt the protocol traffic using the Kerberos session key).
To require Kerberos authentication, provide {"auth": "kerberos"} in the connection extras. Most of the extra configuration options from the underlying Python library pypsrp are available as connection extras. For example, a JEA configuration (if using) can be specified using the "configuration_name" key.
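Put together, the extras field for a connection that requires Kerberos and targets a JEA endpoint might look like this (the endpoint name "JEAMaintenance" is a made-up placeholder, not a real configuration):

```python
import json

# Connection extras for the PSRP provider; "auth" and
# "configuration_name" are the keys described above.
# The endpoint name itself is hypothetical.
extras = {
    "auth": "kerberos",
    "configuration_name": "JEAMaintenance",
}
print(json.dumps(extras))
```

You would paste this JSON into the connection's Extra field in the Airflow UI.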
You will need to install additional Python packages to use Kerberos. Here's a requirements file with the necessary dependencies:
apache-airflow-providers-microsoft-psrp
gssapi
krb5
pypsrp[kerberos]
Finally, a note on transport security. When WinRM is used with an HTTP listener, Kerberos authentication (acting as trusted 3rd party) supplants the use of SSL/TLS through the transparent encryption scheme employed by the protocol. You can configure WinRM to support only Kerberos (by default, "Negotiate" is also enabled) to ensure that all connections are secured in this way. Note that your IT department might still insist on using HTTPS.
Historically, Windows machines feel worse over time for no particular reason. It's common to restart them once in a while. We can use Airflow to do that!
from airflow.providers.microsoft.psrp.operators.psrp import PSRPOperator

default_args = {
    "psrp_conn_id": <connection id>,
}

with DAG(..., default_args=default_args) as dag:
    # "task_id" defaults to the value of "cmdlet" so we can omit it here.
    restart_computer = PSRPOperator(
        cmdlet="Restart-Computer",
        parameters={"Force": None},
    )
This will restart the computer forcefully (which is not a good idea, but it illustrates the use of parameters). In the example, "Force" is a switch, so we pass a value of None, but values can be numbers, strings, lists and even dictionaries.
In the first example, we saw how task_id defaults to the value of cmdlet. That is sometimes useful, but it's not the only way we can cut verbosity.
PowerShell cmdlets (and functions which for our purposes are the same thing) follow the naming convention verb-noun. When we define our own commands, we can for example use the verb "Invoke", e.g. "Invoke-Job1". But invoking stuff is something we do all the time in Airflow and we don't want our task ids to have this meaningless prefix all over the place.
Here's an example of fixing that, making good use of Airflow's templating syntax:
from airflow.providers.microsoft.psrp.operators.psrp import PSRPOperator

default_args = {
    "psrp_conn_id": <connection id>,
    "cmdlet": "Invoke-{{ task.task_id }}",
}

with DAG(..., default_args=default_args) as dag:
    # "cmdlet" here will be provided automatically as "Invoke-Job1".
    job1 = PSRPOperator(task_id="Job1")
Windows can have its verb-noun naming convention and we get to have short task ids.
By default, Airflow serializes operator output using XComs – a simple means of passing state between tasks.
Since XComs must be JSON-serializable, the PSRPOperator automatically converts PowerShell output values to JSON using ConvertTo-Json and then deserializes them in Python; Airflow then reserializes the result when saving it to the database (there's room for optimization there). The point is that most of the time, you don't have to worry about it.
You can for example list a directory using Get-ChildItem and the resulting table will be returned as a list of dicts. Note that PowerShell has some flattening magic which generally does the right thing in terms of return values:
That is, functions don't really return a single value. Instead, there is a stream of output values stemming from each command being executed.
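As a rough illustration of what arrives on the Python side (the property names below are invented for the example, not the actual Get-ChildItem output schema), the JSON produced by ConvertTo-Json deserializes into plain lists and dicts:

```python
import json

# Hypothetical JSON for a two-file directory listing, as it might
# come back from ConvertTo-Json on the PowerShell side.
raw = '[{"Name": "report.txt", "Length": 1024}, {"Name": "data.csv", "Length": 2048}]'
items = json.loads(raw)

# The table becomes a list of dicts, one per file
print([item["Name"] for item in items])  # → ['report.txt', 'data.csv']
```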
With do_xcom_push set to False, no XComs are saved and the conversion to JSON does not happen either.
PowerShell has a number of other streams besides the output stream. These are logged to Airflow's task log by default. Unlike the default logging setup, debug output is also included unless it is explicitly turned off via logging_level; one justification for this is given in the next section.
In traditional automation, command echoing has been a simple way to figure out what a script is doing. PowerShell is a different beast altogether, but it is possible to expose the commands being executed using Set-PSDebug.
from pypsrp.powershell import Command, CommandParameter

PS_DEBUG = Command(
    cmd="Set-PSDebug",
    args=(CommandParameter(name="Trace", value=1),),
    is_script=False,
)

default_args = {
    "psrp_conn_id": <connection id>,
    "psrp_session_init": PS_DEBUG,
}
This requires that Set-PSDebug is listed under "VisibleCmdlets" in the role capabilities (like ConvertTo-Json if using XComs).
A tracing line will be sent for each line passed over during execution at logging level debug, but as mentioned above, this will nonetheless get included in the task log by default. Don't enable this and have a loop that iterates hundreds of times. You will quickly fill up the task log with useless messages.
Happy remoting!
IslandT: Move chess piece on the chessboard with python
Hello, it is me again, and this is the second article about the chess game project which I created earlier with Python. In this article, I have updated the previous program so that it can now relocate a chess piece on the chessboard to a new location after I click on any square. This is only the first step, because after this I will make the piece slide slowly along the path to its destination, as well as make sure the piece is actually allowed to move to that square; for example, a pawn can only move straight ahead and can only move diagonally when it captures another piece. All of that will take another level of planning, but for now let us just concentrate on relocating the piece.
The pawn will originally be situated on one of the squares, just like in the previous article, but after I click on a new square it will relocate to that square, regardless of whether it is allowed to move there or not!
The entire plan to achieve this is to get the dictionary key of that square and then plug in the key to the chess_dict dictionary to get the coordinates needed to draw the new position of the sprite.
# print the square name which you have clicked on
for key, value in chess_dict.items():
    if (x * width, y * width) == (value[0], value[1]):
        print(key)
        previous_square_list.append(key)  # insert the next square
        if len(previous_square_list) > 1:
            previous_square_list.remove(previous_square_list[0])
The above snippet saves the new key whenever the user clicks on one of the squares on the chessboard.
This will draw the chess piece on the new location…
# draw the position of the pawn sprite
if len(previous_square_list) == 0:
    screen.blit(pawn0, (0, 64))  # just testing...
else:
    # this will draw the pawn at the new position
    screen.blit(pawn0, chess_dict[previous_square_list[0]])
At first, before any click, the piece appears in its original location; after someone clicks a square, the chess piece is drawn at the new location.
Here is the entire code…
import sys, pygame
import math

pygame.init()

size = width, height = 512, 512
white = 255, 178, 102
black = 126, 126, 126
highlight = 192, 192, 192
title = "IslandT Chess"
width = 64  # width of the square
original_color = ''

# empty chess dictionary
chess_dict = {}

# chess square list
chess_square_list = [
    "a8", "b8", "c8", "d8", "e8", "f8", "g8", "h8",
    "a7", "b7", "c7", "d7", "e7", "f7", "g7", "h7",
    "a6", "b6", "c6", "d6", "e6", "f6", "g6", "h6",
    "a5", "b5", "c5", "d5", "e5", "f5", "g5", "h5",
    "a4", "b4", "c4", "d4", "e4", "f4", "g4", "h4",
    "a3", "b3", "c3", "d3", "e3", "f3", "g3", "h3",
    "a2", "b2", "c2", "d2", "e2", "f2", "g2", "h2",
    "a1", "b1", "c1", "d1", "e1", "f1", "g1", "h1",
]

# chess square position
chess_square_position = []

# pawn image
pawn0 = pygame.image.load("pawn.png")

# create a list to map name of column and row
for i in range(0, 8):  # control row
    for j in range(0, 8):  # control column
        chess_square_position.append((j * width, i * width))

# create a dictionary to map name of column and row
for n in range(0, len(chess_square_position)):
    chess_dict[chess_square_list[n]] = chess_square_position[n]

screen = pygame.display.set_mode(size)
pygame.display.set_caption(title)

rect_list = list()  # this is the list of brown rectangles

# the previously touched square
previous_square_list = []

# use this loop to create a list of brown rectangles
for i in range(0, 8):  # control the row
    for j in range(0, 8):  # control the column
        if i % 2 == 0:  # which means it is an even row
            if j % 2 != 0:  # which means it is an odd column
                rect_list.append(pygame.Rect(j * width, i * width, width, width))
        else:
            if j % 2 == 0:  # which means it is an even column
                rect_list.append(pygame.Rect(j * width, i * width, width, width))

# create main surface and fill the base color with light brown color
chess_board_surface = pygame.Surface(size)
chess_board_surface.fill(white)

# next draw the dark brown rectangles on the chess board surface
for chess_rect in rect_list:
    pygame.draw.rect(chess_board_surface, black, chess_rect)

while True:
    # display the chess surface
    screen.blit(chess_board_surface, (0, 0))

    # draw the position of the pawn sprite
    if len(previous_square_list) == 0:
        screen.blit(pawn0, (0, 64))  # just testing...
    else:
        # this will draw the pawn at the new position
        screen.blit(pawn0, chess_dict[previous_square_list[0]])

    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
        elif event.type == pygame.MOUSEBUTTONDOWN:
            pos = event.pos
            x = math.floor(pos[0] / width)
            y = math.floor(pos[1] / width)
            # print the square name which you have clicked on
            for key, value in chess_dict.items():
                if (x * width, y * width) == (value[0], value[1]):
                    print(key)
                    previous_square_list.append(key)  # insert the next square
                    if len(previous_square_list) > 1:
                        previous_square_list.remove(previous_square_list[0])
            original_color = chess_board_surface.get_at((x * width, y * width))
            pygame.draw.rect(chess_board_surface, highlight,
                             pygame.Rect(x * width, y * width, 64, 64))
        elif event.type == pygame.MOUSEBUTTONUP:
            pos = event.pos
            x = math.floor(pos[0] / width)
            y = math.floor(pos[1] / width)
            pygame.draw.rect(chess_board_surface, original_color,
                             pygame.Rect(x * width, y * width, 64, 64))

    pygame.display.update()
Here is the outcome…
I hope you like it, the next step is to make the piece slide along the board as well as to allow it to move in the direction it is supposed to move to!
PyPy: Natural Language Processing for Icelandic with PyPy: A Case Study
Natural Language Processing for Icelandic with PyPy: A Case Study
Icelandic is one of the smallest languages in the world, with about 370,000 speakers. It is a language in the Germanic family, most similar to Norwegian, Danish and Swedish, but closer to the original Old Norse spoken throughout Scandinavia until about the 14th century CE.
As with other small languages, there are worries that the language may not survive in a digital world, where all kinds of fancy applications are developed first - and perhaps only - for the major languages. Voice assistants, chatbots, spelling and grammar checking utilities, machine translation, etc., are increasingly becoming staples of our personal and professional lives, but if they don’t exist for Icelandic, Icelanders will gravitate towards English or other languages where such tools are readily available.
Iceland is a technology-savvy country, with world-leading adoption rates of the Internet, PCs and smart devices, and a thriving software industry. So the government figured that it would be worthwhile to fund a 5-year plan to build natural language processing (NLP) resources and other infrastructure for the Icelandic language. The project focuses on collecting data and developing open source software for a range of core applications, such as tokenization, vocabulary lookup, n-gram statistics, part-of-speech tagging, named entity recognition, spelling and grammar checking, neural language models and speech processing.
My name is Vilhjálmur Þorsteinsson, and I’m the founder and CEO of a software startup Miðeind in Reykjavík, Iceland, that employs 10 software engineers and linguists and focuses on NLP and AI for the Icelandic language. The company participates in the government’s language technology program, and has contributed significantly to the program’s core tools (e.g., a tokenizer and a parser), spelling and grammar checking modules, and a neural machine translation stack.
When it came to a choice of programming languages and development tools for the government program, the requirements were for a major, well supported, vendor-and-OS-agnostic FOSS platform with a large and diverse community, including in the NLP space. The decision to select Python as a foundational language for the project was a relatively easy one. That said, there was a bit of trepidation around the well known fact that CPython can be slow for inner-core tasks, such as tokenization and parsing, that can see heavy workloads in production.
I first became aware of PyPy in early 2016 when I was developing a crossword game Netskrafl in Python 2.7 for Google App Engine. I had a utility program that compressed a dictionary into a Directed Acyclic Word Graph and was taking 160 seconds to run on CPython 2.7, so I tried PyPy and to my amazement saw a 4x speedup (down to 38 seconds), with literally no effort besides downloading the PyPy runtime.
This led me to select PyPy as the default Python interpreter for my company’s Python development efforts as well as for our production websites and API servers, a role in which it remains to this day. We have followed PyPy’s upgrades along the way, being just about to migrate our minimally required language version from 3.6 to 3.7.
In NLP, speed and memory requirements can be quite important for software usability. On the other hand, NLP logic and algorithms are often complex and challenging to program, so programmer productivity and code clarity are also critical success factors. A pragmatic approach balances these factors, avoids premature optimization and seeks a careful compromise between maximal run-time efficiency and minimal programming and maintenance effort.
Turning to our use cases, our Icelandic text tokenizer "Tokenizer" is fairly light, runs tight loops and performs a large number of small, repetitive operations. It runs very well on PyPy’s JIT and has not required further optimization.
Our Icelandic parser Greynir (known on PyPI as reynir) is, if I may say so myself, a piece of work. It parses natural language text according to a hand-written context-free grammar, using an Earley-type algorithm as enhanced by Scott and Johnstone. The CFG contains almost 7,000 nonterminals and 6,000 terminals, and the parser handles ambiguity as well as left, right and middle recursion. It returns a packed parse forest for each input sentence, which is then pruned by a scoring heuristic down to a single best result tree.
This parser was originally coded in pure Python and turned out to be unusably slow when run on CPython - but usable on PyPy, where it was 3-4x faster. However, when we started applying it to heavier production workloads, it became apparent that it needed to be faster still. We then proceeded to convert the innermost Earley parsing loop from Python to tight C++ and to call it from PyPy via CFFI, with callbacks for token-terminal matching functions (“business logic”) that remained on the Python side. This made the parser much faster (on the order of 100x faster than the original on CPython) and quick enough for our production use cases. Even after moving much of the heavy processing to C++ and using CFFI, PyPy still gives a significant speed boost over CPython.
Connecting C++ code with PyPy proved to be quite painless using CFFI, although we had to figure out a few magic incantations in our build module to make it compile smoothly during setup from source on Windows and macOS in addition to Linux. Of course, we build binary PyPy and CPython wheels for the most common targets so most users don’t have to worry about setup requirements.
With the positive experience from the parser project, we proceeded to take a similar approach for two other core NLP packages: our compressed vocabulary package BinPackage (known on PyPI as islenska) and our trigrams database package Icegrams. These packages both take large text input (3.1 million word forms with inflection data in the vocabulary case; 100 million tokens in the trigrams case) and compress it into packed binary structures. These structures are then memory-mapped at run-time using mmap and queried via Python functions with a lookup time in the microseconds range. The low-level data structure navigation is done in C++, called from Python via CFFI. The ex-ante preparation, packing, bit-fiddling and data structure generation is fast enough with PyPy, so we haven’t seen a need to optimize that part further.
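The general pattern, stripped to its bare bones (the real BinPackage and Icegrams formats are far more elaborate), is to pack records into a binary file once and then mmap it for random access at query time:

```python
import mmap
import os
import struct
import tempfile

# Pack a tiny "database" of three 32-bit little-endian integers
# into a binary file (stand-in for the real packing step).
path = os.path.join(tempfile.mkdtemp(), "packed.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<3I", 10, 20, 30))

# Memory-map the file and fetch one record by index without
# reading the whole file into memory.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    record = struct.unpack_from("<I", mm, 1 * 4)[0]  # second record
    print(record)  # → 20
    mm.close()
```

Because the pages are mapped read-only, the OS can also share them between worker processes, which helps keep memory usage down under gunicorn-style multi-process serving.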
To showcase our tools, we host public (and open source) websites such as greynir.is for our parsing, named entity recognition and query stack and yfirlestur.is for our spell and grammar checking stack. The server code on these sites is all Python running on PyPy using Flask, wrapped in gunicorn and hosted on nginx. The underlying database is PostgreSQL accessed via SQLAlchemy and psycopg2cffi. This setup has served us well for 6 years and counting, being fast, reliable and having helpful and supporting communities.
As can be inferred from the above, we are avid fans of PyPy and commensurately thankful for the great work by the PyPy team over the years. PyPy has enabled us to use Python for a larger part of our toolset than CPython alone would have supported, and its smooth integration with C/C++ through CFFI has helped us attain a better tradeoff between performance and programmer productivity in our projects. We wish for PyPy a great and bright future and also look forward to exciting related developments on the horizon, such as HPy.
Podcast.__init__: Achieve Repeatable Builds Of Your Software On Any Machine With Earthly
Summary
It doesn’t matter how amazing your application is if you are unable to deliver it to your users. Frustrated with the rampant complexity involved in building and deploying software Vlad A. Ionescu created the Earthly tool to reduce the toil involved in creating repeatable software builds. In this episode he explains the complexities that are inherent to building software projects and how he designed the syntax and structure of Earthly to make it easy to adopt for developers across all language environments. By adopting Earthly you can use the same techniques for building on your laptop and in your CI/CD pipelines.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Vlad A. Ionescu about Earthly, a syntax and runtime for software builds to reduce friction between development and delivery
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Earthly is and the story behind it?
- What are the core principles that engineers should consider when designing their build and delivery process?
- What are some of the common problems that engineers run into when they are designing their build process?
- What are some of the challenges that are unique to the Python ecosystem?
- What is the role of Earthly in the overall software lifecycle?
- What are the other tools/systems that a team is likely to use alongside Earthly?
- What are the components that Earthly might replace?
- How is Earthly implemented?
- What were the core design requirements when you first began working on it?
- How have the design and goals of Earthly changed or evolved as you have explored the problem further?
- What is the workflow for a Python developer to get started with Earthly?
- How can Earthly help with the challenge of managing Javascript and CSS assets for web application projects?
- What are some of the challenges (technical, conceptual, or organizational) that an engineer or team might encounter when adopting Earthly?
- What are some of the features or capabilities of Earthly that are overlooked or misunderstood that you think are worth exploring?
- What are the most interesting, innovative, or unexpected ways that you have seen Earthly used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Earthly?
- When is Earthly the wrong choice?
- What do you have planned for the future of Earthly?
Keep In Touch
- @VladAIonescu on Twitter
- Website
Picks
- Tobias
- Shape Up book
- Vlad
- High Output Management by Andy Grove
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
- Earthly
- Bazel
- Pants
- ARM
- AWS Graviton
- Apple M1 CPU
- Qemu
- Phoenix web framework for Elixir language
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Python GUIs: Packaging PyQt5 applications into a macOS app with PyInstaller (updated for 2022)
There is not much fun in creating your own desktop applications if you can't share them with other people — whether that means publishing them commercially, sharing them online or just giving them to someone you know. Sharing your apps allows other people to benefit from your hard work!
The good news is there are tools available to help you do just that with your Python applications which work well with apps built using PyQt5. In this tutorial we'll look at the most popular tool for packaging Python applications: PyInstaller.
This tutorial is broken down into a series of steps, using PyInstaller to build first simple, and then more complex PyQt5 applications into distributable macOS app bundles. You can choose to follow it through completely, or skip to the parts that are most relevant to your own project.
We finish off by building a macOS Disk Image, the usual method for distributing applications on macOS.
You always need to compile your app on your target system. So, if you want to create a Mac .app you need to do this on a Mac, for an EXE you need to use Windows.
Example Disk Image Installer for macOS
If you're impatient, you can download the Example Disk Image for macOS first.
Requirements
PyInstaller works out of the box with PyQt5 and as of writing, current versions of PyInstaller are compatible with Python 3.6+. Whatever project you're working on, you should be able to package your apps.
You can install PyInstaller using pip
.
pip3 install PyInstaller
If you experience problems packaging your apps, your first step should always be to update PyInstaller and the hooks package to the latest versions using
pip3 install --upgrade PyInstaller pyinstaller-hooks-contrib
The hooks module contains package-specific packaging instructions for PyInstaller which is updated regularly.
Install in virtual environment (optional)
You can also opt to install PyQt5 and PyInstaller in a virtual environment (or your application's virtual environment) to keep your environment clean.
python3 -m venv packenv
Once created, activate the virtual environment by running from the command line —
source packenv/bin/activate
Finally, install the required libraries. For PyQt5 you would use —
pip3 install PyQt5 PyInstaller
Getting Started
It's a good idea to start packaging your application from the very beginning so you can confirm that packaging is still working as you develop it. This is particularly important if you add additional dependencies. If you only think about packaging at the end, it can be difficult to debug exactly where the problems are.
For this example we're going to start with a simple skeleton app, which doesn't do anything interesting. Once we've got the basic packaging process working, we'll extend the application to include icons and data files. We'll confirm the build as we go along.
To start with, create a new folder for your application and then add the following skeleton app in a file named app.py. You can also download the source code and associated files.
from PyQt5 import QtWidgets
import sys
class MainWindow(QtWidgets.QMainWindow):
def __init__(self):
super().__init__()
self.setWindowTitle("Hello World")
l = QtWidgets.QLabel("My simple app.")
l.setMargin(10)
self.setCentralWidget(l)
self.show()
if __name__ == '__main__':
app = QtWidgets.QApplication(sys.argv)
w = MainWindow()
app.exec()
This is a basic bare-bones application which creates a custom QMainWindow
and adds a simple widget QLabel
to it. You can run this app as follows.
python app.py
This should produce the following window (on macOS).
Simple skeleton app in PyQt5
Building a basic app
Now we have our simple application skeleton in place, we can run our first build test to make sure everything is working.
Open your terminal (command prompt) and navigate to the folder containing your project. You can now run the following command to run the PyInstaller build.
pyinstaller --windowed app.py
The --windowed
flag is necessary to tell PyInstaller to build a macOS .app
bundle.
You'll see a number of messages output, giving debug information about what PyInstaller is doing. These are useful for debugging issues in your build, but can otherwise be ignored. The output that I get for running the command on my system is shown below.
martin@MacBook-Pro pyqt5 % pyinstaller --windowed app.py
74 INFO: PyInstaller: 4.8
74 INFO: Python: 3.9.9
83 INFO: Platform: macOS-10.15.7-x86_64-i386-64bit
84 INFO: wrote /Users/martin/app/pyqt5/app.spec
87 INFO: UPX is not available.
88 INFO: Extending PYTHONPATH with paths
['/Users/martin/app/pyqt5']
447 INFO: checking Analysis
451 INFO: Building because inputs changed
452 INFO: Initializing module dependency graph...
455 INFO: Caching module graph hooks...
463 INFO: Analyzing base_library.zip ...
3914 INFO: Processing pre-find module path hook distutils from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks/pre_find_module_path/hook-distutils.py'.
3917 INFO: distutils: retargeting to non-venv dir '/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9'
6928 INFO: Caching module dependency graph...
7083 INFO: running Analysis Analysis-00.toc
7091 INFO: Analyzing /Users/martin/app/pyqt5/app.py
7138 INFO: Processing module hooks...
7139 INFO: Loading module hook 'hook-PyQt5.QtWidgets.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7336 INFO: Loading module hook 'hook-xml.etree.cElementTree.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7337 INFO: Loading module hook 'hook-lib2to3.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7360 INFO: Loading module hook 'hook-PyQt5.QtGui.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7397 INFO: Loading module hook 'hook-PyQt5.QtCore.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7422 INFO: Loading module hook 'hook-encodings.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7510 INFO: Loading module hook 'hook-distutils.util.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7513 INFO: Loading module hook 'hook-pickle.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7515 INFO: Loading module hook 'hook-heapq.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7517 INFO: Loading module hook 'hook-difflib.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7519 INFO: Loading module hook 'hook-PyQt5.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7564 INFO: Loading module hook 'hook-multiprocessing.util.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7565 INFO: Loading module hook 'hook-sysconfig.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7574 INFO: Loading module hook 'hook-xml.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7677 INFO: Loading module hook 'hook-distutils.py' from '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks'...
7694 INFO: Looking for ctypes DLLs
7712 INFO: Analyzing run-time hooks ...
7715 INFO: Including run-time hook '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks/rthooks/pyi_rth_subprocess.py'
7719 INFO: Including run-time hook '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks/rthooks/pyi_rth_pkgutil.py'
7722 INFO: Including run-time hook '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks/rthooks/pyi_rth_multiprocessing.py'
7726 INFO: Including run-time hook '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks/rthooks/pyi_rth_inspect.py'
7727 INFO: Including run-time hook '/usr/local/lib/python3.9/site-packages/PyInstaller/hooks/rthooks/pyi_rth_pyqt5.py'
7736 INFO: Looking for dynamic libraries
7977 INFO: Looking for eggs
7977 INFO: Using Python library /usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/Python
7987 INFO: Warnings written to /Users/martin/app/pyqt5/build/app/warn-app.txt
8019 INFO: Graph cross-reference written to /Users/martin/app/pyqt5/build/app/xref-app.html
8032 INFO: checking PYZ
8035 INFO: Building because toc changed
8035 INFO: Building PYZ (ZlibArchive) /Users/martin/app/pyqt5/build/app/PYZ-00.pyz
8390 INFO: Building PYZ (ZlibArchive) /Users/martin/app/pyqt5/build/app/PYZ-00.pyz completed successfully.
8397 INFO: EXE target arch: x86_64
8397 INFO: Code signing identity: None
8398 INFO: checking PKG
8398 INFO: Building because /Users/martin/app/pyqt5/build/app/PYZ-00.pyz changed
8398 INFO: Building PKG (CArchive) app.pkg
8415 INFO: Building PKG (CArchive) app.pkg completed successfully.
8417 INFO: Bootloader /usr/local/lib/python3.9/site-packages/PyInstaller/bootloader/Darwin-64bit/runw
8417 INFO: checking EXE
8418 INFO: Building because console changed
8418 INFO: Building EXE from EXE-00.toc
8418 INFO: Copying bootloader EXE to /Users/martin/app/pyqt5/build/app/app
8421 INFO: Converting EXE to target arch (x86_64)
8449 INFO: Removing signature(s) from EXE
8484 INFO: Appending PKG archive to EXE
8486 INFO: Fixing EXE headers for code signing
8496 INFO: Rewriting the executable's macOS SDK version (11.1.0) to match the SDK version of the Python library (10.15.6) in order to avoid inconsistent behavior and potential UI issues in the frozen application.
8499 INFO: Re-signing the EXE
8547 INFO: Building EXE from EXE-00.toc completed successfully.
8549 INFO: checking COLLECT
WARNING: The output directory "/Users/martin/app/pyqt5/dist/app" and ALL ITS CONTENTS will be REMOVED! Continue? (y/N)y
On your own risk, you can use the option `--noconfirm` to get rid of this question.
10820 INFO: Removing dir /Users/martin/app/pyqt5/dist/app
10847 INFO: Building COLLECT COLLECT-00.toc
12460 INFO: Building COLLECT COLLECT-00.toc completed successfully.
12469 INFO: checking BUNDLE
12469 INFO: Building BUNDLE because BUNDLE-00.toc is non existent
12469 INFO: Building BUNDLE BUNDLE-00.toc
13848 INFO: Moving BUNDLE data files to Resource directory
13901 INFO: Signing the BUNDLE...
16049 INFO: Building BUNDLE BUNDLE-00.toc completed successfully.
If you look in your folder you'll notice you now have two new folders dist
and build
.
build & dist folders created by PyInstaller
Below is a truncated listing of the folder content, showing the build
and dist
folders.
.
├── app.py
├── app.spec
├── build
│   └── app
│       ├── Analysis-00.toc
│       ├── COLLECT-00.toc
│       ├── EXE-00.toc
│       ├── PKG-00.pkg
│       ├── PKG-00.toc
│       ├── PYZ-00.pyz
│       ├── PYZ-00.toc
│       ├── app
│       ├── app.pkg
│       ├── base_library.zip
│       ├── warn-app.txt
│       └── xref-app.html
└── dist
    ├── app
    │   ├── libcrypto.1.1.dylib
    │   ├── PyQt5
    │   │   ...
    │   ├── app
    │   └── Qt5Core
    └── app.app
The build
folder is used by PyInstaller to collect and prepare the files for bundling; it contains the results of the analysis and some additional logs. For the most part, you can ignore the contents of this folder, unless you're trying to debug issues.
The dist
(for "distribution") folder contains the files to be distributed. This includes your application, bundled as an executable file, together with any associated libraries (for example PyQt5) and binary .so
files.
Since we provided the --windowed
flag above, PyInstaller has actually created two builds for us. The folder app
is a simple folder containing everything you need to be able to run your app. PyInstaller also creates an app bundle app.app
which is what you will usually distribute to users.
The app
folder is a useful debugging tool, since you can easily see the libraries and other packaged data files.
You can try running your app yourself now, either by double-clicking on the app bundle, or by running the executable file, named app
from the dist
folder. In either case, after a short delay you'll see the familiar window of your application pop up as shown below.
Simple app, running after being packaged
In the same folder as your Python file, alongside the build
and dist
folders PyInstaller will have also created a .spec
file. In the next section we'll take a look at this file, what it is and what it does.
The Spec file
The .spec
file contains the build configuration and instructions that PyInstaller uses to package up your application. Every PyInstaller project has a .spec
file, which is generated based on the command line options you pass when running pyinstaller
.
When we ran pyinstaller
with our script, we didn't pass in anything other than the name of our Python application file and the --windowed
flag. This means our spec file currently contains only the default configuration. If you open it, you'll see something similar to what we have below.
# -*- mode: python ; coding: utf-8 -*-
block_cipher = None
a = Analysis(['app.py'],
             pathex=[],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             hooksconfig={},
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
          cipher=block_cipher)
exe = EXE(pyz,
          a.scripts,
          [],
          exclude_binaries=True,
          name='app',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          console=False,
          disable_windowed_traceback=False,
          target_arch=None,
          codesign_identity=None,
          entitlements_file=None)
coll = COLLECT(exe,
               a.binaries,
               a.zipfiles,
               a.datas,
               strip=False,
               upx=True,
               upx_exclude=[],
               name='app')
app = BUNDLE(coll,
             name='app.app',
             icon=None,
             bundle_identifier=None)
The first thing to notice is that this is a Python file, meaning you can edit it and use Python code to calculate values for the settings. This is mostly useful for complex builds, for example when you are targeting different platforms and want to conditionally define additional libraries or dependencies to bundle.
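As a quick illustration, here's a minimal sketch of the kind of conditional logic you could put in a .spec file. The file names are purely illustrative, and the resulting list would be passed into the Analysis block:

```python
# Sketch: compute build settings in plain Python inside a .spec file.
import sys

datas = [("icons", "icons")]  # (source, destination) tuples

if sys.platform == "darwin":
    # Hypothetical example: only bundle a macOS .icns icon on macOS builds.
    datas.append(("Hello World.icns", "."))

# This list would then be passed as Analysis(..., datas=datas, ...)
```

Because the spec file is executed at build time, any Python expression works here, which is handy for multi-platform projects.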
Because we used the --windowed
command line flag, the console argument of EXE is set to False
. If this is True
a console window will be shown when your app is launched -- not what you usually want for a GUI application.
Once a .spec
file has been generated, you can pass this to pyinstaller
instead of your script to repeat the previous build process. Run this now to rebuild your executable.
pyinstaller app.spec
The resulting build will be identical to the build used to generate the .spec
file (assuming you have made no changes). For many PyInstaller configuration changes you have the option of passing command-line arguments, or modifying your existing .spec
file. Which you choose is up to you.
Tweaking the build
So far we've created a simple first build of a very basic application. Now we'll look at a few of the most useful options that PyInstaller provides to tweak our build. Then we'll go on to look at building more complex applications.
Naming your app
One of the simplest changes you can make is to provide a proper "name" for your application. By default the app takes the name of your source file (minus the extension), for example main
or app
. This isn't usually what you want.
You can provide a nicer name for PyInstaller to use for the app (and dist
folder) by editing the .spec
file to add a name=
under the EXE, COLLECT and BUNDLE blocks.
exe = EXE(pyz,
          a.scripts,
          [],
          exclude_binaries=True,
          name='Hello World',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          console=False
          )
coll = COLLECT(exe,
               a.binaries,
               a.zipfiles,
               a.datas,
               strip=False,
               upx=True,
               upx_exclude=[],
               name='Hello World')
app = BUNDLE(coll,
             name='Hello World.app',
             icon=None,
             bundle_identifier=None)
The name under EXE is the name of the executable file, the name under BUNDLE is the name of the app bundle.
Alternatively, you can re-run the pyinstaller
command and pass the -n
or --name
configuration flag along with your app.py
script.
pyinstaller -n "Hello World" --windowed app.py
# or
pyinstaller --name "Hello World" --windowed app.py
The resulting app bundle will be given the name Hello World.app
and the unpacked build placed in the folder dist/Hello World/
.
Application with custom name "Hello World"
The name of the .spec
file is taken from the name passed in on the command line, so this will also create a new spec file for you, called Hello World.spec
in your root folder.
Make sure you delete the old app.spec
file to avoid getting confused editing the wrong one.
Application icon
By default PyInstaller app bundles come with the following icon in place.
Default PyInstaller application icon, on app bundle
You will probably want to customize this to make your application more recognisable. This can be done easily by passing the --icon
command line argument, or editing the icon=
parameter of the BUNDLE section of your .spec
file. For macOS app bundles you need to provide an .icns
file.
app = BUNDLE(coll,
name='Hello World.app',
icon='Hello World.icns',
bundle_identifier=None)
To create macOS icons from images you can use the image2icon tool.
If you now re-run the build (by using the command line arguments, or running with your modified .spec
file) you'll see the specified icon file is now set on your application bundle.
Custom application icon on the app bundle
On macOS application icons are taken from the application bundle. If you repackage your app and run the bundle you will see your app icon on the dock!
Custom application icon on the dock
Data files and Resources
So far our application consists of just a single Python file, with no dependencies. Most real-world applications are a bit more complex, and typically ship with associated data files such as icons or UI design files. In this section we'll look at how we can accomplish this with PyInstaller, starting with a single file and then bundling complete folders of resources.
First let's update our app with some more buttons and add icons to each.
from PyQt5.QtWidgets import QMainWindow, QApplication, QLabel, QVBoxLayout, QPushButton, QWidget
from PyQt5.QtGui import QIcon
import sys
class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Hello World")

        layout = QVBoxLayout()

        label = QLabel("My simple app.")
        label.setMargin(10)
        layout.addWidget(label)

        button1 = QPushButton("Hide")
        button1.setIcon(QIcon("icons/hand.png"))
        button1.pressed.connect(self.lower)
        layout.addWidget(button1)

        button2 = QPushButton("Close")
        button2.setIcon(QIcon("icons/lightning.png"))
        button2.pressed.connect(self.close)
        layout.addWidget(button2)

        container = QWidget()
        container.setLayout(layout)

        self.setCentralWidget(container)
        self.show()

if __name__ == '__main__':
    app = QApplication(sys.argv)
    w = MainWindow()
    app.exec_()
In the folder with this script, add a folder icons
which contains two icons in PNG format, hand.png
and lightning.png
. You can create these yourself, or get them from the source code download for this tutorial.
Run the script now and you will see a window showing two buttons with icons.
Window with two buttons with icons.
Even if you don't see the icons, keep reading!
Dealing with relative paths
There is a gotcha here, which might not be immediately apparent. To demonstrate it, open up a shell and change to the folder where our script is located. Run it with
python3 app.py
If the icons are in the correct location, you should see them. Now change to the parent folder, and try and run your script again (change <folder>
to the name of the folder your script is in).
cd ..
python3 <folder>/app.py
Window with two buttons with icons missing.
The icons don't appear. What's happening?
We're using relative paths to refer to our data files. These paths are relative to the current working directory -- not the folder your script is in. So if you run the script from elsewhere it won't be able to find the files.
One common reason for icons not to show up, is running examples in an IDE which uses the project root as the current working directory.
This is a minor issue before the app is packaged, but once it's installed it will be started with its current working directory as the root /
folder -- your app won't be able to find anything. We need to fix this before we go any further, which we can do by making our paths relative to our application folder.
In the updated code below, we define a new variable basedir
, using os.path.dirname
to get the containing folder of __file__
which holds the full path of the current Python file. We then use this to build the relative paths for icons using os.path.join()
.
Since our app.py
file is in the root of our folder, all other paths are relative to that.
from PyQt5.QtWidgets import QMainWindow, QApplication, QLabel, QVBoxLayout, QPushButton, QWidget
from PyQt5.QtGui import QIcon
import sys, os
basedir = os.path.dirname(__file__)
class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Hello World")

        layout = QVBoxLayout()

        label = QLabel("My simple app.")
        label.setMargin(10)
        layout.addWidget(label)

        button1 = QPushButton("Hide")
        button1.setIcon(QIcon(os.path.join(basedir, "icons", "hand.png")))
        button1.pressed.connect(self.lower)
        layout.addWidget(button1)

        button2 = QPushButton("Close")
        button2.setIcon(QIcon(os.path.join(basedir, "icons", "lightning.png")))
        button2.pressed.connect(self.close)
        layout.addWidget(button2)

        container = QWidget()
        container.setLayout(layout)

        self.setCentralWidget(container)
        self.show()

if __name__ == '__main__':
    app = QApplication(sys.argv)
    w = MainWindow()
    app.exec_()
Try and run your app again from the parent folder -- you'll find that the icons now appear as expected on the buttons, no matter where you launch the app from.
Packaging the icons
So now we have our application showing icons, and they work wherever the application is launched from. Package the application again with pyinstaller "Hello World.spec"
and then try and run it again from the dist
folder as before. You'll notice the icons are missing again.
Window with two buttons with icons missing.
The problem now is that the icons haven't been copied to the dist/Hello World
folder -- take a look in it. Our script expects the icons to be a specific location relative to it, and if they are not, then nothing will be shown.
This same principle applies to any other data files you package with your application, including Qt Designer UI files, settings files or source data. You need to ensure that relative path structures are replicated after packaging.
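One way to catch this class of problem early is to check at startup that the expected data files actually exist next to the application. The helper below is a hypothetical sketch, not part of the tutorial's code:

```python
# Hypothetical startup check: report bundled data files that are missing,
# rather than silently showing buttons without icons.
import os

def missing_resources(basedir, *relpaths):
    """Return the relative paths that don't exist under basedir."""
    return [p for p in relpaths if not os.path.exists(os.path.join(basedir, p))]

# In the app, right after computing basedir, you might call:
#     missing = missing_resources(basedir, "icons/hand.png", "icons/lightning.png")
#     if missing:
#         print("Warning: missing data files:", missing)
```

Running this from a freshly built dist folder makes a forgotten --add-data entry obvious immediately.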
Bundling data files with PyInstaller
For the application to continue working after packaging, the files it depends on need to be in the same relative locations.
To get data files into the dist
folder we can instruct PyInstaller to copy them over.
PyInstaller accepts a list of individual paths to copy, together with a folder path relative to the dist/<app name>
folder where it should copy them to. As with other options, this can be specified by command line arguments or in the .spec
file.
Files specified on the command line are added using --add-data
, passing the source file and destination folder separated by a colon :
.
The path separator is platform-specific: Linux or Mac use :
, on Windows use ;
pyinstaller --windowed --name="Hello World" --icon="Hello World.icns" --add-data="icons/hand.png:icons" --add-data="icons/lightning.png:icons" app.py
Here we've specified the destination location as icons
. The path is relative to the root of our application's folder in dist
-- so dist/Hello World
with our current app. The path icons
means a folder named icons
under this location, so dist/Hello World/icons
. Putting our icons right where our application expects to find them!
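Incidentally, the platform-specific separator mentioned above matches Python's os.pathsep, so if you generate the pyinstaller command from a build script you can derive it rather than remember it. A small sketch:

```python
# Sketch: choose the --add-data separator for the current platform.
# os.name is "nt" on Windows (';'); macOS and Linux are "posix" (':').
import os

sep = ";" if os.name == "nt" else ":"
add_data_arg = f"icons/hand.png{sep}icons"
```

On macOS or Linux this produces icons/hand.png:icons, ready to pass to --add-data.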
You can also specify data files via the datas
list in the Analysis section of the spec file, shown below.
a = Analysis(['app.py'],
             pathex=[],
             binaries=[],
             datas=[('icons/hand.png', 'icons'), ('icons/lightning.png', 'icons')],
             hiddenimports=[],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
Then rebuild from the .spec
file with
pyinstaller "Hello World.spec"
In both cases we are telling PyInstaller to copy the specified files to the location ./icons/
in the output folder, meaning dist/Hello World/icons
. If you run the build, you should see your .png
files are now in the dist
output folder, under a folder named icons.
The icon file copied to the dist folder
If you run your app from dist
you should now see the icons in your window as expected!
Window with two buttons with icons, finally!
Bundling data folders
Usually you will have more than one data file you want to include with your packaged file. The latest PyInstaller versions let you bundle folders just like you would files, keeping the sub-folder structure.
Let's update our configuration to bundle our icons folder in one go, so it will continue to work even if we add more icons in future.
To copy the icons
folder across to our build application, we just need to add the folder to our .spec
file Analysis
block. As for the single file, we add it as a tuple with the source path (from our project folder) and the destination folder under the resulting folder in dist
.
# ...
a = Analysis(['app.py'],
             pathex=[],
             binaries=[],
             datas=[('icons', 'icons')],  # tuple is (source_folder, destination_folder)
             hiddenimports=[],
             hookspath=[],
             hooksconfig={},
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
# ...
If you run the build using this spec file you'll see the icons
folder copied across to the dist/Hello World
folder. If you run the application from the folder, the icons will display as expected -- the relative paths remain correct in the new location.
Alternatively, you can bundle your data files using Qt's QResource architecture. See our tutorial for more information.
Building the App bundle into a Disk Image
So far we've used PyInstaller to bundle the application into a macOS app, along with the associated data files. The output of this bundling process is a folder and a macOS app bundle, named Hello World.app
.
If you try and distribute this app bundle, you'll notice a problem: the app bundle is actually just a special folder. While macOS displays it as an application, if you try and share it, you'll actually be sharing hundreds of individual files. To distribute the app properly, we need some way to package it into a single file.
The easiest way to do this is to use a .zip
file. You can zip the folder and give this to someone else to unzip on their own computer, giving them a complete app bundle they can copy to their Applications folder.
However, if you've installed macOS applications before you'll know this isn't the usual way to do it. Usually you get a Disk Image .dmg
file, which when opened shows the application bundle, and a link to your Applications folder. To install the app, you just drag it across to the target.
To make our app look as professional as possible, we should copy this expected behaviour. Next we'll look at how to take our app bundle and package it into a macOS Disk Image.
Making sure the build is ready
If you've followed the tutorial so far, you'll already have your app ready in the dist/
folder. If not, or if yours isn't working, you can also download the source code files for this tutorial, which include a sample .spec
file. As above, you can run the same build using the provided Hello World.spec
file.
pyinstaller "Hello World.spec"
This packages everything up as an app bundle in the dist/
folder, with a custom icon. Run the app bundle to ensure everything is bundled correctly, and you should see the same window as before with the icons visible.
Window with two icons, and a button.
Creating a Disk Image
Now we've successfully bundled our application, we'll next look at how we can take our app bundle and use it to create a macOS Disk Image for distribution.
To create our Disk Image we'll be using the create-dmg tool. This is a command-line tool which provides a simple way to build disk images automatically. If you are using Homebrew, you can install create-dmg with the following command.
brew install create-dmg
...otherwise, see the GitHub repository for instructions.
The create-dmg
tool takes a lot of options, but below are the most useful.
create-dmg --help
create-dmg 1.0.9
Creates a fancy DMG file.
Usage: create-dmg [options] <output_name.dmg> <source_folder>
All contents of <source_folder> will be copied into the disk image.
Options:
--volname <name>
set volume name (displayed in the Finder sidebar and window title)
--volicon <icon.icns>
set volume icon
--background <pic.png>
set folder background image (provide png, gif, or jpg)
--window-pos <x> <y>
set position the folder window
--window-size <width> <height>
set size of the folder window
--text-size <text_size>
set window text size (10-16)
--icon-size <icon_size>
set window icons size (up to 128)
--icon file_name <x> <y>
set position of the file's icon
--hide-extension <file_name>
hide the extension of file
--app-drop-link <x> <y>
make a drop link to Applications, at location x,y
--no-internet-enable
disable automatic mount & copy
--add-file <target_name> <file>|<folder> <x> <y>
add additional file or folder (can be used multiple times)
-h, --help
display this help screen
The most important thing to notice is that the command requires a <source folder>
and all contents of that folder will be copied to the Disk Image. So to build the image, we first need to put our app bundle in a folder by itself.
Rather than do this manually each time you want to build a Disk Image I recommend creating a shell script. This ensures the build is reproducible, and makes it easier to configure.
Below is a working script to create a Disk Image from our app. It creates a temporary folder dist/dmg
where we'll put the things we want to go in the Disk Image -- in our case, this is just the app bundle, but you can add other files if you like. Then we make sure the folder is empty (in case it still contains files from a previous run). We copy our app bundle into the folder, and finally check to see if there is already a .dmg
file in dist
and if so, remove it too. Then we're ready to run the create-dmg
tool.
#!/bin/sh
# Create a folder (named dmg) to prepare our DMG in (if it doesn't already exist).
mkdir -p dist/dmg
# Empty the dmg folder.
rm -r dist/dmg/*
# Copy the app bundle to the dmg folder.
cp -r "dist/Hello World.app" dist/dmg
# If the DMG already exists, delete it.
test -f "dist/Hello World.dmg" && rm "dist/Hello World.dmg"
create-dmg \
--volname "Hello World" \
--volicon "Hello World.icns" \
--window-pos 200 120 \
--window-size 600 300 \
--icon-size 100 \
--icon "Hello World.app" 175 120 \
--hide-extension "Hello World.app" \
--app-drop-link 425 120 \
"dist/Hello World.dmg" \
"dist/dmg/"
The options we pass to create-dmg
set the dimensions of the Disk Image window when it is opened, and positions of the icons in it.
Save this shell script in the root of your project, named e.g. builddmg.sh
. To make it possible to run, you need to set the execute bit with.
chmod +x builddmg.sh
With that, you can now build a Disk Image for your Hello World app with the command.
./builddmg.sh
This will take a few seconds to run, producing quite a bit of output.
No such file or directory
Creating disk image...
...............................................................
created: /Users/martin/app/dist/rw.Hello World.dmg
Mounting disk image...
Mount directory: /Volumes/Hello World
Device name: /dev/disk2
Making link to Applications dir...
/Volumes/Hello World
Copying volume icon file 'Hello World.icns'...
Running AppleScript to make Finder stuff pretty: /usr/bin/osascript "/var/folders/yf/1qvxtg4d0vz6h2y4czd69tf40000gn/T/createdmg.tmp.XXXXXXXXXX.RvPoqdr0" "Hello World"
waited 1 seconds for .DS_STORE to be created.
Done running the AppleScript...
Fixing permissions...
Done fixing permissions
Blessing started
Blessing finished
Deleting .fseventsd
Unmounting disk image...
hdiutil: couldn't unmount "disk2" - Resource busy
Wait a moment...
Unmounting disk image...
"disk2" ejected.
Compressing disk image...
Preparing imaging engine…
Reading Protective Master Boot Record (MBR : 0)…
(CRC32 $38FC6E30: Protective Master Boot Record (MBR : 0))
Reading GPT Header (Primary GPT Header : 1)…
(CRC32 $59C36109: GPT Header (Primary GPT Header : 1))
Reading GPT Partition Data (Primary GPT Table : 2)…
(CRC32 $528491DC: GPT Partition Data (Primary GPT Table : 2))
Reading (Apple_Free : 3)…
(CRC32 $00000000: (Apple_Free : 3))
Reading disk image (Apple_HFS : 4)…
...............................................................................
(CRC32 $FCDC1017: disk image (Apple_HFS : 4))
Reading (Apple_Free : 5)…
...............................................................................
(CRC32 $00000000: (Apple_Free : 5))
Reading GPT Partition Data (Backup GPT Table : 6)…
...............................................................................
(CRC32 $528491DC: GPT Partition Data (Backup GPT Table : 6))
Reading GPT Header (Backup GPT Header : 7)…
...............................................................................
(CRC32 $56306308: GPT Header (Backup GPT Header : 7))
Adding resources…
...............................................................................
Elapsed Time: 3.443s
File size: 23178950 bytes, Checksum: CRC32 $141F3DDC
Sectors processed: 184400, 131460 compressed
Speed: 18.6Mbytes/sec
Savings: 75.4%
created: /Users/martin/app/dist/Hello World.dmg
hdiutil does not support internet-enable. Note it was removed in macOS 10.15.
Disk image done
While it's building, the Disk Image will pop up. Don't get too excited yet, it's still building. Wait for the script to complete, and you will find the finished .dmg
file in the dist/
folder.
The Disk Image created in the dist folder
Running the installer
Double-click the Disk Image to open it, and you'll see the usual macOS install view. Click and drag your app across to the Applications
folder to install it.
The Disk Image contains the app bundle and a shortcut to the applications folder
If you open Launchpad (press F4) you will see your app installed. If you have a lot of apps, you can search for it by typing "Hello"
The app installed on macOS
Repeating the build
Now you have everything set up, you can create a new app bundle & Disk Image of your application any time, by running the two commands from the command line.
pyinstaller "Hello World.spec"
./builddmg.sh
It's that simple!
Wrapping up
In this tutorial we've covered how to build your PyQt5 applications into a macOS app bundle using PyInstaller, including adding data files along with your code. Then we walked through the process of creating a Disk Image to distribute your app to others. Following these steps you should be able to package up your own applications and make them available to other people.
For a complete view of all PyInstaller bundling options take a look at the PyInstaller usage documentation.
For more, see the complete PyQt5 tutorial.
Mike Driscoll: PyDev of the Week: Batuhan Taskaya
This week we welcome Batuhan Taskaya (@isidentical) as our PyDev of the Week! Batuhan is a core developer of the Python language. Batuhan is also a maintainer of multiple Python packages including parso and Black.
You can see what else Batuhan is up to by checking out his website or GitHub profile.
Let's take a few moments to get to know Batuhan better!
Can you tell us a little about yourself (hobbies, education, etc):
Hey there! My name is Batuhan, and I'm a software engineer who loves to work on developer tools to improve the overall productivity of the Python ecosystem.
I pretty much fill all my free time with open source maintenance and other programming related activities. If I am not programming at that time, I am probably reading a paper about PLT or watching some sci-fi show. I am a huge fan of the Stargate franchise.
Why did you start using Python?
I was always intrigued by computers but didn't do anything related to programming until I started using GNU/Linux on my personal computer (namely Ubuntu 12.04). Back then, I was searching for something to pass the time and found Python.
Initially, I was mind-blown by the responsiveness of the REPL. I typed `2 + 2`, it replied `4` back to me. Such a joy! For someone with literally zero programming experience, it was a very friendly environment. Later, I started following some tutorials, writing more code and repeating that process until I got a good grasp of the Python language and programming in general.
What other programming languages do you know and which is your favourite?
After being exposed to the level of elegance and simplicity in Python, I set the bar very high for adopting a new language. C is a great example where the language (in its own terms) is very straightforward, and currently, it is the only language I actively use apart from Python. I also think it goes really well when paired with Python, which might not be surprising considering that CPython itself and the extension modules are written in C.
If we let the mainstream languages go, I love building one-off compilers for weird/esoteric languages.
What projects are you working on now?
Most of my work revolves around CPython, which is the reference implementation of the Python language. In terms of the core, I specialize in the parser and the compiler. But outside of it, I maintain the ast module, and a few others.
One of the recent changes I've collaborated on (with Pablo Galindo Salgado and Ammar Askar) in CPython was the new fancy tracebacks, which I hope will really increase the productivity of Python developers:
Traceback (most recent call last):
  File "query.py", line 37, in <module>
    magic_arithmetic('foo')
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "query.py", line 18, in magic_arithmetic
    return add_counts(x) / 25
           ^^^^^^^^^^^^^
  File "query.py", line 24, in add_counts
    return 25 + query_user(user1) + query_user(user2)
                ^^^^^^^^^^^^^^^^^
  File "query.py", line 32, in query_user
    return 1 + query_count(db, response['a']['b']['c']['user'], retry=True)
               ~~~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable
Alongside that, I help maintain a number of other projects, and I am a core member of fsspec.
Which Python libraries are your favorite (core or 3rd party)?
It might be a bit obvious, but I love the ast module. Apart from that, I enjoy using dataclasses and pathlib.
I generally avoid using dependencies since nearly 99% of the time, I can simply use the stdlib. But there is one exception: rich. For the last three months, nearly every script I've written uses it. It is such a beauty (both in terms of the UI and the API). I also really love pytest and pre-commit.
Not a library as such, but one of my favorite projects from the Python ecosystem is PyPy. It brings an entirely new Python runtime which, depending on your workload, can be 1000x faster (or just 4x in general).
Is there anything else you’d like to say?
I've recently started a GitHub Sponsors Page, and if any of my work directly touches you (or your company) please consider sponsoring me!
Thanks for the interview Mike, and I hope people reading the article enjoyed it as much as I enjoyed answering these questions!
Thanks for doing the interview, Batuhan!
The post PyDev of the Week: Batuhan Taskaya appeared first on Mouse Vs Python.
Matt Layman: Episode 16 - Setting Your Sites
Real Python: Python News: What's New From January 2022?
In January 2022, the code formatter Black saw its first non-beta release and published a new stability policy. IPython, the powerful interactive Python shell, marked the release of version 8.0, its first major version release in three years. Additionally, PEP 665, aimed at making reproducible installs easier by specifying a format for lock files, was rejected. Last but not least, a fifteen-year-old memory leak bug in Python was fixed.
Let’s dive into the biggest Python news stories from the past month!
Free Bonus: Click here to get a Python Cheat Sheet and learn the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.
Black No Longer Beta
The developers of Black, an opinionated code formatter, are now confident enough to call the latest release stable. This announcement brings Black out of beta for the first time:

Code formatting can be the source of a surprising amount of conflict among developers. This is why code formatters and linters help enforce style conventions to maintain consistency across a whole codebase. Linters suggest changes, while code formatters rewrite your code:

This makes your codebase more consistent, helps catch errors early, and makes code easier to scan.
YAPF is an example of a formatter. It comes with the PEP 8 style guide as a default, but it’s not strongly opinionated, giving you a lot of control over its configuration.
Black goes further: it comes with a PEP 8 compliant style, but on the whole, it’s not configurable. The idea behind disallowing configuration is that you free up your brain to focus on the actual code by relinquishing control over style. Many believe this restriction gives them much more freedom to be creative coders. But of course, not everyone likes to give up this control!
One crucial feature of opinionated formatters like Black is that they make your diffs much more informative. If you’ve ever committed a cleanup or formatting commit to your version control system, you may have inadvertently polluted your diff.
Read the full article at https://realpython.com/python-news-january-2022/ »
death and gravity: Dealing with YAML with arbitrary tags in Python
... in which we use PyYAML to safely read and write YAML with any tags, in a way that's as straightforward as interacting with built-in types.
If you're in a hurry, you can find the code at the end.
Contents

- Why is this useful?
- A note on PyYAML extensibility
- Preserving tags
- Unhashable keys
- Conclusion
- Bonus: hashable wrapper
- Bonus: broken YAML
Why is this useful?#
People mostly use YAML as a friendlier alternative to JSON1, but it can do way more.
Among others, it can represent user-defined and native data structures.
Say you need to read (or write) an AWS Cloud Formation template:
EC2Instance:
  Type: AWS::EC2::Instance
  Properties:
    ImageId: !FindInMap [
      AWSRegionArch2AMI,
      !Ref 'AWS::Region',
      !FindInMap [AWSInstanceType2Arch, !Ref InstanceType, Arch],
    ]
    InstanceType: !Ref InstanceType
>>> yaml.safe_load(text)
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!FindInMap'
  in "<unicode string>", line 4, column 14:
        ImageId: !FindInMap [
                 ^
... or, you need to safely read untrusted YAML that represents Python objects:
!!python/object/new:module.Class {attribute: value}
>>> yaml.safe_load(text)
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/new:module.Class'
  in "<unicode string>", line 1, column 1:
    !!python/object/new:module.Class
    ^
Warning
Historically, yaml.load(thing) was unsafe for untrusted data, because it allowed running arbitrary code. Consider using safe_load() instead.
For example, you could do this:
>>> yaml.load("!!python/object/new:os.system [echo WOOSH. YOU HAVE been compromised]")
WOOSH. YOU HAVE been compromised
0
There were a bunch of CVEs about it.
To address the issue, load() requires an explicit Loader since PyYAML 6. Also, version 5 added two new functions and corresponding loaders:

- full_load() resolves all tags except those known to be unsafe (note that this was broken before 5.4, and thus vulnerable)
- unsafe_load() resolves all tags, even those known to be unsafe (the old load() behavior)

safe_load() resolves only basic tags, remaining the safest.
Can't I just get the data, without it being turned into objects?
You can! The YAML spec says:
In a given processing environment, there need not be an available native type corresponding to a given tag. If a node’s tag is unavailable, a YAML processor will not be able to construct a native data structure for it. In this case, a complete representation may still be composed and an application may wish to use this representation directly.
And PyYAML obliges:
>>> text = """\
... one: !myscalar string
... two: !mysequence [1, 2]
... """
>>> yaml.compose(text)
MappingNode(
    tag='tag:yaml.org,2002:map',
    value=[
        (
            ScalarNode(tag='tag:yaml.org,2002:str', value='one'),
            ScalarNode(tag='!myscalar', value='string'),
        ),
        (
            ScalarNode(tag='tag:yaml.org,2002:str', value='two'),
            SequenceNode(
                tag='!mysequence',
                value=[
                    ScalarNode(tag='tag:yaml.org,2002:int', value='1'),
                    ScalarNode(tag='tag:yaml.org,2002:int', value='2'),
                ],
            ),
        ),
    ],
)
>>> print(yaml.serialize(_))
one: !myscalar 'string'
two: !mysequence [1, 2]
... the spec didn't say the representation has to be concise. ¯\_(ツ)_/¯
Here's how YAML processing works, to give you an idea of what we're looking at:
The output of compose() above is the representation (node graph). From that, safe_load() does its best to construct objects, but it can't do anything for tags it doesn't know about.
There must be a better way!
Thankfully, the spec also says:
That said, tag resolution is specific to the application. YAML processors should therefore provide a mechanism allowing the application to override and expand these default tag resolution rules.
We'll use this mechanism to convert tagged nodes to almost-native types, while preserving the tags.
A note on PyYAML extensibility#
PyYAML is a bit unusual.
For each processing direction, you have a corresponding Loader/Dumper class.
For each processing step, you can add callbacks, stored in class-level registries.
The callbacks are method-like – they receive the Loader/Dumper as the first argument:
Dice = namedtuple('Dice', 'a b')

def dice_representer(dumper, data):
    return dumper.represent_scalar(u'!dice', u'%sd%s' % data)

yaml.Dumper.add_representer(Dice, dice_representer)
You may notice the add_...() methods modify the class in-place, for everyone, which isn't necessarily great; imagine getting a Dice from safe_load(), when you were expecting only built-in types.
We can avoid this by subclassing, since the registry is copied from the parent. Note that because of how copying is implemented, registries from two direct parents are not merged – you only get the registry of the first parent in the MRO.
So, we'll start by subclassing SafeLoader/Dumper:
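The original listing did not survive syndication; the subclassing step is simple enough to sketch (the names Loader and Dumper match those used in the rest of the article):

```python
import yaml

# Subclass SafeLoader/SafeDumper so the callbacks we register later
# don't leak into the registries used by yaml.safe_load()/safe_dump().
class Loader(yaml.SafeLoader):
    pass

class Dumper(yaml.SafeDumper):
    pass
```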
Preserving tags#
Constructing unknown objects#
For now, we can use named tuples for objects with unknown tags, since they are naturally tag/value pairs:
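Given the surrounding text, the wrapper is presumably just a named tuple, along these lines:

```python
from collections import namedtuple

# A tag/value pair for nodes whose tag we have no constructor for.
Tagged = namedtuple('Tagged', 'tag value')
```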
Tag or no tag, all YAML nodes are either a scalar, a sequence, or a mapping. For unknown tags, we delegate construction to the loader's default constructors, and wrap the resulting value:
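A sketch of such a constructor, repeating the Loader and Tagged definitions so the example is self-contained (a reconstruction of the approach the text describes, not necessarily the article's exact code):

```python
import yaml
from collections import namedtuple

Tagged = namedtuple('Tagged', 'tag value')

class Loader(yaml.SafeLoader):
    pass

def construct_undefined(loader, node):
    # Delegate to the default constructor for the node's kind,
    # then wrap the resulting value together with its tag.
    if isinstance(node, yaml.ScalarNode):
        value = loader.construct_scalar(node)
    elif isinstance(node, yaml.SequenceNode):
        value = loader.construct_sequence(node, deep=True)
    elif isinstance(node, yaml.MappingNode):
        value = loader.construct_mapping(node, deep=True)
    else:
        raise TypeError(f"unexpected node type: {node!r}")
    return Tagged(node.tag, value)

# Registering under None makes this the constructor for all unknown tags.
Loader.add_constructor(None, construct_undefined)
```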
Constructors are registered by tag, with None meaning "unknown".
Things look much better already:
>>> yaml.load(text, Loader=Loader)
{
    'one': Tagged(tag='!myscalar', value='string'),
    'two': Tagged(tag='!mysequence', value=[1, 2]),
}
A better wrapper#
That's nice, but every time we use any value, we have to check if it's tagged, and then go through value if it is:
>>> one = _['one']
>>> one.tag
'!myscalar'
>>> one.value.upper()
'STRING'
We could subclass the Python types corresponding to core YAML tags (str, list, and so on), and add a tag attribute to each. We could subclass most of them, anyway – neither bool nor NoneType can be subclassed.
Or, we could wrap tagged objects in a class with the same interface, one that delegates method calls and attribute access to the wrappee, with a tag attribute on top.
Tip
This is known as the decorator design pattern (not to be confused with Python decorators).
Doing this naively entails writing one wrapper per type, with one wrapper method per method and one property per attribute. That's even worse than subclassing!
There must be a better way!
Of course, this is Python, so there is.
We can use an object proxy instead (also known as "dynamic wrapper"). While they're not perfect in general, the one wrapt provides is damn near perfect enough2:
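A sketch of the wrapt-based wrapper (the _self_tag name follows wrapt's convention for proxy-local state; this is my reconstruction, not necessarily the article's exact code):

```python
import wrapt

class Tagged(wrapt.ObjectProxy):
    # ObjectProxy stores proxy-local state in _self_-prefixed attributes;
    # everything else is delegated to the wrapped object.
    def __init__(self, tag, value):
        super().__init__(value)
        self._self_tag = tag

    @property
    def tag(self):
        return self._self_tag

    def __repr__(self):
        return f"Tagged({self.tag!r}, {self.__wrapped__!r})"
```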
>>> yaml.load(text, Loader=Loader)
{
    'one': Tagged('!myscalar', 'string'),
    'two': Tagged('!mysequence', [1, 2]),
}
The proxy behaves identically to the proxied object:
>>> one = _['one']
>>> one.tag
'!myscalar'
>>> one.upper()
'STRING'
>>> one[:3]
'str'
...up to and including fancy things like isinstance():
>>> isinstance(one, str)
True
>>> isinstance(one, Tagged)
True
And now you don't have to care about tags if you don't want to.
Representing tagged objects#
The trip back is exactly the same, but much shorter:
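A sketch of the representer, shown here with the named-tuple version of Tagged (a reconstruction, not the article's exact code): represent the wrapped value with the default machinery, then overwrite the resulting node's tag.

```python
import yaml
from collections import namedtuple

Tagged = namedtuple('Tagged', 'tag value')

class Dumper(yaml.SafeDumper):
    pass

def represent_tagged(dumper, data):
    # Build the node for the plain value, then put the stored tag back.
    node = dumper.represent_data(data.value)
    node.tag = data.tag
    return node

Dumper.add_representer(Tagged, represent_tagged)
```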
Representers are registered by type.
>>> print(yaml.dump(Tagged('!hello', 'world'), Dumper=Dumper))
!hello 'world'
Let's mark the occasion with some tests.
Since we still have stuff to do, we parametrize the tests from the start. Loading works, and dumping works too, but only for known types.
Unhashable keys#
Let's try an example from the PyYAML documentation:
>>> text = """\
... ? !!python/tuple [0,0]
... : The Hero
... ? !!python/tuple [1,0]
... : Treasure
... ? !!python/tuple [1,1]
... : The Dragon
... """
This is supposed to result in something like:
>>> yaml.unsafe_load(text)
{(0, 0): 'The Hero', (1, 0): 'Treasure', (1, 1): 'The Dragon'}
Instead, we get:
>>> yaml.load(text, Loader=Loader)
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'
That's because the keys are tagged lists, and neither type is hashable:
>>> yaml.load("!!python/tuple [0,0]", Loader=Loader)
Tagged('tag:yaml.org,2002:python/tuple', [0, 0])
This limitation comes from how Python dicts are implemented,3 not from YAML; quoting from the spec again:
The content of a mapping node is an unordered set of key/value node pairs, with the restriction that each of the keys is unique. YAML places no further restrictions on the nodes. In particular, keys may be arbitrary nodes, the same node may be used as the value of several key/value pairs and a mapping could even contain itself as a key or a value.
Constructing pairs#
What now?
Same strategy as before: wrap the things we can't handle.
Specifically, whenever we have a mapping with unhashable keys, we return a list of pairs instead. To tell it apart from plain lists, we use a subclass:
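Presumably the subclass is as simple as:

```python
class Pairs(list):
    """A list of (key, value) pairs, distinguishable from a plain list."""
```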
Again, we let the loader do most of the work:
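A self-contained sketch of such a mapping constructor, combined with the earlier pieces (my reconstruction of the approach the text describes): build the pairs ourselves, and fall back to Pairs when dict() refuses the keys.

```python
import yaml
from collections import namedtuple

Tagged = namedtuple('Tagged', 'tag value')

class Pairs(list):
    """A list of (key, value) pairs, used when the keys aren't hashable."""

class Loader(yaml.SafeLoader):
    pass

def construct_mapping(loader, node, deep=False):
    loader.flatten_mapping(node)
    pairs = [
        (loader.construct_object(key, deep=deep),
         loader.construct_object(value, deep=deep))
        for key, value in node.value
    ]
    try:
        return dict(pairs)
    except TypeError:  # unhashable key
        return Pairs(pairs)

# The assignment makes it available to every other constructor on Loader...
Loader.construct_mapping = construct_mapping
# ...but we still have to register it for the map tag explicitly.
Loader.add_constructor('tag:yaml.org,2002:map', construct_mapping)

def construct_undefined(loader, node):
    # Wrap values of unknown tags, as before.
    if isinstance(node, yaml.ScalarNode):
        value = loader.construct_scalar(node)
    elif isinstance(node, yaml.SequenceNode):
        value = loader.construct_sequence(node, deep=True)
    else:
        value = loader.construct_mapping(node, deep=True)
    return Tagged(node.tag, value)

Loader.add_constructor(None, construct_undefined)
```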
We set construct_mapping so that any other Loader constructor wanting to make a mapping gets to use it (like our own construct_undefined() above). Don't be fooled by the assignment – it's a method like any other.4 But since we're changing the class from outside anyway, it's best to stay consistent.
Note that overriding construct_mapping() is not enough: we have to register the constructor explicitly, otherwise SafeLoader's construct_mapping() will be used (since that's what was in the registry before).
Note
In case you're wondering, this feature is orthogonal to handling unknown tags; we could have used different classes for them. However, as mentioned before, the constructor registry breaks multiple inheritance, so we couldn't use the two features together.
Anyway, it works:
>>> yaml.load(text, Loader=Loader)
Pairs([
    (Tagged('tag:yaml.org,2002:python/tuple', [0, 0]), 'The Hero'),
    (Tagged('tag:yaml.org,2002:python/tuple', [1, 0]), 'Treasure'),
    (Tagged('tag:yaml.org,2002:python/tuple', [1, 1]), 'The Dragon'),
])
Representing pairs#
Like before, the trip back is short and uneventful:
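A sketch of the representer (again a reconstruction): represent_mapping() accepts a list of pairs directly, so the unhashable keys never have to go into a dict.

```python
import yaml

class Pairs(list):
    """A list of (key, value) pairs, distinguishable from a plain list."""

class Dumper(yaml.SafeDumper):
    pass

def represent_pairs(dumper, data):
    # data is already a list of (key, value) pairs.
    return dumper.represent_mapping('tag:yaml.org,2002:map', data)

Dumper.add_representer(Pairs, represent_pairs)
```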
>>> print(yaml.dump(Pairs([([], 'one')]), Dumper=Dumper))
[]: one
Let's test this more thoroughly.
Because the tests are parametrized, we just need to add more data:
|
|
Conclusion#
YAML is extensible by design. I hope that besides what it says on the tin, this article shed some light on how to customize PyYAML for your own purposes, and that you've learned at least one new Python thing.
You can get the code here, and the tests here.
Learned something new today? Share this with others, it really helps!
Want more? Get updates via email or Atom feed.
Bonus: hashable wrapper#
You may be asking, why not make the wrapper hashable?
Most unhashable (data) objects are unhashable for a reason: because they're mutable.
We have two options:
- Make the wrapper hash change with the content. This will break dictionaries in strange and unexpected ways (and other things too) – the language requires mutable objects to be unhashable.
- Make the wrapper hash not change with the content, and wrappers equal only to themselves – that's what user-defined classes do by default anyway.
This works, but it's not very useful, because equal values don't compare equal anymore (data != load(dump(data))). Also, it means you can only get things from a dict if you already have the object used as key:

>>> data = {Hashable([1]): 'one'}
>>> data[Hashable([1])]
Traceback (most recent call last):
  ...
KeyError: Hashable([1])
>>> key = list(data)[0]
>>> data[key]
'one'
I'd file this under "strange and unexpected" too.
(You can find the code for the example above here.)
Bonus: broken YAML#
We can venture even farther, into arguably broken YAML. Let's look at some examples.
First, there are undefined tag prefixes:
>>> yaml.load("!m!xyz x", Loader=Loader)
Traceback (most recent call last):
  ...
yaml.parser.ParserError: while parsing a node
found undefined tag handle '!m!'
  in "<unicode string>", line 1, column 1:
    !m!xyz x
    ^
A valid version:
>>> yaml.load("""\
... %TAG !m! !my-
... ---
... !m!xyz x
... """, Loader=Loader)
Tagged('!my-xyz', 'x')
Second, there are undefined aliases:
>>> yaml.load("two: *anchor", Loader=Loader)
Traceback (most recent call last):
  ...
yaml.composer.ComposerError: found undefined alias 'anchor'
  in "<unicode string>", line 1, column 6:
    two: *anchor
         ^
A valid version:
>>> yaml.load("""\
... one: &anchor [1]
... two: *anchor
... """, Loader=Loader)
{'one': [1], 'two': [1]}
It's likely possible to handle these in a way similar to how we handled undefined tags, but we'd have to go deeper – the exceptions hint at which processing step to look at.
Since I haven't actually encountered them in real life, we'll "save them for later" :)
Of which YAML is actually a superset. [return]
Using a hash table. For a nice explanation of how it all works, complete with a pure-Python implementation, check out Raymond Hettinger's talk Modern Python Dictionaries: A confluence of a dozen great ideas (code). [return]
Almost. The zero argument form of super() won't work for methods defined outside of a class definition, but we're not using it here. [return]
Python Morsels: Making the len function work on your Python objects

In Python, you can make the built-in len function work on your objects.
The len function only works on objects that have a length
The built-in len function works on some objects, but not on others. Only things that have a length work with the len function.
Lists, sets, dictionaries, strings, and most data structures in Python have a length:
>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> len(numbers)
7
But numbers don't:
>>> n = 10
>>> len(n)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
When making a class in Python, you can control whether instances of that class have a length.
Python's built-in len function calls the __len__ method (pronounced "dunder len") on the object you give it. So if that object has a __len__ method, it has a length:
>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> numbers.__len__()
7
If it doesn't have a __len__ method, the len function raises a TypeError instead:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
How can you make instances of your class have a length?
Python's random module has a function (choice) which can randomly select an item from a given sequence.
>>> import random
>>> colors = ['red', 'blue', 'green', 'purple']
>>> random.choice(colors)
'purple'
This choice function only works on objects that can be indexed and have a length. Here we have a class named ForgivingIndexer:
class ForgivingIndexer:
    def __init__(self, sequence):
        self.sequence = sequence

    def __getitem__(self, index):
        return self.sequence[int(index)]
This class has an __init__ method and a __getitem__ method. That __getitem__ method allows instances of this class to be indexed using square brackets ([]).
But this isn't quite enough to allow our ForgivingIndexer objects to work with the random.choice function. If we pass a ForgivingIndexer object to the random.choice function, we'll get an error:
>>> import random
>>> fruits = ForgivingIndexer(['apple', 'lime', 'pear', 'watermelon'])
>>> random.choice(fruits)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/random.py", line 378, in choice
    return seq[self._randbelow(len(seq))]
TypeError: object of type 'ForgivingIndexer' has no len()
Python gives us an error because ForgivingIndexer objects don't have a length, which the random.choice function requires. These objects don't work with the built-in len function:
>>> fruits = ForgivingIndexer(['apple', 'lime', 'pear', 'watermelon'])
>>> len(fruits)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'ForgivingIndexer' has no len()
In order to support the built-in len function, we can add a __len__ method to this class:
    def __len__(self):
        return len(self.sequence)
Now instances of this class have a length:
>>> import random
>>> fruits = ForgivingIndexer(['apple', 'lime', 'pear', 'watermelon'])
>>> len(fruits)
4
And they also work with random.choice:
>>> random.choice(fruits)
'apple'
Summary
You can make your objects work with the built-in len function by adding a __len__ method to them. You'll pretty much only ever add a __len__ method if you're making a custom data structure, like a sequence or a mapping.
ItsMyCode: TypeError: method() takes 1 positional argument but 2 were given
If you define a method inside a class, you should add self as the first argument. If you forget the self argument, then Python will raise TypeError: method() takes 1 positional argument but 2 were given.
In this tutorial, we will look at what method() takes 1 positional argument but 2 were given error means and how to resolve this error with examples.
TypeError: method() takes 1 positional argument but 2 were given
In Python, we need to pass “self” as the first argument for all the methods that are defined in a class. It is similar to this in JavaScript.
We know that class is a blueprint for the objects, and we can use the blueprints to create multiple instances of objects.
The self parameter is used to represent the instance (object) of the class. Through it, we can access the attributes and methods of the class in Python.
Let us take a simple example to reproduce this error.
If you look at the below example, we have an Employee class, and we have a simple method that takes the name as a parameter and prints the Employee ID as output.
# Employee Class
class Employee:
# Get Employee method without self parameter
def GetEmployeeID(name):
print(f"The Employee ID of {name} ", 1234)
# instance of the employee
empObj = Employee()
empObj.GetEmployeeID("Chandler Bing")
Output
Traceback (most recent call last):
File "c:\Personal\IJS\Code\main.py", line 10, in <module>
empObj.GetEmployeeID("Chandler Bing")
TypeError: Employee.GetEmployeeID() takes 1 positional argument but 2 were given
When we run the code, we get a TypeError: method() takes 1 positional argument but 2 were given
How to fix TypeError: method() takes 1 positional argument but 2 were given
In our above code, we have not passed the self argument to the method defined in the Employee class, which leads to TypeError.
As shown below, we can fix the issue by explicitly adding “self” as a parameter to the GetEmployeeID() method.
# Employee Class
class Employee:
# Get Employee method with self parameter
def GetEmployeeID(self,name):
print(f"The Employee ID of {name} ", 1234)
# instance of the employee
empObj = Employee()
empObj.GetEmployeeID("Chandler Bing")
Output
The Employee ID of Chandler Bing 1234
In Python, when we call a method with some arguments, the corresponding class function is called by placing the method's object before the first argument. Example – object.method(args) becomes Class.method(obj, args).
The calling process is automatic, but it should be defined explicitly on the receiving side.
This is one of the main reasons the first parameter of a function in a class must be the object itself.
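That equivalence is easy to verify with a throwaway class (Greeter here is hypothetical, purely for illustration):

```python
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

g = Greeter()

# Calling through the instance and through the class are equivalent:
print(g.greet("Chandler"))           # Hello, Chandler
print(Greeter.greet(g, "Chandler"))  # Hello, Chandler
print(g.greet("x") == Greeter.greet(g, "x"))  # True
```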
It is not mandatory to name the first parameter “self”; we can use any name here. The name “self” is neither a built-in keyword nor has special meaning in Python. It is just a naming convention that developers follow, and it improves the readability of the code.
Conclusion
The TypeError: method() takes 1 positional argument but 2 were given occurs if we do not pass the “self” as an argument to all the methods defined inside the class.
The self is used to represent the instance(object) of the class. Using this keyword, we can access the attributes and methods of the class in Python.
The issue is resolved by passing the “self
” as a parameter to all the methods defined in a class.
Stack Abuse: Convert Numpy Array to Tensor and Tensor to Numpy Array with PyTorch
Tensors are multi-dimensional objects, and the essential data representation block of Deep Learning frameworks such as Tensorflow and PyTorch.
A scalar has zero dimensions, a vector has one, a matrix has two, and tensors have three or more. In practice, we oftentimes refer to scalars and vectors as tensors as well for convenience.
Note: A tensor can also be any n-dimensional array, just like a Numpy array can. Many frameworks have support for working with Numpy arrays, and many of them are built on top of Numpy so the integration is both natural and efficient.
However, a torch.Tensor has more built-in capabilities than Numpy arrays do, and these capabilities are geared towards Deep Learning applications (such as GPU acceleration), so it makes sense to prefer torch.Tensor instances over regular Numpy arrays when working with PyTorch. Additionally, torch.Tensors have a very Numpy-like API, making it intuitive for most with prior experience!
In this guide, learn how to convert between a Numpy Array and PyTorch Tensors.
Convert Numpy Array to PyTorch Tensor
To convert a Numpy array to a PyTorch tensor, we have three distinct approaches we could take: using the from_numpy() function, supplying the Numpy array to the torch.Tensor() constructor, or using the torch.tensor() function:
import torch
import numpy as np
np_array = np.array([5, 7, 1, 2, 4, 4])
# Convert Numpy array to torch.Tensor
tensor_a = torch.from_numpy(np_array)
tensor_b = torch.Tensor(np_array)
tensor_c = torch.tensor(np_array)
So, what's the difference? The from_numpy() and tensor() functions are dtype-aware! Since we've created a Numpy array of integers, the dtype of the underlying elements will be the platform's default integer type (int32 in the runs shown here; many Linux and macOS builds default to int64):
print(np_array.dtype)
# dtype('int32')
If we were to print out our two tensors:
print(f'tensor_a: {tensor_a}\ntensor_b: {tensor_b}\ntensor_c: {tensor_c}')
tensor_a and tensor_c retain the data type used within the np_array, cast into PyTorch's variant (torch.int32), while tensor_b automatically assigns the values to floats:
tensor_a: tensor([5, 7, 1, 2, 4, 4], dtype=torch.int32)
tensor_b: tensor([5., 7., 1., 2., 4., 4.])
tensor_c: tensor([5, 7, 1, 2, 4, 4], dtype=torch.int32)
This can also be observed by checking their dtype fields:
print(tensor_a.dtype) # torch.int32
print(tensor_b.dtype) # torch.float32
print(tensor_c.dtype) # torch.int32
Numpy Array to PyTorch Tensor with dtype
These approaches also differ in whether you can explicitly set the desired dtype when creating the tensor. from_numpy() and Tensor() don't accept a dtype argument, while tensor() does:
# Retains Numpy dtype
tensor_a = torch.from_numpy(np_array)
# Creates tensor with float32 dtype
tensor_b = torch.Tensor(np_array)
# Retains Numpy dtype OR creates tensor with specified dtype
tensor_c = torch.tensor(np_array, dtype=torch.int32)
print(tensor_a.dtype) # torch.int32
print(tensor_b.dtype) # torch.float32
print(tensor_c.dtype) # torch.int32
Naturally, you can cast any of them very easily, using the exact same syntax, allowing you to set the dtype after the creation as well, so the acceptance of a dtype argument isn't a limitation, but more of a convenience:
tensor_a = tensor_a.float()
tensor_b = tensor_b.float()
tensor_c = tensor_c.float()
print(tensor_a.dtype) # torch.float32
print(tensor_b.dtype) # torch.float32
print(tensor_c.dtype) # torch.float32
Convert PyTorch Tensor to Numpy Array
Converting a PyTorch Tensor to a Numpy array is straightforward, since tensors are ultimately built on top of Numpy arrays, and all we have to do is "expose" the underlying data structure.
Since PyTorch can optimize the calculations performed on data based on your hardware, there are a couple of caveats though:
tensor = torch.tensor([1, 2, 3, 4, 5])
np_a = tensor.numpy()
np_b = tensor.detach().numpy()
np_c = tensor.detach().cpu().numpy()
So, why use detach() and cpu() before exposing the underlying data structure with numpy(), and when should you detach and transfer to a CPU?
CPU PyTorch Tensor -> CPU Numpy Array
If your tensor is on the CPU, where the new Numpy array will also be - it's fine to just expose the data structure:
np_a = tensor.numpy()
# array([1, 2, 3, 4, 5], dtype=int64)
This works very well, and you've got yourself a clean Numpy array.
CPU PyTorch Tensor with Gradients -> CPU Numpy Array
However, if your tensor requires you to calculate gradients for it as well (i.e. the requires_grad argument is set to True), this approach won't work anymore. You'll have to detach the underlying array from the tensor, and through detaching, you'll be pruning away the gradients:
tensor = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float32, requires_grad=True)
np_a = tensor.numpy()
# RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
np_b = tensor.detach().numpy()
# array([1., 2., 3., 4., 5.], dtype=float32)
GPU PyTorch Tensor -> CPU Numpy Array
Finally - if you've created your tensor on the GPU, it's worth remembering that regular Numpy arrays don't support GPU acceleration. They reside on the CPU! You'll have to transfer the tensor to a CPU, and then detach/expose the data structure.
Note: This can either be done via the to('cpu') or cpu() functions - they're functionally equivalent.
This has to be done explicitly, because if it were done automatically - the conversion between CPU and CUDA tensors to arrays would be different under the hood, which could lead to unexpected bugs down the line.
PyTorch is fairly explicit, so this sort of automatic conversion was purposefully avoided:
# Create tensor on the GPU
tensor = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float32, requires_grad=True).cuda()
np_b = tensor.detach().numpy()
# TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
np_c = tensor.detach().cpu().numpy()
# array([1., 2., 3., 4., 5.], dtype=float32)
Note: It's highly advised to call detach() before cpu(), to prune away the gradients before transferring to the CPU. The gradients won't matter anyway after the detach() call - so copying them at any point is totally redundant and inefficient. It's better to "cut the dead weight" as soon as possible.
Generally speaking - this approach is the safest, as no matter which sort of tensor you're working with - it won't fail. If you've got a CPU tensor, and you try sending it to the CPU - nothing happens. If you've got a tensor without gradients, and try detaching it - nothing happens. Skip a step that is actually needed, though - and exceptions are thrown.
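Because each step is a no-op when it isn't needed, the whole safe path can be wrapped in a tiny helper (the name to_numpy is mine, not part of PyTorch's API):

```python
def to_numpy(tensor):
    """Safely convert any PyTorch tensor to a NumPy array.

    detach() is a no-op if the tensor carries no gradients, and cpu()
    is a no-op if the tensor already lives on the CPU, so this ordering
    works for every combination of device and requires_grad.
    """
    return tensor.detach().cpu().numpy()
```

Calling to_numpy(tensor) then works the same way regardless of where the tensor lives or whether it tracks gradients.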
Conclusion
In this guide - we've taken a look at what PyTorch tensors are, before diving into how to convert a Numpy array into a PyTorch tensor. Finally, we've explored how PyTorch tensors can expose the underlying Numpy array, and in which cases you'd have to perform additional transfers and pruning.
Glyph Lefkowitz: A Better Pygame Mainloop
I’ve written about this
before, but in that
context I was writing mainly about frame-rate independence, and only gave a
brief mention of vertical sync; the title also mentioned Twisted, and upon
re-reading it I realized that many folks who might get a lot of use out of its
technique would not have bothered to read it, just because I made it sound like
an aside in the context of an animation technique in a game that already
wanted to use Twisted for some reason, rather than a comprehensive best
practice. Now that Pygame 2.0 is out, though, and the vsync=1
flag is more
reliably available to everyone, I thought it would be worth revisiting.
Per the many tutorials out there, including the official one, most Pygame mainloops look like this:
[code listing lost in extraction: an eight-line Pygame main loop]
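The listing itself did not survive here, but the loop those tutorials teach has roughly this shape (a sketch from memory, not the post's exact code; the import is kept inside the function so the sketch loads even without pygame installed):

```python
def main():
    import pygame  # imported here so the sketch loads without pygame installed

    pygame.init()
    screen = pygame.display.set_mode((640, 480))
    clock = pygame.time.Clock()
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
        screen.fill((0, 0, 0))
        # ... draw your sprites here ...
        pygame.display.flip()
        clock.tick(60)  # caps the frame rate without syncing to the display
```

Call main() to run it; note that clock.tick only caps the frame rate, it does not synchronize with the display's actual refresh.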
Obviously that works okay, or folks wouldn’t do it, but it can give an impression of a certain lack of polish for most beginner Pygame games.
The thing that’s always bothered me personally about this idiom is: where does the networking go? After spending many years trying to popularize event loops in Python, I’m sad to see people implementing loops over and over again that have no way to get networking, or threads, or timers scheduled in a standard way so that libraries could be written without the application needing to manually call them every frame.
But, who cares how I feel about it? Lots of games don’t have networking[1]. There are more general problems with it. Specifically, it is likely to:
- waste power, and
- look bad.
Wasting Power
Why should anyone care about power when they’re making a video game? Aren’t games supposed to just gobble up CPUs and GPUs for breakfast, burning up as much power as they need for the most gamer experience possible?
Chances are, if you’re making a game that you expect anyone that you don’t personally know to play, they’re going to be playing it on a laptop[2]. Pygame might have a reputation for being “slow”, but for a simple 2D game with only a few sprites, Python can easily render several thousand frames per second. Even the fastest display in the world can only refresh at 360Hz[3]. That’s less than one thousand frames per second. The average laptop display is going to be more like 60Hz, or — if you’re lucky — maybe 120. By rendering thousands of frames that the user never even sees, you warm up their CPU uncomfortably[4], and you waste 10x (or more) of their battery doing useless work.
At some point your game might have enough stuff going on that it will run the CPU at full tilt, and if it does, that’s probably fine; at least then you’ll be using up that heat and battery life in order to make their computer do something useful. But even if it is, it’s probably not doing that all of the time, and battery is definitely a use-over-time sort of problem.
Looking Bad
If you’re rendering directly to the screen without regard for vsync, your players are going to experience Screen Tearing, where the screen is in the middle of updating while you’re in the middle of drawing to it. This looks especially bad if your game is panning over a background, which is a very likely scenario for the usual genre of 2D Pygame game.
How to fix it?
Pygame lets you turn on VSync, and in Pygame 2, you can do this simply by passing the pygame.SCALED flag and the vsync=1 argument to set_mode().
Now your game will have silky smooth animations and scrolling[5]! Solved!
But... if the fix is so simple, why doesn’t everybody — including, notably, the official documentation — recommend doing this?
The solution creates another problem: pygame.display.flip may now block until the next display refresh, which may be many milliseconds.
Even worse: note the word “may”. Unfortunately, behavior of vsync is quite
inconsistent between platforms and
drivers,
so for a properly cross-platform game it may be necessary to allow the user to select a frame rate and wait on an asyncio.sleep rather than running flip in a thread. Using the techniques from the answers to this Stack Overflow question
you can establish a reasonable heuristic for the refresh rate of the relevant
display, but if adding those libraries and writing that code is too complex,
“60” is probably a good enough value to start with, even if the user’s monitor
can go a little faster. This might save a little power even in the case where
you can rely on flip
to tell you when the monitor is actually ready again;
if your game can only reliably render 60FPS anyway because there’s too much
Python game logic going on to consistently go faster, it’s better to achieve
a consistent but lower framerate than to be faster but inconsistent.
The potential for blocking needs to be dealt with though, and it has several knock-on effects.
For one thing, it makes my “where do you put the networking” problem even worse: most networking frameworks expect to be able to send more than one packet every 16 milliseconds.
More pressingly for most Pygame users, however, it creates a minor performance
headache. You now spend a bunch of time blocked in the now-blocking flip
call, wasting precious milliseconds that you could be using to do stuff
unrelated to drawing, like handling user input, updating animations, running
AI, and so on.
The problem is that your Pygame mainloop has 3 jobs:
- drawing
- game logic (AI and so on)
- input handling
What you want to do to ensure the smoothest possible frame rate is to draw
everything as fast as you possibly can at the beginning of the frame and then
call flip
immediately to be sure that the graphics have been delivered to the
screen and they don’t have to wait until the next screen-refresh. However,
this is at odds with the need to get as much done as possible before you call
flip
and possibly block for 1/60th of a second.
So either you put off calling flip
, potentially risking a dropped frame if
your AI is a little slow, or you call flip
too eagerly and waste a bunch of
time waiting around for the display to refresh. This is especially true of
things like animations, which you can’t update before drawing, because you
have to draw this frame before you worry about the next one, but waiting until
after flip wastes valuable time; by the time you are starting your next
wastes valuable time; by the time you are starting your next
frame draw, you possibly have other code which now needs to run, and you’re
racing to get it done before that next flip
call.
Now, if your Python game logic is actually saturating your CPU — which is not hard to do — you’ll drop frames no matter what. But there are a lot of marginal cases where you’ve mostly got enough CPU to do what you need to without dropping frames, and it can be a lot of overhead to constantly check the clock to see if you have enough frame budget left to do one more work item before the frame deadline - or, for that matter, to maintain a workable heuristic for exactly when that frame deadline will be.
The technique to avoid these problems is deceptively simple, and in fact it was
covered with the deferToThread
trick presented in my earlier
post. But again,
we’re not here to talk about Twisted. So let’s do this the
no-additional-dependencies, stdlib-only way, with asyncio:
[code listing lost in extraction: a 41-line asyncio-based main loop]
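The full 41-line listing was also lost here. The core of the technique - draw, then hand the potentially-blocking flip to a thread-pool executor so the asyncio event loop stays free for timers, networking, and game logic - can be sketched like this (pygame calls are replaced by a stub flip so the skeleton is self-contained; frame and main are my names, see the original post for the real code):

```python
import asyncio
import time

def flip():
    """Stand-in for pygame.display.flip(); pretend vsync blocks for ~16 ms."""
    time.sleep(0.016)

async def frame(draw):
    # Draw as fast as possible, then run the blocking flip in a worker
    # thread so other coroutines (networking, timers, AI) keep running
    # while we wait on the display.
    draw()
    await asyncio.get_running_loop().run_in_executor(None, flip)

async def main(frames=3):
    drawn = []
    for n in range(frames):
        await frame(lambda n=n: drawn.append(n))
    return drawn

print(asyncio.run(main()))  # → [0, 1, 2]
```

In a real game, main would loop until quit, and draw would be your normal Pygame rendering code.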
Go Forth and Loop Better
At some point I will probably release my own wrapper library[6] which does something similar to this, but I really wanted to present this as a technique rather than as some packaged-up code to use, since do-it-yourself mainloops, and keeping dependencies to a minimum, are such staples of Pygame community culture.
As you can see, this technique is only a few lines longer than the standard recipe for a Pygame main loop, but you now have access to a ton of additional functionality:
- You can manage your framerate independence in both animations and game logic by just setting some timers and letting the frames update at the appropriate times; stop worrying about doing math on the clock by yourself!
- Do you want to add networked multiplayer? No problem! Networking all happens inside the event loop; make whatever network requests you want, and never worry about blocking the game’s drawing on a network request!
- Now your players’ laptops run cool while playing, and the graphics don’t have ugly tearing artifacts any more!
I really hope that this sees broader adoption so that the description “indie game made in Python” will no longer imply “runs hot and tears a lot when the screen is panning”. I’m also definitely curious to hear from readers, so please let me know if you end up using this technique to good effect![7]
[1] And, honestly, a few fewer could stand to have it, given how much unnecessary always-online stuff there is in single-player experiences these days. But I digress. That’s why I’m in a footnote, this is a good place for digressing. ↩
[2] “Worldwide sales of laptops have eclipsed desktops for more than a decade. In 2019, desktop sales totaled 88.4 million units compared to 166 million laptops. That gap is expected to grow to 79 million versus 171 million by 2023.” ↩
[3] At least, Nvidia says that “the world’s fastest esports displays” are both 360Hz and also support G-Sync, and who am I to disagree? ↩
[4] They’re playing on a laptop, remember? So they’re literally uncomfortable. ↩
[5] Assuming you’ve made everything frame-rate independent, as mentioned in the aforementioned post. ↩
[6] because of course I will ↩
[7] And also, like, if there are horrible bugs in this code, so I can update it. It is super brief and abstract to show how general it is, but that also means it’s not really possible to test it as-is; my full-working-code examples are much longer and it’s definitely possible something got lost in translation. ↩
ItsMyCode: AttributeError: Can only use .str accessor with string values
The AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas occurs if you use the .str accessor on a column you assume holds strings but which, in reality, is of a different type.
In this tutorial, we will look at what is AttributeError: Can only use .str accessor with string values and how to fix this error with examples.
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Let us take a simple example to reproduce this error. In the below example, we have Pandas DataFrame, which indicates the standing of each cricket team.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [12.0, 8.0, 3.0, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
print(df['points'])
df['points'] = df['points'].str.replace('.', '')
print(df['points'])
Output
0 12.0
1 8.0
2 3.0
3 5.0
Name: points, dtype: float64
raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!. Did you mean: 'std'?
When we run the above code, we get AttributeError Can only use .str accessor with string values!.
The points column is of float datatype, and str.replace() can be applied only to string columns.
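Before reaching for .str, you can check the column's dtype and cast only when needed (a small sketch assuming pandas is installed; regex=False is my addition to make the literal '.' replacement explicit):

```python
import pandas as pd

df = pd.DataFrame({'points': [12.0, 8.0, 3.0, 5]})

# float64 columns have no string methods; check the dtype before using .str
print(df['points'].dtype)  # float64
if df['points'].dtype == object:
    df['points'] = df['points'].str.replace('.', '', regex=False)
else:
    df['points'] = df['points'].astype(str).str.replace('.', '', regex=False)

print(df['points'].tolist())  # ['120', '80', '30', '50']
```

Passing regex=False matters because '.' is a regex metacharacter; older pandas versions defaulted to regex=True, in which case '.' would have matched (and stripped) every character.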
How to fix Can only use .str accessor with string values error?
We can fix the error by casting the DataFrame column “points” from float to string before replacing the values in the column.
Let us fix our code and run it once again.
# import pandas library
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [12.0, 8.0, 3.0, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})
print(df['points'])
df['points'] = df['points'].astype(str).str.replace('.', '', regex=False)
print(df['points'])
Output
0 12.0
1 8.0
2 3.0
3 5.0
Name: points, dtype: float64
0 120
1 80
2 30
3 50
Name: points, dtype: object
Notice that the error is gone, the points column has been converted from float to object, and the decimal point has been replaced with an empty string.
Conclusion
The AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas occurs if you use the .str accessor on a column you assume holds strings but which, in reality, is of a different type.
We can fix the issue by casting the column to a string before replacing the values in the column.
TestDriven.io: Working with Static and Media Files in Django
Real Python: Defining Python Functions With Optional Arguments
Defining your own functions is an essential skill for writing clean and effective code. In this tutorial, you’ll explore the techniques you have available for defining Python functions that take optional arguments. When you master Python optional arguments, you’ll be able to define functions that are more powerful and more flexible.
In this course, you’ll learn how to:
- Distinguish between parameters and arguments
- Define functions with optional arguments and default parameter values
- Define functions using *args and **kwargs
- Deal with error messages about optional arguments
Lucas Cimon: Useful short Python decorator to convert generators into lists
Python generators are awesome. Why ?
- their syntax is simple and concise
- they lazily generate values and hence are very memory efficient
- bonus point: since Python 3.3 you can chain them with yield from
Their drawback ? They can be iterated only once, and they hide the iterable length.
I took an …
— Permalink
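The post is truncated here, but a decorator of the kind it describes is short enough to sketch (my reconstruction, not necessarily the original code):

```python
import functools

def as_list(gen_func):
    """Decorator: call the generator function and materialize its output into a list."""
    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        return list(gen_func(*args, **kwargs))
    return wrapper

@as_list
def squares(n):
    for i in range(n):
        yield i * i

print(squares(4))  # → [0, 1, 4, 9]
```

The decorated function keeps the concise generator syntax but returns a plain list, which can be iterated repeatedly and has a length.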
Python for Beginners: Count Digits Of An Integer in Python
In Python, the integer data type is used to represent positive and negative whole numbers. In this article, we will discuss a program to count the digits of an integer in Python.
How to Count Digits of an Integer in Python?
To count the digits of a number, we will use an approach that divides the number by 10. When we divide an integer by 10, the resultant number gets reduced by one digit.
For instance, if we divide 1234 by 10, the result will be 123. Here, 1234 has 4 digits whereas 123 has only three digits. Similarly, when we divide 123 by 10, it will get reduced to a number with only 2 digits and so on. Finally the number will become 0.
You can observe that we can divide 1234 by 10 only 4 times before it becomes 0. In other words, if there are n digits in an integer, we can divide the integer by 10 only n times till it becomes 0.
Program to Count Digits of an Integer in Python
As discussed above, we will use the following approach to count digits of a number in python.
- First we will declare a value count and initialize it to 0.
- Then, we will use a while loop to divide the given number by 10 repeatedly.
- Inside the while loop, we will increment count by one each time we divide the number by 10.
- Once the number becomes 0, we will exit from the while loop.
- After executing the while loop, we will have the count of the digits of the integer in the count variable.
We can implement the above approach to count the number of digits of a number in python as follows.
number = 12345
print("The given number is:", number)
count = 0
while number > 0:
    number = number // 10
    count = count + 1
print("The number of digits is:", count)
Output:
The given number is: 12345
The number of digits is: 5
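For completeness, the same count can be obtained by converting the number to a string; this variant also handles 0 and negative numbers, whereas the loop above returns 0 for number = 0:

```python
def count_digits(number):
    # str() of the absolute value has exactly one character per digit
    return len(str(abs(number)))

print(count_digits(12345))  # → 5
print(count_digits(0))      # → 1
print(count_digits(-987))   # → 3
```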
Conclusion
In this article, we have discussed an approach to count the digits of an integer in Python. To know more about numbers in Python, you can read this article on decimal numbers in Python. You might also like this article on complex numbers in Python.
The post Count Digits Of An Integer in Python appeared first on PythonForBeginners.com.