Channel: Planet Python

Wyatt Baldwin: PDM vs Poetry

A few years back, I started using poetry to manage dependencies for all my Python projects. I ran into some minor issues early on but haven’t had any problems recently and prefer it to any of the other dependency management / packaging solutions I’ve tried so far. Recently, I’ve started hearing about pdm and how it’s the bee’s knees. I did a search for “pdm vs poetry” and didn’t find much, so I thought I’d play around with pdm a bit and write something myself.

CodersLegacy: Nuitka vs Pyinstaller for Python EXE’s

In this article, we will compare two popular Python libraries used for creating standalone executables, Nuitka and Pyinstaller. This “Nuitka vs Pyinstaller” article will directly compare both libraries on things like “load time”, “performance”, “space”, etc.

By the end of this article, you will understand the benefits of each library, and which one you ought to be using.


What is Nuitka?

Nuitka is a Python compiler that converts Python code into a binary executable.

It is fully compatible with the Python language and can significantly improve the performance of Python code. It does this by translating the Python code to C source code, and then compiling that to machine code using a C compiler.

Nuitka can also create standalone executables that do not require a Python interpreter to be installed on the target system, making it easier to distribute Python applications.


Advantages of using Nuitka

Before a direct comparison with Pyinstaller, we will list the general advantages of using Nuitka.

Improved performance: Nuitka can significantly improve the performance of Python code by compiling it to a binary format that can be executed more efficiently than interpreted code. This can result in faster execution times and better resource utilization.

This point in particular sets Nuitka apart from other converters, as the others do not improve performance.


Standalone executables: Nuitka can create standalone executables that do not require a Python interpreter to be installed on the target system. This makes it easier to distribute Python applications to users who may not have Python installed on their own systems.


Easy to use: Nuitka has a simple command-line interface and can be easily integrated into existing Python projects. There is no need to set up additional configuration files, or create an entirely new project.


Cross-platform support: Nuitka can create executables for multiple platforms, including Windows, Linux, and macOS.


Improved security: Compiling Python code to a binary format can make it more difficult for attackers to reverse engineer or modify the code. This can improve the security of Python applications that are distributed as standalone executables. The commercial version of Nuitka also offers additional security and source code protection for serious users.


What is Pyinstaller?

PyInstaller is a tool for packaging Python applications as standalone executables. It converts Python code into a single executable file that can be run on systems without a Python interpreter installed.

PyInstaller can also be used to customize the way executables are built, including the ability to include or exclude specific files or modules easily (using a spec file).


Advantages of using Pyinstaller

Easy to use: Pyinstaller has a simple command-line interface and can be easily integrated into existing Python projects. There is no need to set up additional configuration files, or create an entirely new project.

The compilation times are pretty fast as well, and the resulting EXE size is within acceptable limits (can be optimized further).


Cross-platform support: Pyinstaller can create executables for multiple platforms, including Windows, Linux, and macOS.


Improved security: By compiling the code to an executable, you can hide your source code. However, this can be reverse-engineered easily enough using tools. But it’s better than distributing your source-code as is.


Standalone executables: Pyinstaller can create standalone executables that do not require a Python interpreter to be installed on the target system. This makes it easier to distribute Python applications to users who may not have Python installed on their own systems.


Large community: Out of all libraries used for executable creation, Pyinstaller is probably the most well-known one. This also means it has a large number of online resources, videos and guides to follow.

You can easily find other people online posting about issues and how they solved them. This makes problems a lot easier and faster to solve when you run into them.


Pyinstaller vs Nuitka Comparison

Here is a direct comparison between the two libraries, summarizing the above points, adding a few extra ones, and including my own observations and personal benchmarks.

Performance
  • Pyinstaller: Works by bundling the Python interpreter in the EXE. Hence it has the same performance as regular Python code.
  • Nuitka: Performs faster than Pyinstaller, and is a great way to get a speed boost for a computationally heavy application.

Size
  • Pyinstaller: Varies greatly based on libraries included and settings. For a general idea though: Python (no libs): 20 MB; Python + few libs: 100 MB; Python + many/big libs: 300 MB.
  • Nuitka: Varies greatly based on libraries included and settings used. For a general idea though: Python (no libs): 25 MB; Python + few libs: 150 MB; Python + many/big libs: 400 MB.

Cross-platform
  • Pyinstaller: All major platforms (Windows, macOS, Linux, BSD, etc.)
  • Nuitka: All major platforms (Windows, macOS, Linux, BSD, etc.)

Loading time
  • Pyinstaller: Takes a long time to load, especially with the one-file setting. About 5 – 10 seconds for most applications.
  • Nuitka: Loads much more quickly. About 2 – 5 seconds for most applications.

Compile time
  • Pyinstaller: Compile time scales well with the number of libraries. Doesn’t take longer than 5 minutes on average.
  • Nuitka: Varies greatly based on the number of libraries. Can take from a few minutes to a few hours (if using big libraries).

Community
  • Pyinstaller: Has been around longer than Nuitka, is more commonly used and has more online resources.
  • Nuitka: Not many resources or guides on Nuitka. Its documentation is going to be your main resource.

Optimizations
  • Pyinstaller: Can be optimized (size-wise) by using UPX or virtual environments to reduce bloat (extra libraries).
  • Nuitka: Can be optimized in various ways: a cache to improve compile time greatly, the zstandard module to reduce size (onefile mode), and the UPX packer to reduce EXE size.
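
To check the performance comparison on your own machine, one option is to package the same CPU-bound script with both tools and time it. The following is a minimal sketch of such a script. The prime-counting workload and the names `count_primes` and `timed_run` are illustrative assumptions, not the benchmark behind the figures above.

```python
import time


def count_primes(limit: int) -> int:
    """Count the primes below `limit` by trial division (loop-heavy on purpose)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def timed_run(limit: int) -> tuple[int, float]:
    """Return the prime count and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = count_primes(limit)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    primes, seconds = timed_run(200_000)
    print(f"{primes} primes found in {seconds:.2f} s")
```

Saved as a hypothetical bench.py, the script could be built with `pyinstaller --onefile bench.py` and with `python -m nuitka --onefile bench.py`, and the two executables run side by side. Timing the whole process from launch, rather than only the loop, would also capture the load-time difference.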

Pyinstaller vs Nuitka Conclusion?

Both are pretty solid options, and for about half of their use-cases there is little difference. Personally, however, I would recommend Nuitka for the performance and load time improvements.

If you are looking for a (slightly) easier time though, Pyinstaller would be better. Faster compilation, better online support, etc.

Best to try out both, and then make a decision.


This marks the end of the Nuitka vs Pyinstaller comparison article. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.

The post Nuitka vs Pyinstaller for Python EXE’s appeared first on CodersLegacy.

Python Anywhere: We're hiring!

Now that we’re part of Anaconda, we’re growing the team so that we can do more, faster :-)

Right now we’re looking for a senior engineer with lots of experience in backend stuff, but an interest in working across the full stack from obscure kernel wrangling, custom Linux container-based virtualization, Django and Flask on the mid-tier, up to TypeScript and React on the front end. There’s even a (tiny) bit of Lua thrown in there.

We’re an Extreme Programming team so you’ll be pairing with other team members from day one. All work is remote (bar occasional team meetups), and we can currently hire people based in the UK or in Germany.

There’s more detailed information about the role on the official Anaconda jobs board, where you can also apply, or you can drop us a line at jobs@pythonanywhere.com.

PyCharm: PyCharm 2022.3.1 Is Out!

You can get the latest version from our website, via the Toolbox App, from inside the IDE, or by using snaps if you’re an Ubuntu user.

Here are the most notable improvements in the new version:

  • The option to display editor tabs on multiple rows is available in the new UI. [IDEA-295095]
  • Actions on Save work as expected again. [IDEA-307368]
  • Excessive CPU usage and IDE freezes that occurred for certain tool window sizes have been fixed. [IDEA-306642]
  • Packaging: PyCharm no longer uses excessive disk space when caching PyPI. [PY-57156]
  • Running a script on the Python Debug Server with pydevd-pycharm works as expected again. [PY-57771]
  • Quick documentation popup: Markup used in docstrings is now rendered in Quick documentation. [PY-34667]
  • HTTP client: Setting a proxy no longer breaks package inspection. [PY-57612]
  • Python console: Code that is run with the Emulate terminal in output console option enabled now has the correct indentation level. [PY-57706]
  • Inspections: The Loose punctuation mark inspection now works correctly for reStructuredText fields in docstrings. [PY-53047]
  • Inspections: We fixed an SOE exception where processing generic types broke error highlighting in the editor. [PY-54336]
  • Debugger: We fixed several issues with the debugger. [PY-57296], [PY-57055]
  • Code insight: Code insight for IntEnum properties is now correct. [PY-55734]
  • Code insight: Code insight has been improved for data class arguments when wildcard or custom module import is involved. [PY-36158]

For the full list of improvements, please refer to the release notes. Share your feedback in the comments under this post or in our issue tracker.

Python Software Foundation: More Python Everywhere, All at Once: Looking Forward to 2023

The PSF works hard throughout the year to put on PyCon US, support smaller Python events around the world through our Grants program and of course to provide the critical infrastructure and expertise that keep CPython and PyPI running smoothly for the 8 million (and growing!) worldwide base of Python users. We want to invest more deeply in education and outreach in 2023, and donations from individuals (like you) can make sure we have the resources to start new projects and sustain them alongside our critical community functions.

Supporting Membership is a particularly great way to contribute to the PSF. By becoming a Supporting Member, you join a core group of PSF stakeholders, and since Supporting Members are eligible to vote in our Board and bylaws elections, you gain a voice in the future of the PSF. And we have just introduced a new sliding scale rate for Supporting Members, so you can join at the standard rate of an annual $99 contribution, or for as little as $25 annually if that works better for you. We are about three quarters of the way to our goal of 100 new supporting members by the end of 2022 – Can you sign up today and help push us over the edge?

Thank you for reading and for being a part of the one-of-a-kind community that makes Python and the PSF so special.

With warmest wishes to you and yours for a happy and healthy new year,
Deb

Ned Batchelder: Secure maintainer workflow, continued

Picking up from Secure maintainer workflow, especially the comments there (thanks!), here are some more things I’m doing to keep my maintainer workflow safe.

1Password ssh: I’m using 1Password as my SSH agent. It works really well, and uses the Mac Touch ID for authorization. Now I have no private keys in my ~/.ssh directory. I’ve been very impressed with 1Password’s helpful and comprehensive approach to configuration and settings.

Improved environment variables: I’ve updated my opvars and unopvars shell functions that set environment variables from 1Password. Now I can name sets of credentials (defaulting to the current directory name), and apply multiple sets. Then unopvars knows all that have been set, and clears all of them.

Public/private GitHub hosts: There’s a problem with using a fingerprint-gated SSH agent: some common operations want an SSH key but aren’t actually security sensitive. When pulling from a public repo, you don’t want to be interrupted to touch the sensor. Reading public information doesn’t need authentication, and you don’t want to become desensitized to the importance of the sensor. Pulling changes from a git repo with a “git@” address always requires SSH, even if the repo is public. It shouldn’t require an alarming interruption.

Git lets you define “insteadOf” aliases so that you can pull using “https:” and push using “git@”. The syntax seems odd and backwards to me, partly because I can define pushInsteadOf, but there’s no pullInsteadOf:

[url "git@github.com:"]
    # Git remotes of "git@github.com" should really be pushed using ssh.
    pushInsteadOf = git@github.com:

[url "https://github.com/"]
    # Git remotes of "git@github.com" should be pulled over https.
    insteadOf = git@github.com:

This works great, except that private repos still need to be pulled using SSH. To deal with this, I have a baroque arrangement using a fake URL scheme “github_private:” like this:

[url "git@github.com:"]
    pushInsteadOf = git@github.com:
    # Private repos need ssh in both directions.
    insteadOf = github_private:

[url "https://github.com/"]
    insteadOf = git@github.com:

Now if I set the remote URL to “github_private:nedbat/secret.git”, then activity will use “git@github.com:nedbat/secret.git” instead, for both pushing and pulling. (BTW: if you start fiddling with this, “git remote -v” will show you the URLs after these remappings, and “git config --get-regexp ‘remote.*.url’” will show you the actual settings before remapping.)
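
To see how the two rule sets interact, here is a small Python model of the rewriting. It is a simplified sketch, not Git's implementation. It assumes that the longest matching prefix wins, that a pushInsteadOf match takes priority when pushing, and that a URL is rewritten only once. The repo name nedbat/public.git is a made-up example.

```python
# Simplified model of Git's url.<base>.insteadOf / pushInsteadOf rewriting,
# mirroring the second config shown above. Keys are the prefixes to replace,
# values are the <base> they are replaced with.
INSTEAD_OF = {
    "github_private:": "git@github.com:",
    "git@github.com:": "https://github.com/",
}
PUSH_INSTEAD_OF = {
    "git@github.com:": "git@github.com:",
}


def _rewrite(url, rules):
    """Apply the longest matching prefix rule, or return None if none match."""
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return None
    prefix = max(matches, key=len)
    return rules[prefix] + url[len(prefix):]


def effective_url(url, push=False):
    """Return the URL this model would contact for a fetch (default) or a push."""
    if push:
        pushed = _rewrite(url, PUSH_INSTEAD_OF)
        if pushed is not None:
            return pushed
    rewritten = _rewrite(url, INSTEAD_OF)
    return url if rewritten is None else rewritten


# nedbat/public.git is a hypothetical public repo used only for illustration.
for remote in ("git@github.com:nedbat/public.git", "github_private:nedbat/secret.git"):
    print(remote)
    print("  fetch:", effective_url(remote))
    print("  push: ", effective_url(remote, push=True))
```

In this model the public remote is fetched over https and pushed over SSH, while the “github_private:” remote uses SSH in both directions.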

But how to set the remote to “github_private:nedbat/secret.git”? I can set it manually for specific repos with “git remote”, but I also clone entire organizations and don’t want to have to know which repos are private. I automate the remote-setting with an aliased git command I can run in a repo directory that sets the remote correctly if the repo is private:

[alias]
    # If this is a private repo, change the remote from "git@github.com:" to
    # "github_private:".  You can remap "github_private:" to "git@" like this:
    #
    #   [url "git@github.com:"]
    #       insteadOf = github_private:
    #
    # This requires the gh command: https://cli.github.com/
    #
    fix-private-remotes = "!f() { \
        vis=$(gh api 'repos/{owner}/{repo}' --template '{{.visibility}}'); \
        if [[ $vis == private ]]; then \
            for rem in $(git remote); do \
                echo Updating remote $rem; \
                git config remote.$rem.url $(git config remote.$rem.url | \
                    sed -e 's/git@github.com:/github_private:/'); \
            done \
        fi; \
    }; f"

This uses GitHub’s gh command-line tool, which is quite powerful. I’m using it more and more.

This is getting kind of complex, and is still a work in progress, but it’s working. I’m always interested in ideas for improvements.

Peter Bengtsson: Pip-Outdated.py - a script to compare requirements.in with the output of pip list --outdated

Simply by posting this, there's a big chance you'll say "Hey! Didn't you know there's already a well-known script that does this? Better." Or you'll say "Hey! That'll save me hundreds of seconds per year!"

The problem

Suppose you have a requirements.in file that is used by pip-compile to generate the requirements.txt that you actually install in your Dockerfile or whatever server deployment. The requirements.in is meant to be the human-readable file and the requirements.txt is for the computers. You manually edit the version numbers in the requirements.in and then run pip-compile --generate-hashes requirements.in to generate a new requirements.txt. But the "first-class" packages in the requirements.in aren't the only packages that get installed. For example:

▶ cat requirements.in | rg '==' | wc -l
      54

▶ cat requirements.txt | rg '==' | wc -l
     102

In other words, in this particular example, there are 48 "second-class" packages that get installed (102 minus 54). There might actually be more stuff installed that you didn't describe. That's why pip list | wc -l can be even higher. For example, you might have locally and manually done pip install ipython for a nicer interactive prompt.

The solution

The command pip list --outdated will list packages based on the requirements.txt not the requirements.in. To mitigate that, I wrote a quick Python CLI script that combines the output of pip list --outdated with the packages mentioned in requirements.in:

#!/usr/bin/env python

import subprocess


def main(*args):
    if not args:
        requirements_in = "requirements.in"
    else:
        requirements_in = args[0]
    required = {}
    with open(requirements_in) as f:
        for line in f:
            if "==" in line:
                package, version = line.strip().split("==")
                package = package.split("[")[0]
                required[package] = version

    res = subprocess.run(["pip", "list", "--outdated"], capture_output=True)
    if res.returncode:
        raise Exception(res.stderr)

    lines = res.stdout.decode("utf-8").splitlines()
    relevant = [line for line in lines if line.split()[0] in required]

    longest_package_name = max([len(x.split()[0]) for x in relevant]) if relevant else 0

    for line in relevant:
        p, installed, possible, *_ = line.split()
        if p in required:
            print(
                p.ljust(longest_package_name + 2),
                "INSTALLED:",
                installed.ljust(9),
                "POSSIBLE:",
                possible,
            )


if __name__ == "__main__":
    import sys

    sys.exit(main(*sys.argv[1:]))
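
As a variation, pip list --outdated can also emit JSON via --format=json, which avoids splitting table rows on whitespace. The sketch below is one possible alternative, not the author's script. `required_names` and `filter_outdated` are hypothetical helpers, and the sample data is made up for illustration.

```python
import json


def required_names(lines):
    """Collect pinned package names from requirements.in-style lines."""
    names = set()
    for line in lines:
        line = line.split("#")[0].strip()  # drop comments and whitespace
        if "==" in line:
            package = line.split("==")[0]
            # Strip extras such as [argon2] and compare case-insensitively.
            names.add(package.split("[")[0].lower())
    return names


def filter_outdated(names, pip_json):
    """Keep only the outdated entries whose package is pinned in requirements.in."""
    return [
        (item["name"], item["version"], item["latest_version"])
        for item in json.loads(pip_json)
        if item["name"].lower() in names
    ]


# Made-up sample standing in for `pip list --outdated --format=json` output.
sample = json.dumps([
    {"name": "Django", "version": "4.1.3", "latest_version": "4.1.4", "latest_filetype": "wheel"},
    {"name": "asgiref", "version": "3.5.2", "latest_version": "3.6.0", "latest_filetype": "wheel"},
])
pinned = required_names(["django[argon2]==4.1.3  # web framework", "requests==2.28.1"])
print(filter_outdated(pinned, sample))
```

Lowercasing the names is a design choice that differs from the script above, which compares names exactly as written. Neither version normalizes hyphens and underscores in package names.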

Installation

To install this, you can just download the script and run it in any directory that contains a requirements.in file.

Or you can install it like this:

curl -L https://gist.github.com/peterbe/099ad364657b70a04b1d65aa29087df7/raw/23fb1963b35a2559a8b24058a0a014893c4e7199/Pip-Outdated.py > ~/bin/Pip-Outdated.py
chmod +x ~/bin/Pip-Outdated.py

Pip-Outdated.py

PyPy: PyPy v7.3.11 release

PyPy v7.3.11: release of python 2.7, 3.8, and 3.9

The PyPy team is proud to release version 7.3.11 of PyPy. As could be expected, the first release of macOS arm64 impacted the macOS x86-64 build, so this is a bug release to restore the ability of macOS users to run PyPy on macOS < 11.0. It also incorporates the latest CPython stdlib updates released the day after 7.3.10 went out, and a few more bug fixes. The release includes three different interpreters:

  • PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7 including the stdlib for CPython 2.7.18+ (the + is for backported security updates)

  • PyPy3.8, which is an interpreter supporting the syntax and the features of Python 3.8, including the stdlib for CPython 3.8.16. Note we intend to drop support for this version in an upcoming release as soon as we release Python 3.10.

  • PyPy3.9, which is an interpreter supporting the syntax and the features of Python 3.9, including the stdlib for CPython 3.9.16.

The interpreters are based on much the same codebase, thus the multiple release. This is a micro release: all APIs are compatible with the other 7.3 releases, and it follows quickly on the heels of the 7.3.10 release on Dec 6.

We recommend updating. You can find links to download the v7.3.11 releases here:

https://pypy.org/download.html

We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for direct consulting work. If PyPy is helping you out, we would love to hear about it and encourage submissions to our blog via a pull request to https://github.com/pypy/pypy.org

We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: bug fixes, PyPy and RPython documentation improvements, or general help with making RPython's JIT even better. Since the previous release, we have accepted contributions from one new contributor, thanks for pitching in, and welcome to the project!

If you are a python library maintainer and use C-extensions, please consider making a HPy / CFFI / cppyy version of your library that would be performant on PyPy. In any case, both cibuildwheel and the multibuild system support building wheels for PyPy.

What is PyPy?

PyPy is a Python interpreter, a drop-in replacement for CPython 2.7, 3.8 and 3.9. It's fast (PyPy and CPython 3.7.4 performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

We provide binary builds for:

  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)

  • 64-bit ARM machines running Linux (aarch64).

  • Apple M1 arm64 machines (macos_arm64).

  • s390x running Linux

PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, and Linux ARM 32 bit, but does not release binaries. Please reach out to us if you wish to sponsor binary releases for those platforms. Downstream packagers provide binary builds for Debian, Fedora, conda, OpenBSD, FreeBSD, Gentoo, and more.

What else is new?

For more information about the 7.3.11 release, see the full changelog.

Please update, and continue to help us make pypy better.

Cheers, The PyPy Team


Programiz: Python Program to Compute the Power of a Number

In this example, you will learn to compute the power of a number.

Programiz: Python Program to Capitalize the First Character of a String

In this example, you will learn to capitalize the first character of a string.

Programiz: Python Program to Create a Countdown Timer

In this example, you will learn to create a countdown timer.

Programiz: Python Program to Remove Duplicate Element From a List

In this example, you will learn to remove duplicate elements from a list.

Python Bytes: #316 Python 3.11 is here and it's fast (crossover)

Watch on YouTube (https://www.youtube.com/watch?v=Iak-6AsMLsU)

About the show

Sponsored by Microsoft for Startups Founders Hub (http://pythonbytes.fm/foundershub2022).

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org
  • Brian: @brianokken@fosstodon.org
  • Show announcements: @pythonbytes@fosstodon.org

Hi folks. For our final episode of 2022 here on Python Bytes, we're crossing the streams with my other show, Talk Python To Me.

I present to you one of the more important episodes of the year, the release of Python 3.11 with its new features and 40% performance improvements.

Thank you for listening to Python Bytes in 2022, have a great holiday break, and Brian and I will see you next week.

Python 3.11 is here! Keeping with the annual release cycle, the Python core devs have released the latest version of Python. And this one is a big one. It has more friendly error messages and is massively faster than 3.10 (between 10 and 60% faster), which is a big deal for a year over year release of a 30 year old platform.

On this episode, we have Irit Katriel, Pablo Galindo Salgado, Mark Shannon, and Brandt Bucher on the show, all of whom participated in releasing Python this week, to tell us about that process and some of the highlight features.

Guests

  • Irit Katriel: @iritkatriel (https://twitter.com/iritkatriel)
  • Mark Shannon: https://www.linkedin.com/in/mark-shannon-bb459551/
  • Pablo Galindo Salgado: @pyblogsal (https://twitter.com/pyblogsal)
  • Brandt Bucher: https://github.com/brandtbucher/

Resources from the show

  • Michael's Python 3.11 Course: https://talkpython.fm/py311
  • Python 3.11.0 is now available: https://blog.python.org/2022/10/python-3110-is-now-available.html
  • PEP 101 - Releasing Python: https://peps.python.org/pep-0101/
  • PEP 678 – Enriching Exceptions with Notes: https://peps.python.org/pep-0678/
  • PEP 654 – Exception Groups and except*: https://peps.python.org/pep-0654/
  • PEP 657 – Include Fine Grained Error Locations in Tracebacks: https://peps.python.org/pep-0657/
  • Python Buildbot: https://www.python.org/dev/buildbot/
  • Making Python Faster Talk Python Episode: https://talkpython.fm/episodes/show/339/making-python-faster-with-guido-and-mark
  • Specializing, Adaptive Interpreter on Talk Python: https://talkpython.fm/episodes/show/381/python-perf-specializing-adaptive-interpreter
  • Specialist Visualizer: https://github.com/brandtbucher/specialist
  • "Zero cost" exception handling: https://github.com/python/cpython/issues/84403
  • Pyodide: https://pyodide.org/en/stable/
  • pyscript: https://pyscript.net

Talk Python to Me: #396: AI Goes on Trial For Writing Code (crossover)

For links and very detailed show notes, please view the original episode page (https://pythonbytes.fm/episodes/show/312/ai-goes-on-trial-for-writing-code) over on Python Bytes. Thanks for listening!

Sponsors

  • Sentry Error Monitoring, Code TALKPYTHON: https://talkpython.fm/sentry
  • AWS Insiders: https://talkpython.fm/awsinsiders
  • AssemblyAI: https://talkpython.fm/assemblyai
  • Talk Python Training: https://talkpython.fm/training

The Python Coding Blog: Using Python’s NumPy To Improve Your Board Game Strategy: Your Odds When Attacking in ‘Risk’

I first played the board game Risk during my doctoral studies. We occasionally stayed up all night playing this game. I hadn’t played it for many years, but I bought it “for the kids” this Christmas, so I got to play it again. And soon, I found myself wondering what the odds are for the various attack configurations in the game. I could have gone down the route of working out the stats. But life’s too short. So I wrote a Python script instead to bring NumPy and board games together!

In this article, you’ll:

  • Learn how to use NumPy arrays instead of for loops to repeat operations
  • Use two-dimensional NumPy arrays
  • Find out more about named tuples

You can refer to Chapter 8 in The Python Coding Book to read a more detailed introduction to numerical Python using NumPy.

Attack Configurations in Risk

This article is not about the board game itself. If you’ve played Risk, you may recall some or all of the rules. If you haven’t, you won’t learn all the rules here! I want to focus on the attack part of the game. A player can attack another player’s territory with one, two, or three dice. The player being attacked can defend with either one or two dice. Each die represents a toy troop in a toy army.

Here’s a summary of the rules for how one player attacks another player’s territory:

  • The two players roll one set of dice each. The attacker rolls the attack dice set. They can choose to roll one, two, or three dice. The defender rolls the defence dice set. They can choose to roll one or two dice. The players need sufficient toy troops available, but I won’t focus on this aspect of the gameplay here. So, for the sake of this article, all the combinations mentioned above are available
  • The highest value from the set of attack dice rolled is matched to the higher value from the defence set. If the attacker’s die has a greater value than the defender’s die, the attacker wins the bout, and the defender loses a toy troop from their territory. If the defender’s die has a higher value or if the values are tied, the attacker loses a toy troop
  • If both the attacker and defender rolled more than one die, the second highest values from both sets are matched, and the same rules as above apply

Let’s consider a few scenarios to clarify these rules. Here’s the first one:

  • The attacker chooses to attack with three dice and rolls 2, 6, and 3
  • The defender defends with two dice and rolls 4 and 5

The attacker’s 6, the highest value the attacker got, beats the defender’s higher value of 5. Therefore, the defender loses one toy troop.

The defender’s second value, 4, is higher than the attacker’s next-best value of 3. Therefore the attacker loses one toy troop.

Both players lose one toy troop each in this attack.

Here’s another scenario:

  • Attacker rolls three dice: 5, 5, and 3
  • Defender rolls two dice: 2 and 1

The defender loses two toy troops in this attack since the first 5 beats the 2 and the second 5 beats the 1.

Another attack:

  • Attacker rolls two dice: 4 and 2
  • Defender rolls two dice: 4 and 4

The attacker loses two toy troops in this attack since the defender’s first 4 beats the attacker’s 4 (recall that the defender wins when there’s a tie) and the defender’s second 4 beats the attacker’s 2.
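
The three scenarios above can be checked with a few lines of Python. This is a minimal sketch of the matching rules as described here. The function name `resolve_attack` is a hypothetical helper, not something from the game or from the code later in this article.

```python
def resolve_attack(attack_rolls, defence_rolls):
    """Return (attacker_losses, defender_losses) for one attack."""
    attack = sorted(attack_rolls, reverse=True)
    defence = sorted(defence_rolls, reverse=True)
    attacker_losses = defender_losses = 0
    # zip stops at the shorter set, so only matched pairs are compared.
    for a, d in zip(attack, defence):
        if a > d:
            defender_losses += 1
        else:  # the defender wins ties
            attacker_losses += 1
    return attacker_losses, defender_losses


print(resolve_attack([2, 6, 3], [4, 5]))  # (1, 1)
print(resolve_attack([5, 5, 3], [2, 1]))  # (0, 2)
print(resolve_attack([4, 2], [4, 4]))     # (2, 0)
```

Sorting both sets in descending order first means the highest dice are paired, then the second highest, which is all the rules require.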

Writing Python Code To Work Out An Attack’s Winning Odds

You can simulate an attack in Risk using a Python program:

  • Generate one, two, or three random numbers between 1 and 6 for the attacker and sort them in descending order
  • Generate one or two random numbers between 1 and 6 for the defender and sort them in descending order
  • Compare the maximum values from each set and determine whether the attacker or defender won
  • Compare the second value from each set, if present, and determine whether the attacker or defender won

You can simulate several attacks and keep a tally of how many bouts the attacker won and how many the defender won. The attacker’s winning percentage is:

(number of attacker wins / total number of bouts) * 100

To get a reasonable estimate of the probability of winning in each attack scenario, you’d need to run many attacks for each attack configuration and work out the winning percentages.
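
As a rough illustration of this tally, here is a sketch that uses only the standard library's random module. The function names and the default of 100,000 attacks are assumptions chosen to keep the example quick to run. It is not the NumPy version this article builds towards.

```python
import random


def roll(n_dice):
    """Roll n_dice six-sided dice and return them sorted from highest to lowest."""
    return sorted((random.randint(1, 6) for _ in range(n_dice)), reverse=True)


def attacker_win_percentage(n_attackers, n_defenders, n_attacks=100_000):
    """Estimate the percentage of bouts the attacker wins for one configuration."""
    attacker_wins = total_bouts = 0
    for _ in range(n_attacks):
        # zip pairs highest with highest, then second highest with second highest.
        for a, d in zip(roll(n_attackers), roll(n_defenders)):
            total_bouts += 1
            if a > d:  # ties go to the defender
                attacker_wins += 1
    return attacker_wins / total_bouts * 100


print(f"3 vs 2: {attacker_win_percentage(3, 2):.1f}%")
```

The result changes slightly from run to run, and more attacks give a steadier estimate at the cost of a longer run time.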

There are six attack configurations:

Attacker attacks with    Defender defends with
3                        2
3                        1
2                        2
2                        1
1                        2
1                        1

Let’s run each scenario 10 million times to get a reasonable estimate of the winning probabilities.

Using a for loop to simulate repeated attacks

One way of proceeding would be to use nested for loops. You can loop through each of the six attack configurations shown in the table above. For each of these configurations, run another for loop which repeats 10 million times to simulate a large number of attacks. Keep a tally of wins and losses, and you can work out the winning percentage for each scenario.

I’ll show this code in an appendix, but it’s not the version I want to focus on in this article. So I’ll go straight to the second option…

NumPy and Board Games

NumPy is the key library in the Python ecosystem for numerical and scientific applications. The name NumPy stands for Numerical Python. If you’ve not used NumPy before, you can install it using pip by typing the following in the terminal:

$ pip install numpy

If you’ve never used NumPy before, you can still continue reading this article, as I’ll introduce everything I’ll use. However, you can also read a more detailed introduction to NumPy in Chapter 8: Numerical Python for Quantitative Applications Using NumPy.

The key data structure NumPy introduces is the ndarray. The name stands for N-dimensional array. Although you’ll see some similarities between ndarray and Python’s list, there are also many differences between the two data structures.

NumPy is particularly efficient when you need to perform the same operation on each element of the structure. When using lists, you’d need to use a for loop to go through each element of the list. However, this is not required with NumPy arrays because of vectorization. You can perform element by element operations using NumPy arrays very efficiently because of how NumPy is written and implemented.
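To make the contrast concrete, here’s a small sketch that doubles every value, first in a list and then in an array. The variable names and the sample values are arbitrary choices for illustration:

```python
import numpy as np

rolls_list = [2, 5, 1, 6]
rolls_array = np.array(rolls_list)

# With a list, you visit each element explicitly
doubled_list = [roll * 2 for roll in rolls_list]

# With an ndarray, the operation applies to every element at once
doubled_array = rolls_array * 2

print(doubled_list)   # [4, 10, 2, 12]
print(doubled_array)  # [ 4 10  2 12]
```

The list version needs a loop (here, a list comprehension), whereas the array version is a single expression with no explicit loop in your Python code.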

This feature makes NumPy perfect for testing a large number of runs of a board game such as Risk.

Simulating Attacks in Risk

Each attack in Risk can have one of 6 configurations depending on how many dice the attacker and defender choose to use. The table earlier in this article shows these configurations. You could create a list of tuples with all the options:

options = [(x, y) for x in range(1, 4) for y in range(1, 3)]

print(options)

This code gives the following output:

[(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]

The pairs match the six attack configurations shown in the table earlier. In each pair, the first number is the number of attackers (the number of dice the attacker is using), and the second number is the number of defenders.

Using named tuples to store the attack configurations

However, I’ll go further and use a named tuple instead of a standard tuple to create these options. This will improve readability and minimise the risk of making errors in the code if you confuse the number of attackers and defenders.

There is more than one way of creating named tuples in Python. In the Sunrise Animation article I published earlier this year, I focused on using namedtuple from the collections module. In this article, I’ll use the NamedTuple class in the typing module:

from typing import NamedTuple

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

print(options)

The output now shows the same pairs as earlier, but this time as a list of named tuples Attack rather than standard tuples (I’m displaying this as a multi-line output for clarity):

[Attack(n_attackers=1, n_defenders=1),
 Attack(n_attackers=1, n_defenders=2),
 Attack(n_attackers=2, n_defenders=1),
 Attack(n_attackers=2, n_defenders=2),
 Attack(n_attackers=3, n_defenders=1),
 Attack(n_attackers=3, n_defenders=2)]

A named tuple allows you to access the data using the attribute names. You can still use the normal tuple indexing if you wish:

from typing import NamedTuple

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

option = options[-1]
print(option[0], option[1])
print(option.n_attackers, option.n_defenders)

The last two lines, which both call print(), produce identical output:

3 2
3 2
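For comparison, here’s a sketch of the same structure created with namedtuple from the collections module, which is the other approach mentioned earlier. The name `AttackAlt` is hypothetical and only used here to keep the two versions apart:

```python
from collections import namedtuple
from typing import NamedTuple


class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int


# Same field names, created with the factory function instead
AttackAlt = namedtuple("AttackAlt", ["n_attackers", "n_defenders"])

option = Attack(3, 2)
option_alt = AttackAlt(3, 2)

print(option.n_attackers, option_alt.n_attackers)  # 3 3
# Both are subclasses of tuple, so they compare equal to (3, 2)
print(option == option_alt == (3, 2))  # True
```

The main practical difference is that the typing version lets you attach type hints to each field using class syntax.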

Exploring Each of The Attack Configurations

Now, you can loop through the options list and test each of the six attack options. You can start by creating dice rolls for both attacker and defender depending on how many dice they each roll. You’ll use NumPy arrays for these. If you’ve used NumPy and random numbers before, you’re probably familiar with NumPy’s own version of the randint() function. You’ll start by using this function, np.random.randint():

from typing import NamedTuple

import numpy as np

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

for option in options:
    print(option)

    attack = np.random.randint(1, 7, option.n_attackers)
    print(attack)

    defence = np.random.randint(1, 7, option.n_defenders)
    print(defence)

You import NumPy using the conventional alias np. You create NumPy arrays attack and defence with the same length as the number of dice used. The number of dice is the value in either option.n_attackers or option.n_defenders.

This code gives the following output:

Attack(n_attackers=1, n_defenders=1)
[6]
[5]
Attack(n_attackers=1, n_defenders=2)
[3]
[4 3]
Attack(n_attackers=2, n_defenders=1)
[4 4]
[6]
Attack(n_attackers=2, n_defenders=2)
[5 2]
[6 6]
Attack(n_attackers=3, n_defenders=1)
[2 3 5]
[1]
Attack(n_attackers=3, n_defenders=2)
[5 1 6]
[2 5]

Each ndarray generated has random numbers between 1 and 6 representing the numbers on the dice rolled. The for loop runs six times. Therefore, you can see six sections in this output, one for each attack configuration.

Using NumPy’s Newer Generator For Random Numbers

If you’re familiar with the built-in random module, you’ve likely used random.randint() often. Therefore, replacing this with np.random.randint() may seem natural to you.

However, you should bear in mind that there are several differences between the two. A key difference which trips many people up is that random.randint() includes the endpoint, whereas np.random.randint() does not. Therefore, a dice roll is represented as random.randint(1, 6) if using the built-in random module but np.random.randint(1, 7) if using NumPy.
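You can check this difference empirically. Here’s a small sketch that rolls a die many times with each function and prints the lowest and highest values seen. The variable names and the number of rolls are arbitrary choices for illustration:

```python
import random

import numpy as np

n_rolls = 10_000

# Built-in random module: both endpoints are included
builtin_rolls = [random.randint(1, 6) for _ in range(n_rolls)]

# NumPy's legacy function: the upper limit is excluded
numpy_rolls = np.random.randint(1, 7, n_rolls)

print(min(builtin_rolls), max(builtin_rolls))
print(numpy_rolls.min(), numpy_rolls.max())
```

With this many rolls, you’d expect both lines to show 1 and 6, even though the second arguments differ.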

In any case, NumPy has a newer system for generating random numbers and np.random.randint() is now a legacy function. So let’s replace this with NumPy’s preferred way of creating random numbers, which is to create a Generator instance using np.default_rng() and then call its methods:

from typing import NamedTuple

import numpy as np

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(1, 7, option.n_attackers)
    print(attack)

    defence = rng.integers(1, 7, option.n_defenders)
    print(defence)

I’m usually reluctant to use abbreviations such as rng for variable names. However, I’ll use the same variable name used in the NumPy documentation on this occasion. The method integers() is similar to NumPy’s randint(). The first and second arguments are the limits of the range from which the random number is chosen. By default, the high end of the range is exclusive, which is why you had to use 7 for a dice roll.

The third argument is the size of the array created. You’ll read more about this later.
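If you’d rather write the highest face of the die directly, integers() also accepts an endpoint argument. Here’s a minimal sketch, not used in the rest of this article, which assumes a fixed seed purely so that the example is repeatable:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily

# Default: the upper limit, 7, is excluded
rolls = rng.integers(1, 7, 1000)

# endpoint=True makes the upper limit inclusive, so you can write 6
rolls_inclusive = rng.integers(1, 6, 1000, endpoint=True)

print(rolls.min(), rolls.max())
print(rolls_inclusive.min(), rolls_inclusive.max())
```

Both calls draw values from 1 to 6. The article’s code keeps the default behaviour and uses 7 as the upper limit.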

Sorting the dice in each set in descending order

As you read earlier, you’ll need to match the maximum value from the attacker’s dice rolls with the maximum value from the defender’s dice rolls, and so on. Therefore, we can sort the arrays containing the dice rolls in descending order.

NumPy’s ndarray has a sort() method you can use:

from typing import NamedTuple

import numpy as np

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(1, 7, option.n_attackers)
    attack.sort()
    print(attack)

    defence = rng.integers(1, 7, option.n_defenders)
    defence.sort()
    print(defence)

This code outputs arrays with values sorted in ascending order:

Attack(n_attackers=1, n_defenders=1)
[1]
[4]
Attack(n_attackers=1, n_defenders=2)
[5]
[1 5]
Attack(n_attackers=2, n_defenders=1)
[2 5]
[4]
Attack(n_attackers=2, n_defenders=2)
[1 2]
[1 6]
Attack(n_attackers=3, n_defenders=1)
[2 3 3]
[4]
Attack(n_attackers=3, n_defenders=2)
[2 4 6]
[5 5]

Next, you can reverse the numbers in the arrays using the flip() function. flip() is not a method in the np.ndarray class but a function which returns a new array:

from typing import NamedTuple

import numpy as np

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(1, 7, option.n_attackers)
    attack.sort()
    attack = np.flip(attack)
    print(attack)

    defence = rng.integers(1, 7, option.n_defenders)
    defence.sort()
    defence = np.flip(defence)
    print(defence)

All the arrays are now sorted from highest to lowest:

Attack(n_attackers=1, n_defenders=1)
[5]
[6]
Attack(n_attackers=1, n_defenders=2)
[1]
[5 3]
Attack(n_attackers=2, n_defenders=1)
[6 1]
[4]
Attack(n_attackers=2, n_defenders=2)
[4 2]
[5 3]
Attack(n_attackers=3, n_defenders=1)
[4 1 1]
[2]
Attack(n_attackers=3, n_defenders=2)
[5 2 1]
[3 1]

Mid-Article Reflection

Let’s see what we’ve achieved so far and where to go from here. You’ve created all six attack configurations as a list of named tuples. For each of these configurations, you create a set of dice rolls for the attacker and another for the defender. Each set of dice rolls is sorted in descending order.

Here are the things you still need to do:

  • Determine how many wins the attacker has in each attack
  • Simulate 10 million attacks for each configuration
  • Work out the winning percentage for each attack configuration

Let’s use this mid-article reflection to look at how you can simulate 10 million attacks for each configuration.

One way of doing this would be to add another for loop within the one you already wrote. This would iterate 10 million times and generate different dice rolls every time.

However, when using NumPy, it helps to think about iteration differently. Instead of relying on for or while loops, consider using “batch-processing” techniques with arrays. This process is called vectorization, as we’re using arrays as vectors. Don’t worry if you’re not familiar with vectors. You don’t need to understand vectors to understand vectorization.

Currently, the arrays attack and defence, which you’re creating using the integers() method, are one-dimensional (1D) arrays. They’re a single row of numbers, similar to Python’s list. We can change these arrays to two-dimensional (2D) ones with 10 million rows and one, two, or three columns, depending on how many dice are rolled. The rows represent different sets of dice rolls.

Changing attack and defence from 1D arrays to 2D arrays will affect all subsequent operations on these arrays. Therefore, you’ll need to refactor your code once you switch from 1D to 2D arrays.

When should you make this switch? There’s no right or wrong answer to this question. This depends on your experience with NumPy and your style of exploration when building a Python program. We could have created these arrays as 2D arrays right at the start when we first created them. We could do so now that we’ve made some progress using 1D arrays, or we could leave the switch until the latest point we can.

I’ll pick the third of these options for this article. However, remember there’s no single correct path to writing any program!

Removing The Dice Rolls That Are Not Needed

In most of the attack configurations, some dice rolls are not required. Once you’ve sorted the dice rolls in descending order, you can remove some of the values from the end of the arrays. For example, if the attacker attacks with three dice and the defender defends with two, you can discard the lowest dice roll in the attacker’s set.

The next step is to remove these unused dice rolls from the attack and defence arrays, as required. You’re trimming the arrays to have the same length within each attack configuration. You can remove values from the end of the arrays since you’ve already sorted them in descending order, and it’s the lowest values you need to remove. You can trim the longer one of the attack or defence arrays to the length of the shorter one.

Next, you can compare the trimmed arrays using the comparison operator >:

from typing import NamedTuple

import numpy as np

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(1, 7, option.n_attackers)
    attack.sort()
    attack = np.flip(attack)
    print(attack)

    defence = rng.integers(1, 7, option.n_defenders)
    defence.sort()
    defence = np.flip(defence)
    print(defence)

    min_length = min(len(attack), len(defence))
    result = attack[:min_length] > defence[:min_length]

    print(result)
    print()

You only need to compare attack and defence up to min_length since any values after that are the dice rolls you’re discarding. For example, if the attacker is attacking with three dice and the defender defends with two, then only the first two elements of attack need to be compared to the two values of defence.

You compare the truncated arrays directly using the greater than operator > and assign the returned value to the variable name result. Here’s the output from this code:

Attack(n_attackers=1, n_defenders=1)
[3]
[6]
[False]

Attack(n_attackers=1, n_defenders=2)
[1]
[4 1]
[False]

Attack(n_attackers=2, n_defenders=1)
[5 1]
[1]
[ True]

Attack(n_attackers=2, n_defenders=2)
[6 6]
[5 2]
[ True  True]

Attack(n_attackers=3, n_defenders=1)
[5 2 2]
[2]
[ True]

Attack(n_attackers=3, n_defenders=2)
[6 5 1]
[5 5]
[ True False]

If you’re new to NumPy, you may find the output surprising. The variable result is not a Boolean. Instead, it’s an array of Boolean values. When using comparison operators with NumPy arrays, the comparison operator acts element by element. The first value of one array is compared with the first value of the other array, and so on.

Let’s go through all six attack configurations simulated above. Recall that your output will differ from the version shown here since you’re generating random values each time you run this code.

  1. Attacker: one die (3). Defender: one die (6). The attacker loses since 3 is smaller than 6. result shows a single Boolean value False
  2. Attacker: one die (1). Defender: two dice (4, 1). The defender’s higher number, 4, is larger than the attacker’s only roll, 1. Therefore, the attacker loses their only toy troop and result has a single False value
  3. Attacker: two dice (5, 1). Defender: one die (1). Only the attacker’s higher number is taken into account. Since 5 is greater than 1, the attacker wins this bout, and result has a single True value
  4. Attacker: two dice (6, 6). Defender: two dice (5, 2). This attack has two bouts since both attacker and defender roll two dice each. The attacker wins with both dice and therefore secures two wins out of two. result has two values and both are True
  5. Attacker: three dice (5, 2, 2). Defender: one die (2). There’s only one bout to consider. The attacker’s highest number, 5, is greater than the defender’s die roll, 2. Therefore, the attacker wins the bout, and result has one True value
  6. Attacker: three dice (6, 5, 1). Defender: two dice (5, 5). There are two bouts to consider. The attacker wins the first one since 6 beats 5. The second number for both attacker and defender is 5, which means that the defender wins the bout since defenders win ties. result shows that the attacker had a 50% success rate in this attack with one win and one loss
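You can reproduce any of these cases with fixed values instead of random ones. Here’s a minimal sketch of the sixth case, using the dice rolls listed above:

```python
import numpy as np

# Dice rolls from the sixth case, already sorted in descending order
attack = np.array([6, 5, 1])
defence = np.array([5, 5])

min_length = min(len(attack), len(defence))
result = attack[:min_length] > defence[:min_length]

print(result)  # [ True False]
```

Using > rather than >= is what gives the tie in the second bout to the defender.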

Repeating The Exercise 10 Million Times

You’ve got code that can simulate any attack using any of the six possible attack configurations. To get a reasonable estimate of the success rate for each type of attack, you’ll need to repeat these attacks many times and get an average of the winning percentage.

Before you repeat each attack option 10 million times, let’s start by repeating them five times each. This will allow you to observe the results as you make changes to ensure you refactor your code correctly. When you’re convinced your code is working fine for five repetitions, you can repeat 10 million times.

2D NumPy arrays

Let’s explore 2D arrays in the REPL/Python Console before refactoring the main script. Let’s start with the 1D array you have so far:

>>> import numpy as np
>>> rng = np.random.default_rng()

>>> attack = rng.integers(1, 7, 3)
>>> attack
array([2, 2, 4])

>>> attack.shape
(3,)
>>> attack.ndim
1

rng.integers(1, 7, 3) creates an array of size 3 with random numbers from 1 to 6. The third argument in rng.integers() is size. When size is an integer, as in this case, the array created is a 1D array of length size.

The shape of this array is (3,). This notation may seem a bit strange, but it will make more sense when you see the shape of a 2D array. You also confirm that attack is a 1D array by using the ndim attribute.

Let’s change the argument assigned to size from a single integer to a tuple of integers. I’ll also explicitly name the argument for clarity:

>>> attack = rng.integers(1, 7, size=(5, 3))
>>> attack
array([[5, 2, 3],
       [6, 6, 2],
       [6, 5, 2],
       [1, 3, 3],
       [2, 3, 5]])

>>> attack.shape
(5, 3)
>>> attack.ndim
2

The size argument is now the tuple (5, 3). This represents the shape of the array, which you confirm when you show the value of attack.shape.

The array is now a 2D array with 5 rows and 3 columns. All elements in the array are random numbers between 1 and 6.

Let’s get some quick practice with manipulating a 2D NumPy array. Let’s assume you want to get the second item in the first row:

>>> attack[0, 1]
2

You use the familiar square brackets to index a NumPy array. However, you now use two index values. The first one represents the row index, and the second is the column index. As these are indices, they start from 0 like all indices in Python. Here are a few more examples:

# Second row. Third column
>>> attack[1, 2]
2

# Fourth row. Last column
>>> attack[3, -1]
3

# All the rows. Second column
>>> attack[:, 1]
array([2, 6, 5, 3, 3])

# Fifth row. All the columns
>>> attack[4, :]
array([2, 3, 5])

You’re now ready to refactor your Python script.

Refactoring code to add multiple attacks

You can add a new variable, n_repeats, and set it to 5. You can then use n_repeats to create 2D attack and defence arrays. Since you’re changing the nature of these arrays, you can comment out the lines with operations on these arrays for now as you’ll need to check that each one still does what you expect it to do:

from typing import NamedTuple

import numpy as np

n_repeats = 5

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    # attack.sort()
    # attack = np.flip(attack)
    print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    # defence.sort()
    # defence = np.flip(defence)
    print(defence)

    # min_length = min(len(attack), len(defence))
    # result = attack[:min_length] > defence[:min_length]

    # print(result)
    print()

The output from this code shows the attack and defence arrays for each of the six attack configurations. Each array has five rows representing the number of attacks you’re simulating for each configuration:

Attack(n_attackers=1, n_defenders=1)
[[1]
 [5]
 [6]
 [3]
 [3]]
[[6]
 [1]
 [5]
 [4]
 [6]]

Attack(n_attackers=1, n_defenders=2)
[[2]
 [6]
 [2]
 [2]
 [5]]
[[4 5]
 [1 4]
 [4 3]
 [1 5]
 [2 4]]

Attack(n_attackers=2, n_defenders=1)
[[6 2]
 [6 3]
 [3 1]
 [6 3]
 [5 1]]
[[3]
 [5]
 [3]
 [2]
 [4]]

Attack(n_attackers=2, n_defenders=2)
[[4 4]
 [1 2]
 [2 4]
 [6 5]
 [3 4]]
[[2 6]
 [2 2]
 [5 5]
 [2 4]
 [2 3]]

Attack(n_attackers=3, n_defenders=1)
[[3 2 3]
 [2 3 3]
 [3 4 6]
 [4 4 3]
 [4 5 6]]
[[4]
 [6]
 [4]
 [2]
 [4]]

Attack(n_attackers=3, n_defenders=2)
[[5 4 4]
 [4 6 1]
 [1 2 4]
 [2 2 5]
 [1 4 3]]
[[2 5]
 [1 4]
 [5 6]
 [5 4]
 [6 4]]

Sort each row in descending order

At the moment, each set of dice rolls is unordered. You can use the sort() method again. However, will this sort along rows or columns? One option is to try it out and see what happens:

from typing import NamedTuple

import numpy as np

n_repeats = 5

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    attack.sort()
    # attack = np.flip(attack)
    print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    defence.sort()
    # defence = np.flip(defence)
    print(defence)

    # min_length = min(len(attack), len(defence))
    # result = attack[:min_length] > defence[:min_length]

    # print(result)
    print()

Here’s the output from this code:

Attack(n_attackers=1, n_defenders=1)
[[1]
 [6]
 [5]
 [3]
 [1]]
[[5]
 [6]
 [3]
 [4]
 [6]]

Attack(n_attackers=1, n_defenders=2)
[[3]
 [2]
 [6]
 [6]
 [5]]
[[2 5]
 [1 6]
 [2 3]
 [1 5]
 [1 2]]

Attack(n_attackers=2, n_defenders=1)
[[1 2]
 [1 3]
 [3 6]
 [5 6]
 [3 3]]
[[1]
 [5]
 [4]
 [3]
 [2]]

Attack(n_attackers=2, n_defenders=2)
[[4 4]
 [3 5]
 [2 5]
 [5 6]
 [3 6]]
[[2 6]
 [2 5]
 [4 5]
 [2 2]
 [2 6]]

Attack(n_attackers=3, n_defenders=1)
[[3 5 6]
 [1 5 6]
 [2 4 5]
 [4 4 6]
 [3 4 5]]
[[6]
 [5]
 [3]
 [1]
 [1]]

Attack(n_attackers=3, n_defenders=2)
[[3 5 5]
 [4 5 6]
 [1 2 4]
 [3 5 6]
 [1 2 4]]
[[2 4]
 [1 3]
 [1 2]
 [3 3]
 [1 5]]

There are sufficient rows and columns to let you figure out the answer. sort() has sorted the values in each row and not in each column. You can confirm this by reading the documentation for sort(), which states that by default, the last axis is used. You’ll recall that the shape of the array is (5, 3). The last axis is the one that represents how many columns there are. This is the same as the number of items in each row.
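To see the difference between the axes on a small example, here’s a sketch using the np.sort() function, which returns a sorted copy and accepts the same axis argument as the sort() method. The array values are made up for illustration:

```python
import numpy as np

rolls = np.array([[6, 1, 5],
                  [3, 4, 2]])

# np.sort() returns a sorted copy, so rolls stays unchanged
by_row = np.sort(rolls)             # default: axis=-1, the last axis
by_column = np.sort(rolls, axis=0)  # sort down each column instead

print(by_row)
print(by_column)
```

The first call sorts the values within each row, giving the rows 1, 5, 6 and 2, 3, 4. The second sorts each column from top to bottom, giving the rows 3, 1, 2 and 6, 4, 5.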

Next, you need to reverse the numbers in each row using flip(). You were lucky with sort() since the default behaviour worked perfectly for this situation. So why not try your luck again with flip() and run the same line you had earlier?

from typing import NamedTuple

import numpy as np

n_repeats = 5

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    attack.sort()
    attack = np.flip(attack)
    print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    defence.sort()
    defence = np.flip(defence)
    print(defence)

    # min_length = min(len(attack), len(defence))
    # result = attack[:min_length] > defence[:min_length]

    # print(result)
    print()

The output seems to show that this works as well since the rows are in descending order:

Attack(n_attackers=1, n_defenders=1)
[[3]
 [5]
 [4]
 [1]
 [2]]
[[2]
 [2]
 [6]
 [3]
 [1]]

Attack(n_attackers=1, n_defenders=2)
[[5]
 [4]
 [4]
 [3]
 [1]]
[[5 3]
 [6 1]
 [3 1]
 [3 1]
 [5 3]]

Attack(n_attackers=2, n_defenders=1)
[[6 5]
 [6 3]
 [6 1]
 [4 4]
 [6 1]]
[[4]
 [6]
 [1]
 [6]
 [3]]

Attack(n_attackers=2, n_defenders=2)
[[3 1]
 [6 3]
 [5 2]
 [5 1]
 [6 2]]
[[4 1]
 [6 4]
 [6 5]
 [6 3]
 [5 3]]

Attack(n_attackers=3, n_defenders=1)
[[5 2 2]
 [4 4 2]
 [5 4 1]
 [6 3 1]
 [6 3 1]]
[[2]
 [3]
 [2]
 [6]
 [1]]

Attack(n_attackers=3, n_defenders=2)
[[6 3 2]
 [6 2 1]
 [4 3 2]
 [6 6 2]
 [6 6 2]]
[[5 3]
 [5 3]
 [5 2]
 [6 4]
 [6 3]]

All the rows are in descending order. Can you move on? Let’s look at the documentation for flip() first. This time, the default doesn’t flip the last axis. Instead, it flips over all the axes! In this example, it doesn’t really matter, as the order of the five rows is unimportant. However, this shows us that it’s still important to read the documentation even if we think we’ve reverse-engineered how the function works using trial and error!

Let’s dive a bit deeper into what flip() is doing with more experimentation in the REPL/Console:

>>> import numpy as np
>>> rng = np.random.default_rng()

>>> attack = rng.integers(1, 7, size=(5, 3))
>>> attack
array([[3, 2, 1],
       [5, 1, 4],
       [4, 4, 2],
       [3, 4, 4],
       [1, 3, 1]])

>>> np.flip(attack)
array([[1, 3, 1],
       [4, 4, 3],
       [2, 4, 4],
       [4, 1, 5],
       [1, 2, 3]])

>>> np.flip(attack, axis=0)
array([[1, 3, 1],
       [3, 4, 4],
       [4, 4, 2],
       [5, 1, 4],
       [3, 2, 1]])

>>> np.flip(attack, axis=1)
array([[1, 2, 3],
       [4, 1, 5],
       [2, 4, 4],
       [4, 4, 3],
       [1, 3, 1]])

When you call np.flip(attack) with no additional argument, the original array is flipped along both axes: left to right and top to bottom. Note that you’re not reassigning the array returned by flip() to attack, so attack always keeps its original values in this REPL example.

In the second test, you add axis=0 as an argument to flip(). axis=0 represents the vertical axis, so the array returned has its rows in reverse order, from bottom to top, while the values within each row keep their original order.

In the final call, the argument is axis=1, which means that flip() will act along the horizontal axis. This is the option you need in your code since there’s no need to flip along the vertical axis as well. Therefore, you can add axis=1 to your calls to flip():

from typing import NamedTuple

import numpy as np

n_repeats = 5

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    attack.sort()
    attack = np.flip(attack, axis=1)
    print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    defence.sort()
    defence = np.flip(defence, axis=1)
    print(defence)

    # min_length = min(len(attack), len(defence))
    # result = attack[:min_length] > defence[:min_length]

    # print(result)
    print()
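Reversing each row with flip() and axis=1 gives the same values as slicing the columns with a step of -1. Here’s a short sketch of that equivalence, shown as an alternative rather than as the approach used in this article. The array values are made up:

```python
import numpy as np

attack = np.array([[1, 3, 5],
                   [2, 2, 6]])

flipped = np.flip(attack, axis=1)
sliced = attack[:, ::-1]  # all rows, columns in reverse order

print(flipped)
print(np.array_equal(flipped, sliced))  # True
```

Either form works here. The flip() call states the intention more explicitly, which is why it can be easier to read.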

Truncate each row using the minimum length

In the previous version of the code, when you had 1D arrays, you used the built-in len() function to find the lengths of the attack and defence arrays. However, since you’re now using 2D arrays, you can use the shape attribute and get its second value which represents how many items there are in each row. Recall that if you have an array with five rows and three columns, its shape is (5, 3).

Therefore, you can replace:

min_length = min(len(attack), len(defence))

with:

min_length = min(attack.shape[1], defence.shape[1])

However, attack and defence are now 2D arrays. Therefore, attack[:min_length] and defence[:min_length] are no longer the expressions you need to slice these arrays since those are 1D slices.

You need to keep all the rows in the arrays but truncate the columns. You can achieve this using the expressions attack[:, :min_length] and defence[:, :min_length]. The colon before the comma represents all the rows. After the comma in the square brackets, you write the slice :min_length. This slice represents all the elements in each row from the beginning up to the element with index min_length - 1.
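The difference between the two slices is easiest to see by looking at the shapes they produce. Here’s a small sketch with a fixed 5 x 3 array, using made-up values:

```python
import numpy as np

attack = np.array([[6, 4, 4],
                   [4, 4, 2],
                   [4, 4, 2],
                   [5, 1, 1],
                   [6, 4, 3]])
min_length = 2

print(attack[:min_length].shape)     # (2, 3): first two rows kept
print(attack[:, :min_length].shape)  # (5, 2): first two columns kept
```

The first slice throws away three of the five attacks, whereas the second keeps every attack and drops the lowest die in each one.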

Here’s the updated code:

from typing import NamedTuple

import numpy as np

n_repeats = 5

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    attack.sort()
    attack = np.flip(attack, axis=1)
    print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    defence.sort()
    defence = np.flip(defence, axis=1)
    print(defence)

    min_length = min(attack.shape[1], defence.shape[1])
    result = attack[:, :min_length] > defence[:, :min_length]

    print(result)
    print()

Here’s the output from this code:

Attack(n_attackers=1, n_defenders=1)
[[4]
 [2]
 [2]
 [4]
 [4]]
[[3]
 [1]
 [3]
 [4]
 [3]]
[[ True]
 [ True]
 [False]
 [False]
 [ True]]

Attack(n_attackers=1, n_defenders=2)
[[4]
 [6]
 [4]
 [5]
 [1]]
[[5 4]
 [4 1]
 [6 1]
 [5 1]
 [5 5]]
[[False]
 [ True]
 [False]
 [False]
 [False]]

Attack(n_attackers=2, n_defenders=1)
[[6 1]
 [4 4]
 [6 4]
 [5 2]
 [5 1]]
[[4]
 [3]
 [3]
 [5]
 [2]]
[[ True]
 [ True]
 [ True]
 [False]
 [ True]]

Attack(n_attackers=2, n_defenders=2)
[[3 2]
 [3 2]
 [4 2]
 [5 1]
 [4 3]]
[[5 3]
 [6 5]
 [5 4]
 [4 3]
 [6 4]]
[[False False]
 [False False]
 [False False]
 [ True False]
 [False False]]

Attack(n_attackers=3, n_defenders=1)
[[6 5 2]
 [6 6 4]
 [5 4 2]
 [3 2 1]
 [6 4 4]]
[[3]
 [5]
 [2]
 [3]
 [5]]
[[ True]
 [ True]
 [ True]
 [False]
 [ True]]

Attack(n_attackers=3, n_defenders=2)
[[6 4 4]
 [4 4 2]
 [4 4 2]
 [5 1 1]
 [6 4 3]]
[[5 3]
 [5 3]
 [5 2]
 [6 1]
 [5 4]]
[[ True  True]
 [False  True]
 [False  True]
 [False False]
 [ True False]]

Let’s look at the output for the final attack configuration with three attackers and two defenders. attack is an array with five rows and three columns. Each row represents a set of dice rolls. defence also has five rows but only two columns since each set of dice rolls includes two dice.

The final array displayed is result which compares the truncated attack array with defence. Therefore, only the first two elements in each row of attack are compared with defence. The values in result are the outcomes from each comparison, element by element. Let’s look at the five attacks for the configuration with three attackers and two defenders:

  1. In the first attack out of the five simulated, the attacker got 6, 4 and 4. The defender got 5 and 3. Therefore the attacker wins both bouts. The first row in result is [True, True]
  2. Second attack: Attacker has 4, 4, 2. Defender has 5, 3. The defender’s 5 beats the attacker’s first 4. However, the attacker’s second 4 beats the defender’s 3. The defender wins the first bout, but the attacker wins the second: [False, True]
  3. Third attack: Attacker has 4, 4, 2. Defender has 5, 2. Result is similar to second attack. One win and one loss for the attacker
  4. Fourth attack: Attacker has 5, 1, 1. Defender has 6, 1. The defender’s 6 beats the attacker’s 5, and the defender wins the first bout. The defender’s 1 ties with the attacker’s 1 which means the defender also wins the second bout. The corresponding row in result is [False, False]
  5. Fifth attack: Attacker has 6, 4, 3. Defender has 5, 4. The attacker wins the first bout, but the defender wins the second because the defender wins in case of a tie: [True, False]
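You can check these five attacks by entering the same dice rolls as fixed arrays. The sketch below also sums each row of result to count the attacker’s wins per attack. This extra step and the name `wins_per_attack` are additions for illustration and aren’t part of the main script:

```python
import numpy as np

# Dice rolls copied from the output above, sorted in descending order
attack = np.array([[6, 4, 4],
                   [4, 4, 2],
                   [4, 4, 2],
                   [5, 1, 1],
                   [6, 4, 3]])
defence = np.array([[5, 3],
                    [5, 3],
                    [5, 2],
                    [6, 1],
                    [5, 4]])

min_length = min(attack.shape[1], defence.shape[1])
result = attack[:, :min_length] > defence[:, :min_length]

# Summing along axis=1 counts the True values in each row
wins_per_attack = result.sum(axis=1)
print(wins_per_attack)  # [2 1 1 0 1]
```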

Working Out Winning Percentage

When you play Risk, you need to know how many of your toy troops you’re likely to keep when you attack. Therefore, you can find the number of times you win a bout as a percentage of all the bouts.

The number of items in result represents the total number of bouts. The number of True values in result represents the number of times the attacker won a bout. You can use these two numbers to determine the winning percentage for all six attack configurations.

You can start by counting how many True values there are by using np.sum(). Recall that in Python, True has the value of 1:

from typing import NamedTuple

import numpy as np

n_repeats = 5

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    attack.sort()
    attack = np.flip(attack, axis=1)
    print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    defence.sort()
    defence = np.flip(defence, axis=1)
    print(defence)

    min_length = min(attack.shape[1], defence.shape[1])
    result = attack[:, :min_length] > defence[:, :min_length]

    print(result)
    print(np.sum(result))
    print()

I’m only showing the output for the final attack configuration below for compactness:

# truncated output
# ...

Attack(n_attackers=3, n_defenders=2)
[[4 4 3]
 [6 4 1]
 [6 4 4]
 [5 2 1]
 [3 3 2]]
[[1 1]
 [4 3]
 [6 5]
 [3 2]
 [6 1]]
[[ True  True]
 [ True  True]
 [False False]
 [ True False]
 [False  True]]
6

There are six True values in result. Therefore, np.sum() returns this value. You can also find the total number of items in result using the attribute result.size. Now, you’re ready to work out and display the winning percentages:

from typing import NamedTuple

import numpy as np

n_repeats = 5

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    attack.sort()
    attack = np.flip(attack, axis=1)
    print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    defence.sort()
    defence = np.flip(defence, axis=1)
    print(defence)

    min_length = min(attack.shape[1], defence.shape[1])
    result = attack[:, :min_length] > defence[:, :min_length]

    print(result)
    winning_percentage = np.sum(result) / result.size * 100
    print(f"Winning percentage for {option} is {winning_percentage:0.2f} %")
    print()

The output shows how many times the attacker wins as a percentage:

Attack(n_attackers=1, n_defenders=1)
[[6]
 [1]
 [5]
 [5]
 [5]]
[[5]
 [1]
 [6]
 [5]
 [2]]
[[ True]
 [False]
 [False]
 [False]
 [ True]]
Winning percentage for Attack(n_attackers=1, n_defenders=1) is 40.00 %

Attack(n_attackers=1, n_defenders=2)
[[6]
 [5]
 [3]
 [5]
 [4]]
[[6 1]
 [5 1]
 [5 1]
 [6 1]
 [6 1]]
[[False]
 [False]
 [False]
 [False]
 [False]]
Winning percentage for Attack(n_attackers=1, n_defenders=2) is 0.00 %

Attack(n_attackers=2, n_defenders=1)
[[4 3]
 [3 3]
 [5 5]
 [2 1]
 [6 3]]
[[6]
 [5]
 [2]
 [6]
 [3]]
[[False]
 [False]
 [ True]
 [False]
 [ True]]
Winning percentage for Attack(n_attackers=2, n_defenders=1) is 40.00 %

Attack(n_attackers=2, n_defenders=2)
[[6 3]
 [4 4]
 [6 6]
 [6 1]
 [5 1]]
[[2 1]
 [4 4]
 [3 3]
 [5 5]
 [3 1]]
[[ True  True]
 [False False]
 [ True  True]
 [ True False]
 [ True False]]
Winning percentage for Attack(n_attackers=2, n_defenders=2) is 60.00 %

Attack(n_attackers=3, n_defenders=1)
[[6 6 1]
 [6 5 1]
 [6 6 3]
 [5 4 2]
 [6 2 1]]
[[4]
 [1]
 [2]
 [5]
 [5]]
[[ True]
 [ True]
 [ True]
 [False]
 [ True]]
Winning percentage for Attack(n_attackers=3, n_defenders=1) is 80.00 %

Attack(n_attackers=3, n_defenders=2)
[[4 3 2]
 [4 4 1]
 [6 4 4]
 [6 3 2]
 [6 5 2]]
[[2 1]
 [6 3]
 [3 1]
 [5 4]
 [5 5]]
[[ True  True]
 [False  True]
 [ True  True]
 [ True False]
 [ True False]]
Winning percentage for Attack(n_attackers=3, n_defenders=2) is 70.00 %
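
As a quick check of the arithmetic, here is a minimal sketch that rebuilds the last configuration shown above (three attackers against two defenders) from the printed dice values and recomputes its winning percentage. The arrays are copied from the output instead of being generated randomly, so the result is reproducible:

```python
import numpy as np

# Dice values copied from the Attack(n_attackers=3, n_defenders=2) output,
# already sorted from highest to lowest in each row
attack = np.array(
    [[4, 3, 2], [4, 4, 1], [6, 4, 4], [6, 3, 2], [6, 5, 2]]
)
defence = np.array([[2, 1], [6, 3], [3, 1], [5, 4], [5, 5]])

# Compare only as many dice as the side with fewer dice has rolled
min_length = min(attack.shape[1], defence.shape[1])
result = attack[:, :min_length] > defence[:, :min_length]

n_wins = int(np.sum(result))  # each True counts as 1
n_bouts = result.size  # 5 attacks x 2 bouts each
winning_percentage = n_wins / n_bouts * 100
print(f"{n_wins} wins out of {n_bouts} bouts: {winning_percentage:0.2f} %")
```

Seven of the ten bouts are True, which is where the 70.00 % in the output comes from.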

Getting an Estimate for the Likelihood of Winning

You’re ready for the final step. You can change from simulating 5 attacks for each configuration to simulating 10 million. However, you probably don’t want to display the arrays this time! Your final step is to change the value of n_repeats and comment out or delete the print() functions which display the arrays:

from typing import NamedTuple

import numpy as np

n_repeats = 10_000_000

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

rng = np.random.default_rng()

for option in options:
    print(option)

    attack = rng.integers(
        1, 7, size=(n_repeats, option.n_attackers)
    )
    attack.sort()
    attack = np.flip(attack, axis=1)
    # print(attack)

    defence = rng.integers(
        1, 7, size=(n_repeats, option.n_defenders)
    )
    defence.sort()
    defence = np.flip(defence, axis=1)
    # print(defence)

    min_length = min(attack.shape[1], defence.shape[1])
    result = attack[:, :min_length] > defence[:, :min_length]

    # print(result)
    winning_percentage = np.sum(result) / result.size * 100
    print(f"Winning percentage for {option} is {winning_percentage:0.2f} %")
    print()

Here’s the final output from this code:

Attack(n_attackers=1, n_defenders=1)
Winning percentage for Attack(n_attackers=1, n_defenders=1) is 41.68 %

Attack(n_attackers=1, n_defenders=2)
Winning percentage for Attack(n_attackers=1, n_defenders=2) is 25.47 %

Attack(n_attackers=2, n_defenders=1)
Winning percentage for Attack(n_attackers=2, n_defenders=1) is 57.88 %

Attack(n_attackers=2, n_defenders=2)
Winning percentage for Attack(n_attackers=2, n_defenders=2) is 38.98 %

Attack(n_attackers=3, n_defenders=1)
Winning percentage for Attack(n_attackers=3, n_defenders=1) is 65.98 %

Attack(n_attackers=3, n_defenders=2)
Winning percentage for Attack(n_attackers=3, n_defenders=2) is 53.96 %

When an attacker attacks with one die and the defender defends with one die, the defender has the advantage since ties are settled in favour of the defender.

It’s also clear you should never attack with one die when the defender is defending with two!

The most common type of attack in Risk (at least when I play the game) is three attackers against two defenders. Although the attacker does have the advantage in this case, it’s not as high as one might expect when rolling three dice against the defender’s two!
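
If you want to cross-check the simulated figures, you can also enumerate every possible combination of dice, since there are at most 6 to the power of 5 of them. The sketch below is not part of the simulation above; exact_win_fraction() is a name introduced here for illustration, and it applies the same rules (sort descending, compare pairwise, ties go to the defender):

```python
from fractions import Fraction
from itertools import product


def exact_win_fraction(n_attackers, n_defenders):
    """Return the exact fraction of bouts won by the attacker."""
    wins = 0
    bouts = 0
    n_bouts = min(n_attackers, n_defenders)
    # Go through every possible roll of all the dice on the table
    for roll in product(range(1, 7), repeat=n_attackers + n_defenders):
        attack = sorted(roll[:n_attackers], reverse=True)
        defence = sorted(roll[n_attackers:], reverse=True)
        # zip() stops at the shorter list, so only matched dice are compared
        wins += sum(a > d for a, d in zip(attack, defence))
        bouts += n_bouts
    return Fraction(wins, bouts)


for n_attackers in range(1, 4):
    for n_defenders in range(1, 3):
        fraction = exact_win_fraction(n_attackers, n_defenders)
        print(
            f"{n_attackers} vs {n_defenders}: "
            f"{fraction} = {float(fraction) * 100:0.2f} %"
        )
```

The exact values should agree with the simulated percentages to within the small random variation you would expect from a simulation.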

Final Words

You’re now better placed to play and win Risk, as long as your opponents haven’t read this article as well!

More importantly, you’ve explored how to use NumPy arrays to perform “batch processing” using vectorization. This type of operation is what makes NumPy faster in many situations. I present a version of this code that does not use NumPy in the appendix and compare the time it took for the NumPy and non-NumPy versions to run. I won’t provide spoilers here, so glance at the appendix if you want to see the performance improvement when using NumPy.

Often, when people first learn about NumPy and start to use it, they think that all they need to do is replace lists with NumPy arrays, and the rest doesn’t change. However, the coding style changes when you want to use NumPy arrays effectively. Code that uses NumPy will look and feel very different from code that uses ‘vanilla’ Python.


Appendix: Using for Loops Instead Of NumPy

What if you don’t want to use NumPy? Of course, there are non-NumPy solutions, too. Here’s one alternative that uses for loops instead of NumPy. I’ll present the code without explanation here, but you’ll spot a lot of the logic we’ve already used in the main article:

import random
from typing import NamedTuple

n_repeats = 10_000_000

class Attack(NamedTuple):
    n_attackers: int
    n_defenders: int

options = [
    Attack(x, y) for x in range(1, 4) for y in range(1, 3)
]

for option in options:
    print(option)
    win_tally = 0
    min_length = min(option.n_attackers, option.n_defenders)

    for _ in range(n_repeats):
        attack = [
            random.randint(1, 6)
            for _ in range(option.n_attackers)
        ]
        attack.sort(reverse=True)
        # print(attack)

        defence = [
            random.randint(1, 6)
            for _ in range(option.n_defenders)
        ]
        defence.sort(reverse=True)
        # print(defence)

        result = [
            attack_value > defence_value
            for attack_value, defence_value in zip(
                attack[:min_length], defence[:min_length]
            )
        ]

        # print(result)
        win_tally += sum(result)
        # print()

    winning_percentage = (
        win_tally / (n_repeats * min_length) * 100
    )
    print(
        f"Winning percentage for {option} is {winning_percentage:0.2f} %"
    )

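Here’s the output from the code, which gives nearly the same values as the NumPy version, as you would expect. The small differences in the second decimal place come from the randomness of the simulation: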

Attack(n_attackers=1, n_defenders=1)
Winning percentage for Attack(n_attackers=1, n_defenders=1) is 41.68 %
Attack(n_attackers=1, n_defenders=2)
Winning percentage for Attack(n_attackers=1, n_defenders=2) is 25.44 %
Attack(n_attackers=2, n_defenders=1)
Winning percentage for Attack(n_attackers=2, n_defenders=1) is 57.85 %
Attack(n_attackers=2, n_defenders=2)
Winning percentage for Attack(n_attackers=2, n_defenders=2) is 38.97 %
Attack(n_attackers=3, n_defenders=1)
Winning percentage for Attack(n_attackers=3, n_defenders=1) is 65.96 %
Attack(n_attackers=3, n_defenders=2)
Winning percentage for Attack(n_attackers=3, n_defenders=2) is 53.98 %

If you’re relatively new to Python, this version may make more sense to you and fit the programming style you’re more used to. Programming in NumPy requires a different mindset to make use of the advantages of vectorization.

So let’s finish by quantifying the benefit of using NumPy in this example. It took 2.5 seconds to run the code using NumPy. It took the code in this appendix, which uses for loops instead of NumPy, 106.6 seconds using Python 3.11. The NumPy version is over 40 times faster for this scenario.
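
If you’d like to try a comparison like this yourself, here’s a minimal sketch of one way to do it. It is not the script used for the timings quoted above: the function names, the single three-against-two configuration, and the smaller n_repeats are all assumptions made to keep the example short, and the timings you get will depend on your machine:

```python
import random
import time

import numpy as np


def numpy_version(n_repeats, n_attackers=3, n_defenders=2):
    rng = np.random.default_rng()
    attack = rng.integers(1, 7, size=(n_repeats, n_attackers))
    attack.sort()
    attack = np.flip(attack, axis=1)
    defence = rng.integers(1, 7, size=(n_repeats, n_defenders))
    defence.sort()
    defence = np.flip(defence, axis=1)
    min_length = min(n_attackers, n_defenders)
    result = attack[:, :min_length] > defence[:, :min_length]
    return np.sum(result) / result.size * 100


def loop_version(n_repeats, n_attackers=3, n_defenders=2):
    win_tally = 0
    min_length = min(n_attackers, n_defenders)
    for _ in range(n_repeats):
        attack = sorted(
            (random.randint(1, 6) for _ in range(n_attackers)), reverse=True
        )
        defence = sorted(
            (random.randint(1, 6) for _ in range(n_defenders)), reverse=True
        )
        # zip() stops at the shorter list, so only matched dice are compared
        win_tally += sum(a > d for a, d in zip(attack, defence))
    return win_tally / (n_repeats * min_length) * 100


def time_call(func, n_repeats):
    """Run func once and return its result and the elapsed seconds."""
    start = time.perf_counter()
    percentage = func(n_repeats)
    return percentage, time.perf_counter() - start


n_repeats = 200_000  # far fewer than 10 million, to keep the test quick
for func in (numpy_version, loop_version):
    percentage, elapsed = time_call(func, n_repeats)
    print(f"{func.__name__}: {percentage:0.2f} % in {elapsed:0.3f} s")
```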


The post Using Python’s NumPy To Improve Your Board Game Strategy: Your Odds When Attacking in ‘Risk’ appeared first on The Python Coding Book.


Python for Beginners: Split a String into Characters in Python


Strings are used in Python to handle text data. In this article, we will discuss different ways to split a string into characters in Python.

Can We Split a String Into Characters Using the split() Method?

In Python, we usually use the split() method on a string to split it into substrings. The split() method, when invoked on a string, takes a separator string as an optional input argument; when no separator is given, it splits the string on runs of whitespace. After execution, it splits the string at every place where the separator occurs and returns a list of substrings. For instance, consider the following example.

myStr="Python For Beginners"
print("The input string is:",myStr)
output=myStr.split()
print("The output is:",output)

Output:

The input string is: Python For Beginners
The output is: ['Python', 'For', 'Beginners']
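
To make the role of the separator clearer, here is a small additional sketch. The strings used are made up for illustration; the example only contrasts calling split() with an explicit separator and calling it with no argument.

```python
# With an explicit separator, the string is cut at every occurrence of it,
# so two commas in a row produce an empty string in the output.
csv_line = "red,green,,blue"
comma_parts = csv_line.split(",")
print("Split on commas:", comma_parts)

# With no argument, runs of whitespace are treated as a single separator
# and leading or trailing whitespace is ignored.
spaced = "  Python   For Beginners  "
whitespace_parts = spaced.split()
print("Split on whitespace:", whitespace_parts)
```

Output:

Split on commas: ['red', 'green', '', 'blue']
Split on whitespace: ['Python', 'For', 'Beginners']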

One might think that we can use an empty string as the separator to split a string into characters. Let us try that.

myStr="Python For Beginners"
print("The input string is:",myStr)
output=myStr.split("")
print("The output is:",output)

Output:

ValueError: empty separator

In this example, we have passed an empty string as a separator to split the string into characters. However, the program runs into a ValueError exception saying that you have used an empty separator.

So, we cannot use the split() method to split a python string into characters. Let us discuss other approaches for this.
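
If a program might pass an empty separator by mistake, we can catch the exception instead of letting the program stop. The following is only a sketch of that idea, using the same input string as above.

```python
myStr = "Python For Beginners"
try:
    output = myStr.split("")
except ValueError as error:
    # split() refuses an empty separator, so we end up here
    message = str(error)
    output = None
print("The error message is:", message)
```

Output:

The error message is: empty separator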

Split a String Into Characters Using the for Loop in Python

A string in Python is an iterable object. Hence, we can access the characters of a string one by one using a for loop.

To split a string using the for loop, we will first define an empty list to contain the output characters. Then, we will iterate through the characters of the string using a Python for loop. During iteration, we will add each character of the string to the list using the append() method. The append() method, when invoked on a list, takes an object as its input argument and appends it to the end of the list.

After execution of the for loop, we will get all the characters of the string in the list. You can observe this in the following example.

output=[]
myStr="Python For Beginners"
print("The input string is:",myStr)
for character in myStr:
    output.append(character)
print("The output is:",output)

Output:

The input string is: Python For Beginners
The output is: ['P', 'y', 't', 'h', 'o', 'n', ' ', 'F', 'o', 'r', ' ', 'B', 'e', 'g', 'i', 'n', 'n', 'e', 'r', 's']

In the above example, we have split the string "Python For Beginners" into a list of characters.

String to List of Characters Using List Comprehension

List comprehension is used to create a list from the elements of an existing iterable object in Python. It is a more concise alternative to the approach using a for loop and the append() method, as we can convert any iterable object into a list with a single Python statement.

You can split a string into a list of characters using list comprehension as shown below.

myStr="Python For Beginners"
print("The input string is:",myStr)
output=[character for character in myStr]
print("The output is:",output)

Output:

The input string is: Python For Beginners
The output is: ['P', 'y', 't', 'h', 'o', 'n', ' ', 'F', 'o', 'r', ' ', 'B', 'e', 'g', 'i', 'n', 'n', 'e', 'r', 's']

In this example, we have used list comprehension instead of the for loop. Hence, we are able to get the same results in fewer lines of code.
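
List comprehension also lets us filter or transform the characters in the same statement. The following sketch is an extension of the example above and not part of the original approach: it drops the whitespace characters and, separately, converts every character to upper case.

```python
myStr = "Python For Beginners"

# Keep only the characters that are not whitespace
letters = [character for character in myStr if not character.isspace()]
print("Number of non-space characters:", len(letters))

# Transform each character while building the list
upper_characters = [character.upper() for character in myStr]
print("First six characters in upper case:", upper_characters[:6])
```

Output:

Number of non-space characters: 18
First six characters in upper case: ['P', 'Y', 'T', 'H', 'O', 'N']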

String to List of Characters Using the list() Function in Python

The list() constructor is used to create a list from any iterable object such as a string, set, or tuple in Python. It takes the iterable object as its input argument and returns a list containing the elements of the iterable object.

To break a string into characters, we will pass the input string to the list() function. After execution, it will return a list containing all the characters of the input string. You can observe this in the following example.

myStr="Python For Beginners"
print("The input string is:",myStr)
output=list(myStr)
print("The output is:",output)

Output:

The input string is: Python For Beginners
The output is: ['P', 'y', 't', 'h', 'o', 'n', ' ', 'F', 'o', 'r', ' ', 'B', 'e', 'g', 'i', 'n', 'n', 'e', 'r', 's']

Using the tuple() Function

A tuple is an immutable version of a list. Hence, you can also convert a string into a tuple of characters. For this, you can use the tuple() function. The tuple() function takes the iterable object as its input argument and returns a tuple containing the elements of the iterable object.

To split a string, we will pass the input string to the tuple() function. After execution, it will return a tuple containing all the characters of the input string. You can observe this in the following example.

myStr="Python For Beginners"
print("The input string is:",myStr)
output=tuple(myStr)
print("The output is:",output)

Output:

The input string is: Python For Beginners
The output is: ('P', 'y', 't', 'h', 'o', 'n', ' ', 'F', 'o', 'r', ' ', 'B', 'e', 'g', 'i', 'n', 'n', 'e', 'r', 's')
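
The practical difference between the two results is that the list can be changed afterwards while the tuple cannot. The following short sketch illustrates this with a shorter input string.

```python
myStr = "Python"
as_list = list(myStr)
as_tuple = tuple(myStr)

# A list of characters can be modified in place
as_list[0] = "J"
print("The modified list is:", as_list)

# A tuple of characters cannot; the assignment raises a TypeError
try:
    as_tuple[0] = "J"
except TypeError as error:
    tuple_error = str(error)
    print("The tuple could not be modified:", tuple_error)
```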

Conclusion

In this article, we have discussed different ways to split a string into characters in Python.

To learn more about Python programming, you can read this article on string manipulation. You might also like this article on Python SimpleHTTPServer.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Split a String into Characters in Python appeared first on PythonForBeginners.com.

Moshe Zadka: The "Dynamic" Properties in PyProject


When writing a pyproject.toml file, the project section is optional. However, if it does exist, two of its properties are required:

  • name
  • version

If these two properties are not there, the section will be ignored.

This is a lie. But it is not a big lie: it is almost true.

In general, if either of these two properties are not there, the section will be ignored. However, there is a way to indicate that either, or both, of these properties will be filled in by the build system later on.

This is done with dynamic.

For example

[project]
name = "my-package"
dynamic = ["version"]

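This is the most common setting. Listing both, as in dynamic = ["name", "version"], would in principle avoid both parameters. Note, however, that the pyproject.toml specification requires name to be defined statically, so build back-ends are expected to reject name in dynamic.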

Zero to Mastery: Python Monthly Newsletter 💻🐍

37th issue of the Python Monthly Newsletter! Read by 25,000+ Python developers every month. This monthly Python newsletter covers the latest Python news so that you stay up-to-date with the industry and keep your skills sharp.

CodersLegacy: Setup Virtual Environment for Pyinstaller with Venv


In this Python tutorial, we will discuss how to optimize your Pyinstaller EXEs using Virtual Environments. We will be using the “venv” module to create the Virtual environment for Pyinstaller, which is actually already included in every Python installation by default (because it’s part of the standard library).

We will walk you through the entire process, starting from “what” virtual environments are, “why” we need them and “how” to create one.


Understanding Pyinstaller

Let me start by telling you how Pyinstaller works. We all know that Pyinstaller creates a standalone EXE which bundles all the dependencies, allowing it to run on any system.

What a lot of people do not know, however, is “HOW” Pyinstaller does this.

Let me explain.

What Pyinstaller does is “freeze” your Python environment into what we call a “frozen” application. In non-technical terms, this means bundling everything in your Python environment, like your Python installation, the libraries that you have installed, and other dependencies (e.g. DLLs or data files) you may be using, into a single application.

Kind of like taking a “snapshot” of your program in its running state (with all the dependencies active) and saving it.

With this understanding, we can now safely explain the benefits of Virtual Environments.


What are Virtual Environments in Python?

Imagine for a moment that you have 100 libraries installed for Python on your device. You might think you do not have many, but the truth is that when you install a big library (e.g. Matplotlib), it installs several other libraries along with it (as dependencies).

To check the currently installed libraries on our systems (excluding the ones included by default), run the following command.

pip list

It will give you something like the following output.

altgraph                  0.17.3
astroid                   2.12.9
async-generator           1.10
attrs                     22.2.0
auto-py-to-exe            2.24.1
Automat                   22.10.0
autopep8                  1.7.0
Babel                     2.10.3
beautifulsoup4            4.11.1
bottle                    0.12.23
bottle-websocket          0.2.9
bs4                       0.0.1
cad-to-shapely            0.3.1
cairocffi                 1.3.0
CairoSVG                  2.5.2
certifi                   2022.12.7
cffi                      1.15.1
chardet                   4.0.0
colorama                  0.4.5
constantly                15.1.0
cryptography              38.0.4
cssselect                 1.2.0
cssselect2                0.6.0
cx-Freeze                 6.13.1
cycler                    0.11.0
Cython                    0.29.32
defusedxml                0.7.1
dill                      0.3.5.1
Eel                       0.14.0
et-xmlfile                1.1.0
exceptiongroup            1.1.0
ezdxf                     0.18
filelock                  3.8.0
fonttools                 4.34.4
future                    0.18.2
geomdl                    5.3.1
gevent                    22.10.2
gevent-websocket          0.10.1
greenlet                  2.0.1
h11                       0.14.0
hyperlink                 21.0.0
idna                      2.10
incremental               22.10.0
isort                     5.10.1
itemadapter               0.7.0
itemloaders               1.0.6
Jinja2                    3.0.1
jmespath                  1.0.1
kiwisolver                1.4.4
lazy-object-proxy         1.7.1
lief                      0.12.3
lxml                      4.9.1
MarkupSafe                2.1.1
matplotlib                3.5.3
mccabe                    0.7.0
more-itertools            8.14.0
MouseInfo                 0.1.3
mpmath                    1.2.1
Nuitka                    1.2.4
numexpr                   2.8.3
numpy                     1.23.1
openpyxl                  3.0.10
ordered-set               4.1.0
outcome                   1.2.0
packaging                 21.3
pandas                    1.4.3
pandastable               0.13.0
parsel                    1.7.0
pefile                    2022.5.30
Pillow                    8.4.0
pip                       22.2.1
platformdirs              2.5.2
Protego                   0.2.1
pyasn1                    0.4.8
pyasn1-modules            0.2.8
PyAutoGUI                 0.9.53
pycodestyle               2.9.1
pycparser                 2.21
PyDispatcher              2.0.6
pygal                     3.0.0
pygame                    2.1.2
pygame-menu               4.2.8
PyGetWindow               0.0.9
pyinstaller               5.6.2
pyinstaller-hooks-contrib 2022.13
pylint                    2.15.2
PyMsgBox                  1.0.9
PyMuPDF                   1.20.2
pyOpenSSL                 22.1.0
pyparsing                 3.0.9
pyperclip                 1.8.2
PyQt5                     5.15.7
PyQt5-Qt5                 5.15.2
PyQt5-sip                 12.11.0
PyQt6                     6.4.0
PyQt6-Qt6                 6.4.1
PyQt6-sip                 13.4.0
PyRect                    0.2.0
PyScreeze                 0.1.28
PySocks                   1.7.1
pytest-check              1.0.6
python-dateutil           2.8.2
pytweening                1.0.4
pytz                      2022.1
pywin32-ctypes            0.2.0
queuelib                  1.6.2
requests                  2.25.1
requests-file             1.5.1
rhino-shapley-interop     0.0.4
rhino3dm                  7.15.0
scipy                     1.9.0
Scrapy                    2.7.1
sectionproperties         2.0.3
selenium                  4.7.2
service-identity          21.1.0
setuptools                63.2.0
Shapely                   1.8.2
six                       1.16.0
sniffio                   1.3.0
sortedcontainers          2.4.0
soupsieve                 2.3.2.post1
sympy                     1.10.1
tabulate                  0.8.10
tinycss2                  1.1.1
tkcalendar                1.6.1
tkdesigner                1.0.6
tkinter-tooltip           2.1.0
tldextract                3.4.0
toml                      0.10.2
tomli                     2.0.1
tomlkit                   0.11.4
triangle                  20220202
trio                      0.22.0
trio-websocket            0.9.2
Twisted                   22.10.0
twisted-iocpsupport       1.0.2
typing_extensions         4.3.0
urllib3                   1.26.13
w3lib                     2.1.1
webencodings              0.5.1
whichcraft                0.6.1
wrapt                     1.14.1
wsproto                   1.2.0
xlrd                      2.0.1
zope.event                4.5.0
zope.interface            5.5.2

That was quite a long list, right? I don’t even recognize half of those libraries (they were installed as dependencies). And this is on a relatively new Python installation (3-4 months old). Running this on my old device might have given me double the above amount.
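
If you would rather count the installed libraries from inside Python, the standard library can produce a similar listing. This is just a sketch using importlib.metadata; the formatting only imitates the two-column layout above, and the numbers you get will depend on your own installation.

```python
from importlib.metadata import distributions

installed = {}
for dist in distributions():
    name = dist.metadata.get("Name")
    if name:  # skip entries with broken or missing metadata
        installed[name] = dist.version

# Print name and version in two columns, similar to "pip list"
for name in sorted(installed, key=str.lower):
    print(f"{name:<26}{installed[name]}")
print(f"{len(installed)} distributions installed")
```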

Now you might have already put two-and-two together and realized the problem here.

When we normally use Pyinstaller to bundle our applications, it ends up including ALL of the libraries that we have installed, regardless of whether they are actually needed or not.

Now obviously this is a big problem, especially if you have several large libraries lying around which are not actually being used.


The Solution?

Virtual Environments!

Now, what we could do is set up a new Python installation on our PC and only install the required packages (which we know are being used). But this is an extra hassle, and can cause issues with your current Python installation if you are not careful.

Instead, we use Virtual environments, which basically create a “fresh copy” of your current Python version, without any of the installed libraries. You can create as many virtual environments as you want!

We can then compile our Pyinstaller EXE’s inside these virtual environments (just like how we normally do). This time the EXE will only include the bare minimum number of libraries.

It is actually recommended to have a virtual environment for each major project you have. This is to ensure that there is no library conflict, and to keep control over the versions of the libraries you are using.


Version Control in Virtual Environments with Venv

For example, a common issue that can happen is when you install a new library “A” (unrelated to your application) and it requires a dependency “B”, which is also required by library “C”.

Library “A” requires the dependency to be at least version 1.3 (a random version number I picked), whereas library “C” only works with the dependency “B” up to version 1.2 (1.3 onwards is not supported).

Hence, we now have a conflict issue. There are many other scenarios like this under which problems can occur. This is just one of them.

Virtual environments help isolate projects and dependencies into separate environments, minimizing the risk of conflict.
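
To see why no single installation can satisfy both libraries, here is a small sketch of the scenario above. The library names, the list of available versions and the parse_version helper are all hypothetical; real tools such as pip use a much more complete version-handling scheme.

```python
def parse_version(text):
    """Turn a simple dotted version such as '1.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical constraints that libraries "A" and "C" place on dependency "B"
requirements = {
    "A": lambda version: version >= parse_version("1.3"),
    "C": lambda version: version <= parse_version("1.2"),
}
available_versions = ["1.0", "1.1", "1.2", "1.3", "1.4"]

# A version is usable only if every library accepts it
compatible = [
    text
    for text in available_versions
    if all(accepts(parse_version(text)) for accepts in requirements.values())
]
print("Versions of B that work for both A and C:", compatible)
```

The list comes out empty, which is exactly the conflict. With “A” and “C” in separate virtual environments, each one can have its own version of “B”.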


Creating a Pyinstaller Virtual Environment with Venv

Now for the actual implementation part of the tutorial. The first thing we will do is set up our Virtual environment.

For users with Python added to PATH, run the following command.

python -m venv tutorial

In the above command, “tutorial” is the name of the virtual environment. What you name it is completely your choice. Also pay attention to which folder you are running this command in, because the virtual environment will be created there.

For users who do not have Python added to PATH, you need to find the path to your Python installation. You can typically find it in a location like this:

C:\Users\CodersLegacy\AppData\Local\Programs\Python\Python310

To create a virtual environment you need to run the following command (swapping out “python” for the location of your python.exe file)

C:\Users\CodersLegacy\AppData\Local\Programs\Python\Python310\python.exe -m venv tutorial
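
The same thing can also be done from inside a Python script, since the command above simply runs the standard library’s venv module. The sketch below creates the environment in a temporary folder so that it cleans up after itself; that folder, and the choice of with_pip=False to keep the example fast, are assumptions made for this demonstration only.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as parent:
    env_dir = Path(parent) / "tutorial"
    # with_pip=False keeps this demonstration fast; the command-line
    # form shown above installs pip into the environment by default
    venv.create(env_dir, with_pip=False)
    created = sorted(path.name for path in env_dir.iterdir())
    has_config = (env_dir / "pyvenv.cfg").is_file()

print("Contents of the new environment:", created)
print("pyvenv.cfg present:", has_config)
```

For a real project you would pass your own folder name and with_pip=True, so that pip is available inside the environment.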

Activating the Virtual Environment

We aren’t done yet though. The Virtual environment needs to be activated first! If you completed the previous step, we should have a folder structure something like this.

-- virtual_envs_folder
    -- tutorial

virtual_envs_folder is simply a parent folder where we ran the previous commands for creating the virtual environment.

We will now add a new Python file to the virtual_envs_folder (not the tutorial folder). So now our file structure is something like this.

-- virtual_envs_folder
    -- tutorial
    -- file.py

file.py is where all the code will go, which we want to convert to a Pyinstaller EXE. Any supporting files, folders or libraries you have created can also be added here.

Now we need to run another command which will activate the virtual environment. The command can vary slightly depending on what terminal/console/OS you are using.

Command Prompt: (Windows)

C:\Users\CodersLegacy\virtual_envs_folder> tutorial\Scripts\activate.bat

Windows PowerShell:

C:\Users\CodersLegacy\virtual_envs_folder> tutorial\Scripts\Activate.ps1

Linux:

$ source tutorial/bin/activate

Congratulations, your Virtual environment is now activated and ready to run! Our command-line will now be pointing to the Python installation inside our Virtual environment instead of the main Python installation. (This effect will end once you close the command line/terminal)

Your virtual environment folder (tutorial) should look something like this:

-- tutorial
   -- Include
   -- Lib
   -- Scripts
-- file.py

The two important folders here are “Lib” and “Scripts”. “Lib” is where all of our installed libraries will go. “Scripts” is where our python.exe file is. (On Linux, the equivalent folders are “lib” and “bin”.)

Your command prompt/terminal should also look something like this:

(tutorial) C:\Users\CodersLegacy\virtual_envs_folder>

Notice the “(tutorial)” which is now included right at the start. If this has appeared, your Venv Virtual Environment is ready to use with Python and Pyinstaller.
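
If you want to double-check from Python itself, you can compare sys.prefix with sys.base_prefix, which differ when the interpreter is running from a virtual environment. The helper name below is our own, used only for this small sketch.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

Run this with the environment activated and the interpreter path should point inside the “tutorial” folder.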



Setting up our Application in the Virtual Environment

Now we will begin installing the required libraries we need. Here is some sample code we will be using in our file.py file.

import tkinter as tk
from tkinter.filedialog import askopenfilename, asksaveasfile
import numpy as np
from matplotlib.figure import Figure
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.path import Path
from matplotlib.patches import PathPatch
from matplotlib.collections import PatchCollection
from pandastable import Table
import pandas as pd

   
class Window():
    def __init__(self, master):
        self.main = tk.Frame(master, background="white")
        
        self.rightframe = tk.Frame(self.main, background="white")
        self.rightframe.pack(side=tk.LEFT)
        self.leftframe = tk.Frame(self.main, background="white")
        self.leftframe.pack(side=tk.LEFT)

        self.rightframeheader = tk.Frame(self.rightframe, background="white")
        self.button1 = tk.Button(self.rightframeheader, text='Import CSV',  command=self.import_csv, width=10)
        self.button1.pack(pady = (0, 5), padx = (10, 0), side = tk.LEFT)  

        self.button2 = tk.Button(self.rightframeheader, text='Clear',  command=self.clear, width=10)
        self.button2.pack(padx = (10, 0), pady = (0, 5), side = tk.LEFT)  

        self.button3 = tk.Button(self.rightframeheader, text='Generate Plot',  command=self.generatePlot, width=10)
        self.button3.pack(pady = (0, 5), padx = (10, 0), side = tk.LEFT)  
        self.rightframeheader.pack()

        self.tableframe = tk.Frame(self.rightframe, highlightbackground="blue", highlightthickness=5)
        self.table = Table(self.tableframe, dataframe=pd.DataFrame(), width=300, height=400)
        self.table.show()
        self.tableframe.pack()

        self.canvas = tk.Frame(self.leftframe)
        self.fig = Figure()
        self.ax = self.fig.add_subplot(111)
        self.graph = FigureCanvasTkAgg(self.fig, self.canvas)
        self.graph.draw()
        self.graph.get_tk_widget().pack()
        self.canvas.pack(padx=(20, 0))

        self.main.pack()


    def import_csv(self):
        types = [("CSV files","*.csv"),("Excel files","*.xlsx"),("Text files","*.txt"),("All files","*.*") ]
        csv_file_path = askopenfilename(initialdir = ".", title = "Open File", filetypes=types)
        tempdf = pd.DataFrame()

        try:
            tempdf = pd.read_csv(csv_file_path)
        except Exception:
            # Not readable as CSV, so fall back to reading it as an Excel file
            tempdf = pd.read_excel(csv_file_path)
            
        self.table.model.df = tempdf
        self.table.model.df.columns = self.table.model.df.columns.str.lower()
        self.table.redraw()  
        self.generatePlot()

    def clear(self):
        self.table.model.df = pd.DataFrame()
        self.table.redraw()
        self.ax.clear()
        self.graph.draw_idle()

    def generatePlot(self):
        self.ax.clear()
        if not(self.table.model.df.empty): 
            df = self.table.model.df.copy()
            self.ax.plot(pd.to_numeric(df["x"]), pd.to_numeric(df["y"]), color ='tab:blue', picker=True, pickradius=5)   
        self.graph.draw_idle()     

root = tk.Tk()
window = Window(root)
root.mainloop()

Now you have two options. You can either go and install each library individually using “pip”, or you can create a requirements.txt file. We will go with the latter option, because it’s a recommended approach for helping to maintain the correct versions.

First create a new requirements.txt file in the parent folder of your virtual environment. File structure should look like this:

-- virtual_envs_folder
    -- tutorial
    -- file.py
    -- requirements.txt

Now we will open a new command prompt (unrelated to our Virtual environment), and use it to check the versions of each of our required libraries.

We can check the version of an installed library using pip show <library-name>. Running pip show matplotlib gives us the following output.

Name: matplotlib
Version: 3.5.3

We will now add this information to our requirements.txt file. For the sample code we provided above, our file will look like this:

matplotlib==3.5.3
sympy==1.10.1
pandastable==0.13.0
pandas==1.4.3
numpy==1.23.1
scipy==1.9.0
pyinstaller==5.6.2
auto-py-to-exe==2.24.1

Now we will run the following command to have them all installed in one go.

pip install -r requirements.txt
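
Looking up each version by hand with pip show can get tedious. As an alternative, here is a sketch that builds the pinned lines with importlib.metadata. The helper name and the package list are only examples, and the script has to be run with the main Python installation (where the libraries are installed), not inside the fresh virtual environment.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(package_names):
    """Return 'name==version' lines for the packages that are installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            print(f"{name} is not installed here, skipping")
    return lines


wanted = ["matplotlib", "pandastable", "pandas", "numpy", "pyinstaller"]
print("\n".join(pinned_requirements(wanted)))
```

The printed lines can then be pasted into requirements.txt.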

Using Pyinstaller in our Venv Virtual Environment

And now, we are FINALLY done with all the setup!

But don’t forget why we are doing all of this! As a test, try running pip list again, and see how many libraries get printed out this time. It should be a lot fewer than what we had before.

All we need to do now is run the pyinstaller command like we would do normally.

pyinstaller --noconsole --onefile file.py

This will generate our Pyinstaller EXE inside a “dist” folder in the directory where you ran the command! Now observe the size difference and let us know down in the comments section how much of an improvement you got!

There should also be a slight speed bonus, due to the smaller size and lower number of modules to load.


If you are looking to further reduce the size of your EXE, the UPX packer is a great way to easily bring it down substantially. Check out our tutorial on it!


This marks the end of the “Setup Virtual Environment for Pyinstaller with Venv” Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.

The post Setup Virtual Environment for Pyinstaller with Venv appeared first on CodersLegacy.
