Channel: Planet Python

Mike C. Fletcher: Lessons from Implementing from Scratch


So the last two days I was sprinting at PyCon CA. My original intent was to investigate tools for formalizing experiments with machine learning and reinforcement learning in particular. What we started with (and in the end, ended with) was a kind of "implement reinforcement (Deep Q) learning from scratch" project, where we tried to reconstruct my memories of Deep Q as a simple stand-alone solution and then spent the rest of the time debugging the result.

We had lots of moments where we wound up seeing no learning due to dumb/simple/avoidable errors (mostly mine). At least some of those were due to us trying to build up the solution from scratch, level by level. That is, we first tried a random agent, then an agent that just mapped state directly to best-action... and oops, turns out that one is really just not going to work. The Q function really is needed to make the problem tractable, and we lost a lot of time trying to get to a stable point before going to the final solution. We were seeing lots of "passing" solutions with a straight random search (and half a dozen random searches that were not intentionally random searches, but were actually vanishing or exploding gradients), but obviously the end of that process was junk. We also had our epsilon gradient calculation wrong, which meant that we'd just stop exploring way too early...
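
For reference, the kind of epsilon schedule we were fumbling looks roughly like this (an illustrative sketch with made-up numbers, not our actual sprint code). Get the decay arithmetic wrong and epsilon hits its floor almost immediately, which is exactly the "stop exploring way too early" failure described above:

import random

EPSILON_START = 1.0   # start fully exploratory
EPSILON_END = 0.05    # floor: keep a little exploration forever
DECAY_STEPS = 10000   # how long the anneal should take

def epsilon_at(step):
    # Linearly anneal epsilon from EPSILON_START down to EPSILON_END.
    fraction = min(float(step) / DECAY_STEPS, 1.0)
    return EPSILON_START + fraction * (EPSILON_END - EPSILON_START)

def choose_action(q_values, step):
    if random.random() < epsilon_at(step):
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit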

At the end of the sprint, it really does seem I need to focus more on formalizing and providing testing/insight during experiments. So many of the problems should have been obvious if we'd had tooling to show that given fields and values were being set to 0 or 1. At two different points we were debugging for a significant amount of time just to discover we had 0 records being passed into the training due to a missed append call. Dumb/simple metrics, insights into the weight tables (so you can see whether patterns are forming or just dropping to a single value), historic trial data to show you progress/regressions, etc. etc.


PyCoder’s Weekly: Issue #342 (Nov. 13, 2018)

PyCon 2019, testing, publishing packages on PyPI, and more
#342 – NOVEMBER 13, 2018

Hi there!

Quick announcement—we now have a full issue archive going back all the way to 2012 for the newsletter.

So knock yourself out if you want to do some archeological digging. You can also comment on issues there, and full-text search is coming soon. Stay tuned!

— Dan Bader, Editor

P.S. Article & link submissions are also open again, so send us your favorite links and we’ll consider them for inclusion in next week’s issue.

Registration for PyCon 2019 Is Now Open
800 early bird tickets are available at a discounted rate, so don’t wait too long. The direct link for the registration form is here.
PYCON CONFERENCE

Python Testing 101: Introduction to Testing
An introduction to high-level testing concepts and the first in a series of posts that details the author’s thought process for how they go about adding tests to a codebase. Chock-full of info and a great read, highly recommended.
ALY SIVJI

How to Publish a Python Package to PyPI
In this step-by-step tutorial, you’ll learn how to create a Python package for your project and how to publish it to PyPI, the Python Package Index. Get up to speed on everything from naming your package to configuring it using setup.py.
REAL PYTHON

Find a Python Job Through Vettery
Vettery specializes in developer roles and is completely free for job seekers. Interested? Submit your profile, and if accepted onto the platform, you can receive interview requests directly from top companies seeking Python developers. Get started →
VETTERY sponsor

PyCoder’s Weekly Issue Archive
We now have a full issue archive going back all the way to 2012 so you can read previous issues of the PyCoder’s Weekly newsletter online. I’ve also added a comments feature. Full-text search is coming soon.
PYCODERS.COM

Python Qt Tutorial: Create a Simple GUI Chat Client
End-to-end tutorial for creating a chat client using Python for Windows, macOS or Linux. You’ll see how to install Qt for Python, write the client app, and build an installer for it. Python’s story in the GUI app space is getting better and this tutorial is visible proof.
FMAN.IO

What Any Developer Can Learn From the Best
Not Python-specific, but still a great post about the traits of effective developers and how to develop (ha!) those traits.
ERIC ELLIOTT • Shared by Brian Okken (Python Bytes FM)

API Evolution the Right Way
Ten covenants that responsible Python library authors keep with their users. The PyCon Canada talk under the same name was great, and this post is a fantastic read too.
A. JESSE JIRYU DAVIS


Discussions


“Least Astonishment” and the Mutable Default Argument
Why something like def foo(a=[]): is usually a code smell, and a discussion of whether or not this is a design flaw in the Python language.
STACK OVERFLOW
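
The surprise in a minimal snippet (a generic illustration, not taken from the thread): the default list is created once, at function definition time, and shared across every call.

def foo(a=[]):
    a.append(1)
    return a

print(foo())  # [1]
print(foo())  # [1, 1] <- the same default list, mutated again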

A Spoonful of Advanced Python Per Day?
What to do when your Python skills plateau & a list of recommended resources.
REDDIT.COM

Python as an Excel Scripting Language (Microsoft Excel Team Suggestion Box)
Apparently this has gotten some traction again and the Excel team is considering it, or at least not ruling it out. Related discussion.
USERVOICE.COM


Python Jobs


Senior Software Engineer - Full Stack (Raleigh, NC)
SUGARCRM

Head of Engineering (Remote, Work from Anywhere)
FINDKEEP.LOVE

Senior Software Engineer (Los Angeles, CA)
GOODRX

Senior Developer (Chicago, IL)
PANOPTA

More Python Jobs >>>


Articles & Tutorials


Python Patterns Guide
Various Python programming patterns that Brandon covered in his talks and blog posts. Each pattern is explained in a detailed writeup.
BRANDON RHODES

Working Efficiently With Jupyter Notebooks
Several best practices and techniques recommended by the author that will help you to create notebooks which are focused, easy to comprehend, and easy to work with. Nice grab bag of tips!
FLORIAN WILHELM • Shared by Florian Wilhelm

Optimizing the Django Admin Paginator
Or: How to make Django admin fast for large tables where the paginator becomes the bottleneck.
HAKI BENITA

Introduction to Fountain Codes (Error Correction)
An intro to Luby Transform Code, an error correction algorithm belonging to the “fountain” code class, a type of error-correcting code that can generate an effectively infinite number of packets to reconstitute data lost during transfers across different networks. In-depth tutorial with Python examples.
FRANÇOIS ANDRIEUX

“Ultimate Programmer Super Stack” Bundle [Expires Today]
The “super stack” is a coding book and courses bundle including Python Tricks: The Book, Mike Driscoll’s Python 201, and 25 other resources at a 95% discount. Become a well-rounded developer with these DRM-free coding courses and books for less than $2 each. This offer expires in a few hours today →
INFOSTACK.IO sponsor

Data Science With Python in Visual Studio Code
An overview of new features in VS Code that support common data science workflows. Namely exploring data within VS Code (just like you would with a Jupyter notebook) and turning notebooks into reproducible/”production-ready” Python code.
RONG LU & DAN TAYLOR (MICROSOFT)

Creating a Python 3 Culture at Facebook
Michael Kennedy interviews Jason Fried, who created a grassroots campaign to move Facebook’s massive Python 2 codebase to Python 3. Lots of lessons to be had here, especially in light of the upcoming Python 2 EOL date.
TALK PYTHON podcast

Turbocharging Python with Command Line Tools
How simple command-line tools can be an alternative to building a full-blown web app around your domain-specific code. Worth considering if your application is mainly used by developers and other CLI-savvy folk.
NOAH GIFT

Analyze Podcast Transcripts With NLTK (Code Challenge)
In this two-part challenge you’re going to do some natural language processing with Python on podcast transcript data.
PYBITES

How to Take a Random Sample of Rows From a Pandas DataFrame
How to use Pandas Sample to randomly select rows, setting the random seed, sampling by group, using weights, and more.
ERIK MARSJA

A Tour of Python Packaging
The current state of packaging a Python library (not a Python application). What tools to use and what to look out for.
NICK MAVRAKIS


Projects & Code


python-ls: dir() Replacement With Recursive Search Capabilities
A better version of Python’s built-in dir function with searching in mind. Learned about this at PyCon Canada and it is a super handy tool for working in the REPL or when debugging.
GITHUB.COM/GABRIELCNR • Shared by Aly Sivji

pampy: Python Pattern Matching Library
“Pattern Matching” in the sense of the syntactic construct used by languages like Haskell. That is, specifying patterns to which some data should conform, then checking to see if it does, and finally deconstructing the data according to those patterns.
GITHUB.COM/SANTINIC

Wikked: Plain-Text-Files/SCM-backed Wiki Engine
Wikked is a simple yet powerful wiki engine suitable for use as a personal electronic notebook, family digital blackboard, or wiki for a small team.
BOLT80.COM

waveglow: Generative Network for Speech Synthesis
A PyTorch implementation of WaveGlow, a flow-based generative network for speech synthesis.
GITHUB.COM/NPUICHIGO

Django 2.1.3 Bugfix Release
DJANGOPROJECT.COM

cursive_re: Readable Regular Expressions for Python 3.6 and Up
Similar to grimace.
GITHUB.COM/BOGDANP • Shared by Python Bytes FM

Starlette: ASGI (“Async WSGI”) Framework
A lightweight ASGI framework/toolkit meant for building high performance asyncio services.
STARLETTE.IO

PyDev IDE 7.0 Released
Lots of improvements here: mypy & black support, pipenv, faster debugger… PyDev is a free and open-source Python IDE for Eclipse.
PYDEV.BLOGSPOT.COM

Wily: CLI App for Tracking, Reporting on Complexity of Python Tests and Applications
PYPI.ORG


Events


DjangoCon US 2018 Videos
Talk recordings from this year’s DjangoCon.
YOUTUBE.COM video

PyCascades 2019 Ticket Sales Are Open
February 23rd–25th, 2019 in Seattle
PYCASCADES CONFERENCE

PyParis 2018
Nov 14 to 16 in Paris, France
PYTHON.ORG

Happy Pythoning!
Copyright © 2018 PyCoder’s Weekly, All rights reserved.


Peter Bengtsson: hashin 0.14.0 with --update-all and a bunch of other features


If you don't know what it is, hashin is a Python command line tool for updating your requirements file's packages and their hashes for use with pip install. It takes the pain out of figuring out what hashes each package on PyPI has. It also takes the pain out of figuring out what version you can upgrade to.

In the 0.14.0 release (changelog) there are a bunch of new features. The most exciting one is --update-all. Let's go through some of the new features:

Update all (--update-all)

Suppose you want to bravely upgrade all the pinned packages to the latest and greatest. Before version 0.14.0 you'd have to manually open the requirements file and list every single package on the command line:

$ less requirements.txt
$ hashin Django requests Flask cryptography black nltk numpy

With --update-all it's the same thing except it does that reading and copy-n-paste for you:

$ hashin --update-all

Particularly nifty is to combine this with --dry-run if you get nervous about that many changes.
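
For example:

$ hashin --update-all --dry-run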

Interactive update all (--interactive)

This new flag only makes sense when used together with --update-all. Used together, it reads all the packages in the requirements file and, for each one that has a new version, asks you whether you want to update it or skip it.

It looks like this:

$ hashin --update-all --interactive
PACKAGE                        YOUR VERSION    NEW VERSION
Django                         2.1.2           2.1.3           ✓
requests                       2.20.0          2.20.1          ✘
numpy                          1.15.2          1.15.4          ?
Update? [Y/n/a/q/?]:

You can also use the aliases hashin -u -i to do the same thing.

Support for "extras"

If you want to have requests[security] or markus[datadog] in your requirements file, hashin used to not support that. This now works:

$ hashin "requests[security]"

Before, it would look for a package literally called requests[security] on PyPI, which obviously doesn't exist. Now, it parses that syntax, looks up requests, and when it's done puts the extras syntax back into the requirements file.
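
Conceptually, the parsing looks something like this (a sketch of the idea, not hashin's actual code):

import re

def split_extras(spec):
    # "requests[security]" -> ("requests", "[security]")
    match = re.match(r"^([^\[]+)(\[.*\])?$", spec)
    return match.group(1), match.group(2) or ""

name, extras = split_extras("requests[security]")
# look up `name` on PyPI, then write `name + extras` back into requirements.txt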

Thanks Dustin Ingram for pushing for this one!

Atomic writes

Prior to this version, if you typed hashin requests flask numpy nltkay it would process those packages one at a time and effectively open and edit the requirements file as many times as there are packages mentioned. The crux of that is that if you, for example, have a typo (e.g. nltkay instead of nltk) it would crash there and not roll back any of the other writes. It's not a huge harm, but it certainly is counterintuitive.

Another place where this matters is with --dry-run. If you specified something like hashin --dry-run requests flask numpy you would get one diff per package and thus repeat the diff header 3 (excessive) times.

The other reason atomic writes are important is that if you use hashin --update-all --interactive and it asks you whether you want to update package1, package2, package3, and then you decide "Nah. I don't want any of this. I quit!", it now does just that without touching the requirements file.
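
A common way to get this behavior (a sketch, not necessarily hashin's exact implementation) is to build the new content in memory, write it to a temporary file, and rename that over the original in one step:

import os
import tempfile

def atomic_write(path, content):
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.rename(tmp_path, path)  # atomic on POSIX filesystems
    except Exception:
        os.remove(tmp_path)
        raise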

Better not-found errors

This was never a problem if you used Python 2.7, but on Python 3.x, if you typoed a package name you'd get a Python exception about the HTTP call, and it wasn't obvious that the mistake lay with your input and not the network. Now hashin traps any HTTP errors, and a 404 is handled gracefully.
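
The shape of the fix is roughly this (a sketch of the idea with a hypothetical error type, not hashin's exact code):

import urllib.error
import urllib.request

class PackageNotFoundError(Exception):
    """Raised instead of a raw HTTP traceback when the package name is wrong."""

def get_package_data(package):
    url = "https://pypi.org/pypi/%s/json" % package
    try:
        return urllib.request.urlopen(url).read()
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            raise PackageNotFoundError("No such package on PyPI: %s" % package)
        raise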

(Internal) Black everything and pytest everything

All source code is now formatted with Black which, albeit imperfect, kills any boring manual review of code style nits. And, it uses therapist to wrap the black checks and fixes.

And all unit tests are now written for pytest. pytest was already the tool used in TravisCI but now all of those self.assertEqual(foo, bar)s have been replaced with assert foo == bar.

Mike Driscoll: Python 101 – Episode #33: The requests Package


In this screencast, I introduce the popular requests package, which is a replacement for Python’s urllib.
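
If you've never seen it, here's a minimal taste of the requests API (a generic example, not taken from the video):

import requests

response = requests.get("https://www.example.com")
print(response.status_code)              # e.g. 200
print(response.headers["content-type"])  # e.g. text/html; charset=UTF-8
print(response.text[:80])                # the body, already decoded to text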

You can also read the chapter this video is based on here, or get the book on Leanpub.

gamingdirectional: Detect boundary and respond to key press event in Pygame project


Hello there, in today's article we will look at two things in our new pygame project: 1) detect the boundary so the player object will stop at the boundary of the scene and not get past it; 2) make the player object respond to key press events so it can go up, down, left and right whenever we press the correct key on our keyboard. After this lesson we will look at another key press event in...
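
A minimal sketch of those two ideas (illustrative only, not the article's actual code): move a rectangle with the arrow keys and clamp it to the scene so it can't get past the boundary.

import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
player = pygame.Rect(320, 240, 32, 32)
clock = pygame.time.Clock()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Key press events: move the player in four directions.
    keys = pygame.key.get_pressed()
    if keys[pygame.K_LEFT]:
        player.x -= 5
    if keys[pygame.K_RIGHT]:
        player.x += 5
    if keys[pygame.K_UP]:
        player.y -= 5
    if keys[pygame.K_DOWN]:
        player.y += 5

    # Boundary detection: keep the player inside the scene.
    player.clamp_ip(screen.get_rect())

    screen.fill((0, 0, 0))
    pygame.draw.rect(screen, (255, 255, 255), player)
    pygame.display.flip()
    clock.tick(60)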

Source

PyCharm: PyCharm 2018.2.5


PyCharm 2018.3 is almost ready for release; however, first we’d like to release some important fixes for PyCharm 2018.2. You can update to 2018.2.5 now: either on our website, within PyCharm (Help | Check for Updates), or using JetBrains ToolBox.

New in 2018.2.5

  • An issue that causes PyCharm to crash on Ubuntu 16.04 has been resolved
  • Matplotlib 3.0.0 can now be imported in the Python Console
  • Python code now folds correctly after it’s minimized with Ctrl+Shift+Numpad – (Cmd+Shift+- on macOS)
  • And further fixes, see the release notes for more information

Interested?

Download PyCharm 2018.2.5

Real Python: Python Community Interview With Kenneth Reitz


This week, I’m excited to be interviewing the prolific Kenneth Reitz!

Kenneth is the author of the extremely popular requests and pipenv libraries. Join us as we talk about his latest projects and the most challenging code he’s written to date.

Ricky: Let’s start at the beginning… How did you get into programming, and when did you start using Python?

Kenneth Reitz

Kenneth: I started programming at a young age. My dad was a programmer, and I taught myself BASIC and C (with his help) at the age of 9. I started using Python in college, when I took my first CS course. Shortly after, I dropped out and learned many other programming languages, but I always kept coming back to Python.

Ricky: Congratulations on your new job with Digital Ocean. You’re the senior member of the Developer Relations team. How are you finding the change in role from your previous job at Heroku, and what can we expect from Digital Ocean moving forward in the Python space?

Kenneth: Thanks! I’m really enjoying the new role and the opportunity to serve the entire development community, not just the Python community. However, my latest work, Responder, has been a Digital Ocean project, so there’s room to expect more from us in the Python space 😊

Ricky: You are, of course, most famous for writing the extremely popular requests library and the new pipenv library. Python.org now recommends the use of pipenv for dependency management. How has the community received pipenv? Have you seen much resistance from the community with developers preferring to stick to venv or older methods of dependency management?

Kenneth: The community has received pipenv very well, and even companies like GitHub are using its standards for security vulnerability scanning. I haven’t seen much resistance from the community at all, aside from some hatred on reddit. It took me a while to realize that /r/python doesn’t represent the Python community as much as it represents redditors who use Python.

Ricky: Now hitting 300 million downloads on your requests library is cool and all, but as a guitarist, what I’m more excited about is your latest project PyTheory. Can you tell us a little about it and your plans for the project going forward?

Kenneth: PyTheory is a very interesting library that attempts to encapsulate all known musical systems into a library. Currently, there is one system: Western. It can render all the different scales for the Western system, programmatically, and tell you the pitches of the notes (either in decimal or symbolic notation). In addition, there are fretboards and chord charts, so you can specify a custom tuning for your guitar, and generate chord charts with it. It’s very abstract.

Definitely the most challenging thing I’ve ever written.

Ricky: So after recently ditching your Mac for a PC, and turning to Microsoft’s (amazing) VS Code for your Python development, are you happy and proud to be a Windows user? For those who may be reading who haven’t used Windows since Windows ‘95, what are they missing?

Kenneth: I love the Mac and prefer it to Windows. I’m just currently bored with my setup and decided to challenge myself by running Windows. I’m very happy and productive, though. Feels right at home.

Windows isn’t what it used to be. It’s a real solid piece of operating system now. I’m running it on my iMac Pro, and it hums like a dream.

Ricky: I know you’re a keen photographer. How long have you been into it, and what’s your favorite photo you’ve taken? Do you have any other hobbies and interests, aside from Python?

Kenneth: I’ve been into photography seriously for about 10 years. My favorite photo I’ve ever taken is probably this one. It was taken on a film camera, after I had a migraine for several weeks, and it was my first time being able to walk outside.

Ricky: Finally, any parting words of wisdom? Anything you’d like to share and/or plug?

Kenneth: Responder! My new web service framework for Python. It’s ASGI, it’s familiar looking, it’s fast, and it’s easier to use than Flask! Great for building APIs. Check it out!


Thank you, Kenneth, for joining me this week. It’s been great to watch the development of Responder happen in real time on Twitter. You can follow its development or raise an issue here.

As always, if there is someone you would like me to interview in the future, reach out to me in the comments below, or send me a message on Twitter.



Eli Bendersky: Type inference


Type inference is a major feature of several programming languages, most notably languages from the ML family like Haskell. In this post I want to provide a brief overview of type inference, along with a simple Python implementation for a toy ML-like language.

Uni-directional type inference

While static typing is very useful, one of its potential downsides is verbosity. The programmer has to annotate values with types throughout the code, which results in more effort and clutter. What's really annoying, though, is that in many cases these annotations feel superfluous. Consider this classical C++ example from pre-C++11 times:

std::vector<Blob*> blobs;
std::vector<Blob*>::iterator iter = blobs.begin();

Clearly when the compiler sees blobs.begin(), it knows the type of blobs, so it also knows the type of the begin() method invoked on it because it is familiar with the declaration of begin. Why should the programmer be burdened with spelling out the type of the iterator? Indeed, one of the most welcome changes in C++11 was lifting this burden by repurposing auto for basic type inference:

std::vector<Blob*> blobs;
auto iter = blobs.begin();

Go has a similar capability with the := syntax. Given some function:

func parseThing(...) (Node, error) { }

We can simply write:

node, err := parseThing(...)

Without having to explicitly declare that node has type Node and err has type error.

These features are certainly useful, and they involve some degree of type inference from the compiler. Some functional programming proponents say this is not real type inference, but I think the difference is just a matter of degree. There's certainly some inference going on here, with the compiler calculating and assigning the right types for expressions without the programmer's help. Since this calculation flows in one direction (from the declaration of the vector::begin method to the auto assignment), I'll call it uni-directional type inference.

Bi-directional type inference (Hindley-Milner)

If we define a new map function in Haskell to map a function over a list, we can do it as follows:

mymap f [] = []
mymap f (first:rest) = f first : mymap f rest

Note that we did not specify the types for either the arguments of mymap, or its return value. The Haskell compiler can infer them on its own, using the definition provided:

> :t Main.mymap
Main.mymap :: (t1 -> t) -> [t1] -> [t]

The compiler has determined that the first argument of mymap is a generic function, assigning its argument the type t1 and its return value the type t. The second argument of mymap has the type [t1], which means "list of t1"; then the return value of mymap has the type "list of t". How was this accomplished?

Let's start with the second argument. From the [] = [] variant, and also from the (first:rest) deconstruction, the compiler infers it has a list type. But there's nothing else in the code constraining the element type, so the compiler chooses a generic type specifier - t1. f first applies f to an element of this list, so f has to take t1; nothing constrains its return value type, so it gets the generic t. The result is f has type (t1 -> t), which in Haskell parlance means "a function from t1 to t".

Here is another example, written in a toy language I put together for the sake of this post. The language is called microml, and its implementation is described at the end of the post:

foo f g x = if f(x == 1) then g(x) else 20

Here foo is declared as a function with three arguments. What is its type? Let's try to run type inference manually. First, note that the body of the function consists of an if expression. As is common in programming languages, this one has some strict typing rules in microml; namely, the type of the condition is boolean (Bool), and the types of the then and else clauses must match.

So we know that f(x == 1) has to return a Bool. Moreover, since x is compared to an integer, x is an Int. What is the type of g? Well, it has an Int argument, and its return value must match the type of the else clause, which is an Int as well.

To summarize:

  • The type of x is Int
  • The type of f is Bool -> Bool
  • The type of g is Int -> Int

So the overall type of foo is:

((Bool -> Bool), (Int -> Int), Int) -> Int

It takes three arguments, the types of which we have determined, and returns an Int.

Note how this type inference process is not just going in one direction, but seems to be "jumping around" the body of the function figuring out known types due to typing rules. This is why I call it bi-directional type inference, but it's much better known as Hindley-Milner type inference, since it was independently discovered by Roger Hindley in 1969 and Robin Milner in 1978.

How Hindley-Milner type inference works

We've seen a couple of examples of manually running type inference on some code above. Now let's see how to translate it to an implementable algorithm. I'm going to present the process in several separate stages, for simplicity. Some other presentations of the algorithm combine several of these stages, but seeing them separately is more educational, IMHO.

The stages are:

  1. Assign symbolic type names (like t1, t2, ...) to all subexpressions.
  2. Using the language's typing rules, write a list of type equations (or constraints) in terms of these type names.
  3. Solve the list of type equations using unification.

Let's use this example again:

foo f g x = if f(x == 1) then g(x) else 20

Starting with stage 1, we'll list all subexpressions in this declaration (starting with the declaration itself) and assign unique type names to them:

foo                                       t0
f                                         t1
g                                         t2
x                                         t3
if f(x == 1) then g(x) else 20            t4
f(x == 1)                                 t5
x == 1                                    t6
x                                         t3
g(x)                                      t7
20                                        Int

Note that every subexpression gets a type, and we de-duplicate them (e.g. x is encountered twice and gets the same type name assigned). Constant nodes get known types.

In stage 2, we'll use the language's typing rules to write down equations involving these type names. Usually books and papers use slightly scary formal notation for typing rules; for example, for if:

\[\frac{\Gamma \vdash e_0 : Bool, \Gamma \vdash e_1 : T, \Gamma \vdash e_2 : T}{\Gamma \vdash if\: e_0\: then\: e_1\: else\: e_2 : T}\]

All this means is the intuitive typing of if we've described above: the condition is expected to be boolean, and the types of the then and else clauses are expected to match, and their type becomes the type of the whole expression.

To unravel the notation, prepend "given that" to the expression above the line and "we can derive" to the expression below the line; \Gamma \vdash e_0 : Bool means that e_0 is typed to Bool in the set of typing assumptions called \Gamma.

Similarly, a typing rule for single-argument function application would be:

\[\frac{\Gamma \vdash e_0 : T, \Gamma \vdash f : T \rightarrow U}{\Gamma \vdash f(e_0) : U}\]

The real trick of type inference is running these typing rules in reverse. The rule tells us how to assign types to the whole expression given its constituent types, but we can also use it as an equation that works both ways and lets us infer constituent types from the whole expression's type.

Let's see what equations we can come up with, looking at the code:

From f(x == 1) we infer t1 = (t6 -> t5), because t1 is the type of f, t6 is the type of x == 1, and t5 is the type of f(x == 1). Note that we're using the typing rules for function application here. Moreover, we can infer that t3 is Int and t6 is Bool because of the typing rule of the == operator.

Similarly, from g(x) we infer t2 = (t3 -> t7).

From the if expression, we infer that t5 is Bool (since it's the condition of the if), that t4 = t7 (the then and else clauses must match), and that t4 = Int (the else clause is the integer constant 20).

Now we have a list of equations, and our task is to find the most general solution, treating the equations as constraints. This is done by using the unification algorithm which I described in detail in the previous post. The solution we're seeking here is precisely the most general unifier.
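
To make that concrete, here is a minimal sketch of unification over type terms like ours (the term representation here, strings for type variables and ('->', arg, result) tuples for function types, is an assumption for illustration; the real implementation lives in typing.py and also performs the occurs check):

def is_type_var(t):
    # Type variables are lowercase names like 't0'; concrete types are
    # capitalized names like 'Int' and 'Bool'.
    return isinstance(t, str) and t[0].islower()

def resolve(t, subst):
    # Follow substitution chains until we reach a non-variable or unbound var.
    while is_type_var(t) and t in subst:
        t = subst[t]
    return t

def unify(x, y, subst):
    # Extend subst so that x and y become equal, or return None on mismatch.
    x, y = resolve(x, subst), resolve(y, subst)
    if x == y:
        return subst
    if is_type_var(x):
        return dict(subst, **{x: y})
    if is_type_var(y):
        return dict(subst, **{y: x})
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None  # e.g. Int vs Bool

# Feeding in t1 = (t6 -> t5), t6 = Bool, t5 = Bool binds f's type step by step:
subst = unify('t1', ('->', 't6', 't5'), {})
subst = unify('t6', 'Bool', subst)
subst = unify('t5', 'Bool', subst)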

For our expression, the algorithm will find the type of foo to be:

((Bool -> Bool), (Int -> Int), Int) -> Int

As expected.

If we make a slight modification to the expression to remove the comparison of x with 1:

foo f g x = if f(x) then g(x) else 20

Then we can no longer constrain the type of x, since all we know about it is that it's passed into functions f and g, and nothing else constrains the arguments of these functions. The type inference process will thus calculate this type for foo:

((a -> Bool), (a -> Int), a) -> Int

It assigns x the generic type name a, and uses it for the arguments of f and g as well.

The implementation

An implementation of microml is available here, as a self-contained Python program that parses a microml declaration and infers its type. The best starting point is main.py, which spells out the stages of type inference:

code = 'foo f g x = if f(x == 1) then g(x) else 20'
print('Code', '----', code, '', sep='\n')

# Parse the microml code snippet into an AST.
p = parser.Parser()
e = p.parse_decl(code)
print('Parsed AST', '----', e, '', sep='\n')

# Stage 1: Assign symbolic typenames
typing.assign_typenames(e.expr)
print('Typename assignment', '----',
      typing.show_type_assignment(e.expr), '', sep='\n')

# Stage 2: Generate a list of type equations
equations = []
typing.generate_equations(e.expr, equations)
print('Equations', '----', sep='\n')
for eq in equations:
    print('{:15} {:20} | {}'.format(str(eq.left), str(eq.right), eq.orig_node))

# Stage 3: Solve equations using unification
unifier = typing.unify_all_equations(equations)
print('', 'Inferred type', '----',
      typing.get_expression_type(e.expr, unifier, rename_types=True), sep='\n')

This will print out:

Code
----
foo f g x = if f(x == 1) then g(x) else 20

Parsed AST
----
Decl(foo, Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20)))

Typename assignment
----
Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20))   t0
If(App(f, [(x == 1)]), App(g, [x]), 20)                      t4
App(f, [(x == 1)])                                           t5
f                                                            t1
(x == 1)                                                     t6
x                                                            t3
1                                                            Int
App(g, [x])                                                  t7
g                                                            t2
x                                                            t3
20                                                           Int

Equations
----
Int             Int                  | 1
t3              Int                  | (x == 1)
Int             Int                  | (x == 1)
t6              Bool                 | (x == 1)
t1              (t6 -> t5)           | App(f, [(x == 1)])
t2              (t3 -> t7)           | App(g, [x])
Int             Int                  | 20
t5              Bool                 | If(App(f, [(x == 1)]), App(g, [x]), 20)
t4              t7                   | If(App(f, [(x == 1)]), App(g, [x]), 20)
t4              Int                  | If(App(f, [(x == 1)]), App(g, [x]), 20)
t0              ((t1, t2, t3) -> t4) | Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20))

Inferred type
----
(((Bool -> Bool), (Int -> Int), Int) -> Int)

There are many more examples of type-inferred microml code snippets in the test file test_typing.py. Here's another example which is interesting:

> foo f x = if x then lambda t -> f(t) else lambda j -> f(x)
((Bool -> a), Bool) -> (Bool -> a)

The actual inference is implemented in typing.py, which is fairly well commented and should be easy to understand after reading this post. The trickiest part is probably the unification algorithm, but that one is just a slight adaptation of the algorithm presented in the previous post.


PyPy Development: Guest Post: Implementing a Calculator REPL in RPython


This is a tutorial style post that walks through using the RPython translation toolchain to create a REPL that executes basic math expressions.

We will do that by scanning the user's input into tokens, compiling those tokens into bytecode and running that bytecode in our own virtual machine. Don't worry if that sounds horribly complicated, we are going to explain it step by step.

This post is a bit of a diversion while on my journey to create a compliant lox implementation using the RPython translation toolchain. The majority of this work is a direct RPython translation of the low-level C guide from Bob Nystrom (@munificentbob) in the excellent book craftinginterpreters.com, specifically chapters 14 – 17.

The road ahead

As this post is rather long I'll break it into a few major sections. In each section we will have something that translates with RPython, and at the end it all comes together.

A REPL

So if you're a Python programmer you might be thinking this is pretty trivial right?

I mean, if we ignore input errors, injection attacks etc., couldn't we just do something like this:

"""
A pure python REPL that can parse simple math expressions
"""
while True:
    print(eval(raw_input("> ")))

Well it does appear to do the trick:

$ python2 section-1-repl/main.py
> 3 + 4 * ((1.0/(2 * 3 * 4)) + (1.0/(4 * 5 * 6)) - (1.0/(6 * 7 * 8)))
3.1880952381

So can we just ask RPython to translate this into a binary that runs magically faster?

Let's see what happens. We need to add two functions for RPython to get its bearings (entry_point and target) and call the file targetXXX:

targetrepl1.py

def repl():
    while True:
        print eval(raw_input('> '))


def entry_point(argv):
    repl()
    return 0


def target(driver, *args):
    return entry_point, None

Which at translation time gives us this admonishment that accurately tells us we are trying to call a Python built-in raw_input that is unfortunately not valid RPython.

$ rpython ./section-1-repl/targetrepl1.py
...SNIP...
[translation:ERROR] AnnotatorError: 

object with a __call__ is not RPython: <built-in function raw_input>
Processing block:
 block@18 is a <class 'rpython.flowspace.flowcontext.SpamBlock'> 
 in (target1:2)repl 
 containing the following operations: 
       v0 = simple_call((builtin_function raw_input), ('> ')) 
       v1 = simple_call((builtin_function eval), v0) 
       v2 = str(v1) 
       v3 = simple_call((function rpython_print_item), v2) 
       v4 = simple_call((function rpython_print_newline)) 

Ok so we can't use raw_input or eval but that doesn't faze us. Let's get the input from a stdin stream and just print it out (no evaluation).

targetrepl2.py

from rpython.rlib import rfile

LINE_BUFFER_LENGTH = 1024


def repl(stdin):
    while True:
        print "> ",
        line = stdin.readline(LINE_BUFFER_LENGTH)
        print line


def entry_point(argv):
    stdin, stdout, stderr = rfile.create_stdio()
    try:
        repl(stdin)
    except:
        return 0


def target(driver, *args):
    return entry_point, None

Translate targetrepl2.py – we can add an optimization level if we are so inclined:

$ rpython --opt=2 section-1-repl/targetrepl2.py
...SNIP...
[Timer] Timings:
[Timer] annotate                       ---  1.2 s
[Timer] rtype_lltype                   ---  0.9 s
[Timer] backendopt_lltype              ---  0.6 s
[Timer] stackcheckinsertion_lltype     ---  0.0 s
[Timer] database_c                     --- 15.0 s
[Timer] source_c                       ---  1.6 s
[Timer] compile_c                      ---  1.9 s
[Timer] =========================================
[Timer] Total:                         --- 21.2 s

No errors!? Let's try it out:

$ ./target2-c 
1 + 2
>  1 + 2

^C

Ahh, our first success – let's quickly deal with the flushing failure by using the stdout stream directly as well, and print out the input in quotes:

from rpython.rlib import rfile

LINE_BUFFER_LENGTH = 1024


def repl(stdin, stdout):
    while True:
        stdout.write("> ")
        line = stdin.readline(LINE_BUFFER_LENGTH)
        print '"%s"' % line.strip()


def entry_point(argv):
    stdin, stdout, stderr = rfile.create_stdio()
    try:
        repl(stdin, stdout)
    except:
        pass
    return 0


def target(driver, *args):
    return entry_point, None

Translation works, and so does a test run:

$ ./target3-c 
> hello this seems better
"hello this seems better"> ^C

So we are in a good place with taking user input and printing output... What about the whole math evaluation thing we were promised? For that we can probably leave our RPython REPL behind for a while and connect it up at the end.

A virtual machine

A virtual machine is the execution engine of our basic math interpreter. It will be very simple, only able to do simple tasks like addition. I won't go into any depth to describe why we want a virtual machine, but it is worth noting that many languages including Java and Python make this decision to compile to an intermediate bytecode representation and then execute that with a virtual machine. Alternatives are compiling directly to native machine code like (earlier versions of) the V8 JavaScript engine, or at the other end of the spectrum executing an abstract syntax tree – which is what the Truffle approach to building VMs is based on.

We are going to keep things very simple. We will have a stack where we can push and pop values, we will only support floats, and our VM will only implement a few very basic operations.

OpCodes

In fact our entire instruction set is:

OP_CONSTANT
OP_RETURN
OP_NEGATE
OP_ADD
OP_SUBTRACT
OP_MULTIPLY
OP_DIVIDE

Since we are targeting RPython we can't use the nice enum module from the Python standard library, so instead we just define a simple class with class attributes.

We should start to get organized, so we will create a new file opcodes.py and add this:

class OpCode:
    OP_CONSTANT = 0
    OP_RETURN = 1
    OP_NEGATE = 2
    OP_ADD = 3
    OP_SUBTRACT = 4
    OP_MULTIPLY = 5
    OP_DIVIDE = 6

Chunks

To start with we need to get some infrastructure in place before we write the VM engine.

Following craftinginterpreters.com we start with a Chunk object which will represent our bytecode. In RPython we have access to Python-esque lists so our code object will just be a list of OpCode values – which are just integers. A list of ints, couldn't get much simpler.

section-2-vm/chunk.py

class Chunk:
    code = None

    def __init__(self):
        self.code = []

    def write_chunk(self, byte):
        self.code.append(byte)

    def disassemble(self, name):
        print "== %s ==\n" % name
        i = 0
        while i < len(self.code):
            i = disassemble_instruction(self, i)

From here on I'll only present minimal snippets of code instead of the whole lot, but I'll link to the repository with the complete example code. For example, the various debugging helpers, including disassemble_instruction, aren't particularly interesting to include verbatim. See the GitHub repo for full details.

We need to check that we can create a chunk and disassemble it. The quickest way to do this is to use Python during development and debugging then every so often try to translate it.

Getting the disassemble part through the RPython translator was a hurdle for me as I quickly found that many str methods such as format are not supported, and only very basic % based formatting is supported. I ended up creating helper functions for string manipulation such as:

def leftpad_string(string, width, char=" "):
    l = len(string)
    if l > width:
        return string
    return char * (width - l) + string

Let's write a new entry_point that creates and disassembles a chunk of bytecode. We can set the target output name to vm1 at the same time:

targetvm1.py

def entry_point(argv):
    bytecode = Chunk()
    bytecode.write_chunk(OpCode.OP_ADD)
    bytecode.write_chunk(OpCode.OP_RETURN)
    bytecode.disassemble("hello world")
    return 0

def target(driver, *args):
    driver.exe_name = "vm1"
    return entry_point, None

Running this isn't going to be terribly interesting, but it is always nice to know that it is doing what you expect:

$ ./vm1 
== hello world ==

0000 OP_ADD       
0001 OP_RETURN    

Chunks of data

Ref: http://www.craftinginterpreters.com/chunks-of-bytecode.html#constants

So our bytecode is missing a very crucial element – the values to operate on!

As with the bytecode we can store these constant values as part of the chunk directly in a list. Each chunk will therefore have a constant data component, and a code component.

Edit the chunk.py file and add the new instance attribute constants as an empty list, and a new method add_constant.

    def add_constant(self, value):
        self.constants.append(value)
        return len(self.constants) - 1

Now to use this new capability we can modify our example chunk to write in some constants before the OP_ADD:

    bytecode = Chunk()
    constant = bytecode.add_constant(1.0)
    bytecode.write_chunk(OpCode.OP_CONSTANT)
    bytecode.write_chunk(constant)

    constant = bytecode.add_constant(2.0)
    bytecode.write_chunk(OpCode.OP_CONSTANT)
    bytecode.write_chunk(constant)

    bytecode.write_chunk(OpCode.OP_ADD)
    bytecode.write_chunk(OpCode.OP_RETURN)

    bytecode.disassemble("adding constants")

Which still translates with RPython and when run gives us the following disassembled bytecode:

== adding constants ==

0000 OP_CONSTANT  (00)        '1'
0002 OP_CONSTANT  (01)        '2'
0004 OP_ADD       
0005 OP_RETURN

We won't go down the route of serializing the bytecode to disk, but this bytecode chunk (including the constant data) could be saved and executed on our VM later – like a Java .class file. Instead we will pass the bytecode directly to our VM after we've created it during the compilation process.

Emulation

So those four instructions of bytecode combined with the constant value mapping 00 -> 1.0 and 01 -> 2.0 describes individual steps for our virtual machine to execute. One major point in favor of defining our own bytecode is we can design it to be really simple to execute – this makes the VM really easy to implement.

As I mentioned earlier this virtual machine will have a stack, so let's begin with that. Now the stack is going to be a busy little beast – as our VM takes instructions like OP_ADD it will pop off the top two values from the stack, and push the result of adding them together back onto the stack. Although dynamically resizing Python lists are marvelous, they can be a little slow. RPython can take advantage of a constant sized list which doesn't make our code much more complicated.

To do this we will define a constant sized list and track the stack_top directly. Note how we can give the RPython translator hints by adding assertions about the state that the stack_top will be in.

class VM(object):
    STACK_MAX_SIZE = 256
    stack = None
    stack_top = 0

    def __init__(self):
        self._reset_stack()

    def _reset_stack(self):
        self.stack = [0] * self.STACK_MAX_SIZE
        self.stack_top = 0

    def _stack_push(self, value):
        assert self.stack_top < self.STACK_MAX_SIZE
        self.stack[self.stack_top] = value
        self.stack_top += 1

    def _stack_pop(self):
        assert self.stack_top > 0
        self.stack_top -= 1
        return self.stack[self.stack_top]

    def _print_stack(self):
        print "",
        if self.stack_top <= 0:
            print "[]",
        else:
            for i in range(self.stack_top):
                print "[ %s ]" % self.stack[i],
        print

Now we get to the main event, the hot loop, the VM engine. Hope I haven't built it up too much, it is actually really simple! We loop until the instructions tell us to stop (OP_RETURN), and dispatch to other simple methods based on the instruction.

    def _run(self):
        while True:
            instruction = self._read_byte()

            if instruction == OpCode.OP_RETURN:
                print "%s" % self._stack_pop()
                return InterpretResultCode.INTERPRET_OK
            elif instruction == OpCode.OP_CONSTANT:
                constant = self._read_constant()
                self._stack_push(constant)
            elif instruction == OpCode.OP_ADD:
                self._binary_op(self._stack_add)    

Now the _read_byte method will have to keep track of which instruction we are up to. So add an instruction pointer (ip) to the VM with an initial value of 0. Then _read_byte is simply getting the next bytecode (int) from the chunk's code:

    def _read_byte(self):
        instruction = self.chunk.code[self.ip]
        self.ip += 1
        return instruction

If the instruction is OP_CONSTANT we take the constant's address from the next byte of the chunk's code, retrieve that constant value and add it to the VM's stack.

    def _read_constant(self):
        constant_index = self._read_byte()
        return self.chunk.constants[constant_index]

Finally our first arithmetic operation OP_ADD, what it has to achieve doesn't require much explanation: pop two values from the stack, add them together, push the result. But since a few operations all have the same template we introduce a layer of indirection – or abstraction – by introducing a reusable _binary_op helper method.

    @specialize.arg(1)
    def _binary_op(self, operator):
        op2 = self._stack_pop()
        op1 = self._stack_pop()
        result = operator(op1, op2)
        self._stack_push(result)

    @staticmethod
    def _stack_add(op1, op2):
        return op1 + op2

Note we tell RPython to specialize _binary_op on the first argument. This causes RPython to make a copy of _binary_op for every value of the first argument passed, which means that each copy contains a call to a particular operator, which can then be inlined.
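
For a sense of how the remaining instructions slot in, here is a sketch of the other dispatch branches (the helper names are my guesses following the pattern above, not necessarily those in the repo):

    elif instruction == OpCode.OP_SUBTRACT:
        self._binary_op(self._stack_subtract)
    elif instruction == OpCode.OP_MULTIPLY:
        self._binary_op(self._stack_multiply)
    elif instruction == OpCode.OP_DIVIDE:
        self._binary_op(self._stack_divide)
    elif instruction == OpCode.OP_NEGATE:
        self._stack_push(-self._stack_pop())

    @staticmethod
    def _stack_subtract(op1, op2):
        return op1 - op2

    @staticmethod
    def _stack_multiply(op1, op2):
        return op1 * op2

    @staticmethod
    def _stack_divide(op1, op2):
        return op1 / op2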

To be able to run our bytecode the only thing left to do is to pass in the chunk and call _run():

    def interpret_chunk(self, chunk):
        if self.debug_trace:
            print "== VM TRACE =="
        self.chunk = chunk
        self.ip = 0
        try:
            result = self._run()
            return result
        except:
            return InterpretResultCode.INTERPRET_RUNTIME_ERROR

targetvm3.py connects the pieces:

def entry_point(argv):
    bytecode = Chunk()
    constant = bytecode.add_constant(1)
    bytecode.write_chunk(OpCode.OP_CONSTANT)
    bytecode.write_chunk(constant)
    constant = bytecode.add_constant(2)
    bytecode.write_chunk(OpCode.OP_CONSTANT)
    bytecode.write_chunk(constant)
    bytecode.write_chunk(OpCode.OP_ADD)
    bytecode.write_chunk(OpCode.OP_RETURN)

    vm = VM()
    vm.interpret_chunk(bytecode)

    return 0

I've added some trace debugging so we can see what the VM and stack is doing.

The whole thing translates with RPython, and when run gives us:

./vm3
== VM TRACE ==
          []
0000 OP_CONSTANT  (00)        '1'
          [ 1 ]
0002 OP_CONSTANT  (01)        '2'
          [ 1 ] [ 2 ]
0004 OP_ADD       
          [ 3 ]
0005 OP_RETURN    
3

Yes we just computed the result of 1+2. Pat yourself on the back.

At this point it is probably valid to check that the translated executable is actually faster than running our program directly in Python. For this trivial example under Python2/pypy this targetvm3.py file runs in the 20ms – 90ms region, and the compiled vm3 runs in <5ms. Something useful must be happening during the translation.

I won't go through the code adding support for our other instructions as they are very similar and straightforward. Our VM is ready to execute our chunks of bytecode, but we haven't yet worked out how to take the entered expression and turn that into this simple bytecode. This is broken into two steps, scanning and compiling.

Scanning the source

All the source for this section can be found in section-3-scanning.

The job of the scanner is to take the raw expression string and transform it into a sequence of tokens. This scanning step will strip out whitespace and comments, catch errors with invalid tokens, and tokenize the string. For example the input "( 1 + 2 )" would get tokenized into LEFT_PAREN, NUMBER(1), PLUS, NUMBER(2), RIGHT_PAREN.

As with our OpCodes we will just define a simple Python class to define an int for each type of token:

class TokenTypes:
    ERROR = 0
    EOF = 1
    LEFT_PAREN = 2
    RIGHT_PAREN = 3
    MINUS = 4
    PLUS = 5
    SLASH = 6
    STAR = 7
    NUMBER = 8

A token has to keep some other information as well – keeping track of the location and length of the token will be helpful for error reporting. The NUMBER token clearly needs some data about the value it is representing: we could include a copy of the source lexeme (e.g. the string 2.0), or parse the value and store that, or – what we will do in this blog – use the location and length information as pointers into the original source string. Every token type (except perhaps ERROR) will use this simple data structure:

class Token(object):

    def __init__(self, start, length, token_type):
        self.start = start
        self.length = length
        self.type = token_type

Our soon to be created scanner will create these Token objects which refer back to addresses in some source. If the scanner sees the source "( 1 + 2.0 )" it would emit the following tokens:

Token(0, 1, TokenTypes.LEFT_PAREN)
Token(2, 1, TokenTypes.NUMBER)
Token(4, 1, TokenTypes.PLUS)
Token(6, 3, TokenTypes.NUMBER)
Token(10, 1, TokenTypes.RIGHT_PAREN)

Scanner

Let's walk through the scanner implementation method by method. The scanner will take the source and pass through it once, creating tokens as it goes.

class Scanner(object):

    def __init__(self, source):
        self.source = source
        self.start = 0
        self.current = 0

The start and current variables are character indices in the source string that point to the current substring being considered as a token.

For example in the string "(51.05+2)" while we are tokenizing the number 51.05 we will have start pointing at the 5, and advance current character by character until the character is no longer part of a number. Midway through scanning the number the start and current values might point to 1 and 4 respectively:

0 1 2 3 4 5 6 7 8
( 5 1 . 0 5 + 2 )
  ^     ^

From current=4 the scanner peeks ahead and sees that the next character (5) is a digit, so it will continue to advance.

0 1 2 3 4 5 6 7 8
( 5 1 . 0 5 + 2 )
  ^       ^

When the scanner peeks ahead and sees the "+" it will create the number token and emit it. The method that carries out this tokenizing is _number:

    def _number(self):
        while self._peek().isdigit():
            self.advance()

        # Look for decimal point
        if self._peek() == '.' and self._peek_next().isdigit():
            self.advance()
            while self._peek().isdigit():
                self.advance()

        return self._make_token(TokenTypes.NUMBER)

It relies on a few helpers to look ahead at the upcoming characters:

    def _peek(self):
        if self._is_at_end():
            return '\0'
        return self.source[self.current]

    def _peek_next(self):
        if self._is_at_end():
            return '\0'
        return self.source[self.current+1]

    def _is_at_end(self):
        return len(self.source) == self.current

If the character at current is still part of the number we want to call advance to move on by one character.

    def advance(self):
        self.current += 1
        return self.source[self.current - 1]

Once the isdigit() check fails in _number() we call _make_token() to emit the token with the NUMBER type.

    def _make_token(self, token_type):
        return Token(
            start=self.start,
            length=(self.current - self.start),
            token_type=token_type
        )

Note again that the token is linked to an index address in the source, rather than including the string value.

Our scanner is pull-based: a token will be requested via scan_token. First we skip past whitespace, then depending on the character we emit the correct token:

    def scan_token(self):
        # skip any whitespace
        while True:
            char = self._peek()
            if char in ' \r\t\n':
                self.advance()
            else:
                break

        self.start = self.current

        if self._is_at_end():
            return self._make_token(TokenTypes.EOF)

        char = self.advance()

        if char.isdigit():
            return self._number()

        if char == '(':
            return self._make_token(TokenTypes.LEFT_PAREN)
        if char == ')':
            return self._make_token(TokenTypes.RIGHT_PAREN)
        if char == '-':
            return self._make_token(TokenTypes.MINUS)
        if char == '+':
            return self._make_token(TokenTypes.PLUS)
        if char == '/':
            return self._make_token(TokenTypes.SLASH)
        if char == '*':
            return self._make_token(TokenTypes.STAR)

        return ErrorToken("Unexpected character", self.current)
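
The ErrorToken class isn't shown in this walkthrough; a minimal sketch, assuming it simply subclasses Token and carries the error message (the real scanner.py may differ):

class ErrorToken(Token):

    def __init__(self, message, location):
        # an error "token" points at the offending location and
        # carries a human-readable message
        Token.__init__(self, location, 1, TokenTypes.ERROR)
        self.message = message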

If this was a real programming language we were scanning, this would be the point where we add support for different types of literals and any language identifiers/reserved words.

At some point we will need to parse the literal value for our numbers, but we leave that job for some later component; for now we'll just add a get_token_string helper. To make sure that RPython is happy to index arbitrary slices of source we add range assertions:

    def get_token_string(self, token):
        if isinstance(token, ErrorToken):
            return token.message
        else:
            end_loc = token.start + token.length
            assert end_loc < len(self.source)
            assert end_loc > 0
            return self.source[token.start:end_loc]

A simple entry point can be used to test our scanner with a hard coded source string:

targetscanner1.py

from scanner import Scanner, TokenTypes, TokenTypeToName


def entry_point(argv):

    source = "(   1   + 2.0 )"

    scanner = Scanner(source)
    t = scanner.scan_token()
    while t.type != TokenTypes.EOF and t.type != TokenTypes.ERROR:
        print TokenTypeToName[t.type],
        if t.type == TokenTypes.NUMBER:
            print "(%s)" % scanner.get_token_string(t),
        print
        t = scanner.scan_token()
    return 0

RPython didn't complain, and lo it works:

$ ./scanner1 
LEFT_PAREN
NUMBER (1)
PLUS
NUMBER (2.0)
RIGHT_PAREN

Let's connect our REPL to the scanner.

targetscanner2.py

from rpython.rlib import rfile
from scanner import Scanner, TokenTypes, TokenTypeToName

LINE_BUFFER_LENGTH = 1024


def repl(stdin, stdout):
    while True:
        stdout.write("> ")
        source = stdin.readline(LINE_BUFFER_LENGTH)

        scanner = Scanner(source)
        t = scanner.scan_token()
        while t.type != TokenTypes.EOF and t.type != TokenTypes.ERROR:
            print TokenTypeToName[t.type],
            if t.type == TokenTypes.NUMBER:
                print "(%s)" % scanner.get_token_string(t),
            print
            t = scanner.scan_token()


def entry_point(argv):
    stdin, stdout, stderr = rfile.create_stdio()
    try:
        repl(stdin, stdout)
    except:
        pass
    return 0

With our REPL hooked up we can now scan tokens from arbitrary input:

$ ./scanner2
> (3 *4) - -3
LEFT_PAREN
NUMBER (3)
STAR
NUMBER (4)
RIGHT_PAREN
MINUS
MINUS
NUMBER (3)
> ^C

Compiling expressions

References

  • https://www.craftinginterpreters.com/compiling-expressions.html
  • http://effbot.org/zone/simple-top-down-parsing.htm

The final piece is to turn this sequence of tokens into our low level bytecode instructions for the virtual machine to execute. Buckle up, we are about to write us a compiler.

Our compiler will take a single pass over the tokens using Vaughan Pratt’s parsing technique, and output a chunk of bytecode – if we do it right it will be compatible with our existing virtual machine.

Remember the bytecode we defined above is really simple – by relying on our stack we can transform a nested expression into a sequence of our bytecode operations.

To make this more concrete let's go through by hand translating an expression into bytecode.

Our source expression:

(3 + 2) - (7 * 2)

If we were to make an abstract syntax tree we'd get something like this, with the top-level minus as the root:
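
        -
      /   \
     +     *
    / \   / \
   3   2 7   2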

Now if we start at the first sub-expression (3+2) we can clearly note from the first open bracket that we must see a close bracket, and that the expression inside that bracket must be valid on its own. Not only that, but regardless of the inside we know that the whole expression still has to be valid. Let's focus on this first bracketed expression, and let our attention recurse into it, so to speak.

This gives us a much easier problem – we just want to get our virtual machine to compute 3 + 2. In this bytecode dialect we would load the two constants, and then add them with OP_ADD like so:

OP_CONSTANT  (00) '3.000000'
OP_CONSTANT  (01) '2.000000'
OP_ADD

The effect of our vm executing these three instructions is that sitting pretty at the top of the stack is the result of the addition. Winning.

Jumping back out from our bracketed expression, our next token is MINUS. At this point we have a fair idea that it must be used in an infix position. In fact, whatever token follows the bracketed expression must be a valid infix operator; if not, the expression is over or has a syntax error.

Assuming the best from our user (naive), we handle MINUS the same way we handled the first PLUS. We've already got the first operand on the stack, now we compile the right operand and then write out the bytecode for OP_SUBTRACT.

The right operand is another simple three instructions:

OP_CONSTANT  (02) '7.000000'
OP_CONSTANT  (03) '2.000000'
OP_MULTIPLY

Then we finish our top level binary expression and write a OP_RETURN to return the value at the top of the stack as the execution's result. Our final hand compiled program is:

OP_CONSTANT  (00) '3.000000'
OP_CONSTANT  (01) '2.000000'
OP_ADD
OP_CONSTANT  (02) '7.000000'
OP_CONSTANT  (03) '2.000000'
OP_MULTIPLY
OP_SUBTRACT
OP_RETURN

Ok, that wasn't so hard, was it? Let's try to make our code do that.

We define a parser object which will keep track of where we are, and whether things have all gone horribly wrong:

class Parser(object):
    def __init__(self):
        self.had_error = False
        self.panic_mode = False
        self.current = None
        self.previous = None

The compiler will also be a class. We'll need one of our Scanner instances to pull tokens from, and since the output is a bytecode Chunk, let's go ahead and make one of those in our compiler initializer:

class Compiler(object):

    def __init__(self, source):
        self.parser = Parser()
        self.scanner = Scanner(source)
        self.chunk = Chunk()

Since we have this (empty) chunk of bytecode we will make a helper method to add individual bytes. Every instruction will pass from our compiler into an executable program through this simple method.

    def emit_byte(self, byte):
        self.current_chunk().write_chunk(byte)

To quote from Bob Nystrom on the Pratt parsing technique:

the implementation is a deceptively-simple handful of deeply intertwined code

I don't actually think I can do justice to this section. Instead I suggest reading his treatment in Pratt Parsers: Expression Parsing Made Easy which explains the magic behind the parsing component. Our only major difference is instead of creating an AST we are going to directly emit bytecode for our VM.

Now that I've absolved myself from taking responsibility in explaining this somewhat tricky concept, I'll discuss some of the code from compiler.py, and walk through what happens for a particular rule.

I'll jump straight to the juicy bit: the table of parse rules. We define a ParseRule for each token, and each rule comprises:

  • an optional handler for when the token is used as a prefix (e.g. the minus in (-2)),
  • an optional handler for when the token is used infix (e.g. the slash in 2/47),
  • and a precedence value (a number that determines what is of higher precedence).

rules = [
    ParseRule(None,              None,            Precedence.NONE),   # ERROR
    ParseRule(None,              None,            Precedence.NONE),   # EOF
    ParseRule(Compiler.grouping, None,            Precedence.CALL),   # LEFT_PAREN
    ParseRule(None,              None,            Precedence.NONE),   # RIGHT_PAREN
    ParseRule(Compiler.unary,    Compiler.binary, Precedence.TERM),   # MINUS
    ParseRule(None,              Compiler.binary, Precedence.TERM),   # PLUS
    ParseRule(None,              Compiler.binary, Precedence.FACTOR), # SLASH
    ParseRule(None,              Compiler.binary, Precedence.FACTOR), # STAR
    ParseRule(Compiler.number,   None,            Precedence.NONE),   # NUMBER
]

These rules really are the magic of our compiler. When we get to a particular token such as MINUS, we see if it is an infix operator and if so we've gone and got its first operand ready. At all times we rely on the relative precedence: consuming everything with higher precedence than the operator we are currently evaluating.

In the expression:

2 + 3 * 4

The * has higher precedence than the +, so 3 * 4 will be parsed together as the second operand to the first infix operator (the +) which follows the BEDMAS order of operations I was taught at high school.
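
In bytecode terms, that ordering means 2 + 3 * 4 would compile to something like this (following the hand-compiled format above):

OP_CONSTANT  (00) '2.000000'
OP_CONSTANT  (01) '3.000000'
OP_CONSTANT  (02) '4.000000'
OP_MULTIPLY
OP_ADD
OP_RETURN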

To encode these precedence values we make another Python object moonlighting as an enum:

class Precedence(object):
    NONE = 0
    DEFAULT = 1
    TERM = 2        # + -
    FACTOR = 3      # * /
    UNARY = 4       # ! - +
    CALL = 5        # ()
    PRIMARY = 6

What happens in our compiler when turning -2.0 into bytecode? Assume we've just pulled the token MINUS from the scanner. Every expression has to start with some type of prefix – whether that is:

  • a bracket group (,
  • a number 2,
  • or a prefix unary operator -.

Knowing that, our compiler assumes there is a prefix handler in the rule table – in this case it points us at the unary handler.

    def parse_precedence(self, precedence):
        # parses any expression of a given precedence level or higher
        self.advance()
        prefix_rule = self._get_rule(self.parser.previous.type).prefix
        prefix_rule(self)

unary is called:

    def unary(self):
        op_type = self.parser.previous.type
        # Compile the operand
        self.parse_precedence(Precedence.UNARY)
        # Emit the operator instruction
        if op_type == TokenTypes.MINUS:
            self.emit_byte(OpCode.OP_NEGATE)

Here – before writing the OP_NEGATE opcode – we recurse back into parse_precedence to ensure that whatever follows the MINUS token is compiled, provided it has higher precedence than unary – e.g. a bracketed group. Crucially, at run time this recursive call will ensure that the result is left on top of our stack. Armed with this knowledge, the unary method just has to emit a single byte with the OP_NEGATE opcode.
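
The grouping and number handlers referenced in the rule table aren't walked through here. As rough sketches of what they do (expression, consume and the float() parsing are stand-ins for whatever compiler.py actually uses):

    def grouping(self):
        # compile the sub-expression, then require the closing bracket
        self.expression()
        self.consume(TokenTypes.RIGHT_PAREN, "Expect ')' after expression.")

    def number(self):
        # pull the lexeme back out of the source and emit it as a constant
        value = float(self.scanner.get_token_string(self.parser.previous))
        constant_index = self.chunk.add_constant(value)
        self.emit_byte(OpCode.OP_CONSTANT)
        self.emit_byte(constant_index)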

Test compilation

Now we can test our compiler by outputting disassembled bytecode of our user entered expressions. Create a new entry_point targetcompiler:

from rpython.rlib import rfile
from compiler import Compiler

LINE_BUFFER_LENGTH = 1024


def entry_point(argv):
    stdin, stdout, stderr = rfile.create_stdio()

    try:
        while True:
            stdout.write("> ")
            source = stdin.readline(LINE_BUFFER_LENGTH)
            compiler = Compiler(source, debugging=True)
            compiler.compile()
    except:
        pass
    return 0

Translate it and test it out:

$ ./compiler1 
> (2/4 + 1/2)
== code ==

0000 OP_CONSTANT  (00) '2.000000'
0002 OP_CONSTANT  (01) '4.000000'
0004 OP_DIVIDE    
0005 OP_CONSTANT  (02) '1.000000'
0007 OP_CONSTANT  (00) '2.000000'
0009 OP_DIVIDE    
0010 OP_ADD       
0011 OP_RETURN

Now if you've made it this far you'll be eager to finally connect everything together by executing this bytecode with the virtual machine.

End to end

All the pieces slot together rather easily at this point. Create a new file targetcalc.py and define our entry point:

from rpython.rlib import rfile
from compiler import Compiler
from vm import VM

LINE_BUFFER_LENGTH = 4096


def entry_point(argv):
    stdin, stdout, stderr = rfile.create_stdio()
    vm = VM()
    try:
        while True:
            stdout.write("> ")
            source = stdin.readline(LINE_BUFFER_LENGTH)
            if source:
                compiler = Compiler(source, debugging=False)
                compiler.compile()
                vm.interpret_chunk(compiler.chunk)
    except:
        pass
    return 0


def target(driver, *args):
    driver.exe_name = "calc"
    return entry_point, None

Let's try to catch it out with a double negative:

$ ./calc 
> 2--3
== VM TRACE ==
          []
0000 OP_CONSTANT  (00) '2.000000'
          [ 2.000000 ]
0002 OP_CONSTANT  (01) '3.000000'
          [ 2.000000 ] [ 3.000000 ]
0004 OP_NEGATE    
          [ 2.000000 ] [ -3.000000 ]
0005 OP_SUBTRACT  
          [ 5.000000 ]
0006 OP_RETURN    
5.000000

Ok, well, let's evaluate the first 50 terms of the Nilakantha Series:

$ ./calc
> 3 + 4 * ((1/(2 * 3 * 4)) + (1/(4 * 5 * 6)) - (1/(6 * 7 * 8)) + (1/(8 * 9 * 10)) - (1/(10 * 11 * 12)) + (1/(12 * 13 * 14)) - (1/(14 * 15 * 16)) + (1/(16 * 17 * 18)) - (1/(18 * 19 * 20)) + (1/(20 * 21 * 22)) - (1/(22 * 23 * 24)) + (1/(24 * 25 * 26)) - (1/(26 * 27 * 28)) + (1/(28 * 29 * 30)) - (1/(30 * 31 * 32)) + (1/(32 * 33 * 34)) - (1/(34 * 35 * 36)) + (1/(36 * 37 * 38)) - (1/(38 * 39 * 40)) + (1/(40 * 41 * 42)) - (1/(42 * 43 * 44)) + (1/(44 * 45 * 46)) - (1/(46 * 47 * 48)) + (1/(48 * 49 * 50)) - (1/(50 * 51 * 52)) + (1/(52 * 53 * 54)) - (1/(54 * 55 * 56)) + (1/(56 * 57 * 58)) - (1/(58 * 59 * 60)) + (1/(60 * 61 * 62)) - (1/(62 * 63 * 64)) + (1/(64 * 65 * 66)) - (1/(66 * 67 * 68)) + (1/(68 * 69 * 70)) - (1/(70 * 71 * 72)) + (1/(72 * 73 * 74)) - (1/(74 * 75 * 76)) + (1/(76 * 77 * 78)) - (1/(78 * 79 * 80)) + (1/(80 * 81 * 82)) - (1/(82 * 83 * 84)) + (1/(84 * 85 * 86)) - (1/(86 * 87 * 88)) + (1/(88 * 89 * 90)) - (1/(90 * 91 * 92)) + (1/(92 * 93 * 94)) - (1/(94 * 95 * 96)) + (1/(96 * 97 * 98)) - (1/(98 * 99 * 100)) + (1/(100 * 101 * 102)))

== VM TRACE ==
          []
0000 OP_CONSTANT  (00) '3.000000'
          [ 3.000000 ]
0002 OP_CONSTANT  (01) '4.000000'
...SNIP...
0598 OP_CONSTANT  (101) '102.000000'
          [ 3.000000 ] [ 4.000000 ] [ 0.047935 ] [ 1.000000 ] [ 10100.000000 ] [ 102.000000 ]
0600 OP_MULTIPLY  
          [ 3.000000 ] [ 4.000000 ] [ 0.047935 ] [ 1.000000 ] [ 1030200.000000 ]
0601 OP_DIVIDE    
          [ 3.000000 ] [ 4.000000 ] [ 0.047935 ] [ 0.000001 ]
0602 OP_ADD       
          [ 3.000000 ] [ 4.000000 ] [ 0.047936 ]
0603 OP_MULTIPLY  
          [ 3.000000 ] [ 0.191743 ]
0604 OP_ADD       
          [ 3.191743 ]
0605 OP_RETURN    
3.191743

We just executed 605 virtual machine instructions to compute pi to 1dp!

This brings us to the end of this tutorial. To recap we've walked through the whole compilation process: from the user providing an expression string on the REPL, scanning the source string into tokens, parsing the tokens while accounting for relative precedence via a Pratt parser, generating bytecode, and finally executing the bytecode on our own VM. RPython translated what we wrote into C and compiled it, meaning our resulting calc REPL is really fast.

“The world is a thing of utter inordinate complexity and richness and strangeness that is absolutely awesome.”

― Douglas Adams

Many thanks to Bob Nystrom for writing the book that inspired this post, and thanks to Carl Friedrich and Matt Halverson for reviewing.

― Brian (@thorneynzb)

gamingdirectional: Create player missile manager and player missile class in Pygame


In this article we will create two classes that will assist the player object to launch missiles: a player missile class which serves as the missile object, and a missile manager class which will manage all the missiles the player has launched. We will tune these classes up later as the project goes on, but for now let's create these two simple classes first. First is the missile class.

Source

Red Hat Developers: Python in RHEL 8


Ten years ago, the developers of the Python programming language decided to clean things up and release a backwards-incompatible version, Python 3. They initially underestimated the impact of the changes, and the popularity of the language. Still, in the last decade, the vast majority of community projects have migrated to the new version, and major projects are now dropping support for Python 2.

In Red Hat Enterprise Linux 8, Python 3.6 is the default. But Python 2 remains available in RHEL 8.

Using Python in RHEL 8

To install Python, type yum install python3.

To run Python, type python3.

If that doesn’t work for you, or you need more details, read on!

Python 3

In RHEL 8, Python 3.6 is the default, fully supported version of Python. It is not always installed, however. Similarly to any other available tool, use yum install python3 to get it.

Add-on package names generally have the python3 prefix. Use yum install python3-requests to install the popular library for making HTTP connections.

Python 2

Not all existing software is ready to run on Python 3. And that’s OK! RHEL 8 Beta still contains the Python 2 stack, which can be installed in parallel with Python 3. Get it using yum install python2, and run with python2.

Why not just “Python”?

Okay, okay, so there’s python3 and python2. But what if I use just python? Well…

$ python
-bash: python: command not found

There is no python command by default.

Why? Frankly, we couldn’t agree what python should do. There are two groups of developers. One expects python to mean Python 2, and the other Python 3. The two don’t always talk to each other, so you might be a member of one camp and not know anyone from the other – but they do exist.

Today, in 2018, the python == python2 side is more popular, even among those that prefer Python 3 (which they spell out as python3). This side is also backed by an official upstream recommendation, PEP 394. However, we expect that this viewpoint will become much less popular over the lifespan of RHEL 8. By making python always mean Python 2, Red Hat would be painting itself into a corner.

Unversioned Python command

That said, there are applications that expect a python command to exist and that assumption might be hard to change. That’s why you can use the alternatives mechanism to enable the unversioned python command system-wide, and set it to a specific version:

alternatives --set python /usr/bin/python3

For Python 2, use /usr/bin/python2 instead. For details on how to revert the changes or do the setup interactively, see man unversioned-python.

Note: we do not recommend this approach. We recommend you explicitly refer to python3 or python2. That way, your scripts and commands will work on any machine that has the right version of Python installed.

Note that this works only for the python command itself. Packages and other commands don’t have configurable unversioned variants. Even if you configure python, the commands yum install python-requests or pip won’t work.

Always use the explicit version in these cases. Better yet, don't rely on the wrapper scripts for pip, venv and other Python modules that you call from the command line. Instead use python3 -m pip, python3 -m venv, python2 -m virtualenv.

Third-party packages

Not all Python software is shipped with RHEL 8 – there’s only so much that Red Hat can verify, package and support.

To install a third-party package, many sources on the Internet will suggest using sudo pip install. Do not do this! This command translates to “download a package from the internet, and run it on my machine as root to install it”.

Even if the package is trustworthy, this is a bad idea. A large part of RHEL 8 relies on Python 3.6. If you throw in another package, there’s no guarantee that it will co-exist peacefully with the rest of the system. There are some protections in place, but you should generally assume that sudo pip will break your system.

(Not to mention it won’t work as-is: the command name is pip3 or pip2.)

If you want to use third-party packages, create a virtual environment using python3 -m venv --system-site-packages myenv (or for Python 2, install python2-virtualenv and run python2 -m virtualenv --system-site-packages myenv). Then, activate the environment using source myenv/bin/activate, and install packages into it using pip install. The packages will then be available as long as the environment is activated. While this does not protect you against malicious packages, it does protect the system from unexpected breakage.

When a virtual environment is active, unversioned commands like python and pip will refer to the Python version that created the virtual environment. So, to install the Requests package, run pip install requests (or if you prefer being explicit, python -m pip install requests).
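
Putting those steps together for Python 3:

python3 -m venv --system-site-packages myenv
source myenv/bin/activate
pip install requests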

The --system-site-packages switch makes the environment re-use libraries installed system-wide. Leave it out to get an isolated environment, where all libraries outside Python’s standard library need to be installed explicitly.

Another possibility is installing user-specific packages with pip's --user switch. The command python3 -m pip install --user pyflakes will make the pyflakes linter available to you personally, leaving system tools like yum unaffected.

If you truly need something installed system-wide, build an RPM package and use yum install.

Obligatory note: Third-party packages installed with pip are not reviewed or supported by Red Hat.

Platform-Python: The Python behind the curtain

Careful readers might have noticed a discrepancy here: Python is not installed by default, but yum is, and yum is written in Python. What magic makes that possible?

It turns out there is an internal Python interpreter called “Platform-Python”. This is what system tools use. It only includes the parts of Python needed for the system to function, and there are no guarantees that any particular feature won’t be removed from it in the future.

However, libraries for Platform-Python are shared with the “user-visible” Python 3.6. This conserves disk space, and it also means that, for example, yum extensions built for Python 3.6 will work for the system tool.

If you are not re-building the distro, do not use Platform-Python directly. Install python3 and use that. But remember, do not use sudo pip to install packages into it.

Porting to Python 3

It won’t be in RHEL 8 Beta, but there will come a day when support for Python 2 will end. If you maintain Python 2 code, you should think about porting it to Python 3.

Python 3 was first released in 2008. For over a decade, it has been improving in features, performance and – ironically – compatibility with Python 2. You might have heard horror stories and urban legends about porting code to Python 3.0 or 3.2 that would be much less scary nowadays.

I’m not saying porting is trivial now, but it’s definitely gotten easier. As with any other change to a system, porting to Python 3 mainly requires knowledge of your codebase, good tests – and some time.

What’s the reward? Python 3 is a better language – after all, it’s the language Python 2 developers choose to use! For enterprise applications, the main feature is reduced risk of hard-to-debug, input-dependent bugs when handling non-ASCII text such as people’s names (or emoji).

There are many community resources that document and help with porting to Python 3.

If you are reading this blog, you are probably working on a large, conservative code base. We ported a few of those, and distilled our experience in the Conservative Porting Guide, a hands-on walkthrough that focuses on compatibility and keeping working code throughout the porting process. Give it a try, and if you find that something is not covered, let us know – or even send a pull request to it!

If you maintain Python C extensions, a similarly focused guide is part of the py3c project.

Takeaways

To install or run Python on RHEL 8, use python3 – unless you have a different version in mind.

Do not use sudo pip.

Do not use platform-python for your applications. However, use platform-python if you are writing system/admin code for RHEL 8.

And if you have some code for Python 2, now is a great time to start modernizing it.

Enjoy Python in RHEL 8!

The post Python in RHEL 8 appeared first on RHD Blog.

PyCharm: PyCharm 2018.1.6 and 2017.3.7


We’ve fixed an issue in our custom Java Runtime Environment for older versions of PyCharm. If you’re using PyCharm 2017.3 or PyCharm 2018.1, please update to the new version.

Fixed in These Versions

Keyboard issues in macOS Mojave

After typing a special character by pressing and holding a key, the IDE would not accept any other input after inserting the special character. This has now been resolved.

To Update

Download the new version from our website, choose Help | Check for Updates in the IDE, or use JetBrains Toolbox to keep all of your JetBrains IDEs updated.

Continuum Analytics Blog: Python Data Visualization 2018: Why So Many Libraries?


This post is the first in a three-part series on the state of Python data visualization tools and the trends that emerged from SciPy 2018. By James A. Bednar At a special session of SciPy 2018 in Austin, representatives of a wide range of open-source Python visualization tools shared their visions for the future of …
Read more →

The post Python Data Visualization 2018: Why So Many Libraries? appeared first on Anaconda.

NumFOCUS: SunPy Maps: Digitizing Images of the Sun from the 1970s

Marcos Dione: pefan


A few weeks ago I needed to do some line-based manipulation that kinda went further than what you can easily do with awk. My old-SysAdmin brain kicked in and the first thought was: if you're going to use awk and sed, you might as well use perl. Thing is, I really can't remember when was the last time I wrote even a oneliner in perl; maybe 2011, in my last SysAdmin-like position.

Since then I've been using python for almost anything, so why not? Well, the python interpreter does not have an equivalent of perl's -n switch; and while we're at it, -a, -F, -p are also interesting for this.

So I wrote a little program for that. Based on those switch names, I called it pefan. As python does not have perl's special variables, and in particular $_ and @_, the wrapper sets the line variable for each line of the input, and if you use the -a or -F switches, the variable data with the list that's the result of splitting the line.

Meanwhile, while reading the perlrun manpage to write this post, I found out that -i and even -s sound useful, so I'll be adding support for those in the future. I'm also thinking of adding support for curly-brace-based block definitions, to make oneliners easier to write. Yes, it's a travesty, but it's all in line with my push to make python more SysAdmin friendly.

In the meantime, I added a couple of switches I find useful too. See the whole usage:

usage: pefan.py [-h] [-a] -e SCRIPT [-F SPLIT_CHAR] [-i] [-M MODULE_SPEC]
                [-m MODULE_SPEC] [-N] [-n] [-p] [--no-print] [-r RANDOM]
                [-s SETUP] [-t [FORMAT]] ...

Tries to emulate Perl's (Yikes!) -peFan switches.

positional arguments:
FILE                  Files to process. If ommited or file name is '-',
                      stdin is used. Notice you can use '-' at any point in
                      the list; f.i. "foo bar - baz".

optional arguments:
-h, --help            show this help message and exit
-a, --split           Turns on autosplit, so the line is split in elements.
                      The list of elements goes in the 'data' variable.
-e SCRIPT, --script SCRIPT
                      The script to run inside the loop.
-F SPLIT_CHAR, --split-char SPLIT_CHAR
                      The field delimiter. This implies [-a|--split].
-i, --ignore-empty    Do not print empty lines.
-M MODULE_SPEC, --import MODULE_SPEC
                      Import modules before runing any code. MODULE_SPEC can
                      be MODULE or MODULE,NAME,... The latter uses the 'from
                      MODULE import NAME, ...' variant. MODULE or NAMEs can
                      have a :AS_NAME suffix.
-m MODULE_SPEC        Same as [-M|--import]
-N, --enumerate-lines
                      Prepend each line with its line number, like less -N
                      does.
-n, --iterate         Iterate over all the lines of inputs. Each line is
                      assigned in the 'line' variable. This is the default.
-p, --print           Print the resulting line. This is the default.
--no-print            Don't automatically print the resulting line, the
                      script knows what to do with it
-r RANDOM, --random RANDOM
                      Print only a fraction of the output lines.
-s SETUP, --setup SETUP
                      Code to be run as setup. Run only once after importing
                      modules and before iterating over input.
-t [FORMAT], --timestamp [FORMAT]
                      Prepend a timestamp using FORMAT. By default prints it
                      in ISO-8601.

FORMAT can use Python's strftime()'s codes (see
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-
behavior).
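
As a quick taste, a oneliner in the spirit of awk '{ print $1 }' might look like this (assuming, per the usage above, that -a fills the data variable and that the resulting line is printed by default):

printf 'foo bar\nbaz qux\n' | python pefan.py -a -e 'line = data[0]'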

Go get it here.


Talk Python to Me: #186 100 Days of Python in a Magical Universe

The key to making anything a habit, including learning to program, is to make it fun. That's exactly what Anna-Lena Popkes did with her 100 days of code challenge. She created a magical universe where Python-derived creatures and castles live.

Python Celery - Weekly Celery Tutorials and How-tos: Parse newspapers with Newspaper3k. Celery Docker Style.


Docker is hot. Docker is hotter than hot. Docker 1.0 was released in June 2014. Since then, it has been adopted at a remarkable rate. Over 37 billion images have been pulled from Docker Hub, the Docker image repository service. Docker is so popular because it makes it very easy to package and ship applications. 

How do you dockerise an app? And how do you orchestrate your stack of dockerised components? This blog post answers both questions in a hands-on way. We are going to build a small Celery app that periodically downloads newspaper articles. We then break up the stack into pieces, dockerising the Celery app and its components. Finally, we put it all back together as a multi-container app.

What is Docker?

Docker lets developers package up and run applications via standardised interfaces. Such a package is called a Docker image. A Docker image is a portable, self-sufficient artefact. Whichever programming language it was written in. This makes it easy to create, deploy and run applications. In a way, a Docker image is a bit like a virtual machine image. But container images take up less space than virtual machines. 

When you run a Docker image to start an instance of your application, you get a Docker container. A Docker container is an isolated process that runs in user space and shares the OS kernel. Multiple containers can run on the same machine, each running as isolated processes. 

So far so good. What’s in it for you? Containers provide a packaging mechanism. Through this packaging mechanism, your application, its dependencies and libraries all become one artefact. If your application requires Debian 8.11 with Git 2.19.1, Mono 5.16.0, Python 3.6.6, a bunch of pip packages and the environment variable PYTHONUNBUFFERED=1, you define it all in your Dockerfile.

The Dockerfile contains the build instructions for your Docker image. It is also excellent documentation. If you or other developers need to understand the requirements of your application, read the Dockerfile. The Dockerfile describes your application and its dependencies.

Docker executes the Dockerfile instructions to build the Docker image. This gives you repeatable builds, whatever the programming language. And it lets you deploy your application in a predictable, consistent way. Whatever the target environment. Private data centre, the public cloud, Virtual Machines, bare metal or your laptop.

This gives you the ability to create predictable environments. Your development environment is exactly the same as your test and production environment. You as a developer can focus on writing code without worrying about the system that it will be running on.

For operations, Docker reduces the number of systems and custom deployment scripts. The focus shifts towards scheduling and orchestrating containers. Operations can focus on robustness and scalability. And they can stop worrying about individual applications and their peculiar environmental dependencies.

The newspaper3k Celery app

We are going to build a Celery app that periodically scans newspaper urls for new articles. We are going to save new articles to an Amazon S3-like storage service. This keeps things simple and we can focus on our Celery app and Docker. No database means no migrations. And S3-like storage means we get a REST API (and a web UI) for free. We need the following building blocks:

  • the newspaper3k Celery app itself (workers plus a beat scheduler)
  • RabbitMQ as the Celery message broker
  • Minio as the S3-like storage service

Both RabbitMQ and Minio are open-source applications. Both binaries are readily available. This leaves us with building the newspaper3k Celery application. Let’s start with the pip packages we need (the full source code is available on GitHub):

# requirements.txt

celery==4.2.1
minio==4.0.6
newspaper3k==0.2.8

Next up is the Celery app itself. I prefer keeping things clear-cut. So we create one file for the Celery worker, and another file for the task. The application code goes into a dedicated app folder:

├── requirements.txt
└── app/
       ├── worker.py
       └── tasks.py

worker.py instantiates the Celery app and configures the periodic scheduler:

# worker.py

from celery import Celery
app = Celery(
   broker='amqp://user:password@localhost:5672', 
   include=['tasks'])

app.conf.beat_schedule = {  
'refresh': {  
       'task': 'refresh',  
       'schedule': 300.0,
       'args': (['https://www.theguardian.com', 'https://www.nytimes.com'],)
    },  
}

The app task flow is as follows. Given a newspaper url, newspaper3k builds a list of article urls. For each article url, we need to fetch the page content and parse it. We calculate the article’s md5 hash. If the article does not exist in Minio, we save it to Minio. If the article does exist in Minio, we save it to Minio if the md5 hashes differ.

Our aim is concurrency and scalability. To achieve this, our tasks need to be atomic and idempotent. An atomic operation is an indivisible and irreducible series of operations such that either all occur, or nothing occurs. A task is idempotent if it does not cause unintended effects when called more than once with the same arguments. The refresh task takes a list of newspaper urls. For each newspaper url, the task asynchronously calls fetch_source, passing the url. 
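
The tasks.py fragments below omit their imports; a plausible set, assuming the Celery app object is importable from worker.py, would be:

# tasks.py

import hashlib
from io import BytesIO
from urllib.parse import urlparse

import newspaper
from minio import Minio
from minio.error import BucketAlreadyExists, BucketAlreadyOwnedByYou, NoSuchKey

from worker import app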

# tasks.py

@app.task(bind=True, name='refresh')  
def refresh(self, urls):  
   for url in urls:  
       fetch_source.s(url).delay()  

The fetch_source task takes a newspaper url as its argument. It generates a list of article urls. For each article url, it invokes fetch_article.

# tasks.py

@app.task(bind=True, name='fetch_source')  
def fetch_source(self, url):  
   source = newspaper.build(url)  
   for article in source.articles:  
       fetch_article.s(article.url).delay()

The fetch_article task expects the article url as its argument. It downloads and parses the article. It calls save_article, passing the newspaper’s domain name, the article’s title and its content.

# tasks.py

@app.task(bind=True, name='fetch_article')  
def fetch_article(self, url):  
   article = newspaper.Article(url)  
   article.download()  
   article.parse()  
   url = urlparse(article.source_url)  
   save_article.s(url.netloc, article.title, article.text).delay()

The save_article task requires three arguments: the newspaper's domain name, the article's title and its content. The task takes care of saving the article to Minio. The bucket name is the newspaper domain name. The key name is the article's title. Here, we use the queue argument in the task decorator. This sends the save_article task to a dedicated Celery queue named minio. This gives us extra control over how fast we can write new articles to Minio. It helps us achieve a good scalable design.

# tasks.py

@app.task(bind=True, name='save_article', queue='minio')
def save_article(self, bucket, key, text):  
   minio_client = Minio('localhost:9000',
      access_key='AKIAIOSFODNN7EXAMPLE',
      secret_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
      secure=False)  
   try:  
      minio_client.make_bucket(bucket, location="us-east-1")  
   except BucketAlreadyExists:  
      pass  
   except BucketAlreadyOwnedByYou:  
      pass  

   hexdigest = hashlib.md5(text.encode()).hexdigest()

   try:  
      st = minio_client.stat_object(bucket, key)  
      update = st.etag != hexdigest  
   except NoSuchKey as err:  
      update = True  

   if update:  
       stream = BytesIO(text.encode())  
       minio_client.put_object(bucket, key, stream, stream.getbuffer().nbytes)  

When it comes to deploying and running our application, we need to take care of a couple of things. This is typically solved by writing scripts. Specifically, we need to:

  • ensure the correct Python version is available on the host machine and install or upgrade if necessary
  • ensure a virtual Python environment for our Celery app exists; create and run pip install -r requirements.txt if necessary
  • ensure the desired RabbitMQ version is running somewhere in our network
  • ensure the desired Minio version is running somewhere in our network
  • deploy the desired version of your Celery app
  • ensure the following processes are set up and configured in Supervisor or Upstart:
    • Celery beat
    • default queue Celery worker
    • minio queue Celery worker
  • restart Supervisor or Upstart to start the Celery workers and beat after each deployment

Dockerise all the things 

Easy things first. Both RabbitMQ and Minio are readily available as Docker images on Docker Hub. Docker Hub is the largest public image library. It is the go-to place for open-source images. This leaves us with dockerising our Celery app. The first step to dockerise the app is to create two new files: Dockerfile and .dockerignore.

├── Dockerfile
├── .dockerignore
├── requirements.txt
└── app/
       ├── worker.py
       └── tasks.py

.dockerignore serves a similar purpose as .gitignore. When we copy files into the Docker image during the Docker build process, any file that matches any pattern defined in .dockerignore is excluded. 

Dockerfile contains the commands required to build the Docker image. Docker executes these commands sequentially. Each command creates a layer. Layers are re-used by multiple images. This saves disk space and reduces the time to build images.

FROM python:3.6.6  
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8 PYTHONUNBUFFERED=1

WORKDIR /  
COPY requirements.txt ./  
RUN pip install --no-cache-dir -r requirements.txt  
RUN rm requirements.txt  

COPY . /  
WORKDIR /app

We use the python:3.6.6 Docker image as our base. The python:3.6.6 image is available on Dockerhub. Then, we set some environment variables. LANG and LC_ALL configure Python’s default locale setting. Setting PYTHONUNBUFFERED=1 avoids some stdout log anomalies.

Next, COPY requirements.txt ./ copies the requirements.txt file into the image's root folder. We then run pip install. We then delete requirements.txt from the image as we no longer need it. Finally, COPY . / copies the entire project into the image's root folder, excluding stuff according to the .dockerignore file. As the app is now in the image's /app directory, we make this our working directory. Meaning that any command executes inside this directory by default. Execute the Dockerfile build recipe to create the Docker image:

docker build . -t worker:latest

The -t option assigns a meaningful name (tag) to the image. The colon in the tag allows you to specify a version. If you do not provide a version (worker instead of worker:latest), Docker defaults to latest. Do specify a version for anything which is not local development. Otherwise, sooner or later, you will have a very hard time.
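
For example, a versioned build might look like this (0.1.0 being a made-up version number):

docker build . -t worker:0.1.0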

Refactor the Celery app

Containerising an application has an impact on how you architect the application. If you want to dive deeper, I recommend you check out the twelve-factor app manifesto. To ensure portability and scalability, twelve-factor requires separation of config from code. An app's config is everything that is likely to vary between environments.

The twelve-factor app stores config in environment variables. Environment variables are easy to change between environments. Environment variables are language-agnostic. Environment variables are deeply ingrained in Docker. Let's refactor how we instantiate the Celery app.

# worker.py

app = Celery(
   broker=os.environ['CELERY_BROKER_URL'], 
   include=('tasks',))
app.conf.beat_schedule = {
    'refresh': {
        'task': 'refresh',
        'schedule': float(os.environ['NEWSPAPER_SCHEDULE']),
        'args': (os.environ['NEWSPAPER_URLS'].split(','),)
    },
}

We can simplify further. Any Celery setting (the full list is available here) can be set via an environment variable. The name of the environment variable is derived from the setting name. Uppercase the setting name and prefix with CELERY_. For example, to set the broker_url, use the CELERY_BROKER_URL environment variable.
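
For example, to point the workers at the RabbitMQ instance from earlier:

export CELERY_BROKER_URL=amqp://user:password@localhost:5672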

# worker.py

app = Celery(include=('tasks',))
app.conf.beat_schedule = {
    'refresh': {
        'task': 'refresh',
        'schedule': float(os.environ['NEWSPAPER_SCHEDULE']),
        'args': (os.environ['NEWSPAPER_URLS'].split(','),)
    },
}

We also need to refactor how we instantiate the Minio client. 

# tasks.py

@app.task(bind=True, name='save_article')  
def save_article(self, bucket, key, text):  
  minio_client = Minio(os.environ['MINIO_HOST'],  
     access_key=os.environ['MINIO_ACCESS_KEY'],  
     secret_key=os.environ['MINIO_SECRET_KEY'],  
     secure=int(os.getenv('MINIO_SECURE', '0')))
  ...

Rebuild the image:

docker build . -t worker:latest

Configuration

Our Celery app is now configurable via environment variables. Let’s summarise the environment variables required for our entire stack:

Worker image:

  • CELERY_BROKER_URL
  • MINIO_HOST
  • MINIO_ACCESS_KEY
  • MINIO_SECRET_KEY
  • NEWSPAPER_SCHEDULE
  • NEWSPAPER_URLS

Minio image:

  • MINIO_ACCESS_KEY
  • MINIO_SECRET_KEY

You need to pass the correct set of environment variables when you start the containers with docker run. In reality you will most likely never use docker run. Instead, you will use an orchestration tool like Docker Compose. Even when you do run only a single container. I will skip the details for docker run (you can find the docs here) and jump straight to Docker Compose.

Orchestrate the stack with docker-compose

Now that we have all our Docker images, we need to configure, run and make them work together. This is similar to arranging music for performance by an orchestra. We have individual lines of music. But we need to make them work together in harmony.

Container orchestration is about automating deployment, configuration, scaling, networking and availability of containers. Docker Compose is a simple tool for defining and running multi-container Docker applications. With Docker Compose, we can describe and configure our entire stack using a YAML file. The docker-compose.yml. With a single command, we can create, start and stop the entire stack. 

Docker Compose creates a single network for our stack. Each container joins the network and becomes reachable by other containers. Docker Compose assigns each container a hostname identical to the container name. This makes each container discoverable within the network.

We define five services (worker, minio worker, beat, rabbitmq and minio) and one volume in docker-compose.yml. Services are Docker Compose speak for containers in production. A service runs an image and codifies the way that image runs. Volumes provide persistent storage. For a complete reference, make sure to check out the Docker Compose file docs.

version: '3.4'
services: 
  worker:
    build: .
    image: &img worker 
    command: [celery, worker, --app=worker.app, --pool=gevent, --concurrency=20, --loglevel=INFO]
    environment: &env      
      - CELERY_BROKER_URL=amqp://guest:guest@rabbitmq:5672
      - MINIO_HOST=minio:9000
      - MINIO_ACCESS_KEY=token
      - MINIO_SECRET_KEY=secret
      - NEWSPAPER_URLS=https://www.theguardian.com,https://www.nytimes.com
      - NEWSPAPER_SCHEDULE=300
    depends_on:
      - beat
      - rabbitmq
    restart: 'no'
    volumes:
      - ./app:/app 

  worker-minio:
    build: .
    image: *img
    command: [celery, worker, --app=worker.app, --pool=gevent, --concurrency=20, --queues=minio, --loglevel=INFO]
    environment: *env
    depends_on:
      - beat
      - rabbitmq
    restart: 'no'
    volumes: 
      - ./app:/app

  beat:
    build: .
    image: *img
    command: [celery, beat, --app=worker.app, --loglevel=INFO]
    environment: *env
    depends_on:
      - rabbitmq
    restart: 'no'
    volumes:
      - ./app:/app

  rabbitmq:
    image: rabbitmq:3.7.8
    
  minio:
    image: minio/minio:RELEASE.2018-11-06T01-01-02Z
    command: [server, /data]
    environment: *env
    ports:
      - 80:9000
    volumes:
      - minio:/data
      
volumes:
  minio:

Let’s go through the service properties one-by-one. 

  • build: a string containing the path to the build context (directory where the Dockerfile is located). Or, as an object with the path specified under context and optionally Dockerfile and args. This is useful when using docker-compose build worker as an alternative to docker build. Or when you want Docker Compose to automatically build the image for you when it does not exist.
  • image: the image name
  • command: the command to execute inside the container
  • environment: environment variables
  • ports: expose container ports on your host machine. For example, minio runs on port 9000. We map it to port 80, meaning it becomes available on localhost:80.
  • restart: what to do when the container process terminates. Here, we do not want Docker Compose to restart it.
  • volumes: map a persistent storage volume (or a host path) to an internal container path. For local development, mapping to a host path allows you to develop inside the container. For anything that requires persistent storage, use Docker volume. Here, we get minio to use a Docker volume. Otherwise, we lose all data when the container shuts down. And containers are very transient by design.
  • depends_on: determines the order in which Docker Compose starts the containers. This only determines the startup order. It does not guarantee that the container it depends on is up and running. RabbitMQ starts before the beat and the worker containers. By the time the beat and worker containers are up and running, RabbitMQ is still starting. Check out the logs using docker-compose logs worker or docker-compose logs beat.

Persistent storage is defined in the volumes section. Here, we declare one volume named minio. This volume is mounted as /data inside the Minio container. And we start Minio so it stores its data to the /data path. Which is the minio volume. Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. You can find out more about how Docker volumes work here. And here more about the volumes section in the docker-compose.yml.

In case you are wondering what the ampersand (&) and asterisks (*) are all about: they help you with repeated nodes. An ampersand identifies a node. You can reference this node with an asterisk thereafter. This is very helpful for image names. If you use the same image in different services, you need to define the image only once. When you upgrade to a newer image version, you only need to do it in one place within your yaml.

Same applies to environment variables. You define them for your entire stack only once. And you can then reference them in all your services. When you need to amend something, you need to do it only once. This also helps sharing the same environment variables across your stack. For instance, the minio container requires MINIO_ACCESS_KEY and MINIO_SECRET_KEY for access control. We reuse the same variables on the client side in our Celery app.

Start the Docker stack

With the docker-compose.yml in place, we are ready for show time. Go to the folder where docker-compose.yml is located. Start the docker stack with

docker-compose up -d

Minio should become available on http://localhost. Use the key and secret defined in the environment variable section to log in. Follow the logs with docker-compose logs -f. Or docker-compose logs -f worker to follow the worker logs only.

Say, you need to add another Celery worker (bringing the total threads from 20 to 40).

docker-compose up -d --scale worker=2

And back down again.

docker-compose up -d --scale worker=1

Conclusion

This was pretty intense. But we have come a long way. We started discussing the benefits of running an application on Docker. We then took a deep dive into two important building blocks when moving to Docker:

  • containerise a Celery application
  • orchestrating a container stack with Docker Compose

I’ve compiled a small list of resources covering important aspects of dockerisation. It’s about important design aspects when building a containerised app:

And here’s a list of resources on orchestration with Docker Compose:

Docker Compose is a great starting point. It’s a great tool for local development and continuous integration. And it can make sense in small production environments. At the same time, Docker Compose is tied to a single host and limited in larger and dynamic environments.

This is where Kubernetes shines. Kubernetes is the de-facto standard for container orchestration which excels at scale. In my next blog post, we will migrate our little Celery-newspaper3k-RabbitMQ-Minio stack from Docker Compose to Kubernetes.

Kushal Das: Introducing rpm-macros-virtualenv 0.0.1


Let me introduce rpm-macros-virtualenv 0.0.1 to you all.

This is a small set of RPM macros, which can be used by the spec files to build and package any Python application along with a virtualenv. Thus, removing the need of installing all dependencies via dnf/rpm repository. One of the biggest use cases will be to help install the latest application code and all the latest dependencies into a virtualenv, and also package the whole virtualenv into the RPM package.

This will be useful for any third-party vendor/ISV, who would want to package their Python application for Fedora/RHEL/CentOS along with the dependencies. But remember not to use this for any package inside of Fedora land, as this does not follow the Fedora packaging guidelines.

This is the very initial release, and it will get a lot of updates in the coming months. The project idea is also not new; Debian already has dh-virtualenv doing this for a long time.

How to install?

I will be building an RPM package; for now, download the source code and the detached signature to verify it against my GPG key.

wget https://kushaldas.in/packages/rpm-macros-virtualenv-0.0.1.tar.gz
wget https://kushaldas.in/packages/rpm-macros-virtualenv-0.0.1.tar.gz.asc
gpg2 --verify rpm-macros-virtualenv-0.0.1.tar.gz.asc rpm-macros-virtualenv-0.0.1.tar.gz

Untar the directory, and then copy the macros.python-virtualenv file to the RPM macros directory in your system.

tar -xvf rpm-macros-virtualenv-0.0.1.tar.gz
cd rpm-macros-virtualenv-0.0.1/
sudo cp macros.python-virtualenv /usr/lib/rpm/macros.d/

How to use?

Here is a minimal example.

# Fedora 27 and newer, no need to build the debug package
%if 0%{?fedora} >= 27 || 0%{?rhel} >= 8
%global debug_package %{nil}
%endif
# Use our interpreter for brp-python-bytecompile script
%global __python /opt/venvs/%{name}/bin/python3


%prep
%setup -q

%build
%pyvenv_create
%{__pyvenvpip3} install --upgrade pip
%pyvenv_build

%install
%pyvenv_create
%{__pyvenvpip3} install --upgrade pip
%pyvenv_install
ln -s /opt/venvs/%{name}/bin/examplecommand $RPM_BUILD_ROOT%{_bindir}/examplecommand

%files
%doc README.md LICENSE
/opt/venvs/%{name}/*

As you can see, in both %build and %install, the first thing we have to call is %pyvenv_create, which creates our virtualenv. Then we are installing the latest pip in that environment.

Then, in %build, we call %pyvenv_build to create the wheel.

In the %install section, we call the %pyvenv_install macro to install the project. This command also installs all the required dependencies (from the project's requirements.txt) by downloading them from https://pypi.org.

If any command/executable gets installed into the virtualenv, you should create a symlink to it from the $RPM_BUILD_ROOT/usr/bin/ directory in the %install section.

Now, there is an example in the git repository where I have taken the Ansible 2.7.1 spec file from Fedora and converted it to these macros. I have built the package for Fedora 25 to verify that this works.
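
Assuming the spec file and the source tarball are already in place under ~/rpmbuild (the file name below is illustrative), building then works like with any other spec:

rpmbuild -ba ~/rpmbuild/SPECS/ansible.spec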

PyCharm: PyCharm 2018.3 RC 2

We’re putting the final touches on PyCharm 2018.3 to prepare for the release. You can get our second release candidate from our website.

Fixed in This Version

  • The “There is a plugin available” notification would open the wrong window
  • For some users, some UI elements weren’t visible
  • The Markdown preview window wouldn’t load images correctly
  • Read the release notes for details

Interested?

Download the RC from our website. Alternatively, you can use the JetBrains Toolbox App.

If you’re on Ubuntu 16.04 or later, you can use snap to get the PyCharm RC, and stay up to date. You can find the installation instructions on our website.
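
If, as with previous releases, the RC is published to snap's candidate channel - an assumption, so do check the instructions linked above - installing it would look something like this:

sudo snap install pycharm-professional --classic --candidate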

The release candidate is not an EAP version. This means that you will either need an active PyCharm license, or you’ll receive a 30-day free trial of PyCharm Professional Edition. The Community Edition is free and open source, as usual.

Zato Blog: Making API servers start up faster

This post describes a couple of new techniques that Zato 3.0 employs to make API servers start up faster.

When a Zato server starts, it carries out a series of steps, one of which is the deployment of internal API services. There are 550+ internal services, which means 550+ individual features that can be made use of - REST, publish/subscribe, SSO, AMQP, IBM MQ, Cassandra, caching, SAP, Odoo, and hundreds more.

Yet, what the internal services have in common is that they change relatively infrequently - they do change from time to time, but not very often. This realization led to the creation of a start-up cache of internal services.

Auto-caching on first deployment

Observe the output when a server is started right after installation, with all the internal services about to be deployed along with some of the user-defined ones.

In this particular case, the server needed around 8.5 seconds to deploy its internal services but, while doing so, it also cached them all for later use.

Now, when the same server is stopped and started again, the output will be different. Nothing changed as far as user-defined services go, but things changed with regard to the internal ones - the server deployed them by re-using the cache created above and, consequently, needed only 3 seconds to do so.

Such a cache of internal services is created and maintained by Zato automatically, no user action is required.

Disabling internal services

Auto-caching is already a nice improvement but it is possible to go one better. By default, servers deploy all of the internal services that exist - this is because users may want to choose in their projects any and all of the features that the internal services represent.

However, in practice, most projects will use a select few technologies, e.g. REST and AMQP, or REST, IBM MQ, SAP and ElasticSearch, or any other combination, but not all of what is possible.

This explains the addition of a new feature which allows one to disable all the internal services that are known not to be needed in a particular project.

When you open a given server's server.conf file, you will find entries in the [deploy_internal] stanza, a subset of which is shown below. Note that if your Zato 3.0 version does not have this stanza, you can copy it over from a newly created server.
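
As a sketch, that subset may look something like the following - the module paths are illustrative rather than copied from an actual server.conf:

[deploy_internal]
zato.server.service.internal.channel.amqp=True
zato.server.service.internal.channel.web_socket=True
zato.server.service.internal.cloud.aws.s3=False
zato.server.service.internal.email.smtp=False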

The list contains not internal services as such but the Python modules to which the services belong; each module concerns a particular feature or technology - AMQP, JMS IBM MQ, WebSockets, Amazon S3 and so on. Thus, if something is not needed, you can simply change True to False next to each module that is not used.

But keep in mind that all the internal services were already cached before, so, having changed True to False in as many places as needed, we also need a way to recreate the cache.

This is done by specifying the --sync-internal flag when servers are started; observe below what happens when some of the internal services were disabled and the flag was provided.
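
For instance, assuming a server installed under /esb/server1 (the path is purely illustrative), that would be:

zato start /esb/server1 --sync-internal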

All the user-defined services deployed as previously but the cache for the internal ones was recreated and only some of them were deployed, only the ones that were needed in this particular project, which happens to primarily include REST, WebSockets, Vault and publish/subscribe.

Note that even without the cache the server needed only 4.1 seconds to deploy the internal services, which neatly dovetails with the fact that previously it needed 8.5 seconds to deploy roughly twice as many of them.

This also means that with the cache already in place the services will be deployed faster still, which is indeed the case below. This time the server deployed the internal services needed in this project in 1.3 seconds, much faster than the original 8.5 seconds.

This process can be repeated as many times as needed; each time you need new functionality enabled or disabled, you just edit server.conf and restart the servers - that is it, the caches will be repopulated automatically.

With some of the services disabled, a caveat is that parts of web-admin will not be able to list or manage connections whose backend services were taken out, but this is to be expected; e.g. if FTP connections were disabled in server.conf, then it will not be possible to access them in web-admin.

One final note: --sync-internal should really be used only when needed. The rationale behind the start-up cache is to make the process faster, so this flag should not be used all the time; rather, there are two cases where it needs to be used:

  • When changing which internal services to deploy, as detailed in this post
  • When applying updates to your Zato installation - some of the updates may change, delete or add new internal services, which is why the caches need to be recreated in such cases