
Mike Driscoll: Jupyter Notebook Extension Basics


There are several methods of extending the functionality of Jupyter Notebooks. Here are four of them:

  • Kernels
  • IPython kernel extensions
  • Notebook extensions
  • Notebook server extensions

For the purposes of this article, I will be focusing on the third item, Notebook Extensions. However, let’s take a moment and talk about the other three so that you are aware of how they can affect your Notebook.

Kernels

The Kernel is basically the language run-time used. The default is Python via the IPython kernel. You can extend your Jupyter Notebook to use other languages besides Python. Check out the following URL for more information:

I won’t be covering the installation of other kernels in this article as each kernel has different installation instructions. The URL above should be used as it has hyperlinks to the most up-to-date information on this topic.

You can also implement your own kernel if you want to. Here’s a great primer on the topic:

IPython Kernel Extensions

The IPython Kernel extension is just a Python module that can be used to modify the interactive shell environment. In Jupyter’s case, these extensions would modify how code cells behave. You can use this type of extension to register new magics, define variables and modify the user’s namespace. You can use the following three magics for managing IPython extensions:

  • %load_ext
  • %reload_ext
  • %unload_ext

See the IPython documentation for full details on these magics.
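For instance, a minimal extension module might look like the following sketch (hello_ext and the %hello magic are hypothetical names, not a real package):

# hello_ext.py -- a minimal IPython extension
def load_ipython_extension(ipython):
    # Called when you run %load_ext hello_ext
    def hello(line):
        print("Hello, {}!".format(line or "world"))
    # Register a new line magic named %hello
    ipython.register_magic_function(hello, magic_kind="line", magic_name="hello")

def unload_ipython_extension(ipython):
    # Called when you run %unload_ext hello_ext
    pass

With hello_ext.py somewhere on your Python path, running %load_ext hello_ext in a code cell makes %hello available.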

Notebook Server Extensions

Jupyter Notebook also has the concept of “server extensions”. A server extension is a Python module that loads when the Notebook’s web server application starts. The current method of loading this type of extension is via Jupyter’s configuration system, which we talked about in Chapter 3. You will need to specify which extensions to load in the configuration file or via the command line interface.

If you add a new extension while Jupyter Notebook is running, you will have to restart the Notebook process to activate the new extension.
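As a rough sketch of what that configuration looks like (my_server_extension is a hypothetical module name), you would add something like this to ~/.jupyter/jupyter_notebook_config.py:

c.NotebookApp.nbserver_extensions = {"my_server_extension": True}

...and the module itself would define the hook that the Notebook server calls at startup:

# my_server_extension.py
def load_jupyter_server_extension(nb_server_app):
    # Called once, when the Notebook web server starts up
    nb_server_app.log.info("my_server_extension loaded")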

Notebook Extensions

The type of extension that we care the most about in this chapter is the Notebook Extension. Notebook extensions (or nbextensions) are JavaScript modules that you can load on most of the views in your Notebook's frontend. They can access the page's DOM and the Jupyter JavaScript API, which allows an extension to modify the user experience and interface. This type of extension is exclusive to the Notebook frontend.

Let’s learn how you might install a Notebook Extension. The manual method of installing a Jupyter Notebook extension would look something like this, assuming you have already downloaded / pip installed the package that contains the extension:

jupyter nbextension install EXTENSION
jupyter nbextension enable EXTENSION

Note that you would replace EXTENSION with the name of the extension you are intending to install.

Another method that seems to have gained a lot of backing for managing Notebook extensions is by using the Jupyter NbExtensions Configurator. You can check out the project here.

This package is not part of the Jupyter Project, but it is quite helpful. You can use pip or conda to install the Configurator. Here’s how to do it with pip:

pip install jupyter_nbextensions_configurator

If you use conda as your Python package manager, then you would install the Configurator like this:

conda install -c conda-forge jupyter_nbextensions_configurator

The Configurator is a Jupyter server extension and must be enabled. You will need to run the following command in your terminal before starting Jupyter Notebook (or you can restart the server):

jupyter nbextensions_configurator enable --user

When I ran this command, I got the following output:

Enabling: jupyter_nbextensions_configurator
- Writing config: /Users/michael/.jupyter
    - Validating...
      jupyter_nbextensions_configurator 0.4.0 OK
Enabling notebook nbextension nbextensions_configurator/config_menu/main...
Enabling tree nbextension nbextensions_configurator/tree_tab/main...

After starting up Jupyter Notebook, just click on the Nbextensions tab and you should see something like the following:

Since we don’t have any extensions downloaded and installed, the Configurator looks kind of barren and can’t really do all that much for us right now. Let’s get some extensions to try!

If you were to search for Jupyter Notebook extensions, you would find the jupyter_contrib_nbextensions package pretty quickly. It is a collection of Notebook extensions provided by the Jupyter community. You can read more about the extensions that are included in this package at either of the following links:

To install this set of extensions, you can once again use either pip or conda. Here is the pip command you will need:

pip install jupyter_contrib_nbextensions

And here is the conda command:

conda install -c conda-forge jupyter_contrib_nbextensions

Once you have this package downloaded and installed, you will need to use Jupyter to install the JavaScript and CSS files to the right location so that the Notebook can access them. Here is the command you should run:

jupyter contrib nbextension install --user

Now that you have the new extensions installed, you can use the Configurator tool we installed earlier to easily enable or disable them:

As you can see in the screenshot above, there are now a LOT of extensions to play around with. The Configurator is really handy for figuring out which extensions you have installed and enabled. You can also enable and disable Notebook extensions by hand in the terminal using jupyter nbextension enable EXTENSION or jupyter nbextension disable EXTENSION respectively, but I personally find the Configurator easier to use.
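For example, to toggle the code-folding extension from the contrib package we just installed, the commands would look like this:

jupyter nbextension enable codefolding/main
jupyter nbextension disable codefolding/main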


Wrapping Up

In this article we learned about the different types of extensions you can use with Jupyter. The ones we cared most about were the Jupyter Notebook extensions. We then went on to learn the basics of installing extensions. We also learned how to enable and disable extensions using the terminal and via the Jupyter NbExtensions Configurator package.


Related Reading


Matt Layman: Consistent Python code with Black

Code formatting is the subject of millions of fiery, nerdy debates. Developers love to argue about code style because we read code a lot. The style matters because it affects readability. We have examples of communities that benefit from a shared code style. The Go programming language has gofmt (i.e., “go format”) baked in as a core tool in the language. Their core team decided on a set of rules to use and everyone gets the benefit of reading code that looks very similar.

Stack Abuse: A Brief Look at Web Development in Python


Introduction

Since 2003, Python has ranked in the top 10 programming languages to learn, and its ranking has been improving consistently. According to several popularity rankings, Python is one of the top 5 languages to learn in 2019 and has become an essential part of the programming community, thanks to its simplicity, flexibility, robustness, ease of use, compatibility, speed, and versatility. Furthermore, tech giants like Instagram, Spotify, and Google base at least part of their architecture on Python.

In short, Python has become a central figure in the programming and business worlds with the rise of fintech, the poster child of Silicon Valley and Wall Street. The reasons are many, but Python offers the security and scalability sought by the digital-first approach adopted by a considerable portion of the business and financial sectors.

Though Python can be used to perform a variety of tasks ranging from machine learning and data science to robotics and hardware programming, in this article we will study how Python can be used for web development.

Web Development Using Python

Python offers something for everyone through its many frameworks. A framework is a bundle of packages and modules that provide an abstraction, or generic functionality, that can be selectively changed to create application-specific software.

But how do you know which web framework is right for you? For full-fledged web applications, Django and Pyramid are the way to go. For better control and visualization, or for prototyping an app, Web2py or Flask may have something to offer your project. CherryPy is a must for simple, minimalist solutions. Tornado will handle 10,000 or more concurrent connections to your app, while Dash is the perfect choice for analytical applications.

In this article, we will provide a brief overview of three of the most popular choices among developers and programming companies alike: Django, Pyramid, and Flask. After the overview we'll show the most popular framework, Django, in action through the use of an example login system.

Django

This framework is the embodiment of the "batteries included" phrase, and defines itself as "the web framework for perfectionists with deadlines". Its built-in features allow for a wide range of web applications, such as database-driven applications, chatbots, GPS solutions, etc.

Its DRY (Don't Repeat Yourself) philosophy not only allows, but also promotes, the reuse of code, slicing the coding time in half. Furthermore, its modular/decoupled architecture allows seamless modification of the code components, letting you add or remove components as needed with little to no effort.

Django also possesses an ORM (Object-Relational Mapping) layer, which not only makes it highly compatible with most of the popular databases, like PostgreSQL, MySQL, and Oracle, but also allows it to work with several databases at once.
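To give a feel for the ORM, here is a minimal sketch (the Stock model is hypothetical; the query API shown is standard Django):

from django.db import models

class Stock(models.Model):
    ticker = models.CharField(max_length=8)
    price = models.DecimalField(max_digits=10, decimal_places=2)

# The same query code works unchanged whichever supported database backs it:
# cheapest = Stock.objects.filter(price__lt=100).order_by("ticker")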

Finally, Django is SEO (Search Engine Optimization) friendly. For example, it allows the reduction of the page loading time through techniques/features like caching templates and compressing JavaScript.

Pyramid

This framework defines itself as "not too small, not too big, just right". Pyramid is a finishing-focused framework with the ability to start small, allowing you to code a solid foundation for your solution, and then to scale it up as needed. It is similar to Django in its compatibility with small and large applications, but sets itself apart from Django in its complexity.

While on its own it can be considered a lean option when compared to other frameworks, Pyramid shines with its plugin system, allowing developers to plug in whatever is needed, which allows for the implementation of multiple solutions for a given task.

Pyramid is even ideal for single-file applications, flexible authentication and authorization, and apps built around view predicates.

Flask

While Pyramid and Django share the same core philosophy, Flask goes in the other direction. If the end goal is something simple, manageable, and customizable, I'd suggest that you use Flask instead of an overkill powerhouse like Django. Flask is heavily based on the Jinja2 templating engine and the Werkzeug WSGI (Web Server Gateway Interface) toolkit.

Self-defined as a microframework, Flask is tailored to small-scale solutions, like simple apps or APIs, where lean functionality is a top priority. Flask is also the most-used microframework for creating prototypes: when building a working application from the ground up in a short amount of time, that speed takes priority over the long-term maintainability of said application.
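To illustrate just how lean that is, here is a complete, runnable Flask application (the route and message are purely illustrative):

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # One function is all it takes to serve a page
    return "Hello from a microframework!"

if __name__ == "__main__":
    app.run()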

Simple Login System with Django

In this section, we are going to explain how to create a simple login system with the Django framework. While a lot of things happen "offstage", and many things can be customized to the developers' liking, only the most basic steps will be explained in order to demonstrate how easy it is to develop applications with the Django framework.

Installing Django

For this example, pip needs to be installed. Once that is done, Django can be installed and a new project can be created as follows:

$ python3 -m venv ~/.virtualenvs/dProject # Creates a virtual environment named dProject
$ source ~/.virtualenvs/dProject/bin/activate # A path is created
(dProject) $ pip install django # Django is installed
(dProject) $ django-admin.py startproject LoginProject_D # The project is created with the name LoginProject_D 
(dProject) $ ./manage.py migrate # Migrate creates a new SQLite database
(dProject) $ ./manage.py runserver # Calls the local server
(dProject) $ ./manage.py startapp dProject # This creates a dedicated app that will allow the making of a view and url registration.

After this is done, the project can be previewed in a browser via the "http://127.0.0.1:8000" address. The Django welcome screen will load in the browser, indicating that installation was a success.

Django's Auth App

When a project is created, Django installs the "auth" app by default. This can be confirmed by checking the file "settings.py", which is created automatically with the new project, under the "INSTALLED_APPS" section as follows:

INSTALLED_APPS = [  
    …
    'django.contrib.admin',
    'django.contrib.auth', # Here it is! Note that several built-in apps are included in this section.
   …
]

"Django-auth", or "django.contrib.auth", is the Django framework's built-in authenthication system, and contains its default models.

In order to use the "auth" app, we need to add it to the project-level file "urls.py":

# Importing these modules will allow us to set up the login and logout view routes
from django.conf.urls import url
from django.contrib import admin
from django.contrib.auth import views as auth_views
from django.urls import path, include  # Required for the path() route below (Django 2.0+)

# This section adds Django site authentication urls (for login, logout, password management)
urlpatterns = [  
    url(r'^login/$', auth_views.login, name='login'),
    url(r'^logout/$', auth_views.logout, name='logout'),
    url(r'^admin/', admin.site.urls),
    path('dProject/', include('django.contrib.auth.urls')),
]

The "auth" app provides us with the following URLs, each one associated with "auth" views, allowing us to use them by simply creating their view templates:

dProject/login/ [name='login']  
dProject/logout/ [name='logout']  
dProject/password_change/ [name='password_change']  
dProject/password_change/done/ [name='password_change_done']  
dProject/password_reset/ [name='password_reset']  
dProject/password_reset/done/ [name='password_reset_done']  
dProject/reset/<uidb64>/<token>/ [name='password_reset_confirm']  
dProject/reset/done/ [name='password_reset_complete']  

The django.contrib.auth.views.login view looks for a "registration/login.html" template by default, so we need to create a folder named "registration" with a "login.html" template inside it. The following block of code is a basic login template that can be used:

{% extends 'base.html' %}

{% block title %}Login{% endblock %}

{% block content %}
  <h2>Login</h2>
  <form method="post">  {# A standard form that sends its data via POST. #}
    {% csrf_token %}  {# Security tag that protects against cross-site request forgery (CSRF) attacks. #}
    {{ form.as_p }}  {# Outputs the form's contents wrapped in paragraph tags. #}
    <button type="submit">Login</button>  {# A submit button. #}
  </form>
{% endblock %}

Finally, we set the project to look for the "templates" folder through the "settings.py" file, updating DIRS:

TEMPLATES = [  
    {
        ...
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        ...
    },
]

Voila! A simple login page that can correctly authenticate a user through a username and password validation.

Conclusion

Python has been widely used for server-side programming, owing to its dynamic website creation capabilities. This language is widely used for fast prototyping and building highly scalable web applications by technology leaders like Google and even NASA!

Python is, without a doubt, a must when taking a digital-first approach to staying competitive, which is further enhanced by the meteoric rise of the fintech industry.

Furthermore, these Python frameworks reduce the development effort through the provision of a variety of built-in functionalities. The only challenge would be which one to use, tailored to specific needs for better results.

Codementor: How to detect faces using OpenCV and Python/C++?

Implementation of face detection (OpenCV) in Python and C++. Updated.

Python Anywhere: Turning a Python script into a website


One question we often hear from people starting out with PythonAnywhere is "how do I turn this script I've written into a website so that other people can run it?"

That's actually a bigger topic than you might imagine, and a complete answer would wind up having to explain almost everything about web development. So we won't do all of that in this blog post :-) The good news is that simple scripts can often be turned into simple websites pretty easily, and in this blog post we'll work through a couple of examples.

Let's get started!

The simplest case: a script that takes some inputs and returns an output

Let's say you have this Python 3.x script:

number1 = float(input("Enter the first number: "))
number2 = float(input("Enter the second number: "))
solution = number1 + number2
print("The sum of your numbers is {}".format(solution))

Obviously that's a super-simple example, but a lot of more complicated scripts follow the same kind of form. For example, a script for a financial analyst might have these equivalent steps:

  • Get data about a particular stock from the user.
  • Run some kind of complicated analysis on the data.
  • Print out a result saying how good the algorithm thinks the stock is as an investment.

The point is, we have three phases: input, processing, and output.

(Some scripts have more phases -- they gather some data, do some processing, gather some more data, do more processing, and so on, and eventually print out a result. We'll come on to those later on.)

Let's work through how we would change our three-phase input-process-output script into a website.

Step 1: extract the processing into a function

In a website's code, we don't have access to the Python input or print functions, so the input and output phases will be different -- but the processing phase will be the same as it was in the original script. So the first step is to extract our processing code into a function so that it can be re-used. For our example, that leaves us with something like this:

def do_calculation(number1, number2):
    return number1 + number2

number1 = float(input("Enter the first number: "))
number2 = float(input("Enter the second number: "))
solution = do_calculation(number1, number2)
print("The sum of your numbers is {}".format(solution))

Simple enough. In real-world cases like the stock-analysis example, there would of course be more inputs, and the do_calculation function would be considerably more complicated, but the theory is the same.

Step 2: create a website

Firstly, create a PythonAnywhere account if you haven't already. A free "Beginner" account is enough for this tutorial.

Once you've signed up, you'll be taken to the dashboard, with a tour window. It's worth going through the tour so that you can learn how the site works -- it'll only take a minute or so.

At the end of the tour you'll be presented with some options to "learn more". You can just click "End tour" here, because this tutorial will tell you all you need to know.

Now you're presented with the PythonAnywhere dashboard. I recommend you check your email and confirm your email address -- otherwise if you forget your password later, you won't be able to reset it.

Now you need to create a website, which requires a web framework. The easiest web framework to get started with when creating this kind of thing is Flask; it's very simple and doesn't have a lot of the built-in stuff that other web frameworks have, but for our purposes that's a good thing.

To create your site, go to the "Web" page using the tab near the top right:

Click on the "Add a new web app" button to the left. This will pop up a "Wizard" which allows you to configure your site. If you have a free account, it will look like this:

If you decided to go for a paid account (thanks :-), then it will be a bit different:

What we're doing on this page is specifying the host name in the URL that people will enter to see your website. Free accounts can have one website, and it must be at yourusername.pythonanywhere.com. Paid accounts have the option of using their own custom host names in their URLs.

For now, we'll stick to the free option. If you have a free account, just click the "Next" button, and if you have a paid one, click the checkbox next to the yourusername.pythonanywhere.com, then click "Next". This will take you on to the next page in the wizard.

This page is where we select the web framework we want to use. We're using Flask, so click that one to go on to the next page.

PythonAnywhere has various versions of Python installed, and each version has its associated version of Flask. You can use different Flask versions to the ones we supply by default, but it's a little more tricky (you need to use a thing called a virtualenv), so for this tutorial we'll create a site using Python 3.6, with the default Flask version. Click the option, and you'll be taken to the next page:

This page is asking you where you want to put your code. Code on PythonAnywhere is stored in your home directory, /home/yourusername, and in its subdirectories. Flask is a particularly lightweight framework, and you can write a simple Flask app in a single file. PythonAnywhere is asking you where it should create a directory and put a single file with a really really simple website. The default should be fine; it will create a subdirectory of your home directory called mysite and then will put the Flask code into a file called flask_app.py inside that directory.

(It will overwrite any other file with the same name, so if you're not using a new PythonAnywhere account, make sure that the file that it's got in the "Path" input box isn't one of your existing files.)

Once you're sure you're OK with the filename, click "Next". There will be a brief pause while PythonAnywhere sets up the website, and then you'll be taken to the configuration page for the site:

You can see that the host name for the site is on the left-hand side, along with the "Add a new web app" button. If you had multiple websites in your PythonAnywhere account, they would appear there too. But the one that's currently selected is the one you just created, and if you scroll down a bit you can see all of its settings. We'll ignore most of these for the moment, but one that is worth noting is the "Best before date" section.

If you have a paid account, you won't see that -- it only applies to free accounts. But if you have a free account, you'll see something saying that your site will be disabled on a date in three months' time. Don't worry! You can keep a free site up and running on PythonAnywhere for as long as you want, without having to pay us a penny. But we do ask you to log in every now and then and click the "Run until 3 months from today" button, just so that we know you're still interested in keeping it running.

Before we do any coding, let's check out the site that PythonAnywhere has generated for us by default. Right-click the host name, just after the words "Configuration for", and select the "Open in new tab" option; this will (of course) open your site in a new tab, which is useful when you're developing -- you can keep the site open in one tab and the code and other stuff in another, so it's easier to check out the effects of the changes you make.

Here's what it should look like.

OK, it's pretty simple, but it's a start. Let's take a look at the code! Go back to the tab showing the website configuration (keeping the one showing your site open), and click on the "Go to directory" link next to the "Source code" bit in the "Code" section:

You'll be taken to a different page, showing the contents of the subdirectory of your home directory where your website's code lives:

Click on the flask_app.py file, and you'll see the (really really simple) code that defines your Flask app. It looks like this:

It's worth working through this line-by-line:

from flask import Flask

As you'd expect, this loads the Flask framework so that you can use it.

app = Flask(__name__)

This creates a Flask application to run your code.

@app.route('/')

This decorator specifies that the following function defines what happens when someone goes to the location "/" on your site -- e.g. if they go to http://yourusername.pythonanywhere.com/. If you wanted to define what happens when they go to http://yourusername.pythonanywhere.com/foo then you'd use @app.route('/foo') instead.

def hello_world():
    return 'Hello from Flask!'

This simple function just says that when someone goes to the location, they get back the (unformatted) text "Hello from Flask!".
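As an aside, a second page could be added with another route and function -- a hypothetical sketch, not part of the generated file:

@app.route('/foo')
def foo_page():
    # Served when someone visits http://yourusername.pythonanywhere.com/foo
    return 'This is the foo page'

The generated hello_world function stays as it is for now.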

Try changing hello_world's return value -- for example, to "This is my new shiny Flask app". Once you've made the change, click the "Save" button at the top to save the file to PythonAnywhere:

...then the reload button (to the far right, looking like two curved arrows making a circle), which stops your website and then starts it again with the fresh code.

A "spinner" will appear next to the button to tell you that PythonAnywhere is working. Once it has disappeared, go to the tab showing the website again, hit the page refresh button, and you'll see that it has changed as you'd expect.

Step 3: make the processing code available to the web app

Now, we want our Flask app to be able to run our code. We've already extracted it into a function of its own. It's generally a good idea to keep the web app code -- the basic stuff to display pages -- separate from the more complicated processing code (after all, if we were doing the stock analysis example rather than this simple add-two-numbers script, the processing could be thousands of lines long).

So, we'll create a new file for our processing code. Go back to the browser tab that's showing your editor page; up at the top, you'll see "breadcrumb" links showing you where the file is stored. They'll be a series of directory names separated by "/" characters, each one apart from the last being a link. The last one, just before the name of the file containing your Flask code, will probably be mysite. Right-click on that, and open it in a new browser tab -- the new tab will show the directory listing you had before:

In the input near the top right, where it says "Enter new file name, eg. hello.py", enter the name of the file that will contain the processing code. Let's (uninventively) call it processing.py. Click the "New file" button, and you'll have another editor window open, showing an empty file. Copy/paste your processing function into there; that means that the file should simply contain this code:

def do_calculation(number1, number2):
    return number1 + number2

Save that file, then go back to the tab you kept open that contains the Flask code. At the top, add a new line just after the line that imports Flask, to import your processing code:

from processing import do_calculation

While we're at it, let's also add a line to make debugging easier if you have a typo or other error in the code; just after the line that says

app = Flask(__name__)

...add this:

app.config["DEBUG"] = True

Save the file; you'll see that you get a warning icon next to the new import line. If you move your mouse pointer over the icon, you'll see the details:

It says that the function was imported but is not being used, which is completely true! That moves us on to the next step.

Step 4: Accepting input

What we want our site to do is display a page that allows the user to enter two numbers. To do that, we'll change the existing function that is run to display the page. Right now we have this:

@app.route('/')
def hello_world():
    return 'This is my new shiny Flask app'

We want to display more than text, we want to display some HTML. Now, the best way to do HTML in Flask apps is to use templates (which allow you to keep the Python code that Flask needs in separate files from the HTML), but we have other tutorials that go into the details of that. In this case we'll just put the HTML right there inside our Flask code -- and while we're at it, we'll rename the function:

@app.route('/')
def adder_page():
    return '''
        <html>
            <body>
                <p>Enter your numbers:
                <form>
                    <p><input name="number1" /></p>
                    <p><input name="number2" /></p>
                    <p><input type="submit" value="Do calculation" /></p>
                </form>
            </body>
        </html>
    '''

We won't go into the details of how HTML works here, there are lots of excellent tutorials online and one that suits the way you learn is just a Google search away. For now, all we need to know is that where we were previously returning a single-line string, we're now returning a multi-line one (that's what the three quotes in a line mean, in case you're not familiar with them -- one string split over multiple lines). The multi-line string contains HTML code, which just displays a page that asks the user to enter two numbers, and a button that says "Do calculation". Click on the editor's "reload website" button:

...and then check out your website again in the tab that you (hopefully) kept open, and you'll see something like this:

However, as we haven't done anything to wire up the input to the processing, clicking the "Do calculation" button won't do anything but reload the page.

Step 5: validating input

We could at this stage go straight to adding on the code to do the calculations, and I was originally planning to do that here. But after thinking about it, I realised that doing that would basically be teaching you to shoot yourself in the foot... When you put a website up on the Internet, you have to allow for the fact that the people using it will make mistakes. If you created a site that allowed people to enter numbers and add them, sooner or later someone will type in "wombat" for one of the numbers, or something like that, and it would be embarrassing if your site responded with an internal server error.

So let's add on some basic validation -- that is, some code that makes sure that people aren't providing us with random marsupials instead of numbers.

A good website will, when you enter an invalid input, display the page again with an error message in it. A bad website will display a page saying "Invalid input, please click the back button and try again". Let's write a good website.

The first step is to change our HTML so that the person viewing the page can click the "Do calculation" button and get a response. Just change the line that says

<form>

So that it says this:

<form method="post" action=".">

What that means is that previously we had a form, but now we have a form that has an "action" telling it that when the button that has the type "submit" is clicked, it should request the same page as it is already on, but this time it should use the "post" method.

(HTTP methods are extra bits of information that are tacked on to requests that are made by a browser to a server. The "get" method, as you might expect, means "I just want to get a page". The "post" method means "I want to provide the server with some information to store or process". There are vast reams of details that I'm skipping over here, but that's the most important stuff for now.)
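For illustration, here's roughly what the first line of each kind of request looks like on the wire (simplified, with most headers omitted; the form body is just an example):

GET / HTTP/1.1
Host: yourusername.pythonanywhere.com

POST / HTTP/1.1
Host: yourusername.pythonanywhere.com
Content-Type: application/x-www-form-urlencoded

number1=23&number2=19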

So now we have a way for data to be sent back to the server. Reload the site using the button in the editor, and refresh the page in the tab where you're viewing your site. Try entering some numbers, and click the "Do calculation" button, and you'll get... an incomprehensible error message:

Well, perhaps not entirely incomprehensible. It says "method not allowed". Previously we were using the "get" method to get our page, but we just told the form that it should use the "post" method when the data was submitted. So Flask is telling us that it's not going to allow that page to be requested with the "post" method.

By default, Flask view functions only accept requests using the "get" method. It's easy to change that. Back in the code file, where we have this line:

@app.route('/')

...replace it with this:

@app.route("/", methods=["GET", "POST"])

Save the file, hit the reload button in the editor, then go to the tab showing your page; click back to get away from the error page if it's still showing, then enter some numbers and click the "Do calculation" button again.

You'll be taken back to the page with no error. Success! Kind of.

Now let's add the validation code. The numbers that were entered will be made available to us in our Flask code via the form attribute of a global variable called request. So we can add validation logic by using that. The first step is to make the request variable available by importing it; change the line that says

from flask import Flask

to say

from flask import Flask, request

Now, add this code to the view function, before the return statement:

    errors = ""
    if request.method == "POST":
        number1 = None
        number2 = None
        try:
            number1 = float(request.form["number1"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number1"])
        try:
            number2 = float(request.form["number2"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number2"])

Basically, we're saying that if the method is "post", we do the validation.

Finally, add some code to put those errors into the page's HTML; replace the bit that returns the multi-line string with this:

    return '''
        <html>
            <body>
                {errors}
                <p>Enter your numbers:
                <form method="post" action=".">
                    <p><input name="number1" /></p>
                    <p><input name="number2" /></p>
                    <p><input type="submit" value="Do calculation" /></p>
                </form>
            </body>
        </html>
    '''.format(errors=errors)

This is exactly the same page as before; we're just interpolating the string that contains any errors into it, just above the "Enter your numbers" paragraph.

Save the file; you'll see more warnings for the lines where we define variables called number1 and number2, because we're not using those variables. We know we're going to fix that, so they can be ignored for now.

Reload the site, and head over to the page where we're viewing it, and try to add a koala to a wallaby -- you'll get an appropriate error:

Try adding 23 to 19, however, and you won't get 42 -- you'll just get the same input form again. So now, the final step that brings it all together.

Step 6: doing the calculation!

We're all set up to do the calculation. What we want to do is:

  • If the request used a "get" method, just display the input form
  • If the request used a "post" method, but one or both of the numbers are not valid, then display the input form with error messages.
  • If the request used a "post" method, and both numbers are valid, then display the result.

We can do that by adding something inside the if request.method == "POST": block, just after we've checked that number2 is valid:

        if number1 is not None and number2 is not None:
            result = do_calculation(number1, number2)
            return '''
                <html>
                    <body>
                        <p>The result is {result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

Adding that code should clear out all of the warnings in the editor page, and if you reload your site and then try using it again, it should all work fine!

Pause for breath...

So if all has gone well, you've now converted a simple script that could add two numbers into a simple website that lets other people add numbers. If you're getting error messages, it's well worth trying to debug them yourself to find out where any typos came in. An excellent resource is the website's error log; there's a link on the "Web" page:

...and the most recent error will be at the bottom:

That error message is telling me that I mistyped "flask" as "falsk", and the traceback tells me exactly which line the typo is on.

However, if you get completely stuck, here's the code you should currently have:

from flask import Flask, request

from processing import do_calculation

app = Flask(__name__)
app.config["DEBUG"] = True

@app.route("/", methods=["GET", "POST"])
def adder_page():
    errors = ""
    if request.method == "POST":
        number1 = None
        number2 = None
        try:
            number1 = float(request.form["number1"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number1"])
        try:
            number2 = float(request.form["number2"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number2"])
        if number1 is not None and number2 is not None:
            result = do_calculation(number1, number2)
            return '''
                <html>
                    <body>
                        <p>The result is {result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

    return '''
        <html>
            <body>
                {errors}
                <p>Enter your numbers:
                <form method="post" action=".">
                    <p><input name="number1" /></p>
                    <p><input name="number2" /></p>
                    <p><input type="submit" value="Do calculation" /></p>
                </form>
            </body>
        </html>
    '''.format(errors=errors)

The next step -- multi-phase scripts

So now that we've managed to turn a script that had the simple three-phase input-process-output structure into a website, how about handling the more complicated case where you have more phases? A common case is where you have an indefinite number of inputs, and the output depends on all of them. For example, here's a simple script that will allow you to enter a list of numbers, one after another, and then will display the statistical mode (the most common number) in the list, with an appropriate error message if there is no most common number (for example in the list [1, 2, 3, 4]).

import statistics

def calculate_mode(number_list):
    try:
        return "The mode of the numbers is {}".format(statistics.mode(number_list))
    except statistics.StatisticsError as exc:
        return "Error calculating mode: {}".format(exc)


inputs = []
while True:
    if len(inputs) != 0:
        print("Numbers so far:")
        for input_value in inputs:
            print(input_value)
    value = input("Enter a number, or just hit return to calculate: ")
    if value == "":
        break
    try:
        inputs.append(float(value))
    except:
        print("{} is not a number")

print(calculate_mode(inputs))

How can we turn that into a website? We could display, say, 100 input fields and let the user leave the ones they don't want blank, but (a) that would look hideous, and (b) it would leave people who wanted to get the mode of 150 numbers stuck.

(Let's put aside for the moment the fact that entering lots of numbers into a website would be deathly dull -- there's a solution coming for that :-)

What we need is a page that can accumulate numbers; the user enters the first, then clicks a button to send it to the server, which puts it in a list somewhere. Then they enter the next, and the server adds that one to the list. Then the next, and so on, until they're finished, at which point they click a button to get the result.

Here's a naive implementation. By "naive", I mean that it sort of works in some cases, but doesn't in general; it's the kind of thing that one might write, only to discover that when other people start using it, it breaks in really weird and confusing ways. It's worth going through, though, because the way in which it is wrong is instructive.

Firstly, in our processing.py file we have the processing code, just as before:

import statistics

def calculate_mode(number_list):
    try:
        return "The mode of the numbers is {}".format(statistics.mode(number_list))
    except statistics.StatisticsError as exc:
        return "Error calculating mode: {}".format(exc)

That should be pretty clear. Now, in flask_app.py we have the following code:

(A step-by-step explanation is coming later, but it's worth reading through now to see if you can work out how at least some of it works.)

from flask import Flask, request

from processing import calculate_mode

app = Flask(__name__)
app.config["DEBUG"] = True

inputs = []

@app.route("/", methods=["GET", "POST"])
def mode_page():
    errors = ""
    if request.method == "POST":
        try:
            inputs.append(float(request.form["number"]))
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number"])

        if request.form["action"] == "Calculate number":
            result = calculate_mode(inputs)
            inputs.clear()
            return '''
                <html>
                    <body>
                        <p>{result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

    if len(inputs) == 0:
        numbers_so_far = ""
    else:
        numbers_so_far = "<p>Numbers so far:</p>"
        for number in inputs:
            numbers_so_far += "<p>{}</p>".format(number)

    return '''
        <html>
            <body>
                {numbers_so_far}
                {errors}
                <p>Enter your number:
                <form method="post" action=".">
                    <p><input name="number" /></p>
                    <p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>
                </form>
            </body>
        </html>
    '''.format(numbers_so_far=numbers_so_far, errors=errors)

All clear? Maybe... It does work, though, sort of. Let's try it -- copy the code for the two files into your editor tabs, reload the site, and give it a go. If you have a free account, it will work!

Enter "1", and you get this:

Enter some more numbers:

...and calculate the result:

But if you have a paid account, you'll see some weird behaviour. Exactly what you'll get will depend on various random factors, but it will be something like this:

Enter 1, and you might get this:

Enter 2, and you might get this:

Huh? Where did the "1" go? Well, let's enter "3":

Well, that seems to have worked. We'll add "4":

And now we'll add "1" again:

So now our original 1 has come back, but all of the other numbers have disappeared.

In general, it will seem to sometimes forget numbers, and then remember them again later, as if it has multiple lists of numbers -- which is exactly what it does.

Before we go into why it's actually wrong (and why, counterintuitively, it works worse on a paid account than on a free one), here's the promised step-by-step runthrough, with comments after each block of code. Starting off:

from flask import Flask, request

from processing import calculate_mode

app = Flask(__name__)
app.config["DEBUG"] = True

All that is just copied from the previous website.

inputs = []

We're initialising a list for our inputs, and putting it in the global scope, so that it will persist over time. This is because each view of our page will involve a call to the view function:

@app.route("/", methods=["GET", "POST"])
def mode_page():

...which is exactly the same kind of setup for a view function as we had before.

    errors = ""
    if request.method == "POST":
        try:
            inputs.append(float(request.form["number"]))
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number"])

We validate the number in much the same way as in our last website, and if it is valid we add it to the global list.

        if request.form["action"] == "Calculate number":

This bit is a little more tricky. On our page, we have two buttons -- one to add a number, and one to say "do the calculation" -- here's the bit of the HTML code from further down that specifies them:

<p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>

This means that when we get a post request from a browser, the "action" value in the form object will contain the text of the submit button that was actually clicked.

So, if the "Calculate number" button was the one that the user clicked...

            result = calculate_mode(inputs)
            inputs.clear()
            return '''
                <html>
                    <body>
                        <p>{result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

...we do the calculation and return the result (clearing the list of the inputs at the same time so that the user can try again with another list).

If, however, we get past that if request.form["action"] == "Calculate number" statement, it means either that:

  • The request was using the post method, and we've just added a number to the list or set the error string to reflect the fact that the user entered an invalid number, or
  • The request was using the get method

So:

    if len(inputs) == 0:
        numbers_so_far = ""
    else:
        numbers_so_far = "<p>Numbers so far:</p>"
        for number in inputs:
            numbers_so_far += "<p>{}</p>".format(number)

...we generate a list of the numbers so far, if there are any, and then:

    return '''
        <html>
            <body>
                {numbers_so_far}
                {errors}
                <p>Enter your number:
                <form method="post" action=".">
                    <p><input name="number" /></p>
                    <p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>
                </form>
            </body>
        </html>
    '''.format(numbers_so_far=numbers_so_far, errors=errors)

We return our page asking for a number, with the list of numbers so far and errors if either is applicable.

Phew!

So why is it incorrect? If you have a paid account, you've already seen evidence that it doesn't work very well. If you have a free account, here's a thought experiment -- what if two people were viewing the site at the same time? In fact, you can see exactly what would happen if you use the "incognito" or "private tab" feature on your browser -- or, if you have multiple browsers installed, if you use two different browsers (say by visiting the site in Chrome and in Firefox at the same time).

What you'll see is that both users are sharing a list of numbers. The Chrome user starts off, and adds a number to the list:

Now the Firefox user adds a number -- but they see not only the number they added, but also the Chrome user's number:

It's pretty clear what's going on here. There's one server handling the requests from both users, so there's only one list of inputs -- so everyone shares the same list.

But what about the situation for websites running on paid accounts? If you'll remember, it looked like the opposite was going on there -- there were multiple lists, even within the same browser.

This is because paid accounts have multiple servers for the same website. This is a good thing: it means that if they get lots of requests coming in at the same time, then everything gets processed more quickly -- so they can have higher-traffic websites. But it also means that different requests, even successive requests from the same browser, can wind up going to different servers, and because each server has its own list, the browser will see one list for one request, but a different list on the next request.

What this all means is that global variables don't work for storing state in website code. On each server that's running to control your site, everyone will see the same global variables. And if you have multiple servers, then each one will have a different set of global variables.

What to do?

Sessions to the rescue!

What we need is a way to keep a set of "global" variables that are specific to each person viewing the site, and are shared between all servers. If two people, Alice and Bob, are using the site, then Alice will have her own list of inputs, which all servers can see, and Bob will have a different list of inputs, separate from Alice's but likewise shared between servers.

The web dev mechanism for this is called sessions, and is built into Flask. Let's make a tiny set of modifications to the Flask app to make it work properly. Firstly, we'll import support for sessions by changing our Flask import line from this:

from flask import Flask, request

...to this:

from flask import Flask, request, session

In order to use sessions, we'll also need to configure Flask with a "secret key" -- sessions use cryptography, which requires a random number. Add a line like this just after the line where we configure Flask's debug setting to be True:

app.config["SECRET_KEY"] = "lkmaslkdsldsamdlsdmasldsmkdd"

Use a different string to the one I put above; mashing the keyboard randomly is a good way to get a reasonably random string, though if you want to do things properly, find something truly random.
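One quick way to get a cryptographically strong random string is Python's secrets module (a suggestion, not something the app depends on):

import secrets

print(secrets.token_hex(16))  # prints 32 random hex characters, suitable for a SECRET_KEY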

Next, we'll get rid of the global inputs list by deleting this line:

inputs = []

Now we'll use an inputs list that's stored inside the session object (which looks like a dictionary) instead of using our global variable. Firstly, let's make sure that whenever we're in our view function, we have a list of inputs associated with the current session if there isn't one already. Right at the start of the view function, add this:

    if "inputs" not in session:
        session["inputs"] = []

Next, inside the bit of code where we're adding a number to the inputs list, replace this line:

        inputs.append(float(request.form["number"]))

...with this one that uses the list on the session:

        session["inputs"].append(float(request.form["number"]))

There's also a subtlety here; because we're changing a list inside a session (instead of adding a new thing to the session), we need to tell the session object that it has changed by putting this line immediately after the last one:

        session.modified = True
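(If you'd rather not have to remember session.modified, an alternative sketch is to reassign the key rather than mutating the list in place -- assignment to the session is detected automatically:

        session["inputs"] = session["inputs"] + [float(request.form["number"])]

This tutorial sticks with append plus session.modified.)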

Next, when we're calculating the mode, we need to look at our session again to get the list of inputs:

        result = calculate_mode(inputs)

...becomes

        result = calculate_mode(session["inputs"])

...and the line that clears the inputs so that the user can do another list likewise changes from

        inputs.clear()

to:

        session["inputs"].clear()
        session.modified = True

Finally, the code that generates the "numbers so far" list at the start of the page needs to change to use the session:

if len(inputs) == 0:
    numbers_so_far = ""
else:
    numbers_so_far = "<p>Numbers so far:</p>"
    for number in inputs:
        numbers_so_far += "<p>{}</p>".format(number)

...becomes:

if len(session["inputs"]) == 0:
    numbers_so_far = ""
else:
    numbers_so_far = "<p>Numbers so far:</p>"
    for number in session["inputs"]:
        numbers_so_far += "<p>{}</p>".format(number)

Once all of those code changes have been done, you should have this:

from flask import Flask, request, session

from processing import calculate_mode

app = Flask(__name__)
app.config["DEBUG"] = True
app.config["SECRET_KEY"] = "lkmaslkdsldsamdlsdmasldsmkdd"

@app.route("/", methods=["GET", "POST"])
def mode_page():
    if "inputs" not in session:
        session["inputs"] = []

    errors = ""
    if request.method == "POST":
        try:
            session["inputs"].append(float(request.form["number"]))
            session.modified = True
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number"])

        if request.form["action"] == "Calculate number":
            result = calculate_mode(session["inputs"])
            session["inputs"].clear()
            session.modified = True
            return '''
                <html>
                    <body>
                        <p>{result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

    if len(session["inputs"]) == 0:
        numbers_so_far = ""
    else:
        numbers_so_far = "<p>Numbers so far:</p>"
        for number in session["inputs"]:
            numbers_so_far += "<p>{}</p>".format(number)

    return '''
        <html>
            <body>
                {numbers_so_far}
                {errors}
                <p>Enter your number:
                <form method="post" action=".">
                    <p><input name="number" /></p>
                    <p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>
                </form>
            </body>
        </html>
    '''.format(numbers_so_far=numbers_so_far, errors=errors)

Hit the reload button, and give it a try! If you have a paid account, you'll find that now it all works properly -- and if you have a free account, you'll see that separate browsers now have separate lists of numbers :-)

So now we have a multi-user website that keeps state around between page visits.

Processing files

Now, entering all of those numbers one-by-one would be tedious if there were a lot of them. A lot of Python scripts don't request the user to enter data a line at a time; they take a file as their input, process it, and produce a file as the output. Here's a simple script that asks for an input filename and an output filename. It expects the input file to contain a number of lines, each with a comma-separated list of numbers on it. It writes to the output file the same number of lines, each one containing the sum of the numbers from the equivalent line in the input file.

def process_data(input_data):
    result = ""
    for line in input_data.split("\n"):
        if line != "":
            numbers = [float(n) for n in line.split(", ")]
            result += str(sum(numbers))
        result += "\n"
    return result

input_filename = input("Enter the input filename: ")
output_filename = input("Enter the output filename: ")

with open(input_filename, "r") as input_file:
    input_data = input_file.read()

with open(output_filename, "w") as output_file:
    output_file.write(process_data(input_data))

What we want is a Flask app that will allow the user to upload a file like the input file that that script requires, and will then provide the output file to download. This is actually pretty similar to the original app we did -- there's just three phases, input-process-output. So the Flask app looks very similar.

Firstly, we put our calculating routine into processing.py, as normal:

def process_data(input_data):
    result = ""
    for line in input_data.split("\n"):
        if line != "":
            numbers = [float(n) for n in line.split(", ")]
            result += str(sum(numbers))
        result += "\n"
    return result

...and now we write a Flask app that looks like this:

from flask import Flask, make_response, request

from processing import process_data

app = Flask(__name__)
app.config["DEBUG"] = True

@app.route("/", methods=["GET", "POST"])
def file_summer_page():
    if request.method == "POST":
        input_file = request.files["input_file"]
        input_data = input_file.stream.read().decode("utf-8")
        output_data = process_data(input_data)
        response = make_response(output_data)
        response.headers["Content-Disposition"] = "attachment; filename=result.csv"
        return response

    return '''
        <html>
            <body>
                <p>Select the file you want to sum up:
                <form method="post" action="." enctype="multipart/form-data">
                    <p><input type="file" name="input_file" /></p>
                    <p><input type="submit" value="Process the file" /></p>
                </form>
            </body>
        </html>
    '''

Again, we'll go through that bit-by-bit in a moment (though it's worth noting that although this feels like something that should be much harder than the first case, the Flask app is much shorter :-). But let's try it out first -- once you've saved the code on PythonAnywhere and reloaded the site, visit the page:

We specify a file with contents (mine just has "1, 2, 3" on the first line and "4, 5, 6" on the second):

...then we click the button. You'll have to watch for it, but a file download will almost immediately start. In Chrome, for example, this will appear at the bottom of the window:

Open the file in an appropriate application -- here's what it looks like in gedit:

We've got a website where we can upload a file, process it, and download the results :-)

Obviously the user interface could use a bit of work, but that's left as an exercise for the reader...

So, how does the code work? Here's the breakdown:

from flask import Flask, make_response, request

from processing import process_data

app = Flask(__name__)
app.config["DEBUG"] = True

This is our normal Flask setup code.

@app.route("/", methods=["GET", "POST"])
def file_summer_page():

As usual, we define a view.

    if request.method == "POST":

If the request uses the "post" method...

        input_file = request.files["input_file"]
        input_data = input_file.stream.read().decode("utf-8")

...we ask Flask to extract the uploaded file from the request object, and then we read it into memory. The file it will provide us with will be in binary format, so we convert it into a string, assuming that it's in the UTF-8 character set.

        output_data = process_data(input_data)

Now we process the data using our function. The next step is where it gets a little more complicated:

        response = make_response(output_data)
        response.headers["Content-Disposition"] = "attachment; filename=result.csv"

In the past, we just returned strings from our Flask view functions and let it sort out how that should be presented to the browser. But this time, we want to take a little more control over the kind of response that's going back. In particular, we don't want to dump all of the output into the browser window so that the user has to copy/paste the (potentially thousands of lines of) output into their spreadsheet or whatever. Instead, we want to tell the browser "the thing I'm sending you is a file called 'result.csv', so please download it appropriately". That's what these two lines do -- the first is just a way to tell Flask that we're going to need some detailed control over the response, and the second exercises that control. Next:

        return response

...we just return the response.

Now that we're out of that first if statement, we know that the request we're handling isn't one with a "post" method, so it must be a "get". So we display the form:

    return '''
        <html>
            <body>
                <p>Select the file you want to sum up:
                <form method="post" action="." enctype="multipart/form-data">
                    <p><input type="file" name="input_file" /></p>
                    <p><input type="submit" value="Process the file" /></p>
                </form>
            </body>
        </html>
    '''

In this case we just return a string of HTML like we did in the previous examples. There are only two new things in there:

<form method="post" action="." enctype="multipart/form-data">

The enctype="multipart/form-data" in there is just an extra flag that is needed to tell the browser how to format files when it uploads them as part of the "post" request that it's sending to the server, and:

<p><input type="file" name="input_file" /></p>

...is just how you specify an input where the user can select a file to upload.

So that's it!

And we're done

In this blog post we've presented three different Flask apps, each of which shows how a specific kind of normal Python script can be converted into a website that other people can access to reap the benefits of the code you've written.

Hopefully they're all reasonably clear, and you can see how you could apply the same techniques to your own scripts. If you have any comments or questions, please post them in the comments below -- and if you have any thoughts about other kinds of patterns that we could consider adding to an updated version of this post, or to a follow-up, do let us know.

Thanks for reading!

Talk Python to Me: #180 What's new in Python 3.7 and beyond

The Python core developers recently released Python 3.7 and are now busy planning what's coming in 3.8. That makes right now a great time to dig into what was included in Python 3.7 and what's on deck for the next great release of CPython. This week we have Anthony Shaw back on the podcast to tell us all about it.

Continuum Analytics Blog: Anaconda Distribution 5.3.0 Released


We’re excited to announce the release of Anaconda Distribution 5.3.0! Anaconda Distribution is the world’s most popular and easiest way to learn and perform data science and machine learning. Here’s a rundown of new features. In addition to our Python 2.7 Anaconda installers, as well as Python 3.6 Anaconda metapackages, Anaconda Distribution 5.3 is compiled …

Mike Driscoll: Python 101 – Episode #27: Profiling Python Code


Dataquest: Python Dictionary Tutorial


Python offers a variety of data structures to hold our information — the dictionary being one of the most useful. Python dictionaries are quick, easy to use, and flexible. As a beginning programmer, you can use this Python tutorial to become familiar with dictionaries and their common uses so that you can start incorporating them immediately into your own code.

When performing data analysis, you'll often have data that is in an unusable or hard-to-use form. Dictionaries can help here by making it easier to read and change your data.

For this tutorial, we will use the Craft Beers data sets from Kaggle. There is one data set describing beer characteristics, and another that stores geographical information on brewery companies. For the purposes of this article, our data will be stored in the beers and breweries variables, each as a list of lists. The tables below give a quick look at what the data look like.

This table contains the first row from the beers data set.

  | abv  | ibu | id   | name     | style               | brewery_id | ounces
0 | 0.05 |     | 1436 | Pub Beer | American Pale Lager | 408        | 12.0

This table contains the first row from the breweries data set.

  | name              | city        | state
0 | Northgate Brewing | Minneapolis | MN

Prerequisite knowledge

This article assumes basic knowledge of Python. To fully understand the article, you should be comfortable working with lists and for loops.

We'll cover:

  • Key terms and concepts to dictionaries
    • Dictionary rules
  • Basic dictionary operations
    • creation and deletion
    • access and insertion
    • membership checking
  • Looping techniques
  • Dictionary comprehensions
  • Dictionary advantages and disadvantages

Getting into our role

We will assume the role of a reviewer for a beer enthusiast magazine. We want to know ahead of time what each brewery will have before we arrive to review, so that we can gather useful background information. Our data sets hold information on beers and breweries, but the data themselves are not immediately accessible.

The data are currently in the form of a list of lists. To access individual data rows, you must use a numbered index. To get the first data row of breweries, you look at the 2nd item (the column names are first).

breweries[1]

breweries[1] is a list, so you can also index from it as well. Getting the third item in this list would look like:

breweries[1][2]

If you didn't know that breweries was data on breweries, you'd have a hard time understanding what the indexing is trying to do. Imagine writing this code and looking at it again 6 months in the future. You're more than likely to forget, so it's worth reformatting the data in a more readable way.

Key terms and concepts

Dictionaries are made up of key-value pairs. Looking up a key in a Python dictionary is akin to looking up a particular word in a physical dictionary. The value is the corresponding data that is associated with the key, comparable to the definition associated with the word in the physical dictionary. The key is what we look up, and it's the value that we're actually interested in.

[Figure: a dictionary key is like a word you look up; the value is like its definition]

We say that values are mapped to keys. In the example above, if we look up the word "programmer" in the English dictionary, we'll see: "a person who writes computer programs." The word "programmer" is the key mapped to the definition of the word.

Dictionary rules for keys and values

Dictionaries are immensely flexible because they allow anything to be stored as a value, from primitive types like strings and floats to more complicated types like objects and even other dictionaries (more on this later).

By contrast, there are limitations to what can be used as a key.

A key is required to be an immutable object in Python, meaning that it cannot be altered. This rule allows strings, integers, and tuples as keys, but excludes lists and dictionaries since they are mutable, or able to be altered. The rationale is simple: if any changes happen to a key without you knowing, you won't be able to access the value anymore, rendering the dictionary useless. Thus, only immutable objects are allowed to be keys. A key must also be unique within a dictionary.
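For example, here's a quick sketch of those rules in action (the key-value pairs are made up for illustration):

# Strings, integers, and tuples are all fine as keys
valid = {"name": "Pub Beer", 408: "a brewery id", ("abv", "ibu"): "numeric columns"}

# Lists are mutable, so using one as a key raises an error
invalid = {["abv", "ibu"]: "numeric columns"}
>>> TypeError: unhashable type: 'list'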

The key-value structuring of a dictionary is what makes it so powerful, and throughout this post we'll delve into its basic operations, use cases, and their advantages and disadvantages.

Basic dictionary operations

Creation and deletion

Let's start with how to create a dictionary. First, we will learn how to make an empty dictionary since you'll often find that you want to start with an empty one and populate it with data as needed.

To create an empty dictionary, we can either use the dict() function with no inputs, or assign a pair of curly brackets with nothing in between to a variable. We can confirm that both methods will produce the same result.

empty = {}
also_empty = dict()

empty == also_empty
>>> True

Now, an empty dictionary isn't of much use to anybody, so we must add our own key-value pairs. We will cover this later in the article, but know that we are able to start with empty dictionaries and populate them after the fact. This will allow us to add in more information when we need it.

empty["First key"] = "First value"

empty["First key"]
>>> "First value"

Alternatively, you can also create a dictionary and pre-populate it with key-value pairs. There are two ways to do this.

The first is to use curly brackets containing the key-value pairs. Each key and value are separated by a :, while individual pairs are separated by a comma. While you can fit everything on one line, it's better to split up your key-value pairs among different lines to improve readability.

data = {
    "beer_data": beers,
    "brewery_data": breweries
}

The above code creates a single dictionary, data, where our keys are descriptive strings and the values are our data sets. This single dictionary allows us to access both data sets by name.

The second way is through the dict() method. You can supply the keys and values either as keyword arguments or as a list of tuples. We will recreate the data dictionary from above using the dict() method and providing the key-value pairs appropriately.

# Using keyword arguments
data2 = dict(beer_data=beers, brewery_data=breweries)

# Using a list of tuples
tuple_list = [("brewery_data", breweries), ("beer_data", beers)]
data3 = dict(tuple_list)

We can confirm that each of the data dictionaries are equivalent in Python's eyes.

data == data2 == data3
>>> True

With each option, the key and value pairs must be formatted in a particular way, so it's easy to get mixed up. The diagram below helps to sort out where keys and values are in each.

[Figure: where the keys and values go in each dictionary creation method]

We now have three dictionaries storing the exact same information, so it's best if we just keep one. Dictionaries themselves don't have a method for deletion, but Python provides the del statement for this purpose.

del data2
del data3

After creating your dictionaries, you'll almost certainly need to add and remove items from them. Python provides a simple, readable syntax for these operations.

Data access and insertion

The current state of our beers and breweries dictionary is still dire — each of the data sets originally was a list of lists, which makes it difficult to access specific data rows.

We can achieve better structure by reorganizing each of the data sets into its own dictionary and creating some helpful key-value pairs to describe the data within the dictionary. The raw data itself is mixed. The first row in each list of lists is a list of strings containing the column names, but the rest contains the actual data. It'll be better to separate the columns from the data so we can be more explicit. Thus, for each data set, we'll create a dictionary with three keys mapped to the following values:

  1. The raw data itself
  2. The list containing the column names
  3. The list of lists containing the rest of the data
beer_details = {
    "raw_data": beers,
    "columns": beers[0],
    "data": beers[1:]
}

brewery_details = {
    "raw_data": breweries,
    "columns": breweries[0],
    "data": breweries[1:]
}

Now, we can reference the columns key explicitly to list the column names of the data instead of indexing the first item in raw_data. Similarly, we are now able to explicitly ask for just the data from the data key.

So far, we've learned how to create empty and prepopulated dictionaries, but we do not know how to read information from them once they've been made. To access items within a dictionary, we need bracket notation. We reference the dictionary itself followed by a pair of brackets with the key that we want to look up. For example below, we read the column names from brewery_details.

brewery_details["columns"]
>>> ['', 'name', 'city', 'state']

This action should feel similar to us looking up a word in an English dictionary. We "looked up" a key and got the information we wanted back in the mapped value.

In addition to looking up key-value pairs, sometimes we'll actually want to change the value associated with a key in our dictionaries. This operation also uses bracket notation. To change a dictionary value, we first access the key and then reassign it using an = expression.

We saw in the code above that one of the brewery columns is an empty string, but this first column actually contains a unique ID for each brewery! We will reassign the first column name to a more informative name.

# reassigning the first column of the breweries data set
brewery_details["columns"][0] = 'brewery_id'

# confirming that our reassignment worked
brewery_details["columns"][0]
>>> "brewery_id"

If the series of brackets looks confusing, don't fret. We have taken advantage of nesting. We know that the columns key in brewery_details is mapped to a list, so we can treat brewery_details["columns"] as a list (i.e. we can use list indexing). Nesting can get confusing if we lose track of what each level represents, but we visualize this nesting below to clarify.

[Figure: the nesting levels behind brewery_details["columns"][0]]

It's also common practice to nest dictionaries within dictionaries because it creates self-documenting code. That is to say, it is evident what the code is doing just by reading it, without any comments to help. Self-documenting code is immensely useful because it is easier and faster to understand at a moment's read-through. We want to preserve this self-documenting quality, so we will nest the beer_details and brewery_details dictionaries into a centralized dictionary. The end result is nested dictionaries that are easier to read from than the original raw data itself.

# datasets is now a dictionary whose values are other dictionaries
datasets = {
    "beer": beer_details,
    "breweries": brewery_details
}

# This structure allows us to make self-documenting inquiries to both data sets
datasets["beer"]["columns"]
>>> ['', 'abv', 'ibu', 'id', 'name', 'style', 'brewery_id', 'ounces']

# Compare the above to how our older data dictionary would have been written
data["beer"][0]
>>> ['', 'abv', 'ibu', 'id', 'name', 'style', 'brewery_id', 'ounces']

The information embedded in the code is clear if we nest dictionaries within dictionaries. We've created a structure that easily describes the intent of the programmer. The following illustration breaks down the dictionary nesting.

[Figure: the structure of the nested datasets dictionary]

From here on out, we'll use datasets to manage our data sets and perform more data reorganization.

The beer and brewery dictionaries we made are a good start, but we can do more. We'd like to create a new key-value pair to contain a string description of what each data set contains in case we forget.

We can create dictionaries and read and change values of present key-value pairs, but we don't know how to insert a new key-value pair. Thankfully, inserting pairs is similar to reassigning dictionary values. If we assign a value to a key that doesn't exist in a dictionary, Python will take the new key and value and create the pair within the dictionary.

# The description key currently does not exist in either inner dictionary
datasets["beer"]["description"] = "Contains data on beers and their qualities"
datasets["breweries"]["description"] = "Contains data on breweries and their locations"

While Python makes it easy to insert new pairs into the dictionary, it stops users if they try to access keys that don't exist. If you try to access a key that doesn't exist, Python will throw an error and stop your code from running.

# The key best_beer doesn't currently exist, so we cannot access it
datasets["beer"]["best_beer"]
>>> KeyError: 'best_beer'

As your dictionaries get more complex, it's easier to lose track of which keys are present. If you leave your code for a week and forget what's in your dictionary, you'll constantly run into KeyErrors. Thankfully, Python provides us with an easy way to check the present keys in a dictionary. This process is called membership checking.

Membership checking

If we want to check if a key exists within a dictionary, we can use the in operator. You can also check on whether a key doesn't exist by using not in. The resulting code reads almost like natural English, which also means it is easier to understand at first glance.

"beer" in datasets
>>> True

"wine" in datasets
>>> False

"wine" not in datasets
>>> True

Using in for membership checking has great utility in conjunction with if-else statements. This combination allows you to set up conditional logic that will prevent you from getting KeyErrors and enable you to make more sophisticated code. We won't delve too deeply into this concept, but the short sketch below shows the idea, and there are resources at the end for the curious.
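A brief sketch, using the datasets dictionary from above (dict.get() is one of the extra methods we don't cover in depth here):

# Guard a lookup with a membership check to avoid a KeyError
if "wine" in datasets:
    print(datasets["wine"]["description"])
else:
    print("No wine data available.")
>>> No wine data available.

# dict.get() offers a similar safety net, returning a default instead of raising
datasets.get("wine", "No wine data available.")
>>> 'No wine data available.'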

Section summary

At this point, we know how to create, read, update, and delete data from our dictionaries. We transformed our two raw data sets into dictionaries with greater readability and ease of use. With these basic dictionary operations, we can start performing more complex operations. For example, it is extremely common to want to loop over the key-value pairs and perform some operation on each pair.

Looping techniques

When we created the description key for each of the data sets, we made two individual statements to create each key-value pair. Since we performed the same operation, it would be more efficient to use loops.

Python provides three main methods to use dictionaries in loops: keys(), values(), and items(). Using keys() and values() allows us to loop over those parts of the dictionary.

for key in datasets.keys():
    print(key)
>>> beer
>>> breweries

for val in datasets.values():
    print(type(val))
>>> <class 'dict'>
>>> <class 'dict'>

The items() method combines both into one. When used in a loop, items() returns the key-value pairs as tuples. The first element of this tuple is the key, while the second is the value. We can use destructuring to get these elements into properly informative variable names. The first variable key will take the key in the tuple, while val will get the mapped value.

for key, val in datasets.items():
    print(f'The {key} data set has {len(val["data"])} rows.')
>>> The beer data set has 2410 rows.
>>> The breweries data set has 558 rows.

The above loop tells us that the beer data set is much bigger than the brewery data set. We would expect breweries to sell multiple types of beers, so there should be more beers than breweries overall. Our loop confirms this thought.

Currently, each of the data rows is a list, so referencing these elements by number is undesirable. Instead, we'll turn each of the data rows into its own dictionary, with the column name mapped to its actual value. This would make analyzing the data easier in the long run.

We should do this operation on both data sets, so we'll leverage our looping techniques.

# Perform this operation for both beers and breweries data sets
for k, v in datasets.items():
    
    # Initialize a key-value pair to hold our reformatted data
    v["data_as_dicts"] = []
    
    # For every data row, create a new dictionary based on column names
    for row in v["data"]:
        data_row_dict = dict(zip(v["columns"], row))
        v["data_as_dicts"].append(data_row_dict)

There's a lot going on above, so we'll slowly break it down.

  1. We loop through datasets to ensure we transform both of the beer and breweries data.
  2. Then, we create a new key called data_as_dicts mapped to an empty list which will hold our new dictionaries.
  3. Then we start iterating over all the data, contained in the data key. zip() is a function that takes two or more lists and makes tuples based off these lists (see the short sketch after this list).
  4. We take advantage of the zip() output and use dict() to create new data in our preferred form: column names mapped to their actual value.
  5. Finally, we append it to data_as_dicts list. The end result is better formatted data that is easier to read and come back to repeatedly.
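To make steps 3 and 4 concrete, here is what zip() and dict() produce for one illustrative breweries row:

columns = ["brewery_id", "name", "city", "state"]
row = ["0", "Northgate Brewing", "Minneapolis", "MN"]

list(zip(columns, row))
>>> [('brewery_id', '0'), ('name', 'Northgate Brewing'), ('city', 'Minneapolis'), ('state', 'MN')]

dict(zip(columns, row))
>>> {'brewery_id': '0', 'name': 'Northgate Brewing', 'city': 'Minneapolis', 'state': 'MN'}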

We can look at the end result below.

# The first data row in the beers data set
datasets["beer"]["data_as_dicts"][0]
>>> {'': '0',
    'abv': '0.05',
    'brewery_id': '408',
    'ibu': '',
    'id': '1436',
    'name': 'Pub Beer',
    'ounces': '12.0',
    'style': 'American Pale Lager'}
 
# The first data row in its original form
datasets["beer"]["raw_data"][0]
>>> ['0', '0.05', '408', '', '1436', 'Pub Beer', '12.0', 'American Pale Lager']

Section summary

In this section, we learned how to use dictionaries with the for loop. Using loops, we reformatted each data row into dictionaries for enhanced readability. Our future selves will thank us later when we look back at the code we've written. We're now set up to perform our final operation: matching all the beers to their respective breweries.

Each of the beers has a brewery that it originates from, given by the brewery_id key in both data sets. We will create a whole new data set that matches all the beers to their brewery. We could use loops to accomplish this, but we have access to an advanced dictionary operation that could turn this data transformation from a multi-line loop to a single line of code.

Dictionary comprehensions

Each beer in the beers data set was associated with a brewery_id, which is linked to a single brewery in breweries. Using this ID, we can pair up all of the beers with their brewery. It's generally a better idea to transform the raw data and place it in a new variable, rather than alter the raw data itself. Thus, we'll create another dictionary within datasets to hold our pairing. In this new dictionary, the brewery name itself is the key, the mapped value will be a list containing the names of all of the beers the brewery offers, and we will match them based on the brewery_id data element.

We can perform this matching just fine with the looping techniques we learned previously, but there still remains one last dictionary aspect to teach. Instead of a loop, we can perform the matching succinctly using dictionary comprehension. A "comprehension" in computer science terms means to perform some task or function on all items of a collection (like a list). A dictionary comprehension is similar to a list comprehension in terms of syntax, but instead creates dictionaries from a base list. If you need a refresher on list comprehensions, you can check out this tutorial here.

To give a quick example, we'll use a dictionary comprehension to create a dictionary from a list of numbers.

nums = [1, 2, 3, 4, 5]

dict_comprehension = {
    str(n) : "The corresponding key is" + str(n) for n in nums
}

for val in dict_comprehension.values():
    print(val)
>>> The corresponding key is 1
The corresponding key is 2
The corresponding key is 3
The corresponding key is 4
The corresponding key is 5

We will dissect the dictionary comprehension code below:

[Figure: the parts of the dictionary comprehension, dissected]

To create a dictionary comprehension, we wrap three elements in opening and closing curly brackets:

  1. A base list
  2. What the key should be for each item from the base list
  3. What the value should be for each item from the base list

nums forms the base list that the key-value pairs of dict_comprehension are based off of. The keys are stringified versions of each number (to differentiate it from list indexing), while the values are a string describing what the key is. This pet example is useless by itself, but serves to illustrate the somewhat complicated syntax of a dictionary comprehension.

Now that we know how a dictionary comprehension is composed, we will see its real utility when we apply it to our beer and breweries data set.

We only need two aspects of the breweries data set to perform the matching:

  1. The brewery name
  2. The brewery ID

To start off, we'll create a list of tuples containing the name and ID for each brewery. Thanks to the reformatted data in the data_as_dicts key, this code is easy to write in a list comprehension.

# This list comprehension captures all of the brewery IDs and names
brewery_id_name_pairs = [
    (row["brewery_id"], row["name"]) for row in datasets["breweries"]["data_as_dicts"]
]

brewery_id_name_pairs is now a list of tuples and will form the base list of the dictionary comprehension. With this base list, we will use the brewery name as our key and a list comprehension as the value.

brewery_to_beers = {
    pair[1]: [b["name"] for b in datasets["beer"]["data_as_dicts"] if b["brewery_id"] == pair[0]]
    for pair in brewery_id_name_pairs
}

Before we discuss how this monster works, it's worth taking some time to see what the actual result is.

# Confirming that a dictionary comprehension creates a dictionary
type(brewery_to_beers)
>>> <class 'dict'>

# Let's see what the Angry Orchard Cider Company (a personal favorite) makes
brewery_to_beers["Angry Orchard Cider Company"]
>>> ["Angry Orchard Apple Ginger", "Angry Orchard Crisp Apple", "Angry Orchard Crisp Apple"]

As we did with the simple example, we will highlight the crucial parts of this unwieldy (albeit interesting) dictionary comprehension.

[Figure: the key and value parts of the brewery_to_beers comprehension]

If we break apart the code and highlight the specific parts, the structure behind the code becomes more clear. The key is taken from the appropriate part of the brewery_id_name_pair. It is the mapped value that takes up most of the logic here. The value is a list comprehension with conditional logic. In plain English, the list comprehension will store any beers from the beer data when the beer's associated brewery_id matches the current brewery in the iteration.

Another illustration below lays out the code for the list comprehension by its purpose.

[Figure: the inner list comprehension, broken down by purpose]

Since we based the dictionary comprehension off of a list of all the breweries, the end result is what we wanted: a new dictionary that maps each brewery name to all the beers it sells! Now, we can just consult brewery_to_beers when we arrive at a brewery and find out instantly what they have!

This section had some complicated code, but it's wholly within your grasp. If you're still having trouble, keep reviewing the syntax and try to make your own dictionary comprehensions. Before long, you'll have them in your coding arsenal.

We've covered a lot of ground on how to use dictionaries in this tutorial, but it's important to take a step back and look at why we might want to use (or not use) them.

Dictionary Advantages and Disadvantages

We've mentioned many times throughout that dictionaries increase the readability of our code. Being able to write out our own keys gives us flexibility and adds a layer of self-documentation. The less time it takes to understand what your code is doing, the easier it is to understand and debug and the faster you can implement your analyses.

Aside from the human-centered advantages, there are also speed advantages. Looking up a key in a dictionary is fast. Computer scientists can measure how long a computer task (i.e., looking up a key or running an algorithm) will take by seeing how many operations it will take to finish. They describe these times with Big-O notation.

Some tasks are fast and run in constant time, while heftier tasks may need a number of operations that grows with the size of the input. In this case, looking up a key is done in constant time. Compare this to searching for the same item in a large list: the computer must look through each item in the list, so the time taken scales with the length of the list. We call this linear time. If your list is exceptionally large, then looking for one item will take much longer than assigning it to a key-value pair in a dictionary and looking up the key.
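If you want to see the difference yourself, here's a rough sketch using the timeit module (exact timings vary by machine and Python version):

from timeit import timeit

haystack_list = list(range(100000))
haystack_dict = dict.fromkeys(haystack_list)

# The list search scans items one by one, so this takes linear time...
print(timeit("99999 in haystack_list", globals=globals(), number=1000))

# ...while the dictionary membership check runs in roughly constant time
print(timeit("99999 in haystack_dict", globals=globals(), number=1000))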

On a deeper level, a dictionary is an implementation of a hash table, an explanation of which is outside the scope of this article. What's important to know is that the benefits we get from a dictionary are essentially the benefits of the hash table itself: speedy key lookups and membership checks.

We mentioned earlier that dictionaries are unordered, making them unsuitable data structures for data where order matters. Relative to other Python data structures, dictionaries take up a lot more space, especially when you have a large amount of keys. Given how cheap memory is, this disadvantage doesn't usually make itself apparent, but it's good to know about the overhead produced by dictionaries.
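You can get a rough sense of that overhead yourself with sys.getsizeof; note that these sizes cover only the containers, not the objects inside them, and the exact numbers vary across Python versions:

import sys

squares_list = [n ** 2 for n in range(1000)]
squares_dict = {n: n ** 2 for n in range(1000)}

# The dict container typically takes several times the memory of the list
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_dict))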

We've only discussed vanilla dictionaries, but there are other implementations in Python that add additional functionality. I've included a link for further reading at the end. I hope that after reading this article, you will be more comfortable using dictionaries and finding use for them in your own programming. Perhaps you have even found a beer you might want to try in the future!

Further Reading

We've covered the basics of dictionaries, but we didn't cover all the methods available to us.

PyCharm: PyCharm 2018.3 EAP 5


We’re excited to bring you the fifth release in the Early Access Program (EAP) for PyCharm 2018.3. This version comes with some great improvements. You can get it right now from our website.

New in This Version

F-String Improvements


One of the most used new features of Python 3.6 (and 3.7 of course) is F-strings, which allow you to easily interpolate variables in strings. We initially supported F-strings immediately when Python 3.6 was released, but this release comes with a great improvement. Due to the way that F-strings were interpreted, sometimes PyCharm wasn’t as fast when editing F-strings as when editing other Python code. The new support for F-strings is a lot faster, try it yourself now!
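In case you haven't tried them yet, a minimal f-string looks like this:

name = "PyCharm"
version = "2018.3"
print(f"{name} {version} EAP 5")  # prints: PyCharm 2018.3 EAP 5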

Apart from the performance boost, a lot of issues were resolved. For example, multi-line f-strings should now work properly. Many refactoring operations (like extract variable) also work correctly with F-strings now.

Further Improvements

  • There was an issue where, after upgrading PyCharm, a project configured with Docker Compose could have issues starting. This was caused by an old version of the PyCharm helpers still being present in the Docker configuration; this has been resolved in this version.
  • Sometimes, you get bad JSON files, which is sad. Even sadder is when PyCharm freezes when you try to open it to fix it. We can’t prevent bad JSON from happening, but the freezing problem is fixed in this version.
  • PyCharm has sorted import statements in a case-sensitive way; from this version onward you can change this to be case-insensitive. If you prefer this, you can enable this behavior in the code style options.
  • And more, read the release notes here

Interested?

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.

If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP, and stay up to date. You can find the installation instructions on our website.

PyCharm 2018.3 is in development during the EAP phase, therefore not all new features are already available. More features will be added in the coming weeks. As PyCharm 2018.3 is pre-release software, it is not as stable as the release versions. Furthermore, we may decide to change and/or drop certain features as the EAP progresses.

All EAP versions will ship with a built-in EAP license, which means that these versions are free to use for 30 days after the day that they are built. As EAPs are released weekly, you’ll be able to use PyCharm Professional Edition EAP for free for the duration of the EAP program, as long as you upgrade at least once every 30 days.

Stack Abuse: Creating a Neural Network from Scratch in Python: Adding Hidden Layers


Introduction

In the previous article, we started our discussion about artificial neural networks; we saw how to create a simple neural network with one input and one output layer, from scratch in Python. Such a neural network is called a perceptron. However, real-world neural networks, capable of performing complex tasks such as image classification and stock market analysis, contain multiple hidden layers in addition to the input and output layer.

In the previous article, we concluded that a Perceptron is capable of finding a linear decision boundary. We used a perceptron to predict whether a person is diabetic or not using a toy dataset. However, a perceptron is not capable of finding non-linear decision boundaries.

In this article, we will build upon the concepts that we studied in Part 1 of this series and will develop a neural network with one input layer, one hidden layer, and one output layer. We will see that the neural network that we will develop will be capable of finding non-linear boundaries.

Note: If you are an absolute beginner to neural networks, you should read Part 1 of this series first. Once you are comfortable with the concepts explained in that article, you can come back and continue with this article.

Dataset

For this article, we need non-linearly separable data. In other words, we need a dataset that cannot be classified using a straight line.

Luckily, Python's Scikit Learn library comes with a variety of tools that can be used to automatically generate different types of datasets.

Execute the following script to generate the dataset that we are going to use, in order to train and test our neural network.

from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  
feature_set, labels = datasets.make_moons(100, noise=0.10)  
plt.figure(figsize=(10,7))  
plt.scatter(feature_set[:,0], feature_set[:,1], c=labels, cmap=plt.cm.winter)  

In the script above we import the datasets class from the sklearn library. To create a non-linear dataset of 100 data points, we use the make_moons method and pass it 100 as the first parameter. The method returns a dataset, which when plotted contains two interleaving half circles, as shown in the figure below:

[Figure: the moons dataset, two interleaving half circles]

You can clearly see that this data cannot be separated by a single straight line, hence the perceptron cannot be used to correctly classify this data.

Let's verify this concept. To do so, we'll use a simple perceptron with one input layer and one output layer (the one we created in the last article) and try to classify our "moons" dataset. Execute the following script:

from sklearn import datasets  
import numpy as np  
import matplotlib.pyplot as plt

np.random.seed(0)  
feature_set, labels = datasets.make_moons(100, noise=0.10)  
plt.figure(figsize=(10,7))  
plt.scatter(feature_set[:,0], feature_set[:,1], c=labels, cmap=plt.cm.winter)

labels = labels.reshape(100, 1)

def sigmoid(x):  
    return 1/(1+np.exp(-x))

def sigmoid_der(x):  
    return sigmoid(x) *(1-sigmoid (x))

np.random.seed(42)  
weights = np.random.rand(2, 1)  
lr = 0.5  
bias = np.random.rand(1)

for epoch in range(200000):  
    inputs = feature_set

    # feedforward step 1
    XW = np.dot(feature_set,weights) + bias

    # feedforward step 2
    z = sigmoid(XW)

    # backpropagation step 1
    error_out = ((1 / 2) * (np.power((z - labels), 2)))
    print(error_out.sum())

    error = z - labels

    # backpropagation step 2
    dcost_dpred = error
    dpred_dz = sigmoid_der(z) 

    z_delta = dcost_dpred * dpred_dz

    inputs = feature_set.T
    weights -= lr * np.dot(inputs, z_delta)

    for num in z_delta:
        bias -= lr * num

You will see that the value of the mean squared error will not converge below about 4.17, no matter how long you train. This indicates that we can't possibly correctly classify all points of the dataset using this perceptron, no matter what we do.

Neural Networks with One Hidden Layer

In this section, we will create a neural network with one input layer, one hidden layer, and one output layer. The architecture of our neural network will look like this:

[Figure: neural network with one hidden layer of four nodes]

In the figure above, we have a neural network with 2 inputs, one hidden layer, and one output layer. The hidden layer has 4 nodes. The output layer has 1 node since we are solving a binary classification problem, where there can be only two possible outputs. This neural network architecture is capable of finding non-linear boundaries.

No matter how many nodes and hidden layers there are in the neural network, the basic working principle remains the same. You start with the feed-forward phase where inputs from the previous layer are multiplied with the corresponding weights and are passed through the activation function to get the final value for the corresponding node in the next layer. This process is repeated for all the hidden layers until the output is calculated. In the back-propagation phase, the predicted output is compared with the actual output and the cost of error is calculated. The purpose is to minimize the cost function.

This is pretty straight-forward if there is no hidden layer involved as we saw in the previous article.

However, if one or more hidden layers are involved, the process becomes a bit more complex because the error has to be propagated back to more than one layer since weights in all the layers are contributing towards the final output.

In this article, we will see how to perform feed-forward and back-propagation steps for the neural network having one or more hidden layers.

Feed Forward

For each record, we have two features "x1" and "x2". To calculate the values for each node in the hidden layer, we have to multiply the input with the corresponding weights of the node for which we are calculating the value. We then pass the dot product through an activation function to get the final value.

For instance, to calculate the final value for the first node in the hidden layer, which is denoted by "ah1", you need to perform the following calculation:

$$ zh1 = x1w1 + x2w2
$$

$$ ah1 = \frac{\mathrm{1} }{\mathrm{1} + e^{-zh1} }
$$

This is the resulting value for the top-most node in the hidden layer. In the same way, you can calculate the values for the 2nd, 3rd, and 4th nodes of the hidden layer.

Similarly, to calculate the value for the output layer, the values in the hidden layer nodes are treated as inputs. Therefore, to calculate the output, multiply the values of the hidden layer nodes with their corresponding weights and pass the result through an activation function.

This operation can be mathematically expressed by the following equation:

$$ zo = ah1w9 + ah2w10 + ah3w11 + ah4w12
$$

$$ ao = \frac{\mathrm{1} }{\mathrm{1} + e^{-zo} }
$$

Here "ao" is the final output of our neural network. Remember that the activation function that we are using is the sigmoid function, as we did in the previous article.

Note: For the sake of simplicity, we did not add a bias term to each weight. You will see that the neural network with hidden layer will perform better than the perceptron, even without the bias term.

Back Propagation

The feed forward step is relatively straight-forward. However, the back-propagation is not as straight-forward as it was in Part 1 of this series.

In the back-propagation phase, we will first define our loss function. We will be using the mean squared error cost function. It can be represented mathematically as:

$$ MSE =
\frac{\mathrm{1} }{\mathrm{n}} \sum\nolimits_{i=1}^{n} (predicted - observed)^{2} $$

Here n is the number of observations.

Phase 1

In the first phase of back propagation, we need to update weights of the output layer i.e w9, w10, w11, and w12. So for the time being, just consider that our neural network has the following part:

[Figure: the output-layer portion of the network used in phase 1 of back propagation]

This looks similar to the perceptron that we developed in the last article. The purpose of the first phase of back propagation is to update weights w9, w10, w11, and w12 in such a way that the final error is minimized. This is an optimization problem where we have to find the function minima for our cost function.

To find the minima of a function, we can use the gradient descent algorithm. The gradient descent algorithm can be mathematically represented as follows:

$$ repeat \ until \ convergence: \begin{Bmatrix} w_j := w_j - \alpha \frac{\partial }{\partial w_j} J(w_0,w_1 ....... w_n) \end{Bmatrix} $$

The details regarding how the gradient descent function minimizes the cost have already been discussed in the previous article. Here we will just see the mathematical operations that we need to perform.

Our cost function is:

$$ MSE = \frac{\mathrm{1} }{\mathrm{n}} \sum\nolimits_{i=1}^{n}(predicted - observed)^{2}
$$

In our neural network, the predicted output is represented by "ao". Which means that we have to basically minimize this function:

$$ cost = \frac{\mathrm{1} }{\mathrm{n}} \sum\nolimits_{i=1}^{n}(ao - observed)^{2}
$$

From the previous article, we know that to minimize the cost function, we have to update weight values such that the cost decreases. To do so, we need to take the derivative of the cost function with respect to each weight. Since in this phase we are dealing with weights of the output layer, we need to differentiate the cost function with respect to w9, w10, w11, and w12.

The differentiation of the cost function with respect to weights in the output layer can be mathematically represented as follows using the chain rule of differentiation.

$$ \frac {dcost}{dwo} = \frac {dcost}{dao} * \frac {dao}{dzo} * \frac {dzo}{dwo} ...... (1) $$

Here "wo" refers to the weights in the output layer. The letter "d" at the start of each term refers to derivative.

Let's find the value for each expression in Equation 1.

Here,

$$ \frac {dcost}{dao} = \frac {2}{n} * (ao - labels) $$

Here 2 and n are constant. If we ignore them, we have the following equation.

$$ \frac {dcost}{dao} = (ao - labels) ........ (5) $$

Next, we can find "dao" with respect to "dzo" as follows:

$$ \frac {dao}{dzo} = sigmoid(zo) * (1-sigmoid(zo)) ........ (6) $$

Finally, we need to find "dzo" with respect to "dwo". The derivative is simply the inputs coming from the hidden layer as shown below:

$$ \frac {dzo}{dwo} = ah $$

Here "ah" refers to the 4 inputs from the hidden layers. Equation 1 can be used to find the updated weight values for the weights for the output layer. To find new weight values, the values returned by Equation 1 can be simply multiplied with the learning rate and subtracted from the current weight values. This is straight forward and we have done this previously.

Phase 2

In the previous section, we saw how we can find the updated values for the output layer weights i.e. w9, w10, w11, and w12. In this section, we will back-propagate our error to the previous layer and find the new weight values for hidden layer weights i.e. weights w1 to w8.

Let's collectively denote hidden layer weights as "wh". We basically have to differentiate the cost function with respect to "wh". Mathematically we can use chain rule of differentiation to represent it as:

$$ \frac {dcost}{dwh} = \frac {dcost}{dah} * \frac {dah}{dzh} * \frac {dzh}{dwh} ...... (2) $$

Here again we will break Equation 2 into individual terms.

The first term "dcost" can be differentiated with respect to "dah" using the chain rule of differentiation as follows:

$$ \frac {dcost}{dah} = \frac {dcost}{dzo} * \frac {dzo}{dah} ...... (3) $$

Let's again break the Equation 3 into individual terms. Using the chain rule again, we can differentiate "dcost" with respect to "dzo" as follows:

$$ \frac {dcost}{dzo} = \frac {dcost}{dao} * \frac {dao}{dzo} ...... (4) $$

We have already calculated the value of dcost/dao in Equation 5 and dao/dzo in Equation 6.

Now we need to find dzo/dah from Equation 3. If we look at zo, it has the following value:

$$ zo = ah1w9 + ah2w10 + ah3w11 + ah4w12
$$

If we differentiate it with respect to all inputs from the hidden layer, denoted by "ah", then we are left with all the weights from the output layer, denoted by "wo". Therefore,

$$ \frac {dzo}{dah} = wo ...... (7) $$

Now we can find the value of dcost/dah by replacing the values from Equations 7 and 4 in Equation 3.

Coming back to Equation 2, we have yet to find dah/dzh and dzh/dwh.

The first term dah/dzh can be calculated as:

$$ \frac {dah}{dzh} = sigmoid(zh) * (1-sigmoid(zh)) ........ (8) $$

And finally, dzh/dwh is simply the input values:

$$ \frac {dzh}{dwh} = \text{input features} ........ (9) $$

If we replace the values from Equations 3, 8 and 9 in Equation 2, we can get the updated matrix for the hidden layer weights. To find new weight values for the hidden layer weights "wh", the values returned by Equation 2 can simply be multiplied with the learning rate and subtracted from the current weight values. And that's pretty much it.

The equations may look exhausting to you since there are a lot of calculations being performed. However, if you look at them closely, there are just two operations being performed in a chain: taking derivatives and multiplying.

One of the reasons that neural networks are slower than many other machine learning algorithms is the fact that lots of computations are performed at the back end. Our neural network had just one hidden layer with four nodes, two inputs and one output, yet we had to perform lengthy derivative and multiplication operations in order to update the weights for a single iteration. In the real world, neural networks can have hundreds of layers with hundreds of input and output values. Therefore, neural networks execute slowly.

Code for Neural Networks with One Hidden Layer

Now let's implement the neural network that we just discussed in Python from scratch. You will clearly see the correspondence between the code snippets and the theory that we discussed in the previous section. We will again try to classify the non-linear data that we created in the Dataset section of the article. Take a look at the following script.

# -*- coding: utf-8 -*-
"""
Created on Tue Sep 25 13:46:08 2018

@author: usman
"""

from sklearn import datasets  
import numpy as np  
import matplotlib.pyplot as plt

np.random.seed(0)  
feature_set, labels = datasets.make_moons(100, noise=0.10)  
plt.figure(figsize=(10,7))  
plt.scatter(feature_set[:,0], feature_set[:,1], c=labels, cmap=plt.cm.winter)

labels = labels.reshape(100, 1)

def sigmoid(x):  
    return 1/(1+np.exp(-x))

def sigmoid_der(x):  
    return sigmoid(x) *(1-sigmoid (x))

wh = np.random.rand(len(feature_set[0]),4)  
wo = np.random.rand(4, 1)  
lr = 0.5

for epoch in range(200000):  
    # feedforward
    zh = np.dot(feature_set, wh)
    ah = sigmoid(zh)

    zo = np.dot(ah, wo)
    ao = sigmoid(zo)

    # Phase1 =======================

    error_out = ((1 / 2) * (np.power((ao - labels), 2)))
    print(error_out.sum())

    dcost_dao = ao - labels
    dao_dzo = sigmoid_der(zo) 
    dzo_dwo = ah

    dcost_wo = np.dot(dzo_dwo.T, dcost_dao * dao_dzo)

    # Phase 2 =======================

    # dcost_w1 = dcost_dah * dah_dzh * dzh_dw1
    # dcost_dah = dcost_dzo * dzo_dah
    dcost_dzo = dcost_dao * dao_dzo
    dzo_dah = wo
    dcost_dah = np.dot(dcost_dzo , dzo_dah.T)
    dah_dzh = sigmoid_der(zh) 
    dzh_dwh = feature_set
    dcost_wh = np.dot(dzh_dwh.T, dah_dzh * dcost_dah)

    # Update Weights ================

    wh -= lr * dcost_wh
    wo -= lr * dcost_wo

In the script above we start by importing the desired libraries and then we create our dataset. Next, we define the sigmoid function along with its derivative. We then initialize the hidden layer and output layer weights with random values. The learning rate is 0.5. I tried different learning rates and found that 0.5 is a good value.

We then execute the algorithm for 200,000 epochs. Inside each epoch, we first perform the feed-forward operation. The code snippet for the feed forward operation is as follows:

zh = np.dot(feature_set, wh)  
ah = sigmoid(zh)

zo = np.dot(ah, wo)  
ao = sigmoid(zo)  

As discussed in theory section, back propagation consists of two phases. In the first phase, the gradients for the output layer weights are calculated. The following script executes in the first phase of the back-propagation.

error_out = ((1 / 2) * (np.power((ao - labels), 2)))  
print(error_out.sum())

dcost_dao = ao - labels  
dao_dzo = sigmoid_der(zo)  
dzo_dwo = ah

dcost_wo = np.dot(dzo_dwo.T, dcost_dao * dao_dzo)  

In the second phase, the gradients for the hidden layer weights are calculated. The following script executes in the second phase of the back-propagation.

dcost_dzo = dcost_dao * dao_dzo  
dzo_dah = wo  
dcost_dah = np.dot(dcost_dzo , dzo_dah.T)  
dah_dzh = sigmoid_der(zh)  
dzh_dwh = feature_set  
dcost_wh = np.dot( dzh_dwh.T, dah_dzh * dcost_dah)  

Finally, the weights are updated in the following script:

wh -= lr * dcost_wh  
wo -= lr * dcost_wo  

When the above script executes, you will see a minimum mean squared error value of 1.50, which is less than our previous mean squared error of 4.17, which was obtained using the perceptron. This shows that the neural network with hidden layers performs better in the case of non-linearly separable data.

Conclusion

In this article, we saw how we can create a neural network with 1 hidden layer, from scratch in Python. We saw how our neural network outperformed a neural network with no hidden layers for the binary classification of non-linear data.

However, we may need to classify data into more than two categories. In our next article, we will see how to create a neural network from scratch in Python for multi-class classification problems.

Real Python: Building and Documenting Python REST APIs With Flask and Connexion – Part 2


In Part 1 of this series, you used Flask and Connexion to create a REST API providing CRUD operations to a simple in-memory structure called PEOPLE. That worked to demonstrate how the Connexion module helps you build a nice REST API along with interactive documentation.

As some noted in the comments for Part 1, the PEOPLE structure is re-initialized every time the application is restarted. In this article, you’ll learn how to store the PEOPLE structure, and the actions the API provides, to a database using SQLAlchemy and Marshmallow.

SQLAlchemy provides an Object Relational Model (ORM), which stores Python objects to a database representation of the object’s data. That can help you continue to think in a Pythonic way and not be concerned with how the object data will be represented in a database.

Marshmallow provides functionality to serialize and deserialize Python objects as they flow out of and into our JSON-based REST API. Marshmallow converts Python class instances to objects that can be converted to JSON.
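As a rough sketch of that idea (a hypothetical schema, not the one built later in this article; note that dump() returns a plain dict in marshmallow 3.x but a (data, errors) pair in 2.x):

from marshmallow import Schema, fields

# A hypothetical schema describing the person fields
class PersonSchema(Schema):
    fname = fields.Str()
    lname = fields.Str()

person_schema = PersonSchema()

# Serialize a plain Python object into JSON-ready primitives
json_ready = person_schema.dump({"fname": "Doug", "lname": "Farrell"})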

You can find the Python code for this article here.

Free Bonus: Click here to download a copy of the "REST API Examples" Guide and get a hands-on introduction to Python + REST API principles with actionable examples.

Who This Article Is For

If you enjoyed Part 1 of this series, this article expands your tool belt even further. You’ll be using SQLAlchemy to access a database in a more Pythonic way than straight SQL. You’ll also use Marshmallow to serialize and deserialize the data managed by the REST API. To do this, you’ll be making use of basic Object Oriented Programming features available in Python.

You’ll also be using SQLAlchemy to create a database as well as interact with it. This is necessary to get the REST API up and running with the PEOPLE data used in Part 1.

The web application presented in Part 1 will have its HTML and JavaScript files modified in minor ways in order to support the changes as well. You can review the final version of the code from Part 1 here.

Additional Dependencies

Before you get started building this new functionality, you’ll need to update the virtualenv you created in order to run the Part 1 code, or create a new one for this project. The easiest way to do that after you have activated your virtualenv is to run this command:

$ pip install Flask-SQLAlchemy flask-marshmallow marshmallow-sqlalchemy marshmallow

This adds more functionality to your virtualenv:

  1. Flask-SQLAlchemy adds SQLAlchemy, along with some tie-ins to Flask, allowing programs to access databases.

  2. flask-marshmallow adds the Flask parts of Marshmallow, which lets programs convert Python objects to and from serializable structures.

  3. marshmallow-sqlalchemy adds some Marshmallow hooks into SQLAlchemy to allow programs to serialize and deserialize Python objects generated by SQLAlchemy.

  4. marshmallow adds the bulk of the Marshmallow functionality.

People Data

As mentioned above, the PEOPLE data structure in the previous article is an in-memory Python dictionary. In that dictionary, you used the person’s last name as the lookup key. The data structure looked like this in the code:

# Data to serve with our API
PEOPLE = {
    "Farrell": {
        "fname": "Doug",
        "lname": "Farrell",
        "timestamp": get_timestamp()
    },
    "Brockman": {
        "fname": "Kent",
        "lname": "Brockman",
        "timestamp": get_timestamp()
    },
    "Easter": {
        "fname": "Bunny",
        "lname": "Easter",
        "timestamp": get_timestamp()
    }
}

The modifications you’ll make to the program will move all the data to a database table. This means the data will be saved to your disk and will exist between runs of the server.py program.

Because the last name was the dictionary key, the code restricted changing a person’s last name: only the first name could be changed. In addition, moving to a database will allow you to change the last name as it will no longer be used as the lookup key for a person.

Conceptually, a database table can be thought of as a two-dimensional array where the rows are records, and the columns are fields in those records.

Database tables usually have an auto-incrementing integer value as the lookup key to rows. This is called the primary key. Each record in the table will have a primary key whose value is unique across the entire table. Having a primary key independent of the data stored in the table frees you to modify any other field in the row.

Note:

The auto-incrementing primary key means that the database takes care of:

  • Incrementing the largest existing primary key field every time a new record is inserted in the table
  • Using that value as the primary key for the newly inserted data

This guarantees a unique primary key as the table grows.

You’re going to follow a database convention of naming the table as singular, so the table will be called person. Translating our PEOPLE structure above into a database table named person gives you this:

person_id  lname     fname  timestamp
1          Farrell   Doug   2018-08-08 21:16:01.888444
2          Brockman  Kent   2018-08-08 21:16:01.889060
3          Easter    Bunny  2018-08-08 21:16:01.886834

Each column in the table has a field name as follows:

  • person_id: primary key field for each person
  • lname: last name of the person
  • fname: first name of the person
  • timestamp: timestamp associated with insert/update actions

Database Interaction

You’re going to use SQLite as the database engine to store the PEOPLE data. SQLite is the most widely deployed database in the world, and it comes with Python for free. It’s fast, performs all its work using files, and is suitable for a great many projects. It’s a complete RDBMS (Relational Database Management System) that includes SQL, the language of many database systems.

For the moment, imagine the person table already exists in a SQLite database. If you’ve had any experience with RDBMS, you’re probably aware of SQL, the Structured Query Language most RDBMSes use to interact with the database.

Unlike programming languages like Python, SQL doesn’t define how to get the data: it describes what data is desired, leaving the how up to the database engine.

A SQL query getting all of the data in our person table, sorted by last name, would look like this:

SELECT * FROM person ORDER BY lname;

This query tells the database engine to get all the fields from the person table and sort them in the default, ascending order using the lname field.

If you were to run this query against a SQLite database containing the person table, the results would be a set of records containing all the rows in the table, with each row containing the data from all the fields making up a row. Below is an example using the SQLite command line tool running the above query against the person database table:

sqlite> SELECT * FROM person ORDER BY lname;
2|Brockman|Kent|2018-08-08 21:16:01.888444
3|Easter|Bunny|2018-08-08 21:16:01.889060
1|Farrell|Doug|2018-08-08 21:16:01.886834

The output above is a list of all the rows in the person database table with pipe characters (‘|’) separating the fields in the row, which is done for display purposes by SQLite.

Python is completely capable of interfacing with many database engines and executing the SQL query above. The results would most likely be a list of tuples. The outer list contains all the records in the person table. Each individual inner tuple would contain all the data representing each field defined for a table row.

Getting data this way isn’t very Pythonic. The list of records is okay, but each individual record is just a tuple of data. It’s up to the program to know the index of each field in order to retrieve a particular field. The following Python code uses SQLite to demonstrate how to run the above query and display the data:

 1 import sqlite3
 2
 3 conn = sqlite3.connect('people.db')
 4 cur = conn.cursor()
 5 cur.execute('SELECT * FROM person ORDER BY lname')
 6 people = cur.fetchall()
 7 for person in people:
 8     print(f'{person[2]} {person[1]}')

The program above does the following:

  • Line 1 imports the sqlite3 module.

  • Line 3 creates a connection to the database file.

  • Line 4 creates a cursor from the connection.

  • Line 5 uses the cursor to execute a SQL query expressed as a string.

  • Line 6 gets all the records returned by the SQL query and assigns them to the people variable.

  • Lines 7 & 8 iterate over the people list variable and print out the first and last name of each person.

The people variable from Line 6 above would look like this in Python:

people = [
    (2, 'Brockman', 'Kent', '2018-08-08 21:16:01.888444'),
    (3, 'Easter', 'Bunny', '2018-08-08 21:16:01.889060'),
    (1, 'Farrell', 'Doug', '2018-08-08 21:16:01.886834')
]

The output of the program above looks like this:

Kent Brockman
Bunny Easter
Doug Farrell

In the above program, you have to know that a person’s first name is at index 2, and a person’s last name is at index 1. Worse, the internal structure of person must also be known whenever you pass the iteration variable person as a parameter to a function or method.

It would be much better if what you got back for person was a Python object, where each of the fields is an attribute of the object. This is one of the things SQLAlchemy does.
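
As an aside, the standard library can already soften the index problem a little: sqlite3 supports a row factory, so rows can be accessed by column name. This sketch is illustrative and not part of the original article’s code:

import sqlite3

conn = sqlite3.connect('people.db')
conn.row_factory = sqlite3.Row  # rows now support access by column name
cur = conn.cursor()
cur.execute('SELECT * FROM person ORDER BY lname')
for person in cur.fetchall():
    # No more magic indexes: fields are looked up by name
    print(f"{person['fname']} {person['lname']}")

Even then, each row is still a passive record rather than an object with behavior, which is where SQLAlchemy comes in.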

Little Bobby Tables

In the above program, the SQL statement is a simple string passed directly to the database to execute. In this case, that’s not a problem because the SQL is a string literal completely under the control of the program. However, the use case for your REST API will take user input from the web application and use it to create SQL queries. This can open your application to attack.

You’ll recall from Part 1 that the REST API to get a single person from the PEOPLE data looked like this:

GET /api/people/{lname}

This means your API is expecting a variable, lname, in the URL endpoint path, which it uses to find a single person. Modifying the Python SQLite code from above to do this would look something like this:

1 lname = 'Farrell'
2 cur.execute('SELECT * FROM person WHERE lname = \'{}\''.format(lname))

The above code snippet does the following:

  • Line 1 sets the lname variable to 'Farrell'. This would come from the REST API URL endpoint path.

  • Line 2 uses Python string formatting to create a SQL string and execute it.

To keep things simple, the above code sets the lname variable to a constant, but really it would come from the API URL endpoint path and could be anything supplied by the user. The SQL generated by the string formatting looks like this:

SELECT * FROM person WHERE lname = 'Farrell'

When this SQL is executed by the database, it searches the person table for a record where the last name is equal to 'Farrell'. This is what’s intended, but any program that accepts user input is also open to malicious users. In the program above, where the lname variable is set by user-supplied input, this opens your program to what’s called a SQL Injection Attack. This is what’s affectionately known as Little Bobby Tables:

XKCD Comic #327: Exploits of a Mom (Image: xkcd.com)

For example, imagine a malicious user called your REST API in this way:

GET /api/people/Farrell');DROP TABLE person;

The REST API request above sets the lname variable to 'Farrell');DROP TABLE person;', which in the code above would generate this SQL statement:

SELECT * FROM person WHERE lname = 'Farrell');DROP TABLE person;

The above SQL statement is valid, and when executed by the database it will find one record where lname matches 'Farrell'. Then, it will find the SQL statement delimiter character ; and will go right ahead and drop the entire table. This would essentially wreck your application.

You can protect your program by sanitizing all data you get from users of your application. Sanitizing data in this context means having your program examine the user-supplied data and making sure it doesn’t contain anything dangerous to the program. This can be tricky to do right and would have to be done everywhere user data interacts with the database.

There’s another way that’s much easier: use SQLAlchemy. It will sanitize user data for you before creating SQL statements. It’s another big advantage and reason to use SQLAlchemy when working with databases.
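
For comparison, here’s a minimal sketch of the parameterized-query style that drivers like sqlite3 support, where the driver binds the user value safely instead of it being formatted into the SQL string (the SQLAlchemy equivalent appears in the handler code later in this article):

# Safe: the driver treats lname strictly as a value, never as SQL
lname = "Farrell');DROP TABLE person;"
cur.execute('SELECT * FROM person WHERE lname = ?', (lname,))

Here the malicious input simply matches no rows, and the person table survives.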

Modeling Data With SQLAlchemy

SQLAlchemy is a big project and provides a lot of functionality to work with databases using Python. One of the things it provides is an ORM, or Object Relational Mapper, and this is what you’re going to use to create and work with the person database table. This allows you to map a row of fields from the database table to a Python object.

Object Oriented Programming allows you to connect data together with behavior, the functions that operate on that data. By creating SQLAlchemy classes, you’re able to connect the fields from the database table rows to behavior, allowing you to interact with the data. Here’s the SQLAlchemy class definition for the data in the person database table:

class Person(db.Model):
    __tablename__ = 'person'
    person_id = db.Column(db.Integer, primary_key=True)
    lname = db.Column(db.String)
    fname = db.Column(db.String)
    timestamp = db.Column(db.DateTime,
                          default=datetime.utcnow,
                          onupdate=datetime.utcnow)

The class Person inherits from db.Model, which you’ll get to when you start building the program code. For now, it means you’re inheriting from a base class called Model, providing attributes and functionality common to all classes derived from it.

The rest of the definitions are class-level attributes defined as follows:

  • __tablename__ = 'person' connects the class definition to the person database table.

  • person_id = db.Column(db.Integer, primary_key=True) creates a database column containing an integer acting as the primary key for the table. This also tells the database that person_id will be an autoincrementing Integer value.

  • lname = db.Column(db.String) creates the last name field, a database column containing a string value.

  • fname = db.Column(db.String) creates the first name field, a database column containing a string value.

  • timestamp = db.Column(db.DateTime, default=datetime.utcnow, onupdate=datetime.utcnow) creates a timestamp field, a database column containing a date/time value. The default=datetime.utcnow parameter defaults the timestamp value to the current utcnow value when a record is created. The onupdate=datetime.utcnow parameter updates the timestamp with the current utcnow value when the record is updated.

Note: UTC Timestamps

You might be wondering why the timestamp in the above class defaults to and is updated by the datetime.utcnow() method, which returns a UTC (Coordinated Universal Time) timestamp. This is a way of standardizing your timestamp’s source.

The source, or zero time, is a line running north and south from the Earth’s north to south pole through the UK. This is the zero time zone from which all other time zones are offset. By using this as the zero time source, your timestamps are offsets from this standard reference point.

Should your application be accessed from different time zones, you have a way to perform date/time calculations. All you need is a UTC timestamp and the destination time zone.

If you were to use local time zones as your timestamp source, then you couldn’t perform date/time calculations without information about the local time zones offset from zero time. Without the timestamp source information, you couldn’t do any date/time comparisons or math at all.

Working with timestamps based on UTC is a good standard to follow. Here’s a toolkit site to work with and better understand them.
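
As a quick illustrative sketch (not from the original article), the standard library’s datetime module can perform exactly this kind of calculation, given a UTC timestamp and a destination time zone offset:

from datetime import datetime, timezone, timedelta

utc_now = datetime.now(timezone.utc)     # UTC reference timestamp
central = timezone(timedelta(hours=-6))  # example destination offset (UTC-6)
print(utc_now.astimezone(central))       # same instant, local wall-clock time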

Where are you heading with this Person class definition? The end goal is to be able to run a query using SQLAlchemy and get back a list of instances of the Person class. As an example, let’s look at the previous SQL statement:

SELECT * FROM person ORDER BY lname;

Here’s the same small example program from above, but now using SQLAlchemy:

1 from models import Person
2
3 people = Person.query.order_by(Person.lname).all()
4 for person in people:
5     print(f'{person.fname} {person.lname}')

Ignoring line 1 for the moment, what you want is all the person records sorted in ascending order by the lname field. What you get back from the SQLAlchemy statement Person.query.order_by(Person.lname).all() is a list of Person objects for all records in the person database table in that order. In the above program, the people variable contains the list of Person objects.

The program iterates over the people variable, taking each person in turn and printing out the first and last name of the person from the database. Notice the program doesn’t have to use indexes to get the fname or lname values: it uses the attributes defined on the Person object.

Using SQLAlchemy allows you to think in terms of objects with behavior rather than raw SQL. This becomes even more beneficial when your database tables become larger and the interactions more complex.

Serializing/Deserializing Modeled Data

Working with SQLAlchemy modeled data inside your programs is very convenient. It is especially convenient in programs that manipulate the data, perhaps making calculations or using it to create presentations on screen. Your application is a REST API essentially providing CRUD operations on the data, and as such it doesn’t perform much data manipulation.

The REST API works with JSON data, and here you can run into an issue with the SQLAlchemy model. Because the data returned by SQLAlchemy are Python class instances, Connexion can’t serialize these class instances to JSON formatted data. Remember from Part 1 that Connexion is the tool you used to design and configure the REST API using a YAML file, and connect Python methods to it.

In this context, serializing means converting Python objects, which can contain other Python objects and complex data types, into simpler data structures that can be parsed into JSON datatypes, which are listed here:

  • string: a string type
  • number: numbers supported by Python (integers and floats)
  • object: a JSON object, which is roughly equivalent to a Python dictionary
  • array: roughly equivalent to a Python List
  • boolean: represented in JSON as true or false, but in Python as True or False
  • null: essentially a None in Python

As an example, your Person class contains a timestamp, which is a Python DateTime. There is no date/time definition in JSON, so the timestamp has to be converted to a string in order to exist in a JSON structure.
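
Here’s a minimal illustration of the problem using only the standard json module (this snippet is a demonstration assumption, not code from the application):

import json
from datetime import datetime

record = {'lname': 'Farrell', 'timestamp': datetime.utcnow()}
# json.dumps(record) would raise:
# TypeError: Object of type datetime is not JSON serializable
record['timestamp'] = record['timestamp'].isoformat()  # convert to a string first
print(json.dumps(record))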

Your Person class is simple enough so getting the data attributes from it and creating a dictionary manually to return from our REST URL endpoints wouldn’t be very hard. In a more complex application with many larger SQLAlchemy models, this wouldn’t be the case. A better solution is to use a module called Marshmallow to do the work for you.

Marshmallow helps you to create a PersonSchema class, which is like the SQLAlchemy Person class we created. Here however, instead of mapping database tables and field names to the class and its attributes, the PersonSchema class defines how the attributes of a class will be converted into JSON-friendly formats. Here’s the Marshmallow class definition for the data in our person table:

class PersonSchema(ma.ModelSchema):
    class Meta:
        model = Person
        sqla_session = db.session

The class PersonSchema inherits from ma.ModelSchema, which you’ll get to when you start building the program code. For now, this means PersonSchema is inheriting from a Marshmallow base class called ModelSchema, providing attributes and functionality common to all classes derived from it.

The rest of the definition is as follows:

  • class Meta defines a class named Meta within your class. The ModelSchema class that the PersonSchema class inherits from looks for this internal Meta class and uses it to find the SQLAlchemy model Person and the db.session. This is how Marshmallow finds attributes in the Person class and the type of those attributes so it knows how to serialize/deserialize them.

  • model tells the class what SQLAlchemy model to use to serialize/deserialize data to and from.

  • sqla_session = db.session tells the class what database session to use to introspect and determine attribute data types.

Where are you heading with this class definition? You want to be able to serialize an instance of a Person class into JSON data, and to deserialize JSON data and create Person class instances from it.
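
Once PersonSchema exists, the round trip might look something like this sketch, where some_person and person_json are hypothetical names, and the .data attribute matches the Marshmallow 2.x style used throughout this article:

person_schema = PersonSchema()

# Serialize: SQLAlchemy Person instance -> JSON-friendly dict
person_dict = person_schema.dump(some_person).data

# Deserialize: JSON-derived dict -> new SQLAlchemy Person instance
new_person = person_schema.load(person_json, session=db.session).data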

Create the Initialized Database

SQLAlchemy handles many of the interactions specific to particular databases and lets you focus on the data models as well as how to use them.

Now that you’re actually going to create a database, as mentioned before, you’ll use SQLite. You’re doing this for a couple of reasons. It comes with Python and doesn’t have to be installed as a separate module. It saves all of the database information in a single file and is therefore easy to set up and use.

Installing a separate database server like MySQL or PostgreSQL would work fine but would require installing those systems and getting them up and running, which is beyond the scope of this article.

Because SQLAlchemy handles the database, in many ways it really doesn’t matter what the underlying database is.

You’re going to create a new utility program called build_database.py to create and initialize the SQLite people.db database file containing your person database table. Along the way, you’ll create two Python modules, config.py and models.py, which will be used by build_database.py and the modified server.py from Part 1.

Here’s where you can find the source code for the modules you’re about to create, which are introduced here:

  • config.py gets the necessary modules imported into the program and configured. This includes Flask, Connexion, SQLAlchemy, and Marshmallow. Because it will be used by both build_database.py and server.py, some parts of the configuration will only apply to the server.py application.

  • models.py is the module where you’ll create the Person SQLAlchemy and PersonSchema Marshmallow class definitions described above. This module is dependent on config.py for some of the objects created and configured there.

Config Module

The config.py module, as the name implies, is where all of the configuration information is created and initialized. We’re going to use this module for both our build_database.py program file and the soon to be updated server.py file from the Part 1 article. This means we’re going to configure Flask, Connexion, SQLAlchemy, and Marshmallow here.

Even though the build_database.py program doesn’t make use of Flask, Connexion, or Marshmallow, it does use SQLAlchemy to create our connection to the SQLite database. Here is the code for the config.py module:

 1 import os
 2 import connexion
 3 from flask_sqlalchemy import SQLAlchemy
 4 from flask_marshmallow import Marshmallow
 5
 6 basedir = os.path.abspath(os.path.dirname(__file__))
 7
 8 # Create the Connexion application instance
 9 connex_app = connexion.App(__name__, specification_dir=basedir)
10
11 # Get the underlying Flask app instance
12 app = connex_app.app
13
14 # Configure the SQLAlchemy part of the app instance
15 app.config['SQLALCHEMY_ECHO'] = True
16 app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////' + os.path.join(basedir, 'people.db')
17 app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
18
19 # Create the SQLAlchemy db instance
20 db = SQLAlchemy(app)
21
22 # Initialize Marshmallow
23 ma = Marshmallow(app)

Here’s what the above code is doing:

  • Lines 2 – 4 import Connexion as you did in the server.py program from Part 1. It also imports SQLAlchemy from the flask_sqlalchemy module. This gives your program database access. Lastly, it imports Marshmallow from the flask_marshmallow module.

  • Line 6 creates the variable basedir pointing to the directory the program is running in.

  • Line 9 uses the basedir variable to create the Connexion app instance and give it the path to the swagger.yml file.

  • Line 12 creates a variable app, which is the Flask instance initialized by Connexion.

  • Line 15 uses the app variable to configure values used by SQLAlchemy. First it sets SQLALCHEMY_ECHO to True. This causes SQLAlchemy to echo SQL statements it executes to the console. This is very useful to debug problems when building database programs. Set this to False for production environments.

  • Line 16 sets SQLALCHEMY_DATABASE_URI to 'sqlite:////' + os.path.join(basedir, 'people.db'). This tells SQLAlchemy to use SQLite as the database, and a file named people.db in the current directory as the database file. Different database engines, like MySQL and PostgreSQL, will have different SQLALCHEMY_DATABASE_URI strings to configure them.

  • Line 17 sets SQLALCHEMY_TRACK_MODIFICATIONS to False, turning off the SQLAlchemy event system, which is on by default. The event system generates events useful in event-driven programs but adds significant overhead. Since you’re not creating an event-driven program, turn this feature off.

  • Line 19 creates the db variable by calling SQLAlchemy(app). This initializes SQLAlchemy by passing the app configuration information just set. The db variable is what’s imported into the build_database.py program to give it access to SQLAlchemy and the database. It will serve the same purpose in the server.py program and people.py module.

  • Line 23 creates the ma variable by calling Marshmallow(app). This initializes Marshmallow and allows it to introspect the SQLAlchemy components attached to the app. This is why Marshmallow is initialized after SQLAlchemy.

Models Module

The models.py module is created to provide the Person and PersonSchema classes exactly as described in the sections above about modeling and serializing the data. Here is the code for that module:

 1 from datetime import datetime
 2 from config import db, ma
 3
 4 class Person(db.Model):
 5     __tablename__ = 'person'
 6     person_id = db.Column(db.Integer, primary_key=True)
 7     lname = db.Column(db.String(32), index=True)
 8     fname = db.Column(db.String(32))
 9     timestamp = db.Column(db.DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
10
11 class PersonSchema(ma.ModelSchema):
12     class Meta:
13         model = Person
14         sqla_session = db.session

Here’s what the above code is doing:

  • Line 1 imports the datetime object from the datetime module that comes with Python. This gives you a way to create a timestamp in the Person class.

  • Line 2 imports the db and ma instance variables defined in the config.py module. This gives the module access to SQLAlchemy attributes and methods attached to the db variable, and the Marshmallow attributes and methods attached to the ma variable.

  • Lines 4 – 9 define the Person class as discussed in the data modeling section above, but now you know where the db.Model that the class inherits from originates. This gives the Person class SQLAlchemy features, like a connection to the database and access to its tables.

  • Lines 11 – 14 define the PersonSchema class as was discussed in the data serialization section above. This class inherits from ma.ModelSchema and gives the PersonSchema class Marshmallow features, like introspecting the Person class to help serialize/deserialize instances of that class.

Creating the Database

You’ve seen how database tables can be mapped to SQLAlchemy classes. Now use what you’ve learned to create the database and populate it with data. You’re going to build a small utility program to create and build the database with the People data. Here’s the build_database.py program:

 1 import os
 2 from config import db
 3 from models import Person
 4
 5 # Data to initialize database with
 6 PEOPLE = [
 7     {'fname': 'Doug', 'lname': 'Farrell'},
 8     {'fname': 'Kent', 'lname': 'Brockman'},
 9     {'fname': 'Bunny', 'lname': 'Easter'}
10 ]
11
12 # Delete database file if it exists currently
13 if os.path.exists('people.db'):
14     os.remove('people.db')
15
16 # Create the database
17 db.create_all()
18
19 # Iterate over the PEOPLE structure and populate the database
20 for person in PEOPLE:
21     p = Person(lname=person['lname'], fname=person['fname'])
22     db.session.add(p)
23
24 db.session.commit()

Here’s what the above code is doing:

  • Line 2 imports the db instance from the config.py module.

  • Line 3 imports the Person class definition from the models.py module.

  • Lines 6 – 10 create the PEOPLE data structure, which is a list of dictionaries containing your data. The structure has been condensed to save presentation space.

  • Lines 13 & 14 perform some simple housekeeping to delete the people.db file, if it exists. This file is where the SQLite database is maintained. If you ever have to re-initialize the database to get a clean start, this makes sure you’re starting from scratch when you build the database.

  • Line 17 creates the database with the db.create_all() call. This creates the database by using the db instance imported from the config module. The db instance is our connection to the database.

  • Lines 20 – 22 iterate over the PEOPLE list and use the dictionaries within to instantiate a Person class. After it is instantiated, you call the db.session.add(p) function. This uses the database connection instance db to access the session object. The session is what manages the database actions, which are recorded in the session. In this case, you are executing the add(p) method to add the new Person instance to the session object.

  • Line 24 calls db.session.commit() to actually save all the person objects created to the database.

Note: At Line 22, no data has been added to the database. Everything is being saved within the session object. Only when you execute the db.session.commit() call at Line 24 does the session interact with the database and commit the actions to it.

In SQLAlchemy, the session is an important object. It acts as the conduit between the database and the SQLAlchemy Python objects created in a program. The session helps maintain the consistency between data in the program and the same data as it exists in the database. It saves all database actions and will update the underlying database accordingly by both explicit and implicit actions taken by the program.

Now you’re ready to run the build_database.py program to create and initialize the new database. You do so with the following command, with your Python virtual environment active:

python build_database.py

When the program runs, it will print SQLAlchemy log messages to the console. These are the result of setting SQLALCHEMY_ECHO to True in the config.py file. Much of what’s being logged by SQLAlchemy is the SQL commands it’s generating to create and build the people.db SQLite database file. Here’s an example of what’s printed out when the program is run:

2018-09-11 22:20:29,951 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2018-09-11 22:20:29,951 INFO sqlalchemy.engine.base.Engine ()
2018-09-11 22:20:29,952 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2018-09-11 22:20:29,952 INFO sqlalchemy.engine.base.Engine ()
2018-09-11 22:20:29,956 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("person")
2018-09-11 22:20:29,956 INFO sqlalchemy.engine.base.Engine ()
2018-09-11 22:20:29,959 INFO sqlalchemy.engine.base.Engine
CREATE TABLE person (
    person_id INTEGER NOT NULL,
    lname VARCHAR,
    fname VARCHAR,
    timestamp DATETIME,
    PRIMARY KEY (person_id)
)
2018-09-11 22:20:29,959 INFO sqlalchemy.engine.base.Engine ()
2018-09-11 22:20:29,975 INFO sqlalchemy.engine.base.Engine COMMIT
2018-09-11 22:20:29,980 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-09-11 22:20:29,983 INFO sqlalchemy.engine.base.Engine INSERT INTO person (lname, fname, timestamp) VALUES (?, ?, ?)
2018-09-11 22:20:29,983 INFO sqlalchemy.engine.base.Engine ('Farrell', 'Doug', '2018-09-12 02:20:29.983143')
2018-09-11 22:20:29,984 INFO sqlalchemy.engine.base.Engine INSERT INTO person (lname, fname, timestamp) VALUES (?, ?, ?)
2018-09-11 22:20:29,985 INFO sqlalchemy.engine.base.Engine ('Brockman', 'Kent', '2018-09-12 02:20:29.984821')
2018-09-11 22:20:29,985 INFO sqlalchemy.engine.base.Engine INSERT INTO person (lname, fname, timestamp) VALUES (?, ?, ?)
2018-09-11 22:20:29,985 INFO sqlalchemy.engine.base.Engine ('Easter', 'Bunny', '2018-09-12 02:20:29.985462')
2018-09-11 22:20:29,986 INFO sqlalchemy.engine.base.Engine COMMIT

Using the Database

Once the database has been created, you can modify the existing code from Part 1 to make use of it. All of the modifications necessary are due to creating the person_id primary key value in our database as the unique identifier rather than the lname value.

Update the REST API

None of the changes are very dramatic, and you’ll start by re-defining the REST API. The list below shows the API definition from Part 1 but is updated to use the person_id variable in the URL path:

Action  HTTP Verb  URL Path                 Description
Create  POST       /api/people              Defines a unique URL to create a new person
Read    GET        /api/people              Defines a unique URL to read a collection of people
Read    GET        /api/people/{person_id}  Defines a unique URL to read a particular person by person_id
Update  PUT        /api/people/{person_id}  Defines a unique URL to update an existing person by person_id
Delete  DELETE     /api/people/{person_id}  Defines a unique URL to delete an existing person by person_id

Where the URL definitions required an lname value, they now require the person_id (primary key) for the person record in the person table. This allows you to remove the code in the previous app that artificially restricted users from editing a person’s last name.

In order for you to implement these changes, the swagger.yml file from Part 1 will have to be edited. For the most part, any lname parameter value will be changed to person_id, and person_id will be added to the POST and PUT responses. You can check out the updated swagger.yml file.

Update the REST API Handlers

With the swagger.yml file updated to support the use of the person_id identifier, you’ll also need to update the handlers in the people.py file to support these changes. In the same way that the swagger.yml file was updated, you need to change the people.py file to use the person_id value rather than lname.

Here’s part of the updated people.py module showing the handler for the REST URL endpoint GET /api/people:

 1 from flask import (
 2     make_response,
 3     abort,
 4 )
 5 from config import db
 6 from models import (
 7     Person,
 8     PersonSchema,
 9 )
10
11 def read_all():
12     """
13     This function responds to a request for /api/people
14     with the complete lists of people
15
16     :return:        json string of list of people
17     """
18     # Create the list of people from our data
19     people = Person.query \
20         .order_by(Person.lname) \
21         .all()
22
23     # Serialize the data for the response
24     person_schema = PersonSchema(many=True)
25     return person_schema.dump(people).data

Here’s what the above code is doing:

  • Lines 1 – 9 import some Flask modules to create the REST API responses, as well as importing the db instance from the config.py module. In addition, it imports the SQLAlchemy Person and Marshmallow PersonSchema classes to access the person database table and serialize the results.

  • Line 11 starts the definition of read_all() that responds to the REST API URL endpoint GET /api/people and returns all the records in the person database table sorted in ascending order by last name.

  • Lines 19 – 22 tell SQLAlchemy to query the person database table for all the records, sort them in ascending order (the default sorting order), and return a list of Person Python objects as the variable people.

  • Line 24 is where the Marshmallow PersonSchema class definition becomes valuable. You create an instance of the PersonSchema, passing it the parameter many=True. This tells PersonSchema to expect an iterable to serialize, which is what the people variable is.

  • Line 25 uses the PersonSchema instance variable (person_schema), calling its dump() method with the people list. The result is an object having a data attribute, an object containing a people list that can be converted to JSON. This is returned and converted by Connexion to JSON as the response to the REST API call.

Note: The people list variable created on Lines 19 – 21 above can’t be returned directly because Connexion won’t know how to convert the timestamp field into JSON. Returning the list of people without processing it with Marshmallow results in a long error traceback and finally this Exception:

TypeError: Object of type Person is not JSON serializable

Here’s another part of the people.py module that makes a request for a single person from the person database. Here, the read_one(person_id) function receives a person_id from the REST URL path, indicating the user is looking for a specific person. Here’s part of the updated people.py module showing the handler for the REST URL endpoint GET /api/people/{person_id}:

 1 def read_one(person_id):
 2     """
 3     This function responds to a request for /api/people/{person_id}
 4     with one matching person from people
 5
 6     :param person_id:   ID of person to find
 7     :return:            person matching ID
 8     """
 9     # Get the person requested
10     person = Person.query \
11         .filter(Person.person_id == person_id) \
12         .one_or_none()
13
14     # Did we find a person?
15     if person is not None:
16
17         # Serialize the data for the response
18         person_schema = PersonSchema()
19         return person_schema.dump(person).data
20
21     # Otherwise, nope, didn't find that person
22     else:
23         abort(404, 'Person not found for Id: {person_id}'.format(person_id=person_id))

Here’s what the above code is doing:

  • Lines 10 – 12 use the person_id parameter in a SQLAlchemy query using the filter method of the query object to search for a person with a person_id attribute matching the passed-in person_id. Rather than using the all() query method, use the one_or_none() method to get one person, or return None if no match is found.

  • Line 15 determines whether a person was found or not.

  • Line 18 shows that, if person was not None (a matching person was found), then serializing the data is a little different. You don’t pass the many=True parameter when creating the PersonSchema() instance; many defaults to False, because only a single object is passed in to serialize.

  • Line 19 is where the dump method of person_schema is called, and the data attribute of the resulting object is returned.

  • Line 23 shows that, if person was None (a matching person wasn’t found), then the Flask abort() method is called to return an error.

Another modification to people.py is creating a new person in the database. This gives you an opportunity to use the Marshmallow PersonSchema to deserialize a JSON structure sent with the HTTP request to create a SQLAlchemy Person object. Here’s part of the updated people.py module showing the handler for the REST URL endpoint POST /api/people:

 1 def create(person):
 2     """
 3     This function creates a new person in the people structure
 4     based on the passed-in person data
 5
 6     :param person:  person to create in people structure
 7     :return:        201 on success, 409 on person exists
 8     """
 9     fname = person.get('fname')
10     lname = person.get('lname')
11
12     existing_person = Person.query \
13         .filter(Person.fname == fname) \
14         .filter(Person.lname == lname) \
15         .one_or_none()
16
17     # Can we insert this person?
18     if existing_person is None:
19
20         # Create a person instance using the schema and the passed-in person
21         schema = PersonSchema()
22         new_person = schema.load(person, session=db.session).data
23
24         # Add the person to the database
25         db.session.add(new_person)
26         db.session.commit()
27
28         # Serialize and return the newly created person in the response
29         return schema.dump(new_person).data, 201
30
31     # Otherwise, nope, person exists already
32     else:
33         abort(409, f'Person {fname} {lname} exists already')

Here’s what the above code is doing:

  • Lines 9 & 10 set the fname and lname variables based on the Person data structure sent as the POST body of the HTTP request.

  • Lines 12 – 15 use the SQLAlchemy Person class to query the database for the existence of a person with the same fname and lname as the passed-in person.

  • Line 18 checks whether existing_person is None, meaning no matching person was found and it’s safe to insert one.

  • Line 21 creates a PersonSchema() instance called schema.

  • Line 22 uses the schema variable to load the data contained in the person parameter variable and create a new SQLAlchemy Person instance variable called new_person.

  • Line 25 adds the new_person instance to the db.session.

  • Line 26 commits the new_person instance to the database, which also assigns it a new primary key value (based on the auto-incrementing integer) and a UTC-based timestamp.

  • Line 33 shows that, if existing_person is not None (a matching person was found), then the Flask abort() method is called to return an error.

Update the Swagger UI

With the above changes in place, your REST API is now functional. The changes you’ve made are also reflected in an updated swagger UI interface and can be interacted with in the same manner. Below is a screenshot of the updated swagger UI opened to the GET /people/{person_id} section. This section of the UI gets a single person from the database and looks like this:

Swagger UI Complete Part 2

As shown in the above screenshot, the path parameter lname has been replaced by person_id, which is the primary key for a person in the REST API. The changes to the UI are a combined result of changing the swagger.yml file and the code changes made to support that.

Update the Web Application

The REST API is running, and CRUD operations are being persisted to the database. To keep the demonstration web application working, the JavaScript code has to be updated.

The updates are again related to using person_id instead of lname as the primary key for person data. In addition, the person_id is attached to the rows of the display table as HTML data attributes named data-person-id, so the value can be retrieved and used by the JavaScript code.

This article focused on the database and making your REST API use it, which is why there’s just a link to the updated JavaScript source and not much discussion of what it does.

Example Code

All of the example code for this article is available here. There’s one version of the code containing all the files, including the build_database.py utility program and the server.py modified example program from Part 1.

Conclusion

Congratulations, you’ve covered a lot of new material in this article and added useful tools to your arsenal!

You’ve learned how to save Python objects to a database using SQLAlchemy. You’ve also learned how to use Marshmallow to serialize and deserialize SQLAlchemy objects and use them with a JSON REST API. The things you’ve learned have certainly been a step up in complexity from the simple REST API of Part 1, but that step has given you two very powerful tools to use when creating more complex applications.

SQLAlchemy and Marshmallow are amazing tools in their own right. Using them together gives you a great leg up to create your own web applications backed by a database.

In Part 3 of this series, you’ll focus on the R part of RDBMS: relationships, which provide even more power when you are using a database.



Will Kahn-Greene: Bleach v3.0.0 released!

What is it?

Bleach is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML.

Bleach v3.0.0 released!

Bleach 3.0.0 focused on easing the problems with the html5lib dependency and fixing regressions created in the Bleach 2.0 rewrite.

For the first, I vendored html5lib 1.0.1 into Bleach and wrote a shim module. Bleach code uses things in the shim module which import things from html5lib. In this way I:

  1. keep the two separated to some extent
  2. the shim is easy to test on its own
  3. it shouldn't be too hard to update html5lib versions
  4. we don't have to test Bleach against multiple versions of html5lib (which took a lot of time)
  5. no one has to deal with Bleach requiring one version of html5lib and other libraries requiring other versions

I think this is a big win for all of us.

The second was tricky. The Bleach 2.0 rewrite changed clean and linkify from running in the tokenizing step of HTML parsing to running after parsing is done. The parser (un)helpfully would clean up the HTML before passing it to Bleach. Because of that, the cleaned text would end up with all this extra stuff.

For example, with Bleach 2.1.4, you'd have this:

>>> import bleach
>>> bleach.clean('This is terrible.<sarcasm>')
'This is terrible.&lt;sarcasm&gt;&lt;/sarcasm&gt;'

The tokenizer would parse out things that looked like HTML tags; the parser would see an end tag that didn't have a start tag and would add the start tag; then clean would escape the start and end tags because they weren't in the list of allowed tags. Blech.

Bleach 3.0.0 fixes that by tweaking the tokenizer to know about the list of allowed tags. With this knowledge, it can see a start, end, or empty tag and strip or escape it during tokenization. Then the parser doesn't try to fix anything.

With Bleach 3.0.0, we get this:

>>> import bleach
>>> bleach.clean('This is terrible.<sarcasm>')
'This is terrible.&lt;sarcasm&gt;'

What I could use help with

I could use help with improving the documentation. I think it's dense and all over the place focus-wise. I find it difficult to read.

If you're good with documentation, I sure could use your help. See issue 397 for more.

Where to go for more

For more specifics on this release, see here: https://bleach.readthedocs.io/en/latest/changes.html#version-3-0-0-october-3rd-2018

Documentation and quickstart here: https://bleach.readthedocs.org/en/

Source code and issue tracker here: https://github.com/mozilla/bleach

Sumana Harihareswara - Cogito, Ergo Sumana: Tidelift Is Paying Maintainers And, Potentially, Fixing the Economics of an Industry

As the founder of Changeset Consulting, I keep my eye on consultancies and services in and near my niche, open source leadership, maintainership, and sustainability.* And I've known Luis Villa for years and got to work with him at Wikimedia. So yeah, I noticed when Tidelift announced its big new launch. And -- now, as a very-part-time consultant who helps Tidelift understand the Python world -- I am excited about their commitment to pay more than USD$1 million to maintainers (including "a guaranteed minimum $10,000 over the next 24 months to select projects").

Here's my take on the new Tidelift subscription model, the "lifter" role, and whom this works for.

For software businesses, this provides that missing vendor relationship, SLA, release cadence expectations, and general peace of mind for all of that unseen infrastructure you depend on. It's often easier for businesses -- of many sizes -- to pay a regular fee than to put open source project management work, dependency-updating, compliance checking, dependency security audits, or FLOSS volunteer relations on the engineering schedule.

For individual programmers and community-maintained open source projects, Tidelift is a potential source of substantial income. As a Pythonist, I hope to reach people who are currently core code contributors to open source projects in Python, especially on the Libraries.io digital infrastructure/unseen infrastructure/improve the bus factor lists. And I would like to reach projects like the ones Nathaniel Smith calls out in a recent post:

that (1) require a modest but non-trivial amount of sustained, focused attention, and (2) have an impact that is large, but broad and diffuse
and projects in the "wide open", "specialty library", and "upstream dependency" categories identified by the Open Tech Strategies report "Open Source Archetypes: A Framework For Purposeful Open Source".

For such people and projects, becoming a lifter is a promising model -- especially since the required tasks are fairly few, and are things maintainers should do anyway. I'm encouraged to see Jeff Forcier (maintainer of Fabric, Alabaster, and more) and Ned Batchelder's coverage.py getting onto the Tidelift platform.

And you can see estimated monthly income for your package right now. For some people, especially those whose healthcare doesn't depend on an employer, Tidelift payments plus some side consulting could be a sustainable, comfortable income.

Then there are folks like me whose contributions are only partially visible in commit logs (management, user support, testing, and so on), and groups that work together best as a team. Tidelift is also a potential source of income for us, but it's a little more complicated. Tidelift can send lifter payments to individuals, for-profits, and nonprofits, but: "If a package has multiple co-maintainers, you'll need to agree as a group on an approach." If you thought code of conduct conversations with your community were uncomfortable, wait till you bring up money! But, more seriously: I've been able to talk frankly with open source colleagues about thorny "who gets paid what?" questions, and if you're candid with your co-maintainers, the benefits may be pretty substantial. You can get advice on this conversation during the next live Tidelift web-based Q&A, Thursday, Oct. 11 at 2 p.m. Eastern Time (sign up at the bottom of the lifter info page).

Nonprofits, companies, and working groups that maintain projects can sign up now as lifters. Even if it's just a trickle of money right now, it might build over time and turn into enough to fund travel for an in-person sprint, contract work to improve continuous integration, an Outreachy internship, etc.

(One gap here: right now, Tidelift isn't great at supporting system-level packages and projects, like tools that get installed via apt or yum/DNF. I'm pretty sure that's something they're working on.)

What about noncommercial users or users who can't afford Tidelift subscriptions? The more lifters and subscribers sign up, the more those users benefit, too. Subscribers' funding means maintainers have time to make improvements that help everyone. And lifters agree to follow security, maintenance, and licensing best practices that also help everyone. Plus, Tidelift stewards libraries.io, a great resource for anyone who uses or develops open source (more on that). More money for Tidelift could mean libraries.io gets better too.

So I'm tooting a horn here and hoping more people sign up, because this is one of the more plausible ways open source sustainability could possibly work. Tidelift could be a real game-changer for the industry. Check it out.


* Examples: new competitors like Maintainer Mountaineer and OpenTeam, new funders like OSS Capital, and colleagues/referrals like Open Tech Strategies, VM Brasseur, Otter Tech, and Authentic Engine.

Codementor: Why do I care about immutables in Python?


Codementor: Beginner web scraping with Python and Repl.it

In this beginner's guide to Python web scraping, we walk through how to retrieve data from websites. From interpreting the HTML source code of a website, to downloading it with Python, and extracting interesting elements, this guide will help you get started building your own automatic web scrapers.

PyBites: You don't need to be a Pro @ Python to crack the code of Pycon

I wanted to write this article to dispel any preconceived notion that you have to be an "expert" or "non-newbie" in order to get a lot of value from going to PyCon, the largest of the annual Python conferences. Along the way I want to use my personal experience to highlight some tips for success.

My Story

When I attended PyCon this past May, I had only been through about 20 pages of Matt Harrison's book Illustrated Guide to Python 3, three sections of Talk Python's JumpStart course, probably some PluralSight videos, and had been scared off by the Collections section of the #100DaysOfCode course on Talk Python. The phrase "bits and pieces" has never been a more appropriate descriptor.

I knew what a function was and about f-strings, but for all intents and purposes had a very basic level of knowledge with Python. I had yet to pip install anything or write my own class. Things like Github and virtual environments were like secret handshakes to a secret society that I knew nothing about.

And – you know what? I had an unbelievable, transformative experience. Part of it was luck – but not all of it.

My first Pycon

I arrived the night before the start of PyCon, which is preceded by two days of classes known as "tutorials". It was around 10:30, and I had some work to do. Instead of toiling away in my hotel room, I went down to the lobby bar. When I arrived, there was someone with a Talk Python t-shirt. Turns out there is a high degree of correlation between people wearing Python-related garb in a hotel in downtown Cleveland at the beginning of May and PyCon conference attendees 😊.

We struck up a conversation over some drinks. He was a network engineer from Arkansas. I was a SQL developer from Texas. We both liked Python and a good drink. It was comradery at first sip. The next day, during a class break on the first day of tutorials, I saw the same chap in a circle of people. Sam, the person I had met at the hotel bar the night before, could easily be spotted from a distance. He was about six feet tall with a clean-shaven noggin of a head, which served as my homing beacon in an otherwise crowded sea of people.

A fateful meeting

After I bumbled myself into the group of people, I recognized two of the individuals whom Sam was talking with. They were none other than the PyBites co-founders Julian and Bob themselves!!!! Even though I had barely used the PyBites website, I recognized their faces from their Twitter profiles and knew of them from their podcast appearance on Talk Python earlier that year. To say I had a mini "Internet" celebrity freak-out would be an understatement. Here I was meeting people I had heard on one of my favorite podcasts and who co-authored the #100DaysOfCode course on the Talk Python website. Certainly, that is not enough notoriety to get them on the cover of People magazine, but their story was very meaningful to me.

Later that same evening, a coworker who came with me to PyCon and I joined Sam, Bob, Julian and Sam's boss for drinks. That was just the start: I kept running into this same group of people over and over again. I even ended up helping Bob and Julian with their poster session for PyBites on Sunday. Poster sessions are a more informal, "show-and-tell" version of a talk at PyCon where you get to interact one-on-one with the "speaker".

Along the way, I volunteered at the conference registration desks, helped out as both a session chair and a session runner for some of the PyCon talks, participated in the "hallway track" (more on that later), attended a live taping of the Python Bytes podcast, and, occasionally, sat in on some actual talks. None of these required that I knew what pip install, package management or magic methods meant. Most important of all, I had a freaking blast.

What's the Secret Sauce?

In looking back, what are the actions or habits that I took that are repeatable and less dumb luck? The main takeaway from my experience is that the best part of the conference and the part that:

  1. requires the least amount of technical knowledge, and
  2. is most unique to the live experience of PyCon are the interactions that take place and the connections that are made with other conference attendees.

Engage with other people

With about 3500 conference attendees, if you don't form a passable connection with the first 50 people that you meet, there are 3450 other cracks at the bat. The important thing is to give yourself chances to meet and engage with other people. This involves the following:

  • Engaging in the "Hallway Track"
  • Volunteering
  • Attending "Open Spaces"
  • Spending time in the event space outside of where the talks are being held

The Hallway Track

The Hallway Track is the name given to everything outside of what's on the papered schedule. It is the conversations and chatter that occur "in the hallways". It is everything but the talks, open spaces, job fair, poster sessions, tutorials and keynotes. And it is critical to avail yourself to the Hallway Track as much as possible. Sit at a table with an open seat, walk up to a circle of people where there's an opening, say hi and strike up a conversation with as many strangers as possible.

Tip– Leaving a gap in a circle of people is called the "Pac-Man". The Pac-Man circle is highly encouraged by the Python community. Be sure to do the same thing if you find yourself in a circle of people. Closed circles give off the vibe of "members-only club" whereas a Pac-Man circle gives off the message of "come and join us if you like".

Conversation starters

Most of those people will not turn into your best friends, but there's a better than not chance that you will find some common ground with a lot of them. Everyone at PyCon has at least some common ground in that they have a personal or commercial interest in Python. You know what that means? There are some boilerplate, token questions that you can ask anyone:

  • Where are you traveling from?
  • How do you use Python in your studies or work?
  • What sparked your interest in learning Python?

In Python speak, think of it as an infinite loop where if you run into a conversation that feels flat or is not up your alley, just hit continue to start the loop again. The volume of people can be intimidating in certain respects but it also brings with it plenty of "freedom to fail". If a conversation falls flat or you say something embarrassing, chances are that person won't remember it anyway and you have plenty of other tokens to play at this game.

Volunteering

There is a plethora of opportunities to volunteer at PyCon, which is another natural way to rub shoulders and get an overall sense of being involved as part of this big heaping event. Volunteer opportunities include helping out with registration, swag-bag stuffing, and being a runner or session chair (introducing the speaker for a talk). Almost every stretch of the conference schedule has multiple time slots where help is needed.

Attending Open Spaces

Open Spaces are small group breakout sessions that touch upon whatever the speaker or group of speakers want to discuss. These ranged from an open session on mental health to one on job interview prep to Microsoft demoing VS Code. It really has a feeling of being more book clubbish than something more structured. These are not recorded and provide another low-key way to rub shoulders and interact with others.

Tip– Open invitations for late night social gatherings (e.g., bar hopping, card games) are often listed on the same whiteboard that "purely" Python Open Spaces are. So don't think the whiteboard is just for technical topics.

The Event Hall

At PyCon there is an event hall where companies (e.g., Microsoft, Google, IBM) or content creators (e.g., Michael Kennedy from Talk Python, O'Reilly Media) will have booths set up. People at these booths are there for one of three main reasons: 1) to sell you a product, 2) to convince you to use their software, or 3) to hire you.

Think of the event hall as retail shopping. You are not obligated to buy or commit to anything. But if something catches your attention, stop by, look around, and chat up the people staffing the booth. They are there to sell or advocate something to you, so most of the time the booth people are doing most of the talking.

My recommendation is to not overindulge on one specific outlet. Mixing and matching works best in my opinion. You may also prefer one option heavily over the others – but give the rest a chance too. There is plenty of opportunity over the two days of tutorial sessions, and over the three days and four nights of the main conference, to explore all of the above actions in a non-trivial manner.

In Summary

You do not need to do 100% of what I have laid out in this post. But if you pony up the registration fee, hotel and travel costs only to avail yourself to a one-day early preview of content that will be on YouTube in the blink of an eye, you are wasting your time and money.

Go to PyCon for the people and enjoy the moments that cannot be put on a flash drive and uploaded to the cloud. You do not need to be an expert at Python to be a pro at attending PyCon. You just need a smidgen of courage and the belief that more times than not – people aren't half bad. Challenge yourself to talk to at least 5 new people each day. I think you will find that the Python community is a special brew of people. There is no better advocate of this notion than Brett Cannon in his opening remarks of PyCon 2014. In reflecting back on his early days with the Python language and community:

The community was smaller but it honestly felt the same: it was an extremely welcoming, friendly, constructive group of people who were always willing to let people come in, help them out, and let them enjoy programming… Now, I don't know about the rest of you, but I like to think of it as I came for the language but I stayed for the community. So I want to personally thank all of you for making this such a wonderful place to be and such a wonderful group of people to be around.

I couldn't agree more, Brett.


Keep Calm and Code in Python!

-- Jason

Michał Bultrowicz: A simple self-modifying function in Python

Replacing its own definition is a fun/horrifying thing that a Python function can do:
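
(The post's own example isn't included in this excerpt; the following is an illustrative sketch of the idea, with a hypothetical greet() function that rebinds its own global name on first call:)

def greet():
    def replacement():
        print("I have been replaced!")
    # Rebind the module-level name 'greet' to the new function,
    # so every later call runs the replacement instead
    globals()['greet'] = replacement
    print("Original greet, replacing myself now...")

greet()  # Original greet, replacing myself now...
greet()  # I have been replaced!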

Evennia: Evennia in Hacktoberfest 2018

Like last year, Evennia, the Python MUD creation system, takes part in Hacktoberfest, a yearly event run by DigitalOcean in collaboration with GitHub.

The premise is simple: Sign up at their website and then contribute with 5 GitHub pull requests during the month of October. If you do, you'll win a unique T-shirt!

You can help out any OSS project to win. If you want to help out Evennia, I have marked a bunch of suitable issues with the Hacktoberfest label for you to sink your teeth into.

Code on!

Python Software Foundation: Join the 2018 Python Developers Survey: Share and learn about the community

2018 is drawing to a close and we are excited to start the official Python Developers Survey for 2018!

In 2017, the Python Software Foundation, together with JetBrains, conducted an official Python Developers Survey for the first time. Over 9,500 developers from almost 150 different countries participated to help us map out an accurate landscape of the Python community.

With this second iteration of the official Python Developers Survey, we aim to identify how the Python development world looks today and how it compares to last year. The results of the survey will serve as a major source of knowledge about the current state of the Python community, so we encourage you to participate and make an invaluable contribution to this community resource. The survey takes approximately 10 minutes to complete.

Please take a few minutes to complete the Python Developers Survey 2018!

Your valuable opinion and feedback will help us better understand how different Python developers use Python, related frameworks, tools, and technologies. We also hope you'll have fun going through the questions.

The survey is organized in partnership between the Python Software Foundation and JetBrains. After the survey is over, we will publish the aggregated results and randomly choose 100 winners (those who complete the survey in its entirety), who will each receive an amazing Python Surprise Gift Pack.