
Rene Dudfield: Draft of "Let's write a unit test!"

(BeginDraft)

So, I started writing this for people who want to 'contribute' to Free Libre and Open source projects.
It's not finished yet, but still useful, and I'd like a bit of feedback, and to start linking to it from the pygame developer docs. So there.
(/EndDraft)

A unit test is a piece of code which tests one thing works well in isolation from other parts of software. In this guide, I'm going to explain how to write one using the standard python unittest module, for the pygame game library. You can apply this advice to most python projects, or free/libre open source projects in general.

A minimal test.

What pygame.draw.ellipse should do: http://www.pygame.org/docs/ref/draw.html#pygame.draw.ellipse
Where to put the test: https://github.com/pygame/pygame/blob/master/test/draw_test.py

def test_ellipse(self):
    import pygame.draw
    surf = pygame.Surface((320, 200))
    pygame.draw.ellipse(surf, (255, 0, 0), (10, 10, 25, 20))

All the test does is call the draw function on the surface with a color and a rectangle. That's it. A minimal, useful test. If you have a github account, you can even edit the test file in the browser to submit your PR. If you have email or internet access, you can email me or someone else on the internet and ask them to add it to pygame.

But why write a unit test anyway?

Unit tests help pygame make sure things don't break on multiple platforms. When your code is running on dozens of CPUs and just as many operating systems things get a little tricky to test manually. So we write a unit test and let all the build robots do that work for us.

A great way to contribute to libre/free and open source projects is to contribute a test. Fewer bugs in the library means fewer bugs in your own code. Additionally, you get some public credit for your contribution.

The best part about it, is that it's a great way to learn python, and about the thing you are testing. Want to know how graphics algorithms should work, in lots of detail? Start writing tests for them.
The simplest test is to just call the function. Just calling it is a great first test. Easy, and useful.

At the time of writing there are 39 functions that aren't even called when running the pygame tests. Why not join me on this adventure?


Let's write a unit test!

In this guide I'm going to write a test for pygame.draw.ellipse to make sure a thick circle has the correct colors in it, and not lots of black spots. There are a bunch of tips and tricks to help you along your way. Whilst you can just edit a test in your web browser and submit a PR, it might be more comfortable to do it in your normal development environment.

Grab a fork, and let's dig in.

Set up git for github if you haven't already. Then you'll want to 'fork' pygame on https://github.com/pygame/pygame so you have your own local copy.
Note, we also accept patches by email, or on github issues. So you can skip all this github business if you want to. https://www.pygame.org/wiki/patchesandbugs
  • Fork the repository (see top right of the pygame repo page)
  • Make the change locally. Push to your copy of the fork.
  • Submit a pull request
So you've forked the repo, and now you can clone your own copy of the git repo locally.

$ git clone https://github.com/YOUR-USERNAME/pygame
$ cd pygame/
$ python test/draw_test.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.007s

OK

You'll see all of the tests in the test/ folder.

Browse the test folder online: https://github.com/pygame/pygame/tree/master/test


If you have an older version of pygame, you can use this little program to see the issue.


There is some more extensive documentation in the test/README file, including how to write a test that requires manual interaction.


Standard unittest module.

pygame uses the standard python unittest module, with a few enhancements to make it nicer for developing C code.
Fun fact: pygame included the unit testing module before python did.
We will go over the basics in this guide, but for more detailed information please see:
https://docs.python.org/3/library/unittest.html



How to run a single test?

Running all the tests at once can take a while. What if you just want to run a single test?

If we look inside draw_test.py, each test is a method of a test class. There is a "DrawModuleTest" class, and there should be a "def test_ellipse" method inside it.

So, let's run the test...

~/pygame/ $ python test/draw_test.py DrawModuleTest.test_ellipse
Traceback (most recent call last):
...
AttributeError: type object 'DrawModuleTest' has no attribute 'test_ellipse'


Starting with failure. Our test isn't there yet.

Good. This fails. It's because we don't have a test called "def test_ellipse" in there yet. What there is, is a method called 'todo_test_ellipse'. This is an extension the pygame testing framework has so we can easily see which functionality we still need to write tests for.

~/pygame/ $ python run_tests.py --incomplete
...
FAILED (errors=39)

Looks like there are currently 39 functions or methods without a test. Easy pickings.


Digression: Low hanging fruit, help wanted. 

Something that's easy to do.

A little digression for a moment... what is low hanging fruit?

Low hanging fruit is easy to get off the tree. You don't need a ladder, or robot arms with a claw on the end. So I guess that's what people are talking about in the programming world when they say "low hanging fruit".

pygame low hanging fruit


Many projects keep a list of "low hanging fruit", or "help wanted" issues, like the pygame low hanging fruit list. These are issues other people don't think will be all that hard to do. If you can't find any labeled like this, then ask. Perhaps the developers will know of something easy to do, but haven't had the time to mark it yet.

One little trick is that writing a simple test is quite easy for most projects. So if they don't have any marked "low hanging fruit", go take a look in their test folder and see if you can add something in there.

Don't be afraid to ask questions. If you look at an issue, and you can't figure it out, or get stuck on something, ask a nice question in there for help.

Digression: Contribution guide.

There's usually also a contribution guide.  Like the pygame Contribute wiki page. Or it may be called developer docs, or there may be a CONTRIBUTING.md file in the source code repository. Often there is a separate place the developers talk on. For pygame it is the pygame mailing list, but there is also a chat server. Find the details on the Info wiki page.

Back to the test.

The unittest module arranges tests inside functions that start with "test_" that live in a class.
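To make that concrete, here is a rough sketch of what the finished thick-ellipse test might look like. The DrawModuleTest class name comes from the existing test/draw_test.py mentioned above, but the method name and the exact assertions are only an illustration of the idea, not the final code.

import unittest

import pygame
import pygame.draw


class DrawModuleTest(unittest.TestCase):

    def test_ellipse__thick_line_colors(self):
        """A thick red ellipse on a black surface should only leave
        red or black pixels, with no stray colors."""
        surf = pygame.Surface((320, 200))
        black = (0, 0, 0, 255)
        red = (255, 0, 0, 255)

        # A width of 5 asks for a thick outline rather than a filled ellipse.
        pygame.draw.ellipse(surf, red, (10, 10, 100, 80), 5)

        for x in range(surf.get_width()):
            for y in range(surf.get_height()):
                # Every pixel is either untouched background or ellipse color.
                self.assertIn(tuple(surf.get_at((x, y))), (black, red))


if __name__ == "__main__":
    unittest.main()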


[TODO: empty todo_test with whole class]
[TODO: image of what is wrong, moire pattern]
[TODO: show how we test that function]
[TODO: show pull request travis/appveyor running tests]







Michael Foord: A Very Short Love Letter to Agile


Resolver Systems 2009

The photo shows some of the Resolver Systems crew enjoying a meal together at the 2009 EuroPython in Birmingham.

I love the word rigour. It conveys strict discipline, or something that was really hard work, or both.

I’ve found the rigorous application of theoretical principles a really useful way of learning those principles. Learning what they really mean, and what those principles are good at achieving and what they’re not good at achieving.

I’ve been rigorous in my discipline in meditating. I’ve meditated for an hour a day, generally six days a week, for a number of years now. A dedicated and regular practise of focus and letting go of distractions, which is the substance of mindfulness practise, has made a difference in my life and my understanding of myself.

My trade is as a software engineer, a computer programmer. I taught myself to program by becoming really passionate about it. What you love you learn. I learned the art and craft of engineering in my first professional job, at a small startup in London called Resolver Systems.

There, for the four years I worked there, we rigorously applied the principles of Extreme Programming, a strict variant and really the progenitor of the “agile” movement. The goal is engineering processes that make the development process agile and fluid, able to change direction quickly and able to check that it is continuously delivering value in the work being done, whilst also creating the software in ways that ensure as much as is possible you are creating a quality and useful product.

This includes full Test Driven Design (often now called "test first" in a great misunderstanding of the value of TDD), with full test coverage from unit to functional (full stack). We had about three to four times more test code than production code. We built a beautiful thing.

It also included full pair programming. So for four years we worked together and thought together and learned together. The product failed, unfortunately, an idea ahead of its time. With Python finally being added to Excel as a scripting language it’s possible that the idea of applying proper engineering principles to the creation of complex spreadsheets may have its day after all.

Contrary to perhaps what we thought in the honeymoon phase of learning "agile", it isn't a magic silver bullet for curing all ills of software engineering. However, agile processes do stand in stark contrast to the traditional "waterfall" model of software engineering. A strongly waterfall process is very hard to change, which is precisely why it's such a bad idea. Perhaps we can summarise the essence of agile as "finding engineering processes and practises that are able to evolve and suit the specific needs of the team, product and customers". Being able to change matters; being stuck is awful.

This post originally appeared on my personal blog A Love Letter to Agile on Unpolished Musings.

Matthew Rocklin: Dask Development Log, Scipy 2018


This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Last week many Dask developers gathered for the annual SciPy 2018 conference. As a result, very little work was completed, but many projects were started or discussed. To reflect this change in activity this blogpost will highlight possible changes and opportunities for readers to further engage in development.

Dask on HPC Machines

The dask-jobqueue project was a hit at the conference. Dask-jobqueue helps people launch Dask on traditional job schedulers like PBS, SGE, SLURM, Torque, LSF, and others that are commonly found on high performance computers. These are very common among scientific, research, and high performance machine learning groups, but are often a bit hard to use with anything other than MPI.
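For readers who haven't seen it, here is a rough sketch (not one of the known-good configurations mentioned below) of launching Dask through dask-jobqueue on a PBS machine; the queue name, resource sizes, and walltime are made up and will differ on your cluster.

from dask.distributed import Client
from dask_jobqueue import PBSCluster

# Each job submitted to PBS becomes a Dask worker with these resources.
cluster = PBSCluster(
    queue="regular",       # hypothetical queue name
    cores=24,
    memory="100GB",
    walltime="01:00:00",
)
cluster.scale(10)          # ask the batch scheduler for ten such jobs

client = Client(cluster)   # point Dask computations at those workers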

This project came up in the Pangeo talk, lightning talks, and the Dask Birds of a Feather session.

During sprints a number of people came up and we went through the process of configuring Dask on common supercomputers like Cheyenne, Titan, and Cori. This process usually takes around fifteen minutes and will likely be the subject of a future blogpost. We published known-good configurations for these clusters on our configuration documentation.

Additionally, there is a JupyterHub issue to improve documentation on best practices to deploy JupyterHub on these machines. The community has done this well a few times now, and it might be time to write up something for everyone else.

Get involved

If you have access to a supercomputer then please try things out. There is a 30-minute Youtube video screencast on the dask-jobqueue documentation that should help you get started.

If you are an administrator on a supercomputer you might consider helping to build a configuration file and place it in /etc/dask for your users. You might also want to get involved in the JupyterHub on HPC conversation.

Dask / Scikit-learn talk

Olivier Grisel and Tom Augspurger prepared and delivered a great talk on the current state of the new Dask-ML project.

MyBinder and Bokeh Servers

Not a Dask change, but Min Ragan-Kelley showed how to run services through mybinder.org that are not only Jupyter. As an example, here is a repository that deploys a Bokeh server application with a single click.

I think that by composing with Binder, Min effectively just created a free-to-use hosted Bokeh server service. Presumably this same model could be adapted to other applications just as easily.

Dask and Automated Machine Learning with TPOT

Dask and TPOT developers are discussing parallelizing the automatic-machine-learning tool TPOT.

TPOT uses genetic algorithms to search over a space of scikit-learn style pipelines to automatically find a decently performing pipeline and model. This involves a fair amount of computation which Dask can help to parallelize out to multiple machines.

Get involved

Trivial things work now, but to make this efficient we’ll need to dive in a bit more deeply. Extending that pull request to dive within pipelines would be a good task if anyone wants to get involved. This would help to share intermediate results between pipelines.

Dask and Scikit-Optimize

Among various features, Scikit-optimize offers a BayesSearchCV object that is like Scikit-Learn’s GridSearchCV and RandomSearchCV, but is a bit smarter about how to choose new parameters to test given previous results. Hyper-parameter optimization is a low-hanging fruit for Dask-ML workloads today, so we investigated how the project might help here.
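As a sketch of the kind of experiment this involves (the estimator and search space below are placeholders, and the joblib "dask" backend assumes a running Dask client and reasonably recent library versions), a BayesSearchCV search can be fanned out to Dask workers roughly like this:

import joblib
from dask.distributed import Client
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from skopt import BayesSearchCV

client = Client()  # local Dask cluster; registers the "dask" joblib backend

X, y = load_digits(return_X_y=True)
search = BayesSearchCV(
    SVC(),
    {"C": (1e-3, 1e3, "log-uniform"), "gamma": (1e-4, 1e-1, "log-uniform")},
    n_iter=16,
    cv=3,
)

# Each cross-validated fit becomes a joblib task that Dask can schedule.
with joblib.parallel_backend("dask"):
    search.fit(X, y)

print(search.best_params_)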

So far we’re just experimenting using Scikit-Learn/Dask integration through joblib to see what opportunities there are. Dicussion among Dask and Scikit-Optimize developers is happening here:

Centralize PyData/Scipy tutorials on Binder

We’re putting a bunch of the PyData/Scipy tutorials on Binder, and hope to embed snippets of Youtube videos into the notebooks themselves.

This effort lives here:

Motivation

The PyData and SciPy community delivers tutorials as part of most conferences. This activity generates both educational Jupyter notebooks and explanatory videos that teach people how to use the ecosystem.

However, this content isn’t very discoverable after the conference. People can search on Youtube for their topic of choice and hopefully find a link to the notebooks to download locally, but this is a somewhat noisy process. It’s not clear which tutorial to choose and it’s difficult to match up the video with the notebooks during exercises. We’re probably not getting as much value out of these resources as we could be.

To help increase access we’re going to try a few things:

  1. Produce a centralized website with links to recent tutorials delivered for each topic
  2. Ensure that those notebooks run easily on Binder
  3. Embed sections of the talk on Youtube within each notebook so that the explanation of the section is tied to the exercises

Get involved

This only really works long-term under a community maintenance model. So far we’ve only done a few hours of work and there is still plenty to do in the following tasks:

  1. Find good tutorials for inclusion
  2. Ensure that they work well on mybinder.org
    • are self-contained and don’t rely on external scripts to run
    • have an environment.yml or requirements.txt
    • don’t require a lot of resources
  3. Find video for the tutorial
  4. Submit a pull request to the tutorial repository that embeds a link to the youtube talk at the top cell of the notebook at the proper time for each notebook

Dask, Actors, and Ray

I really enjoyed the talk on Ray, another distributed task scheduler for Python. I suspect that Dask will steal ideas for actors for stateful operation. I hope that Ray takes on ideas for using standard Python interfaces so that more of the community can adopt it more quickly. I encourage people to check out the talk and give Ray a try. It's pretty slick.

Planning conversations for Dask-ML

Dask and Scikit-learn developers had the opportunity to sit down again and raise a number of issues to help plan near-term development. This focused mostly around building important case studies to motivate future development, and identifying algorithms and other projects to target for near-term integration.

Case Studies

Algorithms

Get involved

We could use help in building out case studies to drive future development in the project. There are also several algorithmic places to get involved. Dask-ML is a young and fast-moving project with many opportunities for new developers to get involved.

Dask and UMAP for low-dimensional embeddings

Leland McInnes gave a great talk, Uniform Manifold Approximation and Projection for Dimensionality Reduction, in which he lays out a well-founded algorithm for dimensionality reduction, similar to PCA or t-SNE, but with some nice properties. He worked together with some Dask developers, where we identified some challenges due to dask array slicing with random-ish slices.

A proposal to fix this problem lives here, if anyone wants a fun problem to work on:

Dask stories

We soft-launched Dask Stories, a webpage and project to collect and share stories about how people use Dask in practice. We're also delivering a separate blogpost about this today.

See blogpost: Who uses Dask?

If you use Dask and want to share your story we would absolutely welcome your experience. Having people like yourself share how they use Dask is incredibly important for the project.

Dataquest: Basic Statistics in Python: Probability


When studying statistics, you will inevitably have to learn about probability. It is easy to lose yourself in the formulas and theory behind probability, but it has essential uses in both working and daily life. We've previously discussed some basic concepts in descriptive statistics; now we'll explore how statistics relates to probability.

Prerequisites:

Similar to the previous post, this article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python. If you are uncomfortable with for loops and lists, I recommend covering them briefly before progressing.

What is probability?

At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" An event is some outcome of interest. To calculate the chance of an event happening, we also need to consider all the other events that can occur.

The quintessential representation of probability is the humble coin toss. In a coin toss the only events that can happen are:

  1. Flipping a heads
  2. Flipping a tails

These two events form the sample space, the set of all possible events that can happen. To calculate the probability of an event occurring, we count how many times our event of interest can occur (say, flipping heads) and divide it by the total number of events in the sample space. Thus, probability tells us that an ideal coin has a 1-in-2 chance of being heads or tails. By looking at the events that can occur, probability gives us a framework for making predictions about how often events will happen.
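As a toy illustration of that definition, the coin's probability of heads is just the count of favorable outcomes divided by the size of the sample space:

sample_space = ["heads", "tails"]   # every event that can happen
event = ["heads"]                   # the outcome we care about

probability = len(event) / len(sample_space)
print(probability)  # 0.5, i.e. a 1-in-2 chance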

However, even though it seems obvious, if we actually try to toss some coins, we're likely to get an abnormally high or low counts of heads every once in a while. If we don't want to make the assumption that the coin is fair, what can we do? We can gather data! We can use statistics to calculate probabilities based on observations from the real world and check how it compares to the ideal.

From statistics to probability

Our data will be generated by flipping a coin 10 times and counting how many times we get heads. We will call a set of 10 coin tosses a trial. Our data point will be the number of heads we observe. We may not get the "ideal" 5 heads, but we won't worry too much since one trial is only one data point.

If we perform many, many trials, we expect the average number of heads over all of our trials to approach 5, the 50% of tosses we expect from an ideal coin. The code below simulates 10, 100, 1,000, and 1,000,000 trials, and then calculates the average number of heads observed. Our process is summarized in the image below as well.

Coin Example

import random

def coin_trial():
    # One trial: toss the coin 10 times and count the heads.
    heads = 0
    for i in range(10):
        if random.random() <= 0.5:
            heads += 1
    return heads

def simulate(n):
    trials = []
    for i in range(n):
        trials.append(coin_trial())
    return sum(trials) / n
   
simulate(10)
>>> 5.4

simulate(100)
>>> 4.83

simulate(1000)
>>> 5.055

simulate(1000000)
>>> 4.999781

The coin_trial function is what represents a simulation of 10 coin tosses. It uses the random() function to generate a float between 0 and 1, and counts a head if the value falls in the lower half of that range. Then, simulate repeats these trials as many times as you'd like, returning the average number of heads across all of the trials.

The coin toss simulations give us some interesting results. First, the data confirm that our average number of heads does approach what probability suggests it should be. Furthermore, this average improves with more trials. In 10 trials, there's some slight error, but this error almost disappears entirely with 1,000,000 trials. As we get more trials, the deviation away from the average decreases. Sound familiar?

Sure, we could have flipped the coin ourselves, but Python saves us a lot of time by allowing us to model this process in code. As we get more and more data, the real-world starts to resemble the ideal. Thus, given enough data, statistics enables us to calculate probabilities using real-world observations. Probability provides the theory, while statistics provides the tools to test that theory using data. The descriptive statistics, specifically mean and standard deviation, become the proxies for the theoretical.

You may ask, "Why would I need a proxy if I can just calculate the theoretical probability itself?" Coin tosses are a simple toy example, but the more interesting probabilities are not so easily calculated. What is the chance of someone developing a disease over time? What is the probability that a critical car component will fail when you are driving?

There is no easy way to calculate such probabilities from first principles, so we must fall back on using data and statistics to calculate them. Given more and more data, we can become more confident that what we calculate represents the true probability of these important events happening.

That being said, remember from our previous statistics post that you are a sommelier-in-training. You need to figure out which wines are better than others before you start purchasing them. You have a lot of data on hand, so we'll use our statistics to guide our decision.

The data and the distribution

Before we can tackle the question of "which wine is better than average," we have to mind the nature of our data. Intuitively, we'd like to use the scores of the wines to compare groups, but there is a problem: the scores usually fall in a range. How do we compare groups of scores between types of wines and know with some degree of certainty that one is better than the other?

Enter the normal distribution. The normal distribution refers to a particularly important phenomenon in the realm of probability and statistics. The normal distribution looks like this:

Normal Distribution Look

The most important qualities to notice about the normal distribution are its symmetry and its shape. We've been calling it a distribution, but what exactly is being distributed? It depends on the context.

In probability, the normal distribution is a particular distribution of the probability across all of the events. The x-axis takes on the values of events we want to know the probability of. The y-axis is the probability associated with each event, from 0 to 1. We haven't discussed probability distributions in-depth here, but know that the normal distribution is a particularly important kind of probability distribution.

In statistics, it is the values of our data that are being distributed. Here, the x-axis is the values of our data, and the y-axis is the count of each of these values. Here's the same picture of the normal distribution, but labelled according to a probability and statistical context:

Axes Comparison

In a probability context, the high point in a normal distribution represents the event with the highest probability of occurring. As you get farther away from this event on either side, the probability drops rapidly, forming that familiar bell-shape. The high point in a statistical context actually represents the mean. As in probability, as you get farther from the mean, you rapidly drop off in frequency. That is to say, extremely high and low deviations from the mean are present but exceedingly rare.

If you suspect there is another relationship between probability and statistics through the normal distribution, then you are correct in thinking so! We will explore this important relationship later in the article, so hold tight.

Since we'll be using the distribution of scores to compare different wines, we'll do some set up to capture some wines that we're interested in. We'll bring in the wine data and then separate out the scores of some wines of interest to us.

To bring back in the data, we need the following code:

import csv
with open("wine-data.csv", "r", encoding="latin-1") as f:
    wines = list(csv.reader(f))

The data is shown below in tabular form. We need the points column, so we'll extract this into its own list. We've heard from one wine expert that the Hungarian Tokaji wines are excellent, while a friend has suggested that we start with the Italian Lambrusco. We have the data to compare these wines!

If you don't remember what the data looks like, here's a quick table to reference and get reacquainted.

index | country | description                | designation                        | points | price | province       | region_1          | region_2          | variety            | winery
0     | US      | "This tremendous 100%..."  | Martha's Vineyard                  | 96     | 235   | California     | Napa Valley       | Napa              | Cabernet Sauvignon | Heitz
1     | Spain   | "Ripe aromas of fig..."    | Carodorum Selecci Especial Reserva | 96     | 110   | Northern Spain | Toro              |                   | Tinta de Toro      | Bodega Carmen Rodriguez
2     | US      | "Mac Watson honors..."     | Special Selected Late Harvest      | 96     | 90    | California     | Knights Valley    | Sonoma            | Sauvignon Blanc    | Macauley
3     | US      | "This spent 20 months..."  | Reserve                            | 96     | 65    | Oregon         | Willamette Valley | Willamette Valley | Pinot Noir         | Ponzi
4     | France  | "This is the top wine..."  | La Brelade                         | 95     | 66    | Provence       | Bandol            |                   | Provence red blend | Domaine de la Begude

# Extract the Tokaji scores
tokaji = []
non_tokaji = []
for wine in wines[1:]:  # skip the header row
    points = wine[4]
    if points != '':
        if wine[9] == "Tokaji":
            tokaji.append(float(points))
        else:
            non_tokaji.append(float(points))

# Extract the Lambrusco scores
lambrusco = []
non_lambrusco = []
for wine in wines[1:]:  # skip the header row
    points = wine[4]
    if points != '':
        if wine[9] == "Lambrusco":
            lambrusco.append(float(points))
        else:
            non_lambrusco.append(float(points))

If we visualize each group of scores as normal distributions, we can immediately tell if two distributions are different based on where they are. But we will quickly run into problems with this approach, as shown below. We assume the scores will be normally distributed since we have a ton of data. While that assumption is okay here, we'll discuss later when it may actually be dangerous to do so.

Visualized Normal Pairs

When the two score distributions overlap too much, it's probably better to assume they actually come from the same distribution and aren't different. On the other extreme with no overlap, it's safe to assume that the distributions aren't the same. Our trouble lies in the case of some overlap. Given that the extreme highs of one distribution may intersect with the extreme lows of another, how can we say if the groups are different?

Here, we must again call upon the normal distribution to give us an answer and a bridge between statistics and probability.

Revisiting the normal

The normal distribution is significant to probability and statistics thanks to two factors: the Central Limit Theorem and the Three Sigma Rule.

Central Limit Theorem

In the previous section, we demonstrated that if we repeated our 10-toss trials many, many times, the average heads-count of all of these trials will approach the 50% we expect from an ideal coin. The more trials we run, the closer that average approaches the true probability, even if the individual trials themselves are imperfect. This idea is a key tenet of the Central Limit Theorem.

In our coin-tossing example, a single trial of 10 throws produces a single estimate of what probability suggests should happen (5 heads). We call it an estimate because we know that it won't be perfect (i.e. we won't get 5 heads every time). If we make many estimates, the Central Limit Theorem dictates that the distribution of these estimates will look like a normal distribution. The zenith of this distribution will line up with the true value that the estimates should take on. In statistics, the peak of the normal distribution lines up with the mean, and that's exactly what we observed. Thus, given multiple "trials" as our data, the Central Limit Theorem suggests that we can hone in on the theoretical ideal given by probability, even when we don't know the true probability.

While the Central Limit Theorem lets us know that the average of many trials will approach the true mean, the Three Sigma Rule tells us how much the data will be spread out around this mean.

Three Sigma Rule

The Three Sigma rule, also known as the empirical rule or 68-95-99.7 rule, is an expression of how many of our observations fall within a certain distance of the mean. Remember that the standard deviation (a.k.a. "sigma") is the average distance an observation in the data set is from the mean.

The Three Sigma rule dictates that, given a normal distribution, 68% of your observations will fall within one standard deviation of the mean. 95% will fall within two, and 99.7% will fall within three. A lot of complicated math goes into the derivation of these values, and as such, it is out of the scope of this article. The key takeaway is to know that the Three Sigma Rule enables us to know how much data is contained under different intervals of a normal distribution. The picture below is a great summary of what the Three Sigma Rule represents.

Three Sigma
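As a quick numerical sanity check of those numbers (a side illustration, separate from the wine analysis), we can simulate draws from a normal distribution and count how many land within one, two, and three standard deviations of the mean:

import numpy as np

np.random.seed(0)
samples = np.random.normal(loc=0, scale=1, size=100000)
mean, std = samples.mean(), samples.std()

for k in (1, 2, 3):
    # Fraction of samples within k standard deviations of the mean
    within = np.mean(np.abs(samples - mean) <= k * std)
    print("within", k, "standard deviation(s):", round(within, 3))

# Prints approximately 0.683, 0.954, and 0.997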

We'll connect these concepts back to our wine data. As a sommelier, we'd like to know with high confidence that Chardonnay and Pinot Noir are more popular than the average wine. We have many thousands of wine reviews, so by the Central Limit Theorem, the average score of these reviews should line up with a so-called "true" representation of the wine's quality (as judged by the reviewer).

Although the Three Sigma rule is a statement of how much of your data falls within known values, it is also a statement of the rarity of extreme values. Any value that is more than three standard deviations away from the mean should be treated with caution or care. By taking advantage of the Three Sigma Rule and the Z-score, we'll finally be able to prescribe a value to how likely Chardonnay and Pinot Noir are different from the average wine.

Z-score

The Z-score is a simple calculation that answers the question, "Given a data point, how many standard deviations is it away from the mean?" The equation below is the Z-score equation: z = (x − μ) / σ, the distance of a data point x from the mean μ, measured in standard deviations σ.

Z-score

By itself, the Z-score doesn't provide much information to you. It gains the most value when compared against a Z-table, which tabulates the cumulative probability of a standard normal distribution up until a given Z-score. A standard normal is a normal distribution with a mean of 0 and a standard deviation of 1. The Z-score lets us reference the Z-table even if our normal distribution is not standard.

The cumulative probability is the sum of the probabilities of all values occurring, up until a given point. An easy example is the mean itself. The mean is the exact middle of the normal distribution, so we know that the sum of all probabilities of getting values from the left side up until the mean is 50%. The values from the Three Sigma Rule actually come up if you try to calculate the cumulative probability between standard deviations. The picture below provides a visualization of the cumulative probability.

Cumulative Probability
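A small sketch using scipy's standard normal CDF (the same st.norm.cdf we'll use on the wine scores below) shows how the cumulative probability connects back to the Three Sigma values:

import scipy.stats as st

print(st.norm.cdf(0))                     # 0.5: half the area lies left of the mean
print(st.norm.cdf(1) - st.norm.cdf(-1))   # ~0.68: within one standard deviation
print(st.norm.cdf(2) - st.norm.cdf(-2))   # ~0.95: within two
print(st.norm.cdf(3) - st.norm.cdf(-3))   # ~0.997: within three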

We know that the sum of all probabilities must equal 100%, so we can use the Z-table to calculate probabilities on both sides of the Z-score under the normal distribution.

Cumulative Probability 2

This calculation of the probability of being past a certain Z-score is useful to us. It lets us go from "how far is a value from the mean?" to "how likely is it that a value this far from the mean comes from the same group of observations?" Thus, the probability derived from the Z-score and Z-table will answer our wine-based questions.

import numpy as np
tokaji_avg = np.average(tokaji)
lambrusco_avg = np.average(lambrusco)

tokaji_std = np.std(tokaji)
lambrusco_std = np.std(lambrusco)

# Let's see what the results are
print("Tokaji: ", tokaji_avg, tokaji_std)
print("Lambrusco: ", lambrusco_avg, lambrusco_std)
>>> Tokaji:  90.9 2.65015722804
>>> Lambrusco:  84.4047619048 1.61922267961

This doesn't look good for our friend's recommendation! For the purpose of this article, we'll treat both the Tokaji and Lambrusco scores as normally distributed. Thus, the average score of each wine will represent their "true" score in terms of quality. We will calculate the Z-score and see how far away the Tokaji average is from the Lambrusco.

z = (tokaji_avg - lambrusco_avg) / lambrusco_std
>>> 4.0113309781438229

# We'll bring in scipy to do the calculation of probability from the Z-table
import scipy.stats as st
st.norm.cdf(z)
>>> 0.99996981130231266

# We need the probability from the right side, so we'll flip it!
1 - st.norm.cdf(z)
>>> 3.0188697687338895e-05

The answer is quite small, but what exactly does it mean? The infinitesimal smallness of this probability requires some careful interpretation.

Let's say that we believed that there was no difference between our friend's Lambrusco and the wine expert's Tokaji. That is to say, we believe the quality of the Lambrusco and the Tokaji to be about the same. Likewise, due to individual differences between wines, there will be some spread in the scores of these wines. This will produce normally distributed scores if we make a histogram of the Tokaji and Lambrusco wines, thanks to the Central Limit Theorem.

Now, we have some data that allows us to calculate the mean and standard deviation of both wines in question. These values allow us to actually test our belief that Lambrusco and Tokaji were of similar quality. We used the Lambrusco wine scores as a base and compared the Tokaji average, but we could have easily done it the other way around. The only difference would be a negative Z-score.

The Z-score was 4.01! Remember that the Three Sigma Rule tells us that 99.7% of the data should fall within 3 standard deviations, assuming that Tokaji and Lambrusco were similar. The probability of a score average as extreme as Tokaji's in a world where Lambrusco and Tokaji wines are assumed to be the same is very, very small. So small that we are forced to consider the converse: Tokaji wines are different from Lambrusco wines and will produce a different score distribution.

We've chosen our wording here carefully: I took care not to say, "Tokaji wines are better than Lambrusco." They are highly likely to be, but only highly likely, because the probability we calculated, though microscopically small, is not zero. To be precise, we can say that Lambrusco and Tokaji wines definitively do not come from the same score distribution, but we cannot say that one is better or worse than the other.

This type of reasoning is within the domain of inferential statistics, and this article only seeks to give you a brief introduction into the rationale behind it. We covered a lot of concepts in this article, so if you found yourself getting lost, go back and take it slow. Having this framework of thinking is immensely powerful, but easy to misuse and misunderstand.

Conclusion

We started with descriptive statistics and then connected them to probability. From probability, we developed a way to quantitatively show whether two groups come from the same distribution. In this case, we compared two wine recommendations and found that they most likely do not come from the same score distribution. In other words, one wine type is most likely better than the other one.

Statistics doesn't have to be a field relegated to just statisticians. As a data scientist, having an intuitive understanding of what common statistical measures represent will give you an edge in developing your own theories and the ability to subsequently test them. We barely scratched the surface of inferential statistics here, but the same general ideas will help guide your intuition in your statistical journey. Our article discussed the advantages of the normal distribution, but statisticians have also developed techniques to adjust for distributions that aren't normal.

Further Reading

This article centered around the normal distribution and its connection to statistics and probability. If you're interested in reading about other related distributions or learning more about inferential statistics, please refer to the resources below.

EuroPython: EuroPython 2018: Introducing Smarkets


We are very pleased to have Smarkets as Keystone Sponsor for EuroPython 2018. You can visit them at the most central booth in our exhibit area, the Lennox Suite in the EICC, and take the opportunity to chat with their staff or enjoy their escape room.

Please find below a hosted blog post from Smarkets.

Enjoy,

EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/ 




Smarkets: where Python and financial trading meet

Smarkets operates one of the world’s most powerful and innovative event trading exchanges. We have engineered our technology in-house - without using white label products - and Python powers our platform. In fact, our codebase is fully Python3 and is deployed across multiple teams of engineers within our self-managed organisation.

We combine financial technology with the startup culture of fast development and frequent releases; the perfect environment to make use of Python’s data science and rapid prototyping capabilities.

The Python language helps us to remain nimble, as it allows us to move from idea to product extremely quickly. Python not only has a solid standard library but also an extensive community, which means that packages exist for just about everything you could ever need.

Join us at EuroPython where our workshop will show you how Smarkets is using Python to revolutionise sports trading. You will learn more about Python's implementation on an exchange, how trading bots work and some strategies that can be employed to successfully trade sports. Put this theory into action with access to our API and a skeleton bot, which you will use as a base to create your very own trading bot. The workshop is free for people with a conference ticket or training pass. Bring your laptops!

Come and check out our booth #10 to learn more about life at Smarkets and what it’s like to work at the UK’s largest self-managed organisation where you get to be your own boss, define the projects you work on and even get to set your own salary. If that’s not tempting enough, we’ve got an actual escape room on our booth as one of the main attractions at EuroPython this year! If you’ve got 30 minutes to spare or simply fancy the challenge, come along and see if you can crack your way out! Successful teams will also automatically be entered into a prize draw that we’ll be running each day.

Python Software Foundation: The Happy Medium: Distinguished Service Award Winner Tim Peters

When Tim Peters started working on Python, his first advice for Guido van Rossum was that programmers want to add ints and floats. From the beginning, Python had both kinds of numbers, just like today, but adding them together then required a cumbersome type-cast. Peters argued that Python should implicitly convert ints to floats, like most other languages, for programmers' sake: "That is a very common operation for anyone who works with floating point numbers," Van Rossum recalls him saying, "so you’ve got to do it this way."

Ever since, Peters has pushed the language in this direction. He insists that Python should be a practical language that caters to the needs of programmers, and he has a knack for guiding design debates to achieve this goal. In recognition of his contributions, the PSF presented Tim Peters with the 2017 Distinguished Service Award.

A Realist Algorithm

"Timsort is Tim's grand opus," says Van Rossum. The algorithm is not only the standard sort for Python; when Java developer Joshua Bloch saw its merit for sorting real-world data, he incorporated it into the Java standard library as well. The genius of Timsort is to recognize how data naturally occurs in everyday programs: it's less likely to be randomly ordered than to be partly ordered, or ordered in reverse. Programmers usually throw such data at a sorting function anyway, and a theoretically elegant algorithm like Quicksort won't recognize the shortcuts it could take to save work. Timsort is designed to recognize such opportunities and deploy efficient tricks for them.

Timsort is optimized for the world not as we imagine it, but as it is. This realism is characteristic of the Python language as a whole. It flows from Van Rossum's taste in design, which Peters distilled into a poem in 1999.

The Zen of Python

It's only 19 lines. But this short list of precepts has influenced the language and the programs written in it profoundly. It is a shared literature for Python programmers, in the same way that most English speakers know certain lines of Shakespeare. Python's designers quote the Zen of Python in PEP debates, and programmers reviewing code in their own Python projects use the Zen to support their opinions.

Guido van Rossum says, "You can use it to motivate a design choice, but it’s not scripture. It can’t be the only reason to choose a particular design. You still have to put your thinking cap on." Just like the Zen Buddhist sayings that inspired it, Peters's text isn't dogma. Indeed, for every commandment the Zen of Python hands down, there is also a joke or a contradiction to remind us to take it lightly.

Core developer Carol Willing summarizes the Zen of Python's message like this: "We're going to meet constraints in a way that makes good common sense first, so you can maintain the code and people can understand the code." It's this commonsense approach that makes Python a joy to use. Willing began coding on a mainframe at Bell Labs when she was in fifth grade in 1976; in all her years as a programmer the most enjoyable have been her years with Python. She says, "Every day I get to use it makes me feel like a kid again." Now, when she teaches Project Jupyter interns each summer, one of her first instructions is to type "import this".

Willing extends Python's Zen to its community, too. She says that "Beautiful is better than ugly" is a good guide for talking with our colleagues. "There’s an ugly way of saying things, and a more respectful, nicer way of saying things. Maybe we should err on the side of being respectful and nice."

A Happy Medium

In Guido van Rossum's estimation, Peters's biggest contribution to the community has been his years of answering questions and guiding debates on the Python mailing list, writing each message precisely and cheerfully. PSF director Thomas Wouters agrees: "Tim is just never flustered. He always takes it in good humor and it definitely has an effect on everyone else, as well." Even an experienced developer like Carol Willing says that when she sees a post from Peters on a topic she knows, she'll take the time to read it for new insights or new ways of explaining.

In design debates, Peters invented a notion of "channeling Guido" to free Van Rossum from the overflow of emails. He claimed to act like a spirit medium speaking with Van Rossum's voice, but this understates Peters's influence. "He was a mentor for me," says Van Rossum. "He combines incredible technical skills with insight into what the person he's communicating with is missing or needs to see, with a patient way of explaining. He showed me that style of communicating which I strive for but can't always do."

Recently, in the wake of contentious debate over the ":=" operator, Guido van Rossum resigned as BDFL. Tim Peters, too, is less active on Python mailing lists than before. The Python community can no longer rely on one individual and his channeler for guidance. As Brett Cannon wrote, "a key asset that Guido has provided for us as a BDFL is consistency in design/taste." As a summary of Van Rossum's thinking, the Zen of Python is now more important than ever.



Images: Utagawa Kuniyoshi (1797-1861), 108 Heroes of the Popular Suikoden.

Real Python: Lists and Tuples in Python


Lists and tuples are arguably Python’s most versatile, useful data types. You will find them in virtually every nontrivial Python program.

Here’s what you’ll learn in this tutorial: You’ll cover the important characteristics of lists and tuples. You’ll learn how to define them and how to manipulate them. When you’re finished, you should have a good feel for when and how to use these object types in a Python program.

Python Lists

In short, a list is a collection of arbitrary objects, somewhat akin to an array in many other programming languages but more flexible. Lists are defined in Python by enclosing a comma-separated sequence of objects in square brackets ([]), as shown below:

>>> a = ['foo', 'bar', 'baz', 'qux']
>>> print(a)
['foo', 'bar', 'baz', 'qux']
>>> a
['foo', 'bar', 'baz', 'qux']

The important characteristics of Python lists are as follows:

  • Lists are ordered.
  • Lists can contain any arbitrary objects.
  • List elements can be accessed by index.
  • Lists can be nested to arbitrary depth.
  • Lists are mutable.
  • Lists are dynamic.

Each of these features is examined in more detail below.

Lists Are Ordered

A list is not merely a collection of objects. It is an ordered collection of objects. The order in which you specify the elements when you define a list is an innate characteristic of that list and is maintained for that list’s lifetime. (You will see a Python data type that is not ordered in the next tutorial on dictionaries.)

Lists that have the same elements in a different order are not the same:

>>> a = ['foo', 'bar', 'baz', 'qux']
>>> b = ['baz', 'qux', 'bar', 'foo']
>>> a == b
False
>>> a is b
False
>>> [1, 2, 3, 4] == [4, 1, 3, 2]
False

Lists Can Contain Arbitrary Objects

A list can contain any assortment of objects. The elements of a list can all be the same type:

>>> a = [2, 4, 6, 8]
>>> a
[2, 4, 6, 8]

Or the elements can be of varying types:

>>> a = [21.42, 'foobar', 3, 4, 'bark', False, 3.14159]
>>> a
[21.42, 'foobar', 3, 4, 'bark', False, 3.14159]

Lists can even contain complex objects, like functions, classes, and modules, which you will learn about in upcoming tutorials:

>>> int
<class 'int'>
>>> len
<built-in function len>
>>> def foo():
...     pass
...
>>> foo
<function foo at 0x035B9030>
>>> import math
>>> math
<module 'math' (built-in)>
>>> a = [int, len, foo, math]
>>> a
[<class 'int'>, <built-in function len>, <function foo at 0x02CA2618>, <module 'math' (built-in)>]

A list can contain any number of objects, from zero to as many as your computer’s memory will allow:

>>> a = []
>>> a
[]
>>> a = ['foo']
>>> a
['foo']
>>> a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
...      21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
...      41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
...      61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
...      81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100]

(A list with a single object is sometimes referred to as a singleton list.)

List objects needn’t be unique. A given object can appear in a list multiple times:

>>> a = ['bark', 'meow', 'woof', 'bark', 'cheep', 'bark']
>>> a
['bark', 'meow', 'woof', 'bark', 'cheep', 'bark']

List Elements Can Be Accessed by Index

Individual elements in a list can be accessed using an index in square brackets. This is exactly analogous to accessing individual characters in a string. List indexing is zero-based as it is with strings.

Consider the following list:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']

The indices for the elements in a are shown below:

Diagram of a Python list: List Indices

Here is Python code to access some elements of a:

>>> a[0]
'foo'
>>> a[2]
'baz'
>>> a[5]
'corge'

Virtually everything about string indexing works similarly for lists. For example, a negative list index counts from the end of the list:

Diagram of a Python list: Negative List Indexing

>>> a[-1]
'corge'
>>> a[-2]
'quux'
>>> a[-5]
'bar'

Slicing also works. If a is a list, the expression a[m:n] returns the portion of a from index m to, but not including, index n:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a[2:5]
['baz', 'qux', 'quux']

Other features of string slicing work analogously for list slicing as well:

  • Both positive and negative indices can be specified:

    >>> a[-5:-2]
    ['bar', 'baz', 'qux']
    >>> a[1:4]
    ['bar', 'baz', 'qux']
    >>> a[-5:-2] == a[1:4]
    True
  • Omitting the first index starts the slice at the beginning of the list, and omitting the second index extends the slice to the end of the list:

    >>> print(a[:4], a[0:4])
    ['foo', 'bar', 'baz', 'qux'] ['foo', 'bar', 'baz', 'qux']
    >>> print(a[2:], a[2:len(a)])
    ['baz', 'qux', 'quux', 'corge'] ['baz', 'qux', 'quux', 'corge']
    >>> a[:4] + a[4:]
    ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
    >>> a[:4] + a[4:] == a
    True
  • You can specify a stride—either positive or negative:

    >>> a[0:6:2]
    ['foo', 'baz', 'quux']
    >>> a[1:6:2]
    ['bar', 'qux', 'corge']
    >>> a[6:0:-2]
    ['corge', 'qux', 'bar']
  • The syntax for reversing a list works the same way it does for strings:

    >>> a[::-1]
    ['corge', 'quux', 'qux', 'baz', 'bar', 'foo']
  • The [:] syntax works for lists. However, there is an important difference between how this operation works with a list and how it works with a string.

    If s is a string, s[:] returns a reference to the same object:

    >>> s = 'foobar'
    >>> s[:]
    'foobar'
    >>> s[:] is s
    True

    Conversely, if a is a list, a[:] returns a new object that is a copy of a:

    >>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
    >>> a[:]
    ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
    >>> a[:] is a
    False

Several Python operators and built-in functions can also be used with lists in ways that are analogous to strings:

  • The in and not in operators:

    >>> a
    ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
    >>> 'qux' in a
    True
    >>> 'thud' not in a
    True
  • The concatenation (+) and replication (*) operators:

    >>> a
    ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
    >>> a + ['grault', 'garply']
    ['foo', 'bar', 'baz', 'qux', 'quux', 'corge', 'grault', 'garply']
    >>> a * 2
    ['foo', 'bar', 'baz', 'qux', 'quux', 'corge', 'foo', 'bar', 'baz',
    'qux', 'quux', 'corge']
  • The len(), min(), and max() functions:

    >>> a
    ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
    >>> len(a)
    6
    >>> min(a)
    'bar'
    >>> max(a)
    'qux'

It’s not an accident that strings and lists behave so similarly. They are both special cases of a more general object type called an iterable, which you will encounter in more detail in the upcoming tutorial on definite iteration.

By the way, in each example above, the list is always assigned to a variable before an operation is performed on it. But you can operate on a list literal as well:

>>> ['foo', 'bar', 'baz', 'qux', 'quux', 'corge'][2]
'baz'
>>> ['foo', 'bar', 'baz', 'qux', 'quux', 'corge'][::-1]
['corge', 'quux', 'qux', 'baz', 'bar', 'foo']
>>> 'quux' in ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
True
>>> ['foo', 'bar', 'baz'] + ['qux', 'quux', 'corge']
['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> len(['foo', 'bar', 'baz', 'qux', 'quux', 'corge'][::-1])
6

For that matter, you can do likewise with a string literal:

>>> 'If Comrade Napoleon says it, it must be right.'[::-1]
'.thgir eb tsum ti ,ti syas noelopaN edarmoC fI'

Lists Can Be Nested

You have seen that an element in a list can be any sort of object. That includes another list. A list can contain sublists, which in turn can contain sublists themselves, and so on to arbitrary depth.

Consider this (admittedly contrived) example:

>>> x = ['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j']
>>> x
['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j']

The object structure that x references is diagrammed below:

Nested lists diagram: A Nested List

x[0], x[2], and x[4] are strings, each one character long:

>>> print(x[0], x[2], x[4])
a g j

But x[1] and x[3] are sublists:

>>> x[1]
['bb', ['ccc', 'ddd'], 'ee', 'ff']
>>> x[3]
['hh', 'ii']

To access the items in a sublist, simply append an additional index:

>>> x[1]
['bb', ['ccc', 'ddd'], 'ee', 'ff']
>>> x[1][0]
'bb'
>>> x[1][1]
['ccc', 'ddd']
>>> x[1][2]
'ee'
>>> x[1][3]
'ff'
>>> x[3]
['hh', 'ii']
>>> print(x[3][0], x[3][1])
hh ii

x[1][1] is yet another sublist, so adding one more index accesses its elements:

>>> x[1][1]
['ccc', 'ddd']
>>> print(x[1][1][0], x[1][1][1])
ccc ddd

There is no limit, short of the extent of your computer’s memory, to the depth or complexity with which lists can be nested in this way.

All the usual syntax regarding indices and slicing applies to sublists as well:

>>> x[1][1][-1]
'ddd'
>>> x[1][1:3]
[['ccc', 'ddd'], 'ee']
>>> x[3][::-1]
['ii', 'hh']

However, be aware that operators and functions apply to only the list at the level you specify and are not recursive. Consider what happens when you query the length of x using len():

>>> x
['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j']
>>> len(x)
5
>>> x[0]
'a'
>>> x[1]
['bb', ['ccc', 'ddd'], 'ee', 'ff']
>>> x[2]
'g'
>>> x[3]
['hh', 'ii']
>>> x[4]
'j'

x has only five elements—three strings and two sublists. The individual elements in the sublists don’t count toward x’s length.

You’d encounter a similar situation when using the in operator:

>>> 'ddd' in x
False
>>> 'ddd' in x[1]
False
>>> 'ddd' in x[1][1]
True

'ddd' is not one of the elements in x or x[1]. It is only directly an element in the sublist x[1][1]. An individual element in a sublist does not count as an element of the parent list(s).

Lists Are Mutable

Most of the data types you have encountered so far have been atomic types. Integer or float objects, for example, are primitive units that can’t be further broken down. These types are immutable, meaning that they can’t be changed once they have been assigned. It doesn’t make much sense to think of changing the value of an integer. If you want a different integer, you just assign a different one.

By contrast, the string type is a composite type. Strings are reducible to smaller parts—the component characters. It might make sense to think of changing the characters in a string. But you can’t. In Python, strings are also immutable.

The list is the first mutable data type you have encountered. Once a list has been created, elements can be added, deleted, shifted, and moved around at will. Python provides a wide range of ways to modify lists.

Modifying a Single List Value

A single value in a list can be replaced by indexing and simple assignment:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a
['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a[2] = 10
>>> a[-1] = 20
>>> a
['foo', 'bar', 10, 'qux', 'quux', 20]

You may recall from the tutorial Strings and Character Data in Python that you can’t do this with a string:

>>> s = 'foobarbaz'
>>> s[2] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

A list item can be deleted with the del command:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> del a[3]
>>> a
['foo', 'bar', 'baz', 'quux', 'corge']

Modifying Multiple List Values

What if you want to change several contiguous elements in a list at one time? Python allows this with slice assignment, which has the following syntax:

a[m:n] = <iterable>

Again, for the moment, think of an iterable as a list. This assignment replaces the specified slice of a with <iterable>:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a[1:4]
['bar', 'baz', 'qux']
>>> a[1:4] = [1.1, 2.2, 3.3, 4.4, 5.5]
>>> a
['foo', 1.1, 2.2, 3.3, 4.4, 5.5, 'quux', 'corge']
>>> a[1:6]
[1.1, 2.2, 3.3, 4.4, 5.5]
>>> a[1:6] = ['Bark!']
>>> a
['foo', 'Bark!', 'quux', 'corge']

The number of elements inserted need not be equal to the number replaced. Python just grows or shrinks the list as needed.

You can insert multiple elements in place of a single element—just use a slice that denotes only one element:

>>> a = [1, 2, 3]
>>> a[1:2] = [2.1, 2.2, 2.3]
>>> a
[1, 2.1, 2.2, 2.3, 3]

Note that this is not the same as replacing the single element with a list:

>>> a = [1, 2, 3]
>>> a[1] = [2.1, 2.2, 2.3]
>>> a
[1, [2.1, 2.2, 2.3], 3]

You can also insert elements into a list without removing anything. Simply specify a slice of the form [n:n] (a zero-length slice) at the desired index:

>>> a = [1, 2, 7, 8]
>>> a[2:2] = [3, 4, 5, 6]
>>> a
[1, 2, 3, 4, 5, 6, 7, 8]

You can delete multiple elements out of the middle of a list by assigning the appropriate slice to an empty list. You can also use the del statement with the same slice:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a[1:5] = []
>>> a
['foo', 'corge']
>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> del a[1:5]
>>> a
['foo', 'corge']

Prepending or Appending Items to a List

Additional items can be added to the start or end of a list using the + concatenation operator or the += augmented assignment operator:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a += ['grault', 'garply']
>>> a
['foo', 'bar', 'baz', 'qux', 'quux', 'corge', 'grault', 'garply']
>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a = [10, 20] + a
>>> a
[10, 20, 'foo', 'bar', 'baz', 'qux', 'quux', 'corge']

Note that a list must be concatenated with another list, so if you want to add only one element, you need to specify it as a singleton list:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a += 20
Traceback (most recent call last):
  File "<pyshell#58>", line 1, in <module>
    a += 20
TypeError: 'int' object is not iterable
>>> a += [20]
>>> a
['foo', 'bar', 'baz', 'qux', 'quux', 'corge', 20]

Note: Technically, it isn’t quite correct to say a list must be concatenated with another list. More precisely, a list must be concatenated with an object that is iterable. Of course, lists are iterable, so it works to concatenate a list with another list.

Strings are iterable also. But watch what happens when you concatenate a string onto a list:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux']
>>> a += 'corge'
>>> a
['foo', 'bar', 'baz', 'qux', 'quux', 'c', 'o', 'r', 'g', 'e']

This result is perhaps not quite what you expected. When a string is iterated through, the result is a list of its component characters. In the above example, what gets concatenated onto list a is a list of the characters in the string 'corge'.

If you really want to add just the single string 'corge' to the end of the list, you need to specify it as a singleton list:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux']
>>> a += ['corge']
>>> a
['foo', 'bar', 'baz', 'qux', 'quux', 'corge']

If this seems mysterious, don’t fret too much. You’ll learn about the ins and outs of iterables in the tutorial on definite iteration.

Methods That Modify a List

Finally, Python supplies several built-in methods that can be used to modify lists. Information on these methods is detailed below.

Note: The string methods you saw in the previous tutorial did not modify the target string directly. That is because strings are immutable. Instead, string methods return a new string object that is modified as directed by the method. They leave the original target string unchanged:

>>> s = 'foobar'
>>> t = s.upper()
>>> print(s, t)
foobar FOOBAR

List methods are different. Because lists are mutable, the list methods shown here modify the target list in place.

a.append(<obj>)

Appends an object to a list.

a.append(<obj>) appends object <obj> to the end of list a:

>>> a = ['a', 'b']
>>> a.append(123)
>>> a
['a', 'b', 123]

Remember, list methods modify the target list in place. They do not return a new list:

>>> a = ['a', 'b']
>>> x = a.append(123)
>>> print(x)
None
>>> a
['a', 'b', 123]

Remember that when the + operator is used to concatenate to a list, if the target operand is an iterable, then its elements are broken out and appended to the list individually:

>>> a = ['a', 'b']
>>> a + [1, 2, 3]
['a', 'b', 1, 2, 3]

The .append() method does not work that way! If an iterable is appended to a list with .append(), it is added as a single object:

>>> a = ['a', 'b']
>>> a.append([1, 2, 3])
>>> a
['a', 'b', [1, 2, 3]]

Thus, with .append(), you can append a string as a single entity:

>>> a = ['a', 'b']
>>> a.append('foo')
>>> a
['a', 'b', 'foo']

a.extend(<iterable>)

Extends a list with the objects from an iterable.

Yes, this is probably what you think it is. .extend() also adds to the end of a list, but the argument is expected to be an iterable. The items in <iterable> are added individually:

>>> a = ['a', 'b']
>>> a.extend([1, 2, 3])
>>> a
['a', 'b', 1, 2, 3]

In other words, .extend() behaves like the + operator. More precisely, since it modifies the list in place, it behaves like the += operator:

>>> a = ['a', 'b']
>>> a += [1, 2, 3]
>>> a
['a', 'b', 1, 2, 3]

a.insert(<index>, <obj>)

Inserts an object into a list.

a.insert(<index>, <obj>) inserts object <obj> into list a at the specified <index>. Following the method call, a[<index>] is <obj>, and the remaining list elements are pushed to the right:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a.insert(3, 3.14159)
>>> a[3]
3.14159
>>> a
['foo', 'bar', 'baz', 3.14159, 'qux', 'quux', 'corge']

a.remove(<obj>)

Removes an object from a list.

a.remove(<obj>) removes object <obj> from list a. If <obj> isn’t in a, an exception is raised:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a.remove('baz')
>>> a
['foo', 'bar', 'qux', 'quux', 'corge']
>>> a.remove('Bark!')
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    a.remove('Bark!')
ValueError: list.remove(x): x not in list

a.pop(index=-1)

Removes an element from a list.

This method differs from .remove() in two ways:

  1. You specify the index of the item to remove, rather than the object itself.
  2. The method returns a value: the item that was removed.

a.pop() simply removes the last item in the list:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a.pop()
'corge'
>>> a
['foo', 'bar', 'baz', 'qux', 'quux']
>>> a.pop()
'quux'
>>> a
['foo', 'bar', 'baz', 'qux']

If the optional <index> parameter is specified, the item at that index is removed and returned. <index> may be negative, as with string and list indexing:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a.pop(1)
'bar'
>>> a
['foo', 'baz', 'qux', 'quux', 'corge']
>>> a.pop(-3)
'qux'
>>> a
['foo', 'baz', 'quux', 'corge']

<index> defaults to -1, so a.pop(-1) is equivalent to a.pop().

Lists Are Dynamic

This tutorial began with a list of six defining characteristics of Python lists. The last one is that lists are dynamic. You have seen many examples of this in the sections above. When items are added to a list, it grows as needed:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a[2:2] = [1, 2, 3]
>>> a += [3.14159]
>>> a
['foo', 'bar', 1, 2, 3, 'baz', 'qux', 'quux', 'corge', 3.14159]

Similarly, a list shrinks to accommodate the removal of items:

>>> a = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
>>> a[2:3] = []
>>> del a[0]
>>> a
['bar', 'qux', 'quux', 'corge']

Python Tuples

Python provides another type that is an ordered collection of objects, called a tuple.

Pronunciation varies depending on whom you ask. Some pronounce it as though it were spelled “too-ple” (rhyming with “Mott the Hoople”), and others as though it were spelled “tup-ple” (rhyming with “supple”). My inclination is the latter, since it presumably derives from the same origin as “quintuple,” “sextuple,” “octuple,” and so on, and everyone I know pronounces these latter as though they rhymed with “supple.”

Defining and Using Tuples

Tuples are identical to lists in all respects, except for the following properties:

  • Tuples are defined by enclosing the elements in parentheses (()) instead of square brackets ([]).
  • Tuples are immutable.

Here is a short example showing a tuple definition, indexing, and slicing:

>>> t = ('foo', 'bar', 'baz', 'qux', 'quux', 'corge')
>>> t
('foo', 'bar', 'baz', 'qux', 'quux', 'corge')
>>> t[0]
'foo'
>>> t[-1]
'corge'
>>> t[1::2]
('bar', 'qux', 'corge')

Never fear! Our favorite string and list reversal mechanism works for tuples as well:

>>> t[::-1]
('corge', 'quux', 'qux', 'baz', 'bar', 'foo')

Note: Even though tuples are defined using parentheses, you still index and slice tuples using square brackets, just as for strings and lists.

Everything you’ve learned about lists—they are ordered, they can contain arbitrary objects, they can be indexed and sliced, they can be nested—is true of tuples as well. But they can’t be modified:

>>> t = ('foo', 'bar', 'baz', 'qux', 'quux', 'corge')
>>> t[2] = 'Bark!'
Traceback (most recent call last):
  File "<pyshell#65>", line 1, in <module>
    t[2] = 'Bark!'
TypeError: 'tuple' object does not support item assignment

Why use a tuple instead of a list?

  • Program execution is faster when manipulating a tuple than it is for the equivalent list. (This is probably not going to be noticeable when the list or tuple is small.)

  • Sometimes you don’t want data to be modified. If the values in the collection are meant to remain constant for the life of the program, using a tuple instead of a list guards against accidental modification.

  • There is another Python data type that you will encounter shortly called a dictionary, which requires as one of its components a value that is of an immutable type. A tuple can be used for this purpose, whereas a list can’t be.
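For instance, a tuple can be used as a dictionary key, but a list can't. A small illustrative session (not part of the original tutorial):

>>> d = {(1, 2): 'a point', (3, 4): 'another point'}
>>> d[(1, 2)]
'a point'
>>> d[[1, 2]] = 'oops'
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'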

In a Python REPL session, you can display the values of several objects simultaneously by entering them directly at the >>> prompt, separated by commas:

>>> a = 'foo'
>>> b = 42
>>> a, 3.14159, b
('foo', 3.14159, 42)

Python displays the response in parentheses because it is implicitly interpreting the input as a tuple.

There is one peculiarity regarding tuple definition that you should be aware of. There is no ambiguity when defining an empty tuple, nor one with two or more elements. Python knows you are defining a tuple:

>>> t = ()
>>> type(t)
<class 'tuple'>
>>> t = (1, 2)
>>> type(t)
<class 'tuple'>
>>> t = (1, 2, 3, 4, 5)
>>> type(t)
<class 'tuple'>

But what happens when you try to define a tuple with one item:

>>> t = (2)
>>> type(t)
<class 'int'>

Doh! Since parentheses are also used to define operator precedence in expressions, Python evaluates the expression (2) as simply the integer 2 and creates an int object. To tell Python that you really want to define a singleton tuple, include a trailing comma (,) just before the closing parenthesis:

>>> t = (2,)
>>> type(t)
<class 'tuple'>
>>> t[0]
2
>>> t[-1]
2

You probably won’t need to define a singleton tuple often, but there has to be a way.

When you display a singleton tuple, Python includes the comma, to remind you that it’s a tuple:

>>> print(t)
(2,)

Tuple Assignment, Packing, and Unpacking

As you have already seen above, a literal tuple containing several items can be assigned to a single object:

>>> t = ('foo', 'bar', 'baz', 'qux')

When this occurs, it is as though the items in the tuple have been “packed” into the object:

[Diagram: Tuple Packing]

>>> t
('foo', 'bar', 'baz', 'qux')
>>> t[0]
'foo'
>>> t[-1]
'qux'

If that “packed” object is subsequently assigned to a new tuple, the individual items are “unpacked” into the objects in the tuple:

[Diagram: Tuple Unpacking]

>>> (s1, s2, s3, s4) = t
>>> s1
'foo'
>>> s2
'bar'
>>> s3
'baz'
>>> s4
'qux'

When unpacking, the number of variables on the left must match the number of values in the tuple:

>>> (s1, s2, s3) = t
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    (s1, s2, s3) = t
ValueError: too many values to unpack (expected 3)
>>> (s1, s2, s3, s4, s5) = t
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    (s1, s2, s3, s4, s5) = t
ValueError: not enough values to unpack (expected 5, got 4)

Packing and unpacking can be combined into one statement to make a compound assignment:

>>> (s1, s2, s3, s4) = ('foo', 'bar', 'baz', 'qux')
>>> s1
'foo'
>>> s2
'bar'
>>> s3
'baz'
>>> s4
'qux'

Again, the number of elements in the tuple on the left of the assignment must equal the number on the right:

>>> (s1, s2, s3, s4, s5) = ('foo', 'bar', 'baz', 'qux')
Traceback (most recent call last):
  File "<pyshell#63>", line 1, in <module>
    (s1, s2, s3, s4, s5) = ('foo', 'bar', 'baz', 'qux')
ValueError: not enough values to unpack (expected 5, got 4)

In assignments like this and a small handful of other situations, Python allows the parentheses that are usually used for denoting a tuple to be left out:

>>> t = 1, 2, 3
>>> t
(1, 2, 3)
>>> x1, x2, x3 = t
>>> x1, x2, x3
(1, 2, 3)
>>> x1, x2, x3 = 4, 5, 6
>>> x1, x2, x3
(4, 5, 6)
>>> t = 2,
>>> t
(2,)

It works the same whether the parentheses are included or not, so if you have any doubt as to whether they’re needed, go ahead and include them.

Tuple assignment allows for a curious bit of idiomatic Python. Frequently when programming, you have two variables whose values you need to swap. In most programming languages, it is necessary to store one of the values in a temporary variable while the swap occurs like this:

>>> a = 'foo'
>>> b = 'bar'
>>> a, b
('foo', 'bar')
>>> # We need to define a temp variable to accomplish the swap.
>>> temp = a
>>> a = b
>>> b = temp
>>> a, b
('bar', 'foo')

In Python, the swap can be done with a single tuple assignment:

>>> a = 'foo'
>>> b = 'bar'
>>> a, b
('foo', 'bar')
>>> # Magic time!
>>> a, b = b, a
>>> a, b
('bar', 'foo')

As anyone who has ever had to swap values using a temporary variable knows, being able to do it this way in Python is the pinnacle of modern technological achievement. It will never get better than this.

Conclusion

This tutorial covered the basic properties of Python lists and tuples, and how to manipulate them. You will use these extensively in your Python programming.

One of the chief characteristics of a list is that it is ordered. The order of the elements in a list is an intrinsic property of that list and does not change, unless the list itself is modified. (The same is true of tuples, except of course they can’t be modified.)

The next tutorial will introduce you to the Python dictionary: a composite data type that is unordered. Read on!



Django Weblog: Django 2.1 release candidate 1 released


Django 2.1 release candidate 1 is the final opportunity for you to try out the smorgasbord of new features before Django 2.1 is released.

The release candidate stage marks the string freeze and the call for translators to submit translations. Provided no major bugs are discovered that can't be solved in the next two weeks, Django 2.1 will be released on or around August 1. Any delays will be communicated on the django-developers mailing list thread.

Please use this opportunity to help find and fix bugs (which should be reported to the issue tracker). You can grab a copy of the package from our downloads page or on PyPI.

The PGP key ID used for this release is Tim Graham: 1E8ABDC773EDE252.


EuroPython: EuroPython 2018: Find a new job at the conference


We’d like to draw your attention to our job board, with plenty of job ads from our sponsors:

image

EuroPython 2018 Job Board

We will also send out job ad emails to attendees who have opted in to receiving these emails. If you are interested, please log in, go to your profile and enable the recruiting email option in the privacy section:

image

Note that we will not give your email addresses to sponsors, but only send out these emails on their behalf.

Enjoy,

EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

EuroPython: EuroPython 2018: Day passes now also valid for Sprints Weekend


Due to popular demand, we are making it possible to attend the Sprints Weekend (July 28-29), even if you only have a day pass or are considering buying one instead of a regular conference ticket, which includes the sprints as well.

image

EuroPython 2018 Sprints (Hackathons)

If you have never been to a sprint, you’ll be amazed at how much you can learn from others while working on simple or more complex projects.

If you have already run sprints yourself, why not run one at EuroPython and get to know new people for your project?

Please head on to our EuroPython Sprints page for more details. Come and join the sprinters!

Enjoy,

EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

RMOTR: Bitcoin trading with Python — Bollinger Bands strategy analysis


As part of RMOTR’s Data Science program we teach our students to work with Pandas Time Series and Matplotlib plots.

We wanted to create a practical and engaging project to help them practice with those libraries. Bitcoin (and cryptocurrencies in general) is a hot topic at the moment, and most of the historical pricing information is open and free to be analyzed 💪.

Want to start learning Python? Sign up for our FREE Python Prep Course.

Introduction

This post will describe our experiment step by step playing with the Bitcoin dataset and analyzing the Bollinger Bands trading strategy over the historical data.

A lot of details will be excluded from this post, but everything is available for you to read, clone and play in the following Github repository as a Jupyter Notebook 👇

martinzugnoni/bollinger-bands-trading-analysis

Note: Information shared in this post and the Github repository must not be used as financial advice. This experiment is only for educational purposes. Use the code under your own risk!

1. Finding the prices dataset

To properly analyze a trading strategy, we first need to find a data set containing the historical BTCUSD prices.

For this experiment, we will use BTCUSD Bitstamp price that can be downloaded as a CSV dump from http://api.bitcoincharts.com/v1/csv/.

Each row in the dataset represents BTCUSD operations in Bitstamp (many of them per minute), including the price and volume. This might seem too detailed for the analysis we want to make, so we will mention later how to process the dataset into something more meaningful for our experiment.

If we parse the original CSV dataset as a Pandas DataFrame, it will look something like this:
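The notebook shows the resulting DataFrame. As a rough sketch of how such a dump might be loaded (the file name and column names below are assumptions, since the raw dump has no header row):

import pandas as pd

df = pd.read_csv(
    'bitstampUSD.csv',
    names=['timestamp', 'price', 'volume'],
)
# Turn the unix timestamps into a DatetimeIndex so we can resample later.
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
df = df.set_index('timestamp')
print(df.head())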

We are ready to get started! Now, let’s see how to tune this DataFrame into something more useful to work with Bollinger Bands.

2. Building candles

As I anticipated earlier in this post, having information about each Bitstamp operation is too granular for our purpose. It would be enough to simply aggregate data by certain periods of time (i.e. each day).

To do this, we can convert our original DataFrame into Candlesticks. Each candle is represented by four values: Open, High, Low, Close prices (a.k.a. “ohlc”). We are basically aggregating all prices from a certain time window and calculating the max, min, first and last values.

Doing this is as simple as using the Pandas Resampler “ohlc” function. The resulting DataFrame will look similar to this:
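A minimal sketch of that aggregation, continuing from the DataFrame above (names are assumptions, not the notebook's exact code):

# Aggregate the per-trade prices into daily OHLC candles.
candles = df['price'].resample('24H').ohlc()
print(candles.head())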

Note that now each row represents a single day (the candle time window) and each column represents one of the OHLC values explained before.

3. Adding the Bollinger Bands

Before adding the bands, we will need to calculate two metrics in our DataFrame: rolling mean and standard deviation.

Both bands will simply be “x” times the standard deviation over and below the rolling mean. “x” is one of the parameters of the strategy configuration (more about this later), as is the number of periods used for calculating the rolling mean.

For the scope of this post, let’s consider a window of 30 periods for the rolling mean, and 1.5 times the standard deviation for the bands.

Our expanded DataFrame should look like this:

I’ve added three new columns for each of the metrics described above.
(see more details about how I calculated them in the Jupyter Notebook)
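Continuing the sketch, the three extra columns might be computed roughly like this (column names are assumptions):

WINDOW = 30      # periods for the rolling mean
NUM_STD = 1.5    # band width in standard deviations

candles['rolling_mean'] = candles['close'].rolling(WINDOW).mean()
rolling_std = candles['close'].rolling(WINDOW).std()
candles['upper_band'] = candles['rolling_mean'] + NUM_STD * rolling_std
candles['lower_band'] = candles['rolling_mean'] - NUM_STD * rolling_std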

4. Define the trading strategy

Our strategy will be very simple. We will open a “long” position whenever the price of BTCUSD crosses the lower band (green line), and change to a “short” position when the price crosses the upper band (orange line).

Easy, right? 😉
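A rough sketch of how those signals might be derived from the candles above (an illustration rather than the notebook's exact code):

import numpy as np

candles['position'] = np.nan
candles.loc[candles['close'] < candles['lower_band'], 'position'] = 1    # go long
candles.loc[candles['close'] > candles['upper_band'], 'position'] = -1   # go short
# Hold the last position until the opposite band is crossed.
candles['position'] = candles['position'].ffill()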

This plot shows us the result of the strategy over the whole 2018 set of prices.

As you can see, the strategy found three opportunities to operate (see vertical lines). Each green vertical line represents a “long” position and each red vertical line represents a “short” one. We initially set a long position and the price went up (that’s good), we then shorted it and the price went down again (this looks great!), and finally we set a long position again but the price started dropping. This might cause some losses in our final returns. We will get into more details about the strategy returns later.

5. Calculating the strategy returns

Based on the operations we opened, we can evaluate how good or bad the strategy was by analyzing how the market price changed day by day and see if it was positive for us depending on the position we took.
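One way to turn those positions into daily and accumulated returns, continuing the sketch above (column names are assumptions):

# Daily market return, multiplied by the position held going into that day.
market_return = candles['close'].pct_change()
candles['strategy_return'] = market_return * candles['position'].shift(1)
# Accumulated return of the strategy over the whole simulation.
candles['cumulative_return'] = (1 + candles['strategy_return']).cumprod() - 1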

If we accumulate daily gains and losses, we can see the returns as a plot over each period of the simulation. For the current configuration of the strategy, it should look similar to:

The accumulated return definitely looks terrible! We lost around 38% of the investment after the simulation. 😞

It doesn’t mean the strategy is not valid. It actually only means the parameters we used for the rolling window and number of standard deviations are not good enough.

We should find a way to test different configurations of the same strategy, and see if they give us better returns 😉.

6. Finding better configurations

We have three main variables that we can tweak to see if we improve the returns of the simulation:

  • How many hours represent each period.
    We were using daily candles (24 hours) before, but 6H or 12H might give us better results
  • Number of periods for the rolling mean.
  • Amount of standard deviations to calculate the bands.

It would be great to test every single possible configuration for the strategy, but that quickly becomes an effectively infinite number of possibilities.
My proposal is to define a space of valid values for each variable, and then take some random samples from that space.
With this approach, we can evaluate if any of the configurations end up being a profitable investment. 🙏
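One way to sample such a space (the value ranges below are made up for illustration):

import random

candle_hours = [6, 12, 24]
rolling_windows = range(10, 210, 10)
band_widths = [1.0, 1.5, 2.0, 2.5]

samples = [
    (random.choice(candle_hours),
     random.choice(rolling_windows),
     random.choice(band_widths))
    for _ in range(50)
]
# Each sampled configuration is then fed to the simulation above
# and its return curve recorded for comparison.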

The next figure shows us all return curves, for each individual configuration of the strategy, in a single canvas.

We can easily compare them, and figure out that at least a few of them have positive returns! 🎉

If we dig into each configuration, we find that using 12H candles, 100 periods for the rolling window and 1.5 standard deviations gives us the best returns.

That’s 34% returns over the initial investment. Not bad at all! 😉

Final notes

Playing with prices and simulating eventual returns of different strategies is a pretty fun experiment, but keep in mind that it is JUST an experiment.

Don’t assume that positive returns in any of these strategies will actually give you real gains while trading in the real market. Conclusions of this experiment are based on past events and we can never ensure that future events will follow the same results.

Read, play, and learn as much as you want, but use this content at your own risk. ¯\_(ツ)_/¯


Bitcoin trading with Python — Bollinger Bands strategy analysis 🐍 was originally published in rmotr.com on Medium, where people are continuing the conversation by highlighting and responding to this story.

EuroPython: EuroPython 2018: Introducing Smarkets


We are very pleased to have Smarkets as Keystone Sponsor for EuroPython 2018. You can visit them at the most central booth in our exhibit area, the Lennox Suite in the EICC, and take the opportunity to chat with their staff or enjoy their escape room.

Please find below a hosted blog post from Smarkets.

Enjoy,

EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/ 




Smarkets: where Python and financial trading meet

Smarkets operates one of the world’s most powerful and innovative event trading exchanges. We have engineered our technology in-house - without using white label products - and Python powers our platform. In fact, our codebase is fully Python3 and is deployed across multiple teams of engineers within our self-managed organisation.

We combine financial technology with the startup culture of fast development and frequent releases; the perfect environment to make use of Python’s data science and rapid prototyping capabilities.

The Python language helps us remain nimble, allowing us to move from idea to product extremely quickly. Python not only has a solid standard library but also an extensive community, which means that packages exist for just about everything you could ever need.

Join us at EuroPython where our workshop will show you how Smarkets is using Python to revolutionise sports trading. You will learn more about Python’s implementation on an exchange, how trading bots work and some strategies that can be employed to successfully trade sports. Put this theory into action with access to our API and a skeleton bot, which you will use as a base to create your very own trading bot. The workshop is free for people with a conference ticket or training pass. Bring your laptops!

Come and check out our booth #10 to learn more about life at Smarkets and what it’s like to work at the UK’s largest self-managed organisation where you get to be your own boss, define the projects you work on and even get to set your own salary. If that’s not tempting enough, we’ve got an actual escape room on our booth as one of the main attractions at EuroPython this year! If you’ve got 30 minutes to spare or simply fancy the challenge, come along and see if you can crack your way out! Successful teams will also automatically be entered into a prize draw that we’ll be running each day.

py.CheckIO: Design Patterns. Part 4


In the final article of the series on design patterns you’ll learn about the State, which is very effective in situations where it’s necessary to describe different behaviors depending on the states of the same object, as well as about the Interpreter, which will help to interpret the obscure language and make it more understandable, or vice versa (if your task is to encrypt and protect sensitive information).



Python Engineering at Microsoft: Python in Visual Studio Code – June & July 2018 Release


We are pleased to announce that the June & July 2018 releases of the Python Extension for Visual Studio Code are now available from the marketplace and the gallery. You can download the Python extension from the marketplace, or install it directly from the extension gallery within Visual Studio Code. You can learn more about Python support in Visual Studio Code in the VS Code documentation.

Between these two releases we have closed a total of 156 issues including introducing a new experimental language server and gevent support in our experimental debugger.

Preview of Python language server

We are pleased to make available an opt-in preview of the Microsoft Python Language Server in Visual Studio Code. The language server is the IntelliSense engine from Visual Studio that implements the language server protocol, and brings the following benefits to Visual Studio Code developers.

  • Syntax errors as you type in code:

  • Warnings when modules are not found:

  • Using Typeshed files to fill in missing completions for modules
  • Improved performance for analyzing your workspace and presenting completions
  • Ability to detect syntax errors on your entire workspace, rather than just the current file.
  • Faster startup times
  • Faster imports
  • Better handling for a number of language constructs

To try out the new language server, go to File > Preferences > Settings and add the following option:

"python.jediEnabled": false

This will trigger a notification asking you to reload Visual Studio Code. Upon reload it will begin downloading the language server for your operating system. Our meta issue for the language server goes into more details of the install process as well as provides an FAQ for troubleshooting issues.

After some time using the language server you will see a prompt to fill out a survey, so please let us know how it works for you.

gevent launch configuration for debugging

Contributed by Bence Nagy at the PyCon US 2018 sprints, the experimental debugger now supports a gevent launch configuration for code that has been monkey-patched by gevent. A predefined debugging template named "Python Experimental: Gevent" is available, as well as adding the setting "gevent": true to any launch configuration.
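For reference, a launch configuration using that setting might look roughly like this (treat it as a sketch; the exact "type" value depends on your extension version):

{
    "name": "Python Experimental: Gevent",
    "type": "pythonExperimental",
    "request": "launch",
    "program": "${file}",
    "gevent": true
}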

Various Fixes and Enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. The full list of improvements is listed in our changelog, some notable improvements are:

  • Changed the keyboard shortcut for Run Selection/Line in Python Terminal to Shift+Enter. (#1875)
  • Changed the keyboard shortcut for Run Selection/Line in Python Terminal to not interfere with the Find/Replace dialog box. (#2068)
  • Added a setting to control automatic test discovery on save, python.unitTest.autoTestDiscoverOnSaveEnabled. (thanks Lingyu Li) (#1037)
  • Ensured navigation to definitions follows imports and is transparent to decoration. (thanks Peter Law) (#1638)
  • Fix to display all interpreters in the interpreter list when a workspace contains a Pipfile. (#1800)
  • Automatically add path mappings for remote debugging when attaching to the localhost. (#1829)
  • Add support for the "source.organizeImports" setting for "editor.codeActionsOnSave" (thanks Nathan Gaberel)

Be sure to download the Python extension for VS Code now to try out the above improvements. If you run into any issues be sure to file an issue on the Python VS Code GitHub page.

Python Engineering at Microsoft: Introducing the Python Language Server


Visual Studio has long been recognized for the quality of its IntelliSense (code analysis and suggestions) across all languages, and has had support for Python since 2011. We are pleased to announce that we are going to be making the Python support available to other tools as the Microsoft Python Language Server. It is available first today in the July release of the Python Extension for Visual Studio Code, and we will later release it as a standalone component that you can use with any tool that works with the Language Server Protocol.

Background on IntelliSense and Language Servers

Ever since the days of Visual Basic, one of the core features of the Visual Studio series of IDEs has been IntelliSense: auto-completions for variables, functions, and other symbols that appear as you type in your code. Through a clever combination of static code analysis, precompiled databases and UI overlays, developers are regularly blown away at how productive they can be with an editor that truly understands their code.

IntelliSense being used in Visual Basic 6.0

Fast forward to today, and IntelliSense is still one of the most important features out there. More tools are requiring users to write code, and completions are practically a necessity in these editors. However, writing the static analysis necessary to provide a great experience is difficult, and most implementations are very closely tied to the editor they work with. Enter the language server protocol.

Language servers are standalone programs implementing the language server protocol, and were created to work with Visual Studio Code. Editors can start running a language server and use this JSON-based communication channel to provide and request information about the user’s code. All of the analysis and "smart" operations are handled by the server, allowing the editor to focus on presentation and interaction with the user.

Visual Studio Code uses language servers for most of its supported languages, including C++, C# and Go. From the editor’s point of view there are no differences between these languages - all the intelligence exists in the language server. This means that it is easy to add support for new languages to Visual Studio Code, and it does not require changing the editor at all. Language servers can also be used with plugins for Sublime Text, vim and more.

Introducing the Python Language Server

Previously, Python IntelliSense in Visual Studio was very specific to that IDE. We have been developing this support for nearly a decade. It has an impressively deep understanding of the Python language, but only Visual Studio users have been able to enjoy this work. Recently we have been refactoring our implementation to separate it from Visual Studio and make it available as a standalone program using the language server protocol.

From the point of view of the editor, language servers are a black box that is given text and gives back lists of more text. But the black box normally contains a process known as static type inferencing, where the language server determines ("infers") the type of each variable without actually running the code. For statically-typed languages, such as C#, this is often as simple as finding the variable definition and the type specified there. However, Python variables can change type any time they are assigned, and assignments can happen almost anywhere in any of the code that is run. This actually makes perfect static type inferencing impossible!

Python IntelliSense in Visual Studio 2017

(Technical aside: Variables are often thought of as "holes" into which only compatible values can "fit", where the shape of the hole is determined by its type. In Python, variables are names that are attached ("bound") to the value when it is assigned. Assigning a new name always re-binds the value regardless of whether the type is the same as the previous one. So just because you see "self.value = ‘a string’" in one place doesn’t mean that "self.value" will always be a string.)
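As a small, made-up illustration of why that rebinding makes inference hard:

class Config:
    def __init__(self):
        self.value = 'a string'   # here self.value is bound to a str

    def update(self, count):
        self.value = count        # the same attribute is later re-bound to an int

cfg = Config()
cfg.update(3)
print(type(cfg.value))            # <class 'int'>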

Our Python Language Server uses iterative full-program analysis to track the types of all the variables in your project while simulating execution of all the code in your project. Normally this kind of approach can take hours for complex programs and require unlimited amounts of RAM, but we have used many tricks to make it complete quickly enough for IntelliSense use. We have also made the tradeoffs necessary to provide useful information despite it not being possible to perfectly infer all types in a Python program.

The end result is that we have a black box that takes Python code and provides all the information your editor needs for tooltips, completions, finding definitions and references, global variable renaming, and more. For performance, it runs with .NET Core on Windows, macOS and Linux, works with Python 2.5 through to Python 3.7 and supports the latest language features such as async/await, type annotations and type stub packages (including typeshed, a copy of which is included with the language server). It performs incremental updates as you type, and is already proven as a core feature of Visual Studio.

Benefits for Python in VS Code

Python IntelliSense in VS Code

Our July release of the Python extension for Visual Studio Code will include an early version of the Python Language Server. Features that are new for VS Code developers in this release include:

  • Syntax errors as you type in code
  • Warnings when modules are not found
  • Using typeshed files to fill in missing completions for modules
  • Improved performance for analyzing your workspace and presenting completions
  • Ability to detect syntax errors on your entire workspace, rather than just the current file.
  • Faster startup times
  • Faster imports
  • Better handling for a number of language constructs

All of these are already available in Visual Studio 2017 or will be in the next minor update.

Having a standalone, cross-platform language server means that we can continue to innovate and improve on our IntelliSense experience for Python developers in both Visual Studio and Visual Studio Code at the same time.

Be sure to check out our VS Code release announcement for more information. The standalone release of the Python Language Server will follow in the next few months, and will be available under the Apache 2.0 license.

Bhishan Bhandari: Idiomatic Python – Writing better Python


This is a follow-up post of Idiomatic Python – Looping Approaches. The purpose of the article is to highlight on better code and encourage it. Looping over dictionary keys >>> books_price = { ... 'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17, ... 'The Self-Taught Programmer: The Definitive Guide to Programming Professionally': 15.09, ... […]
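The full post has the details; the idiom being previewed is roughly this (a sketch reusing the prices shown in the excerpt):

books_price = {
    'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17,
    'The Self-Taught Programmer: The Definitive Guide to Programming Professionally': 15.09,
}

# Iterating over a dict yields its keys; .items() yields key/value pairs.
for title in books_price:
    print(title)

for title, price in books_price.items():
    print(title, price)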

The post Idiomatic Python – Writing better Python appeared first on The Tara Nights.

Rene Dudfield: pygame 1.9.4 released

pygame 1.9.4
pygame 1.9.4 has been released into the wild!

TLDR; Some highlights.

  • python 3.7 support.
  • beta pypy support. See Are we pypy yet?.
  • pygame.draw fixes
  • pygame.math is not experimental anymore. Speedups and bugfixes.
  • Debian, Mac homebrew, mac virtualenv, manylinux and other platform fixes.
  • documentation fixes, jedi support for type ahead in editors like VSCode and VIM.
  • Surface.blits for blitting many surfaces at once more quickly.
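A minimal sketch of Surface.blits usage (the surface sizes and positions here are made up):

import pygame

pygame.init()
screen = pygame.display.set_mode((320, 200))
tile = pygame.Surface((16, 16))
tile.fill((0, 255, 0))

blit_sequence = [(tile, (x * 16, 0)) for x in range(10)]
rects = screen.blits(blit_sequence)   # one call instead of a Python-level loop
pygame.display.update(rects)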

Thanks

A very special thanks to the people who have volunteered commits to pygame since the last release. In alphabetical order...
Adam Di Carlo (@adicarlo) | Christian Bender (@christianbender) | Don Kirkby (@donkirkby) | endolith (@endolith) | hjpotter92 (@hjpotter92) | Ian Mallett (@imallett) | Lenard Lindstrom (@llindstrom) | Mathias Weber (@mweb) | Matti Picus (@mattip) | Nicholas Tollervey (@ntoll) | (@orangudan) | Raymon Skjørten Hansen (@raymonshansen) | René Dudfield (@illume) | Stefan Bethge (@kjyv) | Stuart Axon (@stuaxo) | Thomas Kluyver (@takluyver) | Tobias Persson (@Anisa)

I'm probably missing some people, and also missing some people who contributed in other ways.
For example, in discussions, issue reports, helping out on the wiki, the website, and for helping others
in the community, and providing good vibes. So whilst the commits are easy to use to make a list of people to thank, it's not inclusive of everyone who deserves thanks.

More details.

#451 #460 #467 #468 #469 #470
#444 link to help pages when compile fails.
#443 In set_error get_error tests ignore first error. Could be anything.
#442 Freetype requires pkg-config instead of freetype-config now.
#439 Surface.blits
#435 Adding pypy builds for Mac on travis.
#432 Appveyor pypy and pypy3 windows 32bit.
#431 Implement object alloc caching for rect.c to improve on pypy.
#427 PixelArray.close(), with PixelArray(surf) as px, context manager.
#426 Skip tests that rely on arrinter and pythonapi on pypy.
#420 pypy didn't like tp_dictoffset hack in events. Make our own setter, getter.
#418 draw.aaline should work with ARGB surfaces (like on mac).
#416 Vector cleanup
#415 So virtualenv gets a focused window on Mac too.
#414 Mac Travis homebrew fix
#413 Jedi confused by pygame imports. Make it happy.
#408 pygame.transform.threshold tests, keyword arguments, docs.
#403 pygame.math.Vector2/3 not experimental
#398 Clean up _camera_vidcapture.py unused code, and document a bit.
#394 Add pitch bend to MIDI library
#392 Add pypy builder to travis ci, and allow it to fail.
#391 ppc64le and other Debian fixes
#389 pygame.draw.circle with a thickness had a weird moiré pattern.
#387 test python 3.7 on travis CI.
#386 python 3.7 fixes.
#384 pygame.display doc fixes.
#381 import rect.inflate docs.
#363 Fix several typos, and improve grammar in the introduction.
#361 Add unit test for some key functions.
#360 update math.c for pypy.
#357 add UYVY support for better linux camera support.
#356 Fix aaellipse artifacts
703350f Update Rect slicing for Python 3
6d0e97a bug fix for freetype.Font.render_to()
#78 Add environment PYGAME_EXTRA_BASE to add an extra base directory to the start of the search path.
#77 Build alsa libs ourselves for manylinux builds.
#76 Docs fixup.

EuroPython: EuroPython 2018: Sponsored trainings


We’d like to highlight a special offering by our sponsors Greymatter / Intel and Smarkets: trainings which you are free to attend with a conference ticket.

image

Sponsored training sessions in Room Lammermuir

Monday, 23 July 2018 

Tuesday, 24 July 2018

The other training sessions on Monday and Tuesday can only be attended with a training pass - but those are sold out already, so the above sessions are a nice free extra for conference attendees.

Enjoy,

EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/
