Codementor: Python, Javascript, and Web automation
Ian Ozsvald: Higher Performance Python (ODSC 2019)
Building on PyDataCambridge last week I had the additional pleasure of talking on Higher Performance Python at ODSC 2019 yesterday. I had a brilliant room of 300 Pythonic data scientists at all levels who asked an interesting array of questions:
[Photo: a happy, smiling audience]
This talk expanded on last week's version at PyDataCambridge, as I had some more time. The problem intro was a little longer (and this helped set the scene, as I had more first-timers in the room), then I dug a little further into Pandas and added extra advice at the end. Overall I covered:
- Robert Kern’s line_profiler to profile performance in sklearn’s “fit” method against a custom numpy function
- Pandas function calling using iloc/iterrows/apply and apply with raw=True (in increasingly fast order; a rough sketch of this kind of comparison follows the list)
- Using Swifter and Dask to parallelise over many cores
- Using Numba to get an easy additional 10x speed-up
- Discussed advice for highly performant teams, to sanity-check some of the options
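For readers who weren't in the room, here is an illustrative sketch of the kind of comparison covered (not the talk's code, and the column names and data are made up): the same row-wise sum written against iloc, apply with raw=True, and a Numba-compiled loop:
import numpy as np
import pandas as pd
from numba import njit

df = pd.DataFrame({"a": np.random.rand(100_000), "b": np.random.rand(100_000)})

def slow_iloc(frame):
    # Row-by-row access via iloc: typically the slowest option
    return [frame.iloc[i].a + frame.iloc[i].b for i in range(len(frame))]

def with_apply_raw(frame):
    # raw=True passes plain ndarrays to the function instead of Series objects
    return frame.apply(lambda row: row[0] + row[1], axis=1, raw=True)

@njit
def numba_sum(a, b):
    # Compiled loop over the underlying numpy arrays
    out = np.empty(a.shape[0])
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out

# Each option is usually faster than the one above it; time them with %timeit in IPython
result = numba_sum(df["a"].to_numpy(), df["b"].to_numpy())
print(result[:3])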
“It was a fantastic talk.” – Stewart
My publisher O’Reilly were also kind enough to send over a box of the 1st edition High Performance Python books for signing, just as I did in Cambridge last week. As usual I didn’t have enough free books for those hoping for a copy – sorry if you missed out (I only get given a limited set to give away). The new content for the 2nd edition is available online in O’Reilly’s Safari Early Access Programme.
The talk ends with my customary note requesting a postcard if you learned something useful; feel free to send me an email asking for my address, as I love to receive postcards. I have an email announce list for my upcoming training in January, with a plan to introduce a High Performance Python training day, so join that list if you'd like low-volume announcements. I also have a twice-a-month email list for "Ian's Thoughts & Jobs Listing", which includes jobs I know about in our UK community along with my recommendations and notes. Join this if you'd like an idea of what's happening in the UK Pythonic Data Science scene.
The 2nd edition of High Performance Python should be out next April; preview it in the Early Access Programme here.
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.
Jaime Buelta: “Hands-On Docker for Microservices with Python” is now available!
Catalin George Festila: Python 3.7.5 : Display a file in the hexadecimal and binary output.
Mike C. Fletcher: Getting Twitch out the Door (but not as Twitch)
As part of trying to get testing done for a PyOpenGL release, I finally got around to testing Twitch, porting it to Python 3.6 and doing a release, only to discover that in the 4 years (!) since I last worked on it, the original package name got used on PyPI. Duh. So Twitch is now formally Twitch OGLC/twitchoglc (for OpenGLContext, on which it's based). If you don't release early and often you lose, folks.
What is Twitch, you ask? Well, it's a proof-of-concept loader for Quake III-style .bsp maps. Not so much a useful renderer as something that shows how you can use numpy to load binary formats into relatively efficient OpenGL rendering code. Currently the lighting and textures are rather crap-tastic, and we definitely have problems with degenerate (single-sided) geometry, but again, the point is to provide sample code, rather than an actual rendering engine.
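For the curious, the general numpy trick described here looks something like the following sketch (a hypothetical record layout, not Twitch's actual code): define a structured dtype for a fixed-size record and map the raw bytes of a lump straight into an array that can be handed to OpenGL.
import numpy as np

# Hypothetical Quake III-style vertex record: position, one texture coordinate, normal
vertex_dtype = np.dtype([
    ("position", np.float32, 3),
    ("texcoord", np.float32, 2),
    ("normal", np.float32, 3),
])

def load_lump(data, offset, length):
    # Interpret a slice of the file as an array of records, with no per-item parsing
    count = length // vertex_dtype.itemsize
    return np.frombuffer(data, dtype=vertex_dtype, count=count, offset=offset)

# Stand-in for bytes read from a .bsp file
raw = np.zeros(4, dtype=vertex_dtype).tobytes()
print(load_lump(raw, 0, len(raw)).shape)  # (4,)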
NumFOCUS: Now Hiring: Matplotlib Research Software Engineering Fellow
Stack Abuse: Dimensionality Reduction in Python with Scikit-Learn
Introduction
In machine learning, the performance of a model only benefits from more features up until a certain point. The more features are fed into a model, the more the dimensionality of the data increases. As the dimensionality increases, overfitting becomes more likely.
There are multiple techniques that can be used to fight overfitting, but dimensionality reduction is one of the most effective techniques. Dimensionality reduction selects the most important components of the feature space, preserving them and dropping the other components.
Why is Dimensionality Reduction Needed?
There are a few reasons that dimensionality reduction is used in machine learning: to combat computational cost, to control overfitting, and to visualize and help interpret high dimensional data sets.
Often in machine learning, the more features that are present in the dataset, the better a classifier can learn. However, more features also mean a higher computational cost. Not only can high dimensionality lead to long training times, but more features also often lead to an algorithm overfitting as it tries to create a model that explains all the features in the data.
Because dimensionality reduction reduces the overall number of features, it not only lowers the computational demands associated with training a model but also helps combat overfitting by keeping the feature set fed to the model fairly simple.
Dimensionality reduction can be used in both supervised and unsupervised learning contexts. In the case of unsupervised learning, dimensionality reduction is often used to preprocess the data by carrying out feature selection or feature extraction.
The primary algorithms used to carry out dimensionality reduction for unsupervised learning are Principal Component Analysis (PCA) and Singular Value Decomposition (SVD).
In the case of supervised learning, dimensionality reduction can be used to simplify the features fed into the machine learning classifier. The most common methods used to carry out dimensionality reduction for supervised learning problems are Linear Discriminant Analysis (LDA) and PCA; LDA in particular can also be used to predict new cases.
Take note that the use cases described above are general use cases and not the only conditions these techniques are used in. After all, dimensionality reduction techniques are statistical methods and their use is not restricted to machine learning models.
Let's take some time to explain the ideas behind each of the most common dimensionality reduction techniques.
Principal Component Analysis
Principal Component Analysis (PCA) is a statistical method that creates new features or characteristics of data by analyzing the characteristics of the dataset. Essentially, the characteristics of the data are summarized or combined together. You can also conceive of Principal Component Analysis as "squishing" data down into just a few dimensions from a much higher-dimensional space.
To be more concrete, a drink might be described by many features, but many of these features will be redundant and relatively useless for identifying the drink in question. Rather than describing a wine with features like aeration, CO2 levels, etc., it could more easily be described by color, taste, and age.
Principal Component Analysis selects the "principal" or most influential characteristics of the dataset and creates features based on them. By choosing only the features with the most influence on the dataset, the dimensionality is reduced.
PCA uses the correlations between variables when it creates new features. The principal components created by the technique are linear combinations of the original variables, computed from the eigenvectors of the data's covariance matrix.
The new components are constructed to be orthogonal, i.e., unrelated to one another.
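As a minimal sketch of that idea (plain NumPy, not Scikit-Learn's implementation): centre the data, take the eigenvectors of the covariance matrix, and project onto the components with the largest eigenvalues:
import numpy as np

def pca_sketch(X, n_components=2):
    X_centered = X - X.mean(axis=0)             # centre each feature
    cov = np.cov(X_centered, rowvar=False)      # covariance matrix of the features
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]       # largest variance first
    components = eigenvectors[:, order[:n_components]]
    return X_centered @ components              # project onto the top components

X = np.random.rand(100, 5)
print(pca_sketch(X).shape)  # (100, 2)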
PCA Implementation Example
Let's take a look at how PCA can be implemented in Scikit-Learn. We'll be using the Mushroom classification dataset for this.
First, we need to import all the modules we need, which includes PCA, train_test_split, and labeling and scaling tools:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
After we load in the data, we'll check for any null values. We'll also encode the data with the LabelEncoder. The class feature is the first column in the dataset, so we split up the features and labels accordingly:
m_data = pd.read_csv('mushrooms.csv')

# Machine learning systems work with integers, we need to encode these
# string characters into ints
encoder = LabelEncoder()

# Now apply the transformation to all the columns:
for col in m_data.columns:
    m_data[col] = encoder.fit_transform(m_data[col])

X_features = m_data.iloc[:, 1:23]
y_label = m_data.iloc[:, 0]
We'll now scale the features with the standard scaler. This is optional as we aren't actually running the classifier, but it may impact how our data is analyzed by PCA:
# Scale the features
scaler = StandardScaler()
X_features = scaler.fit_transform(X_features)
We'll now use PCA to get the list of features and plot which features have the most explanatory power, or have the most variance. These are the principal components. It looks like around 17 or 18 of the features explain the majority, almost 95%, of the variance in our data; a quick numeric check using the cumulative explained variance ratio follows the plotting code below:
# Visualize
pca = PCA()
pca.fit_transform(X_features)
pca_variance = pca.explained_variance_
plt.figure(figsize=(8, 6))
plt.bar(range(22), pca_variance, alpha=0.5, align='center', label='individual variance')
plt.legend()
plt.ylabel('Variance ratio')
plt.xlabel('Principal components')
plt.show()
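If you would rather read that number off than eyeball the bar chart, a quick check is to cumulate the explained variance ratios and see where they cross 95% (a small addition, assuming the pca object fitted in the block above is still around):
import numpy as np

# Assumes the pca object fitted above
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components_95 = np.argmax(cumulative >= 0.95) + 1
print("Components needed for 95% of the variance:", n_components_95)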
Let's now reduce the data down to the top 17 features. We'll then plot a scatter plot of the data point classification based on these 17 features:
pca2 = PCA(n_components=17)
pca2.fit(X_features)
x_3d = pca2.transform(X_features)
plt.figure(figsize=(8,6))
plt.scatter(x_3d[:,0], x_3d[:,5], c=m_data['class'])
plt.show()
Let's also do this for the top 2 features and see how the classification changes:
pca3 = PCA(n_components=2)
pca3.fit(X_features)
x_3d = pca3.transform(X_features)
plt.figure(figsize=(8,6))
plt.scatter(x_3d[:,0], x_3d[:,1], c=m_data['class'])
plt.show()
Singular Value Decomposition
The purpose of Singular Value Decomposition is to simplify a matrix and make doing calculations with the matrix easier. The matrix is reduced to its constituent parts, similar to the goal of PCA. Understanding the ins and outs of SVD isn't completely necessary to implement it in your machine learning models, but having an intuition for how it works will give you a better idea of when to use it.
SVD can be carried out on either complex or real-valued matrices, but to make this explanation easier to understand, we'll go over the method of decomposing a real-valued matrix.
When doing SVD we have a matrix filled in with data and we want to reduce the number of columns the matrix has. This reduces the dimensionality of the matrix while still preserving as much of the variability in the data as possible.
Assuming we have some matrix A, we can represent it as the product of three other matrices, U, D, and the transpose of V:
$$
A = U D V^T
$$
Matrix A has the original x by y elements, while matrix U is an orthogonal matrix containing x by x elements and matrix V is a different orthogonal matrix containing y by y elements. Finally, D is a diagonal matrix containing x by y elements.
Decomposing a matrix in this way places the singular values of the original matrix along the diagonal of the new matrix D. Orthogonal matrices have useful properties that are preserved under these multiplications, and we can take advantage of this to approximate matrix A: when we multiply U, D, and the transpose of matrix V back together, we get a matrix equivalent to the original matrix A.
When we break/decompose matrix A down into U, D, and V, we then have three different matrices that contain the information of Matrix A.
Because the singular values in D are ordered from largest to smallest, it turns out that the left-most columns of the matrices hold the majority of the information, and we can select just these few columns to get a good approximation of Matrix A. This new matrix is much simpler and easier to work with, as it has far fewer dimensions.
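Here is a tiny, illustrative sketch of that rank-k idea on a toy NumPy matrix (not part of the image example that follows): keep only the first k columns of U, the first k singular values, and the first k rows of the transpose of V.
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation of A

print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: the full product reproduces A
print(np.linalg.norm(A - A_k))                # error introduced by keeping only k values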
SVD Implementation Example
One of the most common ways that SVD is used is to compress images. After all, the pixel values that make up the red, green, and blue channels in the image can just be reduced and the result will be an image that is less complex but still contains the same image content. Let's try using SVD to compress an image and render it.
We'll use several functions to handle the compression of the image. We'll really only need Numpy and the Image module from the PIL library in order to accomplish this, since Numpy has a method to carry out the SVD calculation:
import numpy
from PIL import Image
First, we'll just write a function to load in the image and turn it into a Numpy array. We then want to select the red, green, and blue color channels from the image:
def load_image(image):
    image = Image.open(image)
    im_array = numpy.array(image)

    red = im_array[:, :, 0]
    green = im_array[:, :, 1]
    blue = im_array[:, :, 2]

    return red, green, blue
Now that we have the colors, we need to compress the color channels. We can start by calling Numpy's SVD function on the color channel we want. We'll then create an array of zeroes that we'll fill in after the matrix multiplication is completed. We then specify the singular value limit we want to use when doing the calculations:
def channel_compress(color_channel, singular_value_limit):
    u, s, v = numpy.linalg.svd(color_channel)
    compressed = numpy.zeros((color_channel.shape[0], color_channel.shape[1]))
    n = singular_value_limit

    left_matrix = numpy.matmul(u[:, 0:n], numpy.diag(s)[0:n, 0:n])
    inner_compressed = numpy.matmul(left_matrix, v[0:n, :])
    compressed = inner_compressed.astype('uint8')
    return compressed

red, green, blue = load_image("dog3.jpg")
singular_val_lim = 350
After this, we multiply the first n columns of the U matrix by the diagonal matrix of singular values, as described above. This gets us the left matrix, which we then multiply with the first n rows of the V matrix. This gives us the compressed values, which we convert to the 'uint8' type:
def compress_image(red, green, blue, singular_val_lim):
    compressed_red = channel_compress(red, singular_val_lim)
    compressed_green = channel_compress(green, singular_val_lim)
    compressed_blue = channel_compress(blue, singular_val_lim)

    im_red = Image.fromarray(compressed_red)
    im_blue = Image.fromarray(compressed_blue)
    im_green = Image.fromarray(compressed_green)

    new_image = Image.merge("RGB", (im_red, im_green, im_blue))
    new_image.show()
    new_image.save("dog3-edited.jpg")

compress_image(red, green, blue, singular_val_lim)
We'll be using this image of a dog to test our SVD compression on:
We also need to set the singular value limit we'll use; let's start with 350 for now:
red, green, blue = load_image("dog.jpg")
singular_val_lim = 350
Finally, we can get the compressed values for the three color channels and transform them from Numpy arrays into image components using PIL. We then just have to join the three channels together and show the image. This image should be a little smaller and simpler than the original image:
Indeed, if you inspect the size of the images, you'll notice that the compressed one is smaller, though we've also had a bit of lossy compression. You can see some noise in the image as well.
You can play around with adjusting the singular value limit. The lower the chosen limit, the greater the compression will be, but at a certain point image artifacting will show up and the image will degrade in quality:
def compress_image(red, green, blue, singular_val_lim):
    compressed_red = channel_compress(red, singular_val_lim)
    compressed_green = channel_compress(green, singular_val_lim)
    compressed_blue = channel_compress(blue, singular_val_lim)

    im_red = Image.fromarray(compressed_red)
    im_blue = Image.fromarray(compressed_blue)
    im_green = Image.fromarray(compressed_green)

    new_image = Image.merge("RGB", (im_red, im_green, im_blue))
    new_image.show()

compress_image(red, green, blue, singular_val_lim)
Linear Discriminant Analysis
Linear Discriminant Analysis operates by projecting data from a multidimensional graph onto a linear graph. The easiest way to conceive of this is with a graph filled up with data points of two different classes. Assuming that there is no line that will neatly separate the data into two classes, the two dimensional graph can be reduced down into a 1D graph. This 1D graph can then be used to hopefully achieve the best possible separation of the data points.
When LDA is carried out there are two primary goals: minimizing the variance of the two classes and maximizing the distance between the means of the two data classes.
In order to achieve this, a new axis is plotted in the 2D graph. This new axis should separate the data points of the two classes based on the previously mentioned criteria. Once the new axis has been created, the data points within the 2D graph are redrawn along it.
LDA carries out three different steps to move the original graph to the new axis. First, the separability between the classes has to be calculated; this is based on the distance between the class means, or the between-class variance. Next, the within-class variance must be calculated, which is the distance between each class's samples and its mean. Finally, the lower dimensional space that maximizes the between-class variance has to be constructed.
LDA works best when the means of the classes are far from each other. If the class distributions share their means, it won't be possible for LDA to separate the classes with a new linear axis.
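Before moving to the Scikit-Learn example, here is a rough two-class sketch of the idea using made-up data (Fisher's discriminant, not Scikit-Learn's solver): the projection axis maximizes the separation of the class means relative to the within-class scatter.
import numpy as np

rng = np.random.default_rng(0)
class_0 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))
class_1 = rng.normal(loc=[3, 2], scale=1.0, size=(50, 2))

mean_0, mean_1 = class_0.mean(axis=0), class_1.mean(axis=0)
# Within-class scatter: summed scatter of each class around its own mean
Sw = (class_0 - mean_0).T @ (class_0 - mean_0) + (class_1 - mean_1).T @ (class_1 - mean_1)
w = np.linalg.solve(Sw, mean_1 - mean_0)   # direction of the new 1D axis

# Projecting onto w gives the 1D representation described above
print((class_0 @ w).mean(), (class_1 @ w).mean())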
LDA Implementation Example
Finally, let's see how LDA can be used to carry out dimensionality reduction. Note that LDA can be used as a classification algorithm in addition to carrying out dimensionality reduction.
We'll be using the Titanic dataset for the following example.
Let's start off by making all our necessary imports:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
We'll now load in our training data, which we'll divide into training and validation sets.
Though, we need to do a little data preprocessing first. Let's drop the Name, Cabin, and Ticket columns as they don't carry a lot of useful info. We also need to fill in any missing data, which we'll replace with median values in the case of the Age feature and an S in the case of the Embarked feature:
training_data = pd.read_csv("train.csv")
# Let's drop the cabin and ticket columns
training_data.drop(labels=['Cabin', 'Ticket'], axis=1, inplace=True)
training_data["Age"].fillna(training_data["Age"].median(), inplace=True)
training_data["Embarked"].fillna("S", inplace=True)
We also need to encode the non-numerical features. We'll encode both the Sex and Embarked columns. Let's drop the Name column as well, since it seems unlikely to be useful in classification:
encoder_1 = LabelEncoder()
# Fit the encoder on the data
encoder_1.fit(training_data["Sex"])
# Transform and replace the training data
training_sex_encoded = encoder_1.transform(training_data["Sex"])
training_data["Sex"] = training_sex_encoded
encoder_2 = LabelEncoder()
encoder_2.fit(training_data["Embarked"])
training_embarked_encoded = encoder_2.transform(training_data["Embarked"])
training_data["Embarked"] = training_embarked_encoded
# Assume the name is going to be useless and drop it
training_data.drop("Name", axis=1, inplace=True)
We need to scale the values, but the Scaler tool takes arrays, so the values we want to reshape need to be turned into arrays first. After that, we can scale the data:
# Remember that the scaler takes arrays
ages_train = np.array(training_data["Age"]).reshape(-1, 1)
fares_train = np.array(training_data["Fare"]).reshape(-1, 1)
scaler = StandardScaler()
training_data["Age"] = scaler.fit_transform(ages_train)
training_data["Fare"] = scaler.fit_transform(fares_train)
# Now to select our training and testing data
features = training_data.drop(labels=['PassengerId', 'Survived'], axis=1)
labels = training_data['Survived']
We can now select the training features and labels and use train_test_split to make our training and validation data. It's easy to do classification with LDA; you handle it just like you would any other classifier in Scikit-Learn. Just fit the function on the training data and have it predict on the validation/testing data. We can then print metrics for the predictions against the actual values:
X_train, X_val, y_train, y_val = train_test_split(features, labels, test_size=0.2, random_state=27)
model = LDA()
model.fit(X_train, y_train)
preds = model.predict(X_val)
acc = accuracy_score(y_val, preds)
f1 = f1_score(y_val, preds)
print("Accuracy: {}".format(acc))
print("F1 Score: {}".format(f1))
Here's the print out:
Accuracy: 0.8100558659217877
F1 Score: 0.734375
When it comes to transforming the data and reducing dimensionality, let's run a Logistic Regression classifier on the data first so we can see what our performance is prior to dimensionality reduction:
logreg_clf = LogisticRegression()
logreg_clf.fit(X_train, y_train)
preds = logreg_clf.predict(X_val)
acc = accuracy_score(y_val, preds)
f1 = f1_score(y_val, preds)
print("Accuracy: {}".format(acc))
print("F1 Score: {}".format(f1))
Here are the results:
Accuracy: 0.8100558659217877
F1 Score: 0.734375
Now we will transform the data features by specifying a number of desired components for LDA and fitting the model on the features and labels. We then just transform the features and save it into a new variable. Let's print out the original and reduced number of features:
LDA_transform = LDA(n_components=1)
LDA_transform.fit(features, labels)
features_new = LDA_transform.transform(features)
# Print the number of features
print('Original feature #:', features.shape[1])
print('Reduced feature #:', features_new.shape[1])
# Print the ratio of explained variance
print(LDA_transform.explained_variance_ratio_)
Here's the print out for the above code:
Original feature #: 7
Reduced feature #: 1
[1.]
We now just have to do train/test split again with the new features and run the classifier again to see how performance changed:
X_train, X_val, y_train, y_val = train_test_split(features_new, labels, test_size=0.2, random_state=27)
logreg_clf = LogisticRegression()
logreg_clf.fit(X_train, y_train)
preds = logreg_clf.predict(X_val)
acc = accuracy_score(y_val, preds)
f1 = f1_score(y_val, preds)
print("Accuracy: {}".format(acc))
print("F1 Score: {}".format(f1))
Accuracy: 0.8212290502793296
F1 Score: 0.7500000000000001
Conclusion
We've gone over the major dimensionality reduction techniques: Principal Component Analysis, Singular Value Decomposition, and Linear Discriminant Analysis. These are statistical techniques you can use to help your machine learning models perform better, combat overfitting, and assist in data analysis.
While these three techniques are the most commonly used dimensionality reduction techniques, others exist. Other dimensionality reduction techniques include kernel approximation and Isomap spectral embedding.
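As a quick, illustrative taste of one of those alternatives (stand-in random data and arbitrary parameters, not a recommendation), Isomap is available in Scikit-Learn's manifold module:
import numpy as np
from sklearn.manifold import Isomap

X = np.random.rand(200, 10)                 # stand-in for a scaled feature matrix
embedding = Isomap(n_neighbors=5, n_components=2)
X_2d = embedding.fit_transform(X)
print(X_2d.shape)                           # (200, 2)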
qutebrowser development blog: qutebrowser meetup Berlin (2019-11-28)
I (The-Compiler) am currently in Berlin - I've met with Qt/QtWebEngine developers at Qt Contributors Summit and had some very interesting development discussions there. There are some writeups available in the Qt Wiki.
Next Thursday (28th) I'd like to have a small qutebrowser user meetup here :) We'll meet at 19 …
NumFOCUS: mlpack Machine Learning Library joins NumFOCUS Sponsored Projects
Davy Wybiral: ESPlay Micro: Open Source ESP32 Game Console
PS: these would make for an awesome Christmas present for anyone into gaming or making.
Catalin George Festila: Python 3.7.5 : Create GUI with npyscreen.
Ned Batchelder: Support windows bar calendar
Like any large suite of applications, Open edX software (my day job) depends on a number of underpinnings: Django, Python, Ubuntu, MySQL, and so on. We want to stay up-to-date on those dependencies, or at least ensure we are using supported versions of each.
To help with that, I wanted to make a chart of the support windows for different versions of each dependency. I figured the simplest way to draw a chart like that was to make a spreadsheet. Google Sheets is enough for something like this, and makes it easy to share the result with others who need to refer to it.
To create the spreadsheet programmatically, I used the JavaScript scripting support. My program went through a few other iterations before landing on this technique, so it’s in kind of a strange form now: it’s a Python program that writes JavaScript, which you then paste into a spreadsheet and run.
It makes a nice result, the Support Windows spreadsheet:
The tree-named things at the top of the chart are the Open edX releases. Mostly the chart is used to reason about when we need to upgrade the dependencies in order for Open edX releases to stay on supported versions. The bolder rectangles are the currently used versions.
The program is here: barcalendar.py. It’s all in one file, though at least the code is organized from general to specific: first color utilities, then a generic BaseCalendar class, then a GsheetCalendar class, then the code specific to our software to draw the chart.
When I thought about writing this blog post, I wanted to clean up the program first. Split it into multiple files, refactor the version logic to make some utilities, and so on. It was easy to imagine making the code more re-usable, more of a library. But I resisted letting the perfect be the enemy of the good. This program is useful to us, it might be useful to others. Why not share it now?
The hack of writing JavaScript code to be pasted into a spreadsheet feels slightly embarrassing: shouldn’t I at least be able to use the Gsheet API from Python to do the work?
But using the Gsheet API would mean struggling with authentication, which always seems to be difficult, and I can neatly sidestep by copying and pasting JavaScript. Not to mention, refactoring is easier this way because I can save the JavaScript output and check that the refactored code didn’t change it.
So in some ways, this is a low-tech implementation of the functional programming idea that you should separate I/O from computation. My barcalendar.py does the computation, generating a JavaScript “display list”. Then Google Sheets does the “I/O” when I run the JavaScript in the spreadsheet. Nice: I’m not an authentication-avoiding chump, I’m an insightful functional programmer!
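As a hypothetical illustration of that split (invented names, not the real barcalendar.py API): pure Python computes a list of bar descriptions, and a separate function turns them into the Apps Script JavaScript that the spreadsheet actually runs.
def bar(row, start_col, end_col, label):
    # Pure computation: describe one support-window bar as plain data
    return {"row": row, "start": start_col, "end": end_col, "label": label}

def to_js(bars):
    # Emit Apps Script-style JavaScript; the spreadsheet does the actual I/O
    lines = ["var sheet = SpreadsheetApp.getActiveSheet();"]
    for b in bars:
        width = b["end"] - b["start"] + 1
        lines.append('sheet.getRange(%d, %d, 1, %d).merge().setValue("%s");'
                     % (b["row"], b["start"], width, b["label"]))
    return "\n".join(lines)

print(to_js([bar(2, 3, 14, "Python 3.8"), bar(3, 1, 10, "Django 2.2 LTS")]))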
This kind of separation is sometimes called “clean architecture.” Brandon Rhodes has a detailed talk about Clean Architecture in Python.
So enjoy barcalendar.py despite its flaws (or because of its genius). It’s working well for us. If you improve it or use it, let me know.
Talk Python to Me: #239 Bayesian foundations
Programiz: Python CSV
Weekly Python StackOverflow Report: (cciv) stackoverflow python report
Between brackets: [question score / answers count]
Build date: 2019-11-24 10:07:08 GMT
- Generate filtered binary cartesian products - [11/2]
- numpy.unique gives wrong output for list of sets - [11/2]
- For loop using more than one list in Python - [10/2]
- Is there a better way to write nested if statements in python? - [8/7]
- How to concatenate a list with a nested list? - [7/6]
- How to make python halt once target product is found in subset? - [7/1]
- Python - value unpacking order in method parameters - [7/1]
- Incrementing the upper limit of range inside a loop doesn't make it run forever - [6/7]
- Why doesn't [[]] == list(list()) - [6/4]
- Filtering rows from dataframe based on the values of the previous rows - [6/2]
PyCon: Registration for PyCon US 2020 is open!
We are excited to announce the opening of PyCon US 2020 registration. The registration site has been updated, tweaked, and tested all in the effort to provide you a seamless experience.
The new system will allow you to access, view, and add to your current registration. You can book and view hotel reservations and request changes if needed right through your dashboard.
Where do I go to register?
Head over to us.pycon.org and create an account. Once you are logged in, access the registration page via your dashboard.
Registration costs
The early bird pricing is $550 for corporate, $350 for individuals, and $100 for students. Once we sell the first 800 tickets, regular prices will go into effect. Regular pricing will be $700 for corporate, $400 for individuals, and $125 for students.
PyCon will take place April 14-23, 2020 in Pittsburgh, PA. The conference starts with 2 days of tutorials, the Education Summit, and sponsor workshops, followed by 3 days of keynotes, talks, lightning talks, the hatchery program and much more, and ends with 4 days of sprints.
Join over 3,000 people from across the globe in one place to learn from and share experiences with.
Join in hallway conversations, participate in special events, visit with our many sponsors in the expo hall, and enjoy the many talks available with beginner to advanced content offered.
Tutorials
Tutorials will be presented Wednesday April 14, 2020 and Thursday April 15, 2020. We are accepting proposals for tutorials through Friday, November 22, 2019. Find more information and submit a proposal here. Once our tutorial committee has scheduled the selected tutorials, you will be able to add them to your conference registration for an additional fee. Watch for tutorial registration to launch in February 2020.
Education Summit
The Education Summit is held on Thursday April 15, 2020. The Education Summit requires you to be registered due to capacity limits. Please only register if you plan to attend as this is a popular event. If you register and are unable to attend, please let us know by emailing pycon-reg@python.org. We want to be sure the room is full and those that are able to attend have the chance.
Evening Dinners
There are two evening dinners that require additional registration, and capacity is limited: the Gateway Clipper Dinner Cruise on Friday April 16, 2020 and the Trivia Night Dinner with host Brandon Rhodes on Sunday April 19, 2020. If you register for the dinners, please be sure you are able to attend. These events do sell out and we want all those that want to attend to have the opportunity. If you register to attend and your plans change, please let us know by emailing pycon-reg@python.org.
Cancellation Fees
Registration cancellations must be submitted in writing and received by April 19, 2019 in order to receive a refund minus the $50 cancellation fee ($25 for students). No refunds will be granted for cancellations received after April 19, 2019.
In lieu of cancellation you are able to transfer your registration to another person. For details about transferring your registration, visit the registration information page.
Attendees traveling to PyCon internationally are encouraged to review our International Travel Refund Policy. This is especially important for Financial Aid recipients attending from abroad. PyCon strives to support the Python community in attending, no matter where they are traveling from.
Hotel
PyCon has contracted special rates with nearby hotels. When you complete your registration for PyCon US 2020, you will be able to book a hotel reservation through our official housing bureau. This is the only way to get the conference rates. More information can be found on the Venue and Hotels page.
Note: Beware of Housing Pirates! PyCon or our official housing bureau, VisitPittsburgh, will not be calling delegates to sell rooms. If you are contacted by an agency other than VisitPittsburgh offering to make your hotel reservations, we urge you to not use their services. We cannot protect you against them if you do book a reservation.
Looking for a roommate? Check out PyCon’s Room Sharing page.
Childcare
PyCon is proud to announce that we will be once again offering childcare during the main conference days, April 16-19, 2020. Space is limited, so be sure to sign up soon.
Financial Aid
Check out the Financial Aid page to learn more about the support we provide for travel, hotel, registration, and childcare to ensure that everyone has an opportunity to attend PyCon.
More Information
Head to the registration information page for more details!
Vinay Sajip (Logging): A Qt GUI for logging
A question that comes up from time to time is about how to log to a GUI application. The Qt framework is a popular cross-platform UI framework with Python bindings using PySide2 or PyQt5 libraries.
The following example shows how to log to a Qt GUI. This introduces a simple QtHandler class which takes a callable, which should be a slot in the main thread that does GUI updates. A worker thread is also created to show how you can log to the GUI from both the UI itself (via a button for manual logging) as well as a worker thread doing work in the background (here, just logging messages at random levels with random short delays in between).
The worker thread is implemented using Qt's QThread class rather than the threading module, as there are circumstances where one has to use QThread, which offers better integration with other Qt components.
The code should work with recent releases of either PySide2 or PyQt5. You should be able to adapt the approach to earlier versions of Qt. Please refer to the comments in the code snippet for more detailed information.
import datetime
import logging
import random
import sys
import time

# Deal with minor differences between PySide2 and PyQt5
try:
    from PySide2 import QtCore, QtGui, QtWidgets
    Signal = QtCore.Signal
    Slot = QtCore.Slot
except ImportError:
    from PyQt5 import QtCore, QtGui, QtWidgets
    Signal = QtCore.pyqtSignal
    Slot = QtCore.pyqtSlot

logger = logging.getLogger(__name__)

#
# Signals need to be contained in a QObject or subclass in order to be correctly
# initialized.
#
class Signaller(QtCore.QObject):
    signal = Signal(str, logging.LogRecord)

#
# Output to a Qt GUI is only supposed to happen on the main thread. So, this
# handler is designed to take a slot function which is set up to run in the main
# thread. In this example, the function takes a string argument which is a
# formatted log message, and the log record which generated it. The formatted
# string is just a convenience - you could format a string for output any way
# you like in the slot function itself.
#
# You specify the slot function to do whatever GUI updates you want. The handler
# doesn't know or care about specific UI elements.
#
class QtHandler(logging.Handler):
    def __init__(self, slotfunc, *args, **kwargs):
        super(QtHandler, self).__init__(*args, **kwargs)
        self.signaller = Signaller()
        self.signaller.signal.connect(slotfunc)

    def emit(self, record):
        s = self.format(record)
        self.signaller.signal.emit(s, record)

#
# This example uses QThreads, which means that the threads at the Python level
# are named something like "Dummy-1". The function below gets the Qt name of the
# current thread.
#
def ctname():
    return QtCore.QThread.currentThread().objectName()

#
# Used to generate random levels for logging.
#
LEVELS = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)

#
# This worker class represents work that is done in a thread separate to the
# main thread. The way the thread is kicked off to do work is via a button press
# that connects to a slot in the worker.
#
# Because the default threadName value in the LogRecord isn't much use, we add
# a qThreadName which contains the QThread name as computed above, and pass that
# value in an "extra" dictionary which is used to update the LogRecord with the
# QThread name.
#
# This example worker just outputs messages sequentially, interspersed with
# random delays of the order of a few seconds.
#
class Worker(QtCore.QObject):
    @Slot()
    def start(self):
        extra = {'qThreadName': ctname()}
        logger.debug('Started work', extra=extra)
        i = 1
        # Let the thread run until interrupted. This allows reasonably clean
        # thread termination.
        while not QtCore.QThread.currentThread().isInterruptionRequested():
            delay = 0.5 + random.random() * 2
            time.sleep(delay)
            level = random.choice(LEVELS)
            logger.log(level, 'Message after delay of %3.1f: %d', delay, i, extra=extra)
            i += 1

#
# Implement a simple UI for this cookbook example. This contains:
#
# * A read-only text edit window which holds formatted log messages
# * A button to start work and log stuff in a separate thread
# * A button to log something from the main thread
# * A button to clear the log window
#
class Window(QtWidgets.QWidget):

    COLORS = {
        logging.DEBUG: 'black',
        logging.INFO: 'blue',
        logging.WARNING: 'orange',
        logging.ERROR: 'red',
        logging.CRITICAL: 'purple',
    }

    def __init__(self, app):
        super(Window, self).__init__()
        self.app = app
        self.textedit = te = QtWidgets.QPlainTextEdit(self)
        # Set whatever the default monospace font is for the platform
        f = QtGui.QFont('nosuchfont')
        f.setStyleHint(f.Monospace)
        te.setFont(f)
        te.setReadOnly(True)
        PB = QtWidgets.QPushButton
        self.work_button = PB('Start background work', self)
        self.log_button = PB('Log a message at a random level', self)
        self.clear_button = PB('Clear log window', self)
        self.handler = h = QtHandler(self.update_status)
        # Remember to use qThreadName rather than threadName in the format string.
        fs = '%(asctime)s %(qThreadName)-12s %(levelname)-8s %(message)s'
        formatter = logging.Formatter(fs)
        h.setFormatter(formatter)
        logger.addHandler(h)
        # Set up to terminate the QThread when we exit
        app.aboutToQuit.connect(self.force_quit)

        # Lay out all the widgets
        layout = QtWidgets.QVBoxLayout(self)
        layout.addWidget(te)
        layout.addWidget(self.work_button)
        layout.addWidget(self.log_button)
        layout.addWidget(self.clear_button)
        self.setFixedSize(900, 400)

        # Connect the non-worker slots and signals
        self.log_button.clicked.connect(self.manual_update)
        self.clear_button.clicked.connect(self.clear_display)

        # Start a new worker thread and connect the slots for the worker
        self.start_thread()
        self.work_button.clicked.connect(self.worker.start)
        # Once started, the button should be disabled
        self.work_button.clicked.connect(lambda: self.work_button.setEnabled(False))

    def start_thread(self):
        self.worker = Worker()
        self.worker_thread = QtCore.QThread()
        self.worker.setObjectName('Worker')
        self.worker_thread.setObjectName('WorkerThread')  # for qThreadName
        self.worker.moveToThread(self.worker_thread)
        # This will start an event loop in the worker thread
        self.worker_thread.start()

    def kill_thread(self):
        # Just tell the worker to stop, then tell it to quit and wait for that
        # to happen
        self.worker_thread.requestInterruption()
        if self.worker_thread.isRunning():
            self.worker_thread.quit()
            self.worker_thread.wait()
        else:
            print('worker has already exited.')

    def force_quit(self):
        # For use when the window is closed
        if self.worker_thread.isRunning():
            self.kill_thread()

    # The functions below update the UI and run in the main thread because
    # that's where the slots are set up

    @Slot(str, logging.LogRecord)
    def update_status(self, status, record):
        color = self.COLORS.get(record.levelno, 'black')
        s = '<pre><font color="%s">%s</font></pre>' % (color, status)
        self.textedit.appendHtml(s)

    @Slot()
    def manual_update(self):
        # This function uses the formatted message passed in, but also uses
        # information from the record to format the message in an appropriate
        # color according to its severity (level).
        level = random.choice(LEVELS)
        extra = {'qThreadName': ctname()}
        logger.log(level, 'Manually logged!', extra=extra)

    @Slot()
    def clear_display(self):
        self.textedit.clear()

def main():
    QtCore.QThread.currentThread().setObjectName('MainThread')
    logging.getLogger().setLevel(logging.DEBUG)
    app = QtWidgets.QApplication(sys.argv)
    example = Window(app)
    example.show()
    sys.exit(app.exec_())

if __name__ == '__main__':
    main()
Janusworx: #100DaysOfCode, Days 002 & 003 – Dates & Times
Worked an hour on each of the past two days, exerting all of my Python knowledge on the small project they gave me.
Try as I might, I could not do it.
So looked at the solution.
And realised, while it was all logical, I couldn’t for the life of me have written that code on my own.
Long way to travel.
Lots of work to do.
Janusworx: #100DaysOfCode, Day 004 – The Collections Module
Decided to watch the videos for day 4 since I am a day behind.
The exercises look complicated, but the collections module looks like a real time saver.
Will give it a go tomorrow.