Channel: Planet Python

PyCharm: Interview: Oliver Bestwalter for tox webinar next week


Python has long distinguished itself with a culture of testing. In the last decade, two libraries have combined to give powerful testing in isolation — pytest and tox. The latter combines easily with pytest to give you a clean environment across test runs, including across multiple versions of Python.

tox certainly counts as one of those things lots of PyCharm customers know they should know, but don’t yet know. To make it easy to break the ice we’ve invited Oliver Bestwalter to introduce tox in a PyCharm webinar. Oliver is the maintainer of tox and advocate for release automation in projects.


Here’s our interview with Oliver. As background, Oliver was previously interviewed on The Mouse vs. Python blog and the Test & Code podcast.

Give us a sneak peek at what you’re going to discuss in the webinar and what audience it is aimed at.

I want to do a time lapse journey through a hypothetical project that grows tests and automation over its lifetime. The focus will be on how to use tox to bundle all these development and automation tasks into a workflow that works well locally and on CI.
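To give a concrete flavor of such a setup, here is a minimal, hypothetical tox.ini that runs pytest across several Python versions (the environment names and dependencies are illustrative, not taken from the webinar):

```ini
[tox]
envlist = py27, py36, py37

[testenv]
deps = pytest
commands = pytest {posargs}
```

Running `tox` then creates one isolated virtualenv per listed interpreter and runs the test suite in each, both locally and on CI.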

Let’s go back in time. Can you give us your Python origin story?

The short version is: I stumbled over the tutorial in 2006 and fell in love. The longer version is in the interview you linked :)

You are a core maintainer of tox. What’s it like running a large, popular open source project?

When I joined the project, Holger Krekel (the original author) and the first generation of contributors had started to move on to other projects. Work around tox had cooled down, so I thought I’d help out a bit. There were quite a few open pull requests and issues that needed addressing. So I did that, and after a while Holger asked me if I wanted to handle releases as well. I also made contact with plugin authors to gather tox plugins into the tox-dev organization to raise their visibility. My main focus was on keeping the project alive, fixing annoying bugs and creating a welcoming atmosphere. That’s a lot of work that doesn’t result in very much code, but I still think it was the right thing to do rather than to just hack away at the code.

Over the past few months I wasn’t quite able to even follow what is going on in the issue tracker for a number of private and professional reasons. At the moment my activities are concentrated more on teaching Python with a focus on testing and automation (with pytest and tox obviously). Luckily Bernát Gábor joined as a maintainer this year and is currently very active in the project (thank you Bernát!).

So to answer the question: in the beginning it was a good feeling to keep an important project alive and improve its processes, but for quite a while now it has also meant feeling guilty much of the time, because I can’t do more “direct” work for tox.

Testing is one part of your release automation vision. Can you discuss how other parts — pre-commit hooks, black, CI — fit together?

I think of all test and automation tools as helpers to build lines of defense (against bugs) in a development process. The first line of defense is static code analysis. Ideally the most obvious problems are already pointed out directly in the editor when they happen. PyCharm inspections and quick fixes are great for that if you crank them all up to 11. The next line of defense are linters and automatic fixers (like flake8 and black). The pre-commit framework wraps that and more into a neat package that can prevent you from accidentally committing code that has obvious problems. If you don’t use it locally the first stage in CI will fail where the same checks are run. The next line of defense are automatic tests. Depending on the nature of the project there might be more (like running tests against the deployed environment on different stages).
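As an illustration of that line of defense, a minimal .pre-commit-config.yaml wiring black and flake8 into the pre-commit framework might look roughly like this (the revisions are placeholders, not recommendations):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0      # placeholder revision
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 4.0.1       # placeholder revision
    hooks:
      - id: flake8
```

After `pre-commit install`, these checks run automatically on every `git commit` and block commits that fail them.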

The role that CI plays for me is quite simple: CI is just a task execution and report collection tool and should contain as little process knowledge as possible. So nothing fancy should happen there that isn’t easily reproduced on a developer box. CI simply runs everything that the developers run locally and makes artifacts available for release. This means that the process knowledge needs to be encoded in the tox configuration and the automation scripts that live alongside the production code. The real world is often a bit more complicated and this isn’t always completely possible (often due to security considerations) but that’s the aim.

What’s been your experience using an IDE like PyCharm to visualize testing and the other pieces?

I am with Brian Okken on this: PyCharm is a really good GUI for pytest (see, or better, listen to Episode 48 of Test & Code).

The Test runner tab provides a good overview of the tests that were run, and I also appreciate the many ways I can navigate from the results back into the relevant code sections. What I appreciate even more is the convenience of running the tests right from the code rather than having to switch to the command line. Also: auto-test – very handy. It’s also great that PyCharm even supports code completion and navigation for pytest fixtures in recent releases.

Look ahead 3 years from now. What’s next for tox and the release automation field?

My crystal ball is at the repair shop at the moment, but I’ll try :)

With respect to tox, that really depends on who steps up to the plate and champions the development of new features and improvements. If I get the chance, I want to improve interpreter discovery and squash some bugs. I also want to improve the documentation and make tox more accessible to newcomers. Regarding the recent changes in packaging (PEP 517 and PEP 518), tox already has everything in place. Projects using flit and other tools that leverage the sudden freedom from setup.py and setuptools can use tox, but I am sure there will be some kinks to iron out along the way. tox also needs to grow native support for pyproject.toml (meaning that you won’t need a tox.ini anymore and can configure it there alongside all the other tools). In three years tox will also be 100% bug-free and powered by renewable energy :)

Regarding the greater landscape I really hope that the development will go even further into the direction we are already heading: making build, test and deploy automation as easy and accessible to the masses as possible. I’ll do my best to help with that.


Polyglot.Ninja(): Auto incrementing IDs for MongoDB


If you’re familiar with relational databases like MySQL or PostgreSQL, you’re probably also familiar with auto incrementing IDs. You select a primary key for a table and make it auto incrementing. Every row you insert afterwards gets a new ID, automatically incremented from the last one. We don’t have to keep track of what number comes next or ensure the atomic nature of this operation (what happens if two different clients want to insert a new row at the very same time? Do they both get the same ID?). This can be very useful where sequential, numeric IDs are essential. For example, let’s say we’re building a URL shortener. We can base62 encode the URL’s ID to quickly generate a short slug for that long URL.
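As a sketch of that idea (this helper is illustrative, not part of any particular library), base62 encoding an integer ID in Python could look like this:

```python
# 0-9, then a-z, then A-Z: 62 symbols in total.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n):
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))
```

For example, ID 125 becomes the slug "21", and the slugs stay short: even a billion IDs fit in six characters.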

Fast forward to MongoDB: the popular NoSQL database doesn’t have any equivalent to sequential IDs. It’s true that you can insert anything unique as the required _id field of a MongoDB document, so you can take matters into your own hands and try to insert unique IDs yourself. But then you have to ensure the uniqueness and atomicity of the operation.

A very popular workaround is to create a separate MongoDB collection and maintain documents with a numeric value to keep track of your auto incrementing IDs. Now, every time we want to insert a new document that needs a unique ID, we come back to this collection, use the $inc operator to atomically increment this number, and then use the incremented number as the unique ID for our new document.

Let me give an example. Say we have a messages collection, and each new message needs a new, sequential ID. We create a new collection named sequences. Each document in this sequences collection will hold the last used ID for a collection. So, for tracking the unique ID in the messages collection, we create a new document in the sequences collection like this:

{
    "_id" : "messages",
    "value" : 0
}

Next, we will write a function that can give us the next sequential ID for a collection by its name. The code is in Python, using the PyMongo library.

from pymongo import MongoClient, ReturnDocument

db = MongoClient().test_database  # the database that holds the sequences collection

def get_sequence(name):
    # $inc is atomic; ReturnDocument.AFTER returns the already-incremented document
    document = db.sequences.find_one_and_update(
        {"_id": name}, {"$inc": {"value": 1}},
        return_document=ReturnDocument.AFTER)
    return document["value"]

If we need the next auto incrementing ID for the messages collection, we can call it like this:

{"_id": get_sequence("messages")}
Find and Modify – Deprecated

If you have searched on Google, you might have come across many StackOverflow answers as well as individual blog posts that refer to the findAndModify() call (find_and_modify in PyMongo). This used to be the way to do things, but it’s deprecated now, so please use the newer find_one_and_update function instead.

(How) Does this scale?

We only call the get_sequence function before inserting a new mongo document. The function uses the $inc operator, which is atomic in nature; Mongo guarantees this. So even if hundreds of different clients try to increment the value for the same document, the increments will all be applied one after another, and each client will get a unique new ID.
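To make the atomicity argument concrete, here is a small in-memory simulation in plain Python (a lock standing in for MongoDB’s atomic $inc, not real MongoDB code): a hundred concurrent workers all receive distinct IDs.

```python
import threading

# In-memory stand-in for the "sequences" collection; the lock plays
# the role of MongoDB's atomic $inc guarantee.
counter = {"value": 0}
lock = threading.Lock()
issued = []

def get_sequence_simulated():
    with lock:
        counter["value"] += 1
        return counter["value"]

def worker():
    issued.append(get_sequence_simulated())

threads = [threading.Thread(target=worker) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every worker received a distinct ID: 1 through 100, no duplicates.
assert sorted(issued) == list(range(1, 101))
```

Without the lock (or without $inc on the server side), two workers could read the same value and hand out duplicate IDs.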

I personally haven’t been able to test this strategy at a larger scale but according to people on StackOverflow and other forums, people have scaled this to thousands and millions of users. So I guess it’s pretty safe.

The post Auto incrementing IDs for MongoDB appeared first on Polyglot.Ninja().

Codementor: 6 Lessons from Learning to Code

A few helpful lessons as you learn to code

Continuum Analytics Blog: Intake for Cataloging Spark


By: Martin Durant Intake is an open source project for providing easy pythonic access to a wide variety of data formats, and a simple cataloging system for these data sources. Intake is a new project, and all are encouraged to try and comment on it. pySpark is the python interface to Apache Spark, a fast …

The post Intake for Cataloging Spark appeared first on Anaconda.

Test and Code: 56: Being a Guest on a Podcast - Michael Kennedy


Michael Kennedy of Talk Python and Python Bytes fame joins Brian to talk about being a great guest and what to expect.

Even if you have never wanted to be on a podcast, you might learn some great tips. A few of the things we talk about will be helpful for other endeavors, like public speaking, guest blog posts, and looking for unsolicited job opportunities.

Some people have never been on a podcast before, and are possibly freaked out about some of the unknowns of being on a podcast. That's why we did this episode.

Michael and I discuss a bunch of the niggly details so that you can be relaxed and know what to expect.

Topics include:

  • If you want to be on a podcast
    • How to stand out and be someone a podcast would want to have on a show.
    • How to suggest yourself as a guest and the topic you want to discuss.
    • Picking a topic for a podcast
  • What to do before the show to prepare
    • Helping the host out with some information
    • Some hardware (not much)
    • Some software (all free)
    • Sending info like bio, headshot, links, etc.
    • What to expect the host or show to do before the recording.
    • Where to record
    • Sketching out some show topics with the host, maybe on a shared document.
  • What to expect and do
    • Right before the show
    • During the conversation
    • After the recording
    • When it goes live (help promote it)

Special Guest: Michael Kennedy.

Sponsored By:

Support Test and Code - A Podcast about Software Testing, Software Development, and Python


gamingdirectional: Create the about scene for pygame project


In this article we are going to create an about scene which will introduce the game, provide game instructions, and give credit to the game creator. The about page will also have a back button which will lead the player back to the main home page. We will create this about scene under the start scene class, and we will also change the home page buttons to three, which is the play button...


Stein Magnus Jodal: Mopidy-MPRIS 2.0 released


I’ve released Mopidy-MPRIS 2.0, the first major update to Mopidy-MPRIS in about 3.5 years.

Mopidy-MPRIS is a Mopidy extension that makes Mopidy controllable from other programs on the same machine through D-Bus. This makes it possible to control Mopidy from various widgets in GNOME/KDE/etc, as well as with keyboard media keys.

This release replaces the python-dbus D-Bus bindings with python-pydbus to modernize the code base and prepare it for the move to Python 3. It also wires up a lot of events so that various UI elements are immediately updated when the server state changes.

As part of the release, the documentation has been greatly extended, including a survey of some MPRIS clients and tips on how to run Mopidy-MPRIS on the system bus. Throughout the documentation I’ve added calls for help wherever something isn’t working perfectly and I haven’t figured it out yet. Even with these rough spots, this is easily the best Mopidy-MPRIS release so far.

For all the details, check out the changelog.

Talk Python to Me: #189 War Stories of the Developer Evangelists

Have you ever wondered what a developer advocate (sometimes called a dev evangelist) does? You know these folks. They are often seen at conferences working at some high-end tech company's booth or traveling from conference to conference speaking on their specialty.

"Morphex's Blogologue": Focusing on the simple things

This morning the internet became unavailable, after also being unavailable this weekend for several days.

So I decided to take a look at my demo board which does surveillance with a webcam using the surveil app, surveil is here:

https://github.com/morphex/surveil

Well, one thing led to another (...), and I locked myself out of the demo board.

Which was all-in-all a good thing, because when I decided to make things easy for myself, I instead ran the surveil app on my laptop, with the webcam attached there.

I was a bit surprised and embarrassed when the surveil script which should have given a helpful error message on the wrong command-line arguments, instead failed with a TypeError, because I had forgotten a comma.

So I fixed that, and noticed that the contents of the surveil directory (images taken with the webcam that could contain sensitive data) was included in the commit.

This was a big deal, and I included the surveil and longterm data storage directories in the .gitignore file.

Finally, I made the video capture device a configure option, as I don't use the webcam integrated in the laptop, but rather /dev/video1 - which is the device the USB Webcam gets when attached.

A commit of these changes is here:

https://github.com/morphex/surveil/commit/42743c4f3785e1e9dd...

Last week I drifted off in an interesting conversation on the Python-User list:

https://mail.python.org/pipermail/python-list/2018-November/...

Which I guess shows that I could've spent the time thinking about an interesting concept on more pragmatic things, like testing the surveil script on another machine.

Finally, I'm looking for a way to do testing, and I'm wondering about a good way to test that the command-line interface functions as expected as well.

I guess that's more of a functional test, but maybe there is a package which integrates unit and functional tests / integration tests.

EuroPython Society: EuroPython 2019: Venue and location selected


After a very work-intensive RFP with more than 40 venues competing, 17 entries, and two rounds of refinements, we are now happy to announce the winner:


EuroPython 2019 will be held in Basel, Switzerland, from July 8-14, 2019.

We will now start work on the contracts and get the organization going, so that we can all enjoy another edition of EuroPython next year.

Many thanks,

EuroPython Society Board
https://www.europython-society.org/


Stack Abuse: Seaborn Library for Data Visualization in Python: Part 2


In the previous article, Seaborn Library for Data Visualization in Python: Part 1, we looked at how the Seaborn library is used to plot distributional and categorical plots. In this article we will continue our discussion and see some of the other functionalities offered by Seaborn for drawing different types of plots. We will start our discussion with Matrix Plots.

Matrix Plots

Matrix plots are the type of plots that show data in the form of rows and columns. Heat maps are the prime examples of matrix plots.

Heat Maps

Heat maps are normally used to plot the correlation between numeric columns in the form of a matrix. It is important to mention here that to draw matrix plots, you need meaningful information in the rows as well as the columns. Continuing with the theme from the last article, let's plot the first five rows of the Titanic dataset to see if both the row and column headers have meaningful information. Execute the following script:

import pandas as pd  
import numpy as np

import matplotlib.pyplot as plt  
import seaborn as sns

dataset = sns.load_dataset('titanic')

dataset.head()  

In the output, you will see the following result:

From the output, you can see that the column headers contain useful information such as whether the passengers survived, their age, their fare, etc. However, the row headers only contain indexes 0, 1, 2, etc. To plot matrix plots, we need useful information in both the column and row headers. One way to achieve this is to call the corr() method on the dataset. The corr() function returns the correlation between all the numeric columns of the dataset. Execute the following script:

dataset.corr()  

In the output, you will see that both the columns and the rows have meaningful header information, as shown below:
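As an aside, a tiny self-contained example (with made-up data rather than the Titanic set) shows the kind of symmetric matrix that corr() produces:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4],
                   "y": [2, 4, 6, 8],    # moves exactly with x
                   "z": [4, 3, 2, 1]})   # moves exactly against x
corr = df.corr()
# corr.loc["x", "y"] is (near) 1.0 and corr.loc["x", "z"] is (near) -1.0,
# with column names serving as both row and column headers.
```

Because the row and column labels are now feature names, this matrix is exactly the kind of input a heat map needs.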

Now to create a heat map with these correlation values, you need to call the heatmap() function and pass it your correlation dataframe. Look at the following script:

corr = dataset.corr()  
sns.heatmap(corr)  

The output looks like this:

From the output, it can be seen that what the heatmap essentially does is plot a box for every combination of row and column values. The color of a box depends upon the gradient. For instance, in the above image, if there is a high correlation between two features, the corresponding cell or box is white; on the other hand, if there is no correlation, the corresponding cell remains black.

The correlation values can also be plotted on the heatmap by passing True for the annot parameter. Execute the following script to see this in action:

corr = dataset.corr()  
sns.heatmap(corr, annot=True)  

Output:

You can also change the color of the heatmap by passing an argument for the cmap parameter. For now, just look at the following script:

corr = dataset.corr()  
sns.heatmap(corr, cmap='winter')  

The output looks like this:

In addition to simply using the correlation between all the columns, you can also use the pivot_table function to specify the index, the columns and the values that you want to see corresponding to the index and the columns. To see the pivot_table function in action, we will use the "flights" dataset, which contains information about the year, the month and the number of passengers that traveled in that month.

Execute the following script to import the data set and to see the first five rows of the dataset:

import pandas as pd  
import numpy as np

import matplotlib.pyplot as plt  
import seaborn as sns

dataset = sns.load_dataset('flights')

dataset.head()  

Output:

Now using the pivot_table function, we can create a heat map that displays the number of passengers that traveled in a specific month of a specific year. To do so, we will pass month as the value for the index parameter. The index attribute corresponds to the rows. Next we need to pass year as value for the column parameter. And finally for the values parameter, we will pass the passengers column. Execute the following script:

data = dataset.pivot_table(index='month', columns='year', values='passengers')  
sns.heatmap(data)  

The output looks like this:

It is evident from the output that in the early years the number of passengers who took flights was lower. As the years progress, the number of passengers increases.

Currently, you can see that the boxes or the cells are overlapping in some cases and the distinction between the boundaries of the cells is not very clear. To create a clear boundary between the cells, you can make use of the linecolor and linewidths parameters. Take a look at the following script:

data = dataset.pivot_table(index='month', columns='year', values='passengers')  
sns.heatmap(data, linecolor='blue', linewidths=1)  

In the script above, we passed "blue" as the value for the linecolor parameter, while the linewidths parameter is set to 1. In the output you will see a blue boundary around each cell:

You can increase the value of the linewidths parameter if you want thicker boundaries.

Cluster Map

In addition to heat map, another commonly used matrix plot is the cluster map. The cluster map basically uses Hierarchical Clustering to cluster the rows and columns of the matrix.

Let's plot a cluster map for the number of passengers who traveled in a specific month of a specific year. Execute the following script:

data = dataset.pivot_table(index='month', columns='year', values='passengers')  
sns.clustermap(data)  

To plot a cluster map, the clustermap() function is used, and like the heatmap function, the dataset passed should have meaningful headers for both rows and columns. The output of the script above looks like this:

In the output, you can see months and years clustered together on the basis of the number of passengers that traveled in a specific month.

With this, we conclude our discussion about the Matrix plots. In the next section we will start our discussion about grid capabilities of the Seaborn library.

Seaborn Grids

Grids in Seaborn allow us to manipulate the subplots depending upon the features used in the plots.

Pair Grid

In Part 1 of this article series, we saw how pair plot can be used to draw scatter plot for all possible combinations of the numeric columns in the dataset.

Let's revise the pair plot here before we can move on to the pair grid. The dataset we are going to use for the pair grid section is the "iris" dataset which is downloaded by default when you download the seaborn library. Execute the following script to load the iris dataset:

import pandas as pd  
import numpy as np

import matplotlib.pyplot as plt  
import seaborn as sns

dataset = sns.load_dataset('iris')

dataset.head()  

The first five rows of the iris dataset look like this:

Now let's draw a pair plot on the iris dataset. Execute the following script:

sns.pairplot(dataset)  

A snapshot of the output looks like this:

Now let's plot pair grid and see the difference between the pair plot and the pair grid. To create a pair grid, you simply have to pass the dataset to the PairGrid function, as shown below:

sns.PairGrid(dataset)  

Output:

In the output, you can see empty grids. This is essentially what the pair grid function does. It returns an empty set of grids for all the features in the dataset.

Next, you need to call map function on the object returned by the pair grid function and pass it the type of plot that you want to draw on the grids. Let's plot a scatter plot using the pair grid.

grids = sns.PairGrid(dataset)  
grids.map(plt.scatter)  

The output looks like this:

You can see scatter plots for all the combinations of numeric columns in the "iris" dataset.

You can also plot different types of graphs on the same pair grid. For instance, if you want to plot a "distribution" plot on the diagonal, a "kdeplot" on the upper half above the diagonal, and a "scatter" plot on the lower half below the diagonal, you can use the map_diag, map_upper, and map_lower functions, respectively. The type of plot to be drawn is passed as the parameter to these functions. Take a look at the following script:

grids = sns.PairGrid(dataset)  
grids.map_diag(sns.distplot)  
grids.map_upper(sns.kdeplot)  
grids.map_lower(plt.scatter)  

The output of the script above looks like this:

You can see the true power of the pair grid function from the image above. On the diagonals we have distribution plots, on the upper half we have the kernel density plots, while on the lower half we have the scatter plots.

Facet Grids

Facet grids are used to plot numeric features against combinations of categorical features. Let's plot a facet grid that shows the distribution of the passengers' age for each combination of gender (sex) and survival (alive).

For this section, we will again use the Titanic dataset. Execute the following script to load the Titanic dataset:

import pandas as pd  
import numpy as np

import matplotlib.pyplot as plt  
import seaborn as sns

dataset = sns.load_dataset('titanic')  

To draw facet grid, the FacetGrid() function is used. The first parameter to the function is the dataset, the second parameter col specifies the feature to plot on columns while the row parameter specifies the feature on the rows. The FacetGrid() function returns an object. Like the pair grid, you can use the map function to specify the type of plot you want to draw.

Execute the following script:

grid = sns.FacetGrid(data=dataset, col='alive', row='sex')  
grid.map(sns.distplot, 'age')  

In the above script, we plot the distributional plot for age on the facet grid. The output looks like this:

From the output, you can see four plots. One for each combination of gender and survival of the passenger. The columns contain information about the survival while the rows contain information about the sex, as specified by the FacetGrid() function.

The first row and first column contain age distribution of the passengers where sex is male and the passengers did not survive. The first row and second column contain age distribution of the passengers where sex is male and the passengers survived. Similarly, the second row and first column contain age distribution of the passengers where sex is female and the passengers did not survive while the second row and second column contain age distribution of the passengers where sex is female and the passengers survived.

In addition to distributional plots for one feature, we can also plot scatter plots that involve two features on the facet grid.

For instance, the following script plots the scatter plot for age and fare for both the genders of the passengers who survived and who didn't.

grid = sns.FacetGrid(data= dataset, col= 'alive', row = 'sex')  
grid.map(plt.scatter, 'age', 'fare')  

The output of the script above looks like this:

Regression Plots

Regression plots, as the name suggests, are used to perform regression analysis between two or more variables.

In this section, we will study the linear model plot that plots a linear relationship between two variables along with the best-fit regression line depending upon the data.

The dataset that we are going to use for this section is the "diamonds" dataset which is downloaded by default with the seaborn library. Execute the following script to load the dataset:

import pandas as pd  
import numpy as np

import matplotlib.pyplot as plt  
import seaborn as sns

dataset = sns.load_dataset('diamonds')

dataset.head()  

The dataset looks like this:

The dataset contains different features of a diamond such as weight in carats, color, clarity, price, etc.

Let's plot a linear relationship between, carat and price of the diamond. Ideally, the heavier the diamond is, the higher the price should be. Let's see if this is actually true based on the information available in the diamonds dataset.

To plot the linear model, the lmplot() function is used. The first parameter is the feature you want to plot on the x-axis, while the second parameter is the feature you want to plot on the y-axis. The last parameter is the dataset. Execute the following script:

sns.lmplot(x='carat', y='price', data=dataset)  

The output looks like this:

You can also plot multiple linear models based on a categorical feature. The feature name is passed as value to the hue parameter. For instance, if you want to plot multiple linear models for the relationship between carat and price feature, based on the cut of the diamond, you can use lmplot function as follows:

sns.lmplot(x='carat', y='price', data=dataset, hue='cut')  

The output looks like this:

From the output, you can see that the linear relationship between the carat and the price of the diamond is steepest for the ideal cut, as expected, and shallowest for the fair cut.

In addition to plotting the data for the cut feature with different hues, we can also have one plot for each cut. To do so, you need to pass the column name to the col parameter. Take a look at the following script:

sns.lmplot(x='carat', y='price', data=dataset, col='cut')  

In the output, you will see a separate column for each value in the cut column of the diamonds dataset as shown below:

You can also change the size and aspect ratio of the plots using the aspect and size parameters (size was renamed to height in seaborn 0.9 and later). Take a look at the following script:

sns.lmplot(x='carat', y = 'price', data= dataset, col = 'cut', aspect = 0.5, size = 8 )  

The aspect parameter defines the aspect ratio between the width and height. An aspect ratio of 0.5 means that the width is half of the height as shown in the output.

You can see that though the size of the plot has changed, the font size is still very small. In the next section, we will see how to control the fonts and styles of the Seaborn plots.

Plot Styling

Seaborn library comes with a variety of styling options. In this section, we will see some of them.

Set Style

The set_style() function is used to set the style of the grid. You can pass darkgrid, whitegrid, dark, white, or ticks as the argument to the set_style function.

For this section, we will again use the "titanic" dataset, which first needs to be reloaded. Execute the following script to see the darkgrid style:

dataset = sns.load_dataset('titanic')

sns.set_style('darkgrid')  
sns.distplot(dataset['fare'])  

The output looks like this:

In the output, you can see that we have a dark background with grids. Let's see what whitegrid looks like. Execute the following script:

sns.set_style('whitegrid')  
sns.distplot(dataset['fare'])  

The output looks like this:

Now you can see that we still have grids in the background but the dark grey background is not visible. I would suggest that you try and play with the rest of the options and see which style suits you.

Change Figure Size

Since Seaborn uses Matplotlib functions behind the scenes, you can use Matplotlib's pyplot package to change the figure size as shown below:

plt.figure(figsize=(8,4))  
sns.distplot(dataset['fare'])  

In the script above, we set the width and height of the plot to 8 and 4 inches respectively. The output of the script above looks like this:
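Note that seaborn draws onto whatever matplotlib figure is current, so the plt.figure(figsize=...) call has to come before the seaborn plotting call. A minimal sketch of the mechanism, using matplotlib alone with the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Create the figure first; any plotting call made afterwards draws onto it.
fig = plt.figure(figsize=(8, 4))

# The requested size is stored on the figure in inches.
width, height = fig.get_size_inches()
print(width, height)  # 8.0 4.0
```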

Set Context

Apart from the notebook, you may need to create plots for posters. To do so, you can use the set_context() function and pass it 'poster' as the only argument, as shown below:

sns.set_context('poster')  
sns.distplot(dataset['fare'])  

In the output, you should see a plot with the poster specifications as shown below. For instance, you can see that the fonts are much bigger compared to normal plots.

Conclusion

Seaborn Library is an advanced Python library for data visualization. This article is Part 2 of the series of articles on Seaborn for Data Visualization in Python. In this article, we saw how to plot regression and matrix plots in Seaborn. We also saw how to change plot styles and use grid functions to manipulate subplots. In the next article, we will see how Python's Pandas library's built-in capabilities can be used for data visualization.

Python Data: Quick Tip: SQLAlchemy for MySQL and Pandas

For years I’ve used the mysql-python library for connecting to MySQL databases.  It’s worked well for me over the years, but there are times when you need speed and/or better connection management than what you get with mysql-python.  That’s where SQLAlchemy comes in.

Before diving into this, if you are doing things that aren’t dependent on speed (e.g., it doesn’t matter if it takes 1 second to connect to the database and grab your data and close the database) then you can easily ignore this tip. That said, if you have multiple connections, that connect time can add up.

For example, I recently had an issue where it was taking 4.5+ seconds to connect to a database, run analysis and spit out the results. That’s not terrible if it’s something for you only, but if it’s a production system and speed is a requirement, that might be too long (and it IS too long).

When I did some analysis using Python’s timing functions, I found that more than 50% of that 4.5 seconds was spent establishing database connections, so I grabbed my trusty SQLAlchemy toolkit and went to work.

For those of you that don’t know, SQLAlchemy is a ‘python SQL toolkit and Object Relational Mapper’ (ORM) that is supposed to make things easier when working with SQL databases. For me, the ORM aspect tends to make things more difficult, so I tend to stick with plain SQL queries, but the SQL toolkit aspect of SQLAlchemy makes a lot of sense and adds some time savings when connecting to a SQL database.

Before we get into the SQLAlchemy aspects, let’s take a second to look at how to connect to a SQL database with the mysql-python connector (or at least take a look at how I do it).

First, let’s setup our import statements. For this, we will import MySQLdb, pandas and pandas.io.sql in order to read SQL data directly into a pandas dataframe.

import pandas as pd
import MySQLdb
import pandas.io.sql as psql

Next, let’s create a database connection, create a query, execute that query and close that database.

# setup the database connection.  There's no need to setup cursors with pandas psql.
db=MySQLdb.connect(host=HOST, user=USER, passwd=PW, db=DBNAME)

# create the query
query = "select * from TABLENAME"

# execute the query and assign it to a pandas dataframe
df = psql.read_sql(query, con=db)
# close the database connection
db.close()

This is a fairly standard approach to reading data into a pandas dataframe from mysql using mysql-python.  This approach is what I had been using before when I was getting 4.5+ seconds as discussed above. Note – there were multiple database calls and some analysis included in that 4.5+ seconds. A basic database call like the above ran in approximately 0.45 seconds in my code that I was trying to improve performance on and establishing the database connection was the majority of that time.

To improve performance – especially if you will have multiple calls to multiple tables – you can use SQLAlchemy with pandas. You’ll need to pip install sqlalchemy if you don’t have it installed already. Now, let’s set up our imports:

import pandas as pd
import sqlalchemy as sql

Now you can set up your connection string to your database. For SQLAlchemy, you’d put everything together like the following:

connect_string = 'mysql://USER:PW@DBHOST/DB'

where USER is your username, PW is your password, DBHOST is the database host and DB is the database you want to connect to.

To set up the persistent connection, you do the following:

sql_engine = sql.create_engine(connect_string)

Now, you have a connection to your database and you’re ready to go. No need to worry about cursors or opening/closing database connections; SQLAlchemy handles the connection management for you.

Now all you need to do is focus on your SQL queries and loading the results into a pandas dataframe.

query = "select * from TABLENAME"
df = pd.read_sql_query(query, sql_engine)

That’s all it takes.  AND…it’s faster.  In the example above, my database setup / connection / query / closing times dropped from 0.45 seconds to 0.15 seconds.  Times will vary based on what data you are querying and where the database is of course but in this case, all things were the same except for mysql-python being replaced with SQLAlchemy and using the new(ish) read_sql_query function in pandas.

Using this approach, the 4.5+ seconds it took to grab data, analyze the data and return the data was reduced to about 1.5 seconds. Impressive gains for just switching out the connection/management method.
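The same engine-plus-read_sql_query pattern can be sketched end to end with SQLite, which ships with Python and needs no server or credentials; only the connection URL changes, and the table name here is just a stand-in:

```python
import pandas as pd
import sqlalchemy as sql

# An in-memory SQLite database stands in for the MySQL server,
# so this runs anywhere; for MySQL only the URL would differ.
sql_engine = sql.create_engine('sqlite:///:memory:')

# Create a small table so there is something to query.
pd.DataFrame({'id': [1, 2, 3], 'value': ['a', 'b', 'c']}).to_sql(
    'TABLENAME', sql_engine, index=False)

query = "select * from TABLENAME"
df = pd.read_sql_query(query, sql_engine)
print(len(df))  # 3
```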

The post Quick Tip: SQLAlchemy for MySQL and Pandas appeared first on Python Data.

Python Bytes: #107 Restructuring and searching data, the Python way

Catalin George Festila: Python Qt5 - simple checkbox example.

Today we created a simple tutorial about QCheckBox and QLabel.
The purpose of this tutorial is to use a QCheckBox in a GUI interface.
When we check the QCheckBox, this will change the text of a QLabel.
The variables used are my_checkbox for the QCheckBox and my_label for the QLabel.
The result of my source code is this:

Let's see the source code:
# -*- coding: utf-8 -*-
"""
@author: catafest
"""
import sys
from PyQt5.QtCore import Qt
from PyQt5.QtWidgets import QWidget, QCheckBox, QLabel, QApplication

class MyCheckBox(QWidget):
    def __init__(self):
        super().__init__()

        my_checkbox = QCheckBox("Check this , see result", self)
        my_checkbox.move(50, 60)
        my_checkbox.stateChanged.connect(self.change_my_option)

        self.my_label = QLabel("You can visit free-tutorial.org ", self)
        self.my_label.move(50, 30)

        #self.my_label.setAlignment(Qt.AlignCenter)

        self.setGeometry(420, 420, 640, 100)
        self.setWindowTitle("free-tutorials.org PyQt5 CheckBox ")

    def change_my_option(self, state):
        if state == Qt.Checked:
            self.my_label.setText("Thank's by free-tutorial.org")
        else:
            self.my_label.setText("You can visit free-tutorial.org")

if __name__ == '__main__':
    app = QApplication(sys.argv)
    win = MyCheckBox()
    win.show()
    sys.exit(app.exec_())

gamingdirectional: Sound on Sound off

In this article we will create a mechanism to turn the soundtrack of our pygame project on and off whenever we click on the sound-on or sound-off icon on the main game page. First of all, we will need to include the sound-on and sound-off icons in the start scene class.

from BgSprite import BgSprite
from GameSprite import GameSprite
from pygame.locals import *
from pygame import math...

Source

codingdirectional: Remove the duplicate file from nested folders with python

In this article we will continue to develop our python application which removes duplicate files within a folder. In the last chapter we removed a duplicate file in another folder, and this time we will remove all the duplicate files within nested folders by slightly modifying the previous program. First of all, we will edit the main file by replacing the forward slash with a backslash to suit the Windows file path format.
from tkinter import *
from tkinter import filedialog
from Remove import Remove

win = Tk() # 1 Create instance
win.title("Multitas") # 2 Add a title
win.resizable(0, 0) # 3 Disable resizing the GUI
win.configure(background='black') # 4 change background color

# 5 Create a label
aLabel = Label(win, text="Remove duplicate", anchor="center")
aLabel.grid(column=0, row=1)
aLabel.configure(foreground="white")
aLabel.configure(background="black")

# 6 Create a selectFile function to be used by button
def selectFile():

    filename = filedialog.askopenfilename(initialdir="/", title="Select file")
    if(filename != ''):
        filename = filename.split('/')[-1] # this is for the windows separator only
        folder = filedialog.askdirectory() # 7 open a folder then create and start a new remove thread to delete the duplicate file
        if(folder != ''):
            folder = folder.replace('/', '\\')
            remove = Remove(folder, aLabel, filename)
            remove.start()

# 8 Adding a Button
action = Button(win, text="Select File", command=selectFile)
action.grid(column=0, row=0) # 9 Position the button
action.configure(background='brown')
action.configure(foreground='white')

win.mainloop()  # 10 start GUI
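One small note on the selectFile function above: tkinter's askopenfilename returns forward-slash paths even on Windows, so the manual filename.split('/')[-1] can also be written with the standard library's os.path.basename. A quick equivalence sketch (the path is hypothetical):

```python
import os.path

path = 'C:/Users/demo/dup.txt'  # hypothetical dialog result
filename = os.path.basename(path)
print(filename)  # dup.txt
```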
Next we will modify the remove class so now we can remove all the duplicate files within nested folder.
import threading
import os

class Remove(threading.Thread):

   def __init__(self, massage, aLabel, filename):

      threading.Thread.__init__(self)
      self.massage = massage
      self.label = aLabel
      self.filename = filename

   def run(self):

      filepaths = os.listdir(self.massage)

      for filepath in list(filepaths):
         os.chdir(self.massage)
         if(os.path.isfile(filepath)):
            if(filepath == self.filename):
               os.remove(filepath)
         else:
            self.delete_duplicate(os.path.join(self.massage, filepath))
      return

   def delete_duplicate(self, folder): # sub method to pass folder to

      filepaths = os.listdir(folder)

      for filepath in list(filepaths):
         os.chdir(folder)   # need this to reset the current folder
         if(os.path.isfile(filepath)):
            if (filepath == self.filename):
               os.remove(filepath)
         else:
            self.delete_duplicate(os.path.join(folder, filepath))
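The recursive helper above works, but the standard library's os.walk visits every nested folder for you. A sketch of the same delete logic written with it (the function name delete_duplicates is my own, not from the post):

```python
import os

def delete_duplicates(root, filename):
    """Remove every file named `filename` under `root`, however deeply nested."""
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if filename in filenames:
            os.remove(os.path.join(dirpath, filename))
            removed += 1
    return removed
```

os.walk also avoids the os.chdir calls in the Remove thread, which can be fragile because the working directory is shared by the whole process, not per thread.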
After you have selected a file from a folder, and then selected another folder that you want to search, you can just sit back and wait for the program to find and remove every file with the same name as the selected one: in that second folder, in the folders inside it, and in any folders nested further down. The program has successfully removed all the duplicate files within folders that have fewer than 50 files each, without any delay and with only one new thread. Will the program slow down if there are lots of folders and files to search? We don’t know yet, and will only modify it if we need more threads to handle the job. Our next goal is to remove a file if and only if its content is the same as the selected one. Remember what I wrote in the previous chapter? We simply cannot assume that two files with the same name and the same file extension contain the same content. So stay tuned for the next chapter.
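For the content check the chapter closes on, a common approach is to compare cryptographic digests: two files are only treated as duplicates when their hashes match. A sketch using the standard library's hashlib (the helper names are my own):

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a, path_b):
    """True only when both files hold identical bytes."""
    return file_digest(path_a) == file_digest(path_b)
```

Reading in chunks keeps memory use flat even for large files; comparing file sizes first would let you skip hashing most non-duplicates.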

Philippe Normand: GStreamer’s playbin3 overview for application developers

Multimedia applications based on GStreamer usually handle playback with the playbin element. I recently added support for playbin3 in WebKit. This post aims to document the changes needed on the application side to support this new generation flavour of playbin.

So, first off, why is it named playbin3 anyway? The GStreamer …

Philippe Normand: Web overlay in GStreamer with WPEWebKit

After a year or two of hiatus I attended the GStreamer conference which happened in beautiful Edinburgh. It was great to meet the friends from the community again and learn about what’s going on in the multimedia world. The quality of the talks was great, the videos are published …

Weekly Python StackOverflow Report: (clv) stackoverflow python report
