Stack Abuse: Preparing for a Python Developer Interview

September 10, 2018, 5:41 am

≫ Next: Stack Abuse: Implementing Word2Vec with Gensim Library in Python

≪ Previous: Continuum Analytics Blog: AI Enablement Platform for Teams at Scale—Accelerate Your AI/ML Productivity with Anaconda Enterprise and Cisco UCS

Introduction

Preparing for a Python Developer Interview

In this article I will be giving my opinions and suggestions for putting yourself in the best position to out-perform competing candidates in a Python programming interview so that you can land a job as a Python developer.

You may be thinking, with the shortage of programmers in the job market all I need to do is show up and answer a few questions about basic Python syntax and let my degree or bootcamp certificate take care of the rest. Well let me be the first to tell you that is very unlikely to be the case, and even if it does work you are not likely to stay employed for long once the other members of your team realize you don't have the chops to cut it on their team.

So, if you are looking to break into the Python programming space or even move up to a senior Python developer role, I invite you to keep reading as I layout some important tips for being as competitive as possible in the interviewing game.

Know your way around Python

Seems obvious if you have applied to a job listing for a Python developer role that you should probably know Python. However, if you do not and you managed to bluff your way into an interview without the necessary knowledge of Python, you have some serious effort to put in. You better block out some significant time immediately to get up to speed on at least the basics of Python and realize that unless you have years of experience in another high-level, object oriented, programming language (ie, Java, JavaScript, C#, ect...) you probably stand very little chance of doing well enough in this interview to land the job. Sorry for the bad news... stop lying on job applications.

At the very least you should be able to white board out some idomatic Python constructs like loops, control flow structures, list comprehensions, and define some basic classes. If any of this is not sounding familiar I recommend you head over to Scott Robinson's Python Tutorial for Absolute Begginers article here on StackAbuse.

Showcase your Example Projects

I realize that you have been busy with school, a coding bootcamp, or your current / previous job, but I cannot stress the importance of this enough. When you are applying for a Python developer job you are effectively trying to convince the hiring manager that you possess the skills they need to make a significant contribution to a product or a project that will someday be a product that brings value to that company.

From my experience the best way to prove you can code is to hand over a reasonable amount of code that demonstrates your ability to produce a usable piece of software. This could be a simple web application, data processing script, or minimal desktop application. The key here is to give an idea of your ability to write code that is well organized, idiomatic, and readable.

The best way to do this is to have a public GitHub, BitBucket, or GitLab repository that houses your example project(s). This does a few things for you:

It puts you in the open source community which in and of itself is a great thing.
It demonstrates that you also know the basics of Git version control.
It gets your name out there and increases your chance of being contacted for jobs as well.

Regarding the second point, when you are building your example code project, treat it like a real project. Complete small peices of functionality at a time them commit them to version control with descriptive commits. You will be suprised at the effect of this. Hiring managers place high value on your understanding and ability to use Git version control.

Brush up on Data Structures and Algorithms

First off you should know the common Python data structures such as lists, dictionaries, tuples, and how to create classes.

Next, you should know the more generalized data structures such as linked lists, stacks, and queues that are not necessarily implemented in the Python standard library, but can be implemented using the language.

You should also be able to compare and contrast the basic Python data structures with the aforementioned generalized data structures and describe how you can either use exsting Python data structures such as lists to implement the functionality of a stack, or on the other hand, do a custom implementation of a class like a LinkedList.

Below is an example of a custom implementation of a linked list, which utilizes an Element (also referred to as Node) internal class to manage data elements.

class Element:  
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:  
    def __init__(self, head=None):
        self.head = head

    def append(self, value):
        if self.head is None:
            self.head = Element(value)
        else:
            current = self.head
            while current.next is not None:
                current = current.next
            current.next = Element(value)

    def pop(self):
        if self.head is None:
            return None

        if self.head.next is None:
            value = self.head.value
            self.head = None
            return value

        current = self.head
        while current.next.next:
            current = current.next
        value = current.next.value
        current.next = None
        return value

    def peek(self):
        if self.head is None:
            return None
        current = self.head
        while current.next:
            current = current.next
        return current.value

    def remove(self, value):
        if self.head is None:
            return None

        if self.head.next is None:
            if self.head.value == value:
                self.head = None
                return True
            return None

        if self.head.next.next is None:
            if self.head.next.value == value:
                self.head.next = None
                return True

        current = self.head
        while current.next.next:
            if current.next.value == value:
                current.next = current.next.next
                return True
            current = current.next
        return None

    def insert_first(self, value):
        next_element = self.head
        self.head = Element(value)
        self.head.next = next_element

    def delete_first(self):
        if self.head:
            new_first = self.head.next
            self.head = new_first

You should be able to identify instances where it would be benifical to use a particular data structure, like a linked list.

For example, if you know you will be inserting and deleting items from the front of a list often then it is significantly more efficient to use something like a LinkedList over a the standard Python list. However, it's worth mentioning that such an operation is most commonly used with a queue or a stack, which a LinkedList can be used for, but the Python collections module already has a built-in data structure useful for this called deque which would be important to bring up also during the discussion with the interviewers.

The primary objective of bringing up the custom implementation of a LinkedList in a Python interview would be to demonstrate your ability to code up a custom class and describe the differences between the standard Python list and the mechanics of a LinkedList.

Also, be aware of some basic algorithms that are used to perform common tasks such as sorting and searching.

For example, it would be good to explain how and why a binary search performs significantly better than a linear search in a list. Specifically, a linear search will always be O(n) while a binary search is O(log n). You would also want to explain when it is appropriate to use a binary search over a linear one. In cases where you expect to be searching a moderately large list many times it is likely worth the expense incurred to sort a list and make it capable of binary searches, but if a list is to be searched only a couple of times it may not be worth the expense of sorting it. Also worthy of a mention is to contemplate whether it is better to just use another data structure such as a dictionary in cases where the key you are searching on is a hashable structure, which will essentially give you O(1) item lookups and insertions.

Ability to Comprehend and Solve Problems

Being a rock star developer is much more than just memorizing a particular language's syntax or commonly used data structures and algorithms, however valuable that may be. The thing that will differentiate you from the crowd is your ability to comprehend a problem, use case, opportunity to be implemented in software, or whatever else you may refer to the things we are asked to translate into code.

What this requires is a combination of both hard and soft skills. You need to be able to actively listen to the feature requirement or bug description and identify the pertenent facts, and ask questions to drive out additional key aspects. Then you need to be able to break all that information down into individual tasks or components that can be carried out to then collectively work together to perform the desired functionality.

Believe me, this is ultimately what an employer wants to test you on, how you handle being presented with a programming task or problem and your ability to identify key pieces of information and use it to devise a solution.

This is easier said than done. However, there are a few things that will increase your likihood of success, namely by putting in lots of practice and getting exposure to a variety of problems. The more problems your are exposed to the more you are able to start recognizing common patterns in problems and reoccuring solutions that often vary only minimally. A great way to gain experience solving programming problems is to use a service like Daily Coding Problem.

The Daily Coding Problem is a service that you can signup for which will email you a different programming problem presented in Python every day for you to solve. For example, the home page of The Daily Programming Problem gives an example of the types of problems you can expect to recieve such as, "There's a staircase with N steps, and you can climb 1 or 2 steps at a time. Given N, write a function that returns the number of unique ways you can climb the staircase. The order of the steps matters".

Interestingly, the permutations of different step combinations simplifies to a summation of the combinations of steps for (N - 1) + (N - 2) which you might recognize as the primary logic for implementing the algorithm for a Nth number Fibonacci sequence.

Let me elaborate more on this.

How many different ways can you climb one (N = 1) stair taking 1 or 2 steps at a time? A set of exactly one [1].

N = 1 => [1]

Now how about two stairs (N = 2)?

N = 2 => [1, 1], [2]

Then for a formula of f(N) = f(N - 1) + f(N - 2) as long as N > 0

[1] + ([1,1], [2]) = [1,1,1], [1,2], [2,1]

As I mentioned earlier, this is the recursive implementation of the Fibonacci sequence and in Python it looks like this.

def step_combinations(stairs):  
    if stairs <= 1:
        return 1
    return step_combinations(stairs - 1) + step_combinations(stairs - 2)

With Daily Coding Problem, not only will you get practice problems every day, but you can also get detailed solutions to those problems, for a small discounted fee, to help you solve the extra tricky problems or let you compare your solutions to those provided by the service.

Icing on the Cake

Since this is an article on interviewing for a Python job I have been focussing on Python-specific technical skills, however in my experience, rarely will a Python developer actually only code in Python. In fact, its probably not even a good idea from a long-term employability standpoint to think you will only ever work with one technology or programming language.

My advice is to pay attention to the ansilary technologies that are often on the job listing in sections like "Nice to haves", which may list things like JavaScript, CSS, Java, etc... and be prepared to lightly delve into those as well. This shows you are able and willing to learn other things that will bring value to the company you are applying for.

Another beneficial things to do is to have some knowledge of the company. Do some basic research on the company you have applied to work for. Focus on things like identifying key revenue streams and any culture identities the company may have or are trying to establish.

Last but not least, I would like to touch on dressing for an interview. It should go without saying that it pays to dress to impress but, I have actually heard of and seen developers show up to interviews in jeans and hoodies... Doinke! At the very least, if the company's culture is loose enough, you should dress in business casual, but I still recommend a suit. You have already put in the effort to be able to show off your mad Python skills and wow them with your knowledge of the company, so don't blow it by leaving them with the lasting impression of "yea he seemed like he knew about programming, but so did the other N candidates who looked like they didn't just wander in from the arcade".

Simply put, take pride in your appearance and not just your Python skills.

Conclusion

In this article I have tried to articulate what I have come to find are key differentiators that can put you ahead of the competition while interviewing for a Python developer role. I have mentioned the importance of actually knowing Python, the usefullness of common data structures and algorithms, becoming a better problem solver by exposure to many problems via services like The Daily Coding Problem, and even the basics such as company research and appropriate attire. I hope you found some value in this article but, most of all I hope it helps you nail that upcoming Python interview.

As always I thank you for reading and welcome comments and criticisms below.

Resources

↧

Stack Abuse: Implementing Word2Vec with Gensim Library in Python

September 10, 2018, 6:31 am

≫ Next: Talk Python to Me: #176 The Python Community by the Numbers

≪ Previous: Stack Abuse: Preparing for a Python Developer Interview

Introduction

Humans have a natural ability to understand what other people are saying and what to say in response. This ability is developed by consistently interacting with other people and the society over many years. The language plays a very important role in how humans interact. Languages that humans use for interaction are called natural languages.

The rules of various natural languages are different. However, there is one thing in common in natural languages: flexibility and evolution.

Natural languages are highly very flexible. Suppose, you are driving a car and your friend says one of these three utterances: "Pull over", "Stop the car", "Halt". You immediately understand that he is asking you to stop the car. This is because natural languages are extremely flexible. There are multiple ways to say one thing.

Another important aspect of natural languages is the fact that they are consistently evolving. For instance, a few years ago there was no term such as "Google it", which refers to searching for something on the Google search engine. Natural languages are always undergoing evolution.

On the contrary, computer languages follow a strict syntax. If you want to tell a computer to print something on the screen, there is a special command for that. The task of Natural Language Processing is to make computers understand and generate human language in a way similar to humans.

This is a huge task and there are many hurdles involved. This video lecture from the University of Michigan contains a very good explanation of why NLP is so hard.

In this article we will implement the Word2Vec word embedding technique used for creating word vectors with Python's Gensim library. However, before jumping straight to the coding section, we will first briefly review some of the most commonly used word embedding techniques, along with their pros and cons.

Word Embedding Approaches

One of the reasons that Natural Language Processing is a difficult problem to solve is the fact that, unlike human beings, computers can only understand numbers. We have to represent words in a numeric format that is understandable by the computers. Word embedding refers to the numeric representations of words.

Several word embedding approaches currently exist and all of them have their pros and cons. We will discuss three of them here:

Bag of Words
TF-IDF Scheme
Word2Vec

Bag of Words

The bag of words approach is one of the simplest word embedding approaches. The following are steps to generate word embeddings using the bag of words approach.

We will see the word embeddings generated by the bag of words approach with the help of an example. Suppose you have a corpus with three sentences.

S1 = I love rain
S2 = rain rain go away
S3 = I am away

To convert above sentences into their corresponding word embedding representations using the bag of words approach, we need to perform the following steps:

Create a dictionary of unique words from the corpus. In the above corpus, we have following unique words: [I, love, rain, go, away, am]
Parse the sentence. For each word in the sentence, add 1 in place of the word in the dictionary and add zero for all the other words that don't exist in the dictionary. For instance, the bag of words representation for sentence S1 (I love rain), looks like this: [1, 1, 1, 0, 0, 0]. Similarly for S2 and S3, bag of word representations are [0, 0, 2, 1, 1, 0] and [1, 0, 0, 0, 1, 1], respectively.

Notice that for S2 we added 2 in place of "rain" in the dictionary; this is because S2 contains "rain" twice.

Pros and Cons of Bag of Words

Bag of words approach has both pros and cons. The main advantage of the bag of words approach is that you do not need a very huge corpus of words to get good results. You can see that we build a very basic bag of words model with three sentences. Computationally, a bag of words model is not very complex.

A major drawback of the bag of words approach is the fact that we need to create huge vectors with empty spaces in order to represent a number (sparse matrix) which consumes memory and space. In the example previous, we only had 3 sentences. Yet you can see three zeros in every vector.

Imagine a corpus with thousands of articles. In such a case, the number of unique words in a dictionary can be thousands. If one document contains 10% of the unique words, the corresponding embedding vector will still contain 90% zeros.

Another major issue with the bag of words approach is the fact that it doesn't maintain any context information. It doesn't care about the order in which the words appear in a sentence. For instance, it treats the sentences "Bottle is in the car" and "Car is in the bottle" equally, which are totally different sentences.

A type of bag of words approach, known as n-grams, can help maintain the relationship between words. N-gram refers to a contiguous sequence of n words. For instance, 2-grams for the sentence "You are not happy", are "You are", "are not" and "not happy". Although the n-grams approach is capable of capturing relationships between words, the size of the feature set grows exponentially with too many n-grams.

TF-IDF Scheme

The TF-IDF scheme is a type of bag words approach where instead of adding zeros and ones in the embedding vector, you add floating numbers that contain more useful information compared to zeros and ones. The idea behind TF-IDF scheme is the fact that words having a high frequency of occurrence in one document, and less frequency of occurrence in all the other documents, are more crucial for classification.

TF-IDF is a product of two values: Term Frequency (TF) and Inverse Document Frequency (IDF).

Term frequency refers to the number of times a word appears in the document and can be calculated as:

Term frequence = (Number of Occurences of a word)/(Total words in the document)

For instance, if we look at sentence S1 from the previous section i.e. "I love rain", every word in the sentence occurs once and therefore has a frequency of 1. On the contrary, for S2 i.e. "rain rain go away", the frequency of "rain" is two while for the rest of the words, it is 1.

IDF refers to the log of the total number of documents divided by the number of documents in which the word exists, and can be calculated as:

IDF(word) = Log((Total number of documents)/(Number of documents containing the word))

For instance, the IDF value for the word "rain" is 0.1760, since the total number of documents is 3 and rain appears in 2 of them, therefore log(3/2) is 0.1760. On the other hand, if you look at the word "love" in the first sentence, it appears in one of the three documents and therefore its IDF value is log(3), which is 0.4771.

Pros and Cons of TF-IDF

Though TF-IDF is an improvement over the simple bag of words approach and yields better results for common NLP tasks, the overall pros and cons remain the same. We still need to create a huge sparse matrix, which also takes a lot more computation than the simple bag of words approach.

Word2Vec

The Word2Vec embedding approach, developed by Tomas Mikolov, is considered the state of the art. Word2Vec approach uses deep learning and neural networks-based techniques to convert words into corresponding vectors in such a way that the semantically similar vectors are close to each other in N-dimensional space, where N refers to the dimensions of the vector.

Word2Vec returns some astonishing results. Word2Vec's ability to maintain semantic relation is reflected by a classic example where if you have a vector for the word "King" and you remove the vector represented by the word "Man" from the "King" and add "Women" to it, you get a vector which is close to the "Queen" vector. This relation is commonly represented as:

King - Man + Women = Queen

Word2Vec model comes in two flavors: Skip Gram Model and Continuous Bag of Words Model (CBOW).

In the Skip Gram model, the context words are predicted using the base word. For instance, given a sentence "I love to dance in the rain", the skip gram model will predict "love" and "dance" given the word "to" as input.

On the contrary, the CBOW model will predict "to", if the context words "love" and "dance" are fed as input to the model. The model learns these relationships using deep neural networks.

Pros and Cons of Word2Vec

Word2Vec has several advantages over bag of words and IF-IDF scheme. Word2Vec retains the semantic meaning of different words in a document. The context information is not lost. Another great advantage of Word2Vec approach is that the size of the embedding vector is very small. Each dimension in the embedding vector contains information about one aspect of the word. We do not need huge sparse vectors, unlike the bag of words and TF-IDF approaches.

Note: The mathematical details of how Word2Vec works involve an explanation of neural networks and softmax probability, which is beyond the scope of this article. If you want to understand the mathematical grounds of Word2Vec, please read this paper: https://arxiv.org/abs/1301.3781

Word2Vec in Python with Gensim Library

In this section, we will implement Word2Vec model with the help of Python's Gensim library. Follow these steps:

Creating Corpus

We discussed earlier that in order to create a Word2Vec model, we need a corpus. In real-life applications, Word2Vec models are created using billions of documents. For instance Google's Word2Vec model is trained using 3 million words and phrases. However, for the sake of simplicity, we will create a Word2Vec model using a Single Wikipedia article. Our model will not be as good as Google's. Although, it is good enough to explain how Word2Vec model can be implemented using the Gensim library.

Before we could summarize Wikipedia articles, we need to fetch them. To do so we will use a couple of libraries. The first library that we need to download is the Beautiful Soup library, which is a very useful Python utility for web scraping. Execute the following command at command prompt to download the Beautiful Soup utility.

$ pip install beautifulsoup4

Another important library that we need to parse XML and HTML is the lxml library. Execute the following command at command prompt to download lxml:

$ pip install lxml

The article we are going to scrape is the Wikipedia article on Artificial Intelligence. Let's write a Python Script to scrape the article from Wikipedia:

import bs4 as bs  
import urllib.request  
import re  
import nltk

scrapped_data = urllib.request.urlopen('https://en.wikipedia.org/wiki/Artificial_intelligence')  
article = scrapped_data .read()

parsed_article = bs.BeautifulSoup(article,'lxml')

paragraphs = parsed_article.find_all('p')

article_text = ""

for p in paragraphs:  
    article_text += p.text

In the script above, we first download the Wikipedia article using the urlopen method of the request class of the urllib library. We then read the article content and parse it using an object of the BeautifulSoup class. Wikipedia stores the text content of the article inside p tags. We use the find_all function of the BeautifulSoup object to fetch all the contents from the paragraph tags of the article.

Finally, we join all the paragraphs together and store the scraped article in article_text variable for later use.

Preprocessing

At this point we have now imported the article. The next step is to preprocess the content for Word2Vec model. The following script preprocess the text:

# Cleaing the text
processed_article = article_text.lower()  
processed_article = re.sub('[^a-zA-Z]', ' ', processed_article )  
processed_article = re.sub(r'\s+', ' ', processed_article)

# Preparing the dataset
all_sentences = nltk.sent_tokenize(processed_article)

all_words = [nltk.word_tokenize(sent) for sent in all_sentences]

# Removing Stop Words
from nltk.corpus import stopwords  
for i in range(len(all_words)):  
    all_words[i] = [w for w in all_words[i] if w not in stopwords.words('english')]

In the script above, we convert all the text to lowercase and then remove all the digits, special characters, and extra spaces from the text. After preprocessing, we are only left with the words.

The Word2Vec model is trained on a collection of words. First, we need to convert our article into sentences. We use nltk.sent_tokenize utility to convert our article into sentences. To convert sentences into words, we use nltk.word_tokenize utility. As a last preprocessing step, we remove all the stop words from the text.

After the script completes its execution, the all_words object contains the list of all the words in the article. We will use this list to create our Word2Vec model with the Gensim library.

Creating Word2Vec Model

With Gensim, it is extremely straightforward to create Word2Vec model. The word list is passed to the Word2Vec class of the gensim.models package. We need to specify the value for the min_count parameter. A value of 2 for min_count specifies to include only those words in the Word2Vec model that appear at least twice in the corpus. The following script creates Word2Vec model using the Wikipedia article we scraped.

from gensim.models import Word2Vec

word2vec = Word2Vec(all_words, min_count=2)

To see the dictionary of unique words that exist at least twice in the corpus, execute the following script:

vocabulary = word2vec.wv.vocab  
print(vocabulary)

When the above script is executed, you will see a list of all the unique words occurring at least twice.

Model Analysis

We successfully created our Word2Vec model in the last section. Now is the time to explore what we created.

Finding Vectors for a Word

We know that the Word2Vec model converts words to their corresponding vectors. Let's see how we can view vector representation of any particular word.

v1 = word2vec.wv['artificial']

The vector v1 contains the vector representation for the word "artificial". By default, a hundred dimensional vector is created by Gensim Word2Vec. This is a much, much smaller vector as compared to what would have been produced by bag of words. If we use the bag of words approach for embedding the article, the length of the vector for each will be 1206 since there are 1206 unique words with a minimum frequency of 2. If the minimum frequency of occurrence is set to 1, the size of the bag of words vector will further increase. On the other hand, vectors generated through Word2Vec are not affected by the size of the vocabulary.

Finding Similar Words

Earlier we said that contextual information of the words is not lost using Word2Vec approach. We can verify this by finding all the words similar to the word "intelligence".

Take a look at the following script:

sim_words = word2vec.wv.most_similar('intelligence')

If you print the sim_words variable to the console, you will see the words most similar to "intelligence" as shown below:

('ai', 0.7124934196472168)
('human', 0.6869025826454163)
('artificial', 0.6208730936050415)
('would', 0.583903431892395)
('many', 0.5610555410385132)
('also', 0.5557990670204163)
('learning', 0.554862380027771)
('search', 0.5522681474685669)
('language', 0.5408136248588562)
('include', 0.5248900055885315)

From the output, you can see the words similar to "intelligence" along with their similarity index. The word "ai" is the most similar word to "intelligence" according to the model, which actually makes sense. Similarly, words such as "human" and "artificial" often coexist with the word "intelligence". Our model has successfully captured these relations using just a single Wikipedia article.

Conclusion

In this article, we implemented a Word2Vec word embedding model with Python's Gensim Library. We did this by scraping a Wikipedia article and built our Word2Vec model using the article as a corpus. We also briefly reviewed the most commonly used word embedding approaches along with their pros and cons as a comparison to Word2Vec.

I would suggest you to create a Word2Vec model of your own with the help of any text corpus and see if you can get better results compared to the bag of words approach.

↧

Talk Python to Me: #176 The Python Community by the Numbers

September 10, 2018, 1:00 am

≫ Next: Real Python: The Best Python Books

≪ Previous: Stack Abuse: Implementing Word2Vec with Gensim Library in Python

The Python landscape is changing pretty dramatically. Python's rapid growth over the past 5 years means it doesn't look the same as the early days. On this episode, we take a deep look inside the state of the Python ecosystem with Ewa Jodlowska and Dmitry Filippov. They lead the PSF and JetBrains Python survey. And they are here to dig into the results.

↧

Real Python: The Best Python Books

September 10, 2018, 7:00 am

≫ Next: Python Anywhere: Python 3.7 now available!

≪ Previous: Talk Python to Me: #176 The Python Community by the Numbers

Python is an amazing programming language. It can be applied to almost any programming task, allows for rapid development and debugging, and brings the support of what is arguably the most welcoming user community.

Getting started with Python is like learning any new skill: it’s important to find a resource you connect with to guide your learning. Luckily, there’s no shortage of excellent books that can help you learn both the basic concepts of programming and the specifics of programming in Python. With the abundance of resources, it can be difficult to identify which book would be best for your situation.

In this article, we highlight the best books for learning Python through a collection of book reviews. Each review gives you a taste of the book, the topics covered, and the context used to illustrate those topics. Different books will resonate with different people, depending on the style and presentation of the books, the readers’ backgrounds, as well as other factors.

If you are new to Python, any of the introductory books will give you a solid foundation in the basics.

Perhaps you want to learn Python with your kid, or maybe teach Python to a group of kids. Check out the Best Python Books for Kids for resources aimed at a younger audience.

As you progress in you Python journey, you will want to dig deeper to maximize the efficiency of your code. The best intermediate and advanced Python books provide insight to help you level up your Python skills, enabling you to become an expert Pythonista.

After reading these reviews, if you still are not sure which book to choose, publishers often provide a sample chapter or section to give you an example of what the book offers. Reading a sample of the book should give you the most representative picture of the author’s pace, style, and expectations.

Regardless of which book most stands out, consider this anecdote from one of our book reviewers, Steven C. Howell:

“A favorite professor once told me, ‘It doesn’t matter which book you read first. It’s always the second one that makes the most sense.’
I can’t say this has always been the case for me, but I’ve definitely found that a second reference can make all the difference when the first left me puzzled or frustrated.
When learning Python classes, I had difficulty relating to the examples used in the first two books I picked up. It wasn’t until the third book I referred to that the concepts started to click.
The important lesson is that if you get stuck or frustrated, and the resources you have are not helping, then don’t give up. Look at another book, search the web, ask on a forum, or just take a break.”

Note: This article contains affiliate links to retailers like Amazon, so you can support Real Python by clicking through and making a purchase on some of the links. Purchasing from one of these links adds no extra cost to you. Affiliate links never influence our editorial decisions in any way.

Best Books for Learning Python

If you are new to Python, you are likely in one of the following two situations:

You are new to programming and want to start by learning Python.
You have a reasonable amount of programming experience in another language and now want to learn Python.

This section focuses on the first of these two scenarios, with reviews of the books we consider to be the best Python programming books for readers who are new to both programming and Python. Accordingly, these books require no previous programming experience. They start from the absolute basics and teach both general programming concepts as well as how they apply to Python.

Note: If you’re looking for the best Python books for experienced programmers, consider the following selection of books with full reviews in the intro and advanced sections:

Think Python: The most basic of this list, Think Python provides a comprehensive Python reference.
Fluent Python: While Python’s simplicity lets you quickly start coding, this book teaches you how to write idiomatic Python code, while going into several deep topics of the language.
Effective Python: 59 Ways to Write Better Python: This relatively short book is a collection of 59 articles that, similarly to Fluent Python, focus on teaching you how to write truly Pythonic code.
Python Cookbook: As a cookbook, this will be a good reference on how to use Python to complete tasks you have done in another language.

Alternatively, you may even prefer to go directly to the official Python Tutorial, a well-written and thorough resource.

Python Crash Course

Eric Matthes (No Starch Press, 2016)

It does what it says on the tin, and it does it really well. The book starts out with a walkthrough of the basic Python elements and data structures, working through variables, strings, numbers, lists, and tuples, outlining how you work with each of them.

Next, if statements and logical tests are covered, followed by a dive into dictionaries.

After that, the book covers user input, while loops, functions, classes, and file handling, as well as code testing and debugging.

That’s just the first half of the book! In the second half, you work on three major projects, creating some clever, fun applications.

The first project is an Alien Invasion game, essentially Space Invaders, developed using the pygame package. You design a ship (using classes), then program how to pilot it and make it fire bullets. Then, you design several classes of aliens, make the alien fleet move, and make it possible to shoot them down. Finally, you add a scoreboard and a list of high scores to complete the game.

After that, the next project covers data visualization with matplotlib, random walks, rolling dice, and a little bit of statistical analysis, creating graphs and charts with the pygal package. You learn how to download data in a variety of formats, import it into Python, and visualize the results, as well as how to interact with web APIs, retrieving and visualizing data from GitHub and HackerNews.

The third project walks you through the creation of a complete web application using Django to set up a Learning Log to track what users have been studying. It covers how to install Django, set up a project, design your models, create an admin interface, set up user accounts, manage access controls on a per-user basis, style your entire app with Bootstrap, and then finally deploy it to Heroku.

This book is well written and nicely organized. It presents a large number of useful exercises as well as three challenging and entertaining projects that make up the second half of the book. (Reviewed by David Schlesinger.)

Head-First Python, 2nd edition

Paul Barry (O’Reilly, 2016)

I really like the Head-First series of books, although they’re admittedly lighter weight in overall content than many of the other recommendations in this section. The trade-off is the that this approach makes the book more user-friendly.

If you’re the kind of person who likes to learn things one small, fairly self-contained chunk at a time, and you want to have lots of concrete examples and illustrations of the concepts involved, then the Head-First series is for you. The publisher’s website has the following to say about their approach:

“Based on the latest research in cognitive science and learning theory, Head-First Python uses a visually rich format to engage your mind, rather than a text-heavy approach that puts you to sleep. Why waste your time struggling with new concepts? This multi-sensory learning experience is designed for the way your brain really works.” (Source)

Chock full of illustrations, examples, asides, and other tidbits, Head-First Python is consistently engaging and easy to read. This book starts its tour of Python by diving into lists and explaining how to use and manipulate them. It then goes into modules, errors, and file handling. Each topic is organized around a unifying project: building a dynamic website for a school athletic coach using Python through a Common Gateway Interface (CGI).

After that, the book spends time teaching you how to use an Android application to interact with the website you created. You learn to handle user input, wrangle data, and look into what’s involved in deploying and scaling a Python application on the web.

While this book isn’t as comprehensive as some of the others, it covers a good range of Python tasks in a way that’s arguably more accessible, painless, and effective. This is especially true if you find the subject of writing programs somewhat intimidating at first.

This book is designed to guide you through any challenge. While the content is more focused, this book has plenty of material to keep you busy and learning. You will not be bored. If you find most programming books to be too dry, this could be an excellent book for you to get started in Python. (Reviewed by David Schlesinger and Steven C. Howell.)

Invent Your Own Computer Games with Python, 4th edition

Al Sweigart (No Starch, 2017)

If games are your thing, or you even have a game idea of your own, this would be the perfect book to learn Python. In this book, you learn the fundamentals of programming and Python with the application exercises focused on building classic games.

Starting with an introduction to the Python shell and the REPL loop, followed by a basic “Hello, World!” script, you dive right into making a basic number-guessing game, covering random numbers, flow control, type conversion, and Boolean data. After that, a small joke-telling script is written to illustrate the use of print statements, escape characters, and basic string operations.

The next project is a text-based cave exploration game, Dragon’s Realm, which introduces you to flowcharts and functions, guides you through how to define your own arguments and parameters, and explains Boolean operators, global and local scope, and the sleep() function.

After a brief detour into how to debug your Python code, you next implement the game of Hangman, using ASCII artwork, while learning about lists, the in operator, methods, elif statements, the random module, and a handful of string methods.

You then extend the Hangman game with new features, like word lists and difficulty levels, while learning about dictionaries, key-value pairs, and assignment to multiple variables.

Your next project is a Tic-Tac-Toe game, which introduces some high-level artificial intelligence concepts, shows you how to short-circuit evaluation in conditionals, and explains the None value as well as some different ways of accessing lists.

Your journey through the rest of the book proceeds in a similar vein. You’ll learn nested loops while building a Mastermind-style number guessing game, Cartesian coordinates for a Sonar Hunt game, cryptography to write a Caesar cipher, and artificial intelligence when implementing Reversi (also known as Othello), in which the computer can play against itself.

After all of this, there’s a dive into using graphics for your games with PyGame: you’ll cover how to animate the graphics, manage collision detection, as well as use sounds, images, and sprites. To bring all these concepts together, the book guides you through making a graphical obstacle-dodging game.

This book is well done, and the fact that each project is a self-contained unit makes it appealing and accessible. If you’re someone who likes to learn by doing, then you’ll enjoy this book.

The fact that this book introduces concepts only as needed can be a possible disadvantage. While it’s organized more as a guide than a reference, the broad range of contents taught in the context of familiar games makes this one of the best books for learning Python. (Reviewed by David Schlesinger.)

Think Python: How to Think Like a Computer Scientist, 2nd edition

Allen B. Downey (O’Reilly, 2015)

If learning Python by creating video games is too frivolous for you, consider Allen Downey’s book Think Python, which takes a much more serious approach.

As the title says, the goal of this book is to teach you how coders think about coding, and it does a good job of it. Compared to the other books, it’s drier and organized in a more linear way. The book focuses on everything you need to know about basic Python programming, in a very straightforward, clear, and comprehensive way.

Compared to other similar books, it doesn’t go quite as deep into some of the more advanced areas, instead covering a wider range of material, including topics the other books don’t go anywhere near. Examples of such topics include operator overloading, polymorphism, analysis of algorithms, and mutability versus immutability.

Previous versions were a little light on exercises, but the latest edition has largely corrected this shortcoming. The book contains four reasonably deep projects, presented as case studies, but overall, it has fewer directed application exercises compared to many other books.

If you like a step-by-step presentation of just the facts, and you want to get a little additional insight into how professional coders look at problems, this book is a great choice. (Reviewed by David Schlesinger and Steven C. Howell.)

Effective Computation in Physics: Field Guide to Research with Python

Anthony Scopatz, Kathryn D. Huff (O’Reilly, 2015)

This is the book I wish I had when I was first learning Python.

Despite its name, this book is an excellent choice for people who don’t have experience with physics, research, or computational problems.

It really is a field guide for using Python. On top of actually teaching you Python, it also covers the related topics, like the command-line and version control, as well as the testing and deploying of software.

In addition to being a great learning resource, this book will also serve as an excellent Python reference, as the topics are well organized with plenty of interspersed examples and exercises.

The book is divided into four aptly named sections: Getting Started, Getting it Done, Getting it Right, and Getting it Out There.

The Getting Started section contains everything you need to hit the ground running. It begins with a chapter on the fundamentals of the bash command-line. (Yes, you can even install bash for Windows.) The book then proceeds to explain the foundations of Python, hitting on all the expected topics: operators, strings, variables, containers, logic, and flow control. Additionally, there is an entire chapter dedicated to all the different types of functions, and another for classes and object-oriented programming.

Building on this foundation, the Getting it Done section moves into the more data-centric area of Python. Note that this section, which takes up approximately a third of the book, will be most applicable to scientists, engineers, and data scientists. If that is you, enjoy. If not, feel free to skip ahead, picking out any pertinent sections. But be sure to catch the last chapter of the section because it will teach you how to deploy software using pip, conda, virtual machines, and Docker containers.

For those of you who are interested in working with data, the section begins with a quick overview of the essential libraries for data analysis and visualization. You then have a separate chapter dedicated to teaching you the topics of regular expressions, NumPy, data storage (including performing out-of-core operations), specialized data structures (hash tables, data frames, D-trees, and k-d trees), and parallel computation.

The Getting it Right section teaches you how to avoid and overcome many of the common pitfalls associated with working in Python. It begins by extending the discussion on deploying software by teaching you how to build software pipelines using make. You then learn how to use Git and GitHub to track, store, and organize your code edits over time, a process known as version control. The section concludes by teaching you how to debug and test your code, two incredibly valuable skills.

The final section, Getting it Out There, focuses on effectively communicating with the consumers of your code, yourself included. It covers the topics of documentation, markup languages (primarily LaTeX), code collaboration, and software licenses. The section, and book, concludes with a long list of scientific Python projects organized by topic.

This book stands out because, in addition to teaching all the fundamentals of Python, it also teaches you many of the technologies used by Pythonistas. This is truly one of the best books for learning Python.

It also serves as a great reference, will a full glossary, bibliography, and index. The book definitely has a scientific Python spin, but don’t worry if you do not come from a scientific background. There are no mathematical equations, and you may even impress your coworkers when they see you are on reading up on Computational Physics! (Reviewed by Steven C Howell.)

Learn Python 3 the Hard Way

Zed A. Shaw (Addison-Wesley, 2016)

Learn Python the Hard Way is a classic. I’m a big fan of the book’s approach. When you learn “the hard way,” you have to:

Type in all the code yourself
Do all the exercises
Find your own solutions to problems you run into

The great thing about this book is how well the content is presented. Each chapter is clearly presented. The code examples are all concise, well constructed, and to the point. The exercises are instructive, and any problems you run into will not be at all insurmountable. Your biggest risk is typographical errors. Make it through this book, and you’ll definitely no longer be a beginner at Python.

Don’t let the title put you off. The “hard way” turns out to be the easy way if you take the long view. Nobody loves typing a lot of stuff in, but that’s what programming actually involves, so it’s good to get used to it from the start. One nice thing about this book is that it has been refined through several editions now, so any rough edges have been made nice and smooth by now.

The book is constructed as a series of over fifty exercises, each building on the previous, and each teaching you some new feature of the language. Starting from Exercise 0, getting Python set up on your computer, you begin writing simple programs. You learn about variables, data types, functions, logic, loops, lists, debugging, dictionaries, object-oriented programming, inheritance, and packaging. You even create a simple game using a game engine.

The next sections cover concepts like automated testing, lexical scanning on user input to parse sentences, and the lpthw.web package, to put your game up on the web.

Zed is an engaging, patient writer who doesn’t gloss over the details. If you work through this book the right way—the “hard way,” by following up on the study suggestions provided throughout the text as well as the programming exercises—you’ll be well beyond the beginner programmer stage when you’ve finished. (Reviewed by David Schlesinger.)

Note: Of all the books included in this article, this is the only with somewhat mixed reviews. The Stack Overflow (SO) community has compiled a list of 22 complaints prefaced with the following statement:

“We noticed a general trend that users using [Learn Python the Hard Way] post questions that don’t make a lot of sense both on SO and in chat. This is due to the structure and techniques used in the book.” (Source)

They provide their own list of recommended tutorials, which includes the following:

Despite the negative criticism toward Learn Python the Hard Way, David Schlesinger and Amazon reviewers agree that the book is worthwhile, though you probably want to supplement your library with another Python book that could serve more as a reference. Also, be sure to do your due diligence before posting questions to Stack Overflow, as that community can be somewhat abrasive at times.

Real Python Course, Part 1

Real Python Team (Real Python, 2017)

This eBook is the first of three (so far) in the Real Python course series. It was written with the goal of getting you up and running, and it does a great job at achieving this goal. The book is a mix of explanatory prose, example code, and review exercises. The interspersed review exercises solidify your learning by letting you immediately apply what you’ve learned.

As with the previous books, clear instructions are provided up front for getting Python installed and running on your computer. After the setup section, rather than giving a dry overview of data types, Real Python simply starts with strings and is actually quite thorough: you learn string slicing before you hit page 30.

Then the book gives you a good sense of the flavor of Python by showing you how to play with some of the class methods that can be applied. Next, you learn to write functions and loops, use conditional logic, work with lists and dictionaries, and read and write files.

Then things get really fun! Once you’ve learned to install packages with pip (and from source), Real Python covers interacting with and manipulating PDF files, using SQL from within Python, scraping data from web pages, using numpy and matplotlib to do scientific computing, and finally, creating graphical user interfaces with EasyGUI and tkinter.

What I like best about Real Python is that, in addition to covering the basics in a thorough and friendly way, the book explores some more advanced uses of Python that none of the other books hit on, like web-scraping. There are also two additional volumes, which go into more advanced Python development. (Reviewed by David Schlesinger.)

View On Real Python »

Disclaimer: I first started using the Real Python books several years ago, when they were still in beta. I thought then—and still think now—that they’re one of the best resources available to learn the Python language and several ways it can be used. My gig writing articles on the Real Python web site is a much more recent development, and my review is completely independent. — David

Best Python Books for Kids

The following books are aimed at adults interested in teaching kids to code, while possibly learning it themselves along the way. Both of these books are recommended for kids as young as 9 or 10, but they are great for older kids as well.

It’s important to note that these books are not meant to be just handed to a kid, depending on their age. They would be ideal for a parent who wanted to learn Python alongside their child.

Python for Kids: A Playful Introduction to Programming

Jason R. Briggs (No Starch, 2013)

“Playful” is right! This is a fun book for all ages, despite its title. It provides a clear, easy to follow, introduction to Python programming. It’s profusely illustrated, the examples are straightforward and clearly presented, and it’s a solid guide for someone who wants to get a good grounding in the basics, plus a little more.

The book begins with an excellent, detailed guide to getting Python installed on your system, whether that’s Windows, OS X, or Ubuntu Linux. It then proceeds to introduce the Python shell and how it can be used as a simple calculator. This serves to introduce some basic concepts like variables and arithmetic operation.

Next, iterables are tackled, and the chapter works its way progressively through strings, lists, tuples, and dictionaries.

Once that’s accomplished, the Python turtle library is used to begin working with turtle graphics, a popular framework for teaching children to code. From there, the book progresses through conditional statements, loops, functions, and modules.

Classes and objects are covered, followed by a truly excellent section on Python’s built-in functions, and then a section on a number of useful Python libraries and modules. Turtle graphics are revisited in greater detail, after which the book introduces tkinter for creating user interfaces, better graphics, and even animations.

This concludes part 1 of the book, “Learning to Program,” with the remainder focused on building two fun application projects. The first project is to build a single-player version of Pong, called Bounce! This integrates the programming concepts of functions, classes, and control flow, together with the tasks of creating an interface using tkinter, illustrating to the canvas, performing geometric calculations, and using event bindings to create interactivity.

In the second project, you build a side-scrolling video game, Mr. Stickman Races for the Exit. This game applies many of the same concepts and tasks as Bounce! but with more depth and increased complexity. Along the way, you also get introduced to the open source image manipulation program GIMP, used to create your game’s assets. The book gets an amazing amount of mileage out of these two games, and getting them working is both instructive and a lot of fun.

I really like this book. Whether you are young, or just young at heart, you will enjoy this book if you are looking for a fun, approachable, introduction to Python and programming. (Reviewed by David Schlesinger and Steven C. Howell.)

Teach Your Kids to Code: A Parent-Friendly Guide to Python Programming

Bryson Payne (No Starch, 2015)

This book is similar to Python for Kids but intended more for an adult working with a child (or children) to learn to code, as the title suggests. One thing that sets this book apart from most introductory books is the use of color and illustrations on almost every page. The book is well written and presents learning to code as a way to teach children problem-solving skills.

As is commonly the case, this book begins with a Python installation guide. Compared to Python for Kids, the guide in this book is more cursory but completely adequate.

The first activity is, again, turtle graphics. A number of basic variations on drawing a rotated square are presented—without a lot of underlying explanation, initially—just to introduce the general concepts, but by the end of the section, you’ll have been provided with a pretty good understanding of the basics.

Next, calculations, variables, and mathematics in Python are explained. Once strings have been covered, the book brings all of that back into turtle graphics to enhance and explore the work that was done earlier. By this point, the code explanations are extremely clear, with explicit line-by-line details. You’d have a hard time misunderstanding any of the code presented.

Lists are explored next, as is the eval() function. Loops are introduced and then used to create increasingly complex graphics with the turtle. Conditional expressions come next, along with Boolean logic and operators.

The random library is introduced with a guessing game and randomly placed spirals made with turtle graphics. You explore randomness further by implementing rolling dice and picking cards, which leads up to you creating the games Yahtzee and War.

Functions, more advanced graphics, and user interaction are investigated next.

The book then branches off to cover using PyGame to create even more advanced graphics and animations, and then user interaction to create a very simple drawing program.

At this point, you have all the tools to create some real games. Development of both a full-featured version of Pong and a bubble-popping game are presented. Both provide enough depth to pose some challenges and maintain interest.

What I like best about this book is its large number of programming challenges, as well as the excellent summaries at the end of each chapter reminding you what was covered. If you and your child are interested in programming, this book should take both of you a good distance, and you’ll have a lot of fun. As the author, Dr. Bryson Payne, said in his recent TEDx talk, “Step out of your comfort zone, and become literate in the language of technology.” (Reviewed by David Schlesinger and Steven C. Howell.)

Best Intermediate and Advanced Python Books

Knowing Python is one thing. Knowing what’s Pythonic takes practice. Sometimes Python’s low barrier to entry gives people the mistaken idea that the language is less capable than other languages, that style does not matter, or that best practices are only a matter of preference. Have you ever seen Python code that looked like C or Fortran?

Learning how to use Python effectively requires some understanding of what Python is doing under the hood. Pythonic programming takes advantage of how the Python language is implemented to maximize the efficiency of your code.

Fortunately, there are some excellent books, packed with expert guidance, aimed to help you take what you’ve learned and level up your skills. Any of the books in this section will give you a deeper understanding of Python programming concepts and teach you how to write developer-style Python code. Note that these are by no means introductory books. They do not include the basics of getting started. These books will be helpful if you are already coding in Python and want to further hone your skills on your path to becoming a serious Pythonista.

Python Tricks: A Buffet of Awesome Python Features

Dan Bader (dbader.org, 2017)

This book illustrates valuable lesser-known Python features and best practices, written to help you gain a deeper understanding of Python. Each of the 43 subsections presents a different concept, referred to as a Python Trick, with discussion and easy-to-digest code examples illustrating how you can take advantage of that concept.

The book’s content is broken into the following sections:

Patterns for Cleaner Python
Effective Functions
Classes & OOP
Common Data Structures in Python
Looping & Iteration
Dictionary Tricks
Pythonic Productivity Techniques

As it says on the cover, the content is organized as “A Buffet,” with each subsection being a self-contained topic, with a brief introduction, examples, discussion, and list of Key Takeaways. As such, you should feel free to jump around to whichever sections are the most appealing.

In addition to the book, I particularly enjoyed the 12 Bonus Videos that are available when you purchase this as an eBook. They have an average length of 11 minutes, perfect for watching during lunch. Each video illustrates a different concept using clear and concise code examples that are simple to reproduce. While some of the videos covered familiar concepts, they still provided interesting insight without dragging on. (Reviewed by Steven C. Howell.)

Disclaimer: Though this book is officially distributed through Real Python, I recommend it independently of my connection with Real Python. I purchased this book when it was first released, before I had the opportunity to write for Real Python. For further evidence of the value of this book, check out the Amazon reviews: 148, averaging 4.8 out of 5 stars, at the time of this review. — Steve

Fluent Python: Clear, Concise, and Effective Programming

Luciano Ramalho (O’Reilly, 2014)

This book was written for experienced Python 2 programmers who want to become proficient in Python 3. Consequently, this book is perfect for someone with a solid foundation in the basics of Python, 2 or 3, who wants to take their skills to the next level. Additionally, this book also works well as a reference for an experienced programmer from another language who wants to look up “How do I do <x> in Python?”

The book is organized by topic so that each section can be read independently. While many of the topics covered in this book are found in introductory books, Fluent Python provides much more detail, illuminating many of the more nuanced and overlooked features of the Python language.

The chapters are broken into the following six sections:

Prologue: introduces Python’s object-oriented nature and the special methods that keep Python libraries consistent
Data Structures: covers sequences, mappings, sets, and the difference between str and bytes
Functions as Objects: explains the consequences of functions being first-class objects in the Python language
Object-Oriented Idioms: includes references, mutability, instances, multiple inheritance, and operator overloading
Control Flow: extends beyond the basic conditionals and covers the concept of generators, context managers, coroutines, yield from syntax, and concurrency using asyncio
Metaprogramming: explores the lesser know aspects of classes, discussing dynamic attributes and properties, attribute descriptors, class decorators, and metaclasses

With code examples on almost every page, and numbered call-outs linking lines of code to helpful descriptions, this book is extremely approachable. Additionally, the code examples are geared toward the interactive Python console, a practical approach to exploring and learning the concepts presented.

I find myself turning to this book when I have a Python question and want an explanation that is more thorough than the one I would likely get on Stack Overflow. I also enjoy reading this book when I have a bit of down-time and just want to learn something new. On more than one occasion, I have found that a concept I recently learned from this book unexpectedly turned out to be the perfect solution to a problem I had to solve. (Reviewed by Steven C. Howell.)

Effective Python: 59 Ways to Write Better Python

Brett Slatkin (Addison-Wesley, 2015)

This book is a collection of 59 independent articles that build on a basic understanding of Python to teach Pythonic best practices, lesser known functionality, and built-in tools. The topics range in complexity, beginning with the simple concept of being aware of which Python version you’re using, and ending with the more complicated, and typically ignored, concept of identifying memory leaks.

Each article is a combination of example code, discussion, and a list of things to remember.

As each article is independent, this is a great book to jump around in, allowing you to focus on the topics that are most applicable or interesting. This also makes it perfect for reading one article at a time. With each article being around two to four pages in length, you could make time to read one article per day, finishing the book in two to three months (depending on whether you read on weekends).

The articles are grouped into the following 8 chapters:

Pythonic Thinking: introduces the best ways to perform common tasks, while taking advantage of how Python is implemented
Functions: clarifies nuanced differences of Python functions and outlines how to use functions to clarify intention, promote reuse, and reduce bugs
Classes and Inheritance: outlines the best practices when working with Python classes
Metaclasses and Attributes: illuminates the somewhat mysterious topic of metaclasses, teaching you how to use them to create intuitive functionality
Concurrency and Parallelism: explains how to know to write multi-threaded applications in Python
Built-in Modules: introduces a few of Python’s lesser-known built-in libraries to make your code more useful and reliable
Collaboration: discusses proper documentation, packaging, dependency, and virtual environments
Production: covers the topics of debugging, optimization, testing, and memory management

If you have a solid foundation in Python and want to fill in holes, deepen you understanding, and learn some of the less obvious features of Python, this would be a great book for you. (Reviewed by Steven C. Howell.)

Python Cookbook

David Beazley & Brian K. Jones (O’Reilly, 3rd edition, 2013)

What makes this book stand out is its level of detail. Code cookbooks are typically designed as short and sweet manuals to illustrate slick ways of doing everyday tasks. In this case, each recipe in Python Cookbook has an extended code solution as well as an author’s discussion of some particular elements of the solution.

Each recipe starts out with a clear problem statement, such as, “You want to write a decorator that adds an extra argument to the calling signature of the wrapped function.” It then jumps into a solution that uses modern, idiomatic Python 3 code, patterns, and data structures, often spending four to five pages discussing the solution.

Based on its more involved and sophisticated examples, and the authors’ own recommendation in the preface, this is probably the most advanced Python book on our list. Despite that, don’t be scared away if you consider yourself an intermediate Python programmer. Who’s judging, anyway? There’s an old saying that goes something like this:

“The best way to become a better basketball player is to lose to the best players you can find, rather than beating the worst.”

You may see some code blocks you don’t fully understand—come back to them in a few months. Re-read those sections after you’ve picked up a few additional concepts, and suddenly, it will click. Most of the chapters start out fairly straightforward, and then gradually become more intense.

The latter half of the book illustrates designs like decorator patterns, closures, accessor functions, and callback functions.

It’s always nice to read from a trustworthy source, and this book’s authors certainly fit that bill. David Beazley is a frequent keynote speaker at events such as PyCon and also the author of Python Essential Reference. Similarly, Brian K. Jones is a CTO, the creator of a Python magazine, and founder of the Python User Group in Princeton (PUG-IP).

This particular edition is written and tested with Python 3.3. (Reviewed by Brad Solomon.)

Get Coding!

One of the awesome things about Python is it has a relatively low barrier to entry, compared to many other languages. Despite this, learning Python is a never-ending process. The language is relevant for such a wide variety of tasks, and evolves so much that there will always be something new to discover and learn. While you can pick up enough Python to do some fun things in a week or two, people who’ve been using Python for twenty years will tell you they’re still learning new things they can do with this flexible and evolving language.

To ultimately be successful as a Python programmer, you need to begin with a solid foundation, then gain a deeper understanding of how the language works, and how to best put it to use. To gain a solid foundation, you really can’t go wrong with any of the best books to learn Python. If you want to learn Python with a child, or maybe teach a group of kids, check out the list of best Python books for kids. After you’ve got your feet wet, check out some of the best intermediate and advanced Python books to dig in deeper to less obvious concepts that will improve the efficiency of your code.

All of these books will teach you what you need to know to legitimately call yourself a Python coder. The only ingredient missing is you.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

↧

Python Anywhere: Python 3.7 now available!

September 10, 2018, 7:33 am

≫ Next: Codementor: How to build your own blockchain for a financial product

≪ Previous: Real Python: The Best Python Books

If you signed up since 28 August, you'll have Python 3.7 available on your account -- you can use it just like any other Python version.

If you signed up before then, it's a little more complicated, because adding Python 3.7 to your account requires changing your system image. Each account has an associated system image, which determines which Python versions, Python packages, operating system packages, and so on are available. The new image is called "earlgrey" (after the previous system images, "classic" and "dangermouse").

What this means is that if we change your system image, the pre-installed Python packages will all get upgraded, which means that any code you have that depends on them might stop working if it's not compatible with the new versions.

For previous upgrades, from classic to dangermouse, we said that all was OK if you were using a virtualenv, which (of course) removes the dependency on the pre-installed Python packages. However, this new image not only adds Python 3.7 and upgrades packages, it also upgrades the older Python versions -- for example, from the antediluvian 2.7.6 to 2.7.12, and from 3.6.0 to 3.6.6. This can break virtualenvs in some cases.

So, long story short -- we can switch your account over to the new system image, but you may need to rebuild your virtualenvs afterwards if you're using them -- and you may need to update your code to handle newer pre-installed Python packages if you're not using virtualenvs.

There are more details about exactly which package versions are included in which system image on the batteries included page. And if you'd like to switch your account over to earlgrey, just drop us a line using the "Send feedback" button. (If you've read all of the above, and understand that you may have to make code/virtualenv changes, mention that you have in the feedback message as otherwise we'll respond by basically repeating all of the stuff we just said, and asking "are you sure?")

↧

Codementor: How to build your own blockchain for a financial product

September 10, 2018, 7:35 am

≫ Next: Continuum Analytics Blog: Intake: Caching Data on First Read Makes Future Analysis Faster

≪ Previous: Python Anywhere: Python 3.7 now available!

Technologies are changing fast; people are not. – Jakob Nielsen Blockchain is a relatively new technology that many deem is used only for buying Bitcoins. They try to implement it in whatever...

↧

Continuum Analytics Blog: Intake: Caching Data on First Read Makes Future Analysis Faster

September 10, 2018, 9:37 am

≫ Next: RMOTR: Machine Learning power . Extracting dominant colors from images with clustering.

≪ Previous: Codementor: How to build your own blockchain for a financial product

By Mike McCarty Intake provides easy access data sources from remote/cloud storage. However, for large files, the cost of downloading files every time data is read can be extremely high. To overcome this obstacle, we have developed a “download once, read many times” caching strategy to store and manage data sources on the local file system. …
Read more →

The post Intake: Caching Data on First Read Makes Future Analysis Faster appeared first on Anaconda.

↧

RMOTR: Machine Learning power . Extracting dominant colors from images with clustering.

September 10, 2018, 10:56 am

≫ Next: Nikola: Nikola v8.0.0 is out!

≪ Previous: Continuum Analytics Blog: Intake: Caching Data on First Read Makes Future Analysis Faster

Check the step by step Jupyter Notebook for a full explanation of the process

A couple of days ago we were thinking about simple Machine Learning applications to present to our students. One of our main goals is to demonstrate how simple it can be to apply machine learning techniques to resolve every day problems and that you don’t need a PhD to use libraries like scikit-learn or Keras.

Our friend Matias (https://twitter.com/yosoymatias) devised a simple clustering app to extract colors from images: using K-Means, an unsupervised clustering method, we can identify “dominant” colors and create a simple color palette.

The result is exactly what we were looking for. A really simple and fun application that any Python programmer can understand: [LIVE DEMO]https://colors.rmotr.com/

https://medium.com/media/6bda238f4e07d016aea4799a2801a9cb/href

There’s a step by step explanation in this Jupyter Notebook.

We created the webapp (and the API) using Flask and deployed it with AWS Lambdas (using the amazing Zappa project).

And the source code is hosted here: https://github.com/rmotr/color-extractor-service

Are you interested in learning Data Science? Check our LIVE online course: http://rmotr.com/python-data-science-projects/

Machine Learning power 💪. Extracting dominant colors from images with clustering. was originally published in rmotr.com on Medium, where people are continuing the conversation by highlighting and responding to this story.

↧

Nikola: Nikola v8.0.0 is out!

September 10, 2018, 12:00 pm

≫ Next: "Menno's Musings": influx-spout 2.1

≪ Previous: RMOTR: Machine Learning power . Extracting dominant colors from images with clustering.

On behalf of the Nikola team, I am pleased to announce the immediate availability of Nikola v8.0.0. After 15 months in development, we’ve created our best release ever, with new features, many bugs squashed, and some improvements under the hood.

What is Nikola?

Nikola is a static site and blog generator, written in Python. It can use Mako and Jinja2 templates, and input in many popular markup formats, such as reStructuredText and Markdown — and can even turn Jupyter Notebooks into blog posts! It also supports image galleries, and is multilingual. Nikola is flexible, and page builds are extremely fast, courtesy of doit (which is rebuilding only what has been changed).

Find out more at the website: https://getnikola.com/

Downloads

Install using pip install Nikola==8.0.0.

If you want to upgrade to Nikola v8, make sure to read the Upgrading blog post.

Changes

Important compatibility changes

Rename crumbs.tmpl to ui_helper.tmpl and the breadcrumbs bar function to breadcrumbs (your templates may need changing as well)
Rename post.is_mathjax to post.has_math. Themes using post.is_mathjax must be updated; it is recommended that they are changed to use math_helper.tmpl.
Reading reST docinfo metadata, including first heading as title, requires USE_REST_DOCINFO_METADATA now (Issue #2987)
RSS feeds might have changed their places due to RSS_PATH behavior changes (you may need to change RSS_PATH, RSS_FILENAME_BASE)
Atom feeds for archives and Atom pagination are no longer supported (Issue #3016)
Sections are replaced by categories (Issue #2833)
You need <a class="reference"> (instead of image-reference) to activate the lightbox now
Date formatting now uses the Babel library, you might need to change BABEL_FORMAT (Issues #2606, 3121)
The first heading in a reST document is not removed anymore by default unless USE_REST_DOCINFO_METADATA is enabled (Issues #2382, #3124)

Features

Add Malayalam translation by Nemo Dicto
Add Vietnamese translation by Hoai-Thu Vuong
Don’t generate gallery index if the destination directory is site root and it would conflict with blog index (Issue #3133)
All built-in themes now support updated timestamp fields in posts. The update time, if it is specified and different from the posting time, will be displayed as "{postDate} (${messages("updated")} {updateDate})". If no update time is specified, the posting time will be displayed alone.
All built-in themes now support the DATE_FANCINESS option.
Theme bundles are now parsed using the configparser module and can support newlines inside entries as well as comments
Make bootstrap4 navbar color configurable with THEME_CONFIG['navbar_light'] (Issue #2863)
New data_file option for chart shortcode and directive (Issue #3129)
Show the filename of the missing file when nikola serve can't find a file (i.e. when an 404 error occurs).
Better error messages for JSON download failures in nikola plugin and nikola theme (Issue getnikola/plugins#282)
Use Babel instead of the locale module to better handle localizations (Issues #2606, #3121)
Change DATE_FORMAT formats to CLDR formats (Issue #2606)
Add NAVIGATION_ALT_LINKS option, displayed on the right side in bootstrap4/bootblog4 (Issue #3030)
Added documentation of Post objects to list of template variables (Issue #3003)
Support featured posts in bootblog4 (Issue #2964)
Add THEME_CONFIG setting that themes can use in any way
Use youtube-nocookie.com for better privacy in youtube reST directive and improve the appearance of the player
Support hackerthemes.com themes and renamed bootswatch_theme command subtheme (Issue #3049)
Add DISABLE_MAIN_ATOM_FEED setting (Issue #3016, Issue #3039)
Add ATOM_FILENAME_BASE setting (defaults to index for existing sites, but feed for new sites) (Issue #3016)
Add CATEGORY_DESTPATH_AS_DEFAULT, CATEGORY_DESTPATH_TRIM_PREFIX, CATEGORY_DESTPATH_FIRST_DIRECTORY_ONLY settings, as part of replacing sections with categories (Issue #2833)
Tags draft, private and mathjax are no longer treated special if USE_TAG_METADATA is set to False (default for new sites) (Issue #2761)
Replace draft and private tags with a status meta field (supports published, featured, draft, private) and mathjax with .. has_math: yes (Issue #2761)
Rename TAG_PAGES_TITLES→ TAG_TITLES, TAG_PAGES_DESCRIPTIONS→ TAG_DESCRIPTIONS.
Rename CATEGORY_PAGES_TITLES→ CATEGORY_TITLES, CATEGORY_PAGES_DESCRIPTIONS→ CATEGORY_DESCRIPTIONS.
Produce a better error message when a template referenced in another template is missing (Issue #3055)
Support captioned images and image ordering in galleries, as well as arbitrary metadata through a new metadata.yml file (Issue #3017, Issue #3050, Issue #2837)
New ATOM_PATH setting (Issue #2971)
Make crumbs available to all pages
Allowing to customize RSS and Atom feed extensions with RSS_EXTENSION, ATOM_EXTENSION settings (Issue #3041)
Allowing to customize filename base appended to RSS_PATH with RSS_FILENAME_BASE setting (Issue #3041)
Use basic ipynb template by default for slightly better appearance and behavior
Fixing behavior of RSS_PATH to do what the documentation says it does (Issue #3024)
Add support for fragments in path handlers (Issue #3032)
New METADATA_VALUE_MAPPING setting to allow for flexible global modification of metadata (Issue #3025)
New smartjoin template function/filter that joins lists and leaves strings as-is (Issue #3025)
Explain index.html conflicts better (Issue #3022)
Recognize both TEASER_END and (new) END_TEASER (Issue #3010) (warning: if you perform manual splits, the regex change means new indexes must be used)
New MARKDOWN_EXTENSION_CONFIGS setting (Issue #2970)
Replace flowr.js with justified-layout.js by Flickr (does not require jQuery!)
bootblog4 is the new default theme (Issue #2964)
New bootstrap4 and bootblog4 themes (Issue #2964)
New Thai translation by Narumol Hankrotha and Jean Jordaan
Support for Commento comment system (Issue #2773)
New PRESERVE_ICC_PROFILES option to control whether ICC profiles are preserved when copying images.
Use baguetteBox in Bootstrap theme (part of Issue #2777)
New default-config command to generate a clean configuration.
New thumbnail shortcode similar to the reStructuredText thumbnail directive (via Issue #2809)
Rewrite nikola auto with asyncio and aiohttp (Issue #2850)
New listings shortcode similar to the reStructuredText listings directive (Issue #2868)
Switch to reStructuredText’s new HTML 5 renderer (Issue #2874)
Deprecate html4css1.css in favor of rst_base.css (Issue #2874)
Add support for MetadataExtractor plugins that allow custom, extensible metadata extraction from posts (Issue #2830)
Support YAML and TOML metadata in 2-file posts (via Issue #2830)
Renamed UNSLUGIFY_TITLES→ FILE_METADATA_UNSLUGIFY_TITLES (Issue #2840)
Add NIKOLA_SHOW_TRACEBACKS environment variable that shows full tracebacks instead of one-line summaries
Use PRETTY_URLS by default on all sites (Issue #1838)
Feed link generation is completely refactored (Issue #2844)
Let path handlers return absolute URLs (Issue #2876)
Add BLOG_EMAIL to global context to make it available for templates (Issue #2968)

Bugfixes

Use UTF-8 instead of system encoding for gallery metadata.yml file
Do not remove first heading in document (reST document title) if USE_REST_DOCINFO_METADATA is disabled (Issue #3124)
Remove NO_DOCUTILS_TITLE_TRANSFORM setting, this is now default behavior if USE_REST_DOCINFO_METADATA is disabled (Issue #2382, #3124)
Enforce trailing slash for directories in nikola auto (Issue #3140)
Galleries with baguetteBox don’t glitch out on the first image anymore (Issue #3141)
Pass arguments to youtube directive unchanged (Issue #3150)
Fix listing installed themes if theme directory is missing.
Watch correct output folder in nikola auto (Issue #3119)
Fix post fragment dependencies when posts are only available in a non-default language (Issue #3112)
Implement MARKDOWN_EXTENSION_CONFIGS properly (Issue #2970)
Ignore .DS_Store when processing listings (Issue #3099)
Remove redundant check for tag similarity (Mentioned in Issue #3123)
Improve appearance of bootblog4 on mobile (Issue #3069)
Make smartjoin more flexible (Issue #3080)
Make post-list and post_list synonymous (Issue #3083)
Support CATEGORY_DESTPATH_NAMES with pages following destpath
Make CATEGORY_PAGES_FOLLOW_DESTPATH more resilient (Issue #3081)
Guard against null items in gallery meta files (Issues #3076, #3077)
Respect USE_FILENAME_AS_TITLE in galleries with a meta file
Fix gallery metadata for multilingual sites (Issue #3078)
Fixes behavior for posts not available in default language (Issues #2956 and #3073)
Always follow FEED_LENGTH for Atom feeds
Apply filters to all Atom feeds
Read file metadata if compiler metadata exists and prefer it over compiler metadata (Issue #3008)
Rename DISABLE_INDEXES_PLUGIN_INDEX_AND_ATOM_FEED to DISABLE_INDEXES and DISABLE_INDEXES_PLUGIN_RSS_FEED to DISABLE_MAIN_RSS_FEED (Issue #3039)
Make chart shortcode its own plugin and make the reST directive depend on it.
Put post_list shortcode in its own plugin and make the reST directive depend on it.
Don’t silence syntax errors and other exceptions that occur while reading metadata
Use documented dateutil API for time zone list (Issue #3006)
Handle trailing slash redirects with query strings correctly in nikola serve (Issue #3000)
Fix w3c validation errors for itemscope entries in default themes
Hide “Incomplete language” message for overrides of complete languages
Handle '/' and other absolute paths better in POSTS / PAGES / TRANSLATIONS (Issue #2982)
Fix loading non-default languages
Support KaTeX for reST display math (Issue #2888)
Use npm for asset management instead of bower, which was deprecated (Issue #2790)
Properly handle SHOW_INDEX_PAGE_NAVIGATION with Jinja templates (Issue #2960)
Prevent crashes due to Windows-specific code in auto running on all platforms (Issue #2940)
Don’t run hyphenate on <pre> blocks (Issue #2939)
Make errors in reST display in logs again
Unquote paths given to link:// magic URLs (Issue #2934)
Specify UTF-8 input encoding for Mako as default (Issue #2930)
Don't trigger rebuilds in auto mode for files it's safe to ignore (Issue #2906)
Fix padding for Jupyter code blocks (Issue #2927)
Apply SCHEDULE_ALL to posts only (Issue #2921)
Restore version number to Bootswatch URLs (Issue #2916)
Do not strip trailing slash in slug magic links
Ignore empty tags in HTML metadata reader (Issue #2890)
Do not remove doctype if add_header_permalinks or deduplicate_ids are used
Handle empty slug metadata (Issue #2887)
Fix crash when compiling empty .html posts (Issue #2851)
Make failures to get source commit hash non-fatal in github_deploy (Issue #2847)
Less cryptic error when guessing format from extension in new_post fails
Use Jupyter name more consistently in docs
Support CODE_COLOR_SCHEME in Jupyter notebooks (Issue #2093)
Language was not passed to title and link generation for page indexes
Addressed issue with snaps not allowing certain functions to work properly.

Removed conf.py settings

The following settings have been removed. Nikola will now always behave as if the value was what is displayed afer the setting name.

FEED_PREVIEWIMAGE = True
SITEMAP_INCLUDE_FILELESS_DIRS = True
USE_OPEN_GRAPH = True
USE_BASE_TAG = False

Removed features

Removed Colorbox, baguetteBox is used instead (Issue #2777)
Removed googleplus comments (no longer supported) (Issue #635)
Removed the slides directive for docutils, it will now be a separate plugin.
Dropped Python 2 and Python 3.3 support (oldest supported version is 3.4)
Removed nikola install_theme— use nikola theme instead
Droppped insecure post “encryption” feature
Stopped supporting all deprecated config options
Dropped annotations support (annotateit.org closed down in March 2017)
Removed taxonomy option also_create_classifications_from_other_languages (Issue #2785) and generate_atom_feeds_for_post_lists (Issue #3016)
Removed old 7-line metadata format (Issue #2839)
Atom feeds are now limited to one page (Issue #3016)
Removed sections (replaced by improved categories) (Issue #2833)
Moved tag_cloud_data.json generation to a separate tagcloud plugin (Issue #1696)
The webassets library is no longer required, we now manually bundle files (Issue #3074)

↧

"Menno's Musings": influx-spout 2.1

September 10, 2018, 9:24 pm

≫ Next: Mike Driscoll: Python 101: Episode #24 – Debugging with pdb

≪ Previous: Nikola: Nikola v8.0.0 is out!

influx-spout 2.1 has just been released and it includes a bunch of exciting new features. Here's the highlights...

Mike Driscoll: Python 101: Episode #24 – Debugging with pdb

September 10, 2018, 10:05 pm

≫ Next: Mike Driscoll: wxPython 101: Creating a Splash Screen

≪ Previous: "Menno's Musings": influx-spout 2.1

Learn the basics of using Python’s built-in debugger, pdb. Note that this screencast was recorded before Python 3.6 and 3.7 so it does not cover some of the new enhancements in the debugger.

You can read the chapter this screencast is based on here: http://python101.pythonlibrary.org/chapter24_debugging.html

↧

Mike Driscoll: wxPython 101: Creating a Splash Screen

September 10, 2018, 10:05 pm

≫ Next: Semaphore Community: Dockerizing a Python Django Web Application

≪ Previous: Mike Driscoll: Python 101: Episode #24 – Debugging with pdb

A common UI element that you used to see a lot of was the Splash Screen. A splash screen is just a dialog with a logo or art on it that sometimes includes a message about how far along the application has loaded. Some developers use splash screens as a way to tell the user that the application is loading so they don’t try to open it multiple times.

wxPython has support for creating splash screens. In versions of wxPython prior to version 4, you could find the splash screen widget in wx.SplashScreen. However in wxPython’s latest version, it has been moved to wx.adv.SplashScreen.

Let’s look at a simple example of the Splash Screen:

import wx
import wx.adv 
class MyFrame(wx.Frame):
 
    def__init__(self):
        wx.Frame.__init__(self, None, wx.ID_ANY, "Tutorial", size=(500,500)) 
        bitmap = wx.Bitmap('py_logo.png')
        splash = wx.adv.SplashScreen(
                     bitmap, 
                     wx.adv.SPLASH_CENTER_ON_SCREEN|wx.adv.SPLASH_TIMEOUT, 
                     5000, self)
        splash.Show() 
        self.Show() 
 
# Run the programif __name__ == "__main__":
    app = wx.App(False)
    frame = MyFrame()
    app.MainLoop()

Here we create a subclass of wx.Frame and we load up an image using wx.Bitmap. You will note that wx.Bitmap does not actually require you to only load bitmaps as I am using a PNG here. Anyway, the next line instantiates our splash screen instance. Here we pass it the bitmap we want to show, a flag to tell it how to position itself, a timeout in milliseconds for how long the splash screen should show itself and what its parent should be. These are all required arguments.

There are also three additional arguments that the splash screen widget can accept: pos, size and style. You will note that in this example we tell the splash screen to center itself onscreen. We could also tell it to center on its parent via SPLASH_CENTRE_ON_PARENT.

You will, of course, need to modify this example to use an image of your own.

Wrapping Up

The splash screen is actually pretty useful if you have an application that takes a long time to load. You can easily use it to distract the user and give the illusion that your application is still responsive even when it hasn’t fully loaded yet. Give it a try and see what you think.

Semaphore Community: Dockerizing a Python Django Web Application

September 11, 2018, 12:19 am

≫ Next: Wallaroo Labs: Converting a Batch Job to Real-time

≪ Previous: Mike Driscoll: wxPython 101: Creating a Splash Screen

This article is brought with ❤ to you by Semaphore.

Introduction

This article will cover building a simple 'Hello World'-style web application written in Django and running it in the much talked about and discussed Docker. Docker takes all the great aspects of a traditional virtual machine, e.g. a self contained system isolated from your development machine, and removes many of the drawbacks such as system resource drain, setup time, and maintenance.

When building web applications, you have probably reached a point where you want to run your application in a fashion that is closer to your production environment. Docker allows you to set up your application runtime in such a way that it runs in exactly the same manner as it will in production, on the same operating system, with the same environment variables, and any other configuration and setup you require.

By the end of the article you'll be able to:

Understand what Docker is and how it is used,
Build a simple Python Django application, and
Create a simple Dockerfile to build a container running a Django web application server.

What is Docker, Anyway?

Docker's homepage describes Docker as follows:

"Docker is an open platform for building, shipping and running distributed applications. It gives programmers, development teams, and operations engineers the common toolbox they need to take advantage of the distributed and networked nature of modern applications."

Put simply, Docker gives you the ability to run your applications within a controlled environment, known as a container, built according to the instructions you define. A container leverages your machines resources much like a traditional virtual machine (VM). However, containers differ greatly from traditional virtual machines in terms of system resources. Traditional virtual machines operate using Hypervisors, which manage the virtualization of the underlying hardware to the VM. This means they are large in terms of system requirements.

Containers operate on a shared Linux operating system base and add simple instructions on top to execute and run your application or process. The difference being that Docker doesn't require the often time-consuming process of installing an entire OS to a virtual machine such as VirtualBox or VMWare. Once Docker is installed, you create a container with a few commands and then execute your applications on it via the Dockerfile. Docker manages the majority of the operating system virtualization for you, so you can get on with writing applications and shipping them as you require in the container you have built. Furthermore, Dockerfiles can be shared for others to build containers and extend the instructions within them by basing their container image on top of an existing one. The containers are also highly portable and will run in the same manner regardless of the host OS they are executed on. Portability is a massive plus side of Docker.

Prerequisites

Before you begin this tutorial, ensure the following is installed to your system:

Python 2.7 or 3.x,
Docker (Mac users: it's recommended to use docker-machine, available via Homebrew-Cask), and
A git repository to store your project and track changes.

Setting Up a Django web application

Starting a Django application is easy, as the Django dependency provides you with a command line tool for starting a project and generating some of the files and directory structure for you. To start, create a new folder that will house the Django application and move into that directory.

$ mkdir project
$ cd project

Once in this folder, you need to add the standard Python project dependencies file which is usually named requirements.txt, and add the Django and Gunicorn dependency to it. Gunicorn is a production standard web server, which will be used later in the article. Once you have created and added the dependencies, the file should look like this:

$ cat requirements.txt
Django==1.9.4
gunicorn==19.6.0

With the Django dependency added, you can then install Django using the following command:

$pipinstall-rrequirements.txt

Once installed, you will find that you now have access to the django-admin command line tool, which you can use to generate the project files and directory structure needed for the simple "Hello, World!" application.

$ django-admin startproject helloworld

Let's take a look at the project structure the tool has just created for you:

.
├── helloworld
│   ├── helloworld
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   └── manage.py
└── requirements.txt

You can read more about the structure of Django on the official website. django-admin tool has created a skeleton application. You control the application for development purposes using the manage.py file, which allows you to start the development test web server for example:

$ cd helloworld
$ python manage.py runserver

The other key file of note is the urls.py, which specifies what URL's route to which view. Right now, you will only have the default admin URL which we won't be using in this tutorial. Lets add a URL that will route to a view returning the classic phrase "Hello, World!".

First, create a new file called views.py in the same directory as urls.py with the following content:

fromdjango.httpimportHttpResponsedefindex(request):returnHttpResponse("Hello, world!")

Now, add the following URL url(r'', 'helloworld.views.index') to the urls.py, which will route the base URL of / to our new view. The contents of the urls.py file should now look as follows:

fromdjango.conf.urlsimporturlfromdjango.contribimportadminurlpatterns=[url(r'^admin/',admin.site.urls),url(r'','helloworld.views.index'),]

Now, when you execute the python manage.py runserver command and visit http://localhost:8000 in your browser, you should see the newly added "Hello, World!" view.

The final part of our project setup is making use of the Gunicorn web server. This web server is robust and built to handle production levels of traffic, whereas the included development server of Django is more for testing purposes on your local machine only. Once you have dockerized the application, you will want to start up the server using Gunicorn. This is much simpler if you write a small startup script for Docker to execute. With that in mind, let's add a start.sh bash script to the root of the project, that will start our application using Gunicorn.

#!/bin/bash# Start Gunicorn processesechoStartingGunicorn.execgunicornhelloworld.wsgi:application \
    --bind0.0.0.0:8000 \
    --workers3

The first part of the script writes "Starting Gunicorn" to the command line to show us that it is starting execution. The next part of the script actually launches Gunicorn. You use exec here so that the execution of the command takes over the shell script, meaning that when the Gunicorn process ends so will the script, which is what we want here.

You then pass the gunicorn command with the first argument of helloworld.wsgi:application. This is a reference to the wsgi file Django generated for us and is a Web Server Gateway Interface file which is the Python standard for web applications and servers. Without delving too much into WSGI, the file simply defines the application variable, and Gunicorn knows how to interact with the object to start the web server.

You then pass two flags to the command, bind to attach the running server to port 8000, which you will use to communicate with the running web server via HTTP. Finally, you specify workers which are the number of threads that will handle the requests coming into your application. Gunicorn recommends this value to be set at (2 x $num_cores) + 1. You can read more on configuration of Gunicorn in their documentation.

Finally, make the script executable, and then test if it works by changing directory into the project folder helloworld and executing the script as shown here. If everything is working fine, you should see similar output to the one below, be able to visit http://localhost:8000 in your browser, and get the "Hello, World!" response.

$ chmod +x start.sh
$ cd helloworld
$ ../start.sh
Starting Gunicorn.
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Listening at: http://0.0.0.0:8000 (82248)
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Using worker: sync
[2016-06-26 19:43:28 +0100] [82251] [INFO]
Booting worker with pid: 82251
[2016-06-26 19:43:28 +0100] [82252] [INFO]
Booting worker with pid: 82252
[2016-06-26 19:43:29 +0100] [82253] [INFO]
Booting worker with pid: 82253

Dockerizing the Application

You now have a simple web application that is ready to be deployed. So far, you have been using the built-in development web server that Django ships with the web framework it provides. It's time to set up the project to run the application in Docker using a more robust web server that is built to handle production levels of traffic.

Installing Docker

One of the key goals of Docker is portability, and as such is able to be installed on a wide variety of operating systems.

For this tutorial, you will look at installing Docker Machine on MacOS. The simplest way to achieve this is via the Homebrew package manager. Instal Homebrew and run the following:

$ brew update && brew upgrade --all && brew cleanup && brew prune
$ brew install docker-machine

With Docker Machine installed, you can use it to create some virtual machines and run Docker clients. You can run docker-machine from your command line to see what options you have available. You'll notice that the general idea of docker-machine is to give you tools to create and manage Docker clients. This means you can easily spin up a virtual machine and use that to run whatever Docker containers you want or need on it.

You will now create a virtual machine based on VirtualBox that will be used to execute your Dockerfile, which you will create shortly. The machine you create here should try to mimic the machine you intend to run your application on in production. This way, you should not see any differences or quirks in your running application neither locally nor in a deployed environment.

Create your Docker Machine using the following command:

$ docker-machine create development --driver virtualbox
--virtualbox-disk-size "5000" --virtualbox-cpu-count 2
--virtualbox-memory "4096"

This will create your machine and output useful information on completion. The machine will be created with 5GB hard disk, 2 CPU's and 4GB of RAM.

To complete the setup, you need to add some environment variables to your terminal session to allow the Docker command to connect the machine you have just created. Handily, docker-machine provides a simple way to generate the environment variables and add them to your session:

$ docker-machine env development
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://123.456.78.910:1112"
export DOCKER_CERT_PATH="/Users/me/.docker/machine/machines/development"
export DOCKER_MACHINE_NAME="development"
# Run this command to configure your shell:
# eval "$(docker-machine env development)"

Complete the setup by executing the command at the end of the output:

$(docker-machine env development)

Execute the following command to ensure everything is working as expected.

$ docker images
REPOSITORY   TAG   IMAGE  ID   CREATED   SIZE

You can now dockerize your Python application and get it running using the docker-machine.

Writing the Dockerfile

The next stage is to add a Dockerfile to your project. This will allow Docker to build the image it will execute on the Docker Machine you just created. Writing a Dockerfile is rather straightforward and has many elements that can be reused and/or found on the web. Docker provides a lot of the functions that you will require to build your image. If you need to do something more custom on your project, Dockerfiles are flexible enough for you to do so.

The structure of a Dockerfile can be considered a series of instructions on how to build your container/image. For example, the vast majority of Dockerfiles will begin by referencing a base image provided by Docker. Typically, this will be a plain vanilla image of the latest Ubuntu release or other Linux OS of choice. From there, you can set up directory structures, environment variables, download dependencies, and many other standard system tasks before finally executing the process which will run your web application.

Start the Dockerfile by creating an empty file named Dockerfile in the root of your project. Then, add the first line to the Dockerfile that instructs which base image to build upon. You can create your own base image and use that for your containers, which can be beneficial in a department with many teams wanting to deploy their applications in the same way.

# Dockerfile

# FROM directive instructing base image to build upon
FROM python:2-onbuild

It's worth noting that we are using a base image that has been created specifically to handle Python 2.X applications and a set of instructions that will run automatically before the rest of your Dockerfile. This base image will copy your project to /usr/src/app, copy your requirements.txt and execute pip install against it. With these tasks taken care of for you, your Dockerfile can then prepare to actually run your application.

Next, you can copy the start.sh script written earlier to a path that will be available to you in the container to be executed later in the Dockerfile to start your server.

# COPY startup script into known file location in container
COPY start.sh /start.sh

Your server will run on port 8000. Therefore, your container must be set up to allow access to this port so that you can communicate to your running server over HTTP. To do this, use the EXPOSE directive to make the port available:

# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000

The final part of your Dockerfile is to execute the start script added earlier, which will leave your web server running on port 8000 waiting to take requests over HTTP. You can execute this script using the CMD directive.

# CMD specifcies the command to execute to start the server running.
CMD ["/start.sh"]
# done!

With all this in place, your final Dockerfile should look something like this:

# Dockerfile

# FROM directive instructing base image to build upon
FROM python:2-onbuild

# COPY startup script into known file location in container
COPY start.sh /start.sh

# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000

# CMD specifcies the command to execute to start the server running.
CMD ["/start.sh"]
# done!

You are now ready to build the container image, and then run it to see it all working together.

Building and Running the Container

Building the container is very straight forward once you have Docker and Docker Machine on your system. The following command will look for your Dockerfile and download all the necessary layers required to get your container image running. Afterwards, it will run the instructions in the Dockerfile and leave you with a container that is ready to start.

To build your container, you will use the docker build command and provide a tag or a name for the container, so you can reference it later when you want to run it. The final part of the command tells Docker which directory to build from.

$ cd <project root directory>
$ docker build -t davidsale/dockerizing-python-django-app .

Sending build context to Docker daemon 237.6 kB
Step 1 : FROM python:2-onbuild
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
 ---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Using cache
Step 1 : COPY . /usr/src/app
 ---> 68be8680cbc4
Removing intermediate container 75ed646abcb6
Step 2 : COPY start.sh /start.sh
 ---> 9ef8e82c8897
Removing intermediate container fa73f966fcad
Step 3 : EXPOSE 8000
 ---> Running in 14c752364595
 ---> 967396108654
Removing intermediate container 14c752364595
Step 4 : WORKDIR helloworld
 ---> Running in 09aabb677b40
 ---> 5d714ceea5af
Removing intermediate container 09aabb677b40
Step 5 : CMD /start.sh
 ---> Running in 7f73e5127cbe
 ---> 420a16e0260f
Removing intermediate container 7f73e5127cbe
Successfully built 420a16e0260f

In the output, you can see Docker processing each one of your commands before outputting that the build of the container is complete. It will give you a unique ID for the container, which can also be used in commands alongside the tag.

The final step is to run the container you have just built using Docker:

$ docker run -it -p 8000:8000 davidsale/djangoapp1
Starting Gunicorn.
[2016-06-26 19:24:11 +0000] [1] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:24:11 +0000] [1] [INFO]
Listening at: http://0.0.0.0:9077 (1)
[2016-06-26 19:24:11 +0000] [1] [INFO]
Using worker: sync
[2016-06-26 19:24:11 +0000] [11] [INFO]
Booting worker with pid: 11
[2016-06-26 19:24:11 +0000] [12] [INFO]
Booting worker with pid: 12
[2016-06-26 19:24:11 +0000] [17] [INFO]
Booting worker with pid: 17

The command tells Docker to run the container and forward the exposed port 8000 to port 8000 on your local machine. After you run this command, you should be able to visit http://localhost:8000 in your browser to see the "Hello, World!" response. If you were running on a Linux machine, that would be the case. However, if running on MacOS, then you will need to forward the ports from VirtualBox, which is the driver we use in this tutorial so that they are accessible on your host machine.

$ VBoxManage controlvm "development" natpf1
  "tcp-port8000,tcp,,8000,,8000";

This command modifies the configuration of the virtual machine created using docker-machine earlier to forward port 8000 to your host machine. You can run this command multiple times changing the values for any other ports you require.

Once you have done this, visit http://localhost:8000 in your browser. You should be able to visit your dockerized Python Django application running on a Gunicorn web server, ready to take thousands of requests a second and ready to be deployed on virtually any OS on planet using Docker.

Next Steps

After manually verifying that the application is behaving as expected in Docker, the next step is the deployment. You can use Semaphore's Docker platform for automating this process.

Continuous Integration and Deployment for Docker projects on Semaphore

As a first step you need to create a free Semaphore account. Then, connect your Docker project repository to your new account. Semaphore will recognize that you're using Docker, and will automatically recommend the Docker platform for it.

The last step is to specify commands to build and run your Docker images:

docker build <your-project> .
docker run <your-project>

Semaphore will execute these commands on every git push.

Semaphore also makes it easy to push your images to various Docker container registries. To learn more about getting the most out of Docker on Semaphore, check out our Docker documentation pages.

Conclusion

In this tutorial, you have learned how to build a simple Python Django web application, wrap it in a production grade web server, and created a Docker container to execute your web server process.

If you enjoyed working through this article, feel free to share it and if you have any questions or comments leave them in the section below. We will do our best to answer them, or point you in the right direction.

This article is brought with ❤ to you by Semaphore.

↧

Wallaroo Labs: Converting a Batch Job to Real-time

September 10, 2018, 5:00 pm

≫ Next: NumFOCUS: Conda-forge joins NumFOCUS Sponsored Projects

≪ Previous: Semaphore Community: Dockerizing a Python Django Web Application

Introduction Often called stream processing, real-time processing allows applications to run computations and filter data at any scale. At Wallaroo Labs, we build and offer support for an event-based stream processing framework called Wallaroo. Frameworks like Wallaroo, allow you to do highly parallel computation across clusters of workers without having to worry about any additional complexity. One of the things we hear from developers who aren’t familiar with stream processing is that they aren’t sure about the use cases.

↧

NumFOCUS: Conda-forge joins NumFOCUS Sponsored Projects

September 11, 2018, 12:39 pm

≫ Next: Vasudev Ram: How many ways can you substring a string? Part 1

≪ Previous: Wallaroo Labs: Converting a Batch Job to Real-time

The post Conda-forge joins NumFOCUS Sponsored Projects appeared first on NumFOCUS.

↧

Vasudev Ram: How many ways can you substring a string? Part 1

September 11, 2018, 12:47 pm

≫ Next: Codementor: Load Testing a Django Application using LocustIO

≪ Previous: NumFOCUS: Conda-forge joins NumFOCUS Sponsored Projects

By Vasudev Ram

String image attribution

Recently, something I read made me think of writing a simple program to generate all substrings of a given string.
(To be precise, excluding the null string.)

Here is an initial version I came up with, all_substrings.py:

"""
all_substrings.py
Function and program to find all the substrings of a given string.
Author: Vasudev Ram
Copyright 2018 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Twitter: https://mobile.twitter.com/vasudevram
Product Store: https://gumroad.com/vasudevram
"""

from __future__ import print_function
import sys
from error_exit import error_exit
from debug1 import debug1

def usage():
    message_lines = [\
"Usage: python {} a_string".format(sa[0]),
"Print all substrings of a_string.",
    ]
    sys.stderr.write("\n".join(message_lines))

def all_substrings(s):
"""
    Generator function that yields all the substrings of a given string.
"""

    ls = len(s)
    if ls == 0:
        usage()
        error_exit("\nError: String argument must be non-empty.")

    start = 0
    while start < ls:
        end = start + 1
        while end <= ls:
            debug1("s[{}:{}] = {}".format(start, end, s[start:end]))
            yield s[start:end]
            end += 1
        start += 1

def main():
    if lsa != 2:
        usage()
        error_exit("\nError: Exactly one argument must be given.")

    for substring in all_substrings(sa[1]):
        print(substring)

sa = sys.argv
lsa = len(sa)

if __name__ == "__main__":
    main()

Some runs and output of the program:

With no command-line arguments:

$ python all_substrings.py
Usage: python all_substrings.py a_string
Print all substrings of a_string.
Error: Exactly one argument must be given.

With one command-line argument, an empty string:

$ python all_substrings.py ""
Usage: python all_substrings.py a_string
Print all substrings of a_string.
Error: String argument must be non-empty.

Now with a 3-character string, with debugging enabled, via the use of my debug1 debugging function [1] (and Python's __debug__ built-in variable, which is set to True by default):

$ python all_substrings.py abc
s[0:1] = a
a
s[0:2] = ab
ab
s[0:3] = abc
abc
s[1:2] = b
b
s[1:3] = bc
bc
s[2:3] = c
c

[1] You can read about and get the code for that debugging function here:

Improved simple Python debugging function

The remaining runs are with debugging turned off via Python's -O flag:

With a 4-character string:

$ python -O all_substrings.py abcd
a
ab
abc
abcd
b
bc
bcd
c
cd
d

With a 4-character string, not all characters unique:

$ python -O all_substrings.py FEED
F
FE
FEE
FEED
E
EE
EED
E
ED
D

Note that when there are duplicated characters in the input, we can get duplicate substrings in the output; in this case, E appears twice.

With a string of length 6, again with some characters repeated (E and D):

$ python -O all_substrings.py FEEDED
F
FE
FEE
FEED
FEEDE
FEEDED
E
EE
EED
EEDE
EEDED
E
ED
EDE
EDED
D
DE
DED
E
ED
D

Again, we get duplicate substrings in the output.

With a 6-character string, no duplicate characters:

$ python -O all_substrings.py 123456
1
12
123
1234
12345
123456
2
23
234
2345
23456
3
34
345
3456
4
45
456
5
56
6

Is there any other way of doing it?
Any interesting enhancements possible?

Yes to both questions.
I will cover some of those points in a follow-up post.

Actually, I already did one thing in the current version, which is of interest: I used a generator to yield the substrings lazily, instead of creating them all upfront, and then returning them all in a list. I'll show and discuss a few pros and cons of some other approaches later.

Meanwhile, want to have a bit of fun with visual effects?

Try some variations of runs of the program like these:

python -O all_substrings.py +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

python -O all_substrings.py /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

python -O all_substrings.py "%% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $$ @@ && %% $"

$ python -O all_substrings.py 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010

python -O all_substrings.py ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"

You can change the characters used in the string argument to any combination of any punctuation characters, or even letters or digits - anything you like. You can also vary the number of characters used in the string. Longer ones (though not too long) tend to give better visual effects and the display also lasts for longer. Note that all the characters of the string you use, should come on the same single line, the same line as the python command you use. Also, if using a pipe character (|) (or any other characters that are special to your OS shell), enclose the whole string in quotes as I have done in an example above. I ran this on Windows and so used double quotes for such cases. Single quotes give errors. On Unix-like systems, either may work, but some characters may get interpreted inside double quotes. Experiment :)

You can also add an import time statement in the imports section of the program, and then use a time.sleep(number) inside the for loop, say, just above the print(substring) statement. I used values like:

time.sleep(0.002)

which works well for my display. You can tweak that number for your hardware.

- Have fun.

Did you know that there are a large number of meanings and contexts for the word string? Here are some of them:

String (Wikipedia).

This Wikipedia article about strings in computer science is interesting, and has a lot more points than one might imagine at first:

(computer) strings

- Vasudev Ram - Online Python training and consulting

Hit the ground running with my vi quickstart tutorial, vetted by two Windows system administrator friends.Jump to posts: Python * DLang * xtopdf Interested in a Python, SQL or Linux course?Get WP Engine, powerful managed WordPress hosting.Subscribe to my blog (jugad2.blogspot.com) by email My ActiveState Code recipes

Follow me on:Gumroad * LinkedIn * TwitterDo you create online products? Get Convertkit:Email marketing for digital product creators

↧

Codementor: Load Testing a Django Application using LocustIO

September 12, 2018, 12:15 am

≫ Next: Kay Hayen: Nuitka this week #6

≪ Previous: Vasudev Ram: How many ways can you substring a string? Part 1

Django framework, used for buliding web applications quickly in a clean and efficient manner. As the size of application increases, a common issue faced by all teams is performance of the...

↧

Kay Hayen: Nuitka this week #6

September 12, 2018, 3:05 am

≫ Next: PyCharm: PyCharm 2018.3 EAP 2

≪ Previous: Codementor: Load Testing a Django Application using LocustIO

Contents

Holiday

In my 2 weeks holiday, I indeed focused on a really big thing, and got more done that I had hoped for. For C types, nuitka_bool, which is a tri-state boolean with true, false and unassigned, can be used for some variables, and executes some operations without going through objects anymore.

bool

Condition codes are no longer special. They all need a boolean value from the expression used as a condition, and there was a special paths for some popular expressions for conditions, but of course not all. That is now a universal thing, conditional statement/expressions will now simply ask to provide a temp variable of value nuitka_bool and then code generation handles it.

For where it is used, code gets a lot lighter, and of course faster, although I didn't measure it yet. Going to Py_True/Py_False and comparing with it, wasn't that optimal, and it's nice this is now so much cleaner as a side effect of that C bool work.

This seems to be so good, that actually it's the default for this to be used in 0.6.0, and that itself is a major break through. Not so much for actual performance, but for structure. Other C types are going to follow soon and will give massive performance gains.

void

And what was really good, is that not only did I get bool to work almost perfectly, I also started work on the void C target type and finished that after my return from holiday last weekend, which lead to new optimization that I am putting in the 0.5.33 release that is coming soon, even before the void code generation is out.

The void C type cannot read values back, and unused values should not be used, so this gives errors for cases where that becomes obvious.

aorb

Consider this expression. The or expression, that one is going to producing a value, which is then released, but not used otherwise. New optimzation creates a conditional statement out of it, which takes a as the condition and if not true, then evaluates b but ignores it.

ifnota:b

The void evaluation of b can then do further optimization for it.

Void code generation can therefore highlight missed opportunities for this kid of optimization, and found a couple of these. That is why I was going for it, and I feel it pays off. Code generation checking optimization here, is a really nice synergy between the two.

Plus I got all the tests to work with it, and solved the missing optimizations it found very easily. And instead of allocating an object now, not assigning is often creating more obvious code. And that too allowed me to find a couple of bugs by C compiler warnings.

Obviously I will want to run a compile all the world test before making it the default, which is why this will probably become part of 0.6.1 to be the default.

module_var

Previously variable codes were making a hard distinction for module variables and make them use their own helper codes. Now this is encapsulated in a normal C type class like nuitka_bool, or the one for PyObject * variables, and integrates smoothly, and even got better. A sign things are going smooth.

Goto Generators

Still not released. I delayed it after my holiday, and due to the heap generator change, after stabilizing the C types work, I want to first finish a tests/library/compile_python_module.py resume run, which will for a Anaconda3 compile all the code found in there.

Right now it's still doing that, and even found a few bugs. The heap storage can still cause issues, as can changes to cloning nodes, which happens for try nodes and their finally blocks.

This should finish these days. I looked at performance numbers and found that develop is indeed only faster, and factory due to even more optimization will be yet faster, and often noteworthy.

Benchmarks

The Speedcenter of Nuitka is what I use right now, but it's only showing the state of 3 branches and compared to CPython, not as much historical information. Also the organization of tests is poor. At least there is tags for what improved.

After release of Nuitka 0.6.0 I will show more numbers, and I will start to focus on making it easier to understand. Therefore no link right now, google if you are so keen. ;-)

Twitter

During the holiday sprint, and even after, I am going to Tweet a lot about what is going on for Nuitka. So follow me on twitter if you like, I will post important stuff as it happens there:

Follow @kayhayen

And lets not forget, having followers make me happy. So do re-tweets.

Poll on Executable Names

So I put e.g. poll up on Twitter, which is now over. But it made me implement a new scheme, due to popular consensus

Hotfixes

Even more hotfixes. I even did 2 during my holiday, however packages built only later.

Threaded imports on 3.4 or higher of modules were not using the locking they should use. Multiprocessing on Windows with Python3 had even more problems, and the --include-package and --include-module were present, but not working.

That last one was actually very strange. I had added a new option group for them, but not added it to the parser. Result: Option works. Just does not show up in help output. Really?

Help Wanted

If you are interested, I am tagging issues help wanted and there is a bunch, and very like one you can help with.

Nuitka definitely needs more people to work on it.

Plans

Working down the release backlog. Things should be out. I am already working on what should become 0.6.1, but it's not yet 0.5.33 released. Not a big deal, but 0.6.0 has 2 really important fixes for performance regressions that have happened in the past. One is for loops, making that faster is probably like the most important one. The other for constant indexing, probably also very important. Very much measurable in pystone at least.

In the mean time, I am preparing to get int working as a target C type, so e.g. comparisons of such values could be done in pure C, or relatively pure C.

Also, I noticed that e.g. in-place operations can be way more optimized and did stuff for 0.6.1 already in this domain. That is unrelated to C type work, but kind of follows a similar route maybe. How to compare mixed types we know of, or one type only. That kind of things needs ideas and experiments.

Having int supported should help getting some functions to C speeds, or at least much closer to it. That will make noticable effects in many of the benchmarks. More C types will then follow one by one.

Donations

If you want to help, but cannot spend the time, please consider to donate to Nuitka, and go here:

Donate to Nuitka

↧

PyCharm: PyCharm 2018.3 EAP 2

September 12, 2018, 3:53 am

≫ Next: Real Python: Logging in Python

≪ Previous: Kay Hayen: Nuitka this week #6

The second early access preview (EAP) version of PyCharm 2018.3 is available for download on our website now.

New in This Version

A couple of git authentication issues have been smoothed out: previously, the IDE would not ask for a new password after the server’s password had been changed, this is now resolved. The same applied for incorrect saved key passphrases
In some cases, PyCharm would freeze when using the ‘Local History’ feature.
Various DB server specific SQL syntax is now supported: USING NEW INTO for PostgreSQL, and using floats without a leading zero when using SAMPLE in Oracle.
And more, check out the release notes here

Interested?

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.

If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP, and stay up to date. You can find the installation instructions on our website.

PyCharm 2018.3 is in development during the EAP phase, therefore not all new features are already available. More features will be added in the coming weeks. As PyCharm 2018.3 is pre-release software, it is not as stable as the release versions. Furthermore, we may decide to change and/or drop certain features as the EAP progresses.

All EAP versions will ship with a built-in EAP license, which means that these versions are free to use for 30 days after the day that they are built. As EAPs are released weekly, you’ll be able to use PyCharm Professional Edition EAP for free for the duration of the EAP program, as long as you upgrade at least once every 30 days.

↧

Real Python: Logging in Python

September 12, 2018, 7:00 am

≫ Next: Made With Mu: Mu and PyGameZero Gamepad Demo

≪ Previous: PyCharm: PyCharm 2018.3 EAP 2

Logging is a very useful tool in a programmer’s toolbox. It can help you develop a better understanding of the flow of a program and discover scenarios that you might not even have thought of while developing.

Logs provide developers with an extra set of eyes that are constantly looking at the flow that an application is going through. They can store information, like which user or IP accessed the application. If an error occurs, then they can provide more insights than a stack trace by telling you what the state of the program was before it arrived at the line of code where the error occurred.

By logging useful data from the right places, you can not only debug errors easily but also use the data to analyze the performance of the application to plan for scaling or look at usage patterns to plan for marketing.

Python provides a logging system as a part of its standard library, so you can quickly add logging to your application. In this article, you will learn why using this module is the best way to add logging to your application as well as how to get started quickly, and you will get an introduction to some of the advanced features available.

Free Bonus:5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

The Logging Module

The logging module in Python is a ready-to-use and powerful module that is designed to meet the needs of beginners as well as enterprise teams. It is used by most of the third-party Python libraries, so you can integrate your log messages with the ones from those libraries to produce a homogeneous log for your application.

Adding logging to your Python program is as easy as this:

importlogging

With the logging module imported, you can use something called a “logger” to log messages that you want to see. By default, there are 5 standard levels indicating the severity of events. Each has a corresponding method that can be used to log events at that level of severity. The defined levels, in order of increasing severity, are the following:

DEBUG
INFO
WARNING
ERROR
CRITICAL

The logging module provides you with a default logger that allows you to get started without needing to do much configuration. The corresponding methods for each level can be called as shown in the following example:

importlogginglogging.debug('This is a debug message')logging.info('This is an info message')logging.warning('This is a warning message')logging.error('This is an error message')logging.critical('This is a critical message')

The output of the above program would look like this:

WARNING:root:This is a warning messageERROR:root:This is an error messageCRITICAL:root:This is a critical message

The output shows the severity level before each message along with root, which is the name the logging module gives to its default logger. (Loggers are discussed in detail in later sections.) This format, which shows the level, name, and message separated by a colon (:), is the default output format that can be configured to include things like timestamp, line number, and other details.

Notice that the debug() and info() messages didn’t get logged. This is because, by default, the logging module logs the messages with a severity level of WARNING or above. You can change that by configuring the logging module to log events of all levels if you want. You can also define your own severity levels by changing configurations, but it is generally not recommended as it can cause confusion with logs of some third-party libraries that you might be using.

Basic Configurations

You can use the basicConfig(**kwargs) method to configure the logging:

“You will notice that the logging module breaks PEP8 styleguide and uses camelCase naming conventions. This is because it was adopted from Log4j, a logging utility in Java. It is a known issue in the package but by the time it was decided to add it to the standard library, it had already been adopted by users and changing it to meet PEP8 requirements would cause backwards compatibility issues.” (Source)

Some of the commonly used parameters for basicConfig() are the following:

level: The root logger will be set to the specified severity level.
filename: This specifies the file.
filemode: If filename is given, the file is opened in this mode. The default is a, which means append.
format: This is the format of the log message.

By using the level parameter, you can set what level of log messages you want to record. This can be done by passing one of the constants available in the class, and this would enable all logging calls at or above that level to be logged. Here’s an example:

importlogginglogging.basicConfig(level=logging.DEBUG)logging.debug('This will get logged')

DEBUG:root:This will get logged

All events at or above DEBUG level will now get logged.

Similarly, for logging to a file rather than the console, filename and filemode can be used, and you can decide the format of the message using format. The following example shows the usage of all three:

importlogginglogging.basicConfig(filename='app.log',filemode='w',format='%(name)s - %(levelname)s - %(message)s')logging.warning('This will get logged to a file')

root - ERROR - This will get logged to a file

The message will look like this but will be written to a file named app.log instead of the console. The filemode is set to w, which means the log file is opened in “write mode” each time basicConfig() is called, and each run of the program will rewrite the file. The default configuration for filemode is a, which is append.

You can customize the root logger even further by using more parameters for basicConfig(), which can be found here.

It should be noted that calling basicConfig() to configure the root logger works only if the root logger has not been configured before. Basically, this function can only be called once.

debug(), info(), warning(), error(), and critical() also call basicConfig() without arguments automatically if it has not been called before. This means that after the first time one of the above functions is called, you can no longer configure the root logger because they would have called the basicConfig() function internally.

The default setting in basicConfig() is to set the logger to write to the console in the following format:

ERROR:root:This is an error message

Formatting the Output

While you can pass any variable that can be represented as a string from your program as a message to your logs, there are some basic elements that are already a part of the LogRecord and can be easily added to the output format. If you want to log the process ID along with the level and message, you can do something like this:

importlogginglogging.basicConfig(format='%(process)d-%(levelname)s-%(message)s')logging.warning('This is a Warning')

18472-WARNING-This is a Warning

format can take a string with LogRecord attributes in any arrangement you like. The entire list of available attributes can be found here.

Here’s another example where you can add the date and time info:

importlogginglogging.basicConfig(format='%(asctime)s - %(message)s',level=logging.INFO)logging.info('Admin logged in')

2018-07-11 20:12:06,288 - Admin logged in

%(asctime)s adds the time of creation of the LogRecord. The format can be changed using the datefmt attribute, which uses the same formatting language as the formatting functions in the datetime module, such as time.strftime():

importlogginglogging.basicConfig(format='%(asctime)s - %(message)s',datefmt='%d-%b-%y %H:%M:%S')logging.warning('Admin logged out')

12-Jul-18 20:53:19 - Admin logged out

You can find the guide here.

Logging Variable Data

In most cases, you would want to include dynamic information from your application in the logs. You have seen that the logging methods take a string as an argument, and it might seem natural to format a string with variable data in a separate line and pass it to the log method. But this can actually be done directly by using a format string for the message and appending the variable data as arguments. Here’s an example:

importloggingname='John'logging.error('%s raised an error',name)

ERROR:root:John raised an error

The arguments passed to the method would be included as variable data in the message.

While you can use any formatting style, the f-strings introduced in Python 3.6 are an awesome way to format strings as they can help keep the formatting short and easy to read:

importloggingname='John'logging.error(f'{name} raised an error')

ERROR:root:John raised an error

Capturing Stack Traces

The logging module also allows you to capture the full stack traces in an application. Exception information can be captured if the exc_info parameter is passed as True, and the logging functions are called like this:

importlogginga=5b=0try:c=a/bexceptExceptionase:logging.error("Exception occurred",exc_info=True)

ERROR:root:Exception occurredTraceback (most recent call last):  File "exceptions.py", line 6, in <module>    c = a / bZeroDivisionError: division by zero[Finished in 0.2s]

If exc_info is not set to True, the output of the above program would not tell us anything about the exception, which, in a real-world scenario, might not be as simple as a ZeroDivisionError. Imagine trying to debug an error in a complicated codebase with a log that shows only this:

ERROR:root:Exception occurred

Here’s a quick tip: if you’re logging from an exception handler, use the logging.exception() method, which logs a message with level ERROR and adds exception information to the message. To put it more simply, calling logging.exception() is like calling logging.error(exc_info=True). But since this method always dumps exception information, it should only be called from an exception handler. Take a look at this example:

importlogginga=5b=0try:c=a/bexceptExceptionase:logging.exception("Exception occurred")

ERROR:root:Exception occurredTraceback (most recent call last):  File "exceptions.py", line 6, in <module>    c = a / bZeroDivisionError: division by zero[Finished in 0.2s]

Using logging.exception() would show a log at the level of ERROR. If you don’t want that, you can call any of the other logging methods from debug() to critical() and pass the exc_info parameter as True.

Classes and Functions

So far, we have seen the default logger named root, which is used by the logging module whenever its functions are called directly like this: logging.debug(). You can (and should) define your own logger by creating an object of the Logger class, especially if your application has multiple modules. Let’s have a look at some of the classes and functions in the module.

The most commonly used classes defined in the logging module are the following:

Logger: This is the class whose objects will be used in the application code directly to call the functions.
LogRecord: Loggers automatically create LogRecord objects that have all the information related to the event being logged, like the name of the logger, the function, the line number, the message, and more.
Handler: Handlers send the LogRecord to the required output destination, like the console or a file. Handler is a base for subclasses like StreamHandler, FileHandler, SMTPHandler, HTTPHandler, and more. These subclasses send the logging outputs to corresponding destinations, like sys.stdout or a disk file.
Formatter: This is where you specify the format of the output by specifying a string format that lists out the attributes that the output should contain.

Out of these, we mostly deal with the objects of the Logger class, which are instantiated using the module-level function logging.getLogger(name). Multiple calls to getLogger() with the same name will return a reference to the same Logger object, which saves us from passing the logger objects to every part where it’s needed. Here’s an example:

importlogginglogger=logging.getLogger('example_logger')logger.warning('This is a warning')

This is a warning

This creates a custom logger named example_logger, but unlike the root logger, the name of a custom logger is not part of the default output format and has to be added to the configuration. Configuring it to a format to show the name of the logger would give an output like this:

WARNING:example_logger:This is a warning

Again, unlike the root logger, a custom logger can’t be configured using basicConfig(). You have to configure it using Handlers and Formatters:

“It is recommended that we use module-level loggers by passing __name__ as the name parameter to getLogger() to create a logger object as the name of the logger itself would tell us from where the events are being logged. __name__ is a special built-in variable in Python which evaluates to the name of the current module.” (Source)

Using Handlers

Handlers come into the picture when you want to configure your own loggers and send the logs to multiple places when they are generated. Handlers send the log messages to configured destinations like the standard output stream or a file or over HTTP or to your email via SMTP.

A logger that you create can have more than one handler, which means you can set it up to be saved to a log file and also send it over email.

Like loggers, you can also set the severity level in handlers. This is useful if you want to set multiple handlers for the same logger but want different severity levels for each of them. For example, you may want logs with level WARNING and above to be logged to the console, but everything with level ERROR and above should also be saved to a file. Here’s a program that does that:

# logging_example.pyimportlogging# Create a custom loggerlogger=logging.getLogger(__name__)# Create handlersc_handler=logging.StreamHandler()f_handler=logging.FileHandler('file.log')c_handler.setLevel(logging.WARNING)f_handler.setLevel(logging.ERROR)# Create formatters and add it to handlersc_format=logging.Formatter('%(name)s - %(levelname)s - %(message)s')f_format=logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')c_handler.setFormatter(c_format)f_handler.setFormatter(f_format)# Add handlers to the loggerlogger.addHandler(c_handler)logger.addHandler(f_handler)logger.warning('This is a warning')logger.error('This is an error')

__main__ - WARNING - This is a warning__main__ - ERROR - This is an error

Here, logger.warning() is creating a LogRecord that holds all the information of the event and passing it to all the Handlers that it has: c_handler and f_handler.

c_handler is a StreamHandler with level WARNING and takes the info from the LogRecord to generate an output in the format specified and prints it to the console. f_handler is a FileHandler with level ERROR, and it ignores this LogRecord as its level is WARNING.

When logger.error() is called, c_handler behaves exactly as before, and f_handler gets a LogRecord at the level of ERROR, so it proceeds to generate an output just like c_handler, but instead of printing it to console, it writes it to the specified file in this format:

2018-08-03 16:12:21,723 - __main__ - ERROR - This is an error

The name of the logger corresponding to the __name__ variable is logged as __main__, which is the name Python assigns to the module where execution starts. If this file is imported by some other module, then the __name__ variable would correspond to its name logging_example. Here’s how it would look:

# run.pyimportlogging_example

logging_example - WARNING - This is a warninglogging_example - ERROR - This is an error

Other Configuration Methods

You can configure logging as shown above using the module and class functions or by creating a config file or a dictionary and loading it using fileConfig() or dictConfig() respectively. These are useful in case you want to change your logging configuration in a running application.

Here’s an example file configuration:

[loggers]keys=root,sampleLogger[handlers]keys=consoleHandler[formatters]keys=sampleFormatter[logger_root]level=DEBUGhandlers=consoleHandler[logger_sampleLogger]level=DEBUGhandlers=consoleHandlerqualname=sampleLoggerpropagate=0[handler_consoleHandler]class=StreamHandlerlevel=DEBUGformatter=sampleFormatterargs=(sys.stdout,)[formatter_sampleFormatter]format=%(asctime)s - %(name)s - %(levelname)s - %(message)s

In the above file, there are two loggers, one handler, and one formatter. After their names are defined, they are configured by adding the words logger, handler, and formatter before their names separated by an underscore.

To load this config file, you have to use fileConfig():

importloggingimportlogging.configlogging.config.fileConfig(fname='file.conf',disable_existing_loggers=False)# Get the logger specified in the filelogger=logging.getLogger(__name__)logger.debug('This is a debug message')

2018-07-13 13:57:45,467 - __main__ - DEBUG - This is a debug message

The path of the config file is passed as a parameter to the fileConfig() method, and the disable_existing_loggers parameter is used to keep or disable the loggers that are present when the function is called. It defaults to True if not mentioned.

Here’s the same configuration in a YAML format for the dictionary approach:

version:1formatters:simple:format:'%(asctime)s-%(name)s-%(levelname)s-%(message)s'handlers:console:class:logging.StreamHandlerlevel:DEBUGformatter:simplestream:ext://sys.stdoutloggers:sampleLogger:level:DEBUGhandlers:[console]propagate:noroot:level:DEBUGhandlers:[console]

Here’s an example that shows how to load config from a yaml file:

importloggingimportlogging.configimportyamlwithopen('config.yaml','r')asf:config=yaml.safe_load(f.read())logging.config.dictConfig(config)logger=logging.getLogger(__name__)logger.debug('This is a debug message')

2018-07-13 14:05:03,766 - __main__ - DEBUG - This is a debug message

Keep Calm and Read the Logs

The logging module is considered to be very flexible. Its design is very practical and should fit your use case out of the box. You can add basic logging to a small project, or you can go as far as creating your own custom log levels, handler classes, and more if you are working on a big project.

If you haven’t been using logging in your applications, now is a good time to start. When done right, logging will surely remove a lot of friction from your development process and help you find opportunities to take your application to the next level.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

↧

Introduction

Know your way around Python

Showcase your Example Projects

Brush up on Data Structures and Algorithms

Ability to Comprehend and Solve Problems

Icing on the Cake

Conclusion

Resources

Introduction

Word Embedding Approaches

Bag of Words

Pros and Cons of Bag of Words

TF-IDF Scheme

Pros and Cons of TF-IDF

Word2Vec

Pros and Cons of Word2Vec

Word2Vec in Python with Gensim Library

Creating Corpus

Preprocessing

Creating Word2Vec Model

Model Analysis

Finding Vectors for a Word

Finding Similar Words

Conclusion

Best Books for Learning Python

Python Crash Course

Head-First Python, 2nd edition

Invent Your Own Computer Games with Python, 4th edition

Think Python: How to Think Like a Computer Scientist, 2nd edition

Effective Computation in Physics: Field Guide to Research with Python

Learn Python 3 the Hard Way

Real Python Course, Part 1

Best Python Books for Kids

Python for Kids: A Playful Introduction to Programming

Teach Your Kids to Code: A Parent-Friendly Guide to Python Programming

Best Intermediate and Advanced Python Books

Python Tricks: A Buffet of Awesome Python Features

Fluent Python: Clear, Concise, and Effective Programming

Effective Python: 59 Ways to Write Better Python

Python Cookbook

Get Coding!

What is Nikola?

Downloads

Changes

Important compatibility changes

Features

Bugfixes

Removed conf.py settings

Removed features

Wrapping Up

Related Reading

Introduction

What is Docker, Anyway?

Prerequisites

Setting Up a Django web application

Dockerizing the Application

Installing Docker

Writing the Dockerfile

Building and Running the Container

Next Steps

Continuous Integration and Deployment for Docker projects on Semaphore

Conclusion

New in This Version

Interested?

The Logging Module

Basic Configurations

Formatting the Output

Logging Variable Data

Capturing Stack Traces

Classes and Functions

Using Handlers

Other Configuration Methods

Keep Calm and Read the Logs