Janusworx: #100DaysOfCode, Days 005, 006 and 007 – The Collections Module

November 27, 2019, 12:08 am

I have been at this an hour daily for four days now.
It just is not clicking for me.
One thing is for sure, this Talk Python course is definitely going to take longer than a hundred days!

I don’t know whether to be ashamed or proud.
Best to be unashamedly persistent, I guess.
More work, tomorrow.

↧

Lintel Technologies: Manhole service in Twisted Application.

November 27, 2019, 12:46 am

What is Manhole?

Manhole is an in-process service that accepts UNIX domain socket connections and presents the stack traces for all threads along with an interactive prompt. The Twisted example below exposes the prompt over SSH on a TCP port instead.

Using it, we can inspect and modify objects and definitions in the running application: for example, add or change a method on a class, or redefine a function in a module.

This lets us modify a running application without restarting it. It also makes debugging easier, because we can check the values of objects while the program is running.

How to configure it?

from twisted.internet import reactor
from twisted.conch import manhole, manhole_ssh
from twisted.conch.ssh.keys import Key
from twisted.cred import portal, checkers

DATA = {"Service": "Manhole"}


def get_manhole_factory(namespace, **passwords):

    def get_manhole(arg):
        return manhole.ColoredManhole(namespace)
            
    realm = manhole_ssh.TerminalRealm()
    realm.chainedProtocolFactory.protocolFactory = get_manhole
    p = portal.Portal(realm)
    p.registerChecker(checkers.InMemoryUsernamePasswordDatabaseDontUse(**passwords))
    f = manhole_ssh.ConchFactory(p)
    f.publicKeys = {"ssh-rsa": Key.fromFile("keys/manhole.pub")}
    f.privateKeys = {"ssh-rsa": Key.fromFile("keys/manhole")}
    return f


reactor.listenTCP(2222, get_manhole_factory(globals(), admin='admin'))
reactor.run()

Once you run the snippet above, the service starts on TCP port 2222.

Use the ssh command to log in to the service.

Here is what a session looks like:

[lalit : ~]₹ ssh admin@localhost -p 2222
admin@localhost's password:
>>> dir() 
['DATA', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'checkers', 'get_manhole_factory', 'manhole', 'manhole_ssh', 'portal', 'reactor'] 
>>> DATA 
{'Service': 'Manhole'}
>>> DATA['Service'] = "Edited"
>>> DATA
{'Service': 'Edited'}

[lalit : ~]₹ ssh admin@localhost -p 2222
admin@localhost's password: 
>>> dir() 
['DATA', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'checkers', 'get_manhole_factory', 'manhole', 'manhole_ssh', 'portal', 'reactor'] 
>>> DATA 
{'Service': 'Edited'}

In the first login, we change a value in the DATA dictionary of the running application. In the second login, we can see that the new value is still there.
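The change survives across logins because every session is handed the same namespace dictionary (globals() in the snippet above). Here is a minimal sketch of that idea, using the standard library's code.InteractiveInterpreter as a stand-in for a manhole session. It is not Twisted, and the namespace dictionary is a hypothetical substitute for the application's globals.

```python
import code

# Hypothetical stand-in for the running application's globals().
namespace = {"DATA": {"Service": "Manhole"}}

# Two independent "sessions" that evaluate code against the same dict.
first_login = code.InteractiveInterpreter(locals=namespace)
second_login = code.InteractiveInterpreter(locals=namespace)

# The first session edits DATA in place.
first_login.runsource("DATA['Service'] = 'Edited'")

# The second session reads the value and copies it out for inspection.
second_login.runsource("seen = DATA['Service']")

print(namespace["seen"])
```

Because both interpreters share one dictionary, anything the first session binds or mutates is visible to the second, which is the behavior shown in the SSH transcript.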

The post Manhole service in Twisted Application. appeared first on Lintel Technologies Blog.

↧

PyBites: From webscraper to wordcloud

November 27, 2019, 3:01 am

Living in Belgium, I decided to scrape the Belgian newspaper Het Laatste Nieuws. I wanted to know what kept people busy when reading the news, so I went for a collection of all comments on all articles in the news section.

You can find the full code here.

Index

Requirements

The little Scraper that could

Bypassing the cookiewall

According to cookielaw.org, the Cookie Law can be described as follows:

The Cookie Law is a piece of privacy legislation that requires websites to get consent from visitors to store or retrieve any information on a computer, smartphone or tablet.
It was designed to protect online privacy, by making consumers aware of how information about them is collected and used online, and give them a choice to allow it or not.

This means that if we haven't visited the page before, we will be greeted with a message that will block our access, asking for permission to put the cookies on our computer.

The great Cookiewall of HLN

To get past this 'cookiewall', the server needs to be presented with a cookie, so it knows we have given consent and it can track us without legal implications.

I also set my user agent to the same as the one on my computer, so there would be no differences in the source presented to me based on what browser I was using.

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
consent_cookie = {'pws': 'functional|analytics|content_recommendation|targeted_advertising|social_media', 'pwv': '1'}

...

def get_pagedata():
    reqdata = req_session.get("https://www.hln.be/nieuws", cookies=consent_cookie, headers={'User-Agent': user_agent})
    return reqdata

Et voila! No more cookiewalls, let the scraping begin!

Getting the articles

Now that I could get all the links, I had to go back to the HTML source of the page to figure out a good way to obtain the articles. I ended up with a pretty conclusive piece of code for the time of writing, and also added a way to get the categories for the articles as labeled by HLN.

All these records were turned into Article namedtuples:

Article = collections.namedtuple('Article', 'hash headline url categories')

def get_all_articles(reqdata):
    soup = BeautifulSoup(reqdata.text, "html.parser")
    article_list = []
    html_article_list = soup.findChildren("article")
    for html_article in html_article_list:
        article_wrappers = html_article.findChildren("div")
        try:
            html_indiv_articles = article_wrappers[1].findChildren("a", {"class": "teaser-link--color"})
            for article in html_indiv_articles:
                article_link = article['href']
                categories = get_categories_from_article(article_link)
                article_title = article.findChild("h1").text
                if article_title is None:
                    article_title = article.find("h1")
                if article_title is None:
                    exit(0)
                sha1 = hashlib.sha1()
                sha1.update(article_title.encode('utf-8'))
                article_hash = sha1.hexdigest()
                article_list.append(Article(hash=article_hash, headline=article_title, url=article_link, categories=categories))
        except IndexError:
            # these are the divs from the most-read category, we should already have these.
            continue

I was pretty aggressive with error handling, either exit()ing completely or simply ignoring the exception and continuing my loop. That is because I feel scraping is a precise art: if the data is something different from what you expect, you're expecting the wrong data!

Finally, I looped over the article list because I noticed that there were duplicates (some articles might also be on a 'featured' bar, and it was sometimes hard to distinguish between them).

    clean_article_list = []
    [clean_article_list.append(itm) for itm in article_list if itm not in clean_article_list]
    return clean_article_list
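The list comprehension above is used only for its side effect, and it rescans clean_article_list for every item. Since namedtuples are hashable when all of their fields are, dict.fromkeys offers an order-preserving alternative. Below is a small sketch with made-up articles: the headlines and URLs are invented, and categories is stored as a tuple so the record stays hashable, which is an assumption about how the categories are kept.

```python
import collections
import hashlib

Article = collections.namedtuple('Article', 'hash headline url categories')


def make_article(headline, url, categories):
    """Build an Article whose hash is the SHA-1 of the headline."""
    digest = hashlib.sha1(headline.encode('utf-8')).hexdigest()
    return Article(hash=digest, headline=headline, url=url, categories=tuple(categories))


def drop_duplicates(article_list):
    """Remove repeated articles while keeping first-seen order."""
    return list(dict.fromkeys(article_list))


articles = [
    make_article("Headline A", "https://example.invalid/a", ["nieuws"]),
    make_article("Headline B", "https://example.invalid/b", ["sport"]),
    make_article("Headline A", "https://example.invalid/a", ["nieuws"]),  # featured-bar repeat
]
clean = drop_duplicates(articles)
print([a.headline for a in clean])
```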

The Comments, a new challenge

Now that I had a list of all articles and their links, I set out to collect all the comments. That is when I noticed my first scraping run had only gotten me 2 or 3 per article. Knowing there were hundreds of comments on some of the articles in my dataset, I realized something was wrong.

Back at the drawing board, we found the problem. A little thing called Ajax.

Every article loaded a couple of comments, and a link that said 'Show more comments'.

When clicking this link, an Ajax call was made to get the next comments. If there were more after that, a link was also included for the next ajax call.

The solution came with regex, as the Ajax links all were in a very specific pattern!

Still, the recursiveness in the puzzle was a bit challenging.

comment_regex = "href=\"(https\:\/\/www\.hln\.be\/ajax\/comments\/(.*?)\/start\/ts_\d*?)\">"
comment_rxobj = re.compile(comment_regex)


def get_comments_from_page(reqdata, article_hash):
    comment_list = []
    soup = BeautifulSoup(reqdata.text, 'html.parser')
    comments_list_ul = soup.find("ul", {"class": "comments__list"})
    if comments_list_ul is None:
        return comment_list
    comments_indiv_li = comments_list_ul.findChildren("li", {"class": "comment"})
    for comment in comments_indiv_li:
        comment_author = comment.find("h2", {"class": "comment__author"}).text
        comment_body = comment.find("p", {"class": "comment__body"}).text.replace("\n", " ").replace("\r", "")
        comment_list.append(Comment(article_hash=article_hash, commenter=comment_author, comment_text=comment_body))
    return comment_list


def get_all_comments(article_main_link, article_hash):
    comment_href_list = []
    comment_list = []
    article_reqdata = req_session.get(url=article_main_link, cookies=consent_cookie, headers={'User-Agent': user_agent})
    reqdata = article_reqdata
    comment_list += get_comments_from_page(reqdata=reqdata, article_hash=article_hash)
    while comment_rxobj.search(reqdata.text):
        comment_href_group = comment_rxobj.findall(reqdata.text)
        comment_href = comment_href_group[0][0]
        comment_href_list.append(comment_href)
        time.sleep(sleeptime)
        reqdata = req_session.get(comment_href, cookies=consent_cookie, headers={'User-Agent': user_agent})
        comment_list += get_comments_from_page(reqdata=reqdata, article_hash=article_hash)
    return comment_list

Because I didn't want to send too many requests in a short span, I also added some rate limiting so as not to be too aggressive towards the HLN server.
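The pagination loop can be tried out without touching the network. The sketch below uses an equivalent of the link pattern from the code above, written as a raw string, and replaces the HTTP session with a dictionary of invented pages. FAKE_PAGES, fetch, follow_comment_links, and the path segments in the URLs are all hypothetical.

```python
import re

# Equivalent of the pattern above as a raw string; group 1 is the full Ajax URL.
comment_rxobj = re.compile(
    r'href="(https://www\.hln\.be/ajax/comments/(.*?)/start/ts_\d*?)">'
)

# Invented pages: each one links to the next until the last.
FAKE_PAGES = {
    "article": '<li>c1</li><a href="https://www.hln.be/ajax/comments/abc/start/ts_1">more</a>',
    "https://www.hln.be/ajax/comments/abc/start/ts_1":
        '<li>c2</li><a href="https://www.hln.be/ajax/comments/abc/start/ts_2">more</a>',
    "https://www.hln.be/ajax/comments/abc/start/ts_2": '<li>c3</li>',
}


def fetch(url):
    """Hypothetical stand-in for req_session.get(url).text."""
    return FAKE_PAGES[url]


def follow_comment_links(start_url):
    """Return every page body reachable by following 'more comments' links."""
    bodies = [fetch(start_url)]
    match = comment_rxobj.search(bodies[-1])
    while match:
        bodies.append(fetch(match.group(1)))
        match = comment_rxobj.search(bodies[-1])
    return bodies


pages = follow_comment_links("article")
print(len(pages))
```

The loop stops as soon as a page no longer contains a matching link, which mirrors the while condition in get_all_comments.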

A tiny bit of AI: SpaCy

If we want to know what people are talking about, we have to be able to recognize the different parts of a sentence. If we simply looked at all words, a lot of 'non-descriptive' words would probably show up. It would still give a good picture, but a lot of space would probably be taken up by stop words or conjunctions (like 'and').

What we really care about is the subjects people are talking about, the Nouns of the lines they write. That's where SpaCy comes in!

SpaCy is an open source library for Natural Language Processing. There are trained models for a set of languages that can be used for a lot of different things.

This allowed us to create a dictionary of 'types' of words:

import spacy

nlp = spacy.load('nl_core_news_sm')

...

def get_wordtype_dict(raw_comment_list):
    wordtype_dict = defaultdict(list)
    for comment in raw_comment_list:
        doc = nlp(comment)
        for token in doc:
            wordtype_dict[token.pos_].append(token.text)
    return wordtype_dict

Outputting the first few words for each word type:

ADV:
    Eindelijk
    niet
    nu
    nog
    eens
    zo

PUNCT:
    !
    ...
    .
    .
    .
    .
PRON:
    Ik
    dat
    ik
    Ik
    Wat
    dat
VERB:
    ga
    kijken
    Kan
    wachten
    noem
    zie
NOUN:
    Hemels
    nieuws
    ziekte
    Pia
    leven
    toegewenst

The classification was not the most accurate but it was enough for what I wanted to do.

I plan on retraining the model with my new data in the future to hopefully get a more accurate model for Belgian comments with its dialects.
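The grouping step itself does not depend on SpaCy. In the sketch below, hand-written (text, tag) pairs stand in for SpaCy tokens, so the tags are illustrative and not model output, and group_by_tag is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical pre-tagged tokens; a real run would read token.text and token.pos_.
tagged_tokens = [
    ("Ik", "PRON"), ("ga", "VERB"), ("kijken", "VERB"),
    ("nieuws", "NOUN"), ("nu", "ADV"), ("leven", "NOUN"),
]


def group_by_tag(pairs):
    """Collect token texts under their part-of-speech tag."""
    grouped = defaultdict(list)
    for text, tag in pairs:
        grouped[tag].append(text)
    return grouped


wordtype_dict = group_by_tag(tagged_tokens)
for tag, words in wordtype_dict.items():
    print(tag, words[:5])
```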

Making things pretty: WordCloud

Now that I had the data, all I had to do was find a way to visualize it. Wordcloud is a library designed to make... well, wordclouds!

The wordcloud library takes a blob of text and turns it into a wordcloud, with the size of each word reflecting its number of occurrences. After turning the lists of words in our wordtype_dict into strings, 2 lines of code were enough to produce a result:

def make_word_cloud(text_blob):
    wordcloud = WordCloud(width=1600, height=1200, background_color="white").generate(text_blob)
    return wordcloud
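To get a feel for what the cloud will emphasize before rendering it, the word list can be joined into the text blob and counted with the standard library. This is only a rough preview: the wordcloud library applies its own tokenization and stop-word handling, and the sample words here are invented.

```python
from collections import Counter

# Hypothetical slice of wordtype_dict['NOUN'].
nouns = ["nieuws", "leven", "nieuws", "ziekte", "nieuws", "leven"]

# The blob that would be handed to WordCloud(...).generate().
text_blob = " ".join(nouns)

# Rough preview of relative word sizes: more occurrences, bigger word.
frequencies = Counter(text_blob.split())
print(frequencies.most_common(2))
```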

For our nouns, this gave us:

Wordcloud Result

You could also use a logo to generate the wordcloud by creating an ImageColorGenerator() and passing it to the wordcloud constructor:

def make_word_cloud_logo(text_blob):
    rgb_hln_arr = np.array(load_hln_logo())
    wcig = ImageColorGenerator(rgb_hln_arr)
    wordcloud = WordCloud(width=1600, height=1200, background_color="white", mask=rgb_hln_arr, color_func=wcig)\
        .generate(text_blob)
    return wordcloud

For ALL the words, this resulted in:

Wordcloud Logo Result

Things for the future

There's room for a lot of improvement in this project. Below are some things I really want to get done, and which you might see me write about in the future:

Improve the SpaCy model for Dutch/Belgian POS Tagging.
This same project but with a more Object Oriented approach
Fix the encoding issues in the scraper
Sentiment Analysis
Merging word groups by combining POS Tagging and Dependency Parsing

Thanks for reading, I hope you enjoyed it as much as I enjoyed writing it. If you have any remarks or questions, you can likely find me on the Pybites Slack Channel as 'Jarvis'.

Keep calm and code in Python!

-- Cedric

↧

PyBites: There's no wrong way... to eat a Bite of Py

November 27, 2019, 3:02 am

The Bites of Py exercises from PyBites are a wonderful way to improve your Python skills in short, focused practice sessions. You can even work on them right from your browser! Of course, that's not the only way.

Here are a few different ways I might work on a bite. I hope some of these are useful - please share your own habits in the comments!

Quickstart: Working Directly in the Browser

If a bite appears to have a short solution with reasonably straightforward test cases, I'll probably give it a try right in the browser. PyBites uses the Ace editor with some nice Python-specific additions such as:

Code linting with flake8
Auto-formatting with black

This is a great way to start coding. It's pleasant to use, with no requirements beyond a capable browser.

If a bite deals with concepts or modules that I'm not familiar with though, I often want to work more interactively. I'm not just submitting code for tests in that case - I'm also reading documentation and experimenting to get a better feel for the concepts in the bite. The browser editor falls short for me in those cases, so I might switch to...

Interactive Exploration: Using a REPL

As long as you have Python installed on your local machine, you'll be able to run python to launch the Python interpreter in interactive mode. This gives you a helpful REPL (Read-Eval-Print Loop) where you can explore, try things out, and see the output in real-time.

Depending on the bite you're working on, you might need to install additional packages. It pays to do a little bit of work to keep your PyBites environment isolated, by following steps like these:

Prepare a new pybites virtual environment. Real Python has a primer on virtual environments that can help you get started.
Install required packages inside your pybites virtual environment. The specific requirements vary from bite to bite, but here are some packages that you'll need eventually:
requests
bs4 - for BeautifulSoup/web scraping bites
feedparser
python-dateutil
pandas

Aside from running python, there are a number of alternative REPLs available. This includes local tools such as bpython or ptpython, and web-based options like repl.it. My REPL of choice is the ptipython component of ptpython, with vim keybindings. This is mostly personal preference though, so find the experience that best fits your style!

Sometimes after I've done some exploring and feel comfortable with the concepts of a bite, I find that I'm getting hung up with a few failing tests. In that case I am looking for a smoother flow for testing and debugging. I might jump over to...

Testing/Debugging Support: A Full-Featured Editor

With an editor like PyCharm or VS Code, you can run the same tests locally that PyBites runs in the browser. However, locally you've got a quicker test cycle and you can debug along the way!

When I set up my editor of choice (currently VS Code) to work on a bite, it goes something like this:

First-time setup

Set up a directory where pybites code will live. For me, that is ~/code/pybites.
Activate the same pybites virtual environment I created for use with my REPL. Microsoft has some helpful guidance for working with virtual environments in VS Code.

Per-bite setup

Create a directory for the bite. In my case, code for bite 20 goes into ~/code/pybites/20.
Copy the code and test files. Again using bite 20 as an example, this means I have code in ~/code/pybites/20/account.py and tests in ~/code/pybites/20/test_account.py.
Configure tests. This means enabling pytest and using the bite directory (such as ~/code/pybites/20) as the test root as described in the documentation.

With the setup steps done, I can discover, run and debug tests quickly.

Test Bites: A New Spin

Now that Test Bites are live, there's an extra wrinkle to the coding and testing workflow. If you've already got a local environment set up though, you've already laid the groundwork for testing your tests! The last piece you need is the MutPy mutation testing tool. With that installed, you can run your mutation tests locally just like Bob did in the launch post!

There's No Wrong Way...

If you're practicing on PyBites, you'll definitely be submitting code from the browser. But what other tools will help you along the way? The options are endless - so go nuts, find something that works for you, and share your own tips in the comments!

Keep calm and code in Python!

-- AJ

↧

Codementor: How to Update Legacy Code Successfully

November 27, 2019, 4:46 am

In this article, we’ll talk about when updates are necessary and how to make them without affecting the app’s functionality.

↧

Real Python: Python Descriptors: An Introduction

November 27, 2019, 6:00 am

Descriptors are a specific Python feature that powers a lot of the magic hidden under the language’s hood. If you’ve ever thought that Python descriptors are an advanced topic with few practical applications, then this tutorial is the perfect tool to help you understand this powerful feature. You’ll come to understand why Python descriptors are such an interesting topic, and what kind of use cases you can apply them to.

By the end of this tutorial, you’ll know:

What Python descriptors are
Where they’re used in Python’s internals
How to implement your own descriptors
When to use Python descriptors

This tutorial is intended for intermediate to advanced Python developers as it concerns Python internals. However, if you’re not at this level yet, then just keep reading! You’ll find useful information about Python and the lookup chain.

What Are Python Descriptors?

Descriptors are Python objects that implement a method of the descriptor protocol, which gives you the ability to create objects that have special behavior when they’re accessed as attributes of other objects. Here you can see the correct definition of the descriptor protocol:

__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
__set_name__(self, owner, name)

If your descriptor implements just .__get__(), then it’s said to be a non-data descriptor. If it implements .__set__() or .__delete__(), then it’s said to be a data descriptor. Note that this difference is not just about the name, but it’s also a difference in behavior. That’s because data descriptors have precedence during the lookup process, as you’ll see later on.

Take a look at the following example, which defines a descriptor that logs something on the console when it’s accessed:

# descriptors.py
class Verbose_attribute():
    def __get__(self, obj, type=None) -> object:
        print("accessing the attribute to get the value")
        return 42

    def __set__(self, obj, value) -> None:
        print("accessing the attribute to set the value")
        raise AttributeError("Cannot change the value")

class Foo():
    attribute1 = Verbose_attribute()

my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)

In the example above, Verbose_attribute() implements the descriptor protocol. Once it’s instantiated as an attribute of Foo, it can be considered a descriptor.

As a descriptor, it has binding behavior when it’s accessed using dot notation. In this case, the descriptor logs a message on the console every time it’s accessed to get or set a value:

When it’s accessed to .__get__() the value, it always returns the value 42.
When it’s accessed to .__set__() a specific value, it raises an AttributeError exception, which is the recommended way to implement read-only descriptors.

Now, run the example above and you’ll see the descriptor log the access to the console before returning the constant value:

$ python descriptors.py
accessing the attribute to get the value
42

Here, when you try to access attribute1, the descriptor logs this access to the console, as defined in .__get__().

How Descriptors Work in Python’s Internals

If you have experience as an object-oriented Python developer, then you may think that the previous example’s approach is a bit of overkill. You could achieve the same result by using properties. While this is true, you may be surprised to know that properties in Python are just… descriptors! You’ll see later on that properties are not the only feature that makes use of Python descriptors.

Python Descriptors in Properties

If you want to get the same result as the previous example without explicitly using a Python descriptor, then the most straightforward approach is to use a property. The following example uses a property that logs a message to the console when it’s accessed:

# property_decorator.py
class Foo():
    @property
    def attribute1(self) -> object:
        print("accessing the attribute to get the value")
        return 42

    @attribute1.setter
    def attribute1(self, value) -> None:
        print("accessing the attribute to set the value")
        raise AttributeError("Cannot change the value")

my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)

The example above makes use of decorators to define a property, but as you may know, decorators are just syntactic sugar. The example before, in fact, can be written as follows:

# property_function.py
class Foo():
    def getter(self) -> object:
        print("accessing the attribute to get the value")
        return 42

    def setter(self, value) -> None:
        print("accessing the attribute to set the value")
        raise AttributeError("Cannot change the value")

    attribute1 = property(getter, setter)

my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)

Now you can see that the property has been created by using property(). The signature of this function is as follows:

property(fget=None, fset=None, fdel=None, doc=None) -> object

property() returns a property object that implements the descriptor protocol. It uses the parameters fget, fset and fdel for the actual implementation of the three methods of the protocol.
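To make the connection concrete, here is a trimmed-down pure Python class that behaves like property() for getting and setting. It is a sketch modeled on the equivalent shown in the official Descriptor HowTo, not the actual C implementation. The names Property and Temperature are illustrative, and deletion and docstring handling are left out.

```python
class Property:
    """Minimal stand-in for property(): supports fget and fset only."""

    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)


class Temperature:
    def __init__(self):
        self._celsius = 0

    def _get(self):
        return self._celsius

    def _set(self, value):
        self._celsius = float(value)

    celsius = Property(_get, _set)


t = Temperature()
t.celsius = 21
print(t.celsius)
```

Because Property defines .__set__(), it is a data descriptor, just like the built-in property object.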

Python Descriptors in Methods and Functions

If you’ve ever written an object-oriented program in Python, then you’ve certainly used methods. These are regular functions that have the first argument reserved for the object instance. When you access a method using dot notation, you’re calling the corresponding function and passing the object instance as the first parameter.

The magic that transforms your obj.method(*args) call into method(obj, *args) is inside a .__get__() implementation of the function object that is, in fact, a non-data descriptor. In particular, the function object implements .__get__() so that it returns a bound method when you access it with dot notation. The (*args) that follow invoke the functions by passing all the extra arguments needed.

To get an idea for how it works, take a look at this pure Python example from the official docs:

class Function(object):
    ...
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return types.MethodType(self, obj)

In the example above, when the function is accessed with dot notation, .__get__() is called and a bound method is returned.
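You can watch this happen by calling .__get__() on a plain function yourself. A short sketch follows; the class Greeter is made up for illustration.

```python
class Greeter:
    def hello(self, name):
        return f"hello {name} from {type(self).__name__}"


g = Greeter()

# The raw function lives in the class __dict__, unbound.
raw_function = Greeter.__dict__["hello"]

# Calling __get__ by hand produces the same bound method as g.hello.
bound = raw_function.__get__(g, Greeter)

print(bound("world"))
print(bound.__self__ is g, bound.__func__ is raw_function)
```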

This works for regular instance methods just like it does for class methods or static methods. So, if you call a static method with obj.method(*args), then it’s automatically transformed into method(*args). Similarly, if you call a class method with obj.method(*args), then it’s automatically transformed into method(type(obj), *args).

Note: To learn more about *args, check out Python args and kwargs: Demystified.

In the official docs, you can find some examples of how static methods and class methods would be implemented if they were written in pure Python instead of the actual C implementation. For instance, a possible static method implementation could be this:

class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f

Likewise, this could be a possible class method implementation:

class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            return self.f(klass, *args)
        return newfunc

Note that, in Python, a class method is just a static method that takes the class reference as the first argument of the argument list.

How Attributes Are Accessed With the Lookup Chain

To understand a little more about Python descriptors and Python internals, you need to understand what happens in Python when an attribute is accessed. In Python, most objects have a built-in __dict__ attribute. This is a dictionary that contains all the attributes defined in the object itself. To see this in action, consider the following example:

class Vehicle():
    can_fly = False
    number_of_weels = 0

class Car(Vehicle):
    number_of_weels = 4

    def __init__(self, color):
        self.color = color

my_car = Car("red")
print(my_car.__dict__)
print(type(my_car).__dict__)

This code creates a new object and prints the contents of the __dict__ attribute for both the object and the class. Now, run the script and analyze the output to see the __dict__ attributes set:

{'color': 'red'}
{'__module__': '__main__', 'number_of_weels': 4, '__init__': <function Car.__init__ at 0x10fdeaea0>, '__doc__': None}

The __dict__ attributes are set as expected. Note that, in Python, everything is an object. A class is actually an object as well, so it will also have a __dict__ attribute that contains all the attributes and methods of the class.

So, what’s going on under the hood when you access an attribute in Python? Let’s run some tests with a modified version of the previous example. Consider this code:

# lookup.py
class Vehicle(object):
    can_fly = False
    number_of_weels = 0

class Car(Vehicle):
    number_of_weels = 4

    def __init__(self, color):
        self.color = color

my_car = Car("red")
print(my_car.color)
print(my_car.number_of_weels)
print(my_car.can_fly)

In this example, you create an instance of the Car class that inherits from the Vehicle class. Then, you access some attributes. If you run this example, then you can see that you get all the values you expect:

$ python lookup.py
red
4
False

Here, when you access the attribute color of the instance my_car, you’re actually accessing a single value of the __dict__ attribute of the object my_car. When you access the attribute number_of_weels (as it is spelled in the code) of the object my_car, you’re really accessing a single value of the __dict__ attribute of the class Car. Finally, when you access the can_fly attribute, you’re actually accessing it by using the __dict__ attribute of the Vehicle class.

This means that it’s possible to rewrite the above example like this:

# lookup2.py
class Vehicle():
    can_fly = False
    number_of_weels = 0

class Car(Vehicle):
    number_of_weels = 4

    def __init__(self, color):
        self.color = color

my_car = Car("red")
print(my_car.__dict__['color'])
print(type(my_car).__dict__['number_of_weels'])
print(type(my_car).__base__.__dict__['can_fly'])

When you test this new example, you should get the same result:

$ python lookup2.py
red
4
False

So, what happens when you access the attribute of an object with dot notation? How does the interpreter know what you really need? Well, here’s where a concept called the lookup chain comes in:

First, you’ll get the result returned from the __get__ method of the data descriptor named after the attribute you’re looking for.
If that fails, then you’ll get the value of your object’s __dict__ for the key named after the attribute you’re looking for.
If that fails, then you’ll get the result returned from the __get__ method of the non-data descriptor named after the attribute you’re looking for.
If that fails, then you’ll get the value of your object type’s __dict__ for the key named after the attribute you’re looking for.
If that fails, then you’ll get the value of your object parent type’s __dict__ for the key named after the attribute you’re looking for.
If that fails, then the previous step is repeated for all the parent’s types in the method resolution order of your object.
If everything else has failed, then you’ll get an AttributeError exception.

Now you can see why it’s important to know if a descriptor is a data descriptor or a non-data descriptor. They’re on different levels of the lookup chain, and you’ll see later on that this difference in behavior can be very convenient.
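A small experiment shows the difference. Both descriptors below are hypothetical, and the only thing that distinguishes them is whether .__set__() is defined. Writing directly into the __dict__ of the instance bypasses .__set__(), so you can see which entry the lookup prefers:

```python
class NonDataDescriptor:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class DataDescriptor:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Foo:
    non_data = NonDataDescriptor()
    data = DataDescriptor()


foo = Foo()

# Put same-named entries straight into the instance dictionary.
foo.__dict__["non_data"] = "from instance __dict__"
foo.__dict__["data"] = "from instance __dict__"

print(foo.non_data)  # the instance __dict__ wins over a non-data descriptor
print(foo.data)      # a data descriptor wins over the instance __dict__
```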

How to Use Python Descriptors Properly

If you want to use Python descriptors in your code, then you just need to implement the descriptor protocol. The most important methods of this protocol are .__get__() and .__set__(), which have the following signatures:

__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None

When you implement the protocol, keep these things in mind:

self is the instance of the descriptor you’re writing.
obj is the instance of the object your descriptor is attached to.
type is the type of the object the descriptor is attached to.

In .__set__(), you don’t have the type variable, because you can only call .__set__() on the object. In contrast, you can call .__get__() on both the object and the class.
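When the attribute is looked up on the class itself, .__get__() receives None for obj. A common convention, assumed here and not required by the protocol, is to return the descriptor in that case. Tracked and Foo are illustrative names.

```python
class Tracked:
    def __get__(self, obj, type=None):
        if obj is None:
            # Looked up on the class: there is no instance to read from.
            return self
        return f"value for {type.__name__} instance"


class Foo:
    attr = Tracked()


print(Foo().attr)                     # instance access: obj is the instance
print(isinstance(Foo.attr, Tracked))  # class access: obj is None
```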

Another important thing to know is that Python descriptors are instantiated just once per class. That means that every single instance of a class containing a descriptor shares that descriptor instance. This is something that you might not expect and can lead to a classic pitfall, like this:

# descriptors2.py
class OneDigitNumericValue():
    def __init__(self):
        self.value = 0

    def __get__(self, obj, type=None) -> object:
        return self.value

    def __set__(self, obj, value) -> None:
        if value > 9 or value < 0 or int(value) != value:
            raise AttributeError("The value is invalid")
        self.value = value

class Foo():
    number = OneDigitNumericValue()

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

Here, you have a class Foo that defines an attribute number, which is a descriptor. This descriptor accepts a single-digit numeric value and stores it in an attribute of the descriptor itself. However, this approach won’t work, because each instance of Foo shares the same descriptor instance. What you’ve essentially created is just a new class-level attribute.

Try to run the code and examine the output:

$ python descriptors2.py
3
3
3

You can see that all the instances of Foo have the same value for the attribute number, even though the last one was created after the my_foo_object.number attribute was set.

So, how can you solve this problem? You might think that it’d be a good idea to use a dictionary to save all the values of the descriptor for all the objects it’s attached to. This seems to be a good solution since .__get__() and .__set__() have the obj parameter, which is the instance of the object you’re attached to. You could use this value as a key for the dictionary.

Unfortunately, this solution has a big downside, which you can see in the following example:

# descriptors3.py
class OneDigitNumericValue():
    def __init__(self):
        self.value = {}

    def __get__(self, obj, type=None) -> object:
        try:
            return self.value[obj]
        except:
            return 0

    def __set__(self, obj, value) -> None:
        if value > 9 or value < 0 or int(value) != value:
            raise AttributeError("The value is invalid")
        self.value[obj] = value

class Foo():
    number = OneDigitNumericValue()

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

In this example, you use a dictionary for storing the value of the number attribute for all your objects inside your descriptor. When you run this code, you’ll see that it runs fine and that the behavior is as expected:

$ python descriptors3.py
3
0
0

Unfortunately, the downside here is that the descriptor is keeping a strong reference to the owner object. This means that if you destroy the object, then the memory is not released because the garbage collector keeps finding a reference to that object inside the descriptor!

You may think that the solution here could be the use of weak references. While that may work, you’d have to deal with the fact that not everything can be weakly referenced and that, when your objects get collected, they disappear from your dictionary.
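As a rough illustration of the weak-reference idea, the sketch below swaps the plain dictionary for a weakref.WeakKeyDictionary. The file name and the values attribute are illustrative choices, not part of the original example.

```python
# weak_descriptor.py (hypothetical file name, not from the original article)
import gc
import weakref


class OneDigitNumericValue:
    def __init__(self):
        # Keys are held weakly, so an entry vanishes once its owner is collected.
        self.values = weakref.WeakKeyDictionary()

    def __get__(self, obj, type=None) -> object:
        if obj is None:
            return self
        return self.values.get(obj, 0)

    def __set__(self, obj, value) -> None:
        if value > 9 or value < 0 or int(value) != value:
            raise AttributeError("The value is invalid")
        self.values[obj] = value


class Foo:
    number = OneDigitNumericValue()


# Fetch the descriptor object itself without going through __get__.
descriptor = Foo.__dict__["number"]

my_foo_object = Foo()
my_foo_object.number = 3
entries_while_alive = len(descriptor.values)

del my_foo_object
gc.collect()  # not strictly needed on CPython, but makes the intent explicit
entries_after_delete = len(descriptor.values)

print(entries_while_alive, entries_after_delete)
```

This only works for owner objects that are hashable and support weak references. A class that defines __slots__ without "__weakref__", for instance, can't be used as a key here, which is one of the limitations mentioned above.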

The best solution here is to simply not store values in the descriptor itself, but to store them in the object that the descriptor is attached to. Try this approach next:

# descriptors4.py
class OneDigitNumericValue():
    def __init__(self, name):
        self.name = name

    def __get__(self, obj, type=None) -> object:
        return obj.__dict__.get(self.name) or 0

    def __set__(self, obj, value) -> None:
        obj.__dict__[self.name] = value

class Foo():
    number = OneDigitNumericValue("number")

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

In this example, when you set a value to the number attribute of your object, the descriptor stores it in the __dict__ attribute of the object it’s attached to using the same name of the descriptor itself.

The only problem here is that when you instantiate the descriptor you have to specify the name as a parameter:

number = OneDigitNumericValue("number")

Wouldn’t it be better to just write number = OneDigitNumericValue()? It might, but if you’re running a version of Python less than 3.6, then you’ll need a little bit of magic here with metaclasses and decorators. If you use Python 3.6 or higher, however, then the descriptor protocol has a new method .__set_name__() that does all this magic for you, as proposed in PEP 487:

__set_name__(self, owner, name)

With this new method, whenever a class is created, Python calls .__set_name__() on each descriptor defined in the class body and automatically passes the attribute name as the name parameter.
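A small sketch can show when this hook actually runs: Python calls it while creating the owning class, not when the descriptor object itself is constructed. The Recorder class and the calls list are hypothetical names used only for this illustration.

```python
# set_name_demo.py (hypothetical file name)
calls = []


class Recorder:
    def __set_name__(self, owner, name):
        # Invoked once per attribute, while the owning class is being created.
        calls.append((owner.__name__, name))
        self.name = name


descriptor = Recorder()
# Still empty: constructing the descriptor alone does not trigger the hook.
calls_after_instantiation = list(calls)


class Foo:
    number = descriptor
    other = Recorder()


print(calls_after_instantiation)
print(calls)
```

The first list stays empty, while the second records one call per attribute in the class body, each with the owner class and the attribute name.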

Now, try to rewrite the former example for Python 3.6 and up:

# descriptors5.py
class OneDigitNumericValue():
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, type=None) -> object:
        return obj.__dict__.get(self.name) or 0

    def __set__(self, obj, value) -> None:
        obj.__dict__[self.name] = value

class Foo():
    number = OneDigitNumericValue()

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

Now, .__init__() has been removed and .__set_name__() has been implemented. This makes it possible to create your descriptor without specifying the name of the internal attribute that you need to use for storing the value. Your code also looks nicer and cleaner now!

Run this example one more time to make sure everything works:

$ python descriptors5.py
3
0
0

This example should run with no problems if you use Python 3.6 or higher.

Why Use Python Descriptors?

Now you know what Python descriptors are and how Python itself uses them to power some of its features, like methods and properties. You’ve also seen how to create a Python descriptor while avoiding some common pitfalls. Everything should be clear now, but you may still wonder why you should use them.

In my experience, I’ve known a lot of advanced Python developers that have never used this feature before and that have no need for it. That’s quite normal because there are not many use cases where Python descriptors are necessary. However, that doesn’t mean that Python descriptors are just an academic topic for advanced users. There are still some good use cases that can justify the price of learning how to use them.

Lazy Properties

The first and most straightforward example is lazy properties. These are properties whose initial values are not loaded until they’re accessed for the first time. Then, they load their initial value and keep that value cached for later reuse.

Consider the following example. You have a class DeepThought that contains a method meaning_of_life() that returns a value after a lot of time spent in heavy concentration:

# slow_properties.py
import random
import time

class DeepThought:
    def meaning_of_life(self):
        time.sleep(3)
        return 42

my_deep_thought_instance = DeepThought()
print(my_deep_thought_instance.meaning_of_life())
print(my_deep_thought_instance.meaning_of_life())
print(my_deep_thought_instance.meaning_of_life())

If you run this code and try to access the method three times, then you get an answer every three seconds, which is the length of the sleep time inside the method.

Now, a lazy property can instead evaluate this method just once when it’s first executed. Then, it will cache the resulting value so that, if you need it again, you can get it in no time. You can achieve this with the use of Python descriptors:

# lazy_properties.py
import random
import time

class LazyProperty:
    def __init__(self, function):
        self.function = function
        self.name = function.__name__

    def __get__(self, obj, type=None) -> object:
        obj.__dict__[self.name] = self.function(obj)
        return obj.__dict__[self.name]

class DeepThought:
    @LazyProperty
    def meaning_of_life(self):
        time.sleep(3)
        return 42

my_deep_thought_instance = DeepThought()
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)

Take your time to study this code and understand how it works. Can you see the power of Python descriptors here? In this example, when you use the @LazyProperty decorator, you’re instantiating a descriptor and passing .meaning_of_life() to it. This descriptor stores both the method and its name as instance variables.

Since it is a non-data descriptor, when you first access the value of the meaning_of_life attribute, .__get__() is automatically called and executes .meaning_of_life() on the my_deep_thought_instance object. The resulting value is stored in the __dict__ attribute of the object itself. When you access the meaning_of_life attribute again, Python will use the lookup chain to find a value for that attribute inside the __dict__ attribute, and that value will be returned immediately.

Note that this works because, in this example, you’ve only used one method .__get__() of the descriptor protocol. You’ve also implemented a non-data descriptor. If you had implemented a data descriptor, then the trick would not have worked. Following the lookup chain, it would have had precedence over the value stored in __dict__. To test this out, run the following code:

# wrong_lazy_properties.py
import random
import time

class LazyProperty:
    def __init__(self, function):
        self.function = function
        self.name = function.__name__

    def __get__(self, obj, type=None) -> object:
        obj.__dict__[self.name] = self.function(obj)
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        pass

class DeepThought:
    @LazyProperty
    def meaning_of_life(self):
        time.sleep(3)
        return 42

my_deep_thought_instance = DeepThought()
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)

In this example, you can see that just implementing .__set__(), even if it doesn’t do anything at all, creates a data descriptor. Now, the trick of the lazy property stops working.
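To compare the two behaviors without waiting on time.sleep(), here is a minimal sketch that counts how often the wrapped function runs. BrokenLazyProperty, the two *_answer attributes, and the call counters are hypothetical names added for this illustration.

```python
# lazy_call_count.py (hypothetical file name)
class LazyProperty:
    """Non-data descriptor: defines only __get__."""

    def __init__(self, function):
        self.function = function
        self.name = function.__name__

    def __get__(self, obj, type=None) -> object:
        obj.__dict__[self.name] = self.function(obj)
        return obj.__dict__[self.name]


class BrokenLazyProperty(LazyProperty):
    """Data descriptor: the no-op __set__ gives it priority over obj.__dict__."""

    def __set__(self, obj, value):
        pass


class DeepThought:
    def __init__(self):
        self.lazy_calls = 0
        self.broken_calls = 0

    @LazyProperty
    def lazy_answer(self):
        self.lazy_calls += 1
        return 42

    @BrokenLazyProperty
    def broken_answer(self):
        self.broken_calls += 1
        return 42


deep_thought = DeepThought()
for _ in range(3):
    deep_thought.lazy_answer
    deep_thought.broken_answer

print(deep_thought.lazy_calls, deep_thought.broken_calls)
```

The non-data descriptor computes the value once and is then shadowed by the cached entry in __dict__, while the data descriptor keeps intercepting every access and recomputes the value each time.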

D.R.Y. Code

Another typical use case for descriptors is to write reusable code and make your code D.R.Y. Python descriptors give developers a great tool to write reusable code that can be shared among different properties or even different classes.

Consider an example where you have five different properties with the same behavior. Each property can be set to a specific value only if it’s an even number. Otherwise, its value is set to 0:

# properties.py
class Values:
    def __init__(self):
        self._value1 = 0
        self._value2 = 0
        self._value3 = 0
        self._value4 = 0
        self._value5 = 0

    @property
    def value1(self):
        return self._value1

    @value1.setter
    def value1(self, value):
        self._value1 = value if value % 2 == 0 else 0

    @property
    def value2(self):
        return self._value2

    @value2.setter
    def value2(self, value):
        self._value2 = value if value % 2 == 0 else 0

    @property
    def value3(self):
        return self._value3

    @value3.setter
    def value3(self, value):
        self._value3 = value if value % 2 == 0 else 0

    @property
    def value4(self):
        return self._value4

    @value4.setter
    def value4(self, value):
        self._value4 = value if value % 2 == 0 else 0

    @property
    def value5(self):
        return self._value5

    @value5.setter
    def value5(self, value):
        self._value5 = value if value % 2 == 0 else 0

my_values = Values()
my_values.value1 = 1
my_values.value2 = 4
print(my_values.value1)
print(my_values.value2)

As you can see, you have a lot of duplicated code here. It’s possible to use Python descriptors to share behavior among all the properties. You can create an EvenNumber descriptor and use it for all the properties like this:

# properties2.py
class EvenNumber:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, type=None) -> object:
        return obj.__dict__.get(self.name) or 0

    def __set__(self, obj, value) -> None:
        obj.__dict__[self.name] = (value if value % 2 == 0 else 0)

class Values:
    value1 = EvenNumber()
    value2 = EvenNumber()
    value3 = EvenNumber()
    value4 = EvenNumber()
    value5 = EvenNumber()

my_values = Values()
my_values.value1 = 1
my_values.value2 = 4
print(my_values.value1)
print(my_values.value2)

This code looks a lot better now! The duplicates are gone and the logic is now implemented in a single place so that if you need to change it, you can do so easily.
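The same descriptor can also be reused across unrelated classes. The following sketch repeats EvenNumber and attaches it to two made-up classes, Chessboard and Bicycle, which are assumptions for illustration only.

```python
# shared_descriptor.py (hypothetical file name and class names)
class EvenNumber:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, type=None) -> object:
        return obj.__dict__.get(self.name) or 0

    def __set__(self, obj, value) -> None:
        obj.__dict__[self.name] = value if value % 2 == 0 else 0


class Chessboard:
    squares_per_side = EvenNumber()


class Bicycle:
    wheels = EvenNumber()
    spokes = EvenNumber()


board = Chessboard()
board.squares_per_side = 8

bike = Bicycle()
bike.wheels = 2
bike.spokes = 37  # odd, so the descriptor stores 0 instead

print(board.squares_per_side, bike.wheels, bike.spokes)
```

Because each value lives in the __dict__ of its own instance under the attribute's own name, the classes don't interfere with each other even though they share the validation logic.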

Conclusion

Now that you know how Python uses descriptors to power some of its great features, you’ll be a more conscious developer who understands why some Python features have been implemented the way they are.

You’ve learned:

What Python descriptors are and when to use them
Where descriptors are used in Python’s internals
How to implement your own descriptors

What’s more, you now know of some specific use cases where Python descriptors are particularly helpful. For example, descriptors are useful when you have a common behavior that has to be shared among a lot of properties, even ones of different classes.

If you have any questions, leave a comment down below or contact me on Twitter! If you want to dive deeper into Python descriptors, then check out the official Python Descriptor HowTo Guide.

↧

Python Anywhere: Python 3.8 now available!

November 27, 2019, 11:49 am

If you signed up since 26 November, you'll have Python 3.8 available on your account -- you can use it just like any other Python version.

If you signed up before then, it's a little more complicated, because adding Python 3.8 to your account requires changing your system image. Each account has an associated system image, which determines which Python versions, Python packages, operating system packages, and so on are available. The new image is called "fishnchips" (after the previous system images, "classic", "dangermouse" and "earlgrey").

What this means is that if we change your system image, the pre-installed Python packages will all get upgraded, which means that any code you have that depends on them might stop working if it's not compatible with the new versions.

Additionally, if you're using virtualenvs, because this update upgrades the point releases of the older Python versions (for example, 3.7.0 gets upgraded to 3.7.5), the update may make your envs stop working -- if so, you'll need to rebuild them.

So, long story short -- we can switch your account over to the new system image, but you may need to rebuild your virtualenvs afterwards if you're using them -- and you may need to update your code to handle newer pre-installed Python packages if you're not using virtualenvs.

There are more details about exactly which package versions are included in which system image on the batteries included page. And if you'd like to switch your account over to fishnchips, just drop us a line using the "Send feedback" button. (If you've read all of the above, and understand that you may have to make code/virtualenv changes, mention that you have in the feedback message as otherwise we'll respond by basically repeating all of the stuff we just said, and asking "are you sure?")

↧

Python Bytes: #158 There's a bounty on your open-source bugs!

November 27, 2019, 12:00 am

↧

Talk Python to Me: #240 A guided tour of the CPython source code

November 27, 2019, 12:00 am

You might use Python every day. But how much do you know about what happens under the covers, down at the C level? When you type something like variable = [], what are the byte-codes that accomplish this? How about the class backing the list itself?
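For a quick peek at those byte-codes, a minimal sketch using the standard library's dis module follows. The exact instruction list depends on the Python version, so no specific output is assumed here.

```python
import dis

# Compile the statement and collect the names of the resulting instructions.
instructions = list(dis.get_instructions("variable = []"))
opnames = [instruction.opname for instruction in instructions]

print(opnames)
```

On CPython you should see a BUILD_LIST instruction that creates the empty list, followed by a STORE_NAME that binds it to variable.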

↧

Janusworx: #100DaysOfCode, Day 008 – The Collections Module

November 27, 2019, 11:16 pm

Finally feels like something is happening.
Did two hours today.

I don’t know if what I do is cheating, but I darn near print everything to see output and then iterate on the errors.

I understood how to work with csv files and process them and why ordered dictionaries can be useful.
I used that to process my csv file and read and print select fields.

Will work on sorting them somehow and figure out frequency based on ratings tomorrow.

Pleased with myself. Today was a good day!

↧

Python Circle: Improve Your Python Practices: Debugging, Testing, and Maintenance

November 27, 2019, 11:45 pm

improving your python skills, debugging, testing and practice, pypi

↧

Wingware Blog: Navigating Python Code with Wing Pro 7 (part 3 of 3)

November 27, 2019, 5:00 pm

Last week and the week before, we looked at some of the code navigation features in Wing, including goto-definition, find uses, project-wide search, code index menus, and the Source Browser.

This week we'll finish up this mini-series by looking at how to quickly and easily find and open files or visit symbols in Python code by typing a name fragment.

Project Configuration

The features described here assume that you have used Add Existing Directory in the Project menu to add your source code to your project. Typically the project should contain the code you are actively working on. Packages that your code uses can be left out of the project, unless you anticipate often wanting to open or search files in them. Wing will still be able to find them through the Python Path, as needed for auto-completion, code warnings, and other purposes.

Open From Project

Open from Project from the File menu is typically the easiest way to navigate to a file by name. This displays a dialog that lists the project files whose names match a fragment.

Fragments can be abbreviations of the file name and may match enclosing directory names if they contain / or \. The arrow keys navigate the list and pressing Enter opens the selected file.

Find Symbol

A similar interface is available to find Python code symbols by name. For the current file, this is Find Symbol in the Source menu. For all project files, use Find Symbol in Project instead.

That's it for now! We'll be back soon with more Wing Tips for Wing Python IDE.

As always, please don't hesitate to email support@wingware.com if you run into problems or have any questions.

↧

Codementor: How I learned Python

November 28, 2019, 10:50 am

About me Hi, I'm Kai and I am currently between my Bachelor's and my Master's Degree in Computer Engineering / Science. I want to help people to develop their skills in python. Why I wanted to...

↧

Reuven Lerner: My Black Friday sale is live! Take 50% off any course in Python or data science

November 28, 2019, 1:00 pm

As promised, the Black Friday sale has begun in my online store. Through Monday, my courses and books are all 50% off with the coupon code BF2019.

This includes all eight of the video courses:

Intro Python: Fundamentals (basic syntax and data structures)
Intro Python: Functions (*NEW* writing and using functions)
Comprehending comprehensions (using list, set, and dict comprehensions)
Object-oriented Python (classes, instances, attributes, and methods)
NumPy (using NumPy for numeric analysis)
Pandas (*NOW COMPLETE* using Pandas for data analysis)
Understanding and mastering Git
Practice Makes Regexp (50 exercises to improve your use of regular expressions)

It also includes all six cohorts of Weekly Python Exercise that will start in 2020! Pay only $50 (rather than $100) per cohort with the coupon code BF2019.

People have had very kind things to say about my courses. For example:

“The exercises are perfect for me because they are right in my “wheelhouse”. I have enough background knowledge that the context of the problems is relevant in my experience, yet I can’t just rattle off the solutions instantly. I have to puzzle over them as I try to solve them. I do usually achieve my goal of coming up with a solution that I am pleased with prior to the answer coming out on the following Monday.” — Doug (about WPE)
“I was a total python noob when I started. I just wanted to learn the syntax, how to look at problems and find the solution. You provided both. Of course I did a lot of reading too but your teaching is instrumental in drilling some concepts into our brains.” — Jean-Pierre (about WPE)
“It was an amazing course. Apart from comprehensions, you have provided lots of information about Python programming. The exercises were really challenging.” — Jonayed (about “Comprehending Comprehensions”)
“I really liked the way you went slow and explained everything in microscopic detail, acknowledging where the NumPy syntax is non-intuitive.” — David (about “NumPy”)

Again, want to take advantage of this discount? Just use the coupon code BF2019 at checkout.

But be sure to do it in the coming days — because as of Tuesday, this year’s Black Friday sale will be completely over.

The post My Black Friday sale is live! Take 50% off any course in Python or data science appeared first on Reuven Lerner.

↧

Codementor: teach your kids to build their own game with Python - 2

November 28, 2019, 1:43 pm

a series of tutorials that teaches kids/beginners how to develop the famous Space Invaders game with Python.

↧

Programiz: Python CSV

November 28, 2019, 8:22 pm

In this tutorial, we will learn how to read and write into CSV files in Python with the help of examples.

↧

Quansight Labs Blog: Variable Explorer improvements in Spyder 4

November 28, 2019, 5:00 pm

Spyder 4 will be released very soon with lots of interesting new features that you'll want to check out, reflecting years of effort by the team to improve the user experience. In this post, we will be talking about the improvements made to the Variable Explorer.

These include the brand new Object Explorer for inspecting arbitrary Python variables, full support for MultiIndex dataframes with multiple dimensions, the ability to filter and search for variables by name and type, and much more.

It is important to mention that several of the above improvements were made possible through integrating the work of two other projects. Code from gtabview was used to implement the multi-dimensional Pandas indexes, while objbrowser was the foundation of the new Object Explorer.

↧

Janusworx: #100DaysOfCode, Day 009 – The Collections Module

November 29, 2019, 3:43 am

I cheated and peeked again at the solution :)
After five days, I think I needed help.
But it was still a very good day.
I learned lots.

When I started this little project, I saw videos about defaultdicts and namedtuples and then kinda forgot that they would be of some use to me in my project itself.
That realisation came yesterday.
Like they say, it happened very slowly and then all at once! I wrote up a quick workflow of how the program was supposed to work on paper.
And then I had a decision to make.
Do I peek at the answer? or not?
In the end, I did.
I wanted confirmation of my thought process, and realised that if I was going to figure out the code itself, this would take much, much longer.
Besides, writing Python will come to me if I stick with this as I have been doing, so no guilt about copying code.

The instructors did solve the problem, exactly the way I envisioned it in my head :)
And the code, to my inexperienced fingers was tricky. (I don’t know lambdas or expressions in general and the instructor uses them liberally; a dictionary expression to populate a dict and a lambda to sort a list)
However I take small comfort in the fact, that I did, write one third of the code all by myself.
Just goes to show, how little fluency I have with the language.

But still! I am happy I got my thinking straight :)
Onwards!

↧

Stack Abuse: Unit Testing in Python with Unittest

November 29, 2019, 5:33 am

Introduction

In almost all fields, products are thoroughly tested before being released to the market to ensure their quality and that they work as intended.

Medicine, cosmetic products, vehicles, phones, laptops are all tested to ensure that they uphold a certain level of quality that was promised to the consumer. Given the influence and reach of software in our daily lives, it is important that we test our software thoroughly before releasing it to our users to avoid issues coming up when it is in use.

There are various ways and methods of testing our software, and in this article we will concentrate on testing our Python programs using the Unittest framework.

Unit Testing vs Other Forms of Testing

There are various ways to test software, which are broadly grouped into functional and non-functional testing.

Non-functional testing: Meant to verify and check the non-functional aspects of the software such as reliability, security, availability, and scalability. Examples of non-functional testing include load testing and stress testing.
Functional testing: Involves testing our software against the functional requirements to ensure that it delivers the functionality required. For example, we can test if our shopping platform sends emails to users after placing their orders by simulating that scenario and checking for the email.

Unit testing falls under functional testing alongside integration testing and regression testing.

Unit testing refers to a method of testing where software is broken down into different components (units) and each unit is tested functionally and in isolation from the other units or modules.

A unit here refers to the smallest part of a system that achieves a single function and is testable. The goal of unit testing is to verify that each component of a system performs as expected which in turn confirms that the entire system meets and delivers the functional requirements.

Unit testing is generally performed before integration testing since, in order to verify that parts of a system work well together, we first have to verify that they work as expected individually. It is also generally carried out by the developers building the individual components during the development process.

Benefits of Unit Testing

Unit testing is beneficial in that it helps catch bugs and issues early in the development process, which eventually speeds it up.

The cost of fixing bugs identified during unit testing is also low as compared to fixing them during integration testing or while in production.

Unit tests also serve as documentation of the project by defining what each part of the system does through well written and documented tests. When refactoring a system or adding features, unit tests help guard against changes that break the existing functionality.

Unittest Framework

Inspired by the JUnit testing framework for Java, unittest is a testing framework for Python programs that has been bundled with the Python distribution since Python 2.1. It is sometimes referred to as PyUnit. The framework supports the automation and aggregation of tests and common setup and shutdown code for them.

It achieves this and more through the following concepts:

Test Fixture: Defines the preparation required for the execution of the tests and any actions that need to be done after the conclusion of a test. Fixtures can include database setup and connection, creation of temporary files or directories, and the subsequent cleanup or deletion of the files after the test has been completed.
Test Case: Refers to the individual test that checks for a specific response in a given scenario with specific inputs.
Test Suite: Represents an aggregation of test cases that are related and should be executed together.
Test Runner: Coordinates the execution of the tests and provides the results of the testing process to the user through a graphical user interface, the terminal or a report written to a file.
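The four concepts above can be sketched in a few lines. This is only an illustration under assumed names (StringMethodsTestCase and its tests are made up), and it sends the report to an in-memory buffer rather than the terminal.

```python
import io
import unittest


class StringMethodsTestCase(unittest.TestCase):
    def setUp(self):
        # Test fixture: prepare fresh state before every test case.
        self.text = "unittest"

    def test_upper(self):
        # Test case: one scenario with a specific expected response.
        self.assertEqual(self.text.upper(), "UNITTEST")

    def test_startswith(self):
        self.assertTrue(self.text.startswith("unit"))


# Test suite: group related test cases so they run together.
suite = unittest.TestSuite()
suite.addTest(StringMethodsTestCase("test_upper"))
suite.addTest(StringMethodsTestCase("test_startswith"))

# Test runner: execute the suite and write the report to a stream.
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)

print(stream.getvalue())
print(result.testsRun, result.wasSuccessful())
```

In everyday use you rarely build the suite by hand, since unittest.main() discovers the test methods and runs them for you.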

unittest is not the only testing framework for Python out there. Others include Pytest, Robot Framework, Lettuce for BDD, and Behave Framework.

If you're interested in reading more about Test-Driven Development in Python with PyTest, we've got you covered!

Unittest Framework in Action

We are going to explore the unittest framework by building a simple calculator application and writing the tests to verify that it works as expected. We will use the Test-Driven Development process by starting with the tests then implementing the functionality to make the tests pass.

Even though it is a good practice to develop our Python application in a virtual environment, for this example it will not be mandatory since unittest ships with the Python distribution and we will not need any other external packages to build our calculator.

Our calculator will perform simple addition, subtraction, multiplication, and division operations between two integers. These requirements will guide our functional tests using the unittest framework.

We will test the four operations supported by our calculator separately and write the tests for each in a separate test suite since the tests of a particular operation are expected to be executed together. Our test suites will be housed in one file and our calculator in a separate file.

Our calculator will be a SimpleCalculator class with functions to handle the four operations expected of it. Let us begin testing by writing the tests for the addition operation in our test_simple_calculator.py:

import unittest
from simple_calculator import SimpleCalculator

class AdditionTestSuite(unittest.TestCase):
    def setUp(self):
        """ Executed before every test case """
        self.calculator = SimpleCalculator()

    def tearDown(self):
        """ Executed after every test case """
        print("\ntearDown executing after the test case. Result:")

    def test_addition_two_integers(self):
        result = self.calculator.sum(5, 6)
        self.assertEqual(result, 11)

    def test_addition_integer_string(self):
        result = self.calculator.sum(5, "6")
        self.assertEqual(result, "ERROR")

    def test_addition_negative_integers(self):
        result = self.calculator.sum(-5, -6)
        self.assertEqual(result, -11)
        self.assertNotEqual(result, 11)

# Execute all the tests when the file is executed
if __name__ == "__main__":
    unittest.main()

We start by importing the unittest module and creating a test suite (AdditionTestSuite) for the addition operation.

In it, we create a setUp() method that is called before every test case to create our SimpleCalculator object that will be used to perform the calculations.

The tearDown() method is executed after every test case and since we do not have much use for it at the moment, we will just use it to print out the results of each test.

The functions test_addition_two_integers(), test_addition_integer_string() and test_addition_negative_integers() are our test cases. The calculator is expected to add two positive or negative integers and return the sum. When presented with an integer and a string, our calculator is supposed to return an error.

assertEqual() and assertNotEqual() are methods used to validate the output of our calculator. The assertEqual() method checks whether the two values provided are equal. In our case, we expect the sum of 5 and 6 to be 11, so we compare this to the value returned by our calculator.

If the two values are equal, the test has passed. Other assertion functions offered by unittest include:

assertTrue(a): Checks whether the expression provided is true
assertGreater(a, b): Checks whether a is greater than b
assertNotIn(a, b): Checks whether a is not in b
assertLessEqual(a, b): Checks whether a is less than or equal to b
etc...

A list of these assertions can be found in this cheat sheet.
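As a quick sketch of the assertions listed above, the following hypothetical test case (AssertionExamples is a made-up name) exercises each of them and runs quietly against an in-memory stream.

```python
import io
import unittest


class AssertionExamples(unittest.TestCase):
    def test_assertions(self):
        self.assertTrue(3 > 2)
        self.assertGreater(10, 5)
        self.assertNotIn("x", "abc")
        self.assertLessEqual(4, 4)

    def test_not_in_fails_when_member_present(self):
        # assertNotIn raises AssertionError when the first argument
        # IS found in the second.
        with self.assertRaises(AssertionError):
            self.assertNotIn("a", "abc")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionExamples)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

print(result.testsRun, result.wasSuccessful())
```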

When we execute the test file, this is the output:

$ python3 test_simple_calulator.py

tearDown executing after the test case. Result:
E
tearDown executing after the test case. Result:
E
tearDown executing after the test case. Result:
E
======================================================================
ERROR: test_addition_integer_string (__main__.AdditionTestSuite)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_simple_calulator.py", line 22, in test_addition_integer_string
    result = self.calculator.sum(5, "6")
AttributeError: 'SimpleCalculator' object has no attribute 'sum'

======================================================================
ERROR: test_addition_negative_integers (__main__.AdditionTestSuite)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_simple_calulator.py", line 26, in test_addition_negative_integers
    result = self.calculator.sum(-5, -6)
AttributeError: 'SimpleCalculator' object has no attribute 'sum'

======================================================================
ERROR: test_addition_two_integers (__main__.AdditionTestSuite)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_simple_calulator.py", line 18, in test_addition_two_integers
    result = self.calculator.sum(5, 6)
AttributeError: 'SimpleCalculator' object has no attribute 'sum'

----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (errors=3)

At the top of the output, we can see the execution of the tearDown() function through the printing of the message we specified. This is followed by the letter E and error messages arising from the execution of our tests.

A test has three possible outcomes: it can pass, fail, or encounter an error. The unittest framework indicates the three outcomes using:

A full-stop (.): Indicates a passing test
The letter ‘F’: Indicates a failing test
The letter ‘E’: Indicates an error occurred during the execution of the test

In our case, we are seeing the letter E, meaning that our tests encountered errors during execution. We are receiving errors because we have not yet implemented the addition functionality of our calculator, so let us add it:

class SimpleCalculator:
    def sum(self, a, b):
        """ Function to add two integers """
        return a + b

Our calculator is now ready to add two numbers, but to be sure it will perform as expected, let us remove the tearDown() function from our tests and run our tests once again:

$ python3 test_simple_calulator.py
E..
======================================================================
ERROR: test_addition_integer_string (__main__.AdditionTestSuite)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_simple_calulator.py", line 22, in test_addition_integer_string
    result = self.calculator.sum(5, "6")
  File "/Users/robley/Desktop/code/python/unittest_demo/src/simple_calculator.py", line 7, in sum
    return a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'

----------------------------------------------------------------------
Ran 3 tests in 0.002s

FAILED (errors=1)

Our errors have reduced from 3 to just 1. The report summary on the first line, E.., indicates that one test resulted in an error and could not complete execution, while the remaining two passed. To make the first test pass, we have to refactor our sum function as follows:

    def sum(self, a, b):
        if isinstance(a, int) and isinstance(b, int):
            return a + b

When we run our tests one more time:

$ python3 test_simple_calulator.py
F..
======================================================================
FAIL: test_addition_integer_string (__main__.AdditionTestSuite)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_simple_calulator.py", line 23, in test_addition_integer_string
    self.assertEqual(result, "ERROR")
AssertionError: None != 'ERROR'

----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (failures=1)

This time, our sum function executes to completion but our test fails. This is because sum() does not return anything when one of the inputs is not an integer, and a Python function that returns nothing evaluates to None. Our assertion compares None to "ERROR" and since they are not equal, the test fails. To make our test pass we have to return the error in our sum() function:

    def sum(self, a, b):
        if isinstance(a, int) and isinstance(b, int):
            return a + b
        else:
            return "ERROR"

And when we run our tests:

$ python3 test_simple_calulator.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK

All our tests pass now, and we get three full-stops to indicate that all three tests for the addition functionality are passing. The subtraction, multiplication, and division test suites are implemented in a similar fashion.
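For reference, the final state of the addition code and its tests can be put together in a single file. This is a sketch: the class and test names follow the output above, but the test bodies are reconstructed from the tracebacks, the expected value of -11 for the negative case is an assumption, and the programmatic runner at the end is added here only to check the result.

```python
import unittest


class SimpleCalculator:
    def sum(self, a, b):
        """Add two integers; return "ERROR" for any other input."""
        if isinstance(a, int) and isinstance(b, int):
            return a + b
        else:
            return "ERROR"


class AdditionTestSuite(unittest.TestCase):
    def setUp(self):
        """Executed before every test case"""
        self.calculator = SimpleCalculator()

    def test_addition_two_integers(self):
        result = self.calculator.sum(5, 6)
        self.assertEqual(result, 11)

    def test_addition_integer_string(self):
        result = self.calculator.sum(5, "6")
        self.assertEqual(result, "ERROR")

    def test_addition_negative_integers(self):
        result = self.calculator.sum(-5, -6)
        # Assumed expectation; the original assertion is not shown above
        self.assertEqual(result, -11)


# Run the suite and collect the outcome for inspection
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AdditionTestSuite)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.failures), len(result.errors))
```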

We can also test if an exception is raised. For instance, when a number is divided by zero, the ZeroDivisionError exception is raised. In our DivisionTestSuite, we can confirm whether the exception was raised:

class DivisionTestSuite(unittest.TestCase):
    def setUp(self):
        """ Executed before every test case """
        self.calculator = SimpleCalculator()

    def test_divide_by_zero_exception(self):
        with self.assertRaises(ZeroDivisionError):
            self.calculator.divide(10, 0)

The test_divide_by_zero_exception() test calls the divide(10, 0) method of our calculator and confirms that the exception was indeed raised. We can execute this single test from the DivisionTestSuite in isolation, as follows:

$ python3 -m unittest test_simple_calulator.DivisionTestSuite.test_divide_by_zero_exception
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
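The divide() method itself is not shown in this article, so the sketch below assumes the simplest possible implementation, plain division, which lets Python raise ZeroDivisionError on its own. It also shows that assertRaises(), when used as a context manager, exposes the caught exception for further checks.

```python
import unittest


class SimpleCalculator:
    def divide(self, a, b):
        # Hypothetical implementation: Python raises ZeroDivisionError
        # by itself when b is 0
        return a / b


class DivisionTestSuite(unittest.TestCase):
    def setUp(self):
        """Executed before every test case"""
        self.calculator = SimpleCalculator()

    def test_divide_by_zero_exception(self):
        with self.assertRaises(ZeroDivisionError) as context:
            self.calculator.divide(10, 0)
        # The caught exception is available as context.exception
        self.assertIn("division by zero", str(context.exception))


# Run the suite and collect the outcome for inspection
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DivisionTestSuite)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

Had divide(10, 0) returned normally, assertRaises() would have reported a failure, because the expected exception never occurred.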

The full division functionality test suite can be found in the gist linked below, alongside the test suites for the multiplication and subtraction functionality.

Conclusion

In this article, we have explored the unittest framework and identified the situations where it is used when developing Python programs. The unittest framework, also known as PyUnit, ships with the Python distribution by default, unlike third-party testing frameworks. Following a TDD approach, we wrote the tests for a simple calculator, executed the tests, and then implemented the functionality to make the tests pass.

The unittest framework provided the functionality to create and group test cases and check the output of our calculator against the expected output to verify that it's working as expected.

The full calculator and test suites can be found in this gist on GitHub.

↧

Reuven Lerner: Black Friday: All of my Python courses are 50% off!

November 26, 2019, 6:09 am


This coming Friday is “Black Friday,” when many stores offer big discounts on their products. I’m happy to say that from Friday through Monday, every course in my online store will be 50% off.

This includes all eight of the video courses in my online store:

Intro Python: Fundamentals (basic syntax and data structures)
Intro Python: Functions (writing and using functions)
Comprehending comprehensions (using list, set, and dict comprehensions)
Object-oriented Python (classes, instances, attributes, and methods)
NumPy (using NumPy for numeric analysis)
Pandas (using Pandas for data analysis)
Understanding and mastering Git
Practice Makes Regexp (50 exercises to improve your use of regular expressions)

There’s a new course in there — my brand-new “Intro Python: Functions” course tells you everything you need to understand writing and using Python functions. It’s aimed at people with programming experience but without a lot of experience with Python.

Oh, and you might also have noticed that my Pandas course is now complete, weighing in at 12.5 hours of videos (!), along with a large number of exercises.

But wait, there’s more: In 2020, I’ll be offering all six versions of Weekly Python Exercise (3 for beginners, and 3 for more experienced developers). If you buy them during this sale, you’ll save 50%. The cohorts might not be starting for several months, but you can lock in this price, and then begin the course along with the other students when it begins.

I’ll have more information about my Black Friday sale later this week. And I hope that this Black Friday will be an additional milestone as you improve your Python fluency.

Questions? Comments? Thoughts? Contact me at reuven@lerner.co.il, or on Twitter as @reuvenmlerner .

The post Black Friday: All of my Python courses are 50% off! appeared first on Reuven Lerner.

↧