Channel: Planet Python

Damián Avila: RISE 5.4.1 is out!


We're pleased to announce the release of RISE 5.4.1!

RISE lets you show your Jupyter notebook rendered as an executable Reveal.js-based slideshow. It is your very same notebook, but in a slidy way!



Weekly Python StackOverflow Report: (cxl) stackoverflow python report


Robert Collins: Monads and Python


When I wrote this I was going to lead in by saying: I’ve been spending a chunk of time recently thinking about how best to represent Monads in Python. Then I forgot I had this draft for 3 years. So.. I *did* spend a chunk of time. Perhaps it will be of interest anyway… though I had not finished it (otherwise it wouldn’t still be a draft, would it :))

Why would I do this? Because there are some nifty things you get with them: you get some very mature patterns for dealing with errors (Either, Maybe), with nondeterminism (List), and with DSLs (Free).

Why wouldn’t you do this? Because you get some baggage. There are two bits in particular. Firstly, Monads solve a problem Python doesn’t have. Consider:

x = read_file('fred')
y = delete_file('fred')

In Haskell, the compiler is free to run those functions in either order as there is no data dependency between them. In Python, it is not – the order is specified directly by the code. Haskell requires a data dependency to force ordering (and in fact RealWorld in order to distinguish different invocations of IO). So to define a sequence here, Haskell defines a new operator (really just an infix function) called bind (>>= in Haskell). You then create a function to run after the monad does whatever it needs to do. Whenever you see code like this in Haskell:

do x <- action1
     y <- action2
     return x + y

The compiler is creating functions for you using lambdas. And while I could dive into that, it's entirely irrelevant in Python: it's solving a non-problem. But the context is important, so I’m going to show the general shape of it.

action1 >>=
  \x -> action2 >>=
    \y -> return (x + y)

A direct transliteration into Python is possible in a few ways. One of the key things though is to preserve the polymorphism – bind is dependent on the monad instance in use, and the original code is valid under many instances.

def action1(m): return m.unit(1)
def action2(m): return m.unit(2)
m = MonadInstance()
action1(m).bind(
    lambda m, x: action2(m).bind(
        lambda m, y: m.unit(x+y)))

In this style functions in a Monad would take a monad instance as a parameter and use that to access the type. Note in particular that the behavior of bind is involved at every step here.

I’ve recently been diving down into Effect as part of preparing my talk for Kiwi PyCon. Effect was described to me as modelling the Free monad, and I wrote my talk on that basis – only to realise, in doing so, that it doesn’t. The Free monad models a domain specific language – it lets you write interpreters for such a language, and thanks to the lazy nature of Haskell, you essentially end up iterating over a (potentially) infinitely recursive structure until the program ends – the Free bind method steps forward once. This feels very similar to Effect in some ways. It's also used (in some cases) for similar reasons: to let more code be pure and thus reliably testable.

But writing an interpreter for Effect is very different to writing one for Free. Compare these blogposts with the howto for Effect. In the Free monad the interpreter can hand off to different interpreters at any point. In Effect, a single performer is given just a single Intent, and Intents just return plain values. It's up to the code that processes values and returns new Effects to perform flow control.

That said, they are very similar in feel: it feels like one is working with data, not code. Except, in Haskell, it's possible to use do notation to write code in the Free monad in imperative style… but Effect provides no equivalent facility.

This confused me, so I reached out to Chris and we had a really fascinating chat about it. He pointed me at another way that Haskellers separate out IO for testing. That approach is to create a class specifically for the IO in your code and have two implementations: one for production and one for test. In Python:

class Impure:
    def readline(self):
        raise NotImplementedError(self.readline)
...
class Production:
    def readline(self):
        return sys.stdin.readline()
...
class Test:
    def __init__(self, inputs):
        self.inputs = inputs
    def readline(self):
        return self.inputs.pop(0)
...

Then you write code using that directly.

def echo(impl):
    impl.writeline(impl.readline())

This seems to be a much more direct way to achieve the goal of being able to write pure testable code. And it got me thinking about the actual basic premise of porting monads to Python.

The goal is to be able to write Pythonic, pithy code that takes advantage of the behaviour in the bind for that monad. Let's consider Maybe.

class Something:
    def __init__(self, thing):
        self.thing = thing
    @classmethod
    def unit(klass, thing):
        return Something(thing)
    def bind(self, l):
        return l(self, self.thing)
    def __str__(self):
        return str(self.thing)

def action1(m): return m.unit(1)
def action2(m): return m.unit(2)

m = Something
r = action1(m).bind(
    lambda m, x: action2(m).bind(
        lambda m, y: m.unit(x+y)))
print("%s" % r)
# 3

Trivial so far, though having to wrap the output types in our functions is a bit ick. Let's add Nothing to our example.

class Nothing:
    def bind(self, l):
        return self
    def __str__(self):
        return "Nothing"

def action1(m): return Nothing()
def action2(m): return m.unit(2)

m = Something
r = action1(m).bind(
    lambda m, x: action2(m).bind(
        lambda m, y: m.unit(x+y)))
print("%s" % r)
# Nothing

The programmable semicolon aspect of monads comes in from the bind method – between each bit of code we write, Something chooses to call forward, and Nothing bypasses our code entirely.

But we can’t use that unless we start writing our normally straightforward code such that every statement becomes a closure – which we don’t want. So we want to interfere with the normal process by which Python chooses to run new code.

There is a mechanism that Python gives us where we get control over that: generators. While they are often used for concurrency, they can also be used for flow control.

Representing monads as generators has been done here, here, and don’t forget other languages like Scala.
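As a minimal sketch of that generator idea (reusing the Something and Nothing classes above; maybe_do is a helper name I'm inventing for illustration, and it relies on Python 3's StopIteration.value), a decorator can drive the generator, unwrapping each yielded value and short circuiting the moment a Nothing appears:

def maybe_do(gen_fn):
    """Drive a generator, feeding back unwrapped values; stop at the first Nothing."""
    def run(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        try:
            m = next(gen)
            while True:
                if isinstance(m, Nothing):
                    return m            # short circuit: skip the rest of the body
                m = gen.send(m.thing)   # unwrap and resume the generator
        except StopIteration as stop:
            return stop.value           # the generator's return value
    return run

@maybe_do
def add_two():
    x = yield Something(1)
    y = yield Something(2)
    return Something(x + y)

print(add_two())  # 3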

The problem is that it's still not regular Python code, and it's still somewhat mental gymnastics. Natural for someone that's used to thinking in those patterns, and it works beautifully in Haskell, or Rust, or other languages.

There are two fundamental underpinnings behind this for Haskell: type control from context rather than as part of the call signature, and do notation which makes code using it look like Python. In Python we are losing the notation, but gaining the bind operator on the Maybe monad which short circuits Nothing to Nothing across an arbitrary depth of computation.

What else short circuits across an arbitrary depth of computation?

Exceptions.

This won’t give the full generality of Monads (for instance, a Monad that short circuits up to 50 steps but no more is possible) – but it's possibly enough for the common cases.

Python basically is do notation, and if we just had some way of separating out the side effects from the pure code, we’d have pure code. And we have that from above.

So there you have it, a three year old mull: perhaps we shouldn’t port Monads to Python at all, and instead just:

  • Write pure code
  • Use a strategy object to represent impure activity
  • Use exceptions to handle short-circuiting of code (a sketch follows)
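As a minimal sketch of those three points together (the class and function names below are mine, purely illustrative): the IO strategy object is injected, the logic stays pure, and an exception plays the role Nothing plays in Maybe:

import sys

class ShortCircuit(Exception):
    """Abandons the rest of the computation, like Nothing in Maybe."""

class ProductionIO:
    def readline(self):
        return sys.stdin.readline()

class TestIO:
    def __init__(self, inputs):
        self.inputs = inputs
    def readline(self):
        if not self.inputs:
            raise ShortCircuit("no more input")
        return self.inputs.pop(0)

def read_age(io):
    # Pure apart from the injected strategy object; any failure short circuits.
    line = io.readline().strip()
    if not line.isdigit():
        raise ShortCircuit("not a number")
    return int(line)

try:
    print(read_age(TestIO(["42\n"])))    # 42
    print(read_age(TestIO(["oops\n"])))  # raises ShortCircuit
except ShortCircuit:
    print("Nothing")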

I think there is room, if we wanted to, to do a really nice, syntax-integrated Monad style facility in Python (and Maybe would be a great reference case for it); generator overloading – possibly async – might let a nicer thing be done, but I haven’t investigated that yet.


REPL|REBL: Dictionaries — An almost complete guide to Python's key:value store


Dictionaries are key-value stores, meaning they store and allow retrieval of data (or values) through a unique key. This is analogous to a real dictionary, where you look up definitions (data) using a given key — the word. Unlike a language dictionary however, keys in Python dictionaries are not alphabetically sorted.

From Python 3.6 onwards dictionaries are ordered, in that elements are stored and retrieved in the order in which they are added. This usually only has consequences for iterating (see later).
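For example (an illustrative session), insertion order is preserved regardless of the keys themselves:

>>> d = {}
>>> d['zebra'] = 1
>>> d['apple'] = 2
>>> d
{'zebra': 1, 'apple': 2}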

Anything which can be stored in a Python variable can be stored in a dictionary value. That includes mutable types such as list and even dict — meaning you can nest dictionaries inside one another. In contrast, keys must be hashable¹ and immutable — the object hash must not change once calculated. This means list or dict objects cannot be used for dictionary keys, however a tuple is fine.
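For example (illustrative), a tuple works as a key while a list raises a TypeError:

>>> d = {('a', 1): 'tuple keys are fine'}
>>> d[('a', 1)]
'tuple keys are fine'
>>> d[['a', 1]] = 'list keys are not'
TypeError: unhashable type: 'list'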

Creating

Dictionaries can be defined using both literal or constructor syntax. Literal syntax is a bit cleaner, but there are situations where dict() is useful.

d = {}       # An empty dictionary, using literal syntax
d = dict()   # An empty dictionary, using object syntax

You can add initial items to a dictionary by passing the key-value pairs at creation time. The following two syntaxes are equivalent, and will produce an identical dictionary.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d
{'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d = dict(key1='value1', key2='value2', key3=3)
>>> d
{'key1': 'value1', 'key2': 'value2', 'key3': 3}

However, note that keys in the dict() syntax are limited to valid keyword parameter names only — for example, you cannot use anything which would not be a valid variable name (such as numbers, names starting with a number, or punctuation).

>>> dict(1='hello')
SyntaxError: invalid syntax
>>> dict(1a='hello')
SyntaxError: invalid syntax

As always in Python, keyword parameters are interpreted as string names, ignoring any variables defined with the same name.

>>> a = 12345
>>> {a: 'test'}
{12345: 'test'}
>>> dict(a='test')
{'a': 'test'}

For this reason dict() is only really useful where you have very restricted key names. This is often the case, but you can avoid these annoyances completely by sticking with the literal {} syntax.

Adding

You can add items to a dictionary by assigning a value to a key, using the square bracket [] syntax.

>>> d = {}
>>> d['this'] = 'that'
>>> d
{'this': 'that'}

Assigning to keys which already exist will replace the existing value for that key.

>>> d = {}
>>> d['this'] = 'that'
>>> d['this'] = 'the other'
>>> d
{'this': 'the other'}

Retrieving

Values for a given key can be retrieved by key, using the square bracket [] syntax.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d['key1']
'value1'

Retrieving an item does not remove it from the dictionary.

>>> d
{'key1': 'value1', 'key2': 'value2', 'key3': 3}

The value returned is the same object stored in the dictionary, not a copy. This is important to bear in mind when using mutable objects such as lists as values.

>>> d = {'key1': [1, 2, 3, 4]}
>>> l = d['key1']
>>> l
[1, 2, 3, 4]
>>> l.pop()
4
>>> d
{'key1': [1, 2, 3]}

Notice that changes made to the returned list continue to be reflected in the dictionary. The retrieved list and the value in the dictionary are the same object.

Removing

To remove an item from a dictionary you can use del with the square bracket syntax, using the key to access the element.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> del d['key1']
>>> d
{'key2': 'value2', 'key3': 3}

You can also remove items from a dictionary by using .pop(<key>). This removes the given key from the dictionary, and returns the value.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d.pop('key1')
'value1'
>>> d
{'key2': 'value2', 'key3': 3}

Counting

The number of elements in a dictionary can be found by using len().

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> len(d)
3

The lengths of a dictionary's .keys(), .values() and .items() views are always equal.
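For example:

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> len(d.keys()), len(d.values()), len(d.items())
(3, 3, 3)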

View objects

The keys, values and items from a dictionary can be accessed using the .keys(), .values() and .items() methods. These methods return view objects which provide a view on the source dictionary.

There are separate view objects for each of keys, values and items: dict_keys, dict_values and dict_items respectively.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d.keys()
dict_keys(['key1', 'key2', 'key3'])
>>> d.values()
dict_values(['value1', 'value2', 3])

dict_items provides a view over tuples of (key, value) pairs.

>>> d.items()
dict_items([('key1', 'value1'), ('key2', 'value2'), ('key3', 3)])

These view objects are all iterable. They are also dynamic— changes to the original dictionary continue to be reflected in the view after it is created.

>>> k = d.keys()
>>> k
dict_keys(['key1', 'key2', 'key3'])
>>> d['key4'] = 'value4'
>>> k
dict_keys(['key1', 'key2', 'key3', 'key4'])

This is different to Python 2.7, where .keys(), .values() and .items() returned a static list.

Membership

To determine if a given key is present in a dictionary, you can use the in keyword. This will return True if the given key is found, False if it is not.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> 'key2' in d
True
>>> 'key5' in d
False

You can also check whether a given value or key-value pair is in a dictionary by using the .values() and .items() views.

>>> 'value1' in d.values()
True
>>> 'value5' in d.values()
False
>>> ('key1', 'value1') in d.items()
True
>>> ('key3', 'value5') in d.items()
False

These lookups are less efficient than key-based lookups on dictionaries, and needing to look up values or items is often an indication that a dict is not a good store for your data.

Lists from dictionaries

To get a list of a dictionary's keys, values or items, we can take the dict_keys, dict_values or dict_items view objects and pass them to list().

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> list(d.keys())
['key1', 'key2', 'key3']
>>> list(d.values())
['value1', 'value2', 3]
>>> list(d.items())
[('key1', 'value1'), ('key2', 'value2'), ('key3', 3)]

Converting the view objects to lists breaks the link to the original dictionary, so further updates to the dictionary will not be reflected in the list.

Dictionaries from lists

Similarly, lists can be used to generate dictionaries. The simplest approach is using a list of 2-tuples, where the first element in each tuple is used for the key and the second for the value.

>>> l = [('key1', 'value1'), ('key2', 'value2'), ('key3', 3)]
>>> d = dict(l)   # Pass the list to the dict constructor
>>> d
{'key1': 'value1', 'key2': 'value2', 'key3': 3}

You can pass in other iterators, not just lists. The only restriction is that the iterator needs to return 2 items per iteration.
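For example (illustrative), a generator expression yielding 2-tuples works just as well:

>>> d = dict((f'key{n}', n) for n in range(1, 4))
>>> d
{'key1': 1, 'key2': 2, 'key3': 3}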

If you have your key and value elements in separate lists, you can use zip to combine them together into tuples before creating the dictionary.

>>> keys = ['key1', 'key2', 'key3']
>>> vals = ['value1', 'value2', 3]
>>> l = zip(keys, vals)
>>> l
<zip object>
>>> dict(l)
{'key1': 'value1', 'key2': 'value2', 'key3': 3}

If key and value lists are not of the same length, the behaviour of zip is to silently drop any extra items from the longer list.

>>> keys = ['key1', 'key2', 'oops']
>>> vals = ['value1', 'value2']
>>> dict(zip(keys, vals))
{'key1': 'value1', 'key2': 'value2'}

Iterating

By default iterating over a dictionary iterates over the keys.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> for k in d:
...     print(k)
key1
key2
key3

This is functionally equivalent to iterating over the .keys() view.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> for k in d.keys():
...     print(k)
key1
key2
key3

The dictionary is unaffected by iterating over it, and you can use the key within your loop to access the value from the dictionary.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> for k in d:
...     print(k, d[k])   # Access value by key.
key1 value1
key2 value2
key3 3

If you want access to dictionary values within your loop, you can iterate over items to have them returned in the for loop. The keys and values are returned as 2-tuples.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> for kv in d.items():
...     print(kv)
('key1', 'value1')
('key2', 'value2')
('key3', 3)

You can unpack the key and value to separate variables in the loop, making them available without indexing. This is the most common loop structure used with dictionaries.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> for k, v in d.items():
...     print(k, v)
key1 value1
key2 value2
key3 3

If you are only interested in the dictionary values you can also iterate over these directly.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> for v in d.values():
...     print(v)
value1
value2
3

If you want to count as you iterate you can use enumerate as with any iterator, but you must nest the unpacking.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> for n, (k, v) in enumerate(d.items()):
...     print(n, k, v)
0 key1 value1
1 key2 value2
2 key3 3

Dictionary comprehensions

Dictionary comprehensions are shorthand iterations which can be used to construct dictionaries, while filtering or altering keys or values.

Iterating over a list of (key, value) tuples and assigning to keys and values will create a new dictionary.

>>> l = [('key1', 'value1'), ('key2', 'value2'), ('key3', 3)]
>>> {k: v for k, v in l}
{'key1': 'value1', 'key2': 'value2', 'key3': 3}

You can filter elements by using a trailing if clause. If this expression evaluates to False the element will be skipped (if it evaluates True it will be added).

>>> l = [('key1', 'value1'), ('key2', 'value2'), ('key3', 3)]
>>> {k: v for k, v in l if isinstance(v, str)}   # Only add strings.
{'key1': 'value1', 'key2': 'value2'}

Any valid expression can be used for the comparison, as long as it returns truthy or falsy values.

>>> l = [('key1', 'value1'), ('key2', 'value2'), ('key3', 3)]
>>> {k: v for k, v in l if v != 'value1'}
{'key2': 'value2', 'key3': 3}

Comparisons can be performed against keys, values, or both.

>>> l = [('key1', 'value1'), ('key2', 'value2'), ('key3', 3)]
>>> {k: v for k, v in l if v != 'value1' and k != 'key3'}
{'key2': 'value2'}

Since an empty string evaluates as False in Python, testing the value alone can be used to strip empty string values from a dictionary.

>>> d = {'key1': 'value1', 'key2': 'value2', 'key3': '', 'another-empty': ''}
>>> {k: v for k, v in d.items() if v}
{'key1': 'value1', 'key2': 'value2'}

Separate lists of keys and values can be zipped, and filtered using a dictionary comprehension.

>>> k = ['key1', 'key2', 'key3']
>>> v = ['value1', 'value2', 3]
>>> {k: v for k, v in zip(k, v) if k != 'key1'}
{'key2': 'value2', 'key3': 3}

Expressions can also be used in the k:v construct to alter keys or values that are generated for the dictionary.

>>> l = [('key1', 1), ('key2', 2), ('key3', 3)]
>>> {k: v**2 for k, v in l}
{'key1': 1, 'key2': 4, 'key3': 9}

Any expressions are valid, for both keys and values, including calling functions.

>>> l = [('key1', 1), ('key2', 2), ('key3', 3)]
>>> def cube(v):
...     return v**3
>>> def reverse(k):
...     return k[::-1]
>>> {reverse(k): cube(v) for k, v in l}
{'1yek': 1, '2yek': 8, '3yek': 27}

You can use a ternary if-else in the k:v to selectively replace values. In the following example values are replaced if they don't match 'value1'.

>>> l = [('key1', 'value1'), ('key2', 'value2'), ('key3', 3)]
>>> {k: v if v == 'value1' else None for k, v in l}
{'key1': 'value1', 'key2': None, 'key3': None}

You can also use ternary syntax to process keys. Any expressions are valid here; in the following example we replace missing keys with the current iteration number (1-indexed).

>>> l = [(None, 'value1'), (None, 'value2'), ('key3', 3)]
>>> {k if k else n: v for n, (k, v) in enumerate(l, 1)}
{1: 'value1', 2: 'value2', 'key3': 3}

If your expressions generate duplicate keys, the later value will take precedence for that key.

>>> l = [(None, 'value1'), (None, 'value2'), ('key3', 3)]
>>> {k if k else 0: v for n, (k, v) in enumerate(l)}
{0: 'value2', 'key3': 3}   # 0: 'value1' has been overwritten by 0: 'value2'

You can use nested loops within dictionary comprehensions, although you often won't want to since it can get pretty confusing. One useful application of this however is flattening nested dictionaries. The following example unnests 2-deep dictionaries, discarding the outer keys.

>>> d = {'a': {'naa': 1, 'nab': 2, 'nac': 3}, 'b': {'nba': 4, 'nbb': 5, 'nbc': 6}}
>>> {k: v for di in d.values() for k, v in di.items()}
{'naa': 1, 'nab': 2, 'nac': 3, 'nba': 4, 'nbb': 5, 'nbc': 6}

The left-hand loop is the outer loop, which iterates the d dictionary, producing the values in di. The inner loop on the right iterates this dictionary's keys and values as k and v, which are used by the k:v expression on the far left to construct the new dictionary.

Merging

There are a number of ways to merge dictionaries. The major difference between the approaches is in how (or whether) they handle duplicate keys.

Update

Each dictionary object has an .update() method, which can be used to add a set of keys and values to an existing dictionary, using another dictionary as the source.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key4': 'value4', 'key5': 'value5'}
>>> d1.update(d2)
>>> d1
{'key1': 'value1', 'key2': 'value2', 'key3': 3, 'key4': 'value4', 'key5': 'value5'}

This updates the original dictionary, and does not return a copy.

If there are duplicate keys in the dictionary being updated from, the values from that dictionary will replace those in the dictionary being updated.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': 'value5'}
>>> d1.update(d2)
>>> d1
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3-new', 'key5': 'value5'}

If you do not want to replace already existing keys, you can use a dictionary comprehension to pre-filter.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': 'value5'}
>>> d1.update({k: v for k, v in d2.items() if k not in d1})
>>> d1
{'key1': 'value1', 'key2': 'value2', 'key3': 3, 'key5': 'value5'}

Unpacking

Dictionaries can be unpacked to key=value keyword pairs, the same syntax used to pass parameters to functions or constructors. This can be used to combine multiple dictionaries by unpacking them consecutively into a new dict literal.

This requires Python 3.5 and above.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key4': 'value4', 'key5': 'value5'}
>>> d = {**d1, **d2}
>>> d
{'key1': 'value1', 'key2': 'value2', 'key3': 3, 'key4': 'value4', 'key5': 'value5'}

Unpacking using this syntax handles duplicate keys, with the later dictionary taking precedence over the earlier.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': 'value5'}
>>> d = {**d1, **d2}
>>> d
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3-new', 'key5': 'value5'}

You can use this same syntax to merge multiple dictionaries together.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': 'value5'}
>>> d3 = {'key4': 'value4', 'key6': 'value6'}
>>> d = {**d1, **d2, **d3}
>>> d
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3-new', 'key5': 'value5', 'key4': 'value4', 'key6': 'value6'}

You can also unpack into a dict() call.

>>> dict(**d1, **d3)
{'key1': 'value1', 'key2': 'value2', 'key3': 3, 'key4': 'value4', 'key6': 'value6'}
>>> dict(**d1, **d2)
TypeError: type object got multiple values for keyword argument 'key3'

However, in this case duplicate keys are not supported, and you are limited by the keyword naming restrictions described earlier.

>>> dict(**d1, **d2)
TypeError: type object got multiple values for keyword argument 'key3'
>>> dict(**{3: 'value3'})
TypeError: keyword arguments must be strings

There is no such restriction for {} unpacking.

>>> {**{3: 'value3'}}
{3: 'value3'}

Addition (Python 2.7 only)

In Python 2.7 dict.items() returns a list of (key, value) tuples. Lists can be concatenated using the + operator, and the resulting list can be converted back to a new dictionary by passing to the dict constructor.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': 'value5'}
>>> l = d1.items() + d2.items()
>>> l
[('key3', 3), ('key2', 'value2'), ('key1', 'value1'), ('key3', 'value3-new'), ('key5', 'value5')]
>>> dict(l)
{'key3': 'value3-new', 'key2': 'value2', 'key1': 'value1', 'key5': 'value5'}

You can add together multiple dictionaries using this method. The later dictionary keys take precedence over the former.

Union (set merge)

If both the keys and values of a dictionary are hashable, the dict_items view supports set-like operations.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': 'value5'}
>>> d3 = {'key4': 'value4', 'key6': 'value6'}
>>> dict(d1.items() | d2.items() | d3.items())
{'key4': 'value4', 'key5': 'value5', 'key2': 'value2', 'key6': 'value6', 'key3': 3, 'key1': 'value1'}

The merging occurs right-to-left.

If the values are not hashable this will raise a TypeError.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': []}   # list is unhashable
>>> d1.items() | d2.items()
TypeError: unhashable type: 'list'

All standard set operations are possible on dict_keys and dict_items.
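For example (illustrative; the result sets are unordered), intersection and difference work on dict_keys:

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = {'key3': 'value3-new', 'key5': 'value5'}
>>> d1.keys() & d2.keys()   # keys present in both
{'key3'}
>>> d1.keys() - d2.keys()   # keys only in d1
{'key1', 'key2'}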

Copying

To make a copy of an existing dictionary you can use .copy(). This results in an identical dictionary which is a distinct object.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = d1.copy()
>>> d2
{'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> id(d1) == id(d2)
False

You can also make a copy of a dictionary by passing an existing dictionary to the dict constructor. This is functionally equivalent to .copy().

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> d2 = dict(d1)
>>> d2
{'key1': 'value1', 'key2': 'value2', 'key3': 3}
>>> id(d1) == id(d2)
False

In both cases these are shallow copies, meaning nested objects within the dictionary are not also copied. Changes to these nested objects will also be reflected in the original dictionary.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': {'nested': 'dictionary'}}
>>> d2 = d1.copy()
>>> d2
{'key1': 'value1', 'key2': 'value2', 'key3': {'nested': 'dictionary'}}
>>> id(d1) == id(d2)
False
>>> id(d1['key3']) == id(d2['key3'])
True
>>> d2['key3']['nested'] = 'I changed in d1'
>>> d1
{'key1': 'value1', 'key2': 'value2', 'key3': {'nested': 'I changed in d1'}}

If you want nested objects to also be copied, you need to create a deepcopy of your dictionary.

>>> d1 = {'key1': 'value1', 'key2': 'value2', 'key3': {'nested': 'dictionary'}}
>>> from copy import deepcopy
>>> d2 = deepcopy(d1)
>>> d2
{'key1': 'value1', 'key2': 'value2', 'key3': {'nested': 'dictionary'}}
>>> id(d1) == id(d2)
False
>>> id(d1['key3']) == id(d2['key3'])
False
>>> d2['key3']['nested'] = ['I did not change in d1']
>>> d1
{'key1': 'value1', 'key2': 'value2', 'key3': {'nested': 'dictionary'}}

Since a deepcopy copies all nested objects it is slower and uses more memory. Only use it when it's actually necessary.


  1. A hash is a reproducible, compact, representation of an original value. Reproducible means that hashing the same input will always produce the same output. This is essential for dictionary keys where hashes are used to store and look up values: if the hash changed each time we hashed the key, we'd never find anything! 

BreadcrumbsCollector: Is your test suite wasting your time?


This article was originally included in the PyconPL 2018 conference book.

Abstract

Nowadays there is no need to convince anyone of the enormous advantages of writing automated tests for their code. Many developers have had occasion to feel total confidence in introducing changes to their codebases under the protection of vast test suites. The practice of writing tests has been widely adopted in the industry [4], including the Python world.

Pythonistas have at their disposal the best programming language, empowering tools and tons of articles about writing tests. What can go wrong?

Price of automated testing

Apparently, there is no such thing as a free lunch. It turns out that apart from the effort needed to write tests and update them to keep up with changes in the codebase, there is yet another cost – the time of execution. This article is to explain how essential it is to strive for the shortest execution time possible and how to achieve that.

Without tests one is stuck with manual testing. Paraphrasing article [1], the cost of manual verification can be expressed using the formula:

testing_time_without_tests = n * (setup_time + testing_time)

…where n is the number of repetitions. Imagine going through such a cycle during development. If a feature that a developer is working on is relatively complex, then the expected number of repetitions will be high. Manual tests are slow by their nature. Therefore it takes a long time to get feedback on whether the latest change broke something or moved a developer forward.

The situation is completely different when one has a comprehensive test suite at their disposal:

testing_time_with_tests = n * tests_execution_time

n surely will not be smaller than in the previous case. The more important factor is tests_execution_time. If one could minimise it to a value close to zero then n would become irrelevant in this formula! That is exactly what we want, considering practices like Test Driven Development [5] which make a developer run tests every few seconds. It has been proved [6] that writing tests after the code is a defect-injection process. Writing code under the protection of an evolving test suite leads to far fewer mistakes and errors. A Pythonista can leverage the TDD cycle provided they use it strictly for unit tests. However, the situation gets complicated when one is to deal with web applications.
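To put rough, purely illustrative numbers on it: with setup_time = 60 s, testing_time = 30 s and n = 20, manual verification costs 20 × 90 s = 30 minutes; with a suite that runs in 2 s, the same 20 repetitions cost 40 seconds, and at 0.1 s they cost 2 seconds – at which point n hardly matters.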

There is a concept known as Test Pyramid[2]. In its original form, it assumes that unit tests should be a majority of a test suite, sometimes even 90%. Other kinds of tests, like integration and acceptance, should be a minority of the test suite. The proportion comes from the fact that unit tests are fastest to write and execute. Writing a good integration or acceptance test requires far more effort.

What prevents Pythonistas from relying on unit tests?

A theory is one thing; real-life experience often looks different. Without a clear testing strategy, projects usually end up with something resembling the Ice-Cream Cone [3], where the majority of testing is done manually, there are some end-to-end tests and literally a few unit tests.

For example, when we write a web application using Django it might be tempting to test everything using django.test.Client via views. Isn’t that… convenient?

@pytest.mark.usefixtures('transactional_db')
def test_should_win_auction_when_no_one_else_is_bidding(
        authenticated_client: Client, auction_without_bids: Auction
) -> None:
    url = reverse('make_a_bid', args=[auction_without_bids.pk])
    expected_current_price = auction_without_bids.current_price * 10
    data = json.dumps({
        'amount': str(expected_current_price)
    })
    response = authenticated_client.post(url, data, content_type='application/json')

    assert_wins_with_current_price(response, expected_current_price)


@pytest.mark.usefixtures('transactional_db')
def test_should_not_be_winning_if_bid_lower_than_current_price(
        authenticated_client: Client, auction_without_bids: Auction
) -> None:
    url = reverse('make_a_bid', args=[auction_without_bids.pk])
    bid_price = auction_without_bids.current_price - Decimal('1.00')
    data = json.dumps({
        'amount': str(bid_price)
    })
    response = authenticated_client.post(url, data, content_type='application/json')

    assert_loses_with_current_price(response, auction_without_bids.current_price)

It is, indeed. Additionally, it is horribly slow.

Where do we lose most of the time?

There are two tests that check two basic scenarios:

  1. A user makes a bid on an auction without prior bids and becomes a winner
  2. A user makes a bid lower than the previous winning one and therefore loses the auction

Given that our view looks as follows:

@csrf_exempt
@login_required
def make_a_bid(request: HttpRequest, auction_id: int) -> HttpResponse:
    data = json.loads(request.body)
    input_dto = placing_bid.PlacingBidInputDto(
        user_id=request.user.id,
        auction_id=auction_id,
        amount=Decimal(data['amount'])
    )
    presenter = PlacingBidPresenter()
    uc = placing_bid.PlacingBidUseCase(presenter)
    uc.execute(input_dto)

    data = presenter.get_presented_data()
    if data['is_winner']:
        return HttpResponse(f'Congratulations! You are a winner! :) Current price is {data["current_price"]}')
    else:
        return HttpResponse(f'Unfortunately, you are not winning. :( Current price is {data["current_price"]}')

Let’s carefully profile[7] one of these two tests and see where the time is spent:

pytest --profile tests/auctions/views/test_make_a_bid.py::test_should_not_be_winning_if_bid_lower_than_current_price

ncalls  tottime   percall   cumtime   percall   filename:lineno(function)
1       3e-06     3e-06     0.01078   0.01078   runner.py:106(pytest_runtest_call)
1       1.2e-05   1.2e-05   0.003892  0.003892  base.py:27(reverse)
1       4e-06     4e-06     0.006712  0.006712  client.py:522(post)
1       1.4e-05   1.4e-05   0.002885  0.002885  placing_bid.py:43(execute)

We see that this particular test took 0.01078 s in total. Our business logic took 0.002885 s, which amounts to ~26% of the execution time, with the time of saving objects to the database included. Framework code execution took 0.003892 s (URL reversal) + 0.006712 s (test client) – 0.002885 s (our logic) = 0.007719 s, which is ~72% of the time!

The conclusion is obvious. The overwhelming majority of the time is spent on testing the framework, not our code!

Of course, this code is relatively simple and pytest-profiling[7] does not take into account time spent on executing fixtures (which would be sloooowly inserting objects to a database). In a real-life case, numbers would be even higher.

Let’s break this down. Firstly, fixtures are run to insert the required models into the database. Tick, tock. Then Django’s test client spins up and calls framework’s machinery to serve the request. Tick, tock. Finally, control reaches our view that calls our logic. It starts with loading desired models from the database. Tick, tock. Then, the logic we wanted to actually test is run. We end by saving altered models back to the database. Tick, tock. Everything is green, the test passed!

How would the numbers look if we used unit tests instead?

def test_should_not_be_winning_if_bid_lower_than_current_price() -> None:
    auction = create_auction(bids=[
        Bid(id=1, bidder_id=1, amount=Decimal('10.00'))
    ])

    lower_bid = Bid(id=None, bidder_id=2, amount=Decimal('5.00'))
    auction.make_a_bid(lower_bid)

    assert lower_bid.bidder_id not in auction.winners

pytest --profile tests/auctions/domain/entities/test_auction.py::test_should_not_be_winning_if_bid_lower_than_current_price

ncalls  tottime   percall   cumtime    percall    filename:lineno(function)
1       5e-06     5e-06     0.000236   0.000236   runner.py:106(pytest_runtest_call)
1       1.3e-05   1.3e-05   3.6e-05    3.6e-05    test_auction.py:85(test_should_not_be_winning_if_bid_lower_than_current_price)
1       9e-06     9e-06     1.1e-05    1.1e-05    auction.py:15(make_a_bid)

The whole test run that checks the same logic took 0.000236 s (over 12 times faster!) and the method we tested took only 5% of the total execution time. The rest was consumed by pytest.

That’s a substantial time-saving. The situation escalates quickly when we are to check more scenarios with different edge cases. The total duration of running the pytest command for 100 unit tests was 0.53 seconds, while running 100 view tests with django.test.Client would cost a developer 25.79 seconds of their lives! Such a long time is enough for everyone to lose focus and start wondering what new posts have appeared on their Facebook boards. For the TDD cycle to work, feedback from tests must not take longer than 1 second!

How to leverage unit tests?

Official Django docs do not encourage development with a massive number of unit tests. It became obvious that it is hard and unnatural to use TDD in a framework where everything is tightly coupled with an ORM and as a consequence, with a database.

To be able to truly leverage the power of automated testing a very different approach is needed. One such approach is the Clean Architecture, where testability is one of the biggest advantages.
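A minimal sketch of what that looks like in practice is a domain entity with no framework imports, so it can be exercised by plain unit tests like the one above (the class below is my illustration, not the article's actual code):

from decimal import Decimal

class Auction:
    def __init__(self, starting_price: Decimal, bids: list) -> None:
        self.starting_price = starting_price
        self.bids = bids

    @property
    def current_price(self) -> Decimal:
        # Highest bid so far, or the starting price when there are no bids.
        return max((bid.amount for bid in self.bids), default=self.starting_price)

    @property
    def winners(self) -> list:
        return [bid.bidder_id for bid in self.bids if bid.amount == self.current_price]

    def make_a_bid(self, bid) -> None:
        # Pure in-memory logic: persistence is handled elsewhere, behind the use case.
        if bid.amount > self.current_price:
            self.bids.append(bid)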

Bibliography

  1. Maciej Polańczyk – The Measurable Benefits of Unit Testing – https://stxnext.com/blog/2017/11/02/measurable-benefits-unit-testing/
  2. Ham Vocke – Practical Test Pyramid – https://martinfowler.com/articles/practical-test-pyramid.html
  3. Alister Scott – Testing Pyramids & Ice-Cream Cones – https://watirmelon.blog/testing-pyramids/
  4. Sebastian Buczyński – How Can Your Software Benefit From Automated Testing? – https://stxnext.com/blog/2017/08/09/how-can-your-software-benefit-automated-testing/
  5. Kent Beck – Test Driven Development: By Example
  6. Mary Poppendieck, Tom Poppendieck – Leading Lean Software Development: Results are not the point
  7. pytest-profiling https://pypi.org/project/pytest-profiling/
  8. snakeviz – https://jiffyclub.github.io/snakeviz/

Daniel Roy Greenfeld: Stop Using Executable Code Outside of Version Control


There's an anti-pattern in the development world: using executable code as a means to store configuration values. In the Python universe, you sometimes see things like this in settings modules:

# Warning: This is an anti-pattern!
try:
    from .local_settings import *
except ImportError:
    pass

What people do is have a local_settings.py file that has been identified in a .gitignore file. Therefore, for local development you have your project running through an executable code file outside of version control.

If this sounds uncomfortable to you, then you are on the right track. Executable code always needs to be in version control.

A better approach is to place secrets and keys into environment variables. If you don't like that, or can't use it due to your environment, stick those values into JSON, YAML, or TOML files.
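For instance, here is a minimal sketch of the config-file route (the file name is mine, purely illustrative): keep a secrets.json out of version control while the code that reads it stays tracked.

# Good code: this module lives in version control; only secrets.json does not.
import json

with open("secrets.json") as f:
    secrets = json.load(f)

SECRET_KEY = secrets["SECRET_KEY"]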

So what can happen if you allow the local_settings anti-pattern into your project?

The local_settings anti-pattern

The local_settings anti-pattern means that you can have executable code in production that usually can't be viewed by developers trying to debug problems. If you've ever experienced it, this is one of the worst production debugging nightmares.

It worked fine on my laptop!

What works locally and tests successfully can throw subtle bugs that won't be discovered until it's too late. Here's a real-world example of what can happen that I helped resolve for a client last year:

  1. Project had been using a third-party package for slugification for years. Configuration done in settings.
  2. Developer decided to write their own slugification project. Worked great locally, so they made changes across the site to account for the new behavior.
  3. Tests did not account for edge cases in the new slugification library.
  4. Appeared to work in local development, staging, and even production.
  5. A few days later customers in certain regions of the world started to complain about records being unreachable.
  6. No one can figure out why production is behaving differently.

I was brought in (and I billed them). The first thing I checked for was this sad code snippet in their settings modules:

# Warning: This is an anti-pattern!
try:
    from .local import *
except ImportError:
    pass

They had executable code outside of version control. What worked for the developer didn't work the same everywhere else. Enough that it caused subtle bugs that weren't caught by humans or formal tests. Subtle developer bugs grew into serious bugs when encountered by real users.

And what was really bad is that these serious bugs were impossible to debug at first because the deployed code didn't match what was in someone's local_settings.py file.

But I won't make these mistakes!

People often say indignantly, "I'm not stupid like you, I don't make this kind of mistake."

Yet about once a year for the past 20 years I resolve or help resolve an issue stemming from executable code that wasn't tracked in version control.

I believe that all of us coders, no matter how talented and experienced, can and will make stupid mistakes. That's why good engineers/coders follow best practices - to help catch ourselves when we do something stupid. If you believe you can personally avoid making stupid mistakes in programming, I've got a bridge in New York City I can sell you.

With all of this in mind, why not do the smart thing and put all executable code in version control? You can put your secrets and keys in environment variables or configuration files. Done! Argument over!

How to handle location specific variables

Use either environment variables or config files. Really. And don't take my word for it, look at all the deployment tools and hosting services that recommend it (all of them do).

To do this, either figure out your own process for handling them or use a third-party package. Personally, for Django I like the simplicity of having this function in my various settings modules:

# Good code!
import os

from django.core.exceptions import ImproperlyConfigured

def get_env_var(var_name):
    try:
        return os.environ[var_name]
    except KeyError:
        error_msg = f"Set the {var_name} environment variable"
        raise ImproperlyConfigured(error_msg)

SECRET_KEY = get_env_var("SECRET_KEY")

I wrote a book to stop antipatterns

In 2012 I kept getting offered rescue projects because people were using anti-patterns, especially this one. It was frustrating to see the same mistakes again and again. So I started to write a book, Two Scoops of Django, designed to instruct people on how not to fall into anti-patterns like the one described in this article.

If you don't want to buy my book, please read and embrace the config section of The Twelve Factor App. Your future self will thank me for it.

Python Sweetness: A fork in the road for Mitogen


Mitogen for Ansible's original plan described facets of a scheme centered on features made possible by a rigorous single cohesive distributed program model, but of those facets, it quickly became clear that most users are really only interested in the big one: a much faster Ansible.

While I'd prefer feature work, this priority is fine: better performance usually entails enhancements that benefit the overall scheme, and improving people's lives in this manner is highly rewarding, so incentives remain aligned. It is impossible not to find renewed energy when faced with comments like this:

Enabling the mitogen plugin in ansible feels like switching from floppy to SSD
https://t.co/nCshkioX9h

Although feedback on the project has been very positive, the existing solution is sometimes not enough. Limitations in the extension and Ansible really bite, most often manifesting when running against many targets. In these scenarios, it is heartbreaking to see the work fail to help those who could benefit from it most, and that's what I'd like to talk about.

Controller-side Performance

Some time ago I began refactoring Ansible's linear strategy, aiming to get it to where controller-side enhancements might exist without adding more spaghetti, while becoming familiar with requirements for later features. To recap, the strategy plugin is responsible for almost every post-parsing task, including worker management. It is in many ways the beating heart at the core of every Ansible run.

After some months and one particularly enlightening conversation that work was resumed, eventually subsuming all of the remaining strategy support and result processing code, forming one huge refactor of a big chunk of upstream that I have been sitting on for nearly a month.

The result exists today and is truly wonderful. It integrates Mitogen into the heart of Ansible without baking it in, introduces a carefully designed process model with strong persistence properties, eliminating most bottlenecks endured by the extension and vanilla Ansible, and provides an architectural basis for the next planned iteration of scalability work, Windows compatibility, some features I've already mentioned, and quite a few I've been keeping quiet.

With the new strategy it is possible to almost perfectly saturate an 8 vCPU machine given 100 targets, with minimal loss of speedup compared to single-target. Regarding single target, simple loops against localhost are up to 4x faster than the current stable extension.

There are at least 2 obvious additional enhancements now possible with the new work, but I stopped myself in order to allow stabilizing one piece of the puzzle at a time. When this is done, it is clear exactly where to pick things up next.

Deep Cuts

There's just a small hitch: this work goes deep, entailing changes that, while so far possible as monkey-patches, are highly version-specific and unlikely to remain monkey-patchable as the branch receives real-world usage. There must be a mechanism to ship unknown future patches to upstream code.

I hoped it could land after Ansible 2.7, benefitting from related changes planned upstream, but they appear to have been delayed or abandoned, and so a situation exists where I cannot ship improvements for at least another 4-6 months, assuming the related changes finally arrive in Ansible 2.8.

To the right is a rough approximation of the components involved in executing a playbook. Those modified or replaced by the stable extension are in green; those in yellow are replaced by the branch-in-waiting. Finally, in orange are components affected by planned features and optimizations.

Although there are tens of thousands of lines of surrounding code, as should hopefully be clear, the number of untouched major components involved in a run has been dwindling fast. In short, the existing mechanism for delivering improvements is reaching its limit.

The F Word

I hope any seasoned developer, especially those familiar with the size of the Ansible code base, should understand the predicament. There is no problem delivering improvements today, assuming an unsupported one-off code dump was all anyone wanted, but that is never the case.

The problem lies in entering an unsustainable permanent marriage with a large project, not forgetting to mention this outcome was an explicit non-goal from the start. Simultaneously over the months I have garnered significant trust to deliver these kinds of improvements, and abandoning one of the best yet would seem foolish.

Something of a many-variabled optimization process has recently come to an end, and a solution has been found that I am comfortable with. While making an announcement requires more time and may still not be definite, I wanted to document at least some of my reasoning before it comes.

Even though I wanted to avoid this outcome, and while the solution in mind is not without restraint, it is still a cloud with many silver linings. For instance, new user configuration steps can be reduced to almost zero, core features can be added with minimal friction, and creative limitations are significantly uncapped.

The key question was how to sustain continued work on a solution that has clear value to a real problem that plagued upstream since conception. The answer it turns out, is obvious: the scalability fixes I wish to release primarily benefit one type of user.

What about upstream?

Beyond debating strawmen and lines of code, no actionable outcome has ever materialized, not after carefully worded chain rattling, and not even in the form of a bug report. If it had, it was always going to at best be a compromise with an organization that has delivered consistently worsening performance every major release for the past 2 and a half years, and it is the principal reason crowdfunding the extension was the only method to deliver real improvements.

The cold reality is that the upstream trend is not a good one: this problem has existed forever and it is slowly getting worse over time. My best interpretation is that some veterans hate the extension's solution, perhaps some of those around since 2012 when Michael DeHaan, the project founder, first attempted a connection method uncannily similar to today's design.

In any case they have my e-mail address, an existing thread to hit Reply to, and at least two invitations to a telephone call. A conversation requires interest and initiative, and above all else it requires two parties.

What About The Extension?

The planned structure keeps the extension front-and-centre, so regardless of outcome it will continue to receive significant feature work and maintenance. It is definitely not going away.

With a third stable release looming, it's probably high time for a quick update. Many bugs were squashed since July, with stable work recently centered around problems with Ansible 2.6. This involved some changes to temporary file handling, and in the process, discovery of a huge missed optimization.

v0.2.3 will need only 2 roundtrips for each copy and template, or in terms of a 250ms transcontinental link, 10 seconds to copy 20 files vs. 30 seconds previously, or 2 minutes compared to vanilla's best configuration. This work is delayed somewhat as a new RPC chaining mechanism is added to better support all similar future changes, and identical situations likely to appear in similar tools.

Just tuning in?

Until next time!

Podcast.__init__: Fast Stream Processing In Python Using Faust with Ask Solem


Summary

The need to process unbounded and continually streaming sources of data has become increasingly common. One of the popular platforms for implementing this is Kafka along with its streams API. Unfortunately, this requires all of your processing or microservice logic to be implemented in Java, so what’s a poor Python developer to do? If that developer is Ask Solem of Celery fame then the answer is, help to re-implement the streams API in Python. In this episode Ask describes how Faust got started, how it works under the covers, and how you can start using it today to process your fast moving data in easy to understand Python code. He also discusses ways in which Faust might be able to replace your Celery workers, and all of the pieces that you can replace with your own plugins.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Join the community in the new Zulip chat workspace at podcastinit.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Ask Solem about Faust, a library for building high performance, high throughput streaming systems in Python

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is Faust and what was your motivation for building it?
    • What were the initial project requirements that led you to use Kafka as the primary infrastructure component for Faust?
  • Can you describe the architecture for Faust and how it has changed from when you first started writing it?
    • What mechanism does Faust use for managing consensus and failover among instances that are working on the same stream partition?
  • What are some of the lessons that you learned while building Celery that were most useful to you when designing Faust?
  • What have you found to be the most common areas of confusion for people who are just starting to build an application on top of Faust?
  • What has been the most interesting/unexpected/difficult aspects of building and maintaining Faust?
  • What have you found to be the most challenging aspects of building streaming applications?
  • What was the reason for releasing Faust as an open source project rather than keeping it internal to Robinhood?
  • What would be involved in adding support for alternate queue or stream implementations?
  • What do you have planned for the future of Faust?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA


Mike Driscoll: Fall eBook Sale 2018


It’s the start of a new school year, so I am running a new sale this Fall. Feel free to check out my current sales:

These sales will last until Sept. 1st. All my eBooks are available as PDF, mobi (Kindle) and epub format.

Mike Driscoll: PyDev of the Week: Manuel Kaufmann


This week we welcome Manuel Kaufmann (@reydelhumo) as our PyDev of the Week. Manuel has been very active in promoting Python in South America and even received a grant from the Python Software Foundation a few years ago to help him in that regard. He started the Argentina en Python project and he also works for Read the Docs. You can check out his website to learn more about him, although please note that it’s mostly in Spanish. You can also see what projects he is currently contributing to via Github. Let’s take some time to get to know Manuel better!

Can you tell us a little about yourself (hobbies, education, etc):

I’m Manuel Kaufmann. A passionate Python developer from Paraná, Entre Ríos, Argentina. Paraná is a small town (compared to other cities in Argentina) without too much movement around technology. I studied Systems Engineering in a city close to where I was born called Santa Fe, and I disliked what was taught there and how, so I decided to quit after some years of studying and continue by myself. I had some problems with English at that time and it was hard to keep up to date with recent technology topics and depend on translations. Also, at work it was complicated for me to follow long discussions and share thoughts naturally in English. Because of that, I decided to go back to University, but this time to study to become an English teacher. When I got what I was looking for after 3 years of hard work, I quit and went back to what I love: programming.

Since I started my personal blog in 2008, I used to write at least a couple of lines every single day. Maybe they ended up in the drawer and never got published. While the blog was growing, all my posts were about technology and very technical, with many commands in them and steps to do or fix something very specific. From time to time, I started sharing adventures around my activity in the national Python community and then I realized that I wanted to share more social things related to what was happening “around me” in different directions in my life. Stories, travel adventures, fiction, funny stories and many other topics were covered in my blog and I realized that more and more people were following my posts.

Another hobby that I used to have while studying, and where I learnt a lot about life in many senses, was practicing circus. Now, I can say that I’m a juggler and a unicyclist. I used to juggle and play music on the stage with a group of around 10 people called “Circulando Circo Callejero”, whose translation would be something like “Circulating Street Circus”-ish. I was really involved in the art movement in Paraná city with photography, circus, music and more. I miss those days.

Why did you start using Python?

(The answer to this question changed completely the course of my life. Really. More on that later)

While I was studying Systems Engineering at University and finding my place in the technology movement, I decided to attend the first “PyDay in Santa Fe”, organized by some students from that University. That day, I saw a person show that “mylist = []” creates a list. That shocked me completely because I was writing linked lists and doubly linked lists in C, because that was what I was learning at University, and all that implementation was too brain-consuming compared to how Python handled it. That day I decided that I wanted to study Python day and night and become a Python expert. Yes, I know, one thing is what you dream and another thing is what you achieve 🙂

What other programming languages do you know and which is your favorite?

My first contact with a programming language was editing the AUTOEXEC.BAT and CONFIG.SYS for MS-DOS and Windows 3.x to speed things up and manage resources better so I could play video games that didn’t work otherwise. Then, in high school, I met ActionScript while learning Macromedia Flash 4 on my own after school and found a text box where you could write things that didn’t make sense to me. I found and printed a book about ActionScript and fell in love with the concept of a simple variable.

At university I learned C, C++, Smalltalk, Scheme, Prolog and Java, but I can’t say that I know how to program in those languages nowadays, since I never used them professionally. I also studied JavaScript on my own, a little PHP, Octave (to translate software written in Matlab for analyzing RF frequencies), a little Ruby, Elisp because I’m a fan of Emacs, and a lot of Python. Honestly, it’s hard for me to compare them all because my knowledge of each of them is nothing compared to Python. I love Python: I’ve studied it since 2006, worked professionally with it, and tried not to apply for jobs in other languages when I was starting my professional career. The rest is history, I’d say.

Today, I really want to learn other languages, since every time I learned a little of a new language I gained a new way of thinking or a different way to attack the same problem. I think that’s an amazing skill to have and use. I’d like to take a look at Go, Rust and Haskell in the near future.

What projects are you working on now?

Currently, I’m not working on any personal project that involves coding. Over the last few years I’ve been more involved in the community, developing courses and new talks, organizing events and more with the project “Argentina en Python”, which I will talk more about later.

I used to stay awake late at night and code for long hours alone, drinking a beer or some wine. Maybe I’m getting old, weak or lazy, I don’t know, but I’m not able to do that anymore.

Which Python libraries are your favorite (core or 3rd party)?

I have an eternal love for Django. My professional career started with it, and since then, each time I read the source code or the documentation, or think about how they solved a specific problem, I keep admiring these people and saying “Thanks”. I also think that requests has changed the whole ecosystem in the Python community. I remember the days of using urllib and making the same mistakes over and over again; it was very hard to remember the right way of using it.

On top of Django, I’ve used django-rest-framework on a daily basis for several years now, and I have to say that it’s also one of my favourites.

Not sure if it’s considered a Python library or not, but virtualenvwrapper would be another one I’d put with my favourites. Although I don’t use it anymore, I translated its documentation from English to Spanish some years ago because I was a real fan of it. It really changed the way I work with virtualenvs.

Could you tell us about the Argentina en Python project and what it was all about?

Argentina en Python is a personal and community project that promotes the use of Python as a programming language to solve everyday problems for common users and also to develop powerful and complex software in an easy way, encouraging collaborative learning and the Software Libre philosophy.

To achieve this goal, I started travelling around Argentina in my small, personally tweaked car (with a kind of bed inside it), contacting people from universities, co-working spaces and cultural centers, among others, to help me organize free Python-related events (PyDay, sprint, meetup, workshop, course, etc.) in places where there had never been a Python event before. Besides the technical aspect of the events, the main goal is to motivate local people from small towns to organize their own events, and in this way decentralize the knowledge from the biggest cities and bring fair job opportunities to people from small towns.

The Argentina en Python team is composed of Johanna Sanchez (chemist), The Wanderer (our car) and myself (developer).

What is the PSF Python Ambassador Program?

Over the last few years (Argentina en Python started in early 2014) I’ve organized more than 60 events, including 20 Django Girls workshops, in more than 7 countries of Latin America. All these events were supported by the worldwide Python community through donations, but also by the Python Software Foundation itself, which granted all of the grant proposals I submitted to support these events.

Because of all this work we have done, and to formalize the relationship I have with the PSF through “Argentina en Python”, they declared me “PSF Python Ambassador in Latin America”, to put a face on the PSF at these events and also to give me a yearly budget to use while organizing them. The program started in 2017 as a trial, and over the last year we have been discussing how it should be implemented and opened to the rest of the Python community, to help other regions of the world organize these kinds of events.

In January, I collected all the ideas and problems I had during the trial program and presented a document to the PSF trying to standardise what the program is about, proposing to open it to the whole Python community. Currently, we are still discussing the benefits and problems it could bring to the community and to the PSF itself.

How were you involved in the Django Girls movement?

During our travels, we organized events, but we also participated in events organized by other people. After some time we realized that in most of them the number of women attending was so low that it caught our attention, and we decided to do something to increase women's participation. At that moment I researched this a little and didn't find anything well organized or known worldwide (I think that by the end of 2014 Django Girls hadn't yet been created, or wasn't well known). So I decided to create a two-day course by myself covering different introductory topics and organize an event in Posadas, Misiones, Argentina to deliver it. I called it “Python for Ladies“.

Honestly, although the content of the course was interesting, I’m not a teacher with knowledge of how to organize the content and make it fun and attractive to attendees without previous knowledge of the topic. Because of this, a couple of months later, while I was trying to improve my course, I found Django Girls and fell in love with their tutorial. It was so amazing that I decided it was perfect for the course I wanted to run and used it in class. Then I kept reading, and I wanted to share the same philosophy they had. The more I read about Django Girls, the more I wanted to organize a Django Girls workshop. They were able to express all these ideas and this philosophy in a way that I couldn’t, and they already had “the right path to follow” towards a solution to this problem. I mean, not only the tutorial, but also the coach’s guide, the organizer’s manual and so much other amazing material!

Once I found Django Girls, I burned my “Python for Ladies” course and started organizing as many Django Girls workshops as I could 🙂

What do you do at Read the Docs? Why is Read the Docs important?

For those who don’t know, Read the Docs is a free software project and a SaaS that simplifies software documentation by automating the building, versioning, and hosting of your docs for you. It has many good features that help you focus on just writing the documentation, without worrying about the tools needed to make your documentation work.

I started working at Read the Docs last year as a software developer, improving documentation build stability and making the commercial product (readthedocs.com) better. I’m really happy with this work because it’s a free software project, it connects me with tons of users and developers from around the world, it has more users and traffic than any other project I’ve worked on before (which brings really interesting challenges), and because I admire all the members of the team I work with. It’s like a dream come true 🙂

Regarding the second question: “Documentation is important” and Read the Docs helps people to make it better.

Is there anything else you’d like to say?

To anyone reading this interview: “Please, get involved in the community. We need you!”

Thanks for doing the interview!

Simple is Better Than Complex: How to Create Custom Django Management Commands


Django comes with a variety of command line utilities that can be invoked using either django-admin.py or the convenient manage.py script. A nice thing about it is that you can also add your own commands. Those management commands can be very handy when you need to interact with your application via the command line using a terminal, and they can also serve as an interface to execute cron jobs. In this tutorial you are going to learn how to code your own commands.


Introduction

Just before we get started, let’s take a moment to familiarize ourselves with Django’s command line interface. You are probably already familiar with commands like startproject, runserver or collectstatic. To see a complete list of commands you can run the command below:

python manage.py help

Output:

Type 'manage.py help <subcommand>' for help on a specific subcommand.

Available subcommands:

[auth]
    changepassword
    createsuperuser

[contenttypes]
    remove_stale_contenttypes

[django]
    check
    compilemessages
    createcachetable
    dbshell
    diffsettings
    dumpdata
    flush
    inspectdb
    loaddata
    makemessages
    makemigrations
    migrate
    sendtestemail
    shell
    showmigrations
    sqlflush
    sqlmigrate
    sqlsequencereset
    squashmigrations
    startapp
    startproject
    test
    testserver

[sessions]
    clearsessions

[staticfiles]
    collectstatic
    findstatic
    runserver

We can create our own commands for our apps and include them in the list by creating a management/commands directory inside an app directory, like below:

mysite/                                   <-- project directory
 |-- core/                                <-- app directory
 |    |-- management/
 |    |    +-- commands/
 |    |         +-- my_custom_command.py  <-- module where command is going to live
 |    |-- migrations/
 |    |    +-- __init__.py
 |    |-- __init__.py
 |    |-- admin.py
 |    |-- apps.py
 |    |-- models.py
 |    |-- tests.py
 |    +-- views.py
 |-- mysite/
 |    |-- __init__.py
 |    |-- settings.py
 |    |-- urls.py
 |    |-- wsgi.py
 +-- manage.py

The name of the command file is what you will use to invoke it from the command line utility. For example, if our command was called my_custom_command.py, then we will be able to execute it via:

python manage.py my_custom_command

Next, let’s explore our first example.


Basic Example

Below is a basic example of what a custom command looks like:

management/commands/what_time_is_it.py

from django.core.management.base import BaseCommand
from django.utils import timezone


class Command(BaseCommand):
    help = 'Displays current time'

    def handle(self, *args, **kwargs):
        time = timezone.now().strftime('%X')
        self.stdout.write("It's now %s" % time)

Basically, a Django management command is composed of a class named Command which inherits from BaseCommand. The command code should be defined inside the handle() method.

See how we named our module what_time_is_it.py. This command can be executed as:

python manage.py what_time_is_it

Output:

It's now 18:35:31

You may be asking yourself, how is that different from a regular Python script, or what’s the benefit of it. Well, the main advantage is that all Django machinery is loaded and ready to be used. That means you can import models, execute queries to the database using Django’s ORM and interact with all your project’s resources.
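
For instance, here is a minimal sketch (not from the article's repository) of a command that talks to the database through the ORM. It assumes the stock django.contrib.auth User model; swap in your own models as needed:

from django.contrib.auth.models import User
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Displays how many users are registered'

    def handle(self, *args, **kwargs):
        # The ORM works out of the box because manage.py has already loaded your settings
        total = User.objects.count()
        self.stdout.write('There are %s users in the database' % total)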


Handling Arguments

Django makes use of argparse, which is part of Python’s standard library. To handle arguments in our custom command we should define a method named add_arguments.

Positional Arguments

The next example is a command that creates random user instances. It takes a mandatory argument named total, which defines the number of users that will be created by the command.

management/commands/create_users.py

from django.contrib.auth.models import User
from django.core.management.base import BaseCommand
from django.utils.crypto import get_random_string


class Command(BaseCommand):
    help = 'Create random users'

    def add_arguments(self, parser):
        parser.add_argument('total', type=int, help='Indicates the number of users to be created')

    def handle(self, *args, **kwargs):
        total = kwargs['total']
        for i in range(total):
            User.objects.create_user(username=get_random_string(), email='', password='123')

Here is how one would use it:

python manage.py create_users 10

Optional Arguments

The optional (and named) arguments can be passed in any order. In the example below you will find the definition of an argument named “prefix”, which will be used to compose the username field:

management/commands/create_users.py

from django.contrib.auth.models import User
from django.core.management.base import BaseCommand
from django.utils.crypto import get_random_string


class Command(BaseCommand):
    help = 'Create random users'

    def add_arguments(self, parser):
        parser.add_argument('total', type=int, help='Indicates the number of users to be created')

        # Optional argument
        parser.add_argument('-p', '--prefix', type=str, help='Define a username prefix')

    def handle(self, *args, **kwargs):
        total = kwargs['total']
        prefix = kwargs['prefix']

        for i in range(total):
            if prefix:
                username = '{prefix}_{random_string}'.format(prefix=prefix, random_string=get_random_string())
            else:
                username = get_random_string()

            User.objects.create_user(username=username, email='', password='123')

Usage:

python manage.py create_users 10 --prefix custom_user

or

python manage.py create_users 10 -p custom_user

If the prefix is used, the username field will be created as something like custom_user_oYwoxtt4vNHR. If no prefix is given, it will be created simply as oYwoxtt4vNHR, a random string.

Flag Arguments

Another type of optional arguments are flags, which are used to handle boolean values. Let’s say we want to add an --admin flag, to instruct our command to create a super user or to create a regular user if the flag is not present.

management/commands/create_users.py

from django.contrib.auth.models import User
from django.core.management.base import BaseCommand
from django.utils.crypto import get_random_string


class Command(BaseCommand):
    help = 'Create random users'

    def add_arguments(self, parser):
        parser.add_argument('total', type=int, help='Indicates the number of users to be created')
        parser.add_argument('-p', '--prefix', type=str, help='Define a username prefix')
        parser.add_argument('-a', '--admin', action='store_true', help='Create an admin account')

    def handle(self, *args, **kwargs):
        total = kwargs['total']
        prefix = kwargs['prefix']
        admin = kwargs['admin']

        for i in range(total):
            if prefix:
                username = '{prefix}_{random_string}'.format(prefix=prefix, random_string=get_random_string())
            else:
                username = get_random_string()

            if admin:
                User.objects.create_superuser(username=username, email='', password='123')
            else:
                User.objects.create_user(username=username, email='', password='123')

Usage:

python manage.py create_users 2 --admin

Or

python manage.py create_users 2 -a

Arbitrary List of Arguments

Let’s create a new command now named delete_users. In this new command we will be able to pass a list of user ids and the command should delete those users from the database.

management/commands/delete_users.py

from django.contrib.auth.models import User
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Delete users'

    def add_arguments(self, parser):
        parser.add_argument('user_id', nargs='+', type=int, help='User ID')

    def handle(self, *args, **kwargs):
        users_ids = kwargs['user_id']

        for user_id in users_ids:
            try:
                user = User.objects.get(pk=user_id)
                user.delete()
                self.stdout.write('User "%s (%s)" deleted with success!' % (user.username, user_id))
            except User.DoesNotExist:
                self.stdout.write('User with id "%s" does not exist.' % user_id)

Usage:

python manage.py delete_users 1

Output:

User "SMl5ISqAsIS8 (1)" deleted with success!

We can also pass a number of ids separated by spaces, so the command will delete the users in a single call:

python manage.py delete_users 1 2 3 4

Output:

User with id "1" does not exist.
User "9teHR4Y7Bz4q (2)" deleted with success!
User "ABdSgmBtfO2t (3)" deleted with success!
User "BsDxOO8Uxgvo (4)" deleted with success!

Styling

We could improve the previous example a little bit by setting an appropriate color for the output messages:

management/commands/delete_users.py

from django.contrib.auth.models import User
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Delete users'

    def add_arguments(self, parser):
        parser.add_argument('user_id', nargs='+', type=int, help='User ID')

    def handle(self, *args, **kwargs):
        users_ids = kwargs['user_id']

        for user_id in users_ids:
            try:
                user = User.objects.get(pk=user_id)
                user.delete()
                self.stdout.write(self.style.SUCCESS('User "%s (%s)" deleted with success!' % (user.username, user_id)))
            except User.DoesNotExist:
                self.stdout.write(self.style.WARNING('User with id "%s" does not exist.' % user_id))

Usage is the same as before; the only difference now is the output:

python manage.py delete_users 3 4 5 6

Output:

(Screenshot: the colored terminal output showing the success and warning messages.)

Below is a list of all available styles, in the form of a management command:

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Show all available styles'

    def handle(self, *args, **kwargs):
        self.stdout.write(self.style.ERROR('error - A major error.'))
        self.stdout.write(self.style.NOTICE('notice - A minor error.'))
        self.stdout.write(self.style.SUCCESS('success - A success.'))
        self.stdout.write(self.style.WARNING('warning - A warning.'))
        self.stdout.write(self.style.SQL_FIELD('sql_field - The name of a model field in SQL.'))
        self.stdout.write(self.style.SQL_COLTYPE('sql_coltype - The type of a model field in SQL.'))
        self.stdout.write(self.style.SQL_KEYWORD('sql_keyword - An SQL keyword.'))
        self.stdout.write(self.style.SQL_TABLE('sql_table - The name of a model in SQL.'))
        self.stdout.write(self.style.HTTP_INFO('http_info - A 1XX HTTP Informational server response.'))
        self.stdout.write(self.style.HTTP_SUCCESS('http_success - A 2XX HTTP Success server response.'))
        self.stdout.write(self.style.HTTP_NOT_MODIFIED('http_not_modified - A 304 HTTP Not Modified server response.'))
        self.stdout.write(self.style.HTTP_REDIRECT('http_redirect - A 3XX HTTP Redirect server response other than 304.'))
        self.stdout.write(self.style.HTTP_NOT_FOUND('http_not_found - A 404 HTTP Not Found server response.'))
        self.stdout.write(self.style.HTTP_BAD_REQUEST('http_bad_request - A 4XX HTTP Bad Request server response other than 404.'))
        self.stdout.write(self.style.HTTP_SERVER_ERROR('http_server_error - A 5XX HTTP Server Error response.'))
        self.stdout.write(self.style.MIGRATE_HEADING('migrate_heading - A heading in a migrations management command.'))
        self.stdout.write(self.style.MIGRATE_LABEL('migrate_label - A migration name.'))

(Screenshot: all the styles above rendered with their respective colors in the terminal.)


Cron Job

Perhaps you have a task that must run periodically, like generating a report every Monday, or a web scraper that collects data from some website every 10 minutes. You can define this code as a management command and simply add it to your server’s crontab like this:

# m h  dom mon dow   command
0 4 * * * /home/mysite/venv/bin/python /home/mysite/mysite/manage.py my_custom_command

The example above will execute my_custom_command every day at 4 a.m.
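
For the web scraper scenario mentioned above, a similar entry could run a command every 10 minutes; scrape_data here is just a hypothetical command name:

# m h  dom mon dow   command
*/10 * * * * /home/mysite/venv/bin/python /home/mysite/mysite/manage.py scrape_data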


Further Reading

The examples above should be enough to get you started. More advanced usage will boil down to knowing how to use the argparse features. And of course, Django’s official documentation on management commands is the best resource.
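
As a small taste of those argparse features, the sketch below (not part of the tutorial’s code) uses choices to restrict a value and default to make it optional:

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Demonstrates extra argparse features'

    def add_arguments(self, parser):
        # choices= rejects anything outside the allowed values,
        # default= is used when the flag is omitted
        parser.add_argument('--format', type=str, choices=['json', 'csv'], default='json', help='Output format')

    def handle(self, *args, **kwargs):
        self.stdout.write('Selected format: %s' % kwargs['format'])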

You can find all the code used in this tutorial on GitHub.

Stack Abuse: Text Classification with Python and Scikit-Learn


Introduction


Text classification is one of the most important tasks in Natural Language Processing. It is the process of classifying text strings or documents into different categories, depending upon the contents of the strings. Text classification has a variety of applications, such as detecting user sentiment from a tweet, classifying an email as spam or ham, classifying blog posts into different categories, automatic tagging of customer queries, and so on.

In this article, we will see a real-world example of text classification. We will train a machine learning model capable of predicting whether a given movie review is positive or negative. This is a classic example of sentiment analysis, where people's sentiments towards a particular entity are classified into different categories.

Dataset

The dataset that we are going to use for this article can be downloaded from the Cornell Natural Language Processing Group. The dataset consists of a total of 2000 documents. Half of the documents contain positive reviews regarding a movie while the remaining half contains negative reviews. Further details regarding the dataset can be found at this link.

Unzip or extract the dataset once you download it. Open the folder "txt_sentoken". The folder contains two subfolders: "neg" and "pos". If you open these folders, you can see the text documents containing movie reviews.

Sentiment Analysis with Scikit-Learn

Now that we have downloaded the data, it is time to see some action. In this section, we will perform a series of steps required to predict sentiments from reviews of different movies. These steps can be used for any text classification task. We will use Python's Scikit-Learn library for machine learning to train a text classification model.

Following are the steps required to create a text classification model in Python:

  1. Importing Libraries
  2. Importing The dataset
  3. Text Preprocessing
  4. Converting Text to Numbers
  5. Training and Test Sets
  6. Training Text Classification Model and Predicting Sentiment
  7. Evaluating The Model
  8. Saving and Loading the Model

Importing Libraries

Execute the following script to import the required libraries:

import numpy as np  
import re  
import nltk  
from sklearn.datasets import load_files  
nltk.download('stopwords')  
import pickle  
from nltk.corpus import stopwords  

Importing the Dataset

We will use the load_files function from the sklearn.datasets library to import the dataset into our application. The load_files function automatically divides the dataset into data and target sets. In our case, we will pass it the path to the "txt_sentoken" directory. load_files will treat each folder inside the "txt_sentoken" folder as one category, and all the documents inside that folder will be assigned its corresponding category.

Execute the following script to see load_files function in action:

movie_data = load_files(r"D:\txt_sentoken")  
X, y = movie_data.data, movie_data.target  

In the script above, the load_files function loads the data from both "neg" and "pos" folders into the X variable, while the target categories are stored in y. Here X is a list of 2000 string type elements where each element corresponds to single user review. Similarly, y is a numpy array of size 2000. If you print y on the screen, you will see an array of 1s and 0s. This is because, for each category, the load_files function adds a number to the target numpy array. We have two categories: "neg" and "pos", therefore 1s and 0s have been added to the target array.

Text Preprocessing

Once the dataset has been imported, the next step is to preprocess the text. Text may contain numbers, special characters, and unwanted spaces. Depending upon the problem we face, we may or may not need to remove these special characters and numbers from text. However, for the sake of explanation, we will remove all the special characters, numbers, and unwanted spaces from our text. Execute the following script to preprocess the data:

from nltk.stem import WordNetLemmatizer

# WordNetLemmatizer provides the lemmatize() method used below
stemmer = WordNetLemmatizer()

documents = []

for sen in range(0, len(X)):  
    # Remove all the special characters
    document = re.sub(r'\W', ' ', str(X[sen]))

    # remove all single characters
    document = re.sub(r'\s+[a-zA-Z]\s+', ' ', document)

    # Remove single characters from the start
    document = re.sub(r'\^[a-zA-Z]\s+', ' ', document) 

    # Substituting multiple spaces with single space
    document = re.sub(r'\s+', ' ', document, flags=re.I)

    # Removing prefixed 'b'
    document = re.sub(r'^b\s+', '', document)

    # Converting to Lowercase
    document = document.lower()

    # Lemmatization
    document = document.split()

    document = [stemmer.lemmatize(word) for word in document]
    document = ' '.join(document)

    documents.append(document)

In the script above we use Regex Expressions from Python re library to perform different preprocessing tasks. We start by removing all non-word characters such as special characters, numbers, etc.

Next, we remove all the single characters. For instance, when we remove the punctuation mark from "David's" and replace it with a space, we get "David" and a single character "s", which has no meaning. To remove such single characters we use \s+[a-zA-Z]\s+ regular expression which substitutes all the single characters having spaces on either side, with a single space.

Next, we use the \^[a-zA-Z]\s+ regular expression to replace a single character from the beginning of the document, with a single space. Replacing single characters with a single space may result in multiple spaces, which is not ideal.

We again use the regular expression \s+ to replace one or more spaces with a single space. When you have a dataset in bytes format, the alphabet letter "b" is appended before every string. The regex ^b\s+ removes "b" from the start of a string. The next step is to convert the data to lower case so that the words that are actually the same but have different cases can be treated equally.

The final preprocessing step is the lemmatization. In lemmatization, we reduce the word into dictionary root form. For instance "cats" is converted into "cat". Lemmatization is done in order to avoid creating features that are semantically similar but syntactically different. For instance, we don't want two different features named "cats" and "cat", which are semantically similar, therefore we perform lemmatization.
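
As a quick illustration of what the lemmatizer does (a minimal sketch; it assumes the WordNet corpus has been fetched with nltk.download('wordnet')):

from nltk.stem import WordNetLemmatizer

stemmer = WordNetLemmatizer()
print(stemmer.lemmatize('cats'))    # cat
print(stemmer.lemmatize('leaves'))  # leaf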

Converting Text to Numbers

Machines, unlike humans, cannot understand the raw text. Machines can only see numbers. Particularly, statistical techniques such as machine learning can only deal with numbers. Therefore, we need to convert our text into numbers.

Different approaches exist to convert text into the corresponding numerical form. The Bag of Words Model and the Word Embedding Model are two of the most commonly used approaches. In this article, we will use the bag of words model to convert our text to numbers.

Bag of Words

The following script uses the bag of words model to convert text documents into corresponding numerical features:

from sklearn.feature_extraction.text import CountVectorizer  
vectorizer = CountVectorizer(max_features=1500, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))  
X = vectorizer.fit_transform(documents).toarray()  

The script above uses the CountVectorizer class from the sklearn.feature_extraction.text library. There are some important parameters that need to be passed to the constructor of the class. The first is the max_features parameter, which is set to 1500. This is because when you convert words to numbers using the bag of words approach, all the unique words in all the documents are converted into features. All the documents together can contain tens of thousands of unique words, but words with a very low frequency of occurrence are usually not a good feature for classifying documents. Therefore we set the max_features parameter to 1500, which means that we want to use the 1500 most frequently occurring words as features for training our classifier.

The next parameter is min_df, and it has been set to 5. This corresponds to the minimum number of documents that should contain a feature, so we only include words that occur in at least 5 documents. Similarly, for the max_df parameter, the value is set to 0.7, where the fraction corresponds to a percentage: 0.7 means that we should include only those words that occur in at most 70% of all the documents. Words that occur in almost every document are usually not suitable for classification because they do not provide any unique information about the document.

Finally, we remove the stop words from our text since, in the case of sentiment analysis, stop words may not contain any useful information. To remove the stop words we pass the stopwords object from the nltk.corpus library to the stop_words parameter.

The fit_transform function of the CountVectorizer class converts text documents into corresponding numeric features.

Finding TFIDF

The bag of words approach works fine for converting text to numbers. However, it has one drawback: it assigns a score to a word based only on its occurrence in a particular document. It doesn't take into account the fact that the word might also have a high frequency of occurrence in other documents. TFIDF resolves this issue by multiplying the term frequency of a word by the inverse document frequency. TF stands for "Term Frequency" while IDF stands for "Inverse Document Frequency".

The term frequency is calculated as:

Term frequency = (Number of Occurrences of a word)/(Total words in the document)  

And the Inverse Document Frequency is calculated as:

IDF(word) = Log((Total number of documents)/(Number of documents containing the word))  

The TFIDF value for a word in a particular document is higher if the frequency of occurrence of that word is higher in that specific document but lower in all the other documents.
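
As a small worked example with made-up numbers: if the word "excellent" appears 3 times in a 100-word review, its term frequency is 3/100 = 0.03. If the corpus holds 2000 documents and 100 of them contain "excellent", the inverse document frequency is log(2000/100), roughly 3.0 with the natural logarithm, giving a TFIDF score of about 0.03 * 3.0 = 0.09.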

To convert values obtained using the bag of words model into TFIDF values, execute the following script:

from sklearn.feature_extraction.text import TfidfTransformer  
tfidfconverter = TfidfTransformer()  
X = tfidfconverter.fit_transform(X).toarray()  

Note:

You can also directly convert text documents into TFIDF feature values (without first converting documents to bag of words features) using the following script:

from sklearn.feature_extraction.text import TfidfVectorizer  
tfidfconverter = TfidfVectorizer(max_features=1500, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))  
X = tfidfconverter.fit_transform(documents).toarray()  

Training and Testing Sets

Like any other supervised machine learning problem, we need to divide our data into training and testing sets. To do so, we will use the train_test_split utility from the sklearn.model_selection library. Execute the following script:

from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  

The above script divides data into 20% test set and 80% training set.

Training Text Classification Model and Predicting Sentiment

We have divided our data into training and testing sets. Now is the time to see the real action. We will use the Random Forest Algorithm to train our model. You can use any other model of your choice.

To train our machine learning model using the random forest algorithm we will use RandomForestClassifier class from the sklearn.ensemble library. The fit method of this class is used to train the algorithm. We need to pass the training data and training target sets to this method. Take a look at the following script:

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators=1000, random_state=0)  
classifier.fit(X_train, y_train)  
classifier.fit(X_train, y_train)  

Finally, to predict the sentiment for the documents in our test set we can use the predict method of the RandomForestClassifier class as shown below:

y_pred = classifier.predict(X_test)  

Congratulations, you have successfully trained your first text classification model and have made some predictions. Now is the time to see the performance of the model that you just created.

Evaluating the Model

To evaluate the performance of a classification model such as the one that we just trained, we can use metrics such as the confusion matrix, F1 measure, and the accuracy.

To find these values, we can use classification_report, confusion_matrix, and accuracy_score utilities from the sklearn.metrics library. Execute the following script to do so:

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test,y_pred))  
print(classification_report(y_test,y_pred))  
print(accuracy_score(y_test, y_pred))  

The output looks like this:

[[180  28]
 [ 30 162]]
             precision    recall  f1-score   support

          0       0.86      0.87      0.86       208
          1       0.85      0.84      0.85       192

avg / total       0.85      0.85      0.85       400

0.855  

From the output, it can be seen that our model achieved an accuracy of 85.5%, which is very good given the fact that we randomly chose all the parameters for CountVectorizer as well as for our random forest algorithm.

Saving and Loading the Model

In the script above, our machine learning model did not take much time to execute. One of the reasons for the quick training time is the fact that we had a relatively smaller training set. We had 2000 documents, of which we used 80% (1600) for training. However, in real-world scenarios, there can be millions of documents. In such cases, it can take hours or even days (if you have slower machines) to train the algorithms. Therefore, it is recommended to save the model once it is trained.

We can save our model as a pickle object in Python. To do so, execute the following script:

with open('text_classifier', 'wb') as picklefile:  
    pickle.dump(classifier,picklefile)

Once you execute the above script, you can see the text_classifier file in your working directory. We have saved our trained model and we can use it later for directly making predictions, without training.

To load the model, we can use the following code:

with open('text_classifier', 'rb') as training_model:  
    model = pickle.load(training_model)

We loaded our trained model and stored it in the model variable. Let's predict the sentiment for the test set using our loaded model and see if we can get the same results. Execute the following script:

y_pred2 = model.predict(X_test)

print(confusion_matrix(y_test, y_pred2))  
print(classification_report(y_test, y_pred2))  
print(accuracy_score(y_test, y_pred2))  

The output looks like this:

[[180  28]
 [ 30 162]]
             precision    recall  f1-score   support

          0       0.86      0.87      0.86       208
          1       0.85      0.84      0.85       192

avg / total       0.85      0.85      0.85       400

0.855  

The output is the same as the one we got earlier, which shows that we successfully saved and loaded the model.

Conclusion

Text classification is one of the most commonly used NLP tasks. In this article, we saw a simple example of how text classification can be performed in Python. We performed sentiment analysis of movie reviews.

I would advise you to try some other machine learning algorithms to see if you can improve the performance. Also, try to change the parameters of the CountVectorizer class to see if you can get any improvement.

Techiediaries - Django: Using Python with Electron Tutorial


In this tutorial, you'll learn to build GUIs for your Python applications using Electron and web technologies, i.e. HTML, CSS and JavaScript. This means taking advantage of the latest advancements in front-end web development to build desktop applications, while also taking advantage of Python and its powerful libraries to easily implement advanced requirements.

You can find the code in this GitHub repository.

Electron Tutorial

Electron allows you to develop desktop applications with web technologies such as JavaScript, HTML and CSS by providing a web container and a runtime with rich native cross-platform APIs. You could also think of it as a Node.js environment for desktop apps.

Electron Applications Architecture

In Electron, you have two types of processes; the Main and Renderer processes.

The main process is the one that runs the main script named in the package.json file. This script can create and display GUI windows; also, many Electron APIs are only available from the main process. An Electron application always has exactly one main process.

Electron makes use of the Chromium browser to display web pages. Each web page runs in its own process, called the renderer process.

You could also think of Electron as a web browser, but unlike in typical browsers (such as Chrome, Firefox, Edge, etc.), web pages don't run inside isolated or sandboxed environments: they have access to Node.js APIs and, as a result, can communicate with the low-level APIs of the underlying operating system.

Note that Electron is not a JavaScript binding for GUI libraries but a browser/Node.js runtime that uses web pages as its GUI.

Electron Main vs. Renderer Processes

The main process uses the BrowserWindow API to create native GUI windows. A window runs a web page in its own renderer process.

Renderer processes are not able to call native GUI APIs so they need to communicate with the main process, via different mechanisms, which will handle the native operations and return any results to the requesting renderer process.

Communication Between Renderer and Main Processes

Electron provides different ways to allow communication between main and renderer processes, such as the ipcMain and ipcRenderer modules for asynchronous message passing, and the remote module for calling main process APIs directly from a renderer.
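
The most common of these is the ipcMain/ipcRenderer pair. Below is a minimal sketch written against the Electron API of that era; the channel names ('ping' and 'pong') are made up for the example:

// In main.js (main process)
const { ipcMain } = require('electron')

ipcMain.on('ping', (event, payload) => {
  console.log('Renderer said:', payload)
  event.sender.send('pong', 'hello from the main process')
})

// In a script loaded by index.html (renderer process)
const { ipcRenderer } = require('electron')

ipcRenderer.on('pong', (event, payload) => {
  console.log('Main said:', payload)
})
ipcRenderer.send('ping', 'hello from the renderer')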

Sharing Data Between Renderer Processes

Each renderer process is isolated and only manages its own web page, but in many situations you need to share data between web pages (i.e. renderer processes). There are multiple ways to achieve that, such as storing shared state in the main process and accessing it from the renderers through the remote module, or using a browser-side storage mechanism such as localStorage or IndexedDB.

For example; in the main script, add the following code:

global.sharedObject = {
  aProperty: 'value'
}

We simply add variables and objects to the global namespace.

Then, in scripts running in the web pages, add:

require('electron').remote.getGlobal('sharedObject').aProperty = 'new value';

We import the electron module and we use the getGlobal() method of the remote property to access and modify global objects.

Using Node.js in Electron

Electron provides complete access to Node.js in main and renderer processes. That means, you have access to a full and rich ecosystem of APIs and also the modules available from npm which is the biggest repository of open-source modules in the world.

Compiling Native Node.js Modules for Electron

Keep in mind that native Node.js modules, such as SQLite3, require re-compilation to target the Electron ABI. You need to use the electron-rebuild package to rebuild native modules against the Electron ABI.

You can follow this tutorial for more information on how to compile native Node.js module for Electron.

Accessing Electron APIs

Electron provides a rich, cross-platform ecosystem of APIs. Some APIs can be accessed only from the main process, some only from renderer processes, and some from both.

To access APIs, you need to import/require the electron module:

const electron = require('electron')

For example, the BrowserWindow API, which is only available from the main process, can be imported using the following syntax:

const { BrowserWindow } = require('electron');
const window = new BrowserWindow();

If you want to access it from a renderer process, you can simply run:

const { BrowserWindow } = require('electron').remote
const window = new BrowserWindow()

Creating your First Electron Application

Let's now see how to create our first Electron application. You can develop Electron apps just like you would normally develop Node.js apps.

You first need to start with creating or generating a package.json file inside your project's folder using the following command:

npm init -y

This will create a basic package.json file with default values:

{"name":"electronjs-python","version":"1.0.0","description":"","main":"index.js","scripts":{"test":"echo \"Error: no test specified\"&& exit 1"},"keywords":[],"author":"","license":"ISC"}

Next, create the two index.html and main.js files inside the project's folder.

touch main.js index.html

The main.js file is the main script, so we need to change the main property of our package.json file to main.js instead of the default index.js file (it's only a preference, not required):

"main":"main.js",

Next, you need to install electron from npm:

npm install --save-dev electron

This will install electron locally; you can also follow the official guide for more available options for installing electron.

Next, add the start script to run the main.js file. Open the package.json file and add:

"scripts":{"start":"electron .","test":"echo \"Error: no test specified\"&& exit 1"},

Now, let's add the code which runs a GUI window in the main process. Open the main.js file and add, the first line to import the electron module:

const { app, BrowserWindow } = require('electron')

Next, add the following function, which creates an instance of BrowserWindow and loads the index.html file:

function createWindow() {
  window = new BrowserWindow({ width: 800, height: 600 })
  window.loadFile('index.html')
}

When the application is ready, run the createWindow() method:

app.on('ready', createWindow)

We can also handle different events such as when closing all Windows using:

app.on('window-all-closed', () => {
  // On macOS it is common for applications and their menu bar
  // to stay active until the user quits explicitly with Cmd + Q
  if (process.platform !== 'darwin') {
    app.quit()
  }
})

Finally, let's add the following content to the index.html file:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Hello Python from Electron!</title>
  </head>
  <body>
    <h1>Hello Python!</h1>
  </body>
</html>

Now, you can run the application using:

npm start

This is a screenshot of the application running:

(Screenshot: the Electron window displaying the "Hello Python!" page.)

Running a Python Script from Electron

Since we want to develop our application using Python and use Electron to build the GUI frontend with web technologies, we need to be able to communicate between Python and Electron.

Let's see how to run a basic Python script from Electron. First create a hello.py file and add the following Python code which prints Hello from Python! to the standard output:

import sys

print('Hello from Python!')
sys.stdout.flush()

In your main.js file, run the following code to spawn a Python process and execute the hello.py script:

function createWindow() {
  /* ... */
  var python = require('child_process').spawn('python', ['./hello.py']);

  python.stdout.on('data', function (data) {
    console.log("data: ", data.toString('utf8'));
  });
}

(Screenshot: the console output showing the message printed by the spawned Python script.)

Using python-shell to Communicate between Python and Node.js/Electron

A better way to communicate between Node.js/Electron and Python is by using the python-shell package.

python-shell provides an easy way to run Python scripts from Node.js with basic and efficient inter-process communication and better error handling.

Using python-shell, you can:

  • spawn Python scripts in a child process;
  • switch between text, JSON and binary modes;
  • use custom parsers and formatters;
  • perform data transfers through stdin and stdout streams;
  • get stack traces when an error is thrown.

Head back to your terminal, make sure you are inside the root folder of your project and run the following command to install python-shell from npm:

npm install --save python-shell 

You can then simply run a Python shell using:

var pyshell = require('python-shell');

pyshell.run('hello.py', function (err, results) {
  if (err) throw err;
  console.log('hello.py finished.');
  console.log('results', results);
});

(Screenshot: the console output of running hello.py through python-shell.)
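
python-shell also supports ongoing, two-way communication with a long-running script. The sketch below follows the package's documented message API; depending on the python-shell version installed, the way you require the PythonShell constructor may differ:

var PythonShell = require('python-shell');
var shell = new PythonShell('hello.py');

// Each line the Python script prints arrives as a 'message' event
shell.on('message', function (message) {
  console.log('Python says:', message);
});

// send() writes a line to the Python script's stdin
shell.send('hello from Electron');

shell.end(function (err) {
  if (err) throw err;
  console.log('finished');
});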

Conclusion

In this tutorial, we've seen how to use Electron and Python to build a simple desktop application.

We've also seen how to use the python-shell module to run a Python shell from a Node.js/Electron application and communicate between Electron and Python.

Real Python: Python Community Interview With Mariatta Wijaya


For this week’s community interview, I am joined by Mariatta Wijaya.

Mariatta is a web developer at Zapier. She also spends much of her time volunteering in the Python community: she is a core developer and contributes to conferences and Meetups.

If you ever have the pleasure of meeting her, then you can join her in an #icecreamselfie or talk about her bots taking over GitHub. You can find Mariatta’s preferred contact links at the end of this interview.

Ricky: Let’s start with an easy one. How’d you get into programming, and when did you start using Python?


Mariatta: I started around junior high school. We had extracurricular activities in my school, and one of them was “computer” class. At first, it was an introduction to MS-DOS and Windows. We were shown how to use WordStar and Lotus spreadsheets. (I’m really old.)

Later on, we got introduced to programming with QBASIC. Sometime later, I got introduced to “the world wide web,” and I started learning HTML and how to build web pages on my own. After I finished high school, I moved to Canada and studied computer science.

Before Python, I was a developer writing Windows and embedded apps, using the .NET Framework and C#. In 2008, I worked for a startup company working on a Windows project. When that project ended, they transferred me to a different team.

This team was working on web-based apps using Python, Django, and Google App Engine. I didn’t want to be looking for another job at the time. So I stayed around, started picking up Python, and began a new career path as a web developer.

Ricky: Most might know you for your work as a Python core developer. In fact, you did a talk at this year’s PyCon titled What is a Python Core Developer? For those who haven’t seen your talk, what’s the TL;DR version, and what is your role as a core developer?

Mariatta: The TL;DR version is that becoming a Python core developer comes with a lot of responsibilities, and it goes beyond just writing more code into CPython. In fact, writing code is the least we expect out of core developers nowadays. As a core dev, you’ll be expected to do more code reviews, mentoring, providing feedback, and making decisions, instead of writing more PRs yourself.

The other point that I want to highlight is that we’re all volunteers. I am not employed by any corporation or The PSF as a Python Core Developer. A lot of people still don’t realize this. Often, people write to the bug tracker as if they’re writing to customer support, expecting an immediate response, not taking no for an answer, and blaming us for various problems. Not only are we just volunteers doing this in our limited free time, but there are really very few of us compared the hundreds and thousands of users and contributors.

As a core dev myself, I’ve been focusing more on helping with the workflow, to make it easier for core devs and contributors to contribute and collaborate. I write utility tools and bots like cherry_picker, miss-islington, and recently the check_python_cla website.

I also focus on reviewing PRs from first-time contributors and documentation related issues. I like to make sure our devguide is up-to-date because that’s one of the first places we point contributors to when they have questions about our workflow.

I’m also doing weekly Python office hours now, over at Zulipchat. It is every Thursday evening at 7 PM PST. During that office hour, I’ll be available via DM, and I can respond and help in an almost real-time manner. During other times, I usually go to Zulip only once per day.

Ricky: As if you didn’t already do enough for the community, you also co-organize the PyLadies Vancouver Meetup and the PyCascades conference. Can you tell us a little bit about how you got involved with those, and what people can expect if they’re looking to attend?

Mariatta: The story of how PyCascades was founded was unclear, even to me. All I know is, one day I got an email from Seb, introducing me to the rest of the folks (Alan, Eric, Don, and Bryan), and it seems as if there’s an email thread that says, “Let’s do a Python conference in the Pacific-Northwest.”

I replied to it almost immediately. I didn’t think too much about what the responsibilities were going to be, or even how much work I’d have to put into it. I just thought, “Why not?” Within a couple weeks, we started scouting venues in Vancouver, and everything else just fell into place.

PyCascades is a one of a kind conference. We focus on highlighting first-time speakers and speakers from the Pacific-Northwest community. CFP for PyCascades 2019 is open from August 20 to the end of October. Please do submit a talk! I’m not involved in the program committee this year. Instead, I’m going to focus on mentoring speakers, especially first-time speakers and those from an underrepresented group.

I only started helping out with PyLadies Vancouver about two years ago. At the time, there were two organizers—and one of them had just stepped down—and they put up a call for more organizers. By then, even though I hadn’t been attending many Meetups, I’d benefited from PyLadies enough in the form of receiving financial aid for PyCon. So I just felt like it was an opportunity for me to pay it forward and give back to the community by also actively participating and ensuring the continuity of the Vancouver PyLadies community, instead of just waiting for the next Meetup to happen.

Our community has grown bigger now. I’ve looked back at our events over the past years, and we’ve put out so many great talks and workshops. We’ve had Python core developers and international PyCon speakers at our events. I’m quite proud of that!

Ricky: Looking through your Github, I can see that you seem to have an affinity for bots. You maintain two for the Python core devs Github, but you have many more on your Github. I’m intrigued to find out what you find so alluring about them?

Mariatta: My first introduction to GitHub bots was when I started contributing to coala two years ago. They have a GitHub bot that is very much like a personal assistant to all the maintainers. The bot was always up and running, replying and commenting. At the time, I didn’t even realize that bots could do all of those things, so I was quite impressed and fascinated with how it all worked. I always thought the bot was a very complicated system.

As I started helping to create and maintain Python’s GitHub bots, I’ve gained a better understanding of the bot’s architecture, and I was able to satisfy my initial curiosity about how GitHub bots work.

But then I started thinking differently. Now that I know how they work, and I know what GitHub APIs are available, I keep asking myself, “What else can be automated? What else can I delegate to the bots? Have we really reached peak automation?” Turns out there are a whole lot of tasks that I can automate, and all I need is Python. And now that I know which tasks can be done by bots, I get grumpy when I have to do some of those chores myself.

Ricky: I can’t have this interview with you without talking about ice cream selfies. It has become somewhat of a tradition of yours. There might be a few puzzled looks from our readers about now, so why don’t you explain all about the awesome #icecreamselfie?

Mariatta: The first #icecreamselfie I did was right after DjangoCon in Philadelphia, July 2016. I had just given my first ever conference talk, and I was feeling fabulous and just wanted to celebrate. Plus, it was a hot summer day. So I went to an ice cream shop near my hotel. Somehow, I just decided to take a selfie with the ice cream. It was actually unusual for me. Normally I just take pictures of the food, not a selfie.

My next talk was for PyCaribbean, in Puerto Rico. I wasn’t even planning for ice cream, we (myself and my roommate, and fellow speaker, Kim Crayton) were enjoying ourselves at the beach, and an ice cream cart showed up.

After that, I went to Italy for DjangoCon Europe and PyCon Italy. Of course, I had to have some gelato. No trip to Italy was going to be complete without it. Even at that point, I didn’t think of the #icecreamselfie as a tradition. The selfies have been more of a coincidence.

But after my talk at PyCon US, which was a pretty emotional talk, all I could think about was that I needed to go for ice cream. So my friend Jeff took me to this place he knew in Portland. And I felt really good after that ice cream! From then on, the #icecreamselfie became an official tradition for myself, and I go to great lengths researching the best ice cream right after I get a talk accepted.

Ricky: And now for my last question: what other hobbies and interests do you have, aside from Python? Any you’d like to share and/or plug?

Mariatta: I like doing nature walks, traveling, and going camping. I have a strange hobby of taking pictures of my food, and I post them to Instagram. My other favorite pastime is playing Mahjong. Not Mahjong solitaire (a matching game), but Hong Kong style Mahjong. I still have trouble finding people who’d play this game with me.

If people are looking for ways to support me, please do send me a happiness packet, support me on Patreon, or just say thanks.

 


Thank you Mariatta for the interview. You can find Mariatta on Twitter or her on her website if you would like to know more about her.

If there is someone you would like me to interview in the future, reach out to me in the comments below, or send me a message on Twitter.



Stack Abuse: Comparing Strings using Python


In Python, strings are sequences of characters, which are effectively stored in memory as objects. Each object can be identified using the id() function, as you can see below. Python tries to re-use objects in memory that have the same value, which also makes comparing objects very fast in Python:

$ python
Python 2.7.9 (default, Jun 29 2016, 13:08:31)  
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.  
>>> a = "abc">>> b = "abc">>> c = "def">>> print (id(a), id(b), id(c))
(139949123041320, 139949123041320, 139949122390576)
>>> quit()

In order to compare strings, Python offers a few different operators to do so. First, we will explain them in more detail below. Second, we'll go over both the string and the re modules, which contain methods to handle case-insensitive and inexact matches. Third, to deal with multi-line strings the difflib module is quite handy. A number of examples will help you to understand how to use them.

The == and != Operators

As a basic comparison operator you'll want to use == and !=. They work in exactly the same way as with integer and float values. The == operator returns True if there is an exact match, otherwise False will be returned. In contrast, the != operator returns True if there is no match and otherwise returns False. Listing 1 demonstrates this.

In a for loop, a string containing the name of the Swiss city "Lausanne" is compared with an entry from a list of other places, and the comparison result is printed on stdout.

Listing 1:

# define strings
listOfPlaces = ["Berlin", "Paris", "Lausanne"]  
currentCity = "Lausanne"

for place in listOfPlaces:  
    print ("comparing %s with %s: %s" % (place, currentCity, place == currentCity))

Running the Python script from above the output is as follows:

$ python3 comparing-strings.py
comparing Berlin with Lausanne: False  
comparing Paris with Lausanne: False  
comparing Lausanne with Lausanne: True  

The == and is Operators

Python has the two comparison operators == and is. At first sight they seem to be the same, but actually they are not. == compares two variables based on their actual value. In contrast, the is operator compares two variables based on the object id and returns True if the two variables refer to the same object.

The next example demonstrates that for three variables with integer values. The two variables a and b have the same value, and Python refers to the same object in order to minimize memory usage.

>>> a = 1
>>> b = 1
>>> c = 2
>>> a is b
True  
>>> a is c
False  
>>> id(a)
10771520  
>>> id(b)
10771520  

As soon as the value changes, Python will point the variable at a different object. In the next code snippet b gets the value of 2, and subsequently b and c refer to the same object.

>>> b = 2
>>> id(b)
10771552  
>>> id(c)
10771552  

A rule of thumb to follow is to use == when comparing values (strings, ints, and so on), and to reserve is for when you need to know whether two variables refer to the exact same object, such as when checking against None.
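
As a short illustration (this snippet is not part of the original listings), the typical place for is is an identity check against None, while string content is compared with ==:

>>> value = None
>>> value is None
True
>>> "abc" == "ab" + "c"
True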

More Comparison Operators

For a comparison regarding lexicographical order you can use the comparison operators <, >, <=, and >=. The comparison itself is done character by character, and the ordering depends on each character's position in the character table that is in use on your machine while executing the Python code.

Keep in mind the order is case-sensitive. As an example for the Latin alphabet, "Bus" comes before "bus". Listing 2 shows how these comparison operators work in practice.

Listing 2:

# define the strings
listOfPlaces = ["Berlin", "Paris", "Lausanne"]  
currentCity = "Lausanne"

for place in listOfPlaces:  
    if place < currentCity:
            print ("%s comes before %s" % (place, currentCity))
    elif place > currentCity:
            print ("%s comes after %s" % (place, currentCity))
    else:
            print ("%s is similar to %s" % (place, currentCity))

Running the Python script from above, the output is as follows:

$ python3 comparing-strings-order.py
Berlin comes before Lausanne  
Paris comes after Lausanne  
Lausanne is similar to Lausanne  
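
The case-sensitivity mentioned above is easy to check in the REPL (a quick aside, not one of the article's listings); uppercase letters come before lowercase ones in the ASCII/Unicode table:

>>> "Bus" < "bus"
True
>>> ord("B"), ord("b")
(66, 98)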

Case-Insensitive Comparisons

The previous examples focused on exact matches between strings. To allow case-insensitive comparisons Python offers special string methods such as upper() and lower(). Both of them are directly available as methods of the corresponding string object.

upper() converts the entire string to uppercase letters, and lower() converts it to lowercase letters. Based on Listing 1, the next listing shows how to use the lower() method.

Listing 3:

# using the == operator
listOfPlaces = ["Berlin", "Paris", "Lausanne"]  
currentCity = "lausANne"

for place in listOfPlaces:  
    print ("comparing %s with %s: %s" % (place, currentCity, place.lower() == currentCity.lower()))

The output is as follows:

$ python3 comparing-strings-case-insensitive.py
comparing Berlin with lausANne: False  
comparing Paris with lausANne: False  
comparing Lausanne with lausANne: True  

Using a Regular Expression

A Regular Expression - or "regex" for short - defines a specific pattern of characters. Regarding this topic, Jeffrey Friedl wrote an excellent book titled Mastering Regular Expressions, which I'd highly recommend.

To make use of this mechanism in Python, import the re module first and then define a specific pattern. Again, the following example is based on Listing 1. The search pattern matches "bay", starting with either a lowercase or an uppercase "b". Precisely, the following Python code finds all the strings in which the search pattern occurs, no matter at which position of the string - at the beginning, in the middle, or at the end.

Listing 4:

# import the additional module
import re

# define list of places
listOfPlaces = ["Bayswater", "Table Bay", "Beijing", "Bombay"]

# define search string
pattern = re.compile("[Bb]ay")

for place in listOfPlaces:  
    if pattern.search(place):
        print ("%s matches the search pattern" % place)

The output is as follows, and matches "Bayswater", "Table Bay", and "Bombay" from the list of places:

$ python3 comparing-strings-re.py
Bayswater matches the search pattern  
Table Bay matches the search pattern  
Bombay matches the search pattern  

Multi-Line and List Comparisons

So far our comparisons have only been on a few words. Using the difflib module Python also offers a way to compare multi-line strings, and entire lists of words. The output can be configured according to various formats of diff tools.

The next example (Listing 5) compares two multi-line strings line by line, and shows deletions as well as additions. After the initialization of the Differ object in line 12 the comparison is made using the compare() method in line 15. The result is printed on stdout (line 18).

Listing 5:

# import the additional module
import difflib

# define original text
# taken from: https://en.wikipedia.org/wiki/Internet_Information_Services
original = ["About the IIS", "", "IIS 8.5 has several improvements related", "to performance in large-scale scenarios, such", "as those used by commercial hosting providers and Microsoft's", "own cloud offerings."]

# define modified text
edited = ["About the IIS", "", "It has several improvements related", "to performance in large-scale scenarios."]

# initiate the Differ object
d = difflib.Differ()

# calculate the difference between the two texts
diff = d.compare(original, edited)

# output the result
print ('\n'.join(diff))  

Running the script creates the output as seen below. Lines with deletions are indicated by - signs whereas lines with additions start with a + sign. Furthermore, lines with changes start with a question mark. Changes are indicated using ^ signs at the corresponding position. Lines without an indicator are still the same.

$ python comparing-strings-difflib.py
  About the IIS

- IIS 8.5 has several improvements related
?  ^^^^^^

+ It has several improvements related
?  ^

- to performance in large-scale scenarios, such
?                                        ^^^^^^

+ to performance in large-scale scenarios.
?                                        ^

- as those used by commercial hosting providers and Microsoft's
- own cloud offerings.

Conclusion

In this article you have learned various ways to compare strings in Python. We hope that this overview helps you program effectively in your day-to-day work as a developer.

Acknowledgements

The author would like to thank Mandy Neumeyer for her support while preparing the article.


REPL|REBL: Displaying images on OLED screens — Using 1-bpp images in MicroPython


We've previously covered the basics of driving OLED I2C displays from MicroPython, including simple graphics commands and text. Here we look at displaying monochrome 1 bit-per-pixel images and animations using MicroPython on a Wemos D1.

Processing the images correctly and choosing the right image format are important to get the most detail and to avoid running out of memory.

Requirements
  • Wemos D1 v2.2+ or good imitations
  • 0.91in OLED Screen, 128x32 pixels, I2C interface
  • Breadboard (any size will do)
  • Wires (loose ends, or jumper leads)

Setting up

The display communicates over I2C, but we need a driver to interface with it. For this we can use the ssd1306 module for OLED displays available in the MicroPython repository. Click Raw format and save the file with a .py extension.

You can then use the ampy tool (or the WebREPL) to upload the file to your device's filesystem:

ampy --port /dev/tty.wchusbserial141120 put ssd1306.py

With the ssd1306.py file on your Wemos D1, you can import it as any other Python module. Connect to your device, and then in the REPL enter:

from machine import I2C, Pin
import ssd1306

If the import ssd1306 succeeds, the package is correctly uploaded and you're good to go.

Wire up the OLED display, connecting pins D1 to SCL and D2 to SDA. Provide power from G and 5V. The display below is a 2-colour version, where the top 1/4 of the pixels are yellow, while the rest is blue. They're intended for mobile screens, but it looks kind of neat with Scatman.

The circuit

i2c = I2C(-1, Pin(5), Pin(4))
display = ssd1306.SSD1306_I2C(128, 64, i2c)

If your display is a different size just fiddle the numbers above. You'll need to change some parameters on loops later too.

To test the display is working, let's set all the pixels to on and show it.

display.fill(1)
display.show()

The screen should light up completely. If it doesn't, something is wrong.

Image Processing

To display an image on a 1-bit per pixel monochrome display we need to get our image into the same format. The best way to do this is using image manipulation software, such as Photoshop or GIMP. These allow you to down-sample the image to monochrome while maintaining detail by adding dither or other adjustments.

The first step is to crop the image down to the correct dimensions — the display used here is 128x64 pixels. To preserve as much of the image as possible you might find it useful to resize the larger axis to the max (e.g. if the image is wider than high, resize the width to 128 pixels). Then crop the remaining axis.

You can convert images to 1-bit-per-pixel in GIMP through the Image -> Mode -> Indexed... dialog.

Convert image to Indexed 1bpp

If your image is already in an indexed format this won't be available. So convert back to RGB/Grayscale first, then re-select Image -> Mode -> Indexed.

Select "Use black and white (1-bit) palette" to enable 1bpp mode. The colour dithering settings are best chosen by trial and error depending on the image being converted although turning off dithering entirely is often best for images of solid colour blocks (e.g. logos).

Once the image is converted to black & white you can save it to a file. There are two good options for saving 1bpp images — PBM and PGM. PBM is a 1 bit-per-pixel format, while PGM is grayscale at 1 byte per pixel.

Type               Magic number (ASCII)   Magic number (Binary)   Extension   Colors
Portable BitMap    P1                     P4                      .pbm        0–1 (white & black)
Portable GrayMap   P2                     P5                      .pgm        0–255 (gray scale)
Portable PixMap    P3                     P6                      .ppm        0–255 (RGB)

While PBM is clearly better suited, we can pre-process PGM down to an equivalent bit stream. Both approaches are included here, in case your software can only produce one or the other.

Save as either PBM (recommended) or PGM, and select Raw mode, not ASCII.

Raw mode dialog
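
If you'd rather script the conversion than click through GIMP, the following is a minimal sketch of the same resize / crop / 1-bit workflow using the Pillow library (the source filename is a placeholder, and this is just one possible approach, not something from the original post):

# Assumes Pillow is installed; 'source.jpg' is a placeholder filename.
from PIL import Image

TARGET_W, TARGET_H = 128, 64

im = Image.open('source.jpg')

# Scale so the image covers 128x64 while keeping the aspect ratio.
scale = max(TARGET_W / im.width, TARGET_H / im.height)
im = im.resize((round(im.width * scale), round(im.height * scale)))

# Centre-crop the remaining axis down to exactly 128x64.
left = (im.width - TARGET_W) // 2
top = (im.height - TARGET_H) // 2
im = im.crop((left, top, left + TARGET_W, top + TARGET_H))

# Convert to 1 bit per pixel (Floyd-Steinberg dithering by default) and save as raw PBM.
im = im.convert('1')
im.save('source.pbm')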

Example images

Some example images (128x64 pixels) are shown below, in PNG format. Each of the images is available in this zip which contains PBM, PGM and PNG formats.

Alan Partridge, Blackadder, REPL|REBL, Scatman

Portable Bitmap Format

The Portable Bitmap Format (PBM) consists of a regular header, separated by newlines, then the image data. The header starts with a magic number indicating the image format and whether the data is ASCII or binary. In all examples here we're using binary since it's more compact. The second line is a comment, which is usually the program used to create it. Third are the image dimensions. Then, following a final newline, you get the image data as a binary blob.

P4
# CREATOR: GIMP PNM Filter Version 1.1
128 64
<data>

The data is stored as a 1-bit-per-pixel stream, with pixel on as 1 and pixel off as 0. On a normal display screen an on pixel appears as black — this is different on the OLED, which we need to account for later.

To upload your PBM file to the controller —

ampy --port /dev/tty.wchusbserial141120 put alan.pbm

Loading images

The PBM data stream is already in the correct format for use. We can wrap the data in bytearray, use this to create a FrameBuffer and blit it immediately. However, we need to skip the header region (3x readline) before reading the subsequent data block.

import framebuf

with open('scatman.pbm', 'rb') as f:
    f.readline()  # Magic number
    f.readline()  # Creator comment
    f.readline()  # Dimensions
    data = bytearray(f.read())

fbuf = framebuf.FrameBuffer(data, 128, 64, framebuf.MONO_HLSB)

We can't use readlines() since the binary image data may contain ASCII code 10 (newline).

The framebuf.MONO_HLSB format is described in the MicroPython docs as

Monochrome (1-bit) color format This defines a mapping where the bits in a byte are horizontally mapped. Each byte occupies 8 horizontal pixels with bit 0 being the leftmost. Subsequent bytes appear at successive horizontal locations until the rightmost edge is reached. Further bytes are rendered on the next row, one pixel lower.

This matches exactly with the format of our PBM data.

This framebuffer format (framebuf.MONO_HLSB) is different to that used by the ssd1306 screen (framebuf.MONO_VLSB). This is handled transparently by the framebuffer when blitting.

Displaying an image

We have the image data in fbuf, which can be blitted directly to our display framebuffer, using .blit. This accepts coordinates at which to blit. Because the OLED screen displays inverse (on = light, off = black) we need to switch .invert(1) on the display.

display.invert(1)
display.blit(fbuf, 0, 0)
display.show()

Portable Graymap Format

The Portable Graymap Format (PGM) shares a similar header with PBM, again newline separated. However, there is an additional 4th header line which contains the maximum value — indicating the number of gray levels between black and white. Black is again zero, and the maximum (255 here) is white.

P5
# CREATOR: GIMP PNM Filter Version 1.1
128 64
255
<data>

The format uses 1 byte per pixel. This is 8x more than we need, but we can process it down to 1bpp. Since we're saving a mono image each pixel will contain either 0 (fully off) or 255 (fully on).

To upload your PGM file to the controller —

ampy --port /dev/tty.wchusbserial141120 put alan.pgm

Loading images

Since each pixel is a single byte it is easy to iterate, though slow as hell. We opt here to turn on bright pixels, which gives us the correct output without switching the display invert on.

with open('alan.pgm', 'rb') as f:
    f.readline()  # Magic number
    f.readline()  # Creator comment
    f.readline()  # Dimensions
    f.readline()  # Max value, 255
    data = bytearray(f.read())

for x in range(128):
    for y in range(64):
        c = data[x + y * 128]
        display.pixel(x, y, 1 if c == 255 else 0)

display.show()

Packing bits

Using 1 byte per pixel wastes 7 bits which is not great, and iterating to draw the pixels is slow. If we pack the bits we can blit as we did with PBM. To do this we simply iterate over the PGM image data in blocks of 8 (8 bits=1 byte).

Each iteration we create a zeroed byte (an int of 0). As we iterate over the 8 bits, we add 2**(7-n) if that bit should be set to on. The first bit we hit sets the topmost bit, which has a value of 2**(7-0) = 2**7 = 128, the second sets 2**(7-1) = 2**6 = 64. The table below shows the values for each bit in a byte.

Bit        7     6     5     4     3     2     1     0
Value      2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
           128   64    32    16    8     4     2     1

The result is a single byte whose bits are set, in turn, according to the 8 pixel bytes we iterated over.

p = []
for i in range(0, len(data), 8):
    byte = 0
    for n, bit in enumerate(data[i:i + 8]):
        byte += 2 ** (7 - n) if bit == 255 else 0
    p.append(byte)

We choose to interpret the 255 values as on (the opposite of PBM, where black = on, giving an inverted image). You could of course reverse it.

The variable p now contains a list of int values in the range 0-255 (bytes). We can cast this to a bytearray and then use it to create our FrameBuffer object.

# Create a framebuffer object
fbuf = framebuf.FrameBuffer(bytearray(p), 128, 64, framebuf.MONO_HLSB)

The framebuf.MONO_HLSB format is described in the MicroPython docs as

Monochrome (1-bit) color format This defines a mapping where the bits in a byte are horizontally mapped. Each byte occupies 8 horizontal pixels with bit 0 being the leftmost. Subsequent bytes appear at successive horizontal locations until the rightmost edge is reached. Further bytes are rendered on the next row, one pixel lower.

This matches exactly with the format of our PGM (and bit-packed) data.

This framebuffer format (framebuf.MONO_HLSB) is different to that used by the ssd1306 screen (framebuf.MONO_VLSB). This is handled transparently by the framebuffer when blitting.

Packing script

A command-line packing script is given below (and you can download it here), which can be used to pack a PGM into a 1bpp bitstream. The script accepts a single filename of a PGM file to process, and outputs the resulting packed bit data as <filename>.bin.

import os
import sys

fn = sys.argv[1]

with open(fn, 'rb') as f:
    f.readline()  # Magic number
    f.readline()  # Creator comment
    f.readline()  # Dimensions
    f.readline()  # Max value, 255
    data = bytearray(f.read())

p = []
for i in range(0, len(data), 8):
    byte = 0
    for n, bit in enumerate(data[i:i + 8]):
        byte += 2 ** (7 - n) if bit == 255 else 0
    p.append(byte)

b = bytearray(p)

basename, _ = os.path.splitext(fn)
with open('%s.bin' % basename, 'wb') as f:
    f.write(b)

The resulting file is 1KB in size, and identical to a .pbm format file, minus the header and with colours inverted (this makes display simpler).

python pack.py scatman.1.pgm

ls -l

-rw-r--r--  1 martin  staff  1024 26 Aug 18:11 scatman.bin
-rw-r--r--  1 martin  staff  8245 26 Aug 18:02 scatman.pgm
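
If you want to sanity-check the "inverted PBM" claim, a few lines of Python will do it (a quick aside, assuming the .pbm and .pgm were exported from the same indexed image; filenames follow the examples above):

with open('scatman.1.pbm', 'rb') as f:
    for _ in range(3):
        f.readline()  # skip the PBM header
    pbm = f.read()

with open('scatman.bin', 'rb') as f:
    packed = f.read()

# Every packed byte should be the bitwise inverse of its PBM counterpart.
print(len(pbm) == len(packed) and all(a ^ 0xFF == b for a, b in zip(pbm, packed)))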

To upload your BIN file to the controller —

ampy --port /dev/tty.wchusbserial141120 put scatman.bin

Loading images

Since we've stripped off the PGM header, the resulting file can be read directly into a bytearray.

with open('scatman.bin', 'rb') as f:
    data = bytearray(f.read())

fbuf = framebuf.FrameBuffer(data, 128, 64, framebuf.MONO_HLSB)

The colours were inverted in our bit packer so we can just blit the framebuffer directly without inverting the display.

display.blit(fbuf, 0, 0)
display.show()

Animation

Both the PBM and PGM images are 1KB in memory once loaded, leaving us plenty of space to load multiple images and animate them. The following loads a series of Scatman John PBM images and animates them in a loop.

from machine import I2C, Pin
import ssd1306
import time
import framebuf

i2c = I2C(-1, Pin(5), Pin(4))
display = ssd1306.SSD1306_I2C(128, 64, i2c)

images = []
for n in range(1, 7):
    with open('scatman.%s.pbm' % n, 'rb') as f:
        f.readline()  # Magic number
        f.readline()  # Creator comment
        f.readline()  # Dimensions
        data = bytearray(f.read())
    fbuf = framebuf.FrameBuffer(data, 128, 64, framebuf.MONO_HLSB)
    images.append(fbuf)

display.invert(1)

while True:
    for i in images:
        display.blit(i, 0, 0)
        display.show()
        time.sleep(0.1)

The resulting animation —

I'm the Scatman

The image distortion is due to frame rate mismatch with the camera and won't be visible in person.

Optimization

There is still plenty of room left for optimization. For static images there are often multiple consecutive runs of bits of the same colour (think background regions) or regular patterns (dithering). By setting aside a few bits as repeat markers we could compress these regions down to a single pattern, at the cost of larger files for very noisy images and some unpacking time.
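
To make the idea concrete, here is a minimal run-length sketch over the packed bytes (a sketch only, not something this post implements; it uses simple (count, value) pairs rather than reserved marker bits):

def rle_encode(data):
    out = bytearray()
    i = 0
    while i < len(data):
        # Count how many identical bytes follow (capped at 255 per pair).
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return out

def rle_decode(data):
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return out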

We could get away with a lot less data for the animation (particularly the example above) by storing only frame deltas (changes), and using key frames. But we'd also need masking, and that takes memory... and yeah. Let's not, for now.

Rene Dudfield: Draft of, "How to port and market games using #python and #pygame."

This is a collaborative document, and a really early draft. Please feel free to add any tips or links in a comment here or on the reddit post https://www.reddit.com/r/pygame/comments/9aodt7/collaborative_doc_lets_write_pygame_distribution/
You've spent two years making a game, but now want other people to see it?
How do you port it to different platforms, and make it available to others? How do you let people know it is even a thing? Is your game Free Libre software, or shareware?

All python related applications are welcome on www.pygame.org. You'll need a screenshot, a description of your game, and some sort of URL to link people to (a github/gitlab/bitbucket perhaps).  But how and where else can you share it?

a few platforms to port to

  • itch.io and windows
  • windows store?
  • mac (for itch.io)
  • mac store
  • steam
  • linux 'flatpak' (latest fedora/ubuntu etc use this like an app store).
  • pypi (python packages can actually be installed by lots of people)
  • android store
  • web
  • debian
  • redhat/fedora

Make it a python package.

Some of the tools work more easily with your package as a python package. Working with all the different tools is sort of hard, and having a convention for packaging would make things easier.

Python packaging guide - http://packaging.python.org/

So, why don't we do things as a simple python package, with one example app to show how? pygame has an example app already, solarwolf - https://github.com/pygame/solarwolf. With work, it could be a good example app to use. We can also link in this guide to other pygame apps that have been distributed in various places.
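
A minimal setup.py along these lines is usually enough to get started (the package and entry point names below are placeholders, not taken from solarwolf):

from setuptools import setup, find_packages

setup(
    name='mygame',          # placeholder project name
    version='1.0.0',
    packages=find_packages(),
    install_requires=['pygame'],
    include_package_data=True,  # ship images/sounds listed in MANIFEST.in
    entry_points={
        'console_scripts': ['mygame=mygame.main:main'],  # hypothetical entry point
    },
)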

There are other example apps linked below for different distribution technology.
 

pyinstaller

https://www.pyinstaller.org/
This can make standalone builds for Linux, Windows, and Mac.
pyinstaller --onefile --windowed --icon=icon.ico .py

Windows

pynsist and pyinstaller can be used.
https://github.com/takluyver/pynsist

The benefit of pynsist is that it can create installers, whereas pyinstaller is for making standalone executables (which is good if you are putting your app on the Steam store, for example).

Windows code signing

Flatpak

apps on linux - https://flatpak.org/
Here's an example of making a pygame one. https://github.com/flathub/flathub/pull/478
Developer guide for more detail here - http://docs.flatpak.org/en/latest/

pypi

The python package system means your app can be available to everyone who can use pip, which is an audience in the millions.

Mac

pyinstaller is probably the best option at the moment. If your game is open source, then you could use TravisCI for free to make builds with pyinstaller.

Unfortunately you probably need a Mac to make a mac build, test it, and release on the mac/ios stores. Getting a cheap apple machine off ebay might be the way to go. Or a cloud account perhaps from 'macincloud'. Also the mac developer program costs $100.

Another option might be to borrow a friend's machine to make the builds when it's time.
See:

iOS

It's not easy, but possible.

With pygame 2 this should be possible since it uses the new SDL2.
If you use LGPL code on iOS you still have to let your users benefit from the protections the LGPL gives them.

Tom from renpy says... "I've been distributing Ren'Py under LGPL section 6c, which says that you can distribute it along with a written offer to provide the source code required to create the executables. Since Ren'Py has a reasonably strong distinction between the engine and game scripts, the user can then combine the game data from an iOS backup with the newly-linked Ren'Py to get a package they can install through xcode."
https://github.com/renpy/pygame_sdl2/issues/109#issuecomment-412156973

An apple developer account costs $100, and selling things costs 30% of the cost of your app. https://developer.apple.com/

Steam

There are a few games released using pygame on steam. Here are two threads of games released:
Costs $100 to join up and sell a game on this store. https://partner.steamgames.com/
Recently someone used pyinstaller to package their game.
pyinstaller --onefile --windowed --icon=icon.ico .py

SteamworksPy

A python module for the C++ steam sdk. https://github.com/Gramps/SteamworksPy
Made by someone who has released their game (using pygame) on steam.

Itch.io

"itch.io is an open marketplace for independent digital creators with a focus on independent video games."
Quite a few people have released their pygame games on itch.io.

Android

This isn't really possible to do well at the moment without a bit of work.

python-for-android seems the best option, but doesn't work well with pygame. https://github.com/kivy/python-for-android
There is an old and unmaintained pygame recipe included (for an old pygame 1.9.1). With some work it should be possible to update the recipe to use the SDL2 support in pygame.

There was an older 'pygame subset for android' which is now unmaintained, and does not work with more recent Android devices.

Web

There's not really an 'export for web' option at the moment. It is possible with both CPython and SDL as well as SDL2 working on emscripten (the compiler for WASM and stuff that goes on the web).
Here is the latest 'cpython on web' project. https://github.com/iodide-project/pyodide

Building if you do not have a windows/mac/linux machine

CI tools

If your game is open source, you can use these systems to build your game remotely for free.
How to do that? Well, that's an exercise left up to the reader. Probably getting it to use pyinstaller, and having them upload the result somewhere.

One python app that uses Travis and Appveyor is the Mu editor. You can see how in their .travis.yml and appveyor.yml files. See https://github.com/mu-editor/mu


Virtualbox

With virtualbox (and other emulators) you can run some systems on your local machine. Which means you do not need to buy a new development machine yourself for those platforms.

Both windows and linux images are available that you could use legally.

https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/

Note that it is good to do your testing on a fresh install, rather than testing on the same machine that you made your executables with. This is because perhaps you forgot to include some dependency, and that dependency is on the development machine, but not on everyone else's machines.

Writing portable python code

Some old (but still valid) basic advice on making your game portable: https://www.pygame.org/wiki/distributing

Things like naming your files case sensitively.
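
A small sketch of portable resource loading (paths and filenames here are hypothetical): build paths with os.path.join instead of hard-coding separators, and match file-name case exactly, since Linux filesystems are case-sensitive.

import os

import pygame

BASE_DIR = os.path.dirname(os.path.abspath(__file__))

def load_image(name):
    # Build the path with os.path.join rather than hard-coding "/" or "\\",
    # and pass 'name' with the exact on-disk case ("Player.png" != "player.png" on Linux).
    return pygame.image.load(os.path.join(BASE_DIR, 'data', name))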

Announcing your game.

Generic Indie game marketing guides all apply here.

Some python/pygame specific avenues for marketing and announcing...
Of course the python world is a tiny place compared to the entire world.



Icons.

Each platform has slightly different requirements for icons. This might be a nice place to link to all the requirements (TODO).


Making a game trailer (for youtube)

You may not need to make the best trailer, or even a good trailer. Just a screen capture of your game might be 'good enough' and is better than nothing.

How about making a trailer with pygame itself? You could call it 'demo mode', or 'intro mode'.
There's a free iMovie on Mac, the Microsoft video editor on windows, and blender for all platforms. An alternative is to use the python module moviepy and script your game trailer.
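
A bare-bones moviepy sketch along those lines (the frames folder and output name are hypothetical, and this is just one possible approach):

from moviepy.editor import ImageSequenceClip

# Turn a folder of frames saved by pygame into a short clip.
clip = ImageSequenceClip('frames', fps=30)
clip.write_videofile('trailer.mp4')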

OBS is pretty good multi platform free screen capture software. https://obsproject.com/download

Animated gif

These are useful for sharing on twitter and other such places, so people can see game play.
You can save the .png files with pygame, and convert them to a gif with the 'convert' tool from imagemagick.
# brew install imagemagick
# sudo apt-get install imagemagick

# call this in your main loop.
pygame.image.save(surf, 'bla_%05d.png' % frame_idx)

Now you can convert the png files to an animated gif:
convert -delay 20 -loop 0 bla_*png animated.gif

Some solutions on stack overflow.

The No Title® Tech Blog: Optimize Images v1.2 – new features and finally available on PyPI!


The new release of my image optimization command-line utility is out. It has a couple of cool new features and, for the first time, it is now available on PyPI, which means you can just pip install it as any other Python package.

Mike Driscoll: Jupyter Notebook 101 Pre-Order


My latest book, Jupyter Notebook 101, is now available for Pre-Order on Leanpub.

This book is scheduled to be finished by November 2018. Should you purchase this book, you will get it in PDF, ePub and mobi formats.

Techiediaries - Django: Using Electron with Flask and python-shell


In the previous tutorial, we've seen how to use Electron and python-shell to create Python apps with Electron GUIs. This opens the door for using the modern frontend web technologies, the Node.js and npm modules (the biggest open source repository in the world) and the Python libraries combined to create powerful applications.

In this tutorial, we'll use Flask, a popular web framework for building web applications with Python, and Electron to build a desktop application with an Electron GUI. There are many benefits of combining Flask with Electron to build applications, such as:

  • If you are a Python/Flask web developer, you can use your existing skills to build cross platform desktop applications;
  • If you already have an existing Flask application, you can easily target desktop apps without reinventing the wheel etc.

What's Electron?

We assume here that you are a Flask developer so an Electron introduction might be useful.

Electron is a platform, created by GitHub, to enable developers to create cross-platform desktop applications for Windows, Linux and macOS using web technologies i.e JavaScript, HTML and CSS.

Electron is based on Chromium, just like Chrome and Opera (and many browsers) so it's actually a web container. Electron also provides a Node.js runtime so you can use the Node.js APIs and ecosystem for building desktop apps (not just server apps and CLI tools).

Using Electron, you can take advantage of the Node.js APIs and the modern HTML5 APIs, but also a rich, cross-platform API for accessing native operating system features and creating native windows and dialogs.

Creating the Electron Application

Let's not re-invent the wheel and use the Electron application we created in the previous tutorial. It's available from GitHub, so you simply need to clone it and install the dependencies using the following commands:

git clone https://github.com/techiediaries/python-electron-app
cd python-electron-app
npm install
npm start

Electron Python

Creating a Basic Flask Application

Now that we've created our Electron GUI application, let's create a basic Python/Flask application and use it as an "engine" for our application. We'll also use the python-shell module to enable communication between the Python process and the Electron process.

We'll use Pipenv to create an isolated virtual environment for Python packages. Pipenv is the packaging tool recommended by the Python Packaging Authority for managing application dependencies.

First, create a virtual environment based on Python 3 using the following command:

pipenv --three

This will create a Pipfile file inside the project's folder and create a virtual environment inside your home folder.

You can now install the flask package using:

pipenv install flask 

Next activate the environment using:

pipenv shell

Next create the engine.py file and add this basic code to run a Flask server that simply returns the Hello World from Flask! response:

import sys
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World from Flask!"

if __name__ == "__main__":
    app.run(host='127.0.0.1', port=5000)

Now, open the main.js file and add the following code, inside the createWindow() function, to spawn Python and run the engine.py:

var pyshell = require('python-shell');

pyshell.run('engine.py', function (err, results) {
  if (err) console.log(err);
});

We use the python-shell module which is installed when you executed npm install in the cloned project. If you are creating a new project from scratch, make sure to install python-shell and any other dependencies.

Now open the index.html file and add:

<a href="http://127.0.0.1:5000/">Go</a>

When you click on the Go link, you'll visit the home path of the Flask server:

Flask Electron

You'll get the Hello World from Flask! response:

Flask Electron

Conclusion

We've created a basic application with Flask and Electron. This can be further developed into more complex desktop apps by implementing the logic in Python and Flask and using Electron with web technologies to create the GUI.
