
Nikola: Nikola v7.7.5 is out!


On behalf of the Nikola team, I am pleased to announce the immediate availability of Nikola v7.7.5. It fixes some bugs and adds new features.

What is Nikola?

Nikola is a static site and blog generator, written in Python. It can use Mako and Jinja2 templates, and input in many popular markup formats, such as reStructuredText and Markdown — and can even turn Jupyter (IPython) Notebooks into blog posts! It also supports image galleries, and is multilingual. Nikola is flexible, and page builds are extremely fast, courtesy of doit (which rebuilds only what has changed).

Find out more at the website: https://getnikola.com/

Downloads

Install using pip install Nikola or download tarballs on GitHub and PyPI.

Changes

Features

  • Add nikola theme --new command for creating new themes (Issue #2231)
  • Add nikola theme --copy-template command for copying templates to customize them (Issue #2231)
  • Add nikola theme --uninstall command for deleting themes (Issue #2231)
  • Replace nikola install_theme with more capable nikola theme command (Issue #2231)
  • Allow for customizing github_deploy commit messages with -m (Issue #2198)
  • Commit to source branch automatically in github_deploy if GITHUB_COMMIT_SOURCE is set to True (Issue #2186)
  • Hugo-like shortcodes (Issue #1707)
  • New Galician translation
  • New facilities for data persistence and data caching (Issues #2209 and #2009)
  • (internal) allow scripts/jinjify.py usage with scripts (Issue #2240)

Bugfixes

  • Fix some rebuilds with indexes and galleries
  • Make state files work on Python 3
  • Don’t attempt to create redirects for URLs with query strings in WordPress imports if the site is in a subdirectory (Issue #2224)
  • Avoid some random file rebuilds (Issue #2220)
  • Honor MATHJAX_CONFIG setting
  • Display tags and archives in a unified format, with the date on the left, instead of a misplaced dash in tags (Issue #2212)
  • Decide is_mathjax based on current language tags (Issue #2205)
  • Don't duplicate images in flowr when resizing page (Issue #2202)

Mike Driscoll: Python Partials


Python comes with a fun module called functools. One of its classes is the partial class. You can use it to create a new function with partial application of the arguments and keywords that you pass to it. You can use partial to “freeze” a portion of your function’s arguments and/or keywords, which results in a new object. Another way to put it is that partial creates a new function with some defaults. Let’s look at an example!

>>> from functools import partial
>>> def add(x, y):
...     return x + y
...
>>> p_add = partial(add, 2)
>>> p_add(4)
6

Here we create a simple adding function that returns the result of adding its arguments, x and y. Next we create a new callable by creating an instance of partial and passing it our function and an argument for that function. In other words, we are basically defaulting the x parameter of our add function to the number 2. Finally we call our new callable, p_add, with the argument of the number 4 which results in 6 because 2 + 4 = 6.
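The same trick works with keyword arguments. Here is a minimal sketch (the power function is just an illustration I made up, not part of the original example) that freezes a keyword instead of a positional argument:

from functools import partial

def power(base, exponent=2):
    return base ** exponent

# Freeze the keyword argument so we get a "cube" function.
cube = partial(power, exponent=3)
print(cube(2))  # 8
print(cube(4))  # 64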

One handy use case for partials is to pass arguments to callbacks. Let’s take a look using wxPython:

import wx

from functools import partial


########################################################################
class MainFrame(wx.Frame):
    """
    This app shows a group of buttons
    """

    #----------------------------------------------------------------------
    def __init__(self, *args, **kwargs):
        """Constructor"""
        super(MainFrame, self).__init__(parent=None, title='Partial')
        panel = wx.Panel(self)

        sizer = wx.BoxSizer(wx.VERTICAL)
        btn_labels = ['one', 'two', 'three']
        for label in btn_labels:
            btn = wx.Button(panel, label=label)
            btn.Bind(wx.EVT_BUTTON, partial(self.onButton, label=label))
            sizer.Add(btn, 0, wx.ALL, 5)

        panel.SetSizer(sizer)
        self.Show()

    #----------------------------------------------------------------------
    def onButton(self, event, label):
        """
        Event handler called when a button is pressed
        """
        print 'You pressed: ', label


if __name__ == '__main__':
    app = wx.App(False)
    frame = MainFrame()
    app.MainLoop()

Here we use partial to call the onButton event handler with an extra argument, which happens to be the button’s label. This might not seem all that useful to you, but if you do much GUI programming, you’ll see a lot of people asking how to do this sort of thing. Of course, you could also use a lambda instead for passing arguments to callbacks.
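For comparison, here is a minimal, GUI-free sketch of my own (not from the wxPython example above) showing the partial approach and the lambda approach side by side:

from functools import partial

def on_button(event, label):
    print('You pressed: ' + label)

callbacks = []
for label in ['one', 'two', 'three']:
    # partial "freezes" the label for each callback...
    callbacks.append(partial(on_button, label=label))
    # ...and a lambda with a default argument does the same job.
    callbacks.append(lambda event, label=label: on_button(event, label))

# Simulate the GUI firing each callback with an event object (None here).
for callback in callbacks:
    callback(None)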

One use case that we’ve used at work was for our automated test framework. We test a UI with Python and we wanted to be able to pass a function along to dismiss certain dialogs. Basically you would pass a function along with the name of the dialog to dismiss, but it would need to be called at a certain point in the process to work correctly. Since I can’t show that code, here’s a really basic example of passing a partial function around:

from functools import partial

#----------------------------------------------------------------------
def add(x, y):
    """"""
    return x + y

#----------------------------------------------------------------------
def multiply(x, y):
    """"""
    return x * y

#----------------------------------------------------------------------
def run(func):
    """"""
    print func()

#----------------------------------------------------------------------
def main():
    """"""
    a1 = partial(add, 1, 2)
    m1 = partial(multiply, 5, 8)
    run(a1)
    run(m1)

if __name__ == "__main__":
    main()

Here we create a couple of partial functions in our main function. Next we pass those partials to our run function, which calls them and prints out the result of each call.


Wrapping Up

At this point, you should know how to use functools partial to create your own “frozen” callables. Partials have many uses, but they’re not always obvious. I recommend that you just start experimenting with them and you might start seeing uses for your own code. Have fun!


Related Reading

Import Python: ImportPython Issue 61


Word From Our Sponsor


Python Programmers let companies apply to you, not the other way around. Receive interview offers which include salary and equity information. Companies see each other's offers, and compete for your attention. Engage only with the companies you like. REGISTER

Worthy Read

podcast
In this episode I interview Ned Batchelder. I know that coverage.py is very important to a lot of people to understand how much of their code is being covered by their test suites. Since I’m far from an expert on coverage, I asked Ned to discuss it on the show.

Today, I'm pleased to announce the first major release of Zappa - a system for running "serverless" Python web applications using AWS Lambda and AWS API Gateway. Zappa handles all of the configuration and deployment automatically - now, you can deploy an infinitely scalable application to the cloud with a single command - all for a minute fraction of the cost of a traditional web server.

pycon
The Talks committee has been hard at work since the Call For Proposals closed 5 weeks ago, and today we are thrilled to present the result — here are the talks that the committee has chosen for PyCon 2016 in Portland, Oregon!

django
I think we can all agree that React and Django Rest Framework are both awesome. But hooking React into your Django app can really be a nightmare, especially if you’re unfamiliar with webpack, npm, and babel. I’m going to walk you through how to get Django Rest Framework (DRF) to work with React.

The Python 101 Screencast has been finished for a little over a month now and I am now releasing it for general consumption. The Python 101 Screencast is based on my book, Python 101. I went through all 44 chapters of the book and turned each of them into a standalone screencast.

pycon
PyCon accepted my talk "Write an Excellent Programming Blog". If you got in, too, congratulations! Now we have to write our talks. Steps: Plan your time, Inspire, Outline, Rehearse immediately, Make room for new insights, Put off making slides, Rehearse with friends, Get a coach, Excel.

webcast
It’s time for another free hour-long Webinar! This time, I’ll be talking about the increasingly popular tools for data science in Python, namely Pandas and Matplotlib. How can you read data into Pandas, manipulate it, and then plot it? I’ll show you a large number of examples and use cases, and we’ll also have lots of time for Q&A.

docker
Docker is an open source infrastructure management platform for running and deploying software. The Docker platform is constantly evolving so an exact definition is currently a moving target. Docker can package up applications along with their necessary operating system dependencies for easier deployment across environments. In the long run it has the potential to be the abstraction layer that easily manages containers running on top of any type of server, regardless of whether that server is on Amazon Web Services, Google Compute Engine, Linode, Rackspace or elsewhere.

testing
The pytest core group is heading towards the biggest sprint in its history, to take place in the Black Forest town of Freiburg in Germany. As of February 2016 we have started a funding campaign on Indiegogo to cover expenses. The page also mentions some preliminary topics. Here is the campaign URL https://www.indiegogo.com/projects/python-testing-sprint-mid-2016#/

interview
Aisha Bello is a current student at Cardiff Metropolitan University, where she’s finishing up a MSc in Information Technology. Her final project is centered on open source data mining technologies for small and medium-sized hospitality organizations. Aisha co-organized and coached at Django Girls Windhoek in January 2016, and is also organizing a Django Girls workshop in Lagos, Nigeria in February 2016.

PEP
This PEP proposes the creation of a new platform tag for Python package built distributions, such as wheels, called manylinux1_{x86_64,i686} with external dependencies limited to a standardized, restricted subset of the Linux kernel and core userspace ABI. It proposes that PyPI support uploading and distributing wheels with this platform tag, and that pip support downloading and installing these packages on compatible platforms.

podcast
Looking for an open source alternative to Mathematica or MatLab for solving algebraic equations? Look no further than the excellent SymPy project. It is a well built and easy to use Computer Algebra System (CAS) and in this episode we spoke with the current project maintainer Aaron Meurer about its capabilities and when you might want to use it.

core python
Why are you recommending Python? That's the question a colleague of mine asked when I was pitching Python for data science work. It is a fair question, and I tried to answer with facts and not opinions.

django
I've been using this pattern for many years in my applications, and I always found it strange that nobody ever mentioned it. Since it has proven useful to me in many different projects, I think it's a perfect occasion to put some life back into my blog.

Luthor utilizes all the efficient tricks from pholcidae to make XML parsing simpler than ever. Luthor uses lxml's iterable parsing mechanism to parse files of any size.


Jobs


Paris, France
At Gorgias (Techstars NYC '15), we’re making customer support software to easily handle big volumes of customer requests. We’re using machine-learning to automatically group requests, suggest the right responses and show only the relevant information for the customer support agent to take swift action on repetitive tasks.

Bangalore, Karnataka, India
Springboard is an online education startup with a big vision for transforming education using open online learning. We believe we are in the early days of a revolution that will not only increase access to great education, but also transform the way people learn.

Cornwall, United Kingdom
Headforwards are looking for a Python developer to join their team on an existing project. Some front end experience with web technologies such as HTML / CSS / JQuery / AJAX / JavaScript / HTML5 would be useful additional knowledge.



Projects

BinaryNet - 76 Stars, 5 Fork
Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

pip-check - 54 Stars, 1 Fork
Gives you a quick overview of all installed packages and their update status. Very much like pip list -o but with colors. Inspired by npm-check though lacks most of its features (yet).

bigchaindb - 47 Stars, 6 Fork
A scalable blockchain database

pyq - 45 Stars, 1 Fork
A tool to search for Python code using jQuery-like selectors

Implementation of Dynamic memory networks by Kumar et al. http://arxiv.org/abs/1506.07285

grabbySub2 - 15 Stars, 1 Fork
Grab movie/tv subtitles, hand crafted for you!

models - 9 Stars, 3 Fork
Models built with TensorFlow

django_simpleCrud - 8 Stars, 1 Fork
This is django simple crud featuring Bootstrap CSS + jQuery

kaggle-homesite - 7 Stars, 1 Fork
Top15 Model for Kaggle-Competition "Homesite Quote Conversion"

AWSInventoryLambda - 6 Stars, 3 Fork
Save AWS inventory as CSV on S3 and trigger emails

modpack-helper - 6 Stars, 0 Fork
Ease the deployment of modded Minecraft servers

pkgquiz - 4 Stars, 3 Fork
Match debian package names to their descriptions

pygraph - 4 Stars, 0 Fork
CLI interface to python graphviz

erandom - 4 Stars, 0 Fork
Like /dev/random but with emojis!

bin_analyzer - 4 Stars, 0 Fork
Toy project for static analysis of ELF binaries

animated-win-background - 4 Stars, 1 Fork
Animated backgrounds on Windows (GIF frames)

django-auth-adfs - 3 Stars, 0 Fork
A Django authentication backend for ADFS

Brett Cannon: How the heck does async/await work in Python 3.5?


Being a core developer of Python has made me want to understand how the language generally works. I realize there will always be obscure corners where I don't know every intricate detail, but to be able to help with issues and the general design of Python I feel like I should try and understand its core semantics and how things work under the hood.

But until recently I didn't understand how async/await worked in Python 3.5. I knew that yield from in Python 3.3 combined with asyncio in Python 3.4 had led to this new syntax. But having not done a lot of networking stuff -- which asyncio is not limited to but does focus on -- had led to me not really paying much attention to all of this async/await stuff. I mean I knew that:

yield from iterator

was (essentially) equivalent to:

for x in iterator:
    yield x

And I knew that asyncio was an event loop framework which allowed for asynchronous programming, and I knew what those words (basically) meant on their own. But having never dived into the async/await syntax to understand how all of this came together, I felt I didn't understand asynchronous programming in Python which bothered me. So I decided to take the time and try and figure out how the heck all of it worked. And since I have heard from various people that they too didn't understand how this new world of asynchronous programming worked, I decided to write this essay (yes, this post has taken so long in time and is so long in words that my wife has labeled it an essay).

Now, because I wanted a proper understanding of how the syntax worked, this essay has some low-level technical detail about how CPython does things. It's totally okay if it's more detail than you want or if you don't fully understand it, as I don't explain every nuance of CPython internals in order to keep this from turning into a book (e.g., if you don't know that code objects have flags, let alone what a code object is, it's okay and you don't need to care to get something from this essay). I have tried to provide a more accessible summary at the end of every section so that you can skim the details if they turn out to be more than you want to deal with.

A history lesson about coroutines in Python

According to Wikipedia, "Coroutines are computer program components that generalize subroutines for nonpreemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations". That's a rather technical way of saying, "coroutines are functions whose execution you can pause". And if you are saying to yourself, "that sounds like generators", you would be right.

Back in Python 2.2, generators were first introduced by PEP 255 (they are also called generator iterators since generators implement the iterator protocol). Primarily inspired by the Icon programming language, generators allowed for a way to create an iterator that didn't waste memory when calculating the next value in the iteration. For instance, if you wanted to create your own version of range(), you could do it in an eager fashion by creating a list of integers:

def eager_range(up_to):
    """Create a list of integers, from 0 to up_to, exclusive."""
    sequence = []
    index = 0
    while index < up_to:
        sequence.append(index)
        index += 1
    return sequence

The problem with this, though, is that if you want a large sequence like the integers from 0 to 1,000,000, you have to create a list long enough to hold 1,000,000 integers. But when generators were added to the language, you could suddenly create an iterator that didn't need to create the whole sequence upfront. Instead, all you had to do is have enough memory for one integer at a time.

def lazy_range(up_to):
    """Generator to return the sequence of integers from 0 to up_to, exclusive."""
    index = 0
    while index < up_to:
        yield index
        index += 1

Having a function pause what it is doing whenever it hit a yield expression -- although it was a statement until Python 2.5 -- and then be able to resume later is very useful in terms of using less memory, allowing for the idea of infinite sequences, etc.
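For instance, here is a small sketch (my own illustration, not from the original essay) of an infinite sequence that an eager list could never represent:

def count_forever(start=0):
    """Yield start, start + 1, start + 2, ... without ever building a list."""
    index = start
    while True:
        yield index
        index += 1

counter = count_forever()
print(next(counter))  # 0
print(next(counter))  # 1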

But as you may have noticed, generators are all about iterators. Now having a better way to create iterators is obviously great (and this is shown when you define an __iter__() method on an object as a generator), but people knew that if we took the "pausing" part of generators and added in a "send stuff back in" aspect to them, Python would suddenly have the concept of coroutines in Python (but until I say otherwise, consider this all just a concept in Python; concrete coroutines in Python are discussed later on). And that exact feature of sending stuff into a paused generator was added in Python 2.5 thanks to PEP 342. Among other things, PEP 342 introduced the send() method on generators. This allowed one to not only pause generators, but to send a value back into a generator where it paused. Taking our range() example further, you could make it so the sequence jumped forward or backward by some amount:

def jumping_range(up_to):
    """Generator for the sequence of integers from 0 to up_to, exclusive.

    Sending a value into the generator will shift the sequence by that amount.
    """
    index = 0
    while index < up_to:
        jump = yield index
        if jump is None:
            jump = 1
        index += jump

if __name__ == '__main__':
    iterator = jumping_range(5)
    print(next(iterator))      # 0
    print(iterator.send(2))    # 2
    print(next(iterator))      # 3
    print(iterator.send(-1))   # 2
    for x in iterator:
        print(x)               # 3, 4

Generators were not mucked with again until Python 3.3 when PEP 380 added yield from. Strictly speaking, the feature empowers you to refactor generators in a clean way by making it easy to yield every value from an iterator (which a generator conveniently happens to be).

def lazy_range(up_to):
    """Generator to return the sequence of integers from 0 to up_to, exclusive."""
    index = 0
    def gratuitous_refactor():
        nonlocal index
        while index < up_to:
            yield index
            index += 1
    yield from gratuitous_refactor()

By virtue of making refactoring easier, yield from also lets you chain generators together so that values bubble up and down the call stack without code having to do anything special.

def bottom():
    # Returning the yield lets the value that goes up the call stack to come right back
    # down.
    return (yield 42)

def middle():
    return (yield from bottom())

def top():
    return (yield from middle())

# Get the generator.
gen = top()
value = next(gen)
print(value)  # Prints '42'.
try:
    value = gen.send(value * 2)
except StopIteration as exc:
    value = exc.value
print(value)  # Prints '84'.

Summary

Generators in Python 2.2 let the execution of code be paused. Once the ability to send values back into paused generators was introduced in Python 2.5, the concept of coroutines in Python became possible. And the addition of yield from in Python 3.3 made it easier to refactor generators as well as chain them together.

What is an event loop?

It's important to understand what an event loop is and how they make asynchronous programming possible if you're going to care about async/await. If you have done GUI programming before -- including web front-end work -- then you have worked with an event loop. But since having the concept of asynchronous programming as a language construct is new in Python, it's okay if you don't happen to know what an event loop is.

Going back to Wikipedia, an event loop "is a programming construct that waits for and dispatches events or messages in a program". Basically an event loop lets you go, "when A happens, do B". Probably the easiest example to explain this is that of the JavaScript event loop that's in every browser. Whenever you click something ("when A happens"), the click is given to the JavaScript event loop which checks if any onclick callback was registered to handle that click ("do B"). If any callbacks were registered then the callback is called with the details of the click. The event loop is considered a loop because it is constantly collecting events and looping over them to find what to do with each event.
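To make the "when A happens, do B" idea concrete, here is a toy sketch of my own (nothing like how asyncio is actually implemented) of a loop dispatching click events to registered callbacks:

import collections

# Registered callbacks: "when a 'click' happens, do this".
callbacks = {'click': lambda detail: print('clicked at', detail)}

# Events waiting to be processed.
events = collections.deque([('click', (10, 20)), ('scroll', 3), ('click', (0, 5))])

# The loop: pull events off the queue and dispatch to any registered callback.
while events:
    name, detail = events.popleft()
    if name in callbacks:        # "when A happens"
        callbacks[name](detail)  # "do B"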

In Python's case, asyncio was added to the standard library to provide an event loop. There's a focus on networking in asyncio which in the case of the event loop is to make the "when A happens" to be when I/O from a socket is ready for reading and/or writing (via the selectors module). Other than GUIs and I/O, event loops are also often used for executing code in another thread or subprocess and have the event loop act as the scheduler (i.e., cooperative multitasking). If you happen to understand Python's GIL, event loops are useful in cases where releasing the GIL is possible and useful.

Summary

Event loops provide a loop which lets you say, "when A happens then do B". Basically an event loop watches out for when something occurs, and when something that the event loop cares about happens it then calls any code that cares about what happened. Python gained an event loop in the standard library in the form of asyncio in Python 3.4.

How async and await work

The way it was in Python 3.4

Between the generators found in Python 3.3 and an event loop in the form of asyncio, Python 3.4 had enough to support asynchronous programming in the form of concurrent programming. Asynchronous programming is basically programming where execution order is not known ahead of time (hence asynchronous instead of synchronous). Concurrent programming is writing code to execute independently of other parts, even if it all executes in a single thread (concurrency is not parallelism). For example, the following is Python 3.4 code to count down every second in two asynchronous, concurrent function calls.

import asyncio

# Borrowed from http://curio.readthedocs.org/en/latest/tutorial.html.
@asyncio.coroutine
def countdown(number, n):
    while n > 0:
        print('T-minus', n, '({})'.format(number))
        yield from asyncio.sleep(1)
        n -= 1

loop = asyncio.get_event_loop()
tasks = [
    asyncio.ensure_future(countdown("A", 2)),
    asyncio.ensure_future(countdown("B", 3))]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

In Python 3.4, the asyncio.coroutine decorator was used to label a function as acting as a coroutine that was meant for use with asyncio and its event loop. This gave Python its first concrete definition of a coroutine: an object that implemented the methods added to generators in PEP 342 and represented by the collections.abc.Coroutine abstract base class. This meant that suddenly all generators implemented the coroutine interface even if they weren't meant to be used in that fashion. To fix this, asyncio required that all generators meant to be used as a coroutine had to be decorated with asyncio.coroutine.

With this concrete definition of a coroutine (which matched an API that generators provided), you then used yield from on any asyncio.Future object to pass it down to the event loop, pausing execution of the coroutine while you waited for something to happen (being a future object is an implementation detail of asyncio and not important). Once the future object reached the event loop it was monitored there until the future object was done doing whatever it needed to do. Once the future was done doing its thing, the event loop noticed and the coroutine that was paused waiting for the future's result started again with its result sent back into the coroutine using its send() method.

Take our example above. The event loop starts each of the countdown() coroutine calls, executing until it hits yield from and the asyncio.sleep() function in one of them. That returns an asyncio.Future object which gets passed down to the event loop and pauses execution of the coroutine. There the event loop watches the future object until the one second is over (as well as checking on other stuff it's watching, like the other coroutine). Once the one second is up, the event loop takes the paused countdown() coroutine that gave the event loop the future object, sends the result of the future object back into the coroutine that gave it the future object in the first place, and the coroutine starts running again. This keeps going until all of the countdown() coroutines are finished running and the event loop has nothing to watch. I'll actually show you a complete example of how exactly all of this coroutine/event loop stuff works later, but first I want to explain how async and await work.

Going from yield from to await in Python 3.5

In Python 3.4, a function that was flagged as a coroutine for the purposes of asynchronous programming looked like:

# This also works in Python 3.5.
@asyncio.coroutine
def py34_coro():
    yield from stuff()

In Python 3.5, the types.coroutine decorator has been added to also flag a generator as a coroutine like asyncio.coroutine does. You can also use async def to syntactically define a function as being a coroutine, although it cannot contain any form of yield expression; only return and await are allowed for returning a value from the coroutine.

async def py35_coro():
    await stuff()

A key thing async and types.coroutine do, though, is tighten the definition of what a coroutine is. It takes coroutines from simply being an interface to an actual type, making the distinction between any generator and a generator that is meant to be a coroutine much more stringent (and the inspect.iscoroutine() function is even stricter by saying async has to be used).

You will also notice that beyond just async, the Python 3.5 example introduces await expressions (which are only valid within an async def). While await operates much like yield from, the objects that are acceptable to an await expression are different. Coroutines are definitely allowed in an await expression since the concept of coroutines is fundamental in all of this. But when you call await on an object, it technically needs to be an awaitable object: an object that defines an __await__() method which returns an iterator which is not a coroutine itself. Coroutines themselves are also considered awaitable objects (hence why collections.abc.Coroutine inherits from collections.abc.Awaitable). This definition follows a Python tradition of making most syntax constructs translate into a method call underneath the hood, much like a + b is a.__add__(b) or b.__radd__(a) underneath it all.
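To ground that definition, here is a bare-bones sketch (my own; a real program would let an event loop drive the coroutine instead of doing it by hand) of an awaitable object whose __await__() returns a plain generator:

class Pause:
    """An awaitable: __await__() returns an iterator (a plain generator here)."""
    def __await__(self):
        result = yield 'paused'   # what the driver sees when the coroutine suspends
        return result             # becomes the value of the await expression

async def use_pause():
    value = await Pause()
    return value

coro = use_pause()
print(coro.send(None))            # 'paused' -- the coroutine is now suspended
try:
    coro.send('resumed')          # send a value back in; the coroutine finishes
except StopIteration as exc:
    print(exc.value)              # 'resumed'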

How does the difference between yield from and await play out at a low level (i.e., a generator with types.coroutine vs. one with async def)? Let's look at the bytecode of the two examples above in Python 3.5 to get at the nitty-gritty details. The bytecode for py34_coro() is:

>>> dis.dis(py34_coro)
  2           0 LOAD_GLOBAL              0 (stuff)
              3 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              6 GET_YIELD_FROM_ITER
              7 LOAD_CONST               0 (None)
             10 YIELD_FROM
             11 POP_TOP
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE

The bytecode for py35_coro() is:

>>> dis.dis(py35_coro)
  1           0 LOAD_GLOBAL              0 (stuff)
              3 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              6 GET_AWAITABLE
              7 LOAD_CONST               0 (None)
             10 YIELD_FROM
             11 POP_TOP
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE

Ignoring the difference in line number due to py34_coro() having the asyncio.coroutine decorator, the only visible difference between them is the GET_YIELD_FROM_ITER opcode versus the GET_AWAITABLE opcode. Both functions are properly flagged as being coroutines, so there's no difference there. In the case of GET_YIELD_FROM_ITER, it simply checks if its argument is a generator or coroutine, otherwise it calls iter() on its argument (the acceptance of a coroutine object by the opcode for yield from is only allowed when the opcode is used from within a coroutine itself, which is true in this case thanks to the types.coroutine decorator flagging the generator as such at the C level with the CO_ITERABLE_COROUTINE flag on the code object).

But GET_AWAITABLE does something different. While the bytecode will accept a coroutine just like GET_YIELD_FROM_ITER, it will not accept a generator if it has not been flagged as a coroutine. Beyond just coroutines, though, the bytecode will accept an awaitable object as discussed earlier. This makes yield from expressions and await expressions both accept coroutines while differing on whether they accept plain generators or awaitable objects, respectively.
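Here is a small sketch of my own showing that runtime difference (the exact error message may vary between CPython versions):

import types

def plain_gen():
    yield

@types.coroutine
def flagged_gen():
    yield

async def demo():
    await flagged_gen()   # accepted: flagged with CO_ITERABLE_COROUTINE
    await plain_gen()     # rejected: plain generators are not awaitable

coro = demo()
coro.send(None)           # runs up to the yield inside flagged_gen()
try:
    coro.send(None)       # flagged_gen() finishes, then awaiting plain_gen() fails
except TypeError as exc:
    print(exc)            # e.g. "object generator can't be used in 'await' expression"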

You may be wondering why there is a difference between what an async-based coroutine and a generator-based coroutine will accept in their respective pausing expressions. The key reason for this is to make sure you don't mess up and accidentally mix and match objects that just happen to have the same API, to the best of Python's abilities. Since generators inherently implement the API for coroutines, it would be easy to accidentally use a generator when you actually expected to be using a coroutine. And since not all generators are written to be used in a coroutine-based control flow, you need to avoid accidentally using a generator incorrectly. But since Python is not statically compiled, the best the language can offer is runtime checks when using a generator-defined coroutine. This means that when types.coroutine is used, Python's compiler can't tell if a generator is going to be used as a coroutine or just a plain generator (remember, just because the syntax says types.coroutine that doesn't mean someone hasn't done types = spam earlier), and thus different opcodes that have different restrictions are emitted by the compiler based on the knowledge it has at the time.

One very key point I want to make about the difference between a generator-based coroutine and an async one is that only generator-based coroutines can actually pause execution and force something to be sent down to the event loop. You typically don't see this very important detail because you usually call event loop-specific functions like the asyncio.sleep() function since event loops implement their own APIs and these are the kind of functions that have to worry about this little detail. For the vast majority of us, we will work with event loops rather than be writing them and thus only be writing async coroutines and never need to really care about this. But if you're like me and were wondering why you couldn't write something like asyncio.sleep() using only async coroutines, this can be quite the "aha!" moment.

Summary

Let's summarize all of this into simpler terms. Defining a method with async def makes it a coroutine. The other way to make a coroutine is to flag a generator with types.coroutine -- technically the flag is the CO_ITERABLE_COROUTINE flag on a code object -- or a subclass of collections.abc.Coroutine. You can only make a coroutine call chain pause with a generator-based coroutine.

An awaitable object is either a coroutine or an object that defines __await__() -- technically collections.abc.Awaitable -- which returns an iterator that is not a coroutine. An await expression is basically yield from but with restrictions of only working with awaitable objects (plain generators will not work with an await expression). An async function is a coroutine that either has return statements -- including the implicit return None at the end of every function in Python -- and/or await expressions (yield expressions are not allowed). The restrictions for async functions are there to make sure you don't accidentally mix and match generator-based coroutines with other generators, since the expected use of the two types of generators is rather different.

Think of async/await as an API for asynchronous programming

A key thing that I want to point out is actually something I didn't really think deeply about until I watched David Beazley's Python Brasil 2015 keynote. In that talk, David pointed out that async/await is really an API for asynchronous programming (which he reiterated to me on Twitter). What David means by this is that people shouldn't think that async/await as synonymous with asyncio, but instead think that asyncio is a framework that can utilize the async/await API for asynchronous programming.

David believes so strongly in this idea of async/await being an asynchronous programming API that he has created the curio project to implement his own event loop. This has helped make it clear to me that async/await allows Python to provide the building blocks for asynchronous programming, but without tying you to a specific event loop or other low-level details (unlike other programming languages which integrate the event loop into the language directly). This allows for projects like curio to not only operate differently at a lower level (e.g., asyncio uses future objects as the API for talking to its event loop while curio uses tuples), but to also have different focuses and performance characteristics (e.g., asyncio has an entire framework for implementing transport and protocol layers which makes it extensible while curio is simpler and expects the user to worry about that kind of thing but also allows it to run faster).

Based on the (short) history of asynchronous programming in Python, it's understandable that people might think that async/await == asyncio. I mean asyncio was what helped make asynchronous programming possible in Python 3.4 and was a motivating factor for adding async/await in Python 3.5. But the design of async/await is purposefully flexible enough to not require asyncio or contort any critical design decision just for that framework. In other words, async/await continues Python's tradition of designing things to be as flexible as possible while still being pragmatic to use (and implement).

An example

At this point your head might be awash with new terms and concepts, making it a little hard to fully grasp how all of this is supposed to work to provide you asynchronous programming. To help make it all much more concrete, here is a complete (if contrived) asynchronous programming example, end-to-end from event loop and associated functions to user code. The example has coroutines which represent individual rocket launch countdowns but that appear to be counting down simultaneously. This is asynchronous programming through concurrency; three separate coroutines will be running independently, and yet it will all be done in a single thread.

import datetime
import heapq
import types
import time


class Task:

    """Represent how long a coroutine should wait before starting again.

    Comparison operators are implemented for use by heapq. Two-item
    tuples unfortunately don't work because when the datetime.datetime
    instances are equal, comparison falls to the coroutine and they don't
    implement comparison methods, triggering an exception.

    Think of this as being like asyncio.Task/curio.Task.
    """

    def __init__(self, wait_until, coro):
        self.coro = coro
        self.waiting_until = wait_until

    def __eq__(self, other):
        return self.waiting_until == other.waiting_until

    def __lt__(self, other):
        return self.waiting_until < other.waiting_until


class SleepingLoop:

    """An event loop focused on delaying execution of coroutines.

    Think of this as being like asyncio.BaseEventLoop/curio.Kernel.
    """

    def __init__(self, *coros):
        self._new = coros
        self._waiting = []

    def run_until_complete(self):
        # Start all the coroutines.
        for coro in self._new:
            wait_for = coro.send(None)
            heapq.heappush(self._waiting, Task(wait_for, coro))
        # Keep running until there is no more work to do.
        while self._waiting:
            now = datetime.datetime.now()
            # Get the coroutine with the soonest resumption time.
            task = heapq.heappop(self._waiting)
            if now < task.waiting_until:
                # We're ahead of schedule; wait until it's time to resume.
                delta = task.waiting_until - now
                time.sleep(delta.total_seconds())
                now = datetime.datetime.now()
            try:
                # It's time to resume the coroutine.
                wait_until = task.coro.send(now)
                heapq.heappush(self._waiting, Task(wait_until, task.coro))
            except StopIteration:
                # The coroutine is done.
                pass


@types.coroutine
def sleep(seconds):
    """Pause a coroutine for the specified number of seconds.

    Think of this as being like asyncio.sleep()/curio.sleep().
    """
    now = datetime.datetime.now()
    wait_until = now + datetime.timedelta(seconds=seconds)
    # Make all coroutines on the call stack pause; the need to use `yield`
    # necessitates this be generator-based and not an async-based coroutine.
    actual = yield wait_until
    # Resume the execution stack, sending back how long we actually waited.
    return actual - now


async def countdown(label, length, *, delay=0):
    """Countdown a launch for `length` seconds, waiting `delay` seconds.

    This is what a user would typically write.
    """
    print(label, 'waiting', delay, 'seconds before starting countdown')
    delta = await sleep(delay)
    print(label, 'starting after waiting', delta)
    while length:
        print(label, 'T-minus', length)
        waited = await sleep(1)
        length -= 1
    print(label, 'lift-off!')


def main():
    """Start the event loop, counting down 3 separate launches.

    This is what a user would typically write.
    """
    loop = SleepingLoop(countdown('A', 5), countdown('B', 3, delay=2),
                        countdown('C', 4, delay=1))
    start = datetime.datetime.now()
    loop.run_until_complete()
    print('Total elapsed time is', datetime.datetime.now() - start)


if __name__ == '__main__':
    main()

As I said, it's contrived, but if you run this in Python 3.5 you will notice that all three coroutines run independently in a single thread and yet the total amount of time taken to run is about 5 seconds. You can consider Task, SleepingLoop, and sleep() as what an event loop provider like asyncio and curio would give you. For a normal user, only the code in countdown() and main() are of importance. As you can see, there is no magic to async, await, or this whole asynchronous programming deal; it's just an API that Python provides you to help make this sort of thing easier.

My hopes and dreams for the future

Now that I understand how this asynchronous programming works in Python, I want to use it all the time! It's such an awesome concept that's so much better than something you would have used threads for previously. The problem is that Python 3.5 is so new that async/await is also very new. That means there are not a lot of libraries out there supporting asynchronous programming like this. For instance, to do HTTP requests you either have to construct the HTTP request yourself by hand (yuck), use a project like the aiohttp framework which adds HTTP on top of another event loop (in this case, asyncio), or hope more projects like the hyper library continue to spring up to provide an abstraction for things like HTTP which allow you to use whatever I/O library you want (although unfortunately hyper only supports HTTP/2 at the moment).

Personally, I hope projects like hyper take off so that we have a clear separation between getting binary data from I/O and how we interpret that binary data. This kind of abstraction is important because most I/O libraries in Python are rather tightly coupled to how they do I/O and how they handle data coming from I/O. This is a problem with the http package in Python's standard library as it doesn't have an HTTP parser but a connection object which does all the I/O for you. And if you were hoping requests would support asynchronous programming, your hopes have already been dashed because the synchronous I/O that requests uses is baked into its design. This shift in ability to do asynchronous programming gives the Python community a chance to fix a problem it has with not having abstractions at the various layers of the network stack. And we have the perk of it not being hard to make asynchronous code run as if it's synchronous, so tools filling the void for asynchronous programming can work in both worlds.

I also hope that Python gains some form of support in async coroutines for yield. Maybe this will require yet another keyword (maybe something like anticipate?), but the fact that you actually can't implement an event loop system with just async coroutines bothers me. Luckily, it turns out I'm not the only one who thinks this, and since the author of PEP 492 agrees with me, I think there's a chance of getting this quirk removed.

Conclusion

Basically async and await are fancy generators that we call coroutines and there is some extra support for things called awaitable objects and turning plain generators into coroutines. All of this comes together to support concurrency so that we have better support for asynchronous programming in Python. It's awesome and much easier to use than comparable approaches like threads -- I wrote an end-to-end example of asynchronous programming in under 100 lines of commented Python code -- while still being quite flexible and fast (the curio FAQ says that it runs faster than twisted by 30-40% but slower than gevent by 10-15%, and all while being implemented in pure Python; remember that Python 2 + Twisted can use less memory and is easier to debug than Go, so just imagine what you could do with this!). I'm very happy that this landed in Python 3 and I look forward to the community embracing it and helping to flesh out its support in libraries and frameworks so we can all benefit from asynchronous programming in Python.

Ned Batchelder: The value of unit tests


Seems like testing and podcasts are in the air... First, I was interviewed on Brian Okken's Python Test podcast. I wasn't sure what to expect. The conversation went in a few different directions, and it was really nice to just chat with Brian for 45 minutes. We talked about coverage.py, testing, doing presentations, edX, and a few other things.

Then I see that Brian was himself a guest on Talk Python to Me, Michael Kennedy's podcast about all things Python.

On that episode, Brian does a good job arguing against some of the prevailing beliefs about testing. For example, he explains why unit tests are bad, and integration tests are good. His argument boils down to, you should test the promises you've made. Unit tests mostly deal with internal details that are not promises you've made to the outside world, so why focus on testing them? The important thing is whether your product behaves right from the outside.

I liked this argument, it made sense. But I don't think I agree with it. Or, I completely agree with it, and come to a different conclusion.

When I build a complex system, I can't deal with the whole thing at once. I need to think of it as a collection of smaller pieces. And the boundaries between those pieces need to remain somewhat stable. So they are promises, not to the outside world, but to myself. And since I have made those promises to myself, I want unit tests to be sure I'm keeping those promises.

Another value of unit tests is that they are a way to chop up combinatorial explosions. If my system has three main components, and each of them can be in ten different states, I'll need 1000 integration tests to cover all the possibilities. If I can test each component in isolation, then I only need 30 unit tests to cover the possibilities, plus a small number of integration tests to consider everything mashed together. Not to mention, the unit tests will be faster than the integration tests. Which would you rather have? 1000 slow tests, or 30 fast tests plus 20 slow tests?

Sure, it's possible to overdo unit testing. And it's really easy to have all your unit tests pass and still have a broken system. You need integration tests to be sure everything fits together properly. Finding the right balance is an art. I really like hearing Brian's take on it. Give it a listen.

Giampaolo Rodola: How to always execute exit functions in Python

...or why atexit.register() and signal.signal() are evil

Many people erroneously think that any function registered via the atexit module is guaranteed to always be executed when the program terminates. You may have noticed this is not the case when, for example, you daemonize your app in production and then try to stop or restart it: the cleanup functions will not be executed. This is because functions registered with the atexit module are not called when the program is killed by a signal:
import atexit, os, signal

@atexit.register
def cleanup():
print("on exit") # XXX this never gets printed

os.kill(os.getpid(), signal.SIGTERM)

It must be noted that the same thing would happen if, instead of atexit.register(), we used a "finally" clause (a short sketch of that variant follows the next code block). It turns out the correct way to make sure the exit function is always called when a signal is received is to register it via signal.signal(). That has a drawback though: in case a third-party module has already registered a function for that signal (SIGTERM or whatever), your new function will overwrite the old one:

import os, signal

def old(*args):
    print("old")  # XXX this never gets printed

def new(*args):
    print("new")

signal.signal(signal.SIGTERM, old)
signal.signal(signal.SIGTERM, new)
os.kill(os.getpid(), signal.SIGTERM)
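As an aside, the "finally" variant mentioned above fails in exactly the same way; a minimal sketch (not part of the original post):

import os, signal

try:
    os.kill(os.getpid(), signal.SIGTERM)
finally:
    print("cleanup")  # XXX never printed: the default SIGTERM action kills the process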

Also, we would still have to use atexit.register() so that the function is also called on "clean" interpreter exit, and we would have to take into account signals other than SIGTERM which would cause the process to terminate. This recipe attempts to address all these issues so that:
  •  the exit function is always executed for all exit signals (SIGTERM, SIGINT, SIGQUIT, SIGABRT) and on "clean" interpreter exit.
  • any exit function(s) previously registered via atexit.register() or signal.signal() will be executed as well (after the new one).
  • it must be noted that the exit function will not be executed in case of SIGKILL, SIGSTOP or os._exit().

The code

import atexit
import os
import signal
import sys


_registered_exit_funs = set()
_executed_exit_funs = set()

if os.name == 'posix':
    # https://en.wikipedia.org/wiki/Unix_signal#POSIX_signals
    _exit_signals = frozenset([
        signal.SIGTERM,   # sent by kill cmd by default
        signal.SIGINT,    # CTRL ^ C, aka KeyboardInterrupt
        signal.SIGQUIT,   # CTRL ^ \
        # signal.SIGHUP,  # terminal closed or daemon rotating files
        signal.SIGABRT,   # os.abort()
    ])
else:
    _exit_signals = frozenset([
        signal.SIGTERM,
        signal.SIGINT,    # CTRL ^ C
        signal.SIGABRT,   # os.abort()
        signal.SIGBREAK,  # CTRL ^ break / signal.CTRL_BREAK_EVENT
    ])


def register_exit_fun(fun, signals=_exit_signals):
    """Register a function which will be executed on clean interpreter
    exit or in case one of the `signals` is received by this process
    (differently from atexit.register()).

    Also, it makes sure to execute any previously registered signal
    handler as well. If any, it will be executed after `fun`.

    Functions which were already registered or executed will be
    skipped.

    Exit function will not be executed on SIGKILL, SIGSTOP or
    os._exit(0).
    """
    def fun_wrapper():
        if fun not in _executed_exit_funs:
            try:
                fun()
            finally:
                _executed_exit_funs.add(fun)

    def signal_wrapper(signum=None, frame=None):
        if signum is not None:
            pass
            # You may want to add some logging here.
            # XXX: if logging module is used it may complain with
            # "No handlers could be found for logger"
            # smap = dict([(getattr(signal, x), x) for x in dir(signal)
            #              if x.startswith('SIG')])
            # print("signal {} received by process with PID {}".format(
            #     smap.get(signum, signum), os.getpid()))
        fun_wrapper()
        # Only return the original signal this process was hit with
        # in case fun returns with no errors, otherwise process will
        # return with sig 1.
        if signum is not None:
            sys.exit(signum)

    if not callable(fun):
        raise TypeError("{!r} is not callable".format(fun))
    set([fun])  # raise exc if obj is not hash-able

    for sig in signals:
        # Register function for this signal and pop() the previously
        # registered one (if any). This can either be a callable,
        # SIG_IGN (ignore signal) or SIG_DFL (perform default action
        # for signal).
        old_handler = signal.signal(sig, signal_wrapper)
        if old_handler not in (signal.SIG_DFL, signal.SIG_IGN):
            # ...just for extra safety.
            if not callable(old_handler):
                continue
            # This is needed otherwise we'll get a KeyboardInterrupt
            # stack trace on interpreter exit, even if the process exited
            # with sig 0.
            if (sig == signal.SIGINT and
                    old_handler is signal.default_int_handler):
                continue
            # There was a function which was already registered for this
            # signal. Register it again so it will get executed (after our
            # new fun).
            if old_handler not in _registered_exit_funs:
                atexit.register(old_handler)
                _registered_exit_funs.add(old_handler)

    # This further registration will be executed in case of clean
    # interpreter exit (no signals received).
    if fun not in _registered_exit_funs or not signals:
        atexit.register(fun_wrapper)
        _registered_exit_funs.add(fun)

Usage

As a function:
def cleanup():
    print("cleanup")

register_exit_fun(cleanup)

As a decorator:

@register_exit_fun
def cleanup():
    print("cleanup")

Unit tests

This recipe is currently provided as a gist with a full set of unittests. It works with Python 2 and 3.

Notes about Windows

On Windows signals are only partially supported meaning a function which was previously registered via signal.signal() will be executed only on interpreter exit, but not if the process receives a signal. Apparently this is a limitation either of Windows or the signal module (most likely Windows).

Proposal for stdlib inclusion

The fact that the atexit module does not handle signals and that signal.signal() overwrites previously registered handlers is unfortunate. It is also confusing because it is not immediately clear which one you are supposed to use (and it turns out you're supposed to use both). Most of the time you have no idea (or don't care) that you're overwriting another exit function. As a user, I would just want to execute an exit function, no matter what, possibly without messing with whatever a module I've previously imported has done with signal.signal(). To me this suggests there could be space for something like "atexit.register_w_signals".

External discussions

Weekly Python StackOverflow Report: (vi) stackoverflow python report


Investing using Python: 15 years of forex tick data to MongoDB using Python. Part One

Here's another idea of mine. I've decided to download, process and make available through my brand new MongoDB 15 years of forex tick data from GAIN (Also possible from TrueFX/ Pepperstone). The procedure, all automatic (I'm too lazy to download by hand): Walk through pages and get all files to download. Download. Unzip, read csv […]

Omaha Python Users Group: February 17 Meeting Details


Topic/Speaker – Web Scraping by Keith Nickum

Location – The recently opened Do Space at 7205 Dodge Street, Omaha.  SW corner of 72nd and Dodge.

Meeting starts at 6:30 pm, Wednesday, February 17

Yasoob Khalid: Free Weekly Python Workshop


Hi there everyone!

I have an interesting opportunity for you guys. I am planning on doing a weekly online Python workshop. It will be for 2 hours every week. This is only for women (for now) who want to get started with programming or want to improve their current knowledge. It is entirely free. This is your best chance to learn from me. The seats will be limited to 5-10 in the first batch as it is an experiment for me as well. So try to fill in the form as soon as you can. I will announce the names of the selected candidates after a couple of days.

Best of luck!

Form: http://goo.gl/forms/nk7lM976bm

PS: You can share this form with anyone else you know who would be interested in this opportunity.

If you have any questions then feel free to write them in the comments below. I would love to answer them :)


Vasudev Ram: Examples of method chaining in Python


By Vasudev Ram



The topic of method chaining came up during a training program I was conducting. So I thought of writing a post about it, with a couple of examples.

Method chaining is a technique (in object-oriented languages) for making multiple method calls on the same object, without using the object reference more than once. Example:

Let's say we have a class Foo that contains two methods, bar and baz.
We create an instance of the class Foo:
foo = Foo()
Without method chaining, to call both bar and baz in turn, on the object foo, we would do this:
# Fragment 1
foo.bar() # Call method bar() on object foo.
foo.baz() # Call method baz() on object foo.
With method chaining, we can do this:
# Fragment 2
# Chain calls to methods bar() and baz() on object foo.
foo.bar().baz()
So you can loosely think of method chaining as the object-oriented version of nested function calls in procedural programming, where, instead of this:
# Fragment 3
temp1 = foo(args)
result = bar(temp1)
you would do this:
# Fragment 4
result = bar(foo(args))
We use nested function calls all the time in procedural programming, and even in the procedural sections of code that occur in a Python program that uses OOP. We can do the latter because Python supports both styles (procedural and object-oriented) at the same time, even in the same program; Guido be thanked for that :)

The above was my informal description of method chaining. For more details, refer to this Wikipedia article, which includes examples in various programming languages. The article also makes a distinction between method chaining and method cascading, and according to it, what I call method chaining here (involving returning the self reference) is really method cascading. Are you confused enough? :) Kidding, the difference is not really complex.

One advantage of method chaining is that it reduces the number of times you have to use the name of the object: only once in Fragment 2 above, vs. twice in Fragment 1; and this difference will increase when there are more method calls on the same object. Thereby, it also slightly reduces the amount of code one has to read, understand, test, debug and maintain, overall. Not major benefits, but can be useful.

Note: One limitation of method chaining is that it can only be used on methods which do not need to return any other meaningful value, such as a count of lines modified, words found, records deleted, etc. (which some methods need to do), because you need to return the self object. Even the fact that Python (and some other languages) supports returning multiple values from a return statement may not solve this. (There could be some workaround for this, but it might look awkward, is my guess.)
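Here is a small sketch (my own, not from the programs below) of that limitation: a method that must return a count cannot also return self, so it can only sit at the end of a chain:

class WordList(object):
    def __init__(self, words):
        self.words = list(words)

    def dedupe(self):
        '''Chainable: modifies the object and returns self.'''
        self.words = sorted(set(self.words))
        return self

    def count(self):
        '''Not chainable: returns a number, so the chain has to end here.'''
        return len(self.words)

# dedupe() can sit in the middle of a chain; count() must come last.
print(WordList(['b', 'a', 'b']).dedupe().count())  # 2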

Simple method chaining can be implemented easily in Python.
Here is one way of doing it:
# foo_bar_baz.py
# Demonstrates method chaining.

class Foo(object):
    def bar(self):
        print "Method Foo.bar called"
        return self

    def baz(self):
        print "Method Foo.baz called"
        return self

foo = Foo()
# Saving the return value in foo2 is not needed;
# doing so to use it with the id function below.
foo2 = foo.bar().baz()
print

# We can also do it like this, if we don't want
# to save the object foo for later use:
Foo().bar().baz()
print

# Show that the original foo's id and the returned foo2's id
# are the same, i.e. they are the same object:
print " id(foo):", id(foo)
print "id(foo2):", id(foo2)
Here is the output of running the above program:
$ python foo_bar_baz.py
Method Foo.bar called
Method Foo.baz called

Method Foo.bar called
Method Foo.baz called

id(foo): 34478576
id(foo2): 34478576
While writing this post, I also searched for more information, and found a couple of interesting links on method chaining:

Stack Overflow question on method chaining in Python, with some other approaches.

ActiveState Code Python recipe on method chaining

I also wrote another small program, string_processor.py, which shows a somewhat more realistic situation in which one might want to use method chaining:
'''
Program: string_processor.py
Demo of method chaining in Python.
By: Vasudev Ram -
http://jugad2.blogspot.in/p/about-vasudev-ram.html
Copyright 2016 Vasudev Ram
'''

import copy

class StringProcessor(object):
    '''
    A class to process strings in various ways.
    '''
    def __init__(self, st):
        '''Pass a string for st'''
        self._st = st

    def lowercase(self):
        '''Make lowercase'''
        self._st = self._st.lower()
        return self

    def uppercase(self):
        '''Make uppercase'''
        self._st = self._st.upper()
        return self

    def capitalize(self):
        '''Make first char capital (if letter); make other letters lower'''
        self._st = self._st.capitalize()
        return self

    def delspace(self):
        '''Delete spaces'''
        self._st = self._st.replace(' ', '')
        return self

    def rep(self):
        '''Like Python's repr'''
        return self._st

    def dup(self):
        '''Duplicate the object'''
        return copy.deepcopy(self)

def process_string(s):
    print
    sp = StringProcessor(s)
    print 'Original:', sp.rep()
    print 'After uppercase:', sp.dup().uppercase().rep()
    print 'After lowercase:', sp.dup().lowercase().rep()
    print 'After uppercase then capitalize:', sp.dup().uppercase().\
        capitalize().rep()
    print 'After delspace:', sp.dup().delspace().rep()

def main():
    print "Demo of method chaining in Python:"
    # Use extra spaces between words to show effect of delspace.
    process_string('hOWz It GoInG?')
    process_string('The QUIck brOWn fOx')

main()
Does adding the rep() and dup() make it more methodical? :)

Here is the output of running it:
$ python string_processor.py
Demo of method chaining in Python:

Original: hOWz It GoInG?
After uppercase: HOWZ IT GOING?
After lowercase: howz it going?
After uppercase then capitalize: Howz it going?
After delspace: hOWzItGoInG?

Original: The QUIck brOWn fOx
After uppercase: THE QUICK BROWN FOX
After lowercase: the quick brown fox
After uppercase then capitalize: The quick brown fox
After delspace: TheQUIckbrOWnfOx
So, to sum up, we can see that method chaining has its uses, though overdoing it is probably not a good idea.

Finally, and related, via the Stack Overflow article linked above, I came across this post about Collection Pipelines on Martin Fowler's site.

Reading that article made me realize that nested function calls, method chaining and Unix command pipelines are all related concepts. You may also find these other posts by me of interest:

fmap(), "inverse" of Python map() function

Generate PDF from a Python-controlled Unix pipeline

- Enjoy.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes


Podcast.__init__: Episode 44 - Airflow with Maxime Beauchemin


Visit our site to listen to past episodes, support the show, join our community, and sign up for our mailing list.

Summary

Are you struggling with trying to manage a series of related, interdependent batch jobs? Then you should check out Airflow. In this episode we spoke with the project’s creator Maxime Beauchemin about what inspired him to create it, how it works, and why you might want to use it. Airflow is a data pipeline management tool that will simplify how you build, deploy, and monitor your complex data processing tasks so that you can focus on getting the insights you need from your data.

Brief Introduction

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • Subscribe on iTunes, Stitcher, TuneIn or RSS
  • Follow us on Twitter or Google+
  • Give us feedback! Leave a review on iTunes, Tweet to us, send us an email or leave us a message on Google+
  • Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
  • I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at pythonpodcast.com
  • Linode is sponsoring us this week. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project
  • I would also like to thank Hired, a job marketplace for developers and designers, for sponsoring this episode of Podcast.__init__. Use the link hired.com/podcastinit to double your signing bonus.
  • Your hosts as usual are Tobias Macey and Chris Patti
  • Today we are interviewing Maxime Beauchemin about his work on the Airflow project.
Linode Sponsor Banner

Use the promo code podcastinit20 to get a $20 credit when you sign up!

Hired Logo

On Hired, software engineers & designers can get 5+ interview requests in a week and each offer has salary and equity upfront. With full time and contract opportunities available, users can view the offers and accept or reject them before talking to any company. Work with over 2,500 companies from startups to large public companies hailing from 12 major tech hubs in North America and Europe. Hired is totally free for users, and if you get a job you’ll get a $2,000 “thank you” bonus. If you use our special link to sign up, then that bonus will double to $4,000 when you accept a job. If you’re not looking for a job but know someone who is, you can refer them to Hired and get a $1,337 bonus when they accept a job.

Interview with Maxime Beauchemin

  • Introductions
  • How did you get introduced to Python? - Chris
  • What is Airflow and what are some of the kinds of problems it can be used to solve? - Chris
  • What are some of the biggest challenges that you have seen when implementing a data pipeline with a workflow engine? - Tobias
  • What are some of the signs that a workflow engine is needed? - Tobias
  • Can you share some of the design and architecture of Airflow and how you arrived at those decisions? - Tobias
  • How does Airflow compare to other workflow management solutions, and why did you choose to write your own? - Chris
  • One of the features of Airflow that is emphasized in the documentation is the ability to dynamically generate pipelines. Can you describe how that works and why it is useful? - Tobias
  • For anyone who wants to get started with using Airflow, what are the infrastructure requirements? - Tobias
  • Airflow, like a number of the other tools in the space, supports interoperability with Hadoop and its ecosystem. Can you elaborate on why JVM technologies have become so prevalent in the big data space and how Python fits into that overall problem domain? - Tobias
  • Airflow comes with a web UI for visualizing workflows, as do a few of the other Python workflow engines. Why is that an important feature for this kind of tool and what are some of the tasks and use cases that are supported in the Airflow web portal? - Tobias
  • One problem with data management is tracking the provenance of data as it is manipulated and shuttled between different systems. Does Airflow have any support for maintaining that kind of information and if not do you have recommendations for how practitioners can approach the issue? - Tobias
  • What other kinds of metadata can Airflow track as it executes tasks and what are some of the interesting uses you have seen or created for that information? - Tobias
  • With all the other languages competing for mindshare, what made you choose Python when you built Airflow? - Chris
  • I notice that Airflow supports Kerberos. It’s an incredibly capable security model but that comes at a high price in terms of complexity. What were the challenges and was it worth the additional implementation effort? - Chris
  • When does the data pipeline/workflow management paradigm break down and what other approaches or tools can be used in those cases? - Tobias
  • So, you wrote another tool recently called Panoramix. Can you describe what it is and maybe explain how it fits in the data management domain in relation to Airflow? - Tobias

Keep In Touch

Picks

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Ned Batchelder: Stellated icosahedron shirt


My new favorite t-shirt:

Stellated icosahedron t-shirt

It's a stellated icosahedron from Henry Segerman, who makes many interesting nerdy things, inspired by both math and juggling:

BTW: stellation is the process of creating new shapes by extending the faces of a polyhedron. The shirt is a stellation of a regular icosahedron (known in gaming circles as a D20). The logo for this site is a stellation of a regular dodecahedron.

Doing Math with Python: What readers are saying


Readers have shared how they are finding Doing Math with Python by posting reviews on Amazon and their own blog. You can view all of them on the Reviews page.

Some readers have also been kind enough to let me know personally how the book has helped them restart their programming or take a fresh look at something they had been putting off. As the author, I think this is the highest level of appreciation that I could have hoped for.

Recently, Aaron Meurer (the lead developer of SymPy) mentioned the book in an episode of Podcast.__init__ titled "SymPy with Aaron Meurer". If you are curious to learn more about SymPy, I would recommend listening to it.

I am curious to hear more. If you want to get in touch personally, please do so via any of the following channels:

You can email me at doingmathwithpython@gmail.com.

Alternatively, if you just plan to write a review, please do so on Amazon, O'Reilly or your personal blog.

Abu Ashraf Masnun: Django: Limiting User Access to Views


In this post, we would like to see how we can limit user access to our Django views.

Login Required & Permission Required Decorators

If you have worked with Django, you probably have used the login_required decorator already. Adding the decorator to a view limits access only to the logged in users. If the user is not logged in, s/he is redirected to the default login page. Or we can pass a custom login url to the decorator for that purpose.

Let’s see an example:

from django.contrib.auth.decorators import login_required

@login_required
def secret_page(request):
    return render_to_response("secret_page.html")

There’s another nice decorator – permission_required which works in a similar fashion:

from django.contrib.auth.decorators import permission_required

@permission_required('entity.can_delete', login_url='/loginpage/')
def my_view(request):
    return render_to_response("entity/delete.html")

Awesome, but let’s learn how they work internally.

How do they work?

We saw the magic of the login_required and permission_required decorators. But we’re the men of science and we don’t like to believe in magic. So let’s unravel the mystery of these useful decorators.

Here’s the code for the login_required decorator:

def login_required(function=None, redirect_field_name=REDIRECT_FIELD_NAME, login_url=None):
    """
    Decorator for views that checks that the user is logged in, redirecting
    to the log-in page if necessary.
    """
    actual_decorator = user_passes_test(
        lambda u: u.is_authenticated(),
        login_url=login_url,
        redirect_field_name=redirect_field_name
    )
    if function:
        return actual_decorator(function)
    return actual_decorator

By reading the code, we can see that the login_required decorator uses another decorator – user_passes_test – which takes a callable to determine whether the user should have access to this view. The callable must accept a user instance and return a boolean value. user_passes_test returns a decorator which is applied to our view.

If we look at the source of permission_required, we will see something quite similar. It also uses the same user_passes_test decorator.
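As a small illustration, user_passes_test can also be applied directly to a view. This is a hedged sketch: the is_staff condition and the view itself are just examples I made up, not something from this post.

from django.contrib.auth.decorators import user_passes_test

@user_passes_test(lambda u: u.is_staff, login_url='/loginpage/')
def staff_only_view(request):
    return render_to_response("staff/dashboard.html")

The lambda receives the current user and must return a truthy value for access to be granted.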

Building Our Own Decorators

Now that we know how to limit access to a view based on whether the logged in user passes a test, it’s quite simple for us to build our own decorators for various purposes. Let’s say we want to allow access only to those users who have verified their emails.

from django.contrib.auth.decorators import user_passes_test
from django.contrib.auth import REDIRECT_FIELD_NAME


def check_email_verification(user):
    # True only if this user has at least one verified EmailVerification record.
    return EmailVerification.objects.filter(user=user, verified=True).exists()


def check_email(function=None, redirect_field_name=REDIRECT_FIELD_NAME, login_url=None):
    """
    Decorator for views that checks that the user has a verified email address,
    redirecting to the login page if necessary.
    """
    actual_decorator = user_passes_test(
        check_email_verification,
        login_url=login_url,
        redirect_field_name=redirect_field_name
    )
    if function:
        return actual_decorator(function)
    return actual_decorator

Now we can use the decorator to a view like:

@login_required
@check_email(login_url="/redirect/login/?reason=verify_email")
def verified_users_only(request):
    return render_to_response("awesome/offers.html")

Users who have verified their email addresses will be able to access this view. If they haven’t, they will be redirected to the login URL. Using the reason query string, we can display a nice message explaining what’s happening.

Please note, we have used two decorators on the same view. We can use multiple decorators like this to make sure the user passes all the tests we require them to.


Evennia: Climbing up Branches

Today I pushed the latest Evennia development branch "wclient". This has a bunch of updates to how Evennia's webclient infrastructure works, by making all exchanged data be treated the same way (instead of treating text separately from other types of client instructions).

It also reworks the javascript client into a library that should be a lot easier to expand on and customize. The actual client GUI is still pretty rudimentary though, so I hope a user with more web development experience will take it upon themselves to look it over for best practices.

A much more detailed description of what is currently going on (including how to check out the latest for yourself) is found in this mailing list post. Enjoy!

Lintel Technologies: How to write port-forwarding program using Twisted


Recently I was faced with an issue where a long-running process was listening on the loopback IP (127.0.0.1) on port 8080 on one of our servers, while client programs on other machines were trying to access it on the server's local IP, 10.91.20.66. We ended up in this situation when we updated the server configuration and restarted the server program, but forgot to change the IP binding in the config file from loopback to the local IP. The server was already busy with lots of customer connections by the time we discovered that some of its services were not accessible to client programs on other machines. So the dummy's guide fix of changing the config and restarting the server program was not an option, as we couldn't risk disconnecting existing customers. Hot patching was the only option until we could restart the program at the next scheduled down time.

I could have fixed this in a couple of ways: either by adding a few lines to the iptables configuration, or by writing a simple socket program in Python. The task is to forward data coming in on the local IP, port 8080, to the loopback IP (127.0.0.1), port 8080, and send replies back to the source address. Forwarding data from one socket to another is pretty trivial using Python's socket library, and Twisted makes it even more trivial, so I went with the following solution using Twisted.

__author__ = 'godson'

from twisted.protocols.portforward import ProxyFactory
from twisted.application import internet,service

src_ip = "10.91.20.66"
src_port = 8080
dst_ip = "127.0.0.1"
dst_port = 8080

application = service.Application("Proxy")
server = ProxyFactory(dst_ip, dst_port)
ps = internet.TCPServer(src_port,server,50,src_ip)

ps.setServiceParent(application)

That’s it. Now, all I needed to do was run this program with the following command:

twistd -y portforwarder.py

This simple program is made possible by the heavy lifting done by the Twisted library. Interested folks can look under the hood at Twisted's portforward.py module.
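For comparison, here is a minimal sketch of the plain-socket approach mentioned earlier. This is my own illustration, not Lintel's code: error handling and shutdown are omitted, the threading model is deliberately naive, and the addresses simply mirror the ones above.

import socket
import threading

SRC = ('10.91.20.66', 8080)   # address the clients connect to
DST = ('127.0.0.1', 8080)     # where the real server is listening

def pump(src, dst):
    # Copy bytes from one socket to the other until EOF.
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    finally:
        src.close()
        dst.close()

def serve():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(SRC)
    listener.listen(50)
    while True:
        client, _ = listener.accept()
        upstream = socket.create_connection(DST)
        threading.Thread(target=pump, args=(client, upstream)).start()
        threading.Thread(target=pump, args=(upstream, client)).start()

serve()

The Twisted version above is shorter and gets connection management for free, which is why it won.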

The post How to write port-forwarding program using Twisted appeared first on Lintel Technologies Blog.

EuroPython: EuroPython 2016: Sending out the first gravitational waves


We are pleased to announce the launch of our all new EuroPython 2016 website. Over the last few weeks, we have been busy talking to sponsors and getting the website prepared for the launch.

You may have heard about the recent direct observation of gravitational waves by the LIGO (Laser Interferometer Gravitational-wave Observatory). What you may not know is that Python helped in analyzing the data (archive.org), so we now have two things to celebrate:

  1. Python’s use in this phenomenal direct proof of Einstein’s prediction and
  2. the launch of our 2016 edition of the EuroPython conference.

So here it is:


https://ep2016.europython.eu/

Many thanks go to our launch sponsors who have signed up early to give us that extra boost in motivation to get the conference and its website set up.

Meet our Launch Sponsors

PS: We’d like to thank the EuroPython Web WG  for the web site improvements and our friends at Python Italia for making their code available.

With gravitational regards,

EuroPython 2016 Team


Julien Danjou: Timeseries storage and data compression


The first major version of Gnocchi, the scalable timeseries database I work on, was released a few months ago. In this first iteration, it took a rather naive approach to data storage. We had little idea about if and how our distributed back-ends were going to be heavily used, so we stuck to the code of the first proof-of-concept written a couple of years ago.

Recently we got more feedback from our users and ran a few benchmarks. That gave us enough input to start improving our storage strategy.

Data split

Up to Gnocchi 1.3, all data for a single metric are stored in a single gigantic file per aggregation method (min, max, average…). This means that the file can grow to several megabytes in size, which makes it slow to manipulate. For the next version of Gnocchi, our first task has been to rework that storage and split the data into smaller parts.

Gnocchi Carbonara archives split

The diagram above shows how data are organized inside Gnocchi. Until version 1.3, there would have been only one file for each aggregation method.

In the upcoming 2.0 version, Gnocchi will split all these data into smaller parts, where each data split is stored in a file/object. This makes it possible to manipulate smaller pieces of data and increases the parallelism of the CRUD operations on the back-end – leading to a large speed improvement.

In order to split timeseries into several chunks, Gnocchi defines a maximum number of N points to keep per chunk, to limit their maximum size. It then defines a hash function that produces a non-unique key for any timestamp. This makes it easy to find in which chunk any timestamp should be stored or retrieved.
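A hedged sketch of that idea (not Gnocchi's actual code; the constants are made up for illustration): with a fixed number of points per chunk and a known aggregation interval, truncating the timestamp gives the chunk key.

POINTS_PER_SPLIT = 3600   # assumed maximum number of points per chunk
INTERVAL = 60             # assumed aggregation granularity, in seconds

def split_key(timestamp):
    # All timestamps in the same span map to the same (non-unique) key.
    span = POINTS_PER_SPLIT * INTERVAL
    return timestamp - (timestamp % span)

Every point whose timestamp falls in the same span ends up in the same file/object, so a read or write only has to touch the chunks covering the requested time range.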

Data compression

Up to Gnocchi 1.3, the data stored for each metric is simply serialized using msgpack, a fast and small serialization format. However, this format does not provide any compression. That means that storing data points needs 8 bytes for a timestamp (64 bits timestamp with nanosecond precision) and 8 bytes for a value (64 bits double-precision floating-point), plus some overhead (extra information and msgpack itself).

After looking around for ways to compress all these measures, I stumbled upon a paper from some Facebook engineers about Gorilla, their in-memory timeseries database, entitled "Gorilla: A Fast, Scalable, In-Memory Time Series Database". For reference, part of this encoding is also used by InfluxDB in its new storage engine.

The first technique I implemented is easy enough, and it's inspired by delta-of-delta encoding. Instead of storing each timestamp for each data point, and since all the data points are aggregated on a regular interval, we transpose points to be the time difference divided by the interval. For example, the sequence of timestamps timestamps = [41230, 41235, 41240, 41250, 41255] is encoded into timestamps = [41230, 1, 1, 2, 1], interval = 5. This allows regular compression algorithms to reduce the size of the integer list using run-length encoding.
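A minimal sketch of that transposition (my own illustration, assuming evenly spaced points and the interval known up front):

def encode_timestamps(timestamps, interval):
    # Keep the first timestamp, then store each delta divided by the interval.
    deltas = [(b - a) // interval for a, b in zip(timestamps, timestamps[1:])]
    return [timestamps[0]] + deltas

print(encode_timestamps([41230, 41235, 41240, 41250, 41255], 5))
# -> [41230, 1, 1, 2, 1]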

To actually compress the values, I tried two different algorithms:

  • LZ4 (https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)), a fast compression/decompression algorithm

  • The XOR-based compression scheme described in the Gorilla paper mentioned above – which I had to implement myself. For reference, there is also a Go implementation, go-tsz.

I then benchmarked these solutions:

Gnocchi Carbonara compression speed

The XOR algorithm implemented in Python is pretty slow, compared to LZ4. Truth is that python-lz4 is fully implemented in C, which makes it fast. I've profiled my XOR implementation in Python, to discover that one operation took 20 % of the time: count_lead_and_trail_zeroes, which is in charge of counting the number of leading and trailing zeroes in a binary number.

Gnocchi Carbonara compression XOR profiling

I tried 2 Python implementations of the same algorithm (and submitted them to my friend and Python developer Victor Stinner by the way).

The first version, using string search with .index(), is 10× faster than the second one, which only does integer computation. Ah, Python… As Victor explained, each Python operation is slow and there are a lot of them in the second version, whereas .index() is implemented in C, is really well optimized, and only needs 2 Python operations.
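For illustration, here are hedged reconstructions of the two approaches (these are my own sketches, not the exact code that was benchmarked; a 64-bit width is assumed):

def count_lead_and_trail_zeroes_str(value, width=64):
    # String-based version: format to binary, then let .index()/.rindex() do the work in C.
    if value == 0:
        return width, width
    bits = format(value, '0%db' % width)
    leading = bits.index('1')
    trailing = width - 1 - bits.rindex('1')
    return leading, trailing

def count_lead_and_trail_zeroes_int(value, width=64):
    # Integer-only version: many small Python operations, hence slower.
    if value == 0:
        return width, width
    trailing = 0
    v = value
    while not v & 1:
        trailing += 1
        v >>= 1
    leading = width - value.bit_length()
    return leading, trailing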

Finally, I ended up optimizing that code by leveraging cffi to call ffsll() and flsll() directly. That decreased the run-time of count_lead_and_trail_zeroes by 45 %, increasing the speed of the entire XOR compression code by a modest 7 %. This is not enough to catch up with LZ4's speed. At this stage, the only way to achieve high speed would probably be a full C implementation.
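As a rough sketch of that cffi trick (an assumption-laden illustration: it expects a Linux/glibc libc that exposes ffsll(), and the real code also wrapped flsll() for the leading side):

import cffi

ffi = cffi.FFI()
ffi.cdef('int ffsll(long long);')   # index of the least significant set bit, 1-based
libc = ffi.dlopen(None)             # load the C library

def count_trailing_zeroes(value, width=64):
    if value == 0:
        return width
    return libc.ffsll(value) - 1

print(count_trailing_zeroes(8))   # -> 3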

Gnocchi Carbonara compression size

Considering the compression ratio of the different algorithms, they are pretty much identical. The worst case scenario (random values) for LZ4 compresses down to 9 bytes per data point, whereas XOR can go down to 7.38 bytes per data point. In general XOR encoding beats LZ4 by 15 %, except for cases where all values are 0 or 1. However, LZ4 is faster than XOR by a factor of 4×–70× depending on the case.

That means that we'll use LZ4 for data compression in Gnocchi 2.0. It's possible that we could achieve an equally fast compression/decompression algorithm with XOR, but I don't think it's worth the effort right now – it'd represent a lot of code to write and to maintain.
