
Podcast.__init__: Exploratory Data Analysis Made Easy At The Command Line


Summary

There are countless tools and libraries in Python for data scientists to perform powerful analyses, but they often have a setup cost that acts as a barrier to ad-hoc exploration of data. Visidata is a command line application that eliminates the friction involved with starting the discovery process. In this episode Saul Pwanson explains his motivation for creating it, why a terminal environment is a useful place for this work, and how you can use Visidata for your own work. If you have ever avoided looking at a data set because you couldn’t be bothered with the boilerplate for a Jupyter notebook, then Visidata is the perfect addition to your toolbox.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Saul Pwanson about Visidata, a terminal-oriented interactive multitool for tabular data.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Visidata is and how the project got started?
    • What are the main use cases for Visidata?
    • What are some tools that it has replaced in your workflow?
  • Can you talk through a typical workflow for data exploration and analysis with Visidata?
  • One of the capabilities that you mention on the website is quickly opening large files. What are some strategies that you have used to enable performant access for files that might crash a typical editor (e.g. Vim, Emacs)?
  • Can you describe how Visidata is implemented and how it has evolved since you started working on it (including the upcoming 2.0 release)?
    • What libraries or language features have proven most useful?
  • Why did you choose to implement Visidata as a terminal only tool and what constraints does that bring with it?
    • What are some of the most challenging aspects of building a terminal UI for data exploration and analysis?
    • Because of its manifestation as a terminal/CLI application it relies heavily on keyboard bindings. How do you approach key assignments to ensure a consistent and intuitive user experience?
  • What are some of the types of analysis that Visidata can be used for out of the box?
  • What are some of the most interesting/unexpected/innovative ways that you have seen Visidata used?
  • How much community adoption have you seen and how do you approach project governance as a solo developer?
  • What do you have planned for the future of Visidata?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA


Continuum Analytics Blog: Anaconda Enterprise Receives Honors in Fourth Annual Datanami Readers’ and Editors’ Choice Awards

Real Python: Getting Started With Async Features in Python


Have you heard of asynchronous programming in Python? Are you curious to know more about Python async features and how you can use them in your work? Perhaps you’ve even tried to write threaded programs and run into some issues. If you’re looking to understand how to use Python async features, then you’ve come to the right place.

In this article, you’ll learn:

  • What a synchronous program is
  • What an asynchronous program is
  • Why you might want to write an asynchronous program
  • How to use Python async features

All of the example code in this article has been tested with Python 3.7.2. You can grab a copy to follow along by clicking the link below:

Download Code: Click here to download the code you'll use to learn about async features in Python in this tutorial.

Understanding Asynchronous Programming

A synchronous program is executed one step at a time. Even with conditional branching, loops and function calls, you can still think about the code in terms of taking one execution step at a time. When each step is complete, the program moves on to the next one.

Here are two examples of programs that work this way:

  • Batch processing programs are often created as synchronous programs. You get some input, process it, and create some output. Steps follow one after the other until the program reaches the desired output. The program only needs to pay attention to the steps and their order.

  • Command-line programs are small, quick processes that run in a terminal. These scripts are used to create something, transform one thing into something else, generate a report, or perhaps list out some data. This can be expressed as a series of program steps that are executed sequentially until the program is done.

An asynchronous program behaves differently. It still takes one execution step at a time. The difference is that the system may not wait for an execution step to be completed before moving on to the next one.

This means that the program will move on to future execution steps even though a previous step hasn’t yet finished and is still running elsewhere. This also means that the program knows what to do when a previous step does finish running.

Why would you want to write a program in this manner? The rest of this article will help you answer that question and give you the tools you need to elegantly solve interesting asynchronous problems.

Building a Synchronous Web Server

A web server’s basic unit of work is, more or less, the same as batch processing. The server will get some input, process it, and create the output. Written as a synchronous program, this would create a working web server.

It would also be an absolutely terrible web server.

Why? In this case, one unit of work (input, process, output) is not the only purpose. The real purpose is to handle hundreds or even thousands of units of work as quickly as possible. This can happen over long periods of time, and several work units may even arrive all at once.

Can a synchronous web server be made better? Sure, you could optimize the execution steps so that all the work coming in is handled as quickly as possible. Unfortunately, there are limitations to this approach. The result could be a web server that doesn’t respond fast enough, can’t handle enough work, or even one that times out when work gets stacked up.

Note: There are other limitations you might see if you tried to optimize the above approach. These include network speed, file IO speed, database query speed, and the speed of other connected services, to name a few. What these all have in common is that they are all IO functions. All of these items are orders of magnitude slower than the CPU’s processing speed.

In a synchronous program, if an execution step starts a database query, then the CPU is essentially idle until the database query is returned. For batch-oriented programs, this isn’t a priority most of the time. Processing the results of that IO operation is the goal. Often, this can take longer than the IO operation itself. Any optimization efforts would be focused on the processing work, not the IO.

Asynchronous programming techniques allow your programs to take advantage of relatively slow IO processes by freeing the CPU to do other work.

Thinking Differently About Programming

When you start trying to understand asynchronous programming, you might see a lot of discussion about the importance of blocking, or writing non-blocking code. (Personally, I struggled to get a good grasp of these concepts from the people I asked and the documentation I read.)

What is non-blocking code? What’s blocking code, for that matter? Would the answers to these questions help you write a better web server? If so, how could you do it? Let’s find out!

Writing asynchronous programs requires that you think differently about programming. While this new way of thinking can be hard to wrap your head around, it’s also an interesting exercise. That’s because the real world is almost entirely asynchronous, and so is how you interact with it.

Imagine this: you’re a parent trying to do several things at once. You have to balance the checkbook, do the laundry, and keep an eye on the kids. Somehow, you’re able to do all of these things at the same time without even thinking about it! Let’s break it down:

  • Balancing the checkbook is a synchronous task. One step follows another until it’s done. You’re doing all the work yourself.

  • However, you can break away from the checkbook to do laundry. You unload the dryer, move clothes from the washer to the dryer, and start another load in the washer.

  • Working with the washer and dryer is a synchronous task, but the bulk of the work happens after the washer and dryer are started. Once you’ve got them going, you can walk away and get back to the checkbook task. At this point, the washer and dryer tasks have become asynchronous. The washer and dryer will run independently until the buzzer goes off (notifying you that the task needs attention).

  • Watching your kids is another asynchronous task. Once they are set up and playing, they can do so independently for the most part. This changes when someone needs attention, like when someone gets hungry or hurt. When one of your kids yells in alarm, you react. The kids are a long-running task with high priority. Watching them supersedes any other tasks you might be doing, like the checkbook or laundry.

These examples can help to illustrate the concepts of blocking and non-blocking code. Let’s think about this in programming terms. In this example, you’re like the CPU. While you’re moving the laundry around, you (the CPU) are busy and blocked from doing other work, like balancing the checkbook. But that’s okay because the task is relatively quick.

On the other hand, starting the washer and dryer does not block you from performing other tasks. It’s an asynchronous function because you don’t have to wait for it to finish. Once it’s started, you can go back to something else. This is called a context switch: the context of what you’re doing has changed, and the machine’s buzzer will notify you sometime in the future when the laundry task is complete.

As a human, this is how you work all the time. You naturally juggle multiple things at once, often without thinking about it. As a developer, the trick is how to translate this kind of behavior into code that does the same kind of thing.

Programming Parents: Not as Easy as It Looks!

If you recognize yourself (or your parents) in the example above, then that’s great! You’ve got a leg up in understanding asynchronous programming. Again, you’re able to switch contexts between competing tasks fairly easily, picking up some tasks and resuming others. Now you’re going to try and program this behavior into virtual parents!

Thought Experiment #1: The Synchronous Parent

How would you create a parent program to do the above tasks in a completely synchronous manner? Since watching the kids is a high-priority task, perhaps your program would do just that. The parent watches over the kids while waiting for something to happen that might need their attention. However, nothing else (like the checkbook or laundry) would get done in this scenario.

Now, you can re-prioritize the tasks any way you want, but only one of them would happen at any given time. This is the result of a synchronous, step-by-step approach. Like the synchronous web server described above, this would work, but it might not be the best way to live. The parent wouldn’t be able to complete any other tasks until the kids fell asleep. All other tasks would happen afterward, well into the night. (A couple of weeks of this and many real parents might jump out the window!)

Thought Experiment #2: The Polling Parent

If you used polling, then you could change things up so that multiple tasks are completed. In this approach, the parent would periodically break away from the current task and check to see if any other tasks need attention.

Let’s make the polling interval something like fifteen minutes. Now, every fifteen minutes your parent checks to see if the washer, dryer or kids need any attention. If not, then the parent can go back to work on the checkbook. However, if any of those tasks do need attention, then the parent will take care of it before going back to the checkbook. This cycle then repeats at the next timeout of the polling loop.

This approach works as well since multiple tasks are getting attention. However, there are a couple of problems:

  1. The parent may spend a lot of time checking on things that don’t need attention: The washer and dryer haven’t yet finished, and the kids don’t need any attention unless something unexpected happens.

  2. The parent may miss completed tasks that do need attention: For instance, if the washer finished its cycle at the beginning of the polling interval, then it wouldn’t get any attention for up to fifteen minutes! What’s more, watching the kids is supposedly the highest priority task. They couldn’t tolerate fifteen minutes with no attention when something might be going drastically wrong.

You could address these issues by shortening the polling interval, but now your parent (the CPU) would be spending more time context switching between tasks. This is when you start to hit a point of diminishing returns. (Once again, a couple of weeks living like this and, well… See the previous comment about windows and jumping.)

Thought Experiment #3: The Threading Parent

“If I could only clone myself…” If you’re a parent, then you’ve probably had similar thoughts! Since you’re programming virtual parents, you can essentially do this by using threading. This is a mechanism that allows multiple sections of one program to run at the same time. Each section of code that runs independently is known as a thread, and all threads share the same memory space.

If you think of each task as a part of one program, then you can separate them and run them as threads. In other words, you can “clone” the parent, creating one instance for each task: watching the kids, monitoring the washer, monitoring the dryer, and balancing the checkbook. All of these “clones” are running independently.

This sounds like a pretty nice solution, but there are some issues here as well. One is that you’ll have to explicitly tell each parent instance what to do in your program. This can lead to some problems since all instances share everything in the program space.

For example, say that Parent A is monitoring the dryer. Parent A sees that the clothes are dry, so they take control of the dryer and begin unloading the clothes. At the same time, Parent B sees that the washer is done, so they take control of the washer and begin removing clothes. However, Parent B also needs to take control of the dryer so they can put the wet clothes inside. This can’t happen, because Parent A currently has control of the dryer.

After a short while, Parent A has finished unloading clothes. Now they want to take control of the washer and start moving clothes into the empty dryer. This can’t happen, either, because Parent B currently has control of the washer!

These two parents are now deadlocked. Both have control of their own resource and want control of the other resource. They’ll wait forever for the other parent instance to release control. As the programmer, you’d have to write code to work this situation out.
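
In code, one common way to work this situation out is to make every thread acquire the locks in the same agreed-upon order. Here's a minimal sketch of that idea (not from the article's example files), using two threading.Lock objects to stand in for the washer and dryer:

import threading

washer = threading.Lock()
dryer = threading.Lock()

def move_laundry(parent):
    # Both parents acquire the locks in the same order (washer, then
    # dryer), so neither can end up holding one resource while waiting
    # forever on the other.
    with washer:
        with dryer:
            print(f"{parent} moves clothes from the washer to the dryer")

parent_a = threading.Thread(target=move_laundry, args=("Parent A",))
parent_b = threading.Thread(target=move_laundry, args=("Parent B",))
parent_a.start()
parent_b.start()
parent_a.join()
parent_b.join()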

Note: Threaded programs allow you to create multiple, parallel paths of execution that all share the same memory space. This is both an advantage and a disadvantage. Any memory shared between threads is subject to one or more threads trying to use the same shared memory at the same time. This can lead to data corruption, data read in an invalid state, and data that’s just messy in general.

In threaded programming, the context switch happens under the control of the system, not the programmer. The system decides when to switch contexts and when to give threads access to shared data, thereby changing the context of how the memory is being used. All of these kinds of problems are manageable in threaded code, but they're difficult to get right and hard to debug when they're wrong.

Here’s another issue that might arise from threading. Suppose that a child gets hurt and needs to be taken to urgent care. Parent C has been assigned the task of watching over the kids, so they take the child right away. At the urgent care, Parent C needs to write a fairly large check to cover the cost of seeing the doctor.

Meanwhile, Parent D is at home working on the checkbook. They’re unaware of this large check being written, so they’re very surprised when the family checking account is suddenly overdrawn!

Remember, these two parent instances are working within the same program. The family checking account is a shared resource, so you’d have to work out a way for the child-watching parent to inform the checkbook-balancing parent. Otherwise, you’d need to provide some kind of locking mechanism so that the checkbook resource can only be read and updated by one parent at a time.
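
Here's a minimal sketch of that locking idea, with a threading.Lock guarding a hypothetical shared balance (the names are illustrative, not from the article's code):

import threading

balance = 100
balance_lock = threading.Lock()

def write_check(amount):
    global balance
    # Only one parent at a time may read and update the shared account.
    with balance_lock:
        balance -= amount

parents = [threading.Thread(target=write_check, args=(20,)) for _ in range(2)]
for p in parents:
    p.start()
for p in parents:
    p.join()
print(balance)  # 60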

Using Python Async Features in Practice

Now you’re going to take some of the approaches outlined in the thought experiments above and turn them into functioning Python programs.

All of the examples in this article have been tested with Python 3.7.2. The requirements.txt file indicates which modules you’ll need to install to run all the examples. If you haven’t yet downloaded the file, you can do so now:

Download Code: Click here to download the code you'll use to learn about async features in Python in this tutorial.

You also might want to set up a Python virtual environment to run the code so you don’t interfere with your system Python.
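
If you haven't created one before, a typical sequence looks something like this (the environment name venv is just a convention):

python3 -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate
pip install -r requirements.txt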

Synchronous Programming

This first example shows a somewhat contrived way of having a task retrieve work from a queue and process that work. A queue in Python is a nice FIFO (first in first out) data structure. It provides methods to put things in a queue and take them out again in the order they were inserted.

In this case, the work is to get a number from the queue and have a loop count up to that number. It prints to the console when the loop begins, and again to output the total. This program demonstrates one way for multiple synchronous tasks to process the work in a queue.
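
As a quick standalone illustration of that FIFO behavior (this snippet isn't part of the tutorial's example files):

import queue

q = queue.Queue()
q.put("first")
q.put("second")
print(q.get())  # first -- items come out in the order they went in
print(q.get())  # second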

The program named example_1.py in the repository is listed in full below:

 1 import queue
 2 
 3 def task(name, work_queue):
 4     if work_queue.empty():
 5         print(f"Task {name} nothing to do")
 6     else:
 7         while not work_queue.empty():
 8             count = work_queue.get()
 9             total = 0
10             print(f"Task {name} running")
11             for x in range(count):
12                 total += 1
13             print(f"Task {name} total: {total}")
14 
15 def main():
16     """
17     This is the main entry point for the program.
18     """
19     # Create the queue of 'work'
20     work_queue = queue.Queue()
21 
22     # Put some 'work' in the queue
23     for work in [15, 10, 5, 2]:
24         work_queue.put(work)
25 
26     # Create some synchronous tasks
27     tasks = [
28         (task, "One", work_queue),
29         (task, "Two", work_queue)
30     ]
31 
32     # Run the tasks
33     for t, n, q in tasks:
34         t(n, q)
35 
36 if __name__ == "__main__":
37     main()

Let’s take a look at what each line does:

  • Line 1 imports the queue module. This is where the program stores work to be done by the tasks.
  • Lines 3 to 13 define task(). This function pulls work out of work_queue and processes the work until there isn’t any more to do.
  • Line 15 defines main() to run the program tasks.
  • Line 20 creates the work_queue. All tasks use this shared resource to retrieve work.
  • Lines 23 to 24 put work in work_queue. In this case, it’s just a hard-coded list of counts for the tasks to process.
  • Lines 27 to 29 create a list of task tuples, with the parameter values those tasks will be passed.
  • Lines 33 to 34 iterate over the list of task tuples, calling each one and passing the previously defined parameter values.
  • Lines 36 to 37 call main() to run the program when the module is executed directly.

The task in this program is just a function accepting a string and a queue as parameters. When executed, it looks for anything in the queue to process. If there is work to do, then it pulls values off the queue, starts a for loop to count up to that value, and outputs the total at the end. It continues getting work off the queue until there is nothing left and it exits.

When this program is run, it produces the output you see below:

Task One running
Task One total: 15
Task One running
Task One total: 10
Task One running
Task One total: 5
Task One running
Task One total: 2
Task Two nothing to do

This shows that Task One does all the work. The while loop that Task One hits within task() consumes all the work on the queue and processes it. When that loop exits, Task Two gets a chance to run. However, it finds that the queue is empty, so Task Two prints a statement that says it has nothing to do and then exits. There’s nothing in the code to allow both Task One and Task Two to switch contexts and work together.

Simple Cooperative Concurrency

The next version of the program allows the two tasks to work together. Adding a yield statement means the loop will yield control at the specified point while still maintaining its context. This way, the yielding task can be restarted later.

The yield statement turns task() into a generator. A generator function is called just like any other function in Python, but when the yield statement is executed, control is returned to the caller of the function. This is essentially a context switch, as control moves from the generator function to the caller.

The interesting part is that control can be given back to the generator function by calling next() on the generator. This is a context switch back to the generator function, which picks up execution with all function variables that were defined before the yield still intact.
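
Here's that mechanism in isolation, reduced to a minimal sketch separate from the example files:

def counter():
    total = 0
    while total < 3:
        total += 1
        yield total  # control returns to the caller here

gen = counter()
print(next(gen))  # 1
print(next(gen))  # 2 -- total was preserved between calls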

The while loop in main() takes advantage of this when it calls next(t). This statement restarts the task at the point where it previously yielded. All of this means that you’re in control when the context switch happens: when the yield statement is executed in task().

This is a form of cooperative multitasking. The program is yielding control of its current context so that something else can run. In this case, it allows the while loop in main() to run two instances of task() as a generator function. Each instance consumes work from the same queue. This is sort of clever, but it’s also a lot of work to get the same results as the first program. The program example_2.py demonstrates this simple concurrency and is listed below:

 1 import queue
 2 
 3 def task(name, queue):
 4     while not queue.empty():
 5         count = queue.get()
 6         total = 0
 7         print(f"Task {name} running")
 8         for x in range(count):
 9             total += 1
10             yield
11         print(f"Task {name} total: {total}")
12 
13 def main():
14     """
15     This is the main entry point for the program.
16     """
17     # Create the queue of 'work'
18     work_queue = queue.Queue()
19 
20     # Put some 'work' in the queue
21     for work in [15, 10, 5, 2]:
22         work_queue.put(work)
23 
24     # Create some tasks
25     tasks = [
26         task("One", work_queue),
27         task("Two", work_queue)
28     ]
29 
30     # Run the tasks
31     done = False
32     while not done:
33         for t in tasks:
34             try:
35                 next(t)
36             except StopIteration:
37                 tasks.remove(t)
38             if len(tasks) == 0:
39                 done = True
40 
41 if __name__ == "__main__":
42     main()

Here’s what’s happening in the code above:

  • Lines 3 to 11 define task() as before, but the addition of yield on Line 10 turns the function into a generator. This is where the context switch is made and control is handed back to the while loop in main().
  • Lines 25 to 28 create the task list, but in a slightly different manner than you saw in the previous example code. In this case, each task is called with its parameters as it’s entered in the tasks list variable. This is necessary to get the task() generator function running the first time.
  • Lines 34 to 39 are the modifications to the while loop in main() that allow task() to run cooperatively. This is where control returns to each instance of task() when it yields, allowing the loop to continue and run another task.
  • Line 35 gives control back to task(), and continues its execution after the point where yield was called.
  • Line 39 sets the done variable. The while loop ends when all tasks have been completed and removed from tasks.

This is the output produced when you run this program:

Task One running
Task Two running
Task Two total: 10
Task Two running
Task One total: 15
Task One running
Task Two total: 5
Task One total: 2

You can see that both Task One and Task Two are running and consuming work from the queue. This is what’s intended, as both tasks are processing work, and each is responsible for two items in the queue. This is interesting, but again, it takes quite a bit of work to achieve these results.

The trick here is using the yield statement, which turns task() into a generator and performs a context switch. The program uses this context switch to give control to the while loop in main(), allowing two instances of a task to run cooperatively.

Notice how Task Two outputs its total first. This might lead you to think that the tasks are running asynchronously. However, this is still a synchronous program. It’s structured so the two tasks can trade contexts back and forth. The reason why Task Two outputs its total first is that it’s only counting to 10, while Task One is counting to 15. Task Two simply arrives at its total first, so it gets to print its output to the console before Task One.

Cooperative Concurrency With Blocking Calls

The next version of the program is the same as the last, except for the addition of time.sleep(delay) in the body of your task loop. This adds a delay based on the value retrieved from the work queue to every iteration of the task loop. The delay is added to simulate the effect of a blocking call occurring in your task.

A blocking call is code that stops the CPU from doing anything else for some time. In the thought experiments above, if a parent wasn’t able to break away from balancing the checkbook until it was complete, then that would be a blocking call.

time.sleep(delay) does the same thing in this example, because the CPU can’t do anything else but wait for the delay to expire.

The ET class imported from lib.elapsed_time provides a way to get the elapsed time from the moment an instance is created until the instance is called as a function. The program example_3.py is listed below:

 1 import time
 2 import queue
 3 from lib.elapsed_time import ET
 4 
 5 def task(name, queue):
 6     while not queue.empty():
 7         delay = queue.get()
 8         et = ET()
 9         print(f"Task {name} running")
10         time.sleep(delay)
11         print(f"Task {name} total elapsed time: {et():.1f}")
12         yield
13 
14 def main():
15     """
16     This is the main entry point for the program.
17     """
18     # Create the queue of 'work'
19     work_queue = queue.Queue()
20 
21     # Put some 'work' in the queue
22     for work in [15, 10, 5, 2]:
23         work_queue.put(work)
24 
25     tasks = [
26         task("One", work_queue),
27         task("Two", work_queue)
28     ]
29 
30     # Run the tasks
31     et = ET()
32     done = False
33     while not done:
34         for t in tasks:
35             try:
36                 next(t)
37             except StopIteration:
38                 tasks.remove(t)
39             if len(tasks) == 0:
40                 done = True
41 
42     print(f"\nTotal elapsed time: {et():.1f}")
43 
44 if __name__ == "__main__":
45     main()

Here’s what’s different in the code above:

  • Line 1 imports the time module to give the program access to time.sleep().
  • Line 10 changes task() to include a time.sleep(delay) to mimic an IO delay. This replaces the for loop that did the counting in the earlier examples.

When you run this program, you’ll see the following output:

Task One running
Task One total elapsed time: 15.0
Task Two running
Task Two total elapsed time: 10.0
Task One running
Task One total elapsed time: 5.0
Task Two running
Task Two total elapsed time: 2.0

Total elapsed time: 32.01021909713745

As before, both Task One and Task Two are running, consuming work from the queue and processing it. However, even with the addition of the delay, you can see that cooperative concurrency hasn’t gotten you anything. The delay stops the processing of the entire program, and the CPU just waits for the IO delay to be over.

This is exactly what’s meant by blocking code in Python async documentation. You’ll notice that the time it takes to run the entire program is just the cumulative time of all the delays. Running tasks this way is not a win.

Cooperative Concurrency With Non-Blocking Calls

The next version of the program has been modified quite a bit. It makes use of Python async features, using the async/await syntax and the asyncio module provided in Python 3.

The time and queue modules have been replaced with the asyncio package. This gives your program access to asynchronous-friendly (non-blocking) sleep and queue functionality. The change to task() defines it as asynchronous with the addition of the async prefix on line 4. This indicates to Python that the function will be asynchronous.

The other big change is removing the time.sleep(delay) and yield statements, and replacing them with await asyncio.sleep(delay). This creates a non-blocking delay that will perform a context switch back to the caller main().

The while loop inside main() no longer exists. Instead of iterating over a list of tasks, main() makes a call to await asyncio.gather(...). This tells asyncio two things:

  1. Create two tasks based on task() and start running them.
  2. Wait for both of these to be completed before moving forward.

The last line of the program, asyncio.run(main()), runs main(). This creates what’s known as an event loop. It’s this loop that will run main(), which in turn will run the two instances of task().

The event loop is at the heart of the Python async system. It runs all the code, including main(). When task code is executing, the CPU is busy doing work. When the await keyword is reached, a context switch occurs, and control passes back to the event loop. The event loop looks at all the tasks waiting for an event (in this case, an asyncio.sleep(delay) timeout) and passes control to a task with an event that’s ready.

await asyncio.sleep(delay) is non-blocking with regard to the CPU. Instead of waiting for the delay to time out, the CPU registers a sleep event on the event loop task queue and performs a context switch by passing control to the event loop. The event loop continuously looks for completed events and passes control back to the task waiting for that event. In this way, the CPU can stay busy if work is available, while the event loop monitors the events that will happen in the future.

Note: An asynchronous program runs in a single thread of execution. The context switch from one section of code to another that would affect data is completely in your control. This means you can atomize and complete all shared memory data access before making a context switch. This simplifies the shared memory problem inherent in threaded code.

The example_4.py code is listed below:

 1 import asyncio
 2 from lib.elapsed_time import ET
 3 
 4 async def task(name, work_queue):
 5     while not work_queue.empty():
 6         delay = await work_queue.get()
 7         et = ET()
 8         print(f"Task {name} running")
 9         await asyncio.sleep(delay)
10         print(f"Task {name} total elapsed time: {et():.1f}")
11 
12 async def main():
13     """
14     This is the main entry point for the program.
15     """
16     # Create the queue of 'work'
17     work_queue = asyncio.Queue()
18 
19     # Put some 'work' in the queue
20     for work in [15, 10, 5, 2]:
21         await work_queue.put(work)
22 
23     # Run the tasks
24     et = ET()
25     await asyncio.gather(
26         asyncio.create_task(task("One", work_queue)),
27         asyncio.create_task(task("Two", work_queue)),
28     )
29     print(f"\nTotal elapsed time: {et():.1f}")
30 
31 if __name__ == "__main__":
32     asyncio.run(main())

Here’s what’s different between this program and example_3.py:

  • Line 1 imports asyncio to gain access to Python async functionality. This replaces the time import.
  • Line 4 shows the addition of the async keyword in front of the task() definition. This informs the program that task can run asynchronously.
  • Line 9 replaces time.sleep(delay) with the non-blocking asyncio.sleep(delay), which also yields control (or switches contexts) back to the main event loop.
  • Line 17 creates the non-blocking asynchronous work_queue.
  • Lines 20 to 21 put work into work_queue in an asynchronous manner using the await keyword.
  • Lines 25 to 28 create the two tasks and gather them together, so the program will wait for both tasks to complete.
  • Line 32 starts the program running asynchronously. It also starts the internal event loop.

When you look at the output of this program, notice how both Task One and Task Two start at the same time, then wait at the mock IO call:

Task One running
Task Two running
Task Two total elapsed time: 10.0
Task Two running
Task One total elapsed time: 15.0
Task One running
Task Two total elapsed time: 5.0
Task One total elapsed time: 2.0

Total elapsed time: 17.0

This indicates that await asyncio.sleep(delay) is non-blocking, and that other work is being done.

At the end of the program, you’ll notice the total elapsed time is essentially half the time it took for example_3.py to run. That’s the advantage of a program that uses Python async features! Each task was able to run await asyncio.sleep(delay) at the same time. The total execution time of the program is now less than the sum of its parts. You’ve broken away from the synchronous model!

Synchronous (Blocking) HTTP Calls

The next version of the program is kind of a step forward as well as a step back. The program is doing some actual work with real IO by making HTTP requests to a list of URLs and getting the page contents. However, it’s doing so in a blocking (synchronous) manner.

The program has been modified to import the wonderful requests module to make the actual HTTP requests. Also, the queue now contains a list of URLs, rather than numbers. In addition, task() no longer increments a counter. Instead, requests gets the contents of a URL retrieved from the queue, and prints how long it took to do so.

The example_5.py code is listed below:

 1 import queue
 2 import requests
 3 from lib.elapsed_time import ET
 4 
 5 def task(name, work_queue):
 6     with requests.Session() as session:
 7         while not work_queue.empty():
 8             url = work_queue.get()
 9             print(f"Task {name} getting URL: {url}")
10             et = ET()
11             session.get(url)
12             print(f"Task {name} total elapsed time: {et():.1f}")
13             yield
14 
15 def main():
16     """
17     This is the main entry point for the program.
18     """
19     # Create the queue of 'work'
20     work_queue = queue.Queue()
21 
22     # Put some 'work' in the queue
23     for url in [
24         "http://google.com",
25         "http://yahoo.com",
26         "http://linkedin.com",
27         "http://apple.com",
28         "http://microsoft.com",
29         "http://facebook.com",
30         "http://twitter.com"
31     ]:
32         work_queue.put(url)
33 
34     tasks = [
35         task("One", work_queue),
36         task("Two", work_queue)
37     ]
38 
39     # Run the tasks
40     et = ET()
41     done = False
42     while not done:
43         for t in tasks:
44             try:
45                 next(t)
46             except StopIteration:
47                 tasks.remove(t)
48             if len(tasks) == 0:
49                 done = True
50 
51     print(f"\nTotal elapsed time: {et():.1f}")
52 
53 if __name__ == "__main__":
54     main()

Here’s what’s happening in this program:

  • Line 2 imports requests, which provides a convenient way to make HTTP calls.
  • Line 11 performs real IO rather than the simulated delay of example_3.py: it calls session.get(url), which returns the contents of the URL retrieved from work_queue.
  • Lines 23 to 32 put the list of URLs into work_queue.

When you run this program, you’ll see the following output:

Task One getting URL: http://google.com
Task One total elapsed time: 0.3
Task Two getting URL: http://yahoo.com
Task Two total elapsed time: 0.8
Task One getting URL: http://linkedin.com
Task One total elapsed time: 0.4
Task Two getting URL: http://apple.com
Task Two total elapsed time: 0.3
Task One getting URL: http://microsoft.com
Task One total elapsed time: 0.5
Task Two getting URL: http://facebook.com
Task Two total elapsed time: 0.5
Task One getting URL: http://twitter.com
Task One total elapsed time: 0.4

Total elapsed time: 3.2

Just like in earlier versions of the program, yield turns task() into a generator. It also performs a context switch that lets the other task instance run.

Each task gets a URL from the work queue, retrieves the contents of the page, and reports how long it took to get that content.

As before, yield allows both your tasks to run cooperatively. However, since this program is running synchronously, each session.get() call blocks the CPU until the page is retrieved. Note the total time it took to run the entire program at the end. This will be meaningful for the next example.

Asynchronous (Non-Blocking) HTTP Calls

This version of the program modifies the previous one to use Python async features. It also imports the aiohttp module, which is a library to make HTTP requests in an asynchronous fashion using asyncio.

The tasks here have been modified to remove the yield call, since the code that makes the HTTP GET call is no longer blocking. The await on that call performs the context switch back to the event loop instead.

The example_6.py program is listed below:

 1 import asyncio
 2 import aiohttp
 3 from lib.elapsed_time import ET
 4 
 5 async def task(name, work_queue):
 6     async with aiohttp.ClientSession() as session:
 7         while not work_queue.empty():
 8             url = await work_queue.get()
 9             print(f"Task {name} getting URL: {url}")
10             et = ET()
11             async with session.get(url) as response:
12                 await response.text()
13             print(f"Task {name} total elapsed time: {et():.1f}")
14 
15 async def main():
16     """
17     This is the main entry point for the program.
18     """
19     # Create the queue of 'work'
20     work_queue = asyncio.Queue()
21 
22     # Put some 'work' in the queue
23     for url in [
24         "http://google.com",
25         "http://yahoo.com",
26         "http://linkedin.com",
27         "http://apple.com",
28         "http://microsoft.com",
29         "http://facebook.com",
30         "http://twitter.com",
31     ]:
32         await work_queue.put(url)
33 
34     # Run the tasks
35     et = ET()
36     await asyncio.gather(
37         asyncio.create_task(task("One", work_queue)),
38         asyncio.create_task(task("Two", work_queue)),
39     )
40     print(f"\nTotal elapsed time: {et():.1f}")
41 
42 if __name__ == "__main__":
43     asyncio.run(main())

Here’s what’s happening in this program:

  • Line 2 imports the aiohttp library, which provides an asynchronous way to make HTTP calls.
  • Line 5 marks task() as an asynchronous function.
  • Line 6 creates an aiohttp session context manager.
  • Line 11 creates an aiohttp response context manager. It also makes an HTTP GET call to the URL taken from work_queue.
  • Line 12 uses the response to get the text retrieved from the URL asynchronously.

When you run this program, you’ll see the following output:

Task One getting URL: http://google.com
Task Two getting URL: http://yahoo.com
Task One total elapsed time: 0.3
Task One getting URL: http://linkedin.com
Task One total elapsed time: 0.3
Task One getting URL: http://apple.com
Task One total elapsed time: 0.3
Task One getting URL: http://microsoft.com
Task Two total elapsed time: 0.9
Task Two getting URL: http://facebook.com
Task Two total elapsed time: 0.4
Task Two getting URL: http://twitter.com
Task One total elapsed time: 0.5
Task Two total elapsed time: 0.3

Total elapsed time: 1.7

Take a look at the total elapsed time, as well as the individual times to get the contents of each URL. You’ll see that the duration is about half the cumulative time of all the HTTP GET calls. This is because the HTTP GET calls are running asynchronously. In other words, you’re effectively taking better advantage of the CPU by allowing it to make multiple requests at once.

Because the CPU is so fast, this example could likely create as many tasks as there are URLs. In this case, the program’s run time would be that of the single slowest URL retrieval.
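
Here's a minimal sketch of that one-task-per-URL idea (fetch() and the shortened URL list are illustrative, not part of example_6.py):

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        await response.text()

async def main():
    urls = ["http://google.com", "http://yahoo.com", "http://apple.com"]
    async with aiohttp.ClientSession() as session:
        # One task per URL: the total runtime approaches the time of
        # the single slowest retrieval.
        await asyncio.gather(*(fetch(session, url) for url in urls))

asyncio.run(main())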

Conclusion

This article has given you the tools you need to start making asynchronous programming techniques a part of your repertoire. Using Python async features gives you programmatic control of when context switches take place. This means that many of the tougher issues you might see in threaded programming are easier to deal with.

Asynchronous programming is a powerful tool, but it isn’t useful for every kind of program. If you’re writing a program that calculates pi to the millionth decimal place, for instance, then asynchronous code won’t help you. That kind of program is CPU bound, without much IO. However, if you’re trying to implement a server or a program that performs IO (like file or network access), then using Python async features could make a huge difference.

To sum it up, you’ve learned:

  • What synchronous programs are
  • How asynchronous programs are different, but also powerful and manageable
  • Why you might want to write asynchronous programs
  • How to use the built-in async features in Python

You can get the code for all of the example programs used in this tutorial:

Download Code: Click here to download the code you'll use to learn about async features in Python in this tutorial.

Now that you’re equipped with these powerful skills, you can take your programs to the next level!



tryexceptpass: Command Execution Tricks with Subprocess - Designing CI/CD Systems


The most crucial step in any continuous integration process is the one that executes build instructions and tests their output. There’s an infinite number of ways to implement this step ranging from a simple shell script to a complex task system.

Keeping with the principles of simplicity and practicality, today we’ll look at continuing the series on Designing CI/CD Systems with our implementation of the execution script.

Vinta Software: DjangoCon US 2019: Python & Django in San Diego!

We are back in San Diego! Our team will be joining DjangoCon US, one of the biggest Django events in the world. This year, we'll be giving two talks: "Pull Requests: Merging good practices into your project" and "Building effective Django queries with expressions". Here is the slide deck from the talk we gave during the conference: Pull Re

Mike Driscoll: Python Code Kata: Fizzbuzz


A code kata is a fun way for computer programmers to practice coding. They are also used a lot for learning how to implement Test Driven Development (TDD) when writing code. One of the popular programming katas is called FizzBuzz. This is also a popular interview question for computer programmers.

The concept behind FizzBuzz is as follows:

  • Write a program that prints the numbers 1-100, each on a new line
  • For each number that is a multiple of 3, print “Fizz” instead of the number
  • For each number that is a multiple of 5, print “Buzz” instead of the number
  • For each number that is a multiple of both 3 and 5, print “FizzBuzz” instead of the number

Now that you know what you need to write, you can get started!


Creating a Workspace

The first step is to create a workspace or project folder on your machine. For example, you could create a katas folder with a fizzbuzz inside of it.

The next step is to install a source control program. One of the most popular is Git, but you could use something else like Mercurial. For the purposes of this tutorial, you will be using Git. You can get it from the Git website.

Now open up a terminal or run cmd.exe if you are a Windows user. Then navigate in the terminal to your fizzbuzz folder. You can use the cd command to do that. Once you are inside the folder, run the following command:


git init

This will initialize the fizzbuzz folder into a Git repository. Any files or folders that you add inside the fizzbuzz folder can now be added to Git and versioned.


The Fizz Test

To keep things simple, you can create your test file inside of the fizzbuzz folder. A lot of people will save their tests in a sub-folder called test or tests and tell their test runner to add the top-level folder to sys.path so that the tests can import it.

Note: If you need to brush up on how to use Python’s unittest library, then you might find Python 3 Testing: An Intro to unittest helpful.

Go ahead and create a file called test_fizzbuzz.py inside your fizzbuzz folder.

Now enter the following into your Python file:

import fizzbuzz
import unittest


class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')


if __name__ == '__main__':
    unittest.main()

Python comes with the unittest library builtin. To use it, all you need to do is import it and subclass unittest.TestCase. Then you can create a series of functions that represent the tests that you want to run.

Note that you also import the fizzbuzz module. You haven’t created that module yet, so you will receive a ModuleNotFoundError when you run this test code. You could create this file without even adding any code other than the imports and have a failing test. But for completeness, you go ahead and assert that fizzbuzz.process(6) returns the correct string.

The fix is to create an empty fizzbuzz.py file. This will only fix the ModuleNotFoundError, but it will allow you to run the test and see its output now.

You can run your test by doing this:


python test_fizzbuzz.py

The output will look something like this:


ERROR: test_multiple_of_three (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/michael/Dropbox/code/fizzbuzz/test_fizzbuzz.py", line 7, in test_multiple_of_three
self.assertEqual(fizzbuzz.process(6), 'Fizz')
AttributeError: module 'fizzbuzz' has no attribute 'process'

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

So this tells you that your fizzbuzz module is missing an attribute called process.

You can fix that by adding a process() function to your fizzbuzz.py file:

def process(number):
    if number % 3 == 0:
        return 'Fizz'

This function accepts a number and uses the modulus operator to divide the number by 3 and check to see if there is a remainder. If there is no remainder, then you know that the number is divisible by 3 so you can return the string “Fizz”.

Now when you run the test, the output should look like this:


.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

The period on the first line above means that you ran one test and it passed.

Let’s take a quick step back here. When a test is failing, it is considered to be in a “red” state. When a test is passing, that is a “green” state. This refers to the Test Driven Development (TDD) mantra of red/green/refactor. Most developers will start a new project by creating a failing test (red). Then they will write the code to make the test pass, usually in the simplest way possible (green).

When your tests are green, that is a good time to commit your test and the code change(s). This allows you to have a working piece of code that you can rollback to. Now you can write a new test or refactor the code to make it better without worrying that you will lose your work because now you have an easy way to roll back to a previous version of the code.

To commit your code, you can do the following:


git add fizzbuzz.py test_fizzbuzz.py
git commit -m "First commit"

The first command will add the two new files. You don’t need to commit *.pyc files, just the Python files. There is a handy file called .gitignore that you can add to your Git repository that you may use to exclude certain file types or folders, such as *.pyc. GitHub has some default gitignore files for various languages that you can look at if you’d like to see an example.
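
For a project like this one, a minimal .gitignore might contain something like the following (these entries are a suggestion, not part of the tutorial's files):

# Byte-compiled files
*.pyc
__pycache__/

# Virtual environment
venv/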

The second command is how you can commit the code to your local repository. The “-m” is for message followed by a descriptive message about the changes that you’re committing. If you would like to save your changes to Github as well (which is great for backup purposes), you should check out this article.

Now we are ready to write another test!


The Buzz Test

The second test that you can write can be for multiples of five. To add a new test, you can create another method in the TestFizzBuzz class:

import fizzbuzz
import unittest


class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')

    def test_multiple_of_five(self):
        self.assertEqual(fizzbuzz.process(20), 'Buzz')


if __name__ == '__main__':
    unittest.main()

This time around, you want to use a number that is only divisible by 5. When you call fizzbuzz.process(), you should get “Buzz” returned. When you run the test though, you will receive this:


F.
======================================================================
FAIL: test_multiple_of_five (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_fizzbuzz.py", line 10, in test_multiple_of_five
self.assertEqual(fizzbuzz.process(20), 'Buzz')
AssertionError: None != 'Buzz'

----------------------------------------------------------------------
Ran 2 tests in 0.000s

FAILED (failures=1)

Oops! Right now your code only uses the modulus operator to check for a remainder after dividing by 3. Since 20 divided by 3 leaves a remainder, that branch won’t run. The default return value of a function is None, which is why you end up with the failure above.

Go ahead and update the process() function to be the following:

def process(number):
    if number % 3 == 0:
        return 'Fizz'
    elif number % 5 == 0:
        return 'Buzz'

Now you can check for remainders with both 3 and 5. When you run the tests this time, the output should look like this:


..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Yay! Your tests passed and are now green! That means you can commit these changes to your Git repository.

Now you are ready to add a test for FizzBuzz!


The FizzBuzz Test

The next test that you can write will be for when you want to get “FizzBuzz” back. As you may recall, you will get FizzBuzz whenever the number is divisible by 3 and 5. Go ahead and add a third test that does just that:

import fizzbuzz
import unittest


class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')

    def test_multiple_of_five(self):
        self.assertEqual(fizzbuzz.process(20), 'Buzz')

    def test_fizzbuzz(self):
        self.assertEqual(fizzbuzz.process(15), 'FizzBuzz')


if __name__ == '__main__':
    unittest.main()

For this test, test_fizzbuzz, you ask your program to process the number 15. This shouldn’t work right yet, but go ahead and run the test code to check:


F..
======================================================================
FAIL: test_fizzbuzz (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_fizzbuzz.py", line 13, in test_fizzbuzz
self.assertEqual(fizzbuzz.process(15), 'FizzBuzz')
AssertionError: 'Fizz' != 'FizzBuzz'

----------------------------------------------------------------------
Ran 3 tests in 0.000s

FAILED (failures=1)

Three tests were run with one failure. You are now back to red. This time the error is ‘Fizz’ != ‘FizzBuzz’ instead of comparing None to FizzBuzz. The reason is that your code checks whether 15 is divisible by 3 first, and since it is, it returns “Fizz” before the other checks can run.

Since that isn’t what you want to happen, you will need to update your code to check if the number is divisible by 3 and 5 before checking for just 3:

def process(number):
    if number % 3 == 0 and number % 5 == 0:
        return 'FizzBuzz'
    elif number % 3 == 0:
        return 'Fizz'
    elif number % 5 == 0:
        return 'Buzz'

Here you do the divisibility check for 3 and 5 first. Then you check for the other two as before.

Now if you run your tests, you should get the following output:


...
----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK

So far, so good. However, the code doesn’t yet return numbers that aren’t divisible by 3 or 5. Time for another test!


The Final Test

The last thing your code needs to do is return the number itself when it isn’t evenly divisible by either 3 or 5. Let’s test it with a couple of different values:

import fizzbuzz
import unittest


class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')

    def test_multiple_of_five(self):
        self.assertEqual(fizzbuzz.process(20), 'Buzz')

    def test_fizzbuzz(self):
        self.assertEqual(fizzbuzz.process(15), 'FizzBuzz')

    def test_regular_numbers(self):
        self.assertEqual(fizzbuzz.process(2), 2)
        self.assertEqual(fizzbuzz.process(98), 98)


if __name__ == '__main__':
    unittest.main()

For this test, you test normal numbers 2 and 98 with the test_regular_numbers() test. These numbers will always have a remainder when divided by 3 or 5, so they should just be returned.

When you run the tests now, you should get something like this:


...F
======================================================================
FAIL: test_regular_numbers (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_fizzbuzz.py", line 16, in test_regular_numbers
self.assertEqual(fizzbuzz.process(2), 2)
AssertionError: None != 2

----------------------------------------------------------------------
Ran 4 tests in 0.000s

FAILED (failures=1)

This time you are back to comparing None to the number, which is what you probably suspected would be the output.

Go ahead and update the process() function as follows:

def process(number):
    if number % 3 == 0 and number % 5 == 0:
        return 'FizzBuzz'
    elif number % 3 == 0:
        return 'Fizz'
    elif number % 5 == 0:
        return 'Buzz'
    else:
        return number

That was easy! All you needed to do at this point was add an else statement that returns the number.

Now when you run the tests, they should all pass:


....
----------------------------------------------------------------------
Ran 4 tests in 0.000s

OK

Good job! Now your code works. You can verify that it works for all the numbers, 1-100, by adding the following to your fizzbuzz.py module:

if __name__ == '__main__':
    for i in range(1, 101):
        print(process(i))

Now when you run fizzbuzz yourself using python fizzbuzz.py, you should see the appropriate output that was specified at the beginning of this tutorial.

This is a good time to commit your code and push it to the cloud.


Wrapping Up

Now you know the basics of using Test Driven Development to drive you to solve a coding kata. Python’s unittest module has many more types of asserts and functionality than is covered in this brief tutorial. You could also modify this tutorial to use pytest, another popular 3rd party Python package that you can use in place of Python’s own unittest module.
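
For example, a minimal sketch of the same suite rewritten for pytest might look like this (assuming the same fizzbuzz module; pytest collects plain test_-prefixed functions and lets you use bare assert statements instead of assertEqual()):

import fizzbuzz

def test_multiple_of_three():
    assert fizzbuzz.process(6) == 'Fizz'

def test_multiple_of_five():
    assert fizzbuzz.process(20) == 'Buzz'

def test_fizzbuzz():
    assert fizzbuzz.process(15) == 'FizzBuzz'

def test_regular_numbers():
    assert fizzbuzz.process(2) == 2
    assert fizzbuzz.process(98) == 98

You would run this with pytest test_fizzbuzz.py rather than python test_fizzbuzz.py.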

The nice thing about having these tests is that now you can refactor your code and verify you didn’t break anything by running the tests. This also allows you to add new features more easily without breaking existing features. Just be sure to add more tests as you add more features.


Related Reading

The post Python Code Kata: Fizzbuzz appeared first on The Mouse Vs. The Python.

Audrey Roy Greenfeld: Voronoi Mandalas

SciPy has tools for creating Voronoi tessellations. Besides the obvious data science applications, you can use them to make pretty art like this:
The above was generated with SciPy’s Voronoi tools and matplotlib, starting from the mandalapy code described below. A minimal sketch in that spirit follows the next paragraph; its point layout and parameters are illustrative guesses rather than the original’s.
I started with Carlos Focil's mandalapy code, modifying the parameters until I had a design I liked. I decided to make the Voronoi diagram show both points and vertices, and I gave it an equal aspect ratio. Carlos' mandalapy code is a port of Antonio Sánchez Chinchón's inspiring work drawing mandalas with R, using the deldir library to plot Voronoi tesselations.
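
Here is the sketch mentioned above, assuming only SciPy and matplotlib. The ring-based point layout is a made-up stand-in for mandalapy's parameters:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# Hypothetical mandala-like layout: several rings of points, each ring
# rotated a little more than the last, to get radial symmetry.
n_rings, per_ring = 8, 24
angles = np.linspace(0, 2 * np.pi, per_ring, endpoint=False)
points = np.concatenate([
    np.column_stack((r * np.cos(angles + r), r * np.sin(angles + r)))
    for r in range(1, n_rings + 1)
])

vor = Voronoi(points)
fig = voronoi_plot_2d(vor, show_points=True, show_vertices=True)
fig.gca().set_aspect('equal')  # equal aspect ratio, as described above
plt.show()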

Dataquest: xlwings Tutorial: Make Excel Faster Using Python


EuroPython: EuroPython 2019 - Videos for Thursday available


With a slight delay, we are pleased to announce the second batch of edited videos from EuroPython 2019 in Basel, Switzerland, with another 45 videos.


EuroPython 2019 YouTube Channel

In this batch, we have included all videos for Thursday, July 11 2019, the second conference day.

In the coming week we will publish videos for the final conference day. In total, we will have more than 130 videos available for you to watch.

All EuroPython videos, including the ones from previous conferences, are available on our EuroPython YouTube Channel.

Enjoy,

EuroPython 2019 Team
https://ep2019.europython.eu/
https://www.europython-society.org/

Wingware Blog: Debugging Python Code Running in Docker Containers with Wing 7


Docker is a containerization system that uses a relatively light-weight form of virtualization to package and isolate application components from the host system, making it easier to spin up uniformly configured virtual machines for use in application development, testing, and deployment.

Wing 7 can be used to develop and debug Python code running inside of Docker containers. This is accomplished by setting up a mapping of local (host-side) directories into the container, and then configuring Wing so it can accept debug connections from the container.

Prerequisites

Before you can work with Docker you will need to download and install it.

On Windows and macOS, downloading Docker Desktop from the Docker website is the easiest way to install it. Be sure to launch Docker Desktop after you install it, so the daemon is started.

On most Linux distributions, Docker CE (the free community edition) can be installed with the docker-engine package as described here.

You should also install Wing Pro if you don't already have it.

Create a Working Example

Next, set up a small real-world example by creating a directory docker and placing the following files into it.

Dockerfile:

FROM python:3.7
WORKDIR /app
RUN pip install --trusted-host pypi.python.org Flask
EXPOSE 80
CMD ["python", "app.py"]

app.py:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "<h3>Hello World!</h3>Your app is working.<br/><br/>"

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80, use_reloader=True)

Then build the Docker container by typing the following in the docker directory:

docker build --tag=myapp .

You can now run your container like this:

docker run -v "/path/to/docker":/app -p 4000:80 myapp

You will need to substitute /path/to/docker with the path to the docker directory you created above; the quotes make it work if the path has spaces in it.

You can now try this tiny Flask web app by pointing a browser running on your host system at it:

If you are using Docker Desktop, then use http://localhost:4000/

If you are using Docker CE, you will need to determine the IP address of your container and use that instead of localhost. One way to do this is to type docker ps to find the Container ID for your container and then use it in the following command in place of c052478b0f8a:

docker inspect -f "{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}" c052478b0f8a

Notice that if you make a change to app.py in Wing, then the change will be reflected in your browser when you reload the page. This is due to using both the -v argument for docker run to mount a volume in the container, and the fact that app.run() for Flask is being passed use_reloader=True.

Configure Debugging

In order to debug app.py in Wing, you will need to copy in and configure some additional files that allow invocation of Wing's debugger and connection to the IDE.

(1) Install the debugger

To access Wing's debugger on the container, add another -v mapping to your docker run command line, so the Wing installation on the host is made available to the container. For example on Windows:

docker run -v "C:/Program Files (x86)/Wing Pro 7.1":/wingpro7 -v C:/Users/test/docker:/app myapp

Or on Linux:

docker run -v /usr/lib/wingpro7:/wingpro7 -v /home/test/docker:/app myapp

Or for macOS:

docker run -v /Applications/WingPro.app/Contents/Resources:/wingpro7 -v /Users/test/docker:/app myapp

You will need to substitute the correct installation location for Wing on your host, which can be seen in Wing's About box, and the full path to the docker directory you created earlier.

Mapping the Wing installation across OSes (for example from a Windows host to a Linux container) works because Wing's installation contains all the files necessary for debugging on every supported OS.

(2) Copy and configure wingdbstub.py

Debugging is initiated on the Docker side by importing Wing's wingdbstub module. To use this, copy wingdbstub.py from your Wing installation to your mapped directory on the host. For example on a Windows host:

copy "C:/Program Files (x86)/Wing Pro 7.1/wingdbstub.py" C:/Users/test/docker

Or on a Linux host:

cp /usr/lib/wingpro7/wingdbstub.py /home/test/docker

Or a macOS host:

cp /Applications/WingPro.app/Contents/Resources/wingdbstub.py /Users/test/docker

After copying, you will need to edit the file to change kWingHostPort from localhost:50005 to a value that uses the IP address or name of the host computer, for example if your host's IP address is 192.168.1.50:

kWingHostPort='192.168.1.50:50005'

You will also need to set WINGHOME to the location where you have mapped your Wing installation on the container:

WINGHOME='/wingpro7'

(3) Enable access

Next you need to copy the authentication token file wingdebugpw from the Settings Directory listed in Wing's About box to the same directory as your copy of wingdbstub.py, in this case the docker directory on the host system.

Then add to the Debugger > Advanced > Allowed Hosts preference either the host's IP address (if using Docker Desktop) or the container's IP address determined with docker inspect as described above (if using Docker CE). The host IP is used for Docker Desktop because of how it configures networking for containers; there is access from container to host but no access in the other direction, so the host thinks it is receiving a connection from one of its own network interfaces.

You will also need to tell Wing to listen for debug connections initiated from the outside by clicking on the bug icon in the lower left of Wing's window and enabling Accept Debug Connections.

(4) Establish a file mapping

If the docker directory you mapped with the -v option for docker run does not appear on the same path on the host and container, then you will need to communicate the mapping to Wing as well, with the Debugger > Advanced > Location Map preference.

For the docker run example above and a container IP address of 172.17.0.2, you would add an entry as follows:

Remote IP Address: 172.17.0.2
File Mappings:
    🔘 Specify Mapping
    Remote: /app
    Local: C:/Users/test/docker

This step could be skipped entirely if the location of files on the container and the host are the same (for example using /app also on the host instead of creating a directory named docker).

Also, if using Docker Desktop where the container IP is the same as the host's, it is important to choose a location for the container side of the mapping that either (a) does not exist on the host, or (b) is the same as the location on the host. If the directory exists on the host but has different Python files in it, the Location Map will be incorrectly applied to them if you try to debug them.

(5) Initiate debug

Once these steps are complete, you can initiate debug from Python code running in the container by importing the module wingdbstub.py as follows:

import wingdbstub

This can be added as the first line of app.py. After saving the file, Flask should auto-reload it, which will initiate debug and connect to the IDE so that the bug icon in the lower left of Wing's Window turns green and the toolbar changes to its debug configuration. The application keeps running until it reaches a breakpoint or exception.
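
With that change in place, the example app.py from earlier begins like this:

import wingdbstub  # initiates debug and connects to the IDE

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "<h3>Hello World!</h3>Your app is working.<br/><br/>"

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80, use_reloader=True)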

Next, set a breakpoint by clicking in the leftmost margin to the left of the return statement in app.py, and then trigger it by reloading the page in your browser. Now you can use Wing to step through and inspect the data used by the debug process.

To learn more about Wing's debugger, take a look at the Tutorial in Wing's Help menu or the Debugging Code section of the Quick Start Guide.

Trouble-Shooting

If your configuration does not work, try setting kLogFile in your copy of wingdbstub.py to see whether the debugger is reporting errors. Also, looking at the end of ide.log in the Settings Directory listed in Wing's About box may reveal why a connection is failing, if it is being refused by the IDE.

Setting kExitOnFailure in your copy of wingdbstub.py is another way to see why debug or the connection to the IDE is failing. In this case, when you restart the container it will fail to start and print a message indicating the error encountered while importing wingdbstub.

If the debug connection is established but breakpoints are not reached, the Location Map preference is likely incorrect. One way to diagnose this is to add assert 0 to your code. Wing will always stop on that and will report the file it thinks it should be opening in the Exceptions tool.

And, as always, don't hesitate to email support@wingware.com for help.

Notes

Docker CE (but not Docker Desktop) is sometimes used to host a more complete installation of Linux, acting more like a stand-alone system that includes the ability to ssh from the host system into the container. In this case, Wing Pro's Remote Development capability can be used, with much less manual configuration, to debug code running under Docker. For more information, see Remote Python Development (if the debug process can be launched from the IDE) or Remote Web Development (if the debug process is launched from outside of the IDE).



That's it for now! We'll be back soon with more Wing Tips for Wing Python IDE.

Ned Batchelder: Coverage.py 5.0a7, and the future of pytest-cov


Progress continues in the Python coverage world. Two recent things: first, the latest alpha of Coverage.py 5.0 is available: 5.0a7. Second, pytest-cov is supporting coverage.py 5.0, and we’re talking about the future of pytest-cov.

There are two big changes in Coverage.py 5.0a7. First, there is a new reporting command: coverage json produces a JSON file with information similar to the XML report. In coverage.py 4.x, the data storage was a lightly cloaked JSON file. That file was not in a supported format, and in fact, it is gone in 5.0. This command produces a supported JSON format for people who want programmatic access to details of the coverage data. A huge thanks to Matt Bachmann for implementing it.
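
If you want to poke at the new report programmatically, a rough sketch might look like the following. The coverage.json file name is the command's default output; the "files" and "summary" keys are my assumptions about the schema, so check the report your version actually produces:

import json
import subprocess

# Produce the JSON report, then load it for ad-hoc analysis.
subprocess.run(["coverage", "json"], check=True)

with open("coverage.json") as f:
    report = json.load(f)

# Assumed layout: a top-level "files" mapping with per-file summaries.
for filename, data in report.get("files", {}).items():
    summary = data.get("summary", {})
    print(f"{filename}: {summary.get('percent_covered')}% covered")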

The second big change is to the SQL schema in the 5.x data file, which is a SQLite database. Previously, each line measured produced a row in the “line” table. But this proved too bulky for large projects. Now line numbers are stored in a compact binary form. There is just one row in the “line_bits” table for each file and context measured. This makes it more difficult to use the data with ad-hoc queries. Coverage provides functions for working with the line number bitmaps, but I’m interested in other ideas about how to make the data more usable.
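
As one sketch of what ad-hoc access could look like, here is a way to decode the bitmaps straight from the SQLite data file. The coverage.numbits helper and the table and column names below reflect my reading of the 5.0 alphas and may change before release, so verify them against your version:

import sqlite3

# numbits_to_nums() expands a packed bitmap into a list of line numbers.
from coverage.numbits import numbits_to_nums  # assumed helper location

conn = sqlite3.connect(".coverage")  # the 5.x data file is a SQLite database
rows = conn.execute(
    "select file.path, line_bits.numbits"
    " from line_bits join file on line_bits.file_id = file.id"
)
for path, numbits in rows:
    print(path, numbits_to_nums(numbits))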

The pytest-cov changes are to support coverage.py 5.0. Those changes are already on the master branch.

I’m also working on a pull request to add a --cov-contexts=test option so that pytest can announce when tests change, for accurate and detailed dynamic contexts.

Longer-term, I’d like to shrink the size of the pytest-cov plugin. Pytest should be about running tests, not reporting on coverage after the tests are run. Too much of the code, and too many of the bug reports, are due to it trying to take on more than it needs to. The command-line arguments are getting convoluted, for no good reason. I’ve written an issue to get feedback: Proposal: pytest-cov should do less. If you have opinions one way or the other, that would be a good place to talk about them.

Real Python: Thonny: The Beginner-Friendly Python Editor


Are you a Python beginner looking for a tool that can support your learning? This course is for you! Every programmer needs a place to write their code. This course will cover an awesome tool called Thonny that will enable you to start working with Python in a beginner-friendly environment.

In this course, you’ll learn:

  • How to install Thonny on your computer
  • How to navigate Thonny’s user interface to use its built-in features
  • How to use Thonny to write and run your code
  • How to use Thonny to debug your code

By the end of this course, you’ll be comfortable with the development workflow in Thonny and ready to use it for your Python learning. So what is Thonny? Great question!

Thonny is a free Python Integrated Development Environment (IDE) that was especially designed with the beginner Pythonista in mind. Specifically, it has a built-in debugger that can help when you run into nasty bugs, and it allows you to do step-through expression evaluation, among other really awesome features.



Jean-Paul Calderone: Tahoe-LAFS on Python 3 - Call for Porters

Hello Pythonistas,

Earlier this year a number of Tahoe-LAFS community members began an effort to port Tahoe-LAFS from Python 2 to Python 3.  Around five people are currently involved in a part-time capacity.  We wish to accelerate the effort to ensure a Python 3-compatible release of Tahoe-LAFS can be made before the end of upstream support for CPython 2.x.

Tahoe-LAFS is a Free and Open system for private, secure, decentralized storage.  It encrypts and distributes your data across multiple servers.  If some of the servers fail or are taken over by an attacker, the entire file store continues to function correctly, preserving your privacy and security.

Foolscap, a dependency of Tahoe-LAFS, is also being ported.  Foolscap is an object-capability-based RPC protocol with flexible serialization.

Some details of the porting effort are available in a milestone on the Tahoe-LAFS trac instance.

To help with this, we are hoping to find a person or people with significant prior Python 3 porting experience and, preferably, some familiarity with Twisted, though in general the Tahoe-LAFS project welcomes contributors of all backgrounds and skill levels.

We would prefer someone to start with us as soon as possible and no later than October 15th. If you are interested in this opportunity, please send us any questions you have, as well as details of your availability and any related work you have done previously (GitHub, LinkedIn links, etc). If you would like to find out more about this opportunity, please contact us at jessielisbetfrance at gmail (dot) com or on IRC in #tahoe-lafs on Freenode.

Python Software Foundation: The Python Software Foundation has updated its Code of Conduct

The Python community values members who are accepting, helpful, and respectful: for many years, the Python Software Foundation (PSF) has had an organization-wide Code of Conduct that defines these values, and behaviors that we want to have in our community. The Foundation has also insisted for years that every event that we sponsor have a Code of Conduct in place.

But spaces where our community meets – online, or in person – need a Code of Conduct that does more than just emphasize our values. The PSF’s flagship conference, PyCon US, has had its own Code of Conduct – separate from the PSF Code of Conduct – for many years. The PyCon US Code of Conduct not only highlights our community’s values, but it also identifies behaviors that are not acceptable at the conference, explains how to report violations, and includes enforcement procedures.

The PSF Board approved a new organization-wide Code of Conduct and enforcement guidelines at the August 2019 board meeting, and reporting guidelines at the September 2019 board meeting, taking effect immediately.

Our new Code of Conduct brings together the statement of values defined in the former PSF Code of Conduct, and enforcement guidelines – proven through our experience at PyCon US – that the PSF can now apply to every space that we oversee.

It saves the PSF from having to enforce two Codes of Conduct: one for PyCon US, and another for our other spaces. In crafting the Code of Conduct, we undertook an intentional effort to account for the unique needs of an international community that spans all seven continents on Earth.

Community members will now know that if they’re participating in an online space, a project, or an event facilitated by the PSF they will be subject to the same Code of Conduct, and will be able to report incidents in the same way.

The process of defining the new Code of Conduct was led by the PSF’s Conduct Working Group, which the PSF established in 2018. The PSF worked with Sage Sharp of Otter Tech to produce the draft of the new Code of Conduct. Sage has previously worked on the Codes of Conduct for Open Source communities including the Data Carpentries, Elastic Search, and GNOME, and previously worked with the PSF on modernizing PyCon US’ Code of Conduct and incident response procedures. 

In the future, the Conduct Working Group will help the Board oversee the reporting and enforcement of Code of Conduct reports, following the enforcement guidelines that accompany the new Code of Conduct.

The Board thanks the Conduct Working Group, and Sage Sharp for their invaluable service in getting our new Code of Conduct in place.

PyCoder’s Weekly: Issue #387 (Sept. 24, 2019)


#387 – SEPTEMBER 24, 2019
View in Browser »



Python Debugging With pdb

Learn the basics of using pdb, Python’s interactive source code debugger. pdb is a great tool for tracking down hard-to-find bugs, and it allows you to fix faulty code more quickly.
REAL PYTHON • video

How Do You Verify That PyPI Can Be Trusted?

“A co-worker of mine attended a technical talk about how Go’s module mirror works and he asked me whether there was something there that Python should do.”
BRETT CANNON

Automated Python Code Reviews, Directly From Your Git Workflow


Take the hassle out of code reviews - Codacy flags errors so you can fix them quickly. Address security concerns, duplication, complexity, drops in coverage, and style violations before you merge. Integrates seamlessly with GitHub, Bitbucket, and GitLab →
CODACY • sponsor

What’s in a Name? Tales of Python, Perl, and the GIMP

Fun read about the challenges of naming and renaming open-source projects.
SVEN GREGORI

PyCascades 2020 CFP

The PyCascades call for proposals is open until October 1st.
PAPERCALL.IO

Discussions

Python Jobs

Backend Developer (Kfar Saba, Israel)

3DSignals

Senior Backend Engineer (Remote)

Close

More Python Jobs >>>

Articles & Tutorials

Getting Started With Async Features in Python

Get the tools you need to start making asynchronous programming techniques a part of your repertoire. You’ll learn how to use Python async features to take advantage of IO processes and free up your CPU.
REAL PYTHON

Don’t Pickle Your Data (2014)

“Pretty much every Python programmer out there has broken down at one point and used the ‘pickle’ module for writing objects out to disk. […] However, using pickle is still a terrible idea that should be avoided whenever possible.”
BEN FREDERICKSON • opinion

Python Developers Are in Demand on Vettery


Vettery is an online hiring marketplace that’s changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today →
VETTERY • sponsor

Designing CI/CD Systems: Command Execution Tricks With subprocess

Use Python’s subprocess module to execute instructions inside a Docker container that builds and tests your code in an automated CI/CD system.
CRISTIAN MEDINA

Thonny: The Beginner-Friendly Python Editor

Learn all about Thonny, a free Python Integrated Development Environment (IDE) that was especially designed with the beginner Pythonista in mind. It has a built-in debugger and allows you to do step-through expression evaluation.
REAL PYTHON • video

Placing matplotlib Titles

Matplotlib titles have configurable locations. And you can have more than one at once… This short tutorial shows you how to use this feature.
RTWILSON.COM

Game of Thrones and Network Theory

Analyzing Game of Thrones data to identify character importance, factions and gender interactions using Network Theory and Python.
RABEEZ RIAZ • Shared by Nathan Piccini

Learn Functional Python in 10 Minutes

You’ll learn what the functional paradigm is as well as how to use the basics of functional programming in Python.
BRANDON SKERRITT

Simple Image Filters With OpenCV and Python

How to enhance your images with colored filters and add border backgrounds using Python and OpenCV.
APURVA MEHTA

“Level Up Your Python” Humble Bundle

Support Pythonic charities like the PSF and get books, software, and videos collectively valued at $867 for a pay-what-you-want price.
HUMBLEBUNDLE.COM • sponsor

Projects & Code

hypothesis-auto: Python Tests That Write Themselves

An extension for the Hypothesis project that enables fully automatic tests for type annotated functions.
TIMOTHYCROSLEY.GITHUB.IO

Python Grids: Python Package Comparison Grids

Comparison grids to help you find great Python packages for your projects.
PYTHONGRIDS.ORG

Events

DjangoCon US

September 22 to September 28, 2019
DJANGOCON.US

PyWeek 28

September 22 to September 30, 2019
PYWEEK.ORG

PyCon Estonia

October 3 to October 4, 2019
PYCON.EE

PyCon Balkan 2019

October 3 to October 6, 2019
PYCONBALKAN.COM


Happy Pythoning!
This was PyCoder’s Weekly Issue #387.
View in Browser »




eGenix.com: Python Meeting Düsseldorf - 2019-09-25


The following announces a regional Python user group meeting in Düsseldorf, Germany.

Announcement

The next Python Meeting Düsseldorf will take place on:

25.09.2019, 6:00 pm
Room 1, 2nd floor, Bürgerhaus Stadtteilzentrum Bilk
Düsseldorfer Arcaden, Bachstr. 145, 40217 Düsseldorf


Program

Talks registered so far:

Jochen Wersdorfer
        "Datasette"

Charlie Clark
        "Python for Android"

Andreas Bollig
        "Plotly Dash"

Further talks are welcome. If you are interested, please contact info@pyddf.de.

Start Time and Location

We meet at 6:00 pm at the Bürgerhaus in the Düsseldorfer Arcaden.

The Bürgerhaus shares its entrance with the swimming pool and is located next to the underground parking entrance of the Düsseldorfer Arcaden.

A large "Schwimm’ in Bilk" logo hangs above the entrance. Behind the door, turn left to the two elevators and ride up to the 2nd floor. The entrance to Room 1 is immediately on the left as you come out of the elevator.

>>> Entrance in Google Street View

Introduction

The Python Meeting Düsseldorf is a regular event in Düsseldorf aimed at Python enthusiasts from the region.

Our PyDDF YouTube channel, where we publish videos of the talks after each meeting, provides a good overview of the presentations.

The meeting is organized by eGenix.com GmbH, Langenfeld, in cooperation with Clark Consulting & Research, Düsseldorf.

Program

The Python Meeting Düsseldorf uses a mix of (lightning) talks and open discussion.

Talks can be registered in advance or brought in spontaneously during the meeting. A projector with XGA resolution is available.

To register a (lightning) talk, simply send an informal email to info@pyddf.de.

Cost Sharing

The Python Meeting Düsseldorf is organized by Python users for Python users.

Since the meeting room, projector, internet access, and drinks incur costs, we ask attendees for a contribution of EUR 10.00 incl. 19% VAT. Pupils and students pay EUR 5.00 incl. 19% VAT.

We kindly ask all attendees to bring the amount in cash.

Registration

Since we only have seats for about 20 people, we ask that you register by email. This does not create any obligation, but it makes planning easier for us.

To register for the meeting, please use Meetup or send an informal email to info@pyddf.de.

Further Information

More information is available on the meeting website:

              http://pyddf.de/

Have fun!

Marc-Andre Lemburg, eGenix.com

PyCharm: PyCharm 2019.2.3


PyCharm 2019.2.3 is now available!

Fixed in this Version

  • We solved an issue that was causing an error when invoking string literals.
  • Regression errors were fixed in the debugger:
    • Programs that use multiprocessing could not be debugged; this has been fixed.
    • We fixed the issue that caused uncaught exceptions not to show traceback data.
    • Using breakpoints while executing code in debug mode caused inconsistent execution behavior and errors; this has been fixed.
  • The problem that caused the Python interpreter not to be properly set up when downloaded from the Windows Store was fixed.

Further Improvements

  • The SVN performance was improved to avoid unnecessary processing of SVN operations’ results. Projects with a large number of files under this version control system will now see improved processing times.
  • Some other platform issues were solved as well and much more, check out our release notes for more details.

Getting the New Version

You can update PyCharm by choosing Help | Check for Updates (or PyCharm | Check for Updates on macOS) in the IDE. PyCharm will be able to patch itself to the new version, so there should no longer be a need to run the full installer.

If you’re on Ubuntu 16.04 or later, or any other Linux distribution that supports snap, you should not need to upgrade manually, you’ll automatically receive the new version.

Real Python: How to Use Generators and yield in Python


Have you ever had to work with a dataset so large that it overwhelmed your machine’s memory? Or maybe you have a complex function that needs to maintain an internal state every time it’s called, but the function is too small to justify creating its own class. In these cases and more, generators and the Python yield statement are here to help.

By the end of this article, you’ll know:

  • What generators are and how to use them
  • How to create generator functions and expressions
  • How the Python yield statement works
  • How to use multiple Python yield statements in a generator function
  • How to use advanced generator methods
  • How to build data pipelines with multiple generators

If you’re a beginner or intermediate Pythonista and you’re interested in learning how to work with large datasets in a more Pythonic fashion, then this is the tutorial for you.

You can get a copy of the dataset used in this tutorial by clicking the link below:

Download Dataset: Click here to download the dataset you'll use in this tutorial to learn about generators and yield in Python.

Using Generators

Introduced with PEP 255, generator functions are a special kind of function that return a lazy iterator. These are objects that you can loop over like a list. However, unlike lists, lazy iterators do not store their contents in memory. For an overview of iterators in Python, take a look at Python “for” Loops (Definite Iteration).

Now that you have a rough idea of what a generator does, you might wonder what they look like in action. Let’s take a look at two examples. In the first, you’ll see how generators work from a bird’s eye view. Then, you’ll zoom in and examine each example more thoroughly.

Example 1: Reading Large Files

A common use case of generators is to work with data streams or large files, like CSV files. These text files separate data into columns by using commas. This format is a common way to share data. Now, what if you want to count the number of rows in a CSV file? The code block below shows one way of counting those rows:

csv_gen = csv_reader("some_csv.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")

Looking at this example, you might expect csv_gen to be a list. To populate this list, csv_reader() opens a file and loads its contents into csv_gen. Then, the program iterates over the list and increments row_count for each row.

This is a reasonable explanation, but would this design still work if the file is very large? What if the file is larger than the memory you have available? To answer this question, let’s assume that csv_reader() just opens the file and reads it into an array:

def csv_reader(file_name):
    file = open(file_name)
    result = file.read().split("\n")
    return result

This function opens a given file and uses file.read() along with .split() to add each line as a separate element to a list. If you were to use this version of csv_reader() in the row counting code block you saw further up, then you’d get the following output:

>>>
Traceback (most recent call last):
  File "ex1_naive.py", line 22, in <module>main()
  File "ex1_naive.py", line 13, in maincsv_gen=csv_reader("file.txt")
  File "ex1_naive.py", line 6, in csv_readerresult=file.read().split("\n")MemoryError

In this case, open() returns a file object that you can lazily iterate through line by line. However, file.read().split() loads everything into memory at once, causing the MemoryError.

Before that happens, you’ll probably notice your computer slow to a crawl. You might even need to kill the program with a KeyboardInterrupt. So, how can you handle these huge data files? Take a look at a new definition of csv_reader():

def csv_reader(file_name):
    for row in open(file_name, "r"):
        yield row

In this version, you open the file, iterate through it, and yield a row. This code should produce the following output, with no memory errors:

Row count is 64186394

What’s happening here? Well, you’ve essentially turned csv_reader() into a generator function. This version opens a file, loops through each line, and yields each row, instead of returning it.

You can also define a generator expression (also called a generator comprehension), which has a very similar syntax to list comprehensions. In this way, you can use the generator without calling a function:

csv_gen = (row for row in open(file_name))

This is a more succinct way to create the generator csv_gen. You’ll learn more about the Python yield statement soon. For now, just remember this key difference:

  • Using yield will result in a generator object.
  • Using return will result in the first line of the file only.

Example 2: Generating an Infinite Sequence

Let’s switch gears and look at infinite sequence generation. In Python, to get a finite sequence, you call range() and evaluate it in a list context:

>>>
>>> a = range(5)
>>> list(a)
[0, 1, 2, 3, 4]

Generating an infinite sequence, however, will require the use of a generator, since your computer memory is finite:

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

This code block is short and sweet. First, you initialize the variable num and start an infinite loop. Then, you immediately yield num so that you can capture the initial state. This mimics the action of range().

After yield, you increment num by 1. If you try this with a for loop, then you’ll see that it really does seem infinite:

>>>
>>> for i in infinite_sequence():
...     print(i, end=" ")
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39 40 41 42
[...]
6157818 6157819 6157820 6157821 6157822 6157823 6157824 6157825 6157826 6157827
6157828 6157829 6157830 6157831 6157832 6157833 6157834 6157835 6157836 6157837
6157838 6157839 6157840 6157841 6157842
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
KeyboardInterrupt

The program will continue to execute until you stop it manually.

Instead of using a for loop, you can also call next() on the generator object directly. This is especially useful for testing a generator in the console:

>>>
>>> gen = infinite_sequence()
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3

Here, you have a generator called gen, which you manually iterate over by repeatedly calling next(). This works as a great sanity check to make sure your generators are producing the output you expect.

Note: When you use next(), Python calls .__next__() on the object you pass in as a parameter. There are some special effects that this parameterization allows, but it goes beyond the scope of this article. Experiment with changing the parameter you pass to next() and see what happens!
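
One of those effects is that next() accepts an optional default, which is returned instead of raising StopIteration once the iterator is exhausted:

>>> gen = (x for x in range(2))
>>> next(gen, "done")
0
>>> next(gen, "done")
1
>>> next(gen, "done")
'done'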

Example 3: Detecting Palindromes

You can use infinite sequences in many ways, but one practical use for them is in building palindrome detectors. A palindrome detector will locate all sequences of letters or numbers that are palindromes. These are words or numbers that are read the same forward and backward, like 121. First, define your numeric palindrome detector:

def is_palindrome(num):
    # Skip single-digit inputs
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0

    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10

    if num == reversed_num:
        return num
    else:
        return False

Don’t worry too much about understanding the underlying math in this code. Just note that the function takes an input number, reverses it, and checks to see if the reversed number is the same as the original. Now you can use your infinite sequence generator to get a running list of all numeric palindromes:

>>>
>>> for i in infinite_sequence():
...     pal = is_palindrome(i)
...     if pal:
...         print(pal)
...
11
22
33
[...]
99799
99899
99999
100001
101101
102201
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 5, in is_palindrome
KeyboardInterrupt

In this case, the only numbers that are printed to the console are those that are the same forward or backward.

Note: In practice, you’re unlikely to write your own infinite sequence generator. The itertools module provides a very efficient infinite sequence generator with itertools.count().
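
For example, itertools.count() gives you the same behavior as infinite_sequence() in a single call:

>>> import itertools
>>> counter = itertools.count()
>>> next(counter)
0
>>> next(counter)
1
>>> next(counter)
2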

Now that you’ve seen a simple use case for an infinite sequence generator, let’s dive deeper into how generators work.

Understanding Generators

So far, you’ve learned about the two primary ways of creating generators: by using generator functions and generator expressions. You might even have an intuitive understanding of how generators work. Let’s take a moment to make that knowledge a little more explicit.

Generator functions look and act just like regular functions, but with one defining characteristic. Generator functions use the Python yield keyword instead of return. Recall the generator function you wrote earlier:

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

This looks like a typical function definition, except for the Python yield statement and the code that follows it. yield indicates where a value is sent back to the caller, but unlike return, you don’t exit the function afterward.

Instead, the state of the function is remembered. That way, when next() is called on a generator object (either explicitly or implicitly within a for loop), the previously yielded variable num is incremented, and then yielded again. Since generator functions look like other functions and act very similarly to them, you can assume that generator expressions are very similar to other comprehensions available in Python.

Note: Are you rusty on Python’s list, set, and dictionary comprehensions? You can check out Using List Comprehensions Effectively.

Building Generators With Generator Expressions

Like list comprehensions, generator expressions allow you to quickly create a generator object in just a few lines of code. They’re also useful in the same cases where list comprehensions are used, with an added benefit: you can create them without building and holding the entire object in memory before iteration. In other words, you’ll have no memory penalty when you use generator expressions. Take this example of squaring some numbers:

>>>
>>> nums_squared_lc = [num ** 2 for num in range(5)]
>>> nums_squared_gc = (num ** 2 for num in range(5))

Both nums_squared_lc and nums_squared_gc look basically the same, but there’s one key difference. Can you spot it? Take a look at what happens when you inspect each of these objects:

>>>
>>> nums_squared_lc
[0, 1, 4, 9, 16]
>>> nums_squared_gc
<generator object <genexpr> at 0x107fbbc78>

The first object used brackets to build a list, while the second created a generator expression by using parentheses. The output confirms that you’ve created a generator object and that it is distinct from a list.

Profiling Generator Performance

You learned earlier that generators are a great way to optimize memory. While an infinite sequence generator is an extreme example of this optimization, let’s amp up the number squaring examples you just saw and inspect the size of the resulting objects. You can do this with a call to sys.getsizeof():

>>>
>>> import sys
>>> nums_squared_lc = [i ** 2 for i in range(10000)]
>>> sys.getsizeof(nums_squared_lc)
87624
>>> nums_squared_gc = (i ** 2 for i in range(10000))
>>> print(sys.getsizeof(nums_squared_gc))
120

In this case, the list you get from the list comprehension is 87,624 bytes, while the generator object is only 120. This means that the list is over 700 times larger than the generator object!

There is one thing to keep in mind, though. If the list is smaller than the running machine’s available memory, then list comprehensions can be faster to evaluate than the equivalent generator expression. To explore this, let’s sum across the results from the two comprehensions above. You can generate a readout with cProfile.run():

>>>
>>> import cProfile
>>> cProfile.run('sum([i * 2 for i in range(10000)])')
         5 function calls in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <string>:1(<listcomp>)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> cProfile.run('sum((i * 2 for i in range(10000)))')
         10005 function calls in 0.003 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10001    0.002    0.000    0.002    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.003    0.003 <string>:1(<module>)
        1    0.000    0.000    0.003    0.003 {built-in method builtins.exec}
        1    0.001    0.001    0.003    0.003 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Here, you can see that summing across all values in the list comprehension took about a third of the time as summing across the generator. If speed is an issue and memory isn’t, then a list comprehension is likely a better tool for the job.

Note: These measurements aren’t only valid for objects made with generator expressions. They’re also the same for objects made from the analogous generator function since the resulting generators are equivalent.

Remember, list comprehensions return full lists, while generator expressions return generators. Generators work the same whether they’re built from a function or an expression. Using an expression just allows you to define simple generators in a single line, with an assumed yield at the end of each inner iteration.

The Python yield statement is certainly the linchpin on which all of the functionality of generators rests, so let’s dive into how yield works in Python.

Understanding the Python Yield Statement

On the whole, yield is a fairly simple statement. Its primary job is to control the flow of a generator function in a way that’s similar to return statements. As briefly mentioned above, though, the Python yield statement has a few tricks up its sleeve.

When you call a generator function or use a generator expression, you return a special iterator called a generator. You can assign this generator to a variable in order to use it. When you call special methods on the generator, such as next(), the code within the function is executed up to yield.

When the Python yield statement is hit, the program suspends function execution and returns the yielded value to the caller. (In contrast, return stops function execution completely.) When a function is suspended, the state of that function is saved. This includes any variable bindings local to the generator, the instruction pointer, the internal stack, and any exception handling.

This allows you to resume function execution whenever you call one of the generator’s methods. In this way, all function evaluation picks back up right after yield. You can see this in action by using multiple Python yield statements:

>>>
>>> def multi_yield():
...     yield_str = "This will print the first string"
...     yield yield_str
...     yield_str = "This will print the second string"
...     yield yield_str
...
>>> multi_obj = multi_yield()
>>> print(next(multi_obj))
This will print the first string
>>> print(next(multi_obj))
This will print the second string
>>> print(next(multi_obj))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Take a closer look at that last call to next(). You can see that execution has blown up with a traceback. This is because generators, like all iterators, can be exhausted. Unless your generator is infinite, you can iterate through it one time only. Once all values have been evaluated, iteration will stop and the for loop will exit. If you use next(), then instead you’ll get an explicit StopIteration exception.

Note: StopIteration is a natural exception that’s raised to signal the end of an iterator. for loops, for example, are built around StopIteration. You can even implement your own for loop by using a while loop:

>>>
>>> letters = ["a", "b", "c", "y"]
>>> it = iter(letters)
>>> while True:
...     try:
...         letter = next(it)
...     except StopIteration:
...         break
...     print(letter)
...
a
b
c
y

You can read more about StopIteration in the Python documentation on exceptions. For more on iteration in general, check out Python “for” Loops (Definite Iteration) and Python “while” Loops (Indefinite Iteration).

yield can be used in many ways to control your generator’s execution flow. The use of multiple Python yield statements can be leveraged as far as your creativity allows.

Using Advanced Generator Methods

You’ve seen the most common uses and constructions of generators, but there are a few more tricks to cover. In addition to yield, generator objects can make use of the following methods:

  • .send()
  • .throw()
  • .close()

How to Use .send()

For this next section, you’re going to build a program that makes use of all three methods. This program will print numeric palindromes like before, but with a few tweaks. Upon encountering a palindrome, your new program will add a digit and start a search for the next one from there. You’ll also handle exceptions with .throw() and stop the generator after a given amount of digits with .close(). First, let’s recall the code for your palindrome detector:

def is_palindrome(num):
    # Skip single-digit inputs
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0

    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10

    if num == reversed_num:
        return True
    else:
        return False

This is the same code you saw earlier, except that now the program returns strictly True or False. You’ll also need to modify your original infinite sequence generator, like so:

 1 def infinite_palindromes():
 2     num = 0
 3     while True:
 4         if is_palindrome(num):
 5             i = (yield num)
 6             if i is not None:
 7                 num = i
 8         num += 1

There are a lot of changes here! The first one you’ll see is in line 5, where i = (yield num). Though you learned earlier that yield is a statement, that isn’t quite the whole story.

As of Python 2.5 (the same release that introduced the methods you are learning about now), yield is an expression, rather than a statement. Of course, you can still use it as a statement. But now, you can also use it as you see in the code block above, where i takes the value that is yielded. This allows you to manipulate the yielded value. More importantly, it allows you to .send() a value back to the generator. When execution picks up after yield, i will take the value that is sent.

You’ll also check if i is not None, which could happen if next() is called on the generator object. (This can also happen when you iterate with a for loop.) If i has a value, then you update num with the new value. But regardless of whether or not i holds a value, you’ll then increment num and start the loop again.

Now, take a look at the main function code, which sends the lowest number with another digit back to the generator. For example, if the palindrome is 121, then it will .send() 1000:

pal_gen = infinite_palindromes()
for i in pal_gen:
    digits = len(str(i))
    pal_gen.send(10 ** (digits))

With this code, you create the generator object and iterate through it. The program only yields a value once a palindrome is found. It uses len() to determine the number of digits in that palindrome. Then, it sends 10 ** digits to the generator. This brings execution back into the generator logic and assigns 10 ** digits to i. Since i now has a value, the program updates num, increments, and checks for palindromes again.

Once your code finds and yields another palindrome, you’ll iterate via the for loop. This is the same as iterating with next(). The generator also picks up at line 5 with i = (yield num). However, now i is None, because you didn’t explicitly send a value.

What you’ve created here is a coroutine, or a generator function into which you can pass data. These are useful for constructing data pipelines, but as you’ll see soon, they aren’t necessary for building them. (If you’re looking to dive deeper, then this course on coroutines and concurrency is one of the most comprehensive treatments available.)

Now that you’ve learned about .send(), let’s take a look at .throw().

How to Use .throw()

.throw() allows you to throw exceptions with the generator. In the below example, you raise the exception in line 6. This code will throw a ValueError once digits reaches 5:

 1 pal_gen = infinite_palindromes()
 2 for i in pal_gen:
 3     print(i)
 4     digits = len(str(i))
 5     if digits == 5:
 6         pal_gen.throw(ValueError("We don't like large palindromes"))
 7     pal_gen.send(10 ** (digits))

This is the same as the previous code, but now you’ll check if digits is equal to 5. If so, then you’ll .throw() a ValueError. To confirm that this works as expected, take a look at the code’s output:

>>>
11
111
1111
10101
Traceback (most recent call last):
  File "advanced_gen.py", line 47, in <module>
    main()
  File "advanced_gen.py", line 41, in main
    pal_gen.throw(ValueError("We don't like large palindromes"))
  File "advanced_gen.py", line 26, in infinite_palindromes
    i = (yield num)
ValueError: We don't like large palindromes

.throw() is useful in any areas where you might need to catch an exception. In this example, you used .throw() to control when you stopped iterating through the generator. You can do this more elegantly with .close().

How to Use .close()

As its name implies, .close() allows you to stop a generator. This can be especially handy when controlling an infinite sequence generator. Let’s update the code above by changing .throw() to .close() to stop the iteration:

 1 pal_gen = infinite_palindromes()
 2 for i in pal_gen:
 3     print(i)
 4     digits = len(str(i))
 5     if digits == 5:
 6         pal_gen.close()
 7     pal_gen.send(10 ** (digits))

Instead of calling .throw(), you use .close() in line 6. The advantage of using .close() is that it raises StopIteration, an exception used to signal the end of a finite iterator:

>>>
11
111
1111
10101
Traceback (most recent call last):
  File "advanced_gen.py", line 46, in <module>
    main()
  File "advanced_gen.py", line 42, in main
    pal_gen.send(10 ** (digits))
StopIteration

Now that you’ve learned more about the special methods that come with generators, let’s talk about using generators to build data pipelines.

Creating Data Pipelines With Generators

Data pipelines allow you to string together code to process large datasets or streams of data without maxing out your machine’s memory. Imagine that you have a large CSV file:

permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
digg,Digg,60,web,San Francisco,CA,1-Dec-06,8500000,USD,b
digg,Digg,60,web,San Francisco,CA,1-Oct-05,2800000,USD,a
facebook,Facebook,450,web,Palo Alto,CA,1-Sep-04,500000,USD,angel
facebook,Facebook,450,web,Palo Alto,CA,1-May-05,12700000,USD,a
photobucket,Photobucket,60,web,Palo Alto,CA,1-Mar-05,3000000,USD,a

This example is pulled from the TechCrunch Continental USA set, which describes funding rounds and dollar amounts for various startups based in the USA. Click the link below to download the dataset:

Download Dataset: Click here to download the dataset you'll use in this tutorial to learn about generators and yield in Python.

It’s time to do some processing in Python! To demonstrate how to build pipelines with generators, you’re going to analyze this file to get the total and average of all series A rounds in the dataset.

Let’s think of a strategy:

  1. Read every line of the file.
  2. Split each line into a list of values.
  3. Extract the column names.
  4. Use the column names and lists to create a dictionary.
  5. Filter out the rounds you aren’t interested in.
  6. Calculate the total and average values for the rounds you are interested in.

Normally, you can do this with a package like pandas, but you can also achieve this functionality with just a few generators. You’ll start by reading each line from the file with a generator expression:

 1 file_name = "techcrunch.csv"
 2 lines = (line for line in open(file_name))

Then, you’ll use another generator expression in concert with the previous one to split each line into a list:

 3 list_line = (s.rstrip().split(",") for s in lines)

Here, you created the generator list_line, which iterates through the first generator lines. This is a common pattern to use when designing generator pipelines. Next, you’ll pull the column names out of techcrunch.csv. Since the column names tend to make up the first line in a CSV file, you can grab that with a short next() call:

 4 cols = next(list_line)

This call to next() advances the iterator over the list_line generator one time. Put it all together, and your code should look something like this:

 1 file_name = "techcrunch.csv"
 2 lines = (line for line in open(file_name))
 3 list_line = (s.rstrip().split(",") for s in lines)
 4 cols = next(list_line)

To sum this up, you first create a generator expression lines to yield each line in a file. Next, you iterate through that generator within the definition of another generator expression called list_line, which turns each line into a list of values. Then, you advance the iteration of list_line just once with next() to get a list of the column names from your CSV file.

Note: Watch out for trailing newlines! This code takes advantage of .rstrip() in the list_line generator expression to make sure there are no trailing newline characters, which can be present in CSV files.

To help you filter and perform operations on the data, you’ll create dictionaries where the keys are the column names from the CSV:

 5 company_dicts = (dict(zip(cols, data)) for data in list_line)

This generator expression iterates through the lists produced by list_line. Then, it uses zip() and dict() to create the dictionary as specified above. Now, you’ll use a fourth generator to filter the funding round you want and pull raisedAmt as well:

 6 funding = (
 7     int(company_dict["raisedAmt"])
 8     for company_dict in company_dicts
 9     if company_dict["round"] == "a"
10 )

In this code snippet, your generator expression iterates through the results of company_dicts and takes the raisedAmt for any company_dict where the round key is “a”.

Remember, you aren’t iterating through all these at once in the generator expression. In fact, you aren’t iterating through anything until you actually use a for loop or a function that works on iterables, like sum(). In fact, call sum() now to iterate through the generators:

11 total_series_a = sum(funding)

Putting this all together, you’ll produce the following script:

 1 file_name = "techcrunch.csv"
 2 lines = (line for line in open(file_name))
 3 list_line = (s.rstrip().split(",") for s in lines)
 4 cols = next(list_line)
 5 company_dicts = (dict(zip(cols, data)) for data in list_line)
 6 funding = (
 7     int(company_dict["raisedAmt"])
 8     for company_dict in company_dicts
 9     if company_dict["round"] == "a"
10 )
11 total_series_a = sum(funding)
12 print(f"Total series A fundraising: ${total_series_a}")

This script pulls together every generator you’ve built, and they all function as one big data pipeline. Here’s a line by line breakdown:

  • Line 2 reads in each line of the file.
  • Line 3 splits each line into values and puts the values into a list.
  • Line 4 uses next() to store the column names in a list.
  • Line 5 creates dictionaries and unites them with a zip() call:
    • The keys are the column names cols from line 4.
    • The values are the rows in list form, created in line 3.
  • Line 6 gets each company’s series A funding amounts. It also filters out any other raised amount.
  • Line 11 begins the iteration process by calling sum() to get the total amount of series A funding found in the CSV.

When you run this code on techcrunch.csv, you should find a total of $4,376,015,000 raised in series A funding rounds.

Note: The methods for handling CSV files developed in this tutorial are important for understanding how to use generators and the Python yield statement. However, when you work with CSV files in Python, you should instead use the csv module included in Python’s standard library. This module has optimized methods for handling CSV files efficiently.
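
For comparison, here's a minimal sketch of the same series A total computed with csv.DictReader, which also reads the file lazily. This rewrite is mine, not part of the tutorial's pipeline:

import csv

with open("techcrunch.csv", newline="") as f:
    reader = csv.DictReader(f)  # lazily yields one dict per row
    total_series_a = sum(
        int(row["raisedAmt"]) for row in reader if row["round"] == "a"
    )

print(f"Total series A fundraising: ${total_series_a}")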

To dig even deeper, try figuring out the average amount raised per company in a series A round. This is a bit trickier, so here are some hints:

  • Generators exhaust themselves after being iterated over fully.
  • You will still need the sum() function.

Good luck!
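
If you get stuck, here's one possible sketch. It rebuilds the pipeline from scratch, since the earlier generators are already exhausted, and it materializes the filtered amounts into a list so that both sum() and len() can consume them:

file_name = "techcrunch.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols, data)) for data in list_line)

# A list this time: the amounts are iterated twice, once by sum() and
# once by len().
funding = [
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
]

average_series_a = sum(funding) / len(funding)
print(f"Average series A fundraising: ${average_series_a:,.0f}")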

Conclusion

In this tutorial, you’ve learned about generator functions and generator expressions.

You now know:

  • How to use and write generator functions and generator expressions
  • How the all-important Python yield statement enables generators
  • How to use multiple Python yield statements in a generator function
  • How to use .send() to send data to a generator
  • How to use .throw() to raise generator exceptions
  • How to use .close() to stop a generator’s iteration
  • How to build a generator pipeline to efficiently process large CSV files

You can get the dataset you used in this tutorial at the link below:

Download Dataset: Click here to download the dataset you'll use in this tutorial to learn about generators and yield in Python.

How have generators helped you in your work or projects? If you’re just learning about them, then how do you plan to use them in the future? Did you find a good solution to the data pipeline problem? Let us know in the comments below!



Python Bytes: #149 Python's small object allocator and other memory features

Talk Python to Me: #231 Advice for freelancing with Python

Have you ever wanted to get into consulting? Maybe you're seeking the freedom to work on whatever project you'd like or gain more control of your time.