PyPy Development: PyPy JIT for Aarch64

July 25, 2019, 8:41 am

Hello everyone.

We are pleased to announce the availability of the new PyPy for AArch64. This port brings PyPy's high-performance just-in-time compiler to the AArch64 platform, also known as 64-bit ARM. With the addition of AArch64, PyPy now supports a total of 6 architectures: x86 (32 & 64bit), ARM (32 & 64bit), PPC64, and s390x. The AArch64 work was funded by ARM Holdings Ltd. and Crossbar.io.

PyPy has a good record of boosting the performance of Python programs on the existing platforms. To show how well the new PyPy port performs, we compare the performance of PyPy against CPython on a set of benchmarks. As a point of comparison, we include the results of PyPy on x86_64.

Note, however, that the results presented here were measured on a Graviton A1 machine from AWS, which comes with a very serious word of warning: Graviton A1's are virtual machines, and, as such, they are not suitable for benchmarking. If someone has access to a beefy enough (16G) ARM64 server and is willing to give us access to it, we are happy to redo the benchmarks on a real machine. One major concern is that while a virtual CPU is 1-to-1 with a real CPU, it is not clear to us how CPU caches are shared across virtual CPUs. Also, note that by no means is this benchmark suite representative enough to average the results. Read the numbers individually per benchmark.

The following graph shows the speedups on AArch64 of PyPy (hg id 2417f925ce94) compared to CPython (2.7.15), as well as the speedups on an x86_64 Linux laptop comparing the most recent release, PyPy 7.1.1, to CPython 2.7.16.

In the majority of benchmarks, the speedups achieved on AArch64 match those achieved on the x86_64 laptop. Over CPython, PyPy on AArch64 achieves speedups between 0.6x and 44.9x. These speedups are comparable to x86_64, where the numbers are between 0.6x and 58.9x.

The next graph compares the speedups achieved on AArch64 to the speedups achieved on x86_64, i.e., how great the speedup is on AArch64 vs. the same benchmark on x86_64. This comparison should give a rough idea about the quality of the generated code for the new platform.

Note that we see a large variance: There are generally three groups of benchmarks - those that run at more or less the same speed, those that run at 2x the speed, and those that run at 0.5x the speed of x86_64.

The variance and disparity are likely related to a variety of issues, mostly due to differences in architecture. What is however interesting is that, compared to measurements performed on older ARM boards, the branch predictor on the Graviton A1 machine appears to have improved. As a result, the speedups achieved by PyPy over CPython are smaller than on older ARM boards: sufficiently branchy code, like CPython itself, simply runs a lot faster. Hence, the advantage of the non-branchy code generated by PyPy's just-in-time compiler is smaller.

One takeaway here is that many possible improvements for PyPy have yet to be implemented. This is true for both of the above platforms, but probably more so for AArch64, which comes with a large number of CPU registers. The PyPy backend was written with x86 (the 32-bit variant) in mind, which has a really low number of registers. We think that we can improve in the area of emitting more modern machine code, which may have a higher impact on AArch64 than on x86_64. There are also a number of missing features in the AArch64 backend. These features are currently implemented as expensive function calls instead of inlined native instructions, something we intend to improve.

Best,

Maciej Fijalkowski, Armin Rigo and the PyPy team

↧

PSF GSoC students blogs: Weekly Check-In #7

July 25, 2019, 9:03 am

We have come to almost the end of the second phase. We are now spending time on the last few features that need to be implemented and also on the bugs that have come up in the recent past.

First of all, we fixed bugs as usual. One of them was giving permissions to the users for adding notification blocks to their pages. We had implemented this before GSoC even started but forgot to give permissions to any new user. Also, we were having issues tracking the different types of blogs that the students were supposed to post. The system considered all the blog posts to be of the same type, whereas there are two of them. We implemented this in the system so that the students are notified accordingly.

Our system now automatically archives the Github Pages at the end of every GSoC, yaay for the maintainers! Oh and finally, I added the support for codeblock in the blogs (this was really necessary :p). The suborg forms also have been improved now as some of the fields get disabled conditionally, so overall a better UX for users filling out the form.

Will be doing more stuff like this in the next week too!

↧

Python Circle: Python Script 1: Convert ebooks from epub to mobi format

July 25, 2019, 9:45 am

Python script to convert the ebooks from one format to another in bulk, Automated book conversion to kindle format, Free kindle ebook format conversion, automating the book format conversion, python code to book format convert,

↧

Python Circle: try .. except .. else .. in python with example

July 25, 2019, 9:45 am

how to use else clause with try except in python, when to use else clause with try except in python, try except else finally clauses in python, try except else example in python,

↧

Catalin George Festila: Python 3.7.3 : Using the flask - part 002.

July 25, 2019, 4:37 am

Let's see some tips for starting any project with flask python module. Use these python modules to work with databases: flask-sqlalchemy and flask_marshmallow. The Flask-SQLAlchemy is an extension for Flask that adds support for SQLAlchemy to your application. The marshmallow is an ORM/ODM/framework-agnostic library for converting complex datatypes, such as objects, to and from native Python

↧

PSF GSoC students blogs: We are almost there now!

July 25, 2019, 9:46 am

Yes, we have almost finished implementing all the functionalities! It feels kinda sad that GSoC is going to end in a matter of weeks now. But nevertheless, this experience has been an enriching one for me. I am grateful to my mentor for his wonderful support throughout this. He has guided me through the whole development phase and helped me to be focused on the timeline and to keep a track of the bugs that come up.

What have I learnt from this till now? Well, the fact that software engineering is not an easy job! Especially when a couple of developers build software, it's easy to miss a lot of cases which will surely come up as bugs. This has happened in our case too: a new bug comes up while trying to fix an older one. So yeah, we need to keep an eye out for bugs, it's not always possible to build everything perfectly from the beginning.

Also, there have been times when I had a vision of building something (maybe implementing a feature) with utmost perfection and I was not able to do that. Thanks to my mentor for pointing out that the vision I had was not feasible and that we needed to approach it in a different way. This was an important lesson for me: what might seem to be perfect might not be perfect at all because it is not feasible.

And also a full testing is coming up now! I have never done this before and thus am looking forward to it very much. It will be a totally new experience and I hope it will be enjoyable. I guess more bugs will come up now :p.

↧

PSF GSoC students blogs: Weekly Check-In #8

July 25, 2019, 10:45 am

A lot of fixes and new features came up this week. It was kind of a busy week.

We mostly worked on Search Engine Optimization (yes, SEO). We didn't really change any content but added appropriate meta description tags, created a uniform title format for both python-gsoc.org and blogs.python-gsoc.org. Also, we used services from search engines (yes, Google) to analyze our site for both PC and mobile performance. Some content was weird for mobiles, so we fixed that. We also created proper sitemaps for both the sites.

Apart from this, we also created a Send Email admin panel to send customized emails. It's like a small email client which can be used to send emails. This will be used for sending emails to mentors and suborg admins or maybe some students when needed. It also has the feature of sending emails to groups like students, mentors, etc. The blog editor now also has options to upload images and attachments just by dragging and dropping. Yes, pretty useful and a cool feature right? :p

I'm gonna finish the leftover issues as fast as possible and get to testing!

↧

PSF GSoC students blogs: SEO 101: Creating custom sitemap using django

July 25, 2019, 11:01 am

I must agree I was pretty impatient to use the codesnippet plugin on this editor.

print("tada!")

Well, a couple of days back I was trying to find a way to create sitemaps in django. One way is to use a crawler to crawl all your pages and generate the sitemap. But that's a bad solution as you need to manually update it every time. We are looking for something more dynamic. Thankfully, django already has a base class which helps in the generation of sitemaps.

Let's start with the urls.py. We add the url for sitemap.xml and add the sitemaps object specifying which sitemap class to use. We can add multiple sitemaps together, and django will render it together for you.

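from django.conf.urls import url
from django.contrib.sitemaps.views import sitemap

from myblog import sitemaps

urlpatterns = [
    # Serve sitemap.xml using the sitemap classes registered below.
    url(r'^sitemap\.xml$', sitemap, {
        'sitemaps': {
            'blogs': sitemaps.BlogSitemap,
        },
    }),
]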

Then let's go on to sitemaps.py

import urllib.parse

from django.contrib.sitemaps import Sitemap
from django.conf import settings

from aldryn_newsblog.cms_appconfig import NewsBlogConfig
from aldryn_newsblog.models import Article

from cms.models import Page


class BlogSitemap(Sitemap):
    priority = 0.5

    def items(self):
        urls = ['/']
        blogs = NewsBlogConfig.objects.all()
        for blog in blogs:
            p = Page.objects.get(application_namespace=blog.namespace, publisher_is_draft=False)
            urls.append(p.get_absolute_url())
            articles = Article.objects.filter(app_config=blog).all()
            for i in range(len(articles) // 5):
                urls.append(f'{p.get_absolute_url()}?page={i + 2}')
            for article in articles:
                urls.append(f'{p.get_absolute_url()}{article.slug}/')
        return urls

    def location(self, obj):
        return obj

Generally items() should return an iterable of objects, each of which has a get_absolute_url() method that gives its url, but django also lets you specify their locations yourself using the location(item) method. We use this to return a plain list of urls from items() and then return the item unchanged from location(), as the item is nothing but a string which is the url. In this way we can include pages like /blog-name/?page=2 which aren't returned by any object's get_absolute_url() method.
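To make the URL list easier to reason about without a django project, here is a minimal sketch of the same idea in plain Python. The function name blog_urls, the sample slugs and the per_page value of 5 are assumptions for illustration only. It also uses ceiling division for the page count, which is an assumption about how the blog paginates and differs from the len(articles) // 5 used above.

```python
from math import ceil


def blog_urls(base_url, slugs, per_page=5):
    """List every URL for one blog: index, extra index pages, and articles."""
    urls = [base_url]
    pages = ceil(len(slugs) / per_page)
    # Page 1 is the index itself, so only pages 2..pages need a ?page= URL.
    urls += [f"{base_url}?page={n}" for n in range(2, pages + 1)]
    urls += [f"{base_url}{slug}/" for slug in slugs]
    return urls


# Hypothetical blog with six articles, i.e. two index pages.
urls = blog_urls("/my-blog/", ["a", "b", "c", "d", "e", "f"])
print(urls[:3])  # ['/my-blog/', '/my-blog/?page=2', '/my-blog/a/']
```

A list like this is what items() would return, with location() handing each string back unchanged.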

↧

PSF GSoC students blogs: Embed Google Calendar like a Pro

July 25, 2019, 11:39 am

Have you ever tried to embed a Google Calendar in your site only to find it's not responsive for mobiles and you need to scroll left and right? Well, I'm gonna share a simple solution which will work.

There's an agenda mode in google calendar which seems to fit properly in a mobile screen, so how about changing it to the agenda mode when the screen size is smaller? Simple HTML and CSS will be enough for this.

<style>
 @media (max-width: 550px) {
     .big-container {
         display: none;
     }
 }
 @media (min-width: 551px) {
     .small-container {
         display: none;
     }
 }
 /* Responsive iFrame */
 .responsive-iframe-container {
     position: relative;
     padding-bottom: 56.25%;
     padding-top: 30px;
     height: 0;
     overflow: hidden;
 }
 .responsive-iframe-container iframe,
 .responsive-iframe-container object,
 .responsive-iframe-container embed {
     position: absolute;
     top: 0;
     left: 0;
     width: 100%;
     height: 100%;
 }
 </style>

<div class="responsive-iframe-container big-container">
  <iframe id="cal1" src="null" style="border-width:0" width="800" height="600" frameborder="0" scrolling="no"></iframe>
</div>
<div class="responsive-iframe-container small-container">
  <iframe id="cal2" src="null" style="border-width:0" width="800" height="600" frameborder="0" scrolling="no"></iframe>
</div>

This should be good enough for a responsive calendar. Want another pro tip? How about changing the timezone of the calendar according to the client's timezone?

Well we will need a bit of js for getting the client's timezone.

var offset = new Date().getTimezoneOffset();

This line of code gets the offset integer in minutes, so -330 for +05:30 timezone and so on. Then we somehow need to convert this to the region name as google calendar link takes an argument named ctz for that. Include the moment.min.js and the moment-timezone-with-data-10-year-range.min.js. These libraries will help us get the region name.

String.prototype.replaceAll = function(search, replacement) {
    var target = this;
    return target.split(search).join(replacement);
};
var timezone = moment.tz.guess(offset).replaceAll('/', '%2F')

Now you just need to add a new argument to the url.

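// oldUrl is the calendar's embed URL; if it already has a query string, append with '&ctz=' instead.
var newUrl = `${oldUrl}?ctz=${timezone}`;
// Use the id of the iframe being updated (cal1 or cal2 in the markup above).
document.getElementById('cal1').src = newUrl;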
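The URL handling itself is easy to get wrong (the "/" in the region name has to be escaped, and the separator depends on whether the URL already has a query string). As a rough sketch of that logic, written in Python rather than js and using only the standard library: the function name with_timezone and the example embed URL are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_timezone(url, tz_name):
    """Return url with its ctz query parameter set to tz_name."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any previous ctz value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "ctz"]
    query.append(("ctz", tz_name))
    # urlencode escapes "/" as %2F, which is what the replaceAll call above does.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical embed URL; a real one carries the calendar id in src.
print(with_timezone("https://calendar.google.com/calendar/embed?src=example", "Asia/Kolkata"))
# https://calendar.google.com/calendar/embed?src=example&ctz=Asia%2FKolkata
```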

That's it guys and you will be good to go.

Reference: https://answers.squarespace.com/questions/54774/how-to-embed-a-google-calendar-in-a-responsive-way.html

↧

Wingware News: Wing Python IDE 7.1 - July 25, 2019

July 24, 2019, 6:00 pm

Wing 7.1 adds support for Python 3.8, warns about unused symbols, improves code warnings configuration, adds new auto-completer, project, and source browser icons, supports Dark Mode on OS X, and makes other improvements.

Download Wing 7.1 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products

Some Highlights of Wing 7.1

Support for Python 3.8

Wing 7.1 supports editing, testing, and debugging code written for Python 3.8, so you can take advantage of assignment expressions and other improvements introduced in this new version of Python.

Improved Code Warnings

Wing 7.1 adds unused symbol warnings for imports, variables, and arguments found in Python code. This release also improves code warnings configuration, making it easier to disable unwanted warnings.

Cosmetic Improvements

Wing 7.1 improves the auto-completer, project tool, and code browser with redesigned icons that make use of Wing's icon color configuration. This release also improves text display on some Linux systems, supports Dark Mode on macOS, and improves display of Python code and icons found in documentation.

And More

Wing 7.1 also adds support for Windows 10 native OpenSSH installations for remote development, and makes a number of other minor improvements. This release drops support for macOS 10.11. System requirements remain unchanged on Windows and Linux.

For details see the change log.

For a complete list of new features in Wing 7, see What's New in Wing 7.

Try Wing 7.1 Now!

Wing 7.1 is an exciting new step for Wingware's Python IDE product line. Find out how Wing 7.1 can turbocharge your Python development by trying it today.

Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products

See Upgrading for details on upgrading from Wing 6 and earlier, and Migrating from Older Versions for a list of compatibility notes.

↧

PSF GSoC students blogs: Weekly Check-in #7 & #8

July 25, 2019, 1:36 pm

Hello!

I have been quite busy these past two weeks. I have been working on fleshing out a design for Panda's multitouch support, along with trying to get a deployment pipeline working on iOS. I got Travis CI working on the panda3d-thirdparty repository, so now anyone can go download and build the thirdparty packages for iOS. Once I'm done wrapping up the initial deployment pipeline, I am going to package up some of the sample apps for people to try out!

↧

Codementor: map, filter and reduce functions in Python

July 25, 2019, 10:18 pm

Learn what are map(), filter() and reduce() functions in Python. Also know how to use them with lambda and user-defined functions and along with each other.

↧

Stefan Behnel: Faster XML stream processing in Python

July 26, 2019, 8:36 am

It's been a while since I last wrote something about processing XML, specifically about finding something in XML. Recently, I read a blog post by Eli Bendersky about faster XML processing in Go, and he was comparing it to iterparse() in Python's ElementTree and lxml. Basically, all he said about lxml is that it performs more or less like ElementTree, so he concentrated on the latter (and on C and Go). That's not wrong to say, but it also doesn't help much. lxml has much more fine-grained tools for processing XML, so here's a reply.

I didn't have the exact same XML input file that Eli used, but I used the same (deterministic, IIUC) tool for generating one, running xmlgen -f2 -o bench.xml. That resulted in a 223MiB XML file of the same structure that Eli used, thus probably almost the same as his.

Let's start with the original implementation:

import sys
import xml.etree.ElementTree as ET

count = 0
for event, elem in ET.iterparse(sys.argv[1], events=("end",)):
    if event == "end":
        if elem.tag == 'location' and elem.text and 'Africa' in elem.text:
            count += 1
        elem.clear()

print('count =', count)

The code parses the XML file, searches for location tags, and counts those that contain the word Africa.

Running this under time with ElementTree in CPython 3.6.8 (Ubuntu 18.04) shows:

count = 92
4.79user 0.08system 0:04.88elapsed 99%CPU (0avgtext+0avgdata 14828maxresident)k

We can switch to lxml (4.3.4) by changing the import to import lxml.etree as ET:

count = 92
4.58user 0.08system 0:04.67elapsed 99%CPU (0avgtext+0avgdata 23060maxresident)k

You can see that it uses somewhat more memory overall (~23MiB), but runs just a little faster, not even 5%. Both are roughly comparable.

For comparison, the base line memory usage of doing nothing but importing ElementTree versus lxml is:

$ time python3.6 -c 'import xml.etree.ElementTree'
0.08user 0.01system 0:00.09elapsed 96%CPU (0avgtext+0avgdata 9892maxresident)k
0inputs+0outputs (0major+1202minor)pagefaults 0swaps
$ time python3.6 -c 'import lxml.etree'
0.07user 0.01system 0:00.09elapsed 96%CPU (0avgtext+0avgdata 15264maxresident)k
0inputs+0outputs (0major+1742minor)pagefaults 0swaps

Back to our task at hand. As you may know, global variables in Python are more costly than local variables, and as you certainly know, global module code is hard to test. So, let's start with something obvious that we would always do in Python: write a function.

import sys
import lxml.etree as ET

def count_locations(file_path, match):
    count = 0
    for event, elem in ET.iterparse(file_path, events=("end",)):
        if event == "end":
            if elem.tag == 'location' and elem.text and match in elem.text:
                count += 1
            elem.clear()
    return count

count = count_locations(sys.argv[1], 'Africa')
print('count =', count)

count = 92
4.39user 0.06system 0:04.46elapsed 99%CPU (0avgtext+0avgdata 23264maxresident)k

Another thing we can see is that we're explicitly asking for only end events, and then check if the event we got is an end event. That's redundant. Removing this line yields:

count = 92
4.24user 0.06system 0:04.31elapsed 99%CPU (0avgtext+0avgdata 23264maxresident)k

Ok, another tiny improvement. We won a couple of percent, although not really worth mentioning. Now let's see what lxml's API can do for us.

First, let's look at the structure of the XML file. Nicely, the xmlgen tool has a mode for generating an indented version of the same file, which makes it easier to investigate. Here's the start of the indented version of the file (note that we are always parsing the smaller version of the file, which contains newlines but no indentation):

<?xml version="1.0" standalone="yes"?>
<site>
  <regions>
    <africa>
      <item id="item0">
        <location>United States</location>
        <quantity>1</quantity>
        <name>duteous nine eighteen </name>
        <payment>Creditcard</payment>
        <description>
          <parlist>
            <listitem>
              <text>…
…

The root tag is site, which then contains regions (apparently one per continent), then a series of item elements, which contain the location. In a real data file, it would probably be enough to only look at the africa region when looking for Africa as a location, but a) this is (pseudo-)randomly generated data, b) even "real" data isn't always clean, and c) a location "Africa" actually seems weird when the region is already africa…

Anyway. Let's assume we have to look through all regions to get a correct count. But given the structure of the item tag, we can simply select the location elements and do the following in lxml:

def count_locations(file_path, match):
    count = 0
    for event, elem in ET.iterparse(file_path, events=("end",), tag='location'):
        if elem.text and match in elem.text:
            count += 1
        elem.clear()
    return count

count = 92
3.06user 0.62system 0:03.68elapsed 99%CPU (0avgtext+0avgdata 1529292maxresident)k

That's a lot faster. But what happened to the memory? 1.5 GB? We used to be able to process the whole file with only 23 MiB peak!

The reason is that the loop now only runs for location elements, and everything else is only handled internally by the parser – and the parser builds an in-memory XML tree for us. The elem.clear() call, that we previously used for deleting used parts of that tree, is now only executed for the location, a pure text tag, and thus cleans up almost nothing. We need to take care to clean up more again, so let's intercept on the item and look for the location from there.

def count_locations(file_path, match):
    count = 0
    for _, elem in ET.iterparse(file_path, events=("end",), tag='item'):
        text = elem.findtext('location')
        if text and match in text:
            count += 1
        elem.clear()
    return count

count = 92
3.11user 0.37system 0:03.50elapsed 99%CPU (0avgtext+0avgdata 994280maxresident)k

Ok, almost as fast, but still – 1 GB of memory? Why doesn't the cleanup work? Let's look at the file structure some more.

$ egrep -n '^(  )?<' bench_pp.xml
1:<?xml version="1.0" standalone="yes"?>
2:<site>
3:  <regions>
2753228:  </regions>
2753229:  <categories>
2822179:  </categories>
2822180:  <catgraph>
2824181:  </catgraph>
2824182:  <people>
3614042:  </people>
3614043:  <open_auctions>
5520437:  </open_auctions>
5520438:  <closed_auctions>
6401794:  </closed_auctions>
6401795:</site>

Ah, so there is actually much more data in there that is completely irrelevant for our task! All we really need to look at is the first ~2.7 million lines that contain the regions data. The entire second half of the file is useless, and simply generates heaps of data that our cleanup code does not handle. Let's make use of that learning in our code. We can intercept on both the item and the regions tags, and stop as soon as the regions data section ends.

def count_locations(file_path, match):
    count = 0
    for _, elem in ET.iterparse(file_path, events=("end",), tag=('item', 'regions')):
        if elem.tag == 'regions':
            break
        text = elem.findtext('location')
        if text and match in text:
            count += 1
        elem.clear()
    return count

count = 92
1.22user 0.04system 0:01.27elapsed 99%CPU (0avgtext+0avgdata 22048maxresident)k

That's great! We're actually using less memory than in the beginning now, and managed to cut down the runtime from 4.6 seconds to 1.2 seconds. That's almost a factor of 4!
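To see the effect of stopping early without the 223MiB benchmark file, here is a small self-contained sketch. It uses the standard library's ElementTree (so no tag= filter) and a tiny made-up document, SAMPLE, that only mimics the site/regions/people layout. The extra seen counter and the stop_early flag are additions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the benchmark file: one relevant section, one irrelevant one.
SAMPLE = b"""<site>
<regions>
<africa><item id="item0"><location>South Africa</location></item></africa>
<asia><item id="item1"><location>Japan</location></item></asia>
</regions>
<people>
<person><name>A</name></person>
<person><name>B</name></person>
</people>
</site>"""


def count_locations(source, match, stop_early=True):
    """Return (matches, end events seen) for a file-like XML source."""
    count = seen = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        seen += 1
        if elem.tag == "location":
            if elem.text and match in elem.text:
                count += 1
        elif elem.tag == "regions" and stop_early:
            break  # Everything after </regions> is irrelevant for the count.
        elem.clear()
    return count, seen


print(count_locations(io.BytesIO(SAMPLE), "Africa"))                    # (1, 7)
print(count_locations(io.BytesIO(SAMPLE), "Africa", stop_early=False))  # (1, 13)
```

Both runs find the same match, but the early exit handles roughly half of the elements in this toy document. In the real file the skipped part is what dominated time and memory.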

Let's try one more thing. We are already intercepting on two tag names, and then searching for a third one. Why not intercept on all three directly?

def count_locations(file_path, match):
    count = 0
    for _, elem in ET.iterparse(file_path, events=("end",), tag=('item', 'location', 'regions')):
        if elem.tag == 'location':
            text = elem.text
            if text and match in text:
                count += 1
        elif elem.tag == 'regions':
            break
        else:
            elem.clear()
    return count

count = 92
1.10user 0.03system 0:01.13elapsed 99%CPU (0avgtext+0avgdata 21912maxresident)k

Nice. Another bit faster, and another bit less memory used.

Anything else we can do? Yes. We can tune the parser a little more. Since we're only interested in the non-empty text content inside of tags, we can ignore all newlines that appear in our input file between the tags. lxml's parser has an option for removing such blank text, which avoids creating an in-memory representation for it.

def count_locations(file_path, match):
    count = 0
    for _, elem in ET.iterparse(file_path, events=("end",), tag=('item', 'location', 'regions'),
                                remove_blank_text=True):
        if elem.tag == 'location':
            text = elem.text
            if text and match in text:
                count += 1
        elif elem.tag == 'regions':
            break
        else:
            elem.clear()
    return count

count = 92
0.97user 0.02system 0:01.00elapsed 99%CPU (0avgtext+0avgdata 21928maxresident)k

While the overall memory usage didn't change, the avoided processing time for creating the useless text nodes and cleaning them up from memory is quite visible.

Overall, algorithmically improving our code and making better use of lxml's features gave us a speedup from initially 4.6 seconds down to one second. And we paid for that improvement with 4 additional lines of code inside our function. That's only half of the code which Eli's SAX based Go implementation needs (which, mind you, does not build an in-memory tree for you at all). And the Go code is only slightly faster than the initial Python implementations that we started from. Way to go! ;-)
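For reference, the speedup factors can be recomputed from the "user" times reported above for the lxml runs. This is just arithmetic on the published numbers. The dictionary labels are shorthand introduced here, and the two memory-hungry intermediate variants are left out.

```python
# "user" times in seconds, copied from the lxml measurements above.
timings = {
    "original loop": 4.58,
    "inside a function": 4.39,
    "no redundant event check": 4.24,
    "stop after regions": 1.22,
    "three tags": 1.10,
    "remove_blank_text": 0.97,
}

baseline = timings["original loop"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name:26s} {factor:4.1f}x")
```

The last line comes out at about 4.7x, in line with the "4.6 seconds down to one second" summary.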

Speaking of SAX, lxml also has a SAX interface. So let's compare how that performs.

import sys
import lxml.etree as ET

class Done(Exception):
    pass

class SaxCounter:
    in_location = False

    def __init__(self, match):
        self.count = 0
        self.match = match
        self.text = []
        self.data = self.text.append

    def start(self, tag, attribs):
        self.is_location = tag == 'location'
        del self.text[:]

    def end(self, tag):
        if tag == 'location':
            if self.text and self.match in ''.join(self.text):
                self.count += 1
        elif tag == 'regions':
            raise Done()

    def close(self):
        pass

def count_locations(file_path, match):
    target = SaxCounter(match)
    parser = ET.XMLParser(target=target)
    try:
        ET.parse(file_path, parser=parser)
    except Done:
        pass
    return target.count

count = count_locations(sys.argv[1], 'Africa')
print('count =', count)

count = 92
1.23user 0.02system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 16060maxresident)k

And the exact same code works in ElementTree if you change the import again:

count = 92
1.83user 0.02system 0:01.85elapsed 99%CPU (0avgtext+0avgdata 10280maxresident)k

Also, removing the regions check from the end() SAX method above, thus reading the entire file, yields this for lxml:

count = 92
3.22user 0.04system 0:03.27elapsed 99%CPU (0avgtext+0avgdata 15932maxresident)k

and this for ElementTree:

count = 92
4.72user 0.07system 0:04.79elapsed 99%CPU (0avgtext+0avgdata 10300maxresident)k

Seeing the numbers in comparison to iterparse(), it does not seem worth the complexity, unless the memory usage is really, really pressing.
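For readers who want to try the target-parser approach without the benchmark file, here is a trimmed-down sketch that runs on an in-memory document with the standard library's ElementTree. The class name LocationCounter and the SAMPLE document are made up for this example, and feeding bytes directly via parser.feed() replaces the ET.parse() call used above.

```python
import xml.etree.ElementTree as ET

# Made-up miniature of the benchmark layout.
SAMPLE = b"""<site>
<regions>
<africa><item><location>South Africa</location></item></africa>
<asia><item><location>Japan</location></item></asia>
</regions>
<people><person><name>A</name></person></people>
</site>"""


class Done(Exception):
    """Raised to abort parsing once the regions section has ended."""


class LocationCounter:
    def __init__(self, match):
        self.match = match
        self.count = 0
        self.text = []

    def start(self, tag, attrib):
        self.text.clear()  # Collect text afresh for every element.

    def data(self, data):
        self.text.append(data)

    def end(self, tag):
        if tag == "location" and self.match in "".join(self.text):
            self.count += 1
        elif tag == "regions":
            raise Done()

    def close(self):
        return self.count


def count_locations(xml_bytes, match):
    target = LocationCounter(match)
    parser = ET.XMLParser(target=target)
    try:
        parser.feed(xml_bytes)
        parser.close()
    except Done:
        pass  # Stopped early on purpose.
    return target.count


print(count_locations(SAMPLE, "Africa"))  # 1
```

No tree is built here at all, which is where the memory savings come from. The price is the hand-written state handling in the target class.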

A final note: here's the improved ElementTree iterparse() implementation that also avoids parsing useless data.

import sys
import xml.etree.ElementTree as ET

def count_locations(file_path, match):
    count = 0
    for event, elem in ET.iterparse(file_path, events=("end",)):
        if elem.tag == 'location':
            if elem.text and match in elem.text:
                count += 1
        elif elem.tag == 'regions':
            break
        elem.clear()
    return count

count = count_locations(sys.argv[1], 'Africa')
print('count =', count)

count = 92
1.71user 0.02system 0:01.74elapsed 99%CPU (0avgtext+0avgdata 11876maxresident)k

And while not as fast as the lxml version, it still runs considerably faster than the original implementation. And uses less memory.

Learnings to take away:

Say what you want.
Stop when you have it.

↧

ListenData: Importing Data into Python

July 26, 2019, 5:11 am

This tutorial explains various methods to read data into Python. Data can be in any of the popular formats - CSV, TXT, XLS/XLSX (Excel), sas7bdat (SAS), Stata, Rdata (R) etc. Loading data into the Python environment is the first step of analyzing data.

Import Data into Python

While importing external files, we need to check the following points -

Check whether header row exists or not
Treatment of special values as missing values
Consistent data type in a variable (column)
Date Type variable in consistent date format.
No truncation of rows while reading external data

Install and Load pandas Package

pandas is a powerful data analysis package. It makes data exploration and manipulation easy. It has several functions to read data from various sources.
If you are using Anaconda, pandas should already be installed. You need to load the package by using the following command -

import pandas as pd

If the pandas package is not installed, you can install it by running the following code in the IPython console. If you are using Spyder, you can submit the following code in the IPython console within Spyder.

!pip install pandas

If you are using Anaconda, you can try the following line of code to install pandas -

!conda install pandas

1. Import CSV files

It is important to note that a single backslash does not work when specifying the file path. You need to either change it to a forward slash or add one more backslash, as shown below.

import pandas as pd
mydata= pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv")
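The reason is that Python treats a backslash inside a normal string as the start of an escape sequence (for example, "\U" begins a unicode escape). A quick sketch of the spellings that do work, using only the standard library: the raw string form is an extra option not mentioned above, and the path is the same example path as in the tutorial.

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\Deepanshu\\Documents\\file1.csv"  # doubled backslashes
raw = r"C:\Users\Deepanshu\Documents\file1.csv"         # raw string literal
forward = "C:/Users/Deepanshu/Documents/file1.csv"      # forward slashes

# The first two spellings are the very same string.
print(escaped == raw)  # True

# Windows treats both separators alike, so all three name the same file.
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```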

If no header (title) in raw data file

mydata1 = pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv", header = None)

You need to include header = None option to tell Python there is no column name (header) in data.

Add Column Names

We can include column names by using names= option.

mydata2 = pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv", header = None, names = ['ID', 'first_name', 'salary'])

The variable names can also be added separately by using the following command.

mydata1.columns = ['ID', 'first_name', 'salary']
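To try the header and names options without a file on disk, here is a small sketch that reads from an in-memory string instead. The sample rows are invented for illustration. io.StringIO stands in for the file path.

```python
import io

import pandas as pd

# Hypothetical file contents without a header row.
raw = "1,Dave,5000\n2,Priya,6200\n"

# Default behaviour: the first data row is consumed as the header.
default = pd.read_csv(io.StringIO(raw))

# header=None keeps every row as data and numbers the columns 0, 1, 2.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# names= supplies the column names in the same call.
named = pd.read_csv(io.StringIO(raw), header=None, names=["ID", "first_name", "salary"])

print(len(default), len(no_header))
print(list(no_header.columns))
print(list(named.columns))
```

Comparing the lengths of default and no_header shows the row that is silently lost when the header option is forgotten.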

Detailed Explanation : Import CSV File in Python

2. Import File from URL

You don't need to perform additional steps to fetch data from URL. Simply put URL in read_csv() function (applicable only for CSV files stored in URL).

mydata = pd.read_csv("http://winterolympicsmedals.com/medals.csv")

3. Read Text File

We can use read_table() function to pull data from text file. We can also use read_csv() with sep= "\t" to read data from tab-separated file.

mydata = pd.read_table("C:\\Users\\Deepanshu\\Desktop\\example2.txt")
mydata = pd.read_csv("C:\\Users\\Deepanshu\\Desktop\\example2.txt", sep ="\t")
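To see that the two calls are interchangeable for tab-separated data, here is a minimal sketch that reads the same in-memory text both ways. The sample rows are invented for illustration and stand in for example2.txt.

```python
import io
import pandas as pd

# Hypothetical stand-in for example2.txt: tab-separated, with a header line.
raw = "name\tscore\nAnn\t90\nBob\t85\n"

# read_table() uses a tab as its default separator.
from_table = pd.read_table(io.StringIO(raw))

# read_csv() needs the separator spelled out.
from_csv = pd.read_csv(io.StringIO(raw), sep="\t")

print(from_table.equals(from_csv))
```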

↧

Stack Abuse: Creating Python GUI Applications with wxPython

July 26, 2019, 10:01 am

Introduction

In this tutorial, we're going to learn how to use the wxPython library to develop Graphical User Interfaces (GUIs) for desktop applications in Python. The GUI is the part of your application that lets users interact with it without having to type in commands; they can do pretty much everything with a click of the mouse.

Some popular alternatives for developing a GUI in Python include Tkinter and PyQt. However, in this tutorial, we will learn about wxPython.

Before we move further, there are a few prerequisites for this tutorial. You should have a basic understanding of Python's syntax, and/or have done at least beginner-level programming in some other language. You can follow along even if you do not meet these criteria, but you might find some parts a bit complex. If you do, feel free to ask for clarification in the comments.

Installation

The installation process for wxPython is fairly straightforward, although it differs slightly depending on the system you're using.

Mac and Windows

wxPython is quite easy to install on Mac and Windows using the pip package manager. If you have pip installed on your system, run the following command to download and install wxPython:

$ pip install wxpython

Linux

On Linux, the procedure can be a bit of a pain, as wxPython has a lot of prerequisite libraries that need to be installed. I would recommend running the following two commands in sequence:

# Command 1
$ sudo apt-get install dpkg-dev build-essential python2.7-dev python3.5-dev python3.6-dev libgstreamer-plugins-base1.0-dev libnotify-dev libwebkitgtk-3.0-dev libwebkit-dev libwebkitgtk-dev libjpeg-dev libtiff-dev libgtk2.0-dev libsdl1.2-dev libgstreamer-plugins-base0.10-dev freeglut3 freeglut3-dev

# Command 2
$ pip install --upgrade --pre -f https://wxpython.org/Phoenix/snapshot-builds/ wxPython

However, if these do not work, you will have to install the libraries manually; they are listed in the "Prerequisites" section of wxPython's GitHub repo.

Examples of Creating GUIs with wxPython

In this section, we will get our hands dirty with wxPython and create a basic string manipulation application with some basic functionalities, like counting the number of words, displaying the frequency of each word, most repeated word, etc.

Before moving forward, we will create a very simple skeleton application which we will use as a starting point in the upcoming examples to implement more advanced GUI functionalities.

Without further ado, let's start. Below is the basic skeleton or structure of a GUI application built using wxPython. We will change it further in the next section to make it object oriented for additional functionality.

import wx

# Creates an App object which runs a loop to display the
# GUI on the screen
myapp = wx.App()

# Initialises a frame that the user would be able to
# interact with
init_frame = wx.Frame(parent=None, title='Word Play')

# Display the initialised frame on screen
init_frame.Show()

# Run a loop on the app object
myapp.MainLoop()

If the loop is not run (i.e. the myapp.MainLoop() call), the frame will appear on the screen for a split second and disappear before you can even see it. The MainLoop() call keeps the frame visible on the screen until the user exits the program, and it does so by running the application's event loop.

Note: While running this on a Mac, I got the following error when I ran my code using python filename.py command in the terminal:

This program needs access to the screen. Please run with a Framework build of python, and only when you are logged in on the main display of your Mac.

To get rid of this, simply use pythonw instead of python in the above command.

Once the program runs, you should see a blank window titled 'Word Play' on your screen.

Object Oriented Code

Before we add functionality to our code, let's modularise it first by making classes and functions, so that it looks cleaner and it's easier to extend. The functionality of the following code is the same as before; however, it has been refactored to use object-oriented programming concepts.

import wx
import operator

# We make a class for frame, so that each time we
# create a new frame, we can simply create a new
# object for it

class WordPlay(wx.Frame):
    def __init__(self, parent, title):
        super(WordPlay, self).__init__(parent, title=title)
        self.Show()

def main():
    myapp = wx.App()
    WordPlay(None, title='Word Play')
    myapp.MainLoop()

main()

In the script above, we create a class WordPlay that inherits from the wx.Frame class. The constructor of the WordPlay class accepts two parameters: parent and title. Inside the child constructor, the constructor of the parent class wx.Frame is called, and the parent and title arguments are passed to it. Finally, the Show() method is called to display the frame. In the main() function, an object of the WordPlay class is created.

The code now looks a lot more structured and cleaner; it is easier to understand, and more functionality can be added to it seamlessly.

Adding Functionalities

We will be adding functionalities one at a time in order to avoid confusion regarding which code part is added for which particular functionality. What we want in our basic application is a text box where we can add text, and then a few buttons to perform different functions on that text, like calculating the number of words in it, frequency of each word, etc., followed by the output being displayed on our app screen.

Let's start by adding a text box to our app in which we can add our text.

# Some of the code will be the same as the one above,
# so make sure that you understand that before moving
# to this part

import wx
import operator

# We make a class for frame, so that each time we create a new frame,
# we can simply create a new object for it

class WordPlay(wx.Frame):
    def __init__(self, parent, title):
        super(WordPlay, self).__init__(parent, title=title)
        self.widgets()
        self.Show()

    # Declare a function to add new buttons, icons, etc. to our app
    def widgets(self):
        text_box = wx.BoxSizer(wx.VERTICAL) # Vertical orientation

        self.textbox = wx.TextCtrl(self, style=wx.TE_RIGHT)
        text_box.Add(self.textbox, flag=wx.EXPAND | wx.TOP | wx.BOTTOM, border=5)

        grid = wx.GridSizer(5, 5, 10, 10) # Rows, columns, vertical gap, horizontal gap
        text_box.Add(grid, proportion=2, flag=wx.EXPAND)

        self.SetSizer(text_box)

def main():
    myapp = wx.App()
    WordPlay(None, title='Word Play')
    myapp.MainLoop()

main()

As you can see, we have added a new function named widgets() above, and it has been called in the WordPlay class's constructor. Its purpose is to add new widgets to our screen; however, in our case we are only interested in adding one widget, i.e. a text box where we can add some text.

Let's now understand some important things that are going on inside this widgets() function. wx.BoxSizer, as the name suggests, is a sizer: it controls the size of the widgets added to it, as well as their position. wx.VERTICAL specifies that the sizer stacks its widgets vertically. wx.TextCtrl adds a small text box to our current frame, where the user can enter text. wx.GridSizer helps us create a table-like structure for our window.

Alright, let's see what our application looks like now.

A text box can now be seen in our application window.

Let's move further and add two buttons to our application: one for counting the number of words in the text, and a second to display the most repeated word. We will accomplish that in two steps. First, we will add the two new buttons; then, we will add event handlers to our program which tell us which button the user has clicked, along with the text entered in the text box, so that a specific action can be performed on the input.

Adding buttons is rather simple; it only requires some additional code in our widgets() function. In the code block below, we only show the updated widgets() function; the rest of the code stays the same.

# Adding buttons to our main window

def widgets(self):
    text_box = wx.BoxSizer(wx.VERTICAL)

    self.textbox = wx.TextCtrl(self, style=wx.TE_RIGHT)
    text_box.Add(self.textbox, flag=wx.EXPAND | wx.TOP | wx.BOTTOM, border=5)

    grid = wx.GridSizer(2, 5, 5) # Values have changed to make adjustments to button positions
    button_list = ['Count Words', 'Most Repeated Word'] # List of button labels

    for lab in button_list:
        button = wx.Button(self, -1, lab) # Initialise a button object
        grid.Add(button, 0, wx.EXPAND) # Add a new button to the grid with the label from button_list

    text_box.Add(grid, proportion=2, flag=wx.EXPAND)

    self.SetSizer(text_box)

As you can see, two new buttons have now been added to our main window as well.

Adding an Event Handler

Our application's interface is now ready; all we need to do now is add event handlers to perform specific actions upon button clicks. For that, we will have to create a new function and add an additional line of code in the widgets() function. Let's start by writing our function.

# Declare an event handler function

def event_handler(self, event):
    # Get label of the button clicked
    btn_label = event.GetEventObject().GetLabel()

    # Get the text entered by user
    text_entered = self.textbox.GetValue()

    # Split the sentence into words
    words_list = text_entered.split()

    # Perform different actions based on different button clicks
    if btn_label == "Count Words":
        result = len(words_list)
    elif btn_label == "Most Repeated Word":
        # Declare an empty dictionary to store all words and
        # the number of times they occur in the text
        word_dict = {}

        for word in words_list:
            # Track count of each word in our dict
            if word in word_dict:
                word_dict[word] += 1
            else:
                word_dict[word] = 1

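        # Sort the (word, count) pairs in descending order of count,
        # once the loop has finished, so that the most repeated word
        # comes first
        sorted_dict = sorted(word_dict.items(),
                             key=operator.itemgetter(1),
                             reverse=True)

        # First pair in the sorted list is the most repeated word;
        # fall back to an empty string if the text box was empty
        result = sorted_dict[0] if sorted_dict else ''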

    # Set the value of the text box as the result of our computation
    self.textbox.SetValue(str(result))

The logic behind the "Most Repeated Word" feature is as follows. We first run a loop that iterates over every word in the list of all words. For each word, we check whether it already exists in the dictionary; if it does, the word is being repeated, and its value is incremented by one each time it reappears. Otherwise, it has appeared in the sentence for the first time, and its 'occurrence' value is set to 1. Lastly, we sort the dictionary's items (similar to Python list sorting) in descending order of count, so that the word with the highest value (frequency) comes out on top, which we can then display.
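The same counting logic can be sketched outside the GUI with collections.Counter from the standard library. This is an alternative formulation, not the tutorial's own code, and the function name most_repeated_word is made up for illustration.

```python
from collections import Counter


def most_repeated_word(text):
    """Return (word, count) for the most frequent word, or None if there are no words."""
    words = text.split()
    if not words:
        return None
    # most_common(1) returns a one-element list of (word, count) pairs.
    return Counter(words).most_common(1)[0]


print(most_repeated_word("the cat and the hat and the bat"))  # ('the', 3)
```

Counter does the dictionary bookkeeping and the sorting in one step, which also makes the empty-input case easy to handle explicitly.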

Alright, so now that we have written the computation/action that needs to be performed when a specific button is clicked, let's "bind" that action to that particular button. For that, we'll have to slightly modify our widgets function.

# Only one line needs to be added in the "for loop" of
# our widgets function, so that's all we're showing
for lab in button_list:
    button = wx.Button(self, -1, lab)
    self.Bind(wx.EVT_BUTTON, self.event_handler, button)
    grid.Add(button, 0, wx.EXPAND)

In the code above, the self.Bind call is where the binding occurs. What it does is that it links a particular action to a specific button, so that when you click that button, a specific action linked to it will be performed. In our particular case, we only have one event handler function, which handles both actions by checking at runtime which button was clicked through the 'label' property and then performing the linked action. So in the self.Bind call we bind all our buttons to the single 'event_handler' function.
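One possible alternative to the if/elif chain inside the single handler is a lookup table from button label to function. The sketch below is plain Python with no wxPython calls, so it can be run on its own; the names ACTIONS, count_words, most_repeated and handle_click are assumptions made for illustration.

```python
def count_words(words):
    """Number of words in the list."""
    return len(words)


def most_repeated(words):
    """Word with the highest count; ties go to the word seen first."""
    return max(words, key=words.count) if words else ""


# Maps each button label to the function that handles it.
ACTIONS = {
    "Count Words": count_words,
    "Most Repeated Word": most_repeated,
}


def handle_click(label, text):
    """Mimic event_handler: pick the action by label, return text to display."""
    action = ACTIONS[label]
    return str(action(text.split()))


print(handle_click("Count Words", "to be or not to be"))         # 6
print(handle_click("Most Repeated Word", "to be or not to be"))  # to
```

Inside the wxPython handler, label would come from event.GetEventObject().GetLabel() and text from self.textbox.GetValue(), as in the tutorial's code. Adding a third button then only needs a new entry in the table.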

Alright, so our code is now complete. Let's try both of our features and see if it all works as expected.

In the first step, we enter a seven-word string in the text box.

Next, if we click the "Count Words" button, we should see "7" in the text box, since there are 7 words in the string.

So far so good!

Now let's write another string in the text box.

If we click the "Most Repeated Word" button, the text box shows the most repeated word along with its frequency of occurrence.

Works perfectly!

We have only added two features, but the purpose was to show you how all these components are connected; you can add as much functionality as you want by simply writing additional functions. Furthermore, this tutorial did not focus much on aesthetics. Now that you have grasped the basics of the toolkit, there are many widgets available in wxPython to beautify your program.

Conclusion

To sum things up, we learned that wxPython is a popular choice for developing GUI-based desktop applications in Python, and that Python also has some other cool alternatives to it. We went through the commands to download and install it on all popular operating systems. Lastly, we learned how to make a modularised application using wxPython which can be easily extended, as seen in this tutorial, where we built upon a basic skeleton app and added more features step by step.

↧

PSF GSoC students blogs: Week 8

July 26, 2019, 12:37 pm

Last week, I implemented multithreaded scanning with John's help. At first, I thought the logic was to create a function that opens a database connection every time and closes it at the end, but that would be too inefficient, since connecting and disconnecting could take a long time if each thread called the function for each file. Instead, we can use a queue to hold all the files to be scanned; each thread opens the database once and closes it only when there are no jobs left. We also don't need to take care of thread safety ourselves, since the queue in Python is already thread-safe. Besides, I also added a flag to enable or disable updating the database, so that users can save time when testing or running the tool.
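The queue-based design described above might look roughly like the sketch below. It is not the project's actual code: FakeDatabase, worker and scan are hypothetical names, and the "database" is a stand-in that only counts how many times it was opened, to show that each thread connects once rather than once per file.

```python
import queue
import threading


class FakeDatabase:
    """Hypothetical stand-in for the real database connection."""

    opened = 0                      # how many connections were opened in total
    _lock = threading.Lock()

    def __init__(self):
        with FakeDatabase._lock:
            FakeDatabase.opened += 1

    def lookup(self, filename):
        return "scanned " + filename

    def close(self):
        pass


def worker(jobs, results):
    db = FakeDatabase()             # open once per thread, not once per file
    try:
        while True:
            try:
                filename = jobs.get_nowait()
            except queue.Empty:
                break               # no jobs left: fall through and close
            results.put(db.lookup(filename))
    finally:
        db.close()


def scan(filenames, num_threads=4):
    jobs, results = queue.Queue(), queue.Queue()
    for name in filenames:
        jobs.put(name)
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                    # wait for every worker to finish
    out = []
    while not results.empty():      # safe: all workers have finished
        out.append(results.get())
    return sorted(out)


print(len(scan(["file%d" % i for i in range(10)], num_threads=3)))
```

Because queue.Queue handles its own locking, the workers never need an explicit lock around get_nowait() or put(); the only lock here protects the demonstration counter.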

Compared with C, I think multithreading and multiprocessing are easier to implement in Python. For example, there are more ways for processes and threads to communicate; in C we could only use signals, shared memory, pipes, and message queues. In addition, in Python we can call `join()` on each thread, which is like wait() in C. But in C the parent process is the only one that needs to call wait(), so in terms of coding, we have to implement the parent and child processes separately.

The other thing I learnt about is code coverage. Multithreaded code is hard to debug because it is difficult to track every thread at the same time. With the help of code coverage, we can see a report of which parts of the code are not covered during the tests, and then work out why they are not entered.

↧

Talk Python to Me: #222 Interactive graphs with Bokeh and Python

July 26, 2019, 1:00 am

Do you have data you want to visualize and share? It's easy enough to make a static graph of it. But what if you want to zoom in and highlight different sections? What if you need to rerun your ML model on selected data? Then you might want to consider working with Bokeh. It does this and much more. Join me on this episode where you'll meet Bryan Van de Ven who heads up the Bokeh project.

↧

Catalin George Festila: Python 3.7.3 : Tonny I.D.E. for python programmers.

July 26, 2019, 6:09 am

Today I tested the Thonny I.D.E. from the official thonny.org webpage. Yesterday I tried several editors for the Python programming language, and they did not work. One of these is the Spyder editor, which does not work with Python 3.7.3; we have not discovered why. Mu is a simple Python editor for beginner programmers and has a strange working I.D.E. for good and fast development. The PyCharm, it is

↧

Roberto Alsina: Programación, matemática, y el problema de los tomates venenosos.

July 26, 2019, 11:47 am

Damned Tomatoes

Many people, when they don't know how to program, hold prejudices. Some of the most common are:

"To program you have to be a genius."
"To program you have to know math."

Both prejudices harm that potential future programmer for several reasons. The first and most obvious is that they are not true. But they are not untrue in the way that "the tomato is a vegetable" is untrue; they are false in the same way that "the tomato is poisonous" is false.

That is what makes it complicated. Because the tomato ... the tomato is poisonous.

In the 18th century, one of the tomato's nicknames was "poison apple"¹ because rich people used to eat tomatoes and die poisoned. That was because they ate off pewter plates, which contain lead, and the tomato juice dissolved the lead, and eating lead is bad, folks.

On the other hand, the tomato is poisonous in itself. It is a nightshade (Solanaceae), a family of plants that produce alkaloids. The tomato plant produces solanine, a toxin that causes diarrhea, vomiting, and abdominal pain.

In other words, saying "the tomato is poisonous" is technically correct, which is the worst way of being wrong. The same goes for saying "to program you have to know math".

It is technically true. But it doesn't matter. Just as it is technically true that the tomato is poisonous, but it doesn't matter, which is why we eat tomatoes anyway.

I am going to focus on the second prejudice, about programming and math, because the first one does not survive the slightest contact with programmers (myself included).

Why is it technically true?

1. They teach you things that are "math" when you learn to program

For example, they will talk to you about things like binary, hexadecimal, and even octal numbers. And yes, that is "math" and it is necessary for ... for what, exactly?

For almost nothing. These are the things you will most often run into while programming for which that is useful:

binary: to calculate IP subnets
octal: to calculate permissions on UNIX-like systems
hexadecimal: to interpret binary files or data by hand without writing the bytes out in decimal

That's a lie. The most frequent use of hexadecimal is looking for words that can be spelled as hex numbers. Long live DEADBEEF!

If I have to use binary numbers this year beyond knowing that "a byte counts up to 255", it will be the second time this decade.

It really is one of those things you learn, stash in a corner of your head, and then take out for a walk every once in a while when you run into a specific problem, just like the explanation of the offside rule or how a Swiss-system tournament is organized.
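For the curious, the three uses listed above can each be sketched in a line or two of Python using only the standard library. The network address and permission bits below are arbitrary values picked for illustration, not anything from the post.

```python
import ipaddress
import stat

# Binary: a /26 prefix leaves 32 - 26 = 6 host bits, i.e. 2**6 addresses.
net = ipaddress.ip_network("192.168.1.0/26")
subnet_size = net.num_addresses
netmask = str(net.netmask)

# Octal: each digit of 0o644 is one rwx triple (owner, group, others).
permissions = stat.filemode(stat.S_IFREG | 0o644)

# Hexadecimal: two hex digits per byte.
magic = bytes.fromhex("DEADBEEF")

print(subnet_size, netmask, permissions, magic.hex())
# 64 255.255.255.192 -rw-r--r-- deadbeef
```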

2. Some areas of software development really do have a mathematical basis

If you want to do machine learning you have to know how to do linear regression. You have to have some idea of calculus. Knowing loads of other things will come in handy.

In the same way, if you are going to build a payroll system, it will be useful to know about labor law.

If you are a doctor and want to know whether aspirin does any good, you will have to know experimental design and statistics.

If you are a baseball manager and want to know whether it is worth buying a hitter with an OPS of .575 and paying him 23 million dollars, you will need probability and statistics and accounting.

If you want to program a crypto algorithm, you have to stop and not program it, because it is not a good idea.

That you need to know something for one particular task does not make it a prerequisite for the field in general. Nobody knows how to do everything. Nobody knows how to program every kind of thing. That is simply the human condition.

I don't know how to do everything. And no, I don't know how to do machine learning. And I can't write you a trading program either. And for that matter I can't make a simple knitted sock either, because I don't know how to knit.

To know how to do things you have to study; there is not much of a secret to it. So, to program you have to study how to program, and to program certain particular things you have to study other things as well.

3. Programming itself is math

This reason is more esoteric, but yes, it is true. Mathematics and mathematicians will cheerfully tell you that the very concept of an algorithm is math.

In which case, obviously, as soon as you learn to write an if you have already learned math, and it is impossible to express a program without math, and we go from "technically true" to "obvious and useless". If everything is math, then "you have to know math" is a triviality. Maybe so, but how much? And which kind?

4. Math is useful for making you a better programmer

If you learn algorithmic complexity, you program better.

If you learn enough "number sense" to know when something is worth doing, you program better.

If you learn enough probability to know whether something is a risk worth tackling, you program better.

And several similar things.

This is perhaps the sense in which I am most willing to say that "to program you have to know math", but it has the problem that it is not what the listener understands when you say it to them.

If the goal of communicating is to get a message across (hey, information theory! More math!), it is important not only to be correct in what you say; it is important that what you say is understood correctly by the receiver.

So ...

My statement on programming and math (let's see if I can explain myself, look)

Math is a super broad thing, and in life we run into it all the time.

Knowing the trajectory the ball will follow when you kick it with a curve is math. But when you kick it, you do so without calculating it, because you know that part of math. You don't need to express it "mathematically". You don't start calculating the Magnus effect from the rotation speed of the size-five ball and the influence of the panels on the aerodynamics.

Programming, in the overwhelming majority of cases, is much more like that than like what comes to mind when someone says math to you.

You are going to have to learn some tools. And you are going to forget them. And you know what? No problem. You learn them again.

And you are going to do things like look at a chunk of code and say ... "aha, logarithmic complexity". And as long as you remember what shape the graph has compared with a parabola, that is as far as what matters to you at that moment goes.

And sometimes you are going to have to get up to your neck in math, and you are going to have to figure out how to do an affine transformation, or curve fitting, or a bunch of other things. I once had to do regression analysis to work out how to lay out an HTML table! So what?

Math is everywhere. To program you are going to use math. You can also use math to sell flip-flops.

It is not that "to program you have to know math" is false; it is that it is not interesting.

¹ From now on you are going to picture Snow White scarfing down a tomato. Sorry.

↧

NumFOCUS: Meet our 2019 Google Summer of Code Students (Part 3)

July 26, 2019, 2:11 pm

The post Meet our 2019 Google Summer of Code Students (Part 3) appeared first on NumFOCUS.

↧