Channel: Planet Python

PyCon: PyCon 2019 Reminders

As November approaches, many of us are starting to plan for the holidays, vacations, and/or how to bring in the New Year.  Before we get seasonally busy, we wanted to take the time to remind Pythonistas of a few PyCon deadlines and upcoming launches!

Launches
  • Registration for the 2019 conference will launch in early November! The first 800 registrations are sold at an early bird rate:
    • Corporate tickets will start at $550 USD
    • Hobbyist tickets will start at $350 USD
    • Student tickets will start at $100 USD

Deadlines
We’re looking for all types of Tutorial, Talk, Poster, and Education Summit topics to help our community learn, so be sure to check out our Call for Proposals for more details and submit your proposal via your dashboard soon!

Follow us on Twitter for updates: https://twitter.com/pycon.

To those in the Northern Hemisphere, Happy Autumn!

[Image description: Kitten sitting on a pile of leaves]



[Image description: Cat sitting on a pile of leaves with a pumpkin and squash nearby]


Real Python: Python Community Interview With Michael Kennedy


This week, our Python community interview is with none other than Michael Kennedy of Talk Python to Me fame.

You may know his authoritative voice, but do you know his Python story? Read on to find out about his journey with Python, what he thinks about when stuck in traffic, and his love for two wheels.

Ricky: Welcome to Real Python! If I recall correctly, you started out as a .NET developer, and you were even a Microsoft Certified Trainer. So I'm curious as to how you came to Python and what made you stick around?

Michael Kennedy

Michael: Thanks for having me. Oh, this brings back memories. Yes, I was doing full-time .NET development with C# for probably 10 years. It's a language I still respect today.

I found my way to Python after wanting to branch out into areas outside of the Microsoft space. I guess this was probably 2012 or around then. I hadn’t been doing much outside of C++ and C# before then for some time other than JavaScript. (No one escapes JavaScript!) I looked at the popular languages, and this was just around the time Python was becoming popular and increasingly so.

I spent a few weeks learning Python and was pretty much hooked but didn’t know it.

I studied the language and ecosystem and found it to be really nice—much nicer than I expected. But I suffered the problem that everyone who knows some language really well does when they try something different. Everything I knew by heart was a challenge again. How do I create a web app? How do I host it? How do I query a database? And on and on.

I was willing to learn and did. But it was just that uneasiness that anyone would have, giving up the familiar path. However, I knew I was going to be hooked when I went back to write some C# code and found it way less tasteful than I had just a few weeks prior.

This is not to bash that language. But like all C-based languages, it had a lot of symbol noise, let’s say. Why do we need semicolons again? Why the massive love for parentheses and curly braces even when it’s (now) clear they are unneeded, etc. Here’s an example:

// C#
class Thing
{
    public int TotalSales
    {
        get
        {
            int total = 0;
            foreach (var b in items)
            {
                total += b.value;
            }
            return total;
        }
    }
}

# Python
class Thing:
    @property
    def total_sales(self):
        total = 0
        for b in items:
            total += b.value
        return total

Makes one wonder why you’ve been typing all those symbols, all those years.

Since then, as I’ve learned more and more of the popular packages and standard library modules, I’ve just enjoyed it more every day. Now I run my entire business on a Python stack, and it has yet to let me down.

Ricky: You are, of course, the host of the most popular Python podcast—Talk Python to Me—which now has over 180 episodes. You're also a co-host on the Python Bytes podcast with Brian Okken. That's a lot of content! How do you continue to be so consistent each week and keep the shows relevant and newsworthy? It must be a lot of work.

Michael: That is definitely a lot of content. But it’s a very rewarding project that is into its fourth year for Talk Python To Me, and third year for Python Bytes.

How am I still consistent? That’s a good question. I started the podcast because others were inconsistent. There have been Python-based podcasts before mine. But they all stopped producing episodes long before I got in the game. In fact, that’s why I felt I could get started, because there was such a lack of content.

I am consistent for a few reasons. First, when I started the podcast, I promised myself I’d do it every week for six months and then decide whether the community and I enjoyed it. After that much consistent content creation, you are pretty deep within the habit of doing so.

Second, by then I had several companies sponsoring the podcast. I thought maybe, just maybe, I could find a way to use the podcast to become independent of my day job. I didn’t hate my day job, but it doesn’t compare to doing what you think is truly valuable for the community and world. Once you accept money to produce a thing over a long period of time, consistency is just part of the agreement.

Finally, the listeners were so supportive of my work. It genuinely felt great producing the content for everyone. I looked forward to each episode I created. After all, I was learning so much from each one and continue to do so to this day.

As for keeping the show relevant and newsworthy, that is easy. For Python Bytes, that’s literally the topic (weekly news items), and we get tons of help from listeners all suggesting great new items each week.

For Talk Python To Me, this is harder. Each episode digs deeply into a topic. For the first 20 episodes or so, that was easy enough for me. I’d used SQLAlchemy for example, so asking Mike Bayer about it was just thinking back on my experience. But it quickly grew into spaces I had little experience with. I now spend quite a bit of time researching topics to cover each week. Any given episode has between 4–8 hours of research before we even press record.

That leads into the final part of your question: Yes, it is a lot of work. I’ve had folks ask me how much time I spend on the show each week. They’ve even said, “You have such a sweet deal. What do you spend on the podcast per episode? A few hours?” Well, that would be something! I probably spend about 2 days per episode between the research, outreach to guests, email correspondence, website development, sponsorship relationships, and much more.

That is a lot of time, but it is also literally the foundation of my business. The podcasts and the courses only work if they are both well known and high quality. It’s very fortunate that I’ve been able to transition my part-time podcast into a full-time job (podcast and courses). It lets me really stay focused and stay consistent.

Ricky: If our readers don't know you from the podcast, they surely will know you for your excellent courses on Talk Python Training. One of the first courses I took when I began learning Python was your Python Jumpstart by Building 10 Apps course—which is excellent by the way. And you've just released a new course named Async Techniques and Examples. Could you tell us a little more about it and why you decided to focus on Async, specifically?

Michael: Thanks! The courses have been a true passion project for me. I’ve wanted to create the best online library for Python courses out there for a long time.

When I first started the podcast, I also wanted to start the courses. I saw them as going hand in hand, with each supporting the other. However, at the time I worked for a company that did in-person and online training courses for developers.

They afforded me a lot of freedom and flexibility. But what would not fly is my creating effectively a competing company in my spare time. So I started with the podcast, and then once I could go full time independently, my first action was to launch the training company and Python Jumpstart by Building 10 Apps on Kickstarter. That was a really fun experience and a huge success.

The new async course is super fun and something I felt really needed to exist for the community. It needs to exist for a few reasons. The Python async/concurrent programming story is a little hard for many people to understand and make sense of. We have the GIL, which is actually covered very nicely on Real Python. This means normal threading is only effective for IO-bound work. CPU-bound work requires another API, multiprocessing, with its own oddities and techniques.

Now, as of Python 3.5, we have the amazing async and await keywords. They are powerful and clean but add more choices and more fog to the situation. This doesn’t take into account the async features of Cython and its nogil keyword.
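
For example, a minimal sketch of IO-bound concurrency with the new keywords (the function names and delays here are purely illustrative, not code from my courses):

import asyncio

async def fetch(name, delay):
    # Simulate an IO-bound operation (e.g. a web request) with a sleep
    await asyncio.sleep(delay)
    return f"{name} done after {delay}s"

async def main():
    # Run the "requests" concurrently instead of one after another
    results = await asyncio.gather(
        fetch("first", 1),
        fetch("second", 1),
        fetch("third", 1),
    )
    for result in results:
        print(result)

asyncio.run(main())  # requires Python 3.7+; total time is ~1s, not ~3s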

So the first reason is there was a lot of confusion around async and Python. You hear of people leaving Python for Go explicitly because of Go’s “better” concurrency. Usually, the type of concurrency people are looking for is IO bound, which works extremely well in Python anyway.

The next reason is that async and concurrent programming is oddly taught in the wrong order. What I mean by this is that usually lots of confusing, low-level detail is presented up front. Then finally it’s put together into examples that are useful and compelling. But the learner has to make it that far for it to pay off. This is also often paired with dire warnings of thread safety and how hard race conditions are. All of this is true and accurate. But why start there?

I wanted a course that shows how productive, fun, and actually easy async is for many cases. Once the student sees the value, then you can dive into things like thread safety and so on.

Finally, there really just are not many async courses for Python out there. I only know of one other one, and it’s behind a subscription wall.

Ricky: It's no secret that you're a big fan of MongoDB. What do you find most appealing about it? And if someone has never used it before, why might they consider using it with their next Python project?

Michael: I am a big fan of MongoDB. Long ago, I was complaining to a friend about how painful deploying relational database apps was. About how it's a pain to apply the migration scripts without downtime and things like that. He said, "Well, why don't you just use MongoDB, and you won't have that problem?"

I took his advice, and he was right! Since then, I’ve launched 4 or 5 major projects on MongoDB. Both of my podcast sites (talkpython.fm and pythonbytes.fm) and the course site run on MongoDB.

I know some folks have had bad experiences with MongoDB. There were a few “best practices” that were not the default in MongoDB in the early days, and there are lots of stories about these. Most of them have been fixed, and if you know to avoid them, you’re in good shape. The one major gotcha still out there is that MongoDB runs without authentication unless you explicitly create an account.

That said, MongoDB has been totally bulletproof for me and my projects. It’s been powering these sites mentioned above for years without any downtime. I have literally never run a migration or upgrade script to deploy a new schema or data model.

Mongo’s flexible schema model and MongoEngine’s class-based ODM (think ORM for MongoDB) are just right for my projects. The websites run with super low latency. It’s pretty common to get 10–20 ms response time for non-trivial pages (from request to response out of the server).
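
For readers who have not seen a class-based ODM before, here is a minimal sketch of what a MongoEngine document class and query can look like (the class and field names are purely illustrative, and it assumes a MongoDB server running locally):

import mongoengine

mongoengine.connect("demo_db")  # connects to a local MongoDB by default

class Episode(mongoengine.Document):
    # Each class attribute maps to a field in the stored MongoDB document
    title = mongoengine.StringField(required=True)
    show = mongoengine.StringField(default="Talk Python To Me")
    duration_seconds = mongoengine.IntField()

# Saving and querying feel like working with plain Python objects
Episode(title="Example episode", duration_seconds=3600).save()
for episode in Episode.objects(show="Talk Python To Me"):
    print(episode.title, episode.duration_seconds)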

I personally can totally recommend MongoDB for your projects. I also created a free course if people are interested at freemongodbcourse.com.

Ricky: Now that the Python is out of the way, it's time to talk about the fun stuff… Math! You have a master's degree in Mathematics, and you did start your Ph.D. Any plans to finish it in the future, or has that ship sailed? I would imagine you still have a passion for it. Do you get to scratch that itch on a daily basis?

Michael: It’s sailed, over the horizon, and halfway to Antarctica! I did study math and still very much appreciate the beauty of everything about it. I was just thinking about the different types of infinity, the different sizes of infinity, on just the simple number line between [0, 1] while stuck in traffic last week.

But that’s not working in math, day to day. I believe software development is my true calling. I love doing it every day. What I learned in math was excellent preparation for development. The rules of mathematics and the “rules” (language, APIs, algorithms, big-O, etc.) of software are surprisingly similar. The types of thinking and problem solving are also quite comparable.

The career opportunity in software and entrepreneurship vs. mathematics is not comparable. It’s just better to build things people can use (software) rather than theories only 5–20 people in the world will understand and care about (math these days).

So I love it and still read books about it, but I’m not doing anything practical with math these days.

Ricky: Do you have any other hobbies or interests, aside from Python?

Michael: I’ve had many fun hobbies over the years. I do think it’s important to have a balance between computer time and other things in your life.

I’m pretty fortunate in that I basically have 3 really engaging aspects to my job that I would almost consider hobbies. I run the podcast and am really into improving that craft and connecting with the listeners. I am doing software development still almost daily. And running my business and the whole entrepreneurial side of things is amazing and fun.

In terms of actual hobbies, I love racing and anything with two wheels! I grew up racing BMX bikes in grades 1–5, then motocross through middle school and high school, and finally mountain bikes in college. My brothers and I built a motocross track in our backyard, and it was pretty common to come home from school, drop our backpacks, and spend an hour or two challenging each other to clear this series of jumps or just having a fun time.

These days, I only watch motocross and am also a huge fan of IndyCar. I still ride but keep it to mellow adventures with my wife on our street motorcycles around the mountains here in Portland, OR. It’s great to be able to share that experience of riding with her and my daughters, who jump on the back of one of our bikes.

Ricky: What do we have to look forward to in the future from Talk Python? Any secret projects you'd like to tell us about or anything you'd like to share and/or plug?

Michael: I have some projects involving exciting courses coming out. I'd actually like to be more forthcoming with what I'm working on there, but there is a surprising amount of "copying" of my popular courses, let's say.

We have at least 4 courses in active development right now and a massive list of things we’d like to build. So in terms of courses, just expect us to keep working on new ones that we see the need for in the community. You can also expect some more world-class authors there creating content. I’m really honored to be able to work on the courses for everyone and make my dream of this resource and business a reality.

In terms of the podcast, no slowing down there. We have Talk Python and Python Bytes, and both are going strong. I just hope to bring better and deeper stories to the community on Talk Python and stay on top of the most exciting programming language out there with weekly updates on Python Bytes with my co-host Brian Okken.

Thank you all for having me here on Real Python. I’m a big fan of the resource you all have created. If readers are interested in my projects, please subscribe to the podcasts at talkpython.fm and pythonbytes.fm. If they have aspects of Python they’d like to learn either personally or for their team, check out our 100+ hours of courses at training.talkpython.fm.


Thank you, Michael, for joining me for this week’s interview. It’s been great having you on the other side of the interview mic.

As always, if there is someone you would like me to interview in the future, reach out to me in the comments below, or send me a message on Twitter.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Weekly Python Chat: pathlib in Python: why you should use it


Python is great for working with files, but there are a lot of different functions and modules to memorize when you need to work with files and directories in Python.

The os module and the os.path module are full of helper utilities, and so is the shutil module. Python's pathlib module is a new(ish) part of the standard library that can be used as a fairly neat replacement for most of these many file and directory management utilities.

Join me for a live chat about pathlib and how it can help us write more readable Python code.
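
As a small taste, here is a sketch of a few common tasks done the pathlib way (the directory and file names are made up for illustration):

from pathlib import Path

# Build paths with "/" instead of os.path.join
project = Path.home() / "projects" / "demo"

# Create the directory tree (like os.makedirs with exist_ok=True)
project.mkdir(parents=True, exist_ok=True)

# Write and read a text file without open()/close() boilerplate
readme = project / "README.txt"
readme.write_text("Hello, pathlib!")
print(readme.read_text())

# Glob for files and inspect their parts
for path in project.glob("*.txt"):
    print(path.name, path.suffix, path.exists())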

Made With Mu: Brian is in the Kitchen


This second guest post, from Thierry Chantier, describes how the French Python community is helping young programmers with Mu. I’m especially pleased that he’s submitted his blog in French. We love to celebrate the efforts of Mu users all over the world!

It's a common cliché: the French are not good English speakers. But hey, the micro:bit, Monty Python… and Mu all come from the UK, so we have to deal with it. (Oh, I forgot the Sex Pistols.) The rest of this article will be in French, but that won't be a problem; you are all fluent in French, aren't you? ;) If Nicholas needs some training, I know how to organize something in May 2019 in Lyon :)

First of all, let me introduce myself: Thierry, aka titimoby on the Internet for a few years now. Even though I am also passionate about low-frequency music, I fell into the computing cauldron at a very young age.

In fact, my first geek dream was buying a ZX Spectrum+, that marvel which also came from English soil. In any case, after a journey that is not the subject of this article, I one day found myself, after a keynote at the MixIT conference, launching an association for children.

MixTeen was born from that passion for the whole digital world (in French, "digital" refers to the use of fingers ;)). We are a small team trying to organize discovery workshops. The first steps were taken using Scratch, and then, little by little, whatever we felt comfortable using. Quickly, we felt the urge to connect this discovery of code to the physical world.

Since I had developed a taste for the maker world, with a few small tinkering projects here and there, it was with joy that I saw the micro:bit board appear. Very quickly, this board, or boards from Adafruit, became permanent guests at our workshops. At the same time, some of the children we meet come back, grow up, and want to go further with Python. The challenge then is to have tools that are simple enough to install and use, if possible without an Internet connection.

The arrival of Mu seems to meet all these criteria, and we now make room for it. The handling of micro:bit boards is very nice, and discovering Python seems clearer to me with Mu.

The Pythonistas at PyConFR also really liked Mu. That time, in addition to the children, we found ourselves introducing parents to it, and it was a very good moment.

The only thing missing is a selfie feature in Mu to get good-quality photos; I am usually too busy to take a picture that isn't blurry ;)

In the meantime, we will keep helping to translate Mu (or the Raspberry Pi Foundation's resources, we are very open).

And as the Girls Can Code! participants say:

"GO MU!"

PyBites: Code Challenge 56 - Calculate the Duration of a Directory of Audio Files - Review


In this article we review last week's Calculate the Duration of a Directory of Audio Files code challenge.

Hacktoberfest almost over

The last two days of Hacktoberfest No. 5! Wrap up your PRs!

Community Pull Requests

Another 14 PRs this week, cool!

$ git pull origin community
...
From github.com:pybites/challenges
 * branch            community  -> FETCH_HEAD
   6ac949d..8eec5f9  community  -> origin/community
Updating 6ac949d..8eec5f9
Fast-forward
...
 25 files changed, 1854 insertions(+)

Check out the awesome PRs by our community for PCC56 (or from fork: git checkout community && git merge upstream/community):

PCC56 Lessons

I learned how to use Python's os module to get all the files (mp3, m4a, mp4) in a given directory. I also learned how to use the Mutagen module to get file info such as duration in seconds and bitrate. In the future, I would like to use the pathlib module as an alternative to os.

Got to utilize pathlib, and isinstance for type checking.

Got acquainted with the subprocess, datetime, and csv libraries.

Refreshed my memory on pandas and glob; these are not something I get to use every day.

Learned new mp3 metadata library (eyeD3), learned dataclasses, and got better at using os.

Practiced with subprocess (calling and parsing an external ffmpeg binary), calculating with datetimes, and glob.glob to list files in a directory. Another nice refresher: how to have this script in my PATH. I put it in ~/bin (already in my $PATH), ran chmod 755, and added a shebang (#!/usr/bin/env python).
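
Putting a few of those lessons together, here is a rough sketch of one possible approach to the challenge (it assumes the third-party mutagen package is installed; the directory path is purely illustrative):

from datetime import timedelta
from pathlib import Path

import mutagen  # third-party package: pip install mutagen

def total_duration(directory, extensions=(".mp3", ".m4a", ".mp4")):
    """Sum the durations of the audio files directly inside directory."""
    total_seconds = 0.0
    for path in Path(directory).expanduser().iterdir():
        if path.suffix.lower() in extensions:
            audio = mutagen.File(str(path))  # parse the file's metadata
            if audio is not None and audio.info is not None:
                total_seconds += audio.info.length  # duration in seconds
    return timedelta(seconds=round(total_seconds))

print(total_duration("~/music"))  # illustrative directory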

Read Code for Fun and Profit

You can look at all submitted code here and/or on our Community branch.

Other learnings we spotted in Pull Requests for other challenges this week:

(PCC01) Learned about rstrip and the use of dictionary get to set a default value.

(PCC02) Learned about list extend and itertools.permutations.

(PCC20) Learned how to use inheritance, class properties with the property decorator, some error handling, and using magic methods for printing and comparing.

(PCC42) Learned a few extra re tricks

(PCC51) Learned SQL


Thanks to everyone for your participation in our blog code challenges! Keep the PRs coming and include a README.md with one or more screenshots if you want to be featured in this weekly review post.

Become a Python Ninja

Master Python through Code Challenges:

  • Subscribe to our blog (sidebar) to get a new PyBites Code Challenge (PCC) in your inbox every week.

  • Take any of our 50+ challenges on our platform.

  • Prefer coding bite-sized Python exercises in the comfort of your browser? Try our growing collection of Bites of Py.

  • Want to do the #100DaysOfCode but not sure what to work on? Take our course and/or start logging your 100 Days progress using our Progress Grid Feature on our platform.


Keep Calm and Code in Python!

-- Bob and Julian

PyBites: Code Challenge 57 - Analyze Olympic Games Data With Pandas


Life is about facing new challenges - Kostya Tszyu

Hey Pythonistas,

A new week, a new Python code challenge!

This week you can use Python, Pandas, and any libraries you need to analyze Olympic Games data, find out interesting things, and present them to everyone with Matplotlib, Seaborn, and/or Plotly.

The Challenge

Basic/ required

Analyse statistics of Olympic Games in a CSV file that you can find on Kaggle.

  1. Find out the (male and female) athletes who won the most medals in all the Summer Olympic Games (1896-2014). The answer will be Michael Phelps for the men and Larisa Latynina for the women.

  2. Display the top 10 countries that won the most medals:

    • The order for men will be: USA, Russia (counting the earlier USSR results), UK, France, Italy, Sweden, Germany, Hungary, Australia, Japan.

    • The order for women will be: USA, Russia, China, Australia, Germany (it would be third if we sum its results with those of the German Democratic Republic), Netherlands, Romania, UK, Japan, Hungary.

  3. Use Matplotlib to build line plots of the 10 most awarded countries for the 1896-2014 time span. Use the 10 most popular Summer Olympics disciplines, where you can define "most popular" yourself.

One requirement: use pandas to create a dataframe you can work on.

For the data visualization part, you can try Matplotlib, or Seaborn if you want to try a heatmap and different kinds of visualizations. You can install Jupyter (Anaconda) to work in an interactive notebook.
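
To sketch the pandas part (column names such as Athlete, Gender, and Country are assumptions here; check the actual header of the Kaggle CSV and adjust them as needed):

import pandas as pd

# File and column names below are illustrative assumptions
df = pd.read_csv("summer_olympics.csv")

# 1. Athlete with the most medal entries, per gender
medals_per_athlete = (
    df.groupby(["Gender", "Athlete"])
      .size()
      .sort_values(ascending=False)
)
print(medals_per_athlete.groupby(level="Gender").head(1))

# 2. Ten countries with the most medal entries
print(df["Country"].value_counts().head(10))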

Don't be shy

Create a barplot which shows the total medals won for each sport during the summer Olympics.

Bonus

To take this even further, you could create a map and choose colors for each country, pointing out the ones that won the most medals. To get started on this, you can try the Plotly library, and specifically Choropleth Maps in Python.

Ideas and feedback

If you have ideas for a future challenge or find any issues, open a GH Issue or reach out directly.

Last but not least: there is no best solution, only learning more and better Python. Good luck!

Become a Python Ninja

At PyBites you get to master Python through Code Challenges:

  • Subscribe to our blog (sidebar) to get a new PyBites Code Challenge (PCC) in your inbox every week.

  • Apart from this blog code challenge, we have a growing collection of 50+; check them out on our platform.

  • Prefer coding bite-sized Python exercises in the comfort of your browser? Try our growing collection of Bites of Py.

  • Want to do the #100DaysOfCode but not sure what to work on? Take our course and/or start logging your 100 Days progress using our Progress Grid Feature on our platform.


Keep Calm and Code in Python!

-- Andrea

Montreal Python User Group: Montréal-Python 73: Despotic Wagon


Just in time for PyCon Canada, we are organizing an amazing evening with great local Pythonistas. It is your chance to come support them, see their talks in avant-première and, who knows, maybe give them some feedback.

For PyCon Canada: don't forget it's next month, on November 10-11, in Toronto, and there are still some tickets available. You can pick yours up by going to https://2018.pycon.ca/registration.

Presentations

Andrew Francis

Physical libraries are great! Managing library material via web interfaces leaves much to be desired. In the age of Siri and Alexa, why can't one manage one's library loans with text messaging or voice? This talk discusses these questions and answers them by prototyping a Python-based conversational agent.

Python packaging for everyone - Eric Araujo

Packaging in Python used to be a complicated affair, for technical and human reasons. Thankfully, in recent years the Python community has developed robust tools and practices. If you are wondering how to develop and distribute your project, this talk will show you the best of 2018!

Numpy to PyTorch - Shagun Sodhani

Numpy is the de facto choice for array-based operations, while PyTorch is largely used as a deep learning framework. At the core, both provide a powerful N-dimensional tensor. This talk will focus on the similarities and differences between the two and how we can use PyTorch to augment Numpy.

Why are robots becoming Pythonistas? - Maxime St-Pierre

In the fast-paced and intense world of robotics, many praise a particular language; this godsend is Python. In this talk, we will look at some robotics frameworks and try to understand why Python is a popular alternative to C++ and Java.

Keep It Simply Annotated, Stupid - Sébastien Portebois

Type declarations in Python. Heresy? Since when? Let's go over the support from Python 2.7 to 3.7, the constraints for developers and at runtime, and above all: why would we want to, or why should we, do this!

When

Monday November 5th, 2018 at 6PM

Where

Shopify Montreal Office, 490 rue de la Gauchetière, Montréal, Québec

Schedule

  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 8:00PM - End of the event
  • 8:15PM - Benelux

Python Software Foundation: PyPI Security and Accessibility Q1 2019 Request for Information period opens.

The Python Software Foundation Packaging Working Group has applied for and received a commitment from the Open Technology Fund to fulfill a contract for their Core Infrastructure Fund.
PyPI is a foundational component of the Python ecosystem and broader computer software and technology landscape. This project aims to improve the security and accessibility of PyPI for all users worldwide, whether they are direct users like project maintainers and pip installers or indirect users. The impact of this work will be highly visible and improve crucial features of the service.
We plan to begin the project in January 2019. Because of the size of the project, funding has been allocated to secure one or more contractors to complete the development, testing, verification, and assist in the rollout of necessary features.
Register Interest
To receive notification when our Request for Information period closes and the Request for Proposals period opens, please register your interest here.

What is the Request for Information period?

A Request for Information (RFI) is a process intended to allow us (The Python Software Foundation) and potential contractors to openly share information to improve the scope and definition of the project at hand.
We hope that it will help potential contractors better understand the work to be completed and develop better specified proposals. Additionally we hope that the open nature of our RFI will expose the project to multiple perspectives and potentially help shape the direction for some choices in the project.
The Request for Information period opens today, October 30, 2018, and is scheduled to close November 13, 2018.
After the RFI period closes, we will use the results of the process to prepare and open a Request for Proposals to solicit proposals from contractors to complete the work.

More Information

The full version of our Request for Information document can be found here.

Participate!

Our RFI will be conducted on the Python Community Discussion Forum. Participants will need to create an account in order to propose new topics of discussion or respond to existing topics.
All discussions will remain public and available for review by potential proposal authors who do not wish to or cannot create an account to participate directly.

EuroPython Society: EuroPython 2019: RFP First Round Response


We are happy to announce that we have received 17 RFP submissions for EP2019 from various venues all across Europe.


Review will take longer

This large number was somewhat unexpected, and the work to review all these proposals is taking longer as a result.

Since we want to give all RFP submissions a fair chance, we will therefore postpone the selection announcement until Wednesday next week, 2018-11-07, and adjust the timeline for the second round accordingly.

Updated timeline for the RFP

First round:

  • Start of RFP process: 2018-09-28
  • Deadline for RFP vendor questions: 2018-10-05
  • Vendor questions answered by: 2018-10-12
  • First round submission deadline: 2018-10-19
  • Second round candidates will be informed by: 2018-11-07

Second round:

  • Second round RFP questions posted: 2018-11-16
  • Deadline for RFP vendor questions: 2018-11-21
  • Vendor questions answered by: 2018-11-23
  • Final submission deadline: 2018-11-28
  • Final candidate will be informed by: 2018-12-07

Many thanks,
– 
EuroPython Society Board
https://www.europython-society.org/

Stack Abuse: Applying Filter Methods in Python for Feature Selection


Introduction

Machine learning and deep learning algorithms learn from data, which consists of different types of features. The training time and performance of a machine learning algorithm depend heavily on the features in the dataset. Ideally, we should only retain those features in the dataset that actually help our machine learning model learn something.

Unnecessary and redundant features not only slow down the training time of an algorithm, but they also affect the performance of the algorithm. The process of selecting the most suitable features for training the machine learning model is called "feature selection".

There are several advantages to performing feature selection before training machine learning models, some of which are listed below:

  • Models with fewer features are easier to explain
  • It is easier to implement machine learning models with reduced features
  • Fewer features lead to better generalization, which in turn reduces overfitting
  • Feature selection removes data redundancy
  • Training time of models with fewer features is significantly lower
  • Models with fewer features are less prone to errors

Several methods have been developed to select the optimal features for a machine learning algorithm. One category of such methods is called filter methods. In this article, we will study some of the basic filter methods for feature selection.

Filter Methods for Feature Selection

Filter methods belong to the category of feature selection methods that select features independently of the machine learning model. This is one of the biggest advantages of filter methods: features selected using filter methods can be used as input to any machine learning model. Another advantage of filter methods is that they are very fast. Filter methods are generally the first step in any feature selection pipeline.

Filter methods can be broadly divided into two categories: univariate filter methods and multivariate filter methods.

Univariate filter methods rank individual features according to specific criteria and then select the top N features. Different ranking criteria are used, for example the Fisher score, mutual information, and the variance of the feature.

One of the major disadvantages of univariate filter methods is that they may select redundant features, because the relationships between individual features are not taken into account when making decisions. Univariate filter methods are ideal for removing constant and quasi-constant features from the data.
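
As a quick illustration of univariate ranking (a sketch on a small built-in scikit-learn dataset, separate from the worked example that follows), SelectKBest scores each feature on its own with a chosen criterion and keeps the top N:

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Small built-in dataset used purely for illustration
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Score every feature independently and keep the 10 highest-ranked ones
selector = SelectKBest(score_func=mutual_info_classif, k=10)
selector.fit(X, y)

selected_columns = X.columns[selector.get_support()]
print(selected_columns.tolist())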

Multivariate filter methods are capable of removing redundant features from the data since they take the mutual relationship between the features into account. Multivariate filter methods can be used to remove duplicate and correlated features from the data.

In this article, we will see how we can remove constant, quasi-constant, duplicate, and correlated features from our dataset with the help of Python.

Removing Constant Features

Constant features are the type of features that contain only one value for all the outputs in the dataset. Constant features provide no information that can help in classification of the record at hand. Therefore, it is advisable to remove all the constant features from the dataset.

Let's see how we can remove constant features from a dataset. The dataset that we are going to use for this example is the Santander Customer Satisfaction dataset, which can be downloaded from Kaggle. We will use the file "train.csv". However, I have renamed it to "santandar_data.csv" for readability purposes.

Importing Required Libraries and Dataset

Constant features have values with zero variance, since all the values are the same. We can find the constant columns using the VarianceThreshold function from Python's scikit-learn library. Execute the following script to import the required libraries and the dataset:

import pandas as pd  
import numpy as np  
from sklearn.model_selection import train_test_split  
from sklearn.feature_selection import VarianceThreshold

santandar_data = pd.read_csv(r"E:\Datasets\santandar_data.csv", nrows=40000)  
santandar_data.shape  

I filtered the first 40 thousand records. In the output, you should see (40000, 371), which means that we have 40 thousand rows and 371 columns in our dataset.

Splitting Data Into Training and Test Sets

It is important to mention here that, in order to avoid overfitting, feature selection should only be applied to the training set. Let's divide our data into training and test sets. Execute the following script:

train_features, test_features, train_labels, test_labels=train_test_split(  
    santandar_data.drop(labels=['TARGET'], axis=1),
    santandar_data['TARGET'],
    test_size=0.2,
    random_state=41)

Removing Constant Features using Variance Threshold

Now is the time to remove constant features. To do so, we will use the VarianceThreshold function that we imported earlier. The function requires a value for its threshold parameter. Passing a value of zero for the parameter will filter out all the features with zero variance. Execute the following script to create a filter for constant features:

constant_filter = VarianceThreshold(threshold=0)  

Next, we need to simply apply this filter to our training set as shown in the following example:

constant_filter.fit(train_features)  

Now to get all the features that are not constant, we can use the get_support() method of the filter that we created. Execute the following script to see the number of non-constant features.

len(train_features.columns[constant_filter.get_support()])  

In the output, you should see 320, which means that out of the 370 features in the training set, 320 features are not constant.

Similarly, you can find the number of constant features with the help of the following script:

constant_columns = [column for column in train_features.columns  
                    if column not in train_features.columns[constant_filter.get_support()]]

print(len(constant_columns))  

To see all the constant columns, execute the following script:

for column in constant_columns:  
    print(column)

The output looks like this:

ind_var2_0  
ind_var2  
ind_var18_0  
ind_var18  
ind_var27_0  
ind_var28_0  
ind_var28  
ind_var27  
ind_var34_0  
ind_var34  
ind_var41  
ind_var46_0  
ind_var46  
num_var18_0  
num_var18  
num_var27_0  
num_var28_0  
num_var28  
num_var27  
num_var34_0  
num_var34  
num_var41  
num_var46_0  
num_var46  
saldo_var18  
saldo_var28  
saldo_var27  
saldo_var34  
saldo_var41  
saldo_var46  
delta_imp_amort_var18_1y3  
delta_imp_amort_var34_1y3  
imp_amort_var18_hace3  
imp_amort_var18_ult1  
imp_amort_var34_hace3  
imp_amort_var34_ult1  
imp_reemb_var13_hace3  
imp_reemb_var17_hace3  
imp_reemb_var33_hace3  
imp_trasp_var17_out_hace3  
imp_trasp_var33_out_hace3  
num_var2_0_ult1  
num_var2_ult1  
num_reemb_var13_hace3  
num_reemb_var17_hace3  
num_reemb_var33_hace3  
num_trasp_var17_out_hace3  
num_trasp_var33_out_hace3  
saldo_var2_ult1  
saldo_medio_var13_medio_hace3  

Finally, to remove constant features from training and test sets, we can use the transform() method of the constant_filter. Execute the following script to do so:

train_features = constant_filter.transform(train_features)  
test_features = constant_filter.transform(test_features)

train_features.shape, test_features.shape  

If you execute the above script, you will see that both our training and test sets will now contain 320 columns, since the 50 constant columns have been removed.

Removing Quasi-Constant Features

Quasi-constant features, as the name suggests, are the features that are almost constant. In other words, these features have the same values for a very large subset of the outputs. Such features are not very useful for making predictions. There is no rule as to what should be the threshold for the variance of quasi-constant features. However, as a rule of thumb, remove those quasi-constant features that have more than 99% similar values for the output observations.

In this section, we will create a quasi-constant filter with the help of the VarianceThreshold function. However, instead of passing 0 as the value for the threshold parameter, we will pass 0.01, which means that if the variance of the values in a column is less than 0.01, that column is removed. In other words, we remove feature columns where approximately 99% of the values are similar.

The steps are quite similar to the previous section. We will import the dataset and libraries, perform the train-test split, and remove the constant features first.

Importing Required Libraries and Dataset

Execute the following script to import the dataset and desired libraries:

import pandas as pd  
import numpy as np  
from sklearn.model_selection import train_test_split  
from sklearn.feature_selection import VarianceThreshold

santandar_data = pd.read_csv(r"E:\Datasets\santandar_data.csv", nrows=40000)  
santandar_data.shape

Splitting Data Into Training and Test Sets

train_features, test_features, train_labels, test_labels = train_test_split(  
    santandar_data.drop(labels=['TARGET'], axis=1),
    santandar_data['TARGET'],
    test_size=0.2,
    random_state=41)

Removing Constant Features using Variance Threshold

Before we can remove quasi-constant features, we should first remove the constant features. Execute the following script to do so:

constant_filter = VarianceThreshold(threshold=0)  
constant_filter.fit(train_features)

len(train_features.columns[constant_filter.get_support()])

constant_columns = [column for column in train_features.columns  
                    if column not in train_features.columns[constant_filter.get_support()]]

train_features.drop(labels=constant_columns, axis=1, inplace=True)  
test_features.drop(labels=constant_columns, axis=1, inplace=True)

Removing Quasi-Constant Features Using Variance Threshold

Let's create our quasi-constant filter. Execute the following script to do so:

qconstant_filter = VarianceThreshold(threshold=0.01)  

The rest of the steps are the same. We need to apply the filter to our training set using the fit() method, as shown below.

qconstant_filter.fit(train_features)  

Let's check the number of our non-quasi-constant columns. Execute the following script:

len(train_features.columns[qconstant_filter.get_support()])  

In the output, you should see 265, which means that of the 320 columns we obtained after removing constant features, 55 are quasi-constant.

To verify the number of quasi-constant columns, execute the following script:

qconstant_columns = [column for column in train_features.columns  
                    if column not in train_features.columns[qconstant_filter.get_support()]]

print(len(qconstant_columns))  

You should see 55 in the output.

Let's now print the names of all the quasi-constant columns. Execute the following script:

for column in qconstant_columns:  
    print(column)

In the output, you should see the following column names:

ind_var1  
ind_var6_0  
ind_var6  
ind_var13_largo  
ind_var13_medio_0  
ind_var13_medio  
ind_var14  
ind_var17_0  
ind_var17  
ind_var19  
ind_var20_0  
ind_var20  
ind_var29_0  
ind_var29  
ind_var30_0  
ind_var31_0  
ind_var31  
ind_var32_cte  
ind_var32_0  
ind_var32  
ind_var33_0  
ind_var33  
ind_var40  
ind_var39  
ind_var44_0  
ind_var44  
num_var6_0  
num_var6  
num_var13_medio_0  
num_var13_medio  
num_op_var40_hace3  
num_var29_0  
num_var29  
delta_imp_aport_var33_1y3  
delta_num_aport_var33_1y3  
ind_var7_emit_ult1  
ind_var7_recib_ult1  
num_aport_var33_hace3  
num_aport_var33_ult1  
num_var7_emit_ult1  
num_meses_var13_medio_ult3  
num_meses_var17_ult3  
num_meses_var29_ult3  
num_meses_var33_ult3  
num_meses_var44_ult3  
num_reemb_var13_ult1  
num_reemb_var17_ult1  
num_reemb_var33_ult1  
num_trasp_var17_in_hace3  
num_trasp_var17_in_ult1  
num_trasp_var17_out_ult1  
num_trasp_var33_in_hace3  
num_trasp_var33_in_ult1  
num_trasp_var33_out_ult1  
num_venta_var44_hace3  

Finally, to see if our training and test sets only contain the non-constant and non-quasi-constant columns, we can use the transform() method of the qconstant_filter. Execute the following script to do so:

train_features = qconstant_filter.transform(train_features)  
test_features = qconstant_filter.transform(test_features)

train_features.shape, test_features.shape  

If you execute the above script, you will see that both our training and test sets will now contain 265 columns, since the 50 constant and 55 quasi-constant columns have been removed from the original 370 columns.

Removing Duplicate Features

Duplicate features are features that have identical values. Duplicate features do not add any value to algorithm training; rather, they add overhead and unnecessary delay to the training time. Therefore, it is always recommended to remove duplicate features from the dataset before training.

Importing Required Libraries and Dataset

Execute the following script to import the dataset and desired libraries:

import pandas as pd  
import numpy as np  
from sklearn.model_selection import train_test_split  
from sklearn.feature_selection import VarianceThreshold

santandar_data = pd.read_csv(r"E:\Datasets\santandar_data.csv", nrows=20000)  
santandar_data.shape  

Removing duplicate columns can be computationally costly, since we have to take the transpose of the data matrix before we can remove duplicate features. Therefore, in the above script, we only import the first 20 thousand records from the Santander Customer Satisfaction data that we have been using in this article.

Splitting Data Into Training and Test Sets

train_features, test_features, train_labels, test_labels = train_test_split(  
    santandar_data.drop(labels=['TARGET'], axis=1),
    santandar_data['TARGET'],
    test_size=0.2,
    random_state=41)

Removing Duplicate Features using Transpose

Unlike constant and quasi-constant features, we have no built-in Python method that can remove duplicate features. However, we have a method that can help us identify duplicate rows in a pandas dataframe. We will use this method to first take a transpose of our dataset as shown below:

train_features_T = train_features.T  
train_features_T.shape  

In the script above, we take the transpose of our training data and store it in the train_features_T dataframe. Our initial training set contains 16000 rows and 370 columns. If you take a look at the shape of the transposed training set, you will see that it contains 370 rows and 16000 columns.

Luckily, pandas has the duplicated() method, which can help us find duplicate rows in a dataframe. Remember, the rows of the transposed dataframe are actually the columns, or features, of the actual dataframe.

Let's find the total number of duplicate features in our dataset using the sum() method, chained with the duplicated() method as shown below.

print(train_features_T.duplicated().sum())  

In the output, you should see 94.

Finally, we can drop the duplicate rows using the drop_duplicates() method. If you pass the string value first to the keep parameter of the drop_duplicates() method, all the duplicate rows will be dropped except the first copy. In the next step, we will remove all the duplicate rows and take the transpose of the transposed training set to get back a training set that doesn't contain any duplicate columns. Execute the following script:

unique_features = train_features_T.drop_duplicates(keep='first').T  

Now, let's print the shape of our new training set without duplicate features:

unique_features.shape  

In the output, you should see (16000, 276). After removing 94 duplicate columns, the size of our feature set has been significantly reduced.

To see the names of the duplicate columns, execute this script:

duplicated_features = [dup_col for dup_col in train_features.columns if dup_col not in unique_features.columns]  
duplicated_features  

In the output, you should see the following columns:

['ind_var2',
 'ind_var13_medio',
 'ind_var18_0',
 'ind_var18',
 'ind_var26',
 'ind_var25',
 'ind_var27_0',
 'ind_var28_0',
 'ind_var28',
 'ind_var27',
 'ind_var29_0',
 'ind_var29',
 'ind_var32',
 'ind_var34_0',
 'ind_var34',
 'ind_var37',
 'ind_var40_0',
 'ind_var40',
 'ind_var41',
 'ind_var39',
 'ind_var46_0',
 'ind_var46',
 'num_var13_medio',
 'num_var18_0',
 'num_var18',
 'num_var26',
 'num_var25',
 'num_op_var40_hace3',
 'num_op_var39_hace3',
 'num_var27_0',
 'num_var28_0',
 'num_var28',
 'num_var27',
 'num_var29_0',
 'num_var29',
 'num_var32',
 'num_var34_0',
 'num_var34',
 'num_var37',
 'num_var40_0',
 'num_var40',
 'num_var41',
 'num_var39',
 'num_var46_0',
 'num_var46',
 'saldo_var18',
 'saldo_var28',
 'saldo_var27',
 'saldo_var29',
 'saldo_var34',
 'saldo_var40',
 'saldo_var41',
 'saldo_var46',
 'delta_imp_amort_var18_1y3',
 'delta_imp_amort_var34_1y3',
 'delta_imp_reemb_var33_1y3',
 'delta_imp_trasp_var17_out_1y3',
 'delta_imp_trasp_var33_out_1y3',
 'delta_num_reemb_var13_1y3',
 'delta_num_reemb_var17_1y3',
 'delta_num_reemb_var33_1y3',
 'delta_num_trasp_var17_in_1y3',
 'delta_num_trasp_var17_out_1y3',
 'delta_num_trasp_var33_in_1y3',
 'delta_num_trasp_var33_out_1y3',
 'imp_amort_var18_hace3',
 'imp_amort_var18_ult1',
 'imp_amort_var34_hace3',
 'imp_amort_var34_ult1',
 'imp_var7_emit_ult1',
 'imp_reemb_var13_hace3',
 'imp_reemb_var17_hace3',
 'imp_reemb_var33_hace3',
 'imp_reemb_var33_ult1',
 'imp_trasp_var17_out_hace3',
 'imp_trasp_var17_out_ult1',
 'imp_trasp_var33_in_hace3',
 'imp_trasp_var33_out_hace3',
 'ind_var7_emit_ult1',
 'num_var2_0_ult1',
 'num_var2_ult1',
 'num_var7_emit_ult1',
 'num_reemb_var13_hace3',
 'num_reemb_var17_hace3',
 'num_reemb_var33_hace3',
 'num_reemb_var33_ult1',
 'num_trasp_var17_out_hace3',
 'num_trasp_var17_out_ult1',
 'num_trasp_var33_in_hace3',
 'num_trasp_var33_out_hace3',
 'saldo_var2_ult1',
 'saldo_medio_var13_medio_hace3',
 'saldo_medio_var13_medio_ult1',
 'saldo_medio_var29_hace3']

Removing Correlated Features

In addition to duplicate features, a dataset can also contain correlated features. Two or more features are correlated if they are close to each other in linear space.

Take the example of a feature set for a fruit basket: the weight of the fruit basket is normally correlated with its price. The greater the weight, the higher the price.

Correlation between the output observations and the input features is very important, and such features should be retained. However, if two or more features are mutually correlated, they convey redundant information to the model, and hence only one of the correlated features should be retained to reduce the number of features.

The dataset we are going to use for this section is the BNP Paribas Cardif Claims Management dataset, which can be downloaded from Kaggle. Follow these steps to find and remove the correlated features from the dataset.

Importing Required Libraries and Dataset

Execute the following script to import the dataset and desired libraries:

import pandas as pd  
import numpy as np  
from sklearn.model_selection import train_test_split  
from sklearn.feature_selection import VarianceThreshold

paribas_data = pd.read_csv(r"E:\Datasets\paribas_data.csv", nrows=20000)  
paribas_data.shape  

In the script above, I imported the dataset along with the required libraries. Next, we print the shape of our dataframe. In the output, you should see (20000, 133), which means that our dataset contains 20 thousand rows and 133 features.

To find the correlation, we only need the numerical features in our dataset. In order to filter out all the features except the numeric ones, we need to preprocess our data.

Data Preprocessing

Execute the following script, to remove non-numeric features from the dataset.

num_colums = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']  
numerical_columns = list(paribas_data.select_dtypes(include=num_colums).columns)  
paribas_data = paribas_data[numerical_columns]  

In the first line of the script above, we define a list that contains the data types of the columns that we want to retain in our dataset. Next, we call the select_dtypes() method on our dataset and pass it the num_colums list, which contains the types of columns that we want to retain. The select_dtypes() method returns the specified numeric columns, whose names we store in the list numerical_columns. Next, we filter the columns of the paribas_data dataframe with the help of the numerical_columns list. Let's print the shape of the paribas_data dataframe to see how many numeric columns we have. Execute the following script:

paribas_data.shape  

In the output, you should see (20000, 114), which means that our dataset now contains 20 thousand records and 114 features. Remember, previously we had 133 features.

Splitting Data Into Training and Test Sets

As usual, we need to split our data into training and test sets before removing any correlated features. Execute the following script to divide the data into training and test sets:

train_features, test_features, train_labels, test_labels = train_test_split(  
    paribas_data.drop(labels=['target', 'ID'], axis=1),
    paribas_data['target'],
    test_size=0.2,
    random_state=41)

In the above script, we divide our data into 80% training and 20% test set.

Removing Correlated Features using corr() Method

To remove the correlated features, we can make use of the corr() method of the pandas dataframe. The corr() method returns a correlation matrix containing the correlation between all the columns of the dataframe. We can then loop through the correlation matrix and, if the correlation between two columns is greater than a threshold, add one of those columns to a set of correlated columns. Finally, we can remove that set of columns from the actual dataset.

Let's first create a correlation matrix for the columns in the dataset and an empty set that will contain all the correlated features. Execute the following script to do so:

correlated_features = set()  
correlation_matrix = paribas_data.corr()  

In the script above, we create the correlation matrix correlation_matrix for all the columns in our dataset. We also create a set, correlated_features, which will contain the names of all the correlated features.

Next, we will loop through all the columns in the correlation_matrix and add columns with an absolute correlation value greater than 0.8 to the correlated_features set, as shown below. You can set any threshold value for the correlation.

for i in range(len(correlation_matrix.columns)):  
    for j in range(i):
        if abs(correlation_matrix.iloc[i, j]) > 0.8:
            colname = correlation_matrix.columns[i]
            correlated_features.add(colname)

Let's see the total number of columns in our dataset with a correlation value greater than 0.8 with at least one other column. Execute the following script:

len(correlated_features)  

You should see 55 in the output, which is almost 40% of the original features in the dataset. You can see how much redundant information our dataset contains. Execute the following script to see the names of these features:

print(correlated_features)  

The output looks like this:

{'v55', 'v105', 'v130', 'v12', 'v60', 'v67', 'v63', 'v46', 'v53', 'v43', 'v68', 'v123', 'v95', 'v103', 'v44', 'v108', 'v89', 'v104', 'v109', 'v83', 'v115', 'v21', 'v101', 'v93', 'v40', 'v78', 'v54', 'v118', 'v124', 'v73', 'v96', 'v121', 'v77', 'v114', 'v48', 'v116', 'v87', 'v86', 'v65', 'v122', 'v64', 'v81', 'v128', 'v49', 'v37', 'v84', 'v98', 'v111', 'v41', 'v25', 'v106', 'v32', 'v126', 'v76', 'v100'}

The names of the features have been masked by the bank since they contain sensitive information; however, you can see the code names for the features. These correlated columns convey similar information to the learning algorithm and should therefore be removed.

The following script removes these columns from the dataset:

train_features.drop(labels=correlated_features, axis=1, inplace=True)  
test_features.drop(labels=correlated_features, axis=1, inplace=True)  

Conclusion

Feature selection plays a vital role in the performance and training of any machine learning model. Different types of methods have been proposed for feature selection for machine learning algorithms. In this article, we studied different types of filter methods for feature selection using Python.

We started our discussion by removing constant and quasi-constant features followed by removing duplicate features. Finally, we studied how to remove correlated features from our dataset.

In the next article, we will take a look at some of the other types of feature selection methods. Till then, happy coding!

Test and Code: 51: Feature Testing


Andy Knight joins me in discussing the concept of feature testing.

A feature test is "a test verifying a service or library as the customer would use it, but within a single process." That is a quote from an article that appeared on the Twitter engineering blog. The article describes a shift away from class tests towards feature tests, the benefits of the shift, and some reactions to it.

Feature tests are similar to something I used to call a "functional subcutaneous integration test", but it's a way better name, and I plan to use it more often.

The idea fits well with my testing philosophy. Andy Knight is someone still holding onto the testing pyramid. So I thought it would be fun to ask him to discuss feature testing with me. I think it's a balanced discussion. I hope you enjoy it and learn something.

Special Guest: Andy Knight.

Sponsored By:

  • PyCharm Professional (http://testandcode.com/pycharm): We have a special offer for you: any time before December 1, you can get an Individual PyCharm Professional 4-month subscription for free! If you value your time, you owe it to yourself to try PyCharm.

Support Test and Code: https://www.patreon.com/testpodcast

Links:

  • Twitter engineering blog article describing Feature Testing: The testing renaissance (https://blog.twitter.com/engineering/en_us/topics/insights/2017/the-testing-renaissance.html)

Continuum Analytics Blog: Open Source Model Management Roundup: Polyaxon, Argo, and Seldon

Dusty Phillips: Announcing: Python 3 Object-oriented Programming, 3rd Edition

Python 3 Object-oriented Programming, 3rd Edition

My publisher unveiled the third edition of Python 3 Object-oriented Programming today! This has been the culmination of several months of work. Editing and updating the second edition was a pleasure. It was gratifying to discover that the content has aged well. This was not the case with the first edition; I did extensive restructuring and rewriting before I was satisfied with the second.

Made With Mu: Nǐ hǎo Mu! 你好穆!


Thanks to the efforts of volunteer teachers, young Chinese students are using Mu to learn Python.

My Chinese friend explains,

“I am a programmer in Hefei China, my colleagues and I give Python classes to more than 20 children every Tuesday afternoon. (A total of 14 weeks.)

This is a course of interest for the students who is interesting with programming, it’s free and we are volunteers. We have a plan to teach Python, from the base to the high, not that high.”

He goes on to explain how learning to code is making a big impression on some of his students and their families.

“The little boy in the [following] photo inspired me, he is very clever and he can answer almost every problem in class.

This boy father is so seriously to this class, he find a app which can coding in iphone, and he learn it himself to teach his son after school.

My first baby is coming soon, I think may be I should learn from him, you know he has no experience in programming and in his daily job there is no computer in front of him. Father’s love is great and will drive us to do many things. One father is more than a hundred schoolmasters.”

I have to admit that I was deeply moved by this explanation. I was also amazed to see that the teachers and students use a sort of chat app for sharing and marking homework.

It makes me extraordinarily happy to know that our collective efforts in translating Mu for learners who don’t have English as a first language are paying off. I’d like to applaud the efforts of the volunteer teachers in China who are helping to nurture the next generation of Python programmers. Finally, I would like to wish their students (and parents!) the very best of luck when learning to use Python. You are learning a skill that will help you flourish in our modern technological world. Please keep up your efforts and, if you have time, keep sharing what you have been doing. Perhaps, in the future, you will speak at PyCon China.

I’m struck by how similar programming is to another passion of mine: music. Both are cross-cultural and international endeavours. By sharing our code, learning from each other, and telling stories of programming, we bring our world closer together.

My Chinese friend ends by saying,

“Thank you for your work, Mu is so cool, Mu is best for us.”

I’m so pleased you think so! Thank you!

Tryton News: Security Release for issue7792


@ced wrote:

Synopsis

A vulnerability in tryton has been found by Cédric Krier.

With issue7792 the client tries to make the connection to the bus in plain text instead of over an encrypted channel. The connection attempt fails, but its header contains the current session of the user. This session could then be stolen by a man-in-the-middle.

Impact

CVSS v3.0 Base Score: 4.2

  • Attack Vector: Network
  • Attack Complexity: High
  • Privileges Required: None
  • User Interaction: UI
  • Scope: Unchanged
  • Confidentiality: Low
  • Integrity: Low
  • Availability: None

Workaround

There are no known workarounds.

Resolution

All affected users should upgrade tryton to the latest version.
It is recommended that users change their password to clear all existing sessions (the password itself has not been compromised).
Only series 5.0 has the component subject to the issue.
Affected versions per series: =5.0.0
Non affected versions per series: >=5.0.1

Reference

Concern?

Any security concerns should be reported on the bug-tracker at
https://bugs.tryton.org/ with the type security.



Mike Driscoll: Python 101: Episode #31 – Parsing XML with the lxml Package


In this screencast, you will learn the basics of using the popular lxml (https://lxml.de/) package for parsing XML.

You can also read the chapter this video is based on here or get the book on Leanpub

Codementor: How to build blockchain for a financial product

"Technologies are changing fast; people are not." – Jakob Nielsen

Blockchain is a relatively new technology that...

Roberto Alsina: DeVicenzo 2


A long time ago I "wrote a web browser". Those are some very heavy quotes. You may imagine me doing air quotes while I write it, maybe?

That's because I didn't really; what I actually did was write UI around Qt's webkit-based widget. It was a fun project, especially because I did it with the absurd constraint of staying below 128 lines of code.

And then I did not touch it for six years. But yesterday I did.

commit 0b29b060ab9962a32e671551b0f035764cbeffaa
Author: Roberto Alsina <ralsina@medallia.com>
Date:   Tue Oct 30 12:32:43 2018 -0300

    Initial PySide2 port

commit 831c30d2c7e6b6b2a0a4d5d362ee7bc36493b975
Author: roberto.alsina@gmail.com <roberto.alsina@gmail.com@1bbba601-83ea-880f-26a2-52609c2bd284>
Date:   Fri Jun 1 15:24:46 2012 +0000

    nicer, smaller margins

Six years is a long time. So, nowadays:

  • I prefer my code to be formatted better
  • Python 3 is the thing
  • PySide is official, so I would recommend using it instead of PyQt
  • Qt is now on version 5 instead of 4

So, with those new constraints in mind, I ported DeVicenzo to the latest everything, formatted the code properly using black, and expanded my line limit to a generous 256.

And here it is ... it's not really useful, but it is an example of how expressive the Python/Qt combination can be, even while being an absurdly bad example nobody should follow (Oh, the lambdas!)

screenshot

Real Python: Setting Up Python for Machine Learning on Windows


Python has been widely used for numerical and scientific applications in recent years. However, to perform numerical computations efficiently, Python relies on external libraries, sometimes implemented in other languages, such as the NumPy library, which is partly implemented using the Fortran language.

Due to these dependencies, sometimes it isn’t trivial to set up an environment for numerical computations, linking all the necessary libraries. It’s common for people to struggle to get things working in workshops involving the use of Python for machine learning, especially when they are using an operating system that lacks a package management system, such as Windows.

In this article, you’ll:

  • Walk through the details for setting up a Python environment for numerical computations on a Windows operating system
  • Be introduced to Anaconda, a Python distribution designed to circumvent these setup problems
  • See how to install the distribution on a Windows machine and use its tools to manage packages and environments
  • Use the installed Python stack to build a neural network and train it to solve a classic classification problem

Free Bonus:Click here to get access to a Conda cheat sheet with handy usage examples for managing your Python environment and packages.

Introducing Anaconda and Conda

pip, a package management system used to install and manage software packages written in Python, has been part of the Python packaging ecosystem since 2011 and has shipped with the Python installers since Python 3.4. However, for numerical computations, there are several dependencies that are not written in Python, so the initial releases of pip could not solve the problem by themselves.

To circumvent this problem, Continuum Analytics released Anaconda, a Python distribution focused on scientific applications, and Conda, a package and environment management system used by the Anaconda distribution. It’s worth noting that more recent versions of pip can handle external dependencies using wheels, but, by using Anaconda, you’ll be able to install critical libraries for data science more smoothly. (You can read more on this discussion here.)

Although Conda is tightly coupled to the Anaconda Python Distribution, the two are distinct projects with different goals:

  • Anaconda is a full distribution of the software in the PyData ecosystem, including Python itself along with binaries for several third-party open-source projects. Besides Anaconda, there’s also Miniconda, which is a minimal Python distribution including basically Conda and its dependencies so that you can install only the packages you need, from scratch

  • Conda is a package, dependency, and environment management system that could be installed without the Anaconda or Miniconda distribution. It runs on Windows, macOS, and Linux and was created for Python programs, but it can package and distribute software for any language. The main purpose is to solve external dependencies issues in an easy way, by downloading pre-compiled versions of software.

    In this sense, it is more like a cross-platform version of a general purpose package manager such as APT or YUM, which helps to find and install packages in a language-agnostic way. Also, Conda is an environment manager, so if you need a package that requires a different version of Python, by using Conda, it is possible to set up a separate environment with a totally different version of Python, maintaining your usual version of Python on your default environment.

There’s a lot of discussion regarding the creation of another package management system for the Python ecosystem. It’s worth mentioning that Conda’s creators pushed Python standard packaging to the limit and only created a second tool when it was clear that it was the only reasonable way forward.

Curiously, even Guido van Rossum, at his speech at the inaugural PyData meetup in 2012, said that, when it comes to packaging, “it really sounds like your needs are so unusual compared to the larger Python community that you’re just better off building your own.” (You can watch a video of this discussion.) More information about this discussion can be found here and here.

Anaconda and Miniconda have become the most popular Python distributions, widely used for data science and machine learning in various companies and research laboratories. They are free and open source projects and currently include 1400+ packages in the repository. In the following section, we’ll go through the installation of the Miniconda Python distribution on a Windows machine.

Installing the Miniconda Python Distribution

In this section, you’ll see step-by-step how to set up a data science Python environment on Windows. Instead of the full Anaconda distribution, you’ll be using Miniconda to set up a minimal environment containing only Conda and its dependencies, and you’ll use that to install the necessary packages.

The installation processes for Miniconda and Anaconda are very similar. The basic difference is that Anaconda provides an environment with a lot of pre-installed packages, many of which are never used. (You can check the list here.) Miniconda is minimalist and clean, and it allows you to easily install any of Anaconda’s packages.

In this article, the focus will be on using the command line interface (CLI) to set up the packages and environments. However, it’s possible to use Conda to install Anaconda Navigator, a graphical user interface (GUI), if you wish.

Miniconda can be installed using an installer available here. You’ll notice there are installers for Windows, macOS, and Linux, and for 32-bit or 64-bit operating systems. You should consider the appropriate architecture according to your Windows installation and download the Python 3.x version (at the time of writing this article, 3.7).

There’s no reason to use Python 2 on a fresh project anymore, and if you do need Python 2 on some project you’re working on, due to some library that has not been updated, it is possible to set up a Python 2 environment using Conda, even if you installed the Miniconda Python 3.x distribution, as you will see in the next section.

After the download finishes, you just have to run the installer and follow the installation steps:

  • Click on Next on the welcome screen:

Miniconda Installer Welcome Screen

  • Click on I Agree to agree to the license terms:

Miniconda Installer License

  • Choose the installation type and click Next. Another advantage of using Anaconda or Miniconda is that it is possible to install the distribution using a local account. (It isn’t necessary to have an administrator account.) If this is the case, choose Just Me. Otherwise, if you have an administrator account, you may choose All Users:

Miniconda Installer Installation Type

  • Choose the install location and click Next. If you’ve chosen to install just for you, the default location will be the folder Miniconda3 under your user’s personal folder. It’s important not to use spaces in the folder names in the path to Miniconda, since many Python packages have problems when spaces are used in folder names:

Miniconda Installer Install Location

  • In Advanced Installation Options, the suggestion is to use the default choices, which are to not add Anaconda to the PATH environment variable and to register Anaconda as the default Python. Click Install to begin installation:

Miniconda Installer Advanced Installation Options

  • Wait while the installer copies the files:

Miniconda Installer Installing

  • When the installation completes, click on Next:

Miniconda Installer Installation Complete

  • Click on Finish to finish the installation and close the installer:

Miniconda Installer Finish

As Anaconda was not included in the PATH environment variable, its commands won’t work in the Windows default command prompt. To use the distribution, you should start its own command prompt, which can be done by clicking on the Start button and on Anaconda Prompt under Anaconda3 (64 bit):

Start Anaconda Prompt

When the prompt opens, you can check if Conda is available by running conda --version:

(base) C:\Users\IEUser>conda --version
conda 4.5.11

To get more information about the installation, you can run conda info:

(base) C:\Users\IEUser>conda info

     active environment : base
    active env location : C:\Users\IEUser\Miniconda3
            shell level : 1
       user config file : C:\Users\IEUser\.condarc
 populated config files : C:\Users\IEUser\.condarc
          conda version : 4.5.11
    conda-build version : not installed
         python version : 3.7.0.final.0
       base environment : C:\Users\IEUser\Miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/win-64
                          https://repo.anaconda.com/pkgs/pro/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\Users\IEUser\Miniconda3\pkgs
                          C:\Users\IEUser\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\IEUser\Miniconda3\envs
                          C:\Users\IEUser\AppData\Local\conda\conda\envs
                          C:\Users\IEUser\.conda\envs
               platform : win-64
             user-agent : conda/4.5.11 requests/2.19.1 CPython/3.7.0 Windows/10 Windows/10.0.17134
          administrator : False
             netrc file : None
           offline mode : False

Now that you have Miniconda installed, let’s see how Conda environments work.

Understanding Conda Environments

When you start developing a project from scratch, it’s recommended that you use the latest versions of the libraries you need. However, when working with someone else’s project, such as when running an example from Kaggle or Github, you may need to install specific versions of packages or even another version of Python due to compatibility issues.

This problem may also occur when you try to run an application you’ve developed long ago, which uses a particular library version that does not work with your application anymore due to updates.

Virtual environments are a solution to this kind of problem. By using them, it is possible to create multiple environments, each one with different versions of packages. A typical Python set up includes Virtualenv, a tool to create isolated Python virtual environments, widely used in the Python community.

Conda includes its own environment manager and presents some advantages over Virtualenv, especially concerning numerical applications, such as the ability to manage non-Python dependencies and the ability to manage different versions of Python, which is not possible with Virtualenv. Besides that, Conda environments are entirely compatible with default Python packages that may be installed using pip.

Miniconda installation provides Conda and a root environment with a version of Python and some basic packages installed. Besides this root environment, it is possible to set up additional environments including different versions of Python and packages.

Using the Anaconda prompt, it is possible to check the available Conda environments by running conda env list:

(base) C:\Users\IEUser>conda env list
# conda environments:
#
base                  *  C:\Users\IEUser\Miniconda3

This base environment is the root environment, created by the Miniconda installer. It is possible to create another environment, named otherenv, by running conda create --name otherenv:

(base) C:\Users\IEUser>conda create --name otherenv
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\otherenv

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate otherenv
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As notified after the environment creation process is finished, it is possible to activate the otherenv environment by running conda activate otherenv. You’ll notice the environment has changed by the indication between parentheses in the beginning of the prompt:

(base) C:\Users\IEUser>conda activate otherenv

(otherenv) C:\Users\IEUser>

You can open the Python interpreter within this environment by running python:

(otherenv) C:\Users\IEUser>python
Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

The environment includes Python 3.7.0, the same version included in the root base environment. To exit the Python interpreter, just run quit():

>>> quit()

(otherenv) C:\Users\IEUser>

To deactivate the otherenv environment and go back to the root base environment, you should run deactivate:

(otherenv) C:\Users\IEUser>deactivate

(base) C:\Users\IEUser>

As mentioned earlier, Conda allows you to easily create environments with different versions of Python, which is not straightforward with Virtualenv. To include a different Python version within an environment, you have to specify it by using python=<version> when running conda create. For example, to create an environment named py2 with Python 2.7, you have to run conda create --name py2 python=2.7:

(base) C:\Users\IEUser>conda create --name py2 python=2.7
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\py2

  added / updated specs:
    - python=2.7

The following NEW packages will be INSTALLED:

    certifi:        2018.8.24-py27_1
    pip:            10.0.1-py27_0
    python:         2.7.15-he216670_0
    setuptools:     40.2.0-py27_0
    vc:             9-h7299396_1
    vs2008_runtime: 9.00.30729.1-hfaea7d5_1
    wheel:          0.31.1-py27_0
    wincertstore:   0.2-py27hf04cefb_0

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py2
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) C:\Users\IEUser>

As shown by the output of conda create, this time some new packages were installed, since the new environment uses Python 2. You can check the new environment indeed uses Python 2 by activating it and running the Python interpreter:

(base) C:\Users\IEUser>conda activate py2

(py2) C:\Users\IEUser>python
Python 2.7.15 |Anaconda, Inc.| (default, May  1 2018, 18:37:09) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

Now, if you run conda env list, you should see the two environments that were created, besides the root base environment:

(py2) C:\Users\IEUser>conda env list
# conda environments:
#
base                     C:\Users\IEUser\Miniconda3
otherenv                 C:\Users\IEUser\Miniconda3\envs\otherenv
py2               *      C:\Users\IEUser\Miniconda3\envs\py2

(py2) C:\Users\IEUser>

In the list, the asterisk indicates the activated environment. It is possible to remove an environment by running conda remove --name <environment name> --all. Since it is not possible to remove an activated environment, you should first deactivate the py2 environment, to remove it:

(py2) C:\Users\IEUser>deactivate

(base) C:\Users\IEUser>conda remove --name py2 --all

Remove all packages in environment C:\Users\IEUser\Miniconda3\envs\py2:

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\py2

The following packages will be REMOVED:

    certifi:        2018.8.24-py27_1
    pip:            10.0.1-py27_0
    python:         2.7.15-he216670_0
    setuptools:     40.2.0-py27_0
    vc:             9-h7299396_1
    vs2008_runtime: 9.00.30729.1-hfaea7d5_1
    wheel:          0.31.1-py27_0
    wincertstore:   0.2-py27hf04cefb_0

Proceed ([y]/n)? y

(base) C:\Users\IEUser>

Now that you’ve covered the basics of managing environments with Conda, let’s see how to manage packages within the environments.

Understanding Basic Package Management With Conda

Within each environment, software packages can be installed using the Conda package manager. The root base environment created by the Miniconda installer includes a few packages by default that are not part of the Python standard library.

The default installation includes the minimum packages necessary to use Conda. To check the list of installed packages in an environment, you just have to make sure it is activated and run conda list. In the root environment, the following packages are installed by default:

(base) C:\Users\IEUser>conda list
# packages in environment at C:\Users\IEUser\Miniconda3:## Name                    Version                   Build  Channelasn1crypto                0.24.0                   py37_0ca-certificates           2018.03.07                    0certifi                   2018.8.24                py37_1cffi                      1.11.5           py37h74b6da3_1chardet                   3.0.4                    py37_1conda                     4.5.11                   py37_0conda-env                 2.6.0                         1console_shortcut          0.1.1                         3cryptography              2.3.1            py37h74b6da3_0idna                      2.7                      py37_0menuinst                  1.4.14           py37hfa6e2cd_0openssl                   1.0.2p               hfa6e2cd_0pip                       10.0.1                   py37_0pycosat                   0.6.3            py37hfa6e2cd_0pycparser                 2.18                     py37_1pyopenssl                 18.0.0                   py37_0pysocks                   1.6.8                    py37_0python                    3.7.0                hea74fb7_0pywin32                   223              py37hfa6e2cd_1requests                  2.19.1                   py37_0ruamel_yaml               0.15.46          py37hfa6e2cd_0setuptools                40.2.0                   py37_0six                       1.11.0                   py37_1urllib3                   1.23                     py37_0vc                        14                   h0510ff6_3vs2015_runtime            14.0.25123                    3wheel                     0.31.1                   py37_0win_inet_pton             1.0.1                    py37_1wincertstore              0.2                      py37_0yaml                      0.1.7                hc54c509_2(base) C:\Users\IEUser>

To manage the packages, you should also use Conda. Next, let’s see how to search, install, update, and remove packages using Conda.

Searching and Installing Packages

Conda installs packages from repositories called channels, and some default channels are configured by the installer. To search for a specific package, you can run conda search <package name>. For example, this is how you search for the keras package (a machine learning library):

(base) C:\Users\IEUser>conda search keras
Loading channels: done# Name                  Version           Build  Channelkeras                     2.0.8  py35h15001cb_0  pkgs/mainkeras                     2.0.8  py36h65e7a35_0  pkgs/mainkeras                     2.1.2          py35_0  pkgs/mainkeras                     2.1.2          py36_0  pkgs/mainkeras                     2.1.3          py35_0  pkgs/mainkeras                     2.1.3          py36_0  pkgs/main... (more)

According to the previous output, there are different versions of the package and different builds for each version, such as for Python 3.5 and 3.6.

The previous search shows only exact matches for packages named keras. To perform a broader search, including all packages containing keras in their names, you should use the wildcard *. For example, when you run conda search *keras*, you get the following:

(base) C:\Users\IEUser>conda search *keras*
Loading channels: done# Name                  Version           Build  Channelkeras                     2.0.8  py35h15001cb_0  pkgs/mainkeras                     2.0.8  py36h65e7a35_0  pkgs/mainkeras                     2.1.2          py35_0  pkgs/mainkeras                     2.1.2          py36_0  pkgs/mainkeras                     2.1.3          py35_0  pkgs/mainkeras                     2.1.3          py36_0  pkgs/main... (more)keras-applications           1.0.2          py35_0  pkgs/mainkeras-applications           1.0.2          py36_0  pkgs/mainkeras-applications           1.0.4          py35_0  pkgs/main... (more)keras-base                2.2.0          py35_0  pkgs/mainkeras-base                2.2.0          py36_0  pkgs/main... (more)

As the previous output shows, there are some other keras related packages in the default channels.

To install a package, you should run conda install <package name>. By default, the newest version of the package will be installed in the active environment. So, let’s install the package keras in the environment otherenv that you’ve already created:

(base) C:\Users\IEUser>conda activate otherenv

(otherenv) C:\Users\IEUser>conda install keras
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3\envs\otherenv  added / updated specs:    - kerasThe following NEW packages will be INSTALLED:    _tflow_1100_select:  0.0.3-mkl    absl-py:             0.4.1-py36_0    astor:               0.7.1-py36_0    blas:                1.0-mkl    certifi:             2018.8.24-py36_1    gast:                0.2.0-py36_0    grpcio:              1.12.1-py36h1a1b453_0    h5py:                2.8.0-py36h3bdd7fb_2    hdf5:                1.10.2-hac2f561_1    icc_rt:              2017.0.4-h97af966_0    intel-openmp:        2018.0.3-0    keras:               2.2.2-0    keras-applications:  1.0.4-py36_1    keras-base:          2.2.2-py36_0    keras-preprocessing: 1.0.2-py36_1    libmklml:            2018.0.3-1    libprotobuf:         3.6.0-h1a1b453_0    markdown:            2.6.11-py36_0    mkl:                 2019.0-117    mkl_fft:             1.0.4-py36h1e22a9b_1    mkl_random:          1.0.1-py36h77b88f5_1    numpy:               1.15.1-py36ha559c80_0    numpy-base:          1.15.1-py36h8128ebf_0    pip:                 10.0.1-py36_0    protobuf:            3.6.0-py36he025d50_0    python:              3.6.6-hea74fb7_0    pyyaml:              3.13-py36hfa6e2cd_0    scipy:               1.1.0-py36h4f6bf74_1    setuptools:          40.2.0-py36_0    six:                 1.11.0-py36_1    tensorboard:         1.10.0-py36he025d50_0    tensorflow:          1.10.0-mkl_py36hb361250_0    tensorflow-base:     1.10.0-mkl_py36h81393da_0    termcolor:           1.1.0-py36_1    vc:                  14-h0510ff6_3    vs2013_runtime:      12.0.21005-1    vs2015_runtime:      14.0.25123-3    werkzeug:            0.14.1-py36_0    wheel:               0.31.1-py36_0    wincertstore:        0.2-py36h7fe50ca_0    yaml:                0.1.7-hc54c509_2    zlib:                1.2.11-h8395fce_2Proceed ([y]/n)?

Conda manages the necessary dependencies for a package when it is installed. Since the package keras has a lot of dependencies, when you install it, Conda installs this big list of packages as well.

It’s worth noting that, since the keras package’s newest build uses Python 3.6 and the otherenv environment was created using Python 3.7, the python package, version 3.6.6, was included as a dependency. After confirming the installation, you can check that the Python version for the otherenv environment is downgraded to version 3.6.6.

Sometimes, you don’t want packages to be downgraded, and it would be better to just create a new environment with the necessary version of Python. To check the list of new packages, updates, and downgrades necessary for a package without installing it, you should use the parameter --dry-run. For example, to check the packages that will be changed by the installation of the package keras, you should run the following:

(otherenv) C:\Users\IEUser>conda install keras --dry-run

However, if necessary, it is possible to change the default Python of a Conda environment by installing a specific version of the package python. To demonstrate that, let’s create a new environment called envpython:

(otherenv) C:\Users\IEUser>conda create --name envpython
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3\envs\envpythonProceed ([y]/n)? yPreparing transaction: doneVerifying transaction: doneExecuting transaction: done## To activate this environment, use##     $ conda activate envpython## To deactivate an active environment, use##     $ conda deactivate

As you saw before, since the root base environment uses Python 3.7, envpython is created including this same version of Python:

(base) C:\Users\IEUser>conda activate envpython

(envpython) C:\Users\IEUser>python
Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

(envpython) C:\Users\IEUser>

To install a specific version of a package, you can run conda install <package name>=<version>. For example, this is how you install Python 3.6 in the envpython environment:

(envpython) C:\Users\IEUser>conda install python=3.6
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3\envs\envpython  added / updated specs:    - python=3.6The following NEW packages will be INSTALLED:    certifi:        2018.8.24-py36_1    pip:            10.0.1-py36_0    python:         3.6.6-hea74fb7_0    setuptools:     40.2.0-py36_0    vc:             14-h0510ff6_3    vs2015_runtime: 14.0.25123-3    wheel:          0.31.1-py36_0    wincertstore:   0.2-py36h7fe50ca_0Proceed ([y]/n)?

In case you need to install more than one package in an environment, it is possible to run conda install only once, passing the names of the packages. To illustrate that, let’s install numpy, scipy, and matplotlib, basic packages for numerical computation in the root base environment:

(envpython) C:\Users\IEUser>deactivate

(base) C:\Users\IEUser>conda install numpy scipy matplotlib
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3  added / updated specs:    - matplotlib    - numpy    - scipyThe following packages will be downloaded:    package                    |            build    ---------------------------|-----------------    libpng-1.6.34              |       h79bbb47_0         1.3 MB    mkl_random-1.0.1           |   py37h77b88f5_1         267 KB    intel-openmp-2019.0        |              117         1.7 MB    qt-5.9.6                   |   vc14h62aca36_0        92.5 MB    matplotlib-2.2.3           |   py37hd159220_0         6.5 MB    tornado-5.1                |   py37hfa6e2cd_0         668 KB    pyqt-5.9.2                 |   py37ha878b3d_0         4.6 MB    pytz-2018.5                |           py37_0         232 KB    scipy-1.1.0                |   py37h4f6bf74_1        13.5 MB    jpeg-9b                    |       hb83a4c4_2         313 KB    python-dateutil-2.7.3      |           py37_0         260 KB    numpy-base-1.15.1          |   py37h8128ebf_0         3.9 MB    numpy-1.15.1               |   py37ha559c80_0          37 KB    mkl_fft-1.0.4              |   py37h1e22a9b_1         120 KB    kiwisolver-1.0.1           |   py37h6538335_0          61 KB    pyparsing-2.2.0            |           py37_1          96 KB    cycler-0.10.0              |           py37_0          13 KB    freetype-2.9.1             |       ha9979f8_1         470 KB    icu-58.2                   |       ha66f8fd_1        21.9 MB    sqlite-3.24.0              |       h7602738_0         899 KB    sip-4.19.12                |   py37h6538335_0         283 KB    ------------------------------------------------------------                                           Total:       149.5 MBThe following NEW packages will be INSTALLED:    blas:            1.0-mkl    cycler:          0.10.0-py37_0    freetype:        2.9.1-ha9979f8_1    icc_rt:          2017.0.4-h97af966_0    icu:             58.2-ha66f8fd_1    intel-openmp:    2019.0-117    jpeg:            9b-hb83a4c4_2    kiwisolver:      1.0.1-py37h6538335_0    libpng:          1.6.34-h79bbb47_0    matplotlib:      2.2.3-py37hd159220_0    mkl:             2019.0-117    mkl_fft:         1.0.4-py37h1e22a9b_1    mkl_random:      1.0.1-py37h77b88f5_1    numpy:           1.15.1-py37ha559c80_0    numpy-base:      1.15.1-py37h8128ebf_0    pyparsing:       2.2.0-py37_1    pyqt:            5.9.2-py37ha878b3d_0    python-dateutil: 2.7.3-py37_0    pytz:            2018.5-py37_0    qt:              5.9.6-vc14h62aca36_0    scipy:           1.1.0-py37h4f6bf74_1    sip:             4.19.12-py37h6538335_0    sqlite:          3.24.0-h7602738_0    tornado:         5.1-py37hfa6e2cd_0    zlib:            1.2.11-h8395fce_2Proceed ([y]/n)?

Now that you’ve covered how to search and install packages, let’s see how to update and remove them using Conda.

Updating and Removing Packages

Sometimes, when new packages are released, you need to update them. To do so, you may run conda update <package name>. In case you wish to update all the packages within one environment, you should activate the environment and run conda update --all.

To remove a package, you can run conda remove <package name>. For example, this is how you remove numpy from the root base environment:

(base) C:\Users\IEUser>conda remove numpy
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3  removed specs:    - numpyThe following packages will be REMOVED:    matplotlib: 2.2.3-py37hd159220_0    mkl_fft:    1.0.4-py37h1e22a9b_1    mkl_random: 1.0.1-py37h77b88f5_1    numpy:      1.15.1-py37ha559c80_0    scipy:      1.1.0-py37h4f6bf74_1Proceed ([y]/n)?

It’s worth noting that when you remove a package, all packages that depend on it are also removed.
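
If you want to preview that cascade before committing to it, Conda also accepts a dry-run flag for removals. Here's a hypothetical example; the exact list of packages shown depends on what is installed in your environment:

(base) C:\Users\IEUser>conda remove numpy --dry-run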

Using Channels

Sometimes, you won’t find the packages you want to install on the default channels configured by the installer. For example, this is what happens when you search for pytorch, another machine learning package:

(base) C:\Users\IEUser>conda search pytorch
Loading channels: donePackagesNotFoundError: The following packages are not available from current channels:  - pytorchCurrent channels:  - https://repo.anaconda.com/pkgs/main/win-64  - https://repo.anaconda.com/pkgs/main/noarch  - https://repo.anaconda.com/pkgs/free/win-64  - https://repo.anaconda.com/pkgs/free/noarch  - https://repo.anaconda.com/pkgs/r/win-64  - https://repo.anaconda.com/pkgs/r/noarch  - https://repo.anaconda.com/pkgs/pro/win-64  - https://repo.anaconda.com/pkgs/pro/noarch  - https://repo.anaconda.com/pkgs/msys2/win-64  - https://repo.anaconda.com/pkgs/msys2/noarchTo search for alternate channels that may provide the conda package you'relooking for, navigate to    https://anaconda.organd use the search bar at the top of the page.

In this case, you may search for the package here. If you search for pytorch, you’ll get the following results:

Anaconda Search for pytorch

The channel pytorch has a package named pytorch with version 0.4.1. To install a package from a specific channel you can use the -c <channel> parameter with conda install:

(base) C:\Users\IEUser>conda install -c pytorch pytorch
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3  added / updated specs:    - pytorchThe following packages will be downloaded:    package                    |            build    ---------------------------|-----------------    pytorch-0.4.1              |py37_cuda90_cudnn7he774522_1       590.4 MB  pytorchThe following NEW packages will be INSTALLED:    pytorch: 0.4.1-py37_cuda90_cudnn7he774522_1 pytorchProceed ([y]/n)?

Alternatively, you can add the channel, so that Conda uses it to search for packages to install. To list the current channels used, you can run conda config --get channels:

(base) C:\Users\IEUser>conda config --get channels
--add channels 'defaults'   # lowest priority

(base) C:\Users\IEUser>

The Miniconda installer includes only the defaults channels. When more channels are included, it is necessary to set the priority of them to determine from which channel a package will be installed in case it is available from more than one channel.

To add a channel with the lowest priority to the list, you should run conda config --append channels <channel name>. To add a channel with the highest priority to the list, you should run conda config --prepend channels <channel name>. It is recommended to add new channels with low priority, to keep using the default channels prior to the others. So, alternatively, you can install pytorch, adding the pytorch channel and running conda install pytorch:

(base) C:\Users\IEUser>conda config --append channels pytorch

(base) C:\Users\IEUser>conda config --get channels
--add channels 'pytorch'   # lowest priority
--add channels 'defaults'   # highest priority

(base) C:\Users\IEUser>conda install pytorch
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3  added / updated specs:    - pytorchThe following packages will be downloaded:    package                    |            build    ---------------------------|-----------------    pytorch-0.4.1              |py37_cuda90_cudnn7he774522_1       590.4 MB  pytorchThe following NEW packages will be INSTALLED:    pytorch: 0.4.1-py37_cuda90_cudnn7he774522_1 pytorchProceed ([y]/n)?

Not all packages are available on Conda channels. However, this is not a problem, since you also can use pip to install packages inside Conda environments. Let’s see how to do this.

Using pip Inside Conda Environments

Sometimes, you may need pure Python packages and, generally, these packages are not available on Conda’s channels. For example, if you search for unipath, a package to deal with file paths in Python, Conda won’t be able to find it.

You could search for the package here and use another channel to install it. However, since unipath is a pure Python package, you could use pip to install it, as you would do on a regular Python setup. The only difference is that you should use pip installed by the Conda package pip. To illustrate that, let’s create a new environment called newproject. As mentioned before, you can do this running conda create:

conda create --name newproject

Next, to have pip installed, you should activate the environment and install the Conda package pip:

(base) C:\Users\IEUser>conda activate newproject

(newproject) C:\Users\IEUser>conda install pip
Solving environment: done## Package Plan ##  environment location: C:\Users\IEUser\Miniconda3\envs\newproject  added / updated specs:    - pipThe following NEW packages will be INSTALLED:    certifi:        2018.8.24-py37_1    pip:            10.0.1-py37_0    python:         3.7.0-hea74fb7_0    setuptools:     40.2.0-py37_0    vc:             14-h0510ff6_3    vs2015_runtime: 14.0.25123-3    wheel:          0.31.1-py37_0    wincertstore:   0.2-py37_0Proceed ([y]/n)?

Finally, use pip to install the package unipath:

(newproject) C:\Users\IEUser>pip install unipath
Collecting unipath
Installing collected packages: unipath
Successfully installed unipath-1.1
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

(newproject) C:\Users\IEUser>

After installation, you can list the installed packages with conda list and check that Unipath was installed using pip:

(newproject) C:\Users\IEUser>conda list
# packages in environment at C:\Users\IEUser\Miniconda3\envs\newproject:## Name                    Version                   Build  Channelcertifi                   2018.8.24                py37_1pip                       10.0.1                   py37_0python                    3.7.0                hea74fb7_0setuptools                40.2.0                   py37_0Unipath                   1.1                       <pip>vc                        14                   h0510ff6_3vs2015_runtime            14.0.25123                    3wheel                     0.31.1                   py37_0wincertstore              0.2                      py37_0(newproject) C:\Users\IEUser>

It’s also possible to install packages from a version control system (VCS) using pip. For example, let’s install supervisor, version 4.0.0dev0, available in a Git repository. As Git is not installed in the newproject environment, you should install it first:

(newproject) C:\Users\IEUser> conda install git

Then, install supervisor, using pip to install it from the Git repository:

(newproject) pip install -e git://github.com/Supervisor/supervisor@abef0a2be35f4aae4a4edeceadb7a213b729ef8d#egg=supervisor

After the installation finishes, you can see that supervisor is listed in the installed packages list:

(newproject) C:\Users\IEUser>conda list
## Name                    Version                   Build  Channelcertifi                   2018.8.24                py37_1git                       2.18.0               h6bb4b03_0meld3                     1.0.2                     <pip>pip                       10.0.1                   py37_0python                    3.7.0                hea74fb7_0setuptools                40.2.0                   py37_0supervisor                4.0.0.dev0                <pip>... (more)

Now that you know the basics of using environments and managing packages with Conda, let’s create a simple machine learning example to solve a classic problem using a neural network.

A Simple Machine Learning Example

In this section, you’ll set up the environment using Conda and train a neural network to function like an XOR gate.

An XOR gate implements the digital logic exclusive OR operation, which is widely used in digital systems. It takes two digital inputs that can each be equal to 0 (representing a digital false value) or 1 (representing a digital true value), and it outputs 1 (true) if the inputs are different or 0 (false) if the inputs are equal. The following table (referred to as a truth table in digital systems terminology) summarizes the XOR gate operation:

Input A    Input B    Output: A XOR B
   0          0              0
   0          1              1
   1          0              1
   1          1              0

The XOR operation can be interpreted as a classification problem, given that it takes two inputs and should classify them into one of two classes, represented by 0 or 1, depending on whether the inputs are equal to each other or different from one another.

It is commonly used as a first example to train a neural network because it is simple and, at the same time, demands a nonlinear classifier, such as a neural network. The neural network will use only the data from the truth table, without knowledge about where it came from, to “learn” the operation performed by the XOR gate.
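
You can generate those target labels directly in Python with the built-in bitwise XOR operator. This is just an illustrative snippet, not part of the article's neural network code:

# Print the XOR truth table using Python's ^ operator.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b)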

To implement the neural network, let’s create a new Conda environment, named nnxor:

(base) C:\Users\IEUser>conda create --name nnxor

Then, let’s activate it and install the package keras:

(base) C:\Users\IEUser>conda activate nnxor

(nnxor) C:\Users\IEUser>conda install keras

keras is a high-level API that makes it easy to implement neural networks on top of well-known machine learning libraries, such as TensorFlow.

You’ll train the following neural network to act as an XOR gate:

XOR gate neural network

The network takes two inputs, A and B, and feeds them to two neurons, represented by the big circles. Then, it takes the outputs of these two neurons and feeds them to an output neuron, which should provide the classification according to the XOR truth table.

In brief, the training process consists of adjusting the values of the weights w_1 through w_6 so that the output is consistent with the XOR truth table. To do so, input examples will be fed in one at a time, the output will be calculated according to the current values of the weights, and, by comparing the output with the desired output given by the truth table, the values of the weights will be adjusted in a step-by-step process.
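
To make the idea of "calculate the output from the current weights" concrete, here is a small NumPy sketch of a single forward pass through this 2-2-1 sigmoid network. The weight values below are made up purely for illustration; during training, Keras adjusts its own weights to minimize the loss:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up weights (w_1 through w_6) and zero biases, just for illustration.
W_hidden = np.array([[0.5, -0.3],   # weights from inputs A, B to hidden neuron 1
                     [0.8,  0.2]])  # weights from inputs A, B to hidden neuron 2
W_output = np.array([0.7, -0.6])    # weights from the hidden neurons to the output neuron

x = np.array([1, 0])                 # one training example: A=1, B=0
hidden = sigmoid(W_hidden @ x)       # outputs of the two hidden neurons
output = sigmoid(W_output @ hidden)  # the network's prediction for A XOR B
error = (1 - output) ** 2            # squared error against the desired output (1)
print(output, error)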

To organize the project, you’ll create a folder named nnxor within Windows user’s folder (C:\Users\IEUser) with a file named nnxor.py to store the Python program to implement the neural network:

Program File

In the nnxor.py file, you’ll define the network, perform the training, and test it:

import numpy as np
np.random.seed(444)

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

model = Sequential()
model.add(Dense(2, input_dim=2))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.1)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(X, y, batch_size=1, epochs=5000)

if __name__ == '__main__':
    print(model.predict(X))

First, you import numpy, initialize a random seed, so that you can reproduce the same results when running the program again, and import the keras objects you’ll use to build the neural network.

Then, you define an X array, containing the 4 possible A-B sets of inputs for the XOR operation and a y array, containing the outputs for each of the sets of inputs defined in X.

The next five lines define the neural network. The Sequential() model is one of the models provided by keras to define a neural network, in which the layers of the network are defined in a sequential way. Then you define the first layer of neurons, composed of two neurons, fed by two inputs, defining their activation function as a sigmoid function in the sequence. Finally, you define the output layer composed of one neuron with the same activation function.

The following two lines define the details about the training of the network. To adjust the weights of the network, you’ll use the Stochastic Gradient Descent (SGD) with the learning rate equal to 0.1, and you’ll use the mean squared error as a loss function to be minimized.

Finally, you perform the training by running the fit() method, using X and y as training examples and updating the weights after every training example is fed into the network (batch_size=1). The number of epochs represents the number of times the whole training set will be used to train the neural network.

In this case, you’re repeating the training 5000 times using a training set containing 4 input-output examples. By default, each time the training set is used, the training examples are shuffled.

On the last line, after the training process has finished, you print the predicted values for the 4 possible input examples.

By running this script, you’ll see the evolution of the training process and the performance improvement as new training examples are fed into the network:

(nnxor) C:\Users\IEUser>cd nnxor

(nnxor) C:\Users\IEUser\nnxor>python nnxor.py
Using TensorFlow backend.
Epoch 1/5000
2018-09-16 09:49:05.987096: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-09-16 09:49:05.993128: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
4/4 [==============================] - 0s 39ms/step - loss: 0.2565
Epoch 2/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 3/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 4/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 5/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 6/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566

After the training finishes, you can check the predictions the network gives for the possible input values:

Epoch 4997/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
Epoch 4998/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
Epoch 4999/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
Epoch 5000/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
[[0.0587215 ]
 [0.9468337 ]
 [0.9323144 ]
 [0.05158457]]

As you defined X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]), the expected output values are 0, 1, 1, and 0, which is consistent with the predicted outputs of the network, given you should round them to obtain binary values.
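
If you prefer hard 0/1 predictions instead of probabilities, you can apply the rounding yourself. This is a small illustrative addition, not part of the original nnxor.py script:

# Threshold the sigmoid outputs at 0.5 to obtain binary predictions.
predictions = (model.predict(X) > 0.5).astype(int)
print(predictions)  # expected: [[0], [1], [1], [0]]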

Where To Go From Here

Data science and machine learning applications are emerging in the most diverse areas, attracting more people. However, setting up an environment for numerical computation can be a complicated task, and it’s common to find users having trouble in data science workshops, especially when using Windows.

In this article, you’ve covered the basics of setting up a Python numerical computation environment on a Windows machine using the Anaconda Python distribution.

Free Bonus:Click here to get access to a Conda cheat sheet with handy usage examples for managing your Python environment and packages.

Now that you have a working environment, it’s time to start working with some applications. Python is one of the most used languages for data science and machine learning, and Anaconda is one of the most popular distributions, used in various companies and research laboratories. It provides several packages to install libraries that Python relies on for data acquisition, wrangling, processing, and visualization.

Fortunately there are a lot of tutorials about these libraries available at Real Python, including the following:

Also, if you’d like a deeper understanding of Anaconda and Conda, check out the following links:


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Mike Driscoll: More typo-squatting Malware Found on PyPI


Malware was recently discovered on the Python Package Index that targets Windows users. The package was called colourama, and if it had been installed, it would have ended up installing malware on your PC. It is basically hoping that you will misspell the popular colorama package.

You can read more about the malware on Medium where it describes the malware as being a “Cryptocurrency Clipboard Hijacker”.

I actually wrote about this issue last year too when the Slovak National Security Office identified several malicious libraries on the Python Packaging Index.

I noticed this week that the Python Software Foundation is looking at adding security to PyPI in 2019 which they announced on their blog, although right now it does not appear to say what kind of security will be added.
