
Kushal Das: Permanent Record: the life of Edward Snowden


[Image: book cover]

The personal life and thinking of the ordinary person who did an extraordinary thing.

A fantastic personal narrative of his life and thinking process. The book does not get into technical details, but it makes sure that readers can relate to the different events it describes. It tells the story of a person who was born into the system, grew up to become part of the system, and then learned to question that same system.

I bought the book at midnight on Kindle (I also ordered physical copies), slept for 3 hours in between, and finished it in the morning. Anyone born in the 80s will find many similarities with their own childhood, whether it is the Commodore 64 as the first computer we saw or BASIC as the first programming language we tried. The lucky ones also got Internet access, learned to roam around on their own, and built their own adventures over busy telephone lines (which often made family members unhappy).

If you are someone from the technology community, I don't think you will find Ed's life much different from yours. The scenario and the key players are different, but you will recognize the same progression through life that many tech workers like ourselves have gone through.

Maybe you are reading the book just to learn what happened, or maybe you want to know why. Either way, I hope this book will help you think about the decisions you make in your life and how they affect the rest of the world, whether that is a group picture posted on Facebook or writing the next new tool for the intelligence community.

Go ahead and read the book, and when you are finished, pass it along to your friends or buy them their own copies. If you have some free time, consider running a Tor relay or a bridge; this simple step will help many people around the world.

On a side note, the book mentions the SecureDrop project at the very end, and today also marks the release of SecureDrop 1.0.0 (the same day as the book release).


Chris Moffitt: Happy Birthday Practical Business Python!


Introduction

On September 17th, 2014, I published my first article, which means that today is the 5th birthday of Practical Business Python. Thank you to all my readers and all those who have supported me through this process! It has been a great journey and I look forward to seeing what the future holds.

This 5 year anniversary gives me the opportunity to reflect on the blog and what will be coming next. I figured I would use this milestone to walk through a few of the stats and costs associated with running this blog for the past 5 years. This post will not be technical but I am hopeful that my readers as well as current and aspiring bloggers going down this path will find it helpful. Finally, please use the comments to let me know what content you would like to see in the future.

Traffic

I’m always curious about other people’s site traffic, so here’s a view of my traffic over time. I’m now averaging around 90K monthly visitors:

[Chart: monthly visitor metrics over time]

I remember watching the views when I first started and never expected to see the site grow as much as it has. At the same time, it has definitely been a long process to get here.

I also find it interesting to see which articles are driving my traffic. This post is my 70th article and here are the top 5 articles over the lifetime of this blog:

From a personal perspective, one of the articles I refer to the most in my own usage (and am personally proud of) is this one:

Combined, these 5 articles drive 35% of the traffic to the site over this time frame. Some of the articles have been around a lot longer so at some point in the future I might try to adjust these numbers based on the length of time they have been published.

As far as where the traffic comes from, about 85% of the daily traffic is driven by organic search. I would give you more details, but after converting the site to serve over SSL, the search integration with Google broke, and for the life of me, I cannot figure out how to get the Search Console to link back to Google Analytics.

Site Costs and Revenue

There are many options for hosting a blog. Overall, I have been very happy with static blog hosting using Pelican. When the blog started, the AWS costs were pretty minimal. As the traffic has grown, the costs have started to add up. To give you a sense of how much it costs to run the blog, here are the year-to-date costs for AWS:

[Chart: year-to-date AWS costs]

The costs started to rise in May and that’s when I realized that my RSS feed was getting really big and was consuming a lot of my bandwidth. After making the simple change described in the tweet below, costs went down considerably.

One of my other big costs is Disqus. I think comments are important, but I really dislike the distracting ads that could be shown on the site. I decided to pay $108/year to remove the Disqus ads. I think it's a good investment.

In July 2018, I started my mailing list and it has grown to over 2200 subscribers in that time. The one area I am not happy with is the cost of Mailchimp. It now costs $34.99/month for my list which is a lot considering the low volume of email I send. I will likely be looking for another solution in the upcoming months.

The only direct source of revenue I get from the blog is when someone purchases something from my affiliate links. To be honest, most months I generate about enough to almost pay for my AWS costs. Jeff Bezos giveth and Jeff Bezos taketh away!

Closing Thoughts

Clearly I’m not making enough to retire early. So, why am I doing it? I have two main motivations.

First, I want to continue to learn about Python. When I started the blog, I knew Python but very little about pandas, scikit-learn, and Python data visualization. Over the past 5 years, I have learned a lot. Learning about concepts and writing about them on this blog has been really helpful in expanding my Python and data science knowledge.

The second reason is that I want to give back to the Python community. Python has been a very useful tool for me and I think it can help a lot of other people. I hope that in some small way this blog has helped others. The other community benefit is that the blog gives me a reason (or excuse) to participate more consistently in the Python community. Without the blog, I would have a lot less reason to actively participate in this wonderful community.

I also have a more selfish motivation. At some point in the future, I would like to make a move where I am able to spend more time focusing on Python. I do not know exactly what that will look like, but I suspect this blog will play a key role in that future state.

As far as changes go, I would like to update the site’s style so it looks more modern and less like the default template. I also want to figure out a better cadence for sending content to my email list. There are several articles that I need to update to reflect the most recent changes in Python.

Going forward, I will likely continue creating the same type of content. I am always interested in learning about the types of articles you would like to see, so please comment below if you have any ideas. I cannot guarantee I will write about it, but I will do some research and put it on my list of potential future topics.

Thanks again for all your support over the past 5 years and I look forward to seeing what the next 5 will bring!

Credits

Photo by Elisha Terada on Unsplash

Real Python: Python Debugging With pdb


Nowadays, we often take for granted the excellent debuggers built into our favorite IDEs. But how do you debug your Python code when you don’t have the luxury of using an IDE?

pdb, short for Python DeBugger, is a module for interactive source code debugging. It’s built into the Python Standard Library, so it’s always accessible to you. Because it runs in the command line, it’s especially helpful when you’re developing on remote systems.

In this course, you’ll learn how to perform the most common debugging tasks using pdb, including setting breakpoints, stepping through code, viewing stack traces, creating watch lists, and more.
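As a quick taste before the course, here is a minimal sketch of dropping into pdb from a script; the function and values are purely illustrative:

import pdb

def divide(a, b):
    pdb.set_trace()  # pause here and start an interactive debugging session
    return a / b

divide(4, 2)

Once the (Pdb) prompt appears, commands such as p a (print a value), n (next line), and c (continue) let you inspect state and resume execution. In Python 3.7 and later, you can also call the built-in breakpoint() instead of importing pdb explicitly.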

Free Bonus: Click here to get a printable "pdb Command Reference" (PDF) that you can keep on your desk and refer to while debugging.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Anarcat: FSF resignations


I have been hesitant about renewing my membership in the Free Software Foundation for a while, but now I never want to deal with the FSF until Richard Stallman, president and founder of the free software movement, resigns. So, like many people and organizations, I have written this letter to cancel my membership. (Update: RMS resigned before I even had time to send this letter, but I publish it here to share my part of this story.)

My encounters with a former hero

I had the (mis)fortune of meeting rms in person a few times in my life. The first time was at an event we organized for his divine visit to Montreal in 2005. I couldn't attend the event myself, but I had the "privilege" of having dinner with rms later during the week. Richard completely shattered any illusion I had about him as a person. He was arrogant, full of himself, and totally uninterested in the multitude of young hackers he was meeting in his many travels, apart from, of course, arguing with them about proper wording and technicalities. Even though we brought him to the fanciest vegetarian restaurant in town, he got upset because the restaurant was trying to make "fake meat" meals. Somehow my hero, who wrote the GNU manifesto that inspired me to make free software a life goal, had spoiled a delicious meal by being such an ungrateful guest. I would learn later that Stallman has rock-star-level requirements, with "vegetarian meals served just so" being only one example out of many. (I don't mind vegetarians, of course: I've been a vegetarian for more than 20 years now, but I will never refuse vegetarian food given to me.)

The second time was less frustrating: it was in 2006 during the launch of the GPLv3 discussion draft, an ambitious project to include the community in the rewrite of the GPLv2. Even though I was deeply interested in the legal implications of the changes, everything went a bit over my head and I felt left out of a process that was supposedly designed to include legal geeks like me. At best, I was able to assist Stallman's assistant as she skidded over icy Boston sidewalks with a stereotypical (and maybe a little machismo, I must admit) Canadian winter assurance. At worst, I burned liters of fuel to drive me and some colleagues over the border to see idols speak on a stage.

Finally, I somehow got tangled up with rms in a hallway conversation about open hardware and wireless chipsets at LibrePlanet 2017, the FSF's yearly conference. I forgot the exact details, but we were debating whether or not legislation that forbids certain wireless chipsets to be open was legitimate or not.

(For some reason, rms has ambiguous opinions about "hardware freedom" and sees a distinction between software that runs on a computer (as "in the CPU") and software that is embedded in the hardware, etched into electronic circuits. The fact that this is a continuum with various in-between incarnations ("firmware", ASIC, FPGA) seems to escape his analysis. But that is beside the point here.)

We "debated" this for a while, but for people who don't know, debating with rms is a little bit like talking with a three year old: they have their deeply rooted opinion, they might recognize you have one as well (if your lucky), but they will generally ignore whatever it is you non-sensical adult are saying because it's incomprehensible anyways. With a three year old, it's kind of hilarious (until they spill an bottle full of vanilla on the floor), but with an adult, it's kind of aggravating and makes you feel like an idiot for even trying.

I mention this anecdote because it's a good example of how Stallman doesn't think rules apply to him. Simple, informal rules like listening to the people you're talking to seem like basic courtesy, but rms is above such mundane things. If this were just a hallway conversation, I wouldn't mind that much: after all, I don't need to talk to Richard Stallman. But at LibrePlanet (and in fact anywhere), he believes it is within his prerogative to interrupt any discussion or talk around him. I was troubled by the FSF's silence on Eric Schultz's request for safety at LibrePlanet: while I heard the FSF privately reached out to Eric, nothing seemed to have been done to curb Stallman's attitude in public. This is the reason why I haven't returned to Boston for LibrePlanet since then, even though I have dear friends who live there and were deeply involved in the organization.

The final straw before this week's disclosures was an event in Quebec City where Stallman was speaking at a conference. A friend of mine asked a question involving his daughter as an example user. Stallman responded to the question by asking my friend if he could meet his (underage) daughter, with obvious obscene undertones. Everyone took this as a joke, but, in retrospect, it was just horrible, and I came to conclude that Stallman was now a liability to the free software movement. I just didn't know what to do back then. I wish I had done something.

Why I am resigning from the FSF

Those events at LibrePlanet were the first reason why I hadn't renewed my membership yet. But now I want to formally cancel my membership with the FSF because its president went beyond his usual sexism and the weird pedophilia justifications from his past. I first treated those as an abhorrent eccentricity or at best an unfortunate intellectual posture, but rms has gone way beyond this position now. Now rms has joined the ranks of rape apologists in the Linux kernel development community, an inexcusable position in a community that already struggles too much with issues of inclusion, respect, and just being nice to each other. I am not going to go into details that are better described by this courageous person, but needless to say, this kind of behavior is inexcusable from anyone, and particularly from a historical leader. Stallman did respond to the accusations, but far from issuing an apology, he said his statements were "mischaracterised"; something that looks to me like a sad caricature.

I do not want to have anything to do with the FSF anymore. I don't know if they would be able to function without Stallman, and frankly, at this point, I don't care: they have let this go on for too long. I know how much rms contributed to the free software movement: he wrote most of Emacs, GCC, and large parts of the GNU system so many people use on their desktops. I am grateful for that work, but that was a long time ago and this is now. As others have said, we don't need to replace rms. We need a world where such leaders are not necessary, because rock stars too easily become abusers.

Stallman is just the latest: our community is filled with obnoxious leaders like this. It seems our community leaders are (among other things) either assholes, libertarian gun freaks, or pedophilia apologists and sexists. We tolerate their abuse because we somehow believe they are technically exceptional. They aren't: they're just hard-working and privileged. And even if they were geniuses, as selamie says:

For a moment, let’s assume that someone like Stallman is truly a genius. Truly, uniquely brilliant. If that type of person keeps tens or even hundreds of highly intelligent but not ‘genius’ people out of science and technology, then they are hindering our progress despite the brilliance.

Or, as Banksy says:

We don't need any more heroes.

We just need someone to take out recycling.

I wish Stallman would just retire already. He's done enough good work for a lifetime, now he's bound to just do more damage.

Update: Richard Stallman resigned from the FSF and from MIT ("due to pressure on MIT and me"), still dodging responsibility and characterizing the problem as "a series of misunderstandings and mischaracterizations". Obviously, this man cannot be reformed and we need to move on. Those events happened before I even had time to actually send this letter to the FSF, so I guess I might renew my membership after all. I'll hold off until LibrePlanet, however; we'll see what happens there... In the meantime, I'll see how I can help my friends left at the FSF, because they must be living through hell right now.

PyCoder’s Weekly: Issue #386 (Sept. 17, 2019)


#386 – SEPTEMBER 17, 2019
View in Browser »



Call for Proposals for PyCon 2020 Is Open

The submission deadlines are: Tutorial proposals are due November 22, 2019. Talk, Charlas, Poster, and Education Summit proposals are due December 20, 2019.
PYCON.BLOGSPOT.COM

Python vs C++: Selecting the Right Tool for the Job

Explore the similarities and differences you’ll find when comparing Python vs C++. You’ll learn about memory management, virtual machines, object-oriented programming differences, and much more.
REAL PYTHON

Find a Python Job Through Vettery


Vettery specializes in developer roles and is completely free for job seekers. Interested? Submit your profile, and if accepted, you can receive interview requests directly from top companies seeking Python devs. Get started →
VETTERY sponsor

PEP 603: Adding a frozenmap Type to collections

A draft PEP that proposes adding a new fully persistent and immutable mapping type called frozenmap to the collections module in the Python standard library.
PYTHON.ORG

Java Primer for Python Developers

“There are large distinctions between the two programming languages, but I’ll try to give the most notable that I encountered–as I approached Java from a Python-heavy background.”
MAX MAUTNER

The Boring Technology Behind a One-Person Internet Company

The Python-powered tech stack of a one-person company (ListenNotes podcast search engine).
WENBING FANG

Types for Python HTTP APIs

How Instagram uses types to document and enforce a contract for their Python HTTP APIs.
ANIRUDH PADMARAO (INSTAGRAM)

Discussions

Python Jobs

Python Backend Developer (Kfar Saba, Israel)

3DSignals

Senior Software Engineer (Remote)

Close

Senior Python Developer/PM/Architect (Austin, TX)

InQuest

Senior Software Developer (Edmonton, Canada)

Levven Electronics Ltd.

More Python Jobs >>>

Articles & Tutorials

PyGame: A Primer on Game Programming in Python

Learn how to use PyGame. This library allows you to create games and rich multimedia programs in Python. You’ll see how to draw items on your screen, implement collision detection, handle user input, and much more!
REAL PYTHON

How “Export to Excel” Almost Killed Our System

“Inspired by an actual incident we had in one of our systems caused by an Export to Excel functionality implemented in Python, we go through the process of identifying the problem, experimenting and benchmarking different solutions.”
HAKI BENITA

SQL, Python, and R. All in One Platform. Free Forever.


Mode Studio combines a SQL editor, Python & R notebooks, and visualization builder in one platform. Connect your data warehouse and analyze with your preferred language. Make custom visualizations (D3.js, HTML/CSS) or use out-of-the-box charts.
MODE ANALYTICS sponsor

JPMorgan’s Athena Has 35 Million Lines of Python 2 Code, and Won’t Be Updated to Python 3 in Time

“With 35 million lines of Python code, the Athena trading platform is at the core of JPMorgan’s business operations. A late start to migrating to Python 3 could create a security risk.”
JAMES SANDERS

Should You Use “Dot Notation” or “Bracket Notation” With Pandas?

There are two ways to select a Series from a DataFrame: “dot notation” and “bracket notation” (square brackets). Find out which one you should use, and why.
KEVIN MARKHAM

LEGB? Meet ICPO, Python’s Search Strategy for Attributes

How Python looks up object attributes like obj.name using an “instance, class, parent, object” search algorithm.
REUVEN LERNER

Never Delete PyPI Release

Why you should (almost) never delete a bad release from PyPI—and what to do as a package maintainer instead.
ALEX BECKER • Shared by Alex Becker

“Level Up Your Python” Humble Bundle

Support Pythonic charities like the PSF and get books, software, and videos collectively valued at $867 for a pay-what-you-want price.
HUMBLEBUNDLE.COM sponsor

Python Does What?! Welcome to the float Zone…

A Python “gotcha” involving floating point numbers and tuples.
PYTHONDOESWHAT.COM

Projects & Code

Events

PyCon TW 2019

September 20 to September 23, 2019
PYCON.TW

DjangoCon US

September 22 to September 28, 2019
DJANGOCON.US

PyWeek 28

September 22 to September 30, 2019
PYWEEK.ORG


Happy Pythoning!
This was PyCoder’s Weekly Issue #386.
View in Browser »


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Yasoob Khalid: Looking for an internship for Summer 2020


Hi lovely people! 👋 Hope everything is going well on your end. I asked you guys last year to help me find a kick-ass internship, and you all came through. I ended up working at ASAPP over the summer and had an awesome time. I wrote an article about what I learned during my internship.

I am putting out the same request for next summer as well. If you have benefited from any of my articles, work at an amazing company, and feel like I would be a good addition to your team, please reach out. I am looking for a 12-14 week internship. I strongly prefer small teams where I can bond with the people I am working with. I am open to most places, but bonus points if you work at a hardware-based tech company or a fintech startup. However, this is not a hard requirement.

I have done a lot of backend development in Python and Golang. I am fairly comfortable dabbling in front-end code as well. I have also tinkered with open source hardware (Arduino & Raspberry Pi) and wrote a couple of articles about what I did and how I did it. You can take a look at my resume (PDF) to get a better understanding of my expertise. You can also read about how I got into programming through this article.

tldr: I love working with exciting stuff even if it means I have to learn something completely new!

I am okay with take-home assignments and kind of prefer them to algorithm interviews. Bonus points if your company does that, but again, not a hard requirement. I know take-homes take a lot of time, but IMO they gauge proficiency better than a normal algorithm interview.

I hope you guys will come through this time as well. Have a fantastic day and keep smiling. If you have any questions/comments/suggestions, please comment below or send me an email at yasoob.khld at gmail.com.

See ya! ♥

 

A. Jesse Jiryu Davis: Free Coaching For PyGotham Speakers

I help organize PyGotham, NYC’s annual conference about the Python programming language. For the third year in a row, we’re giving our speakers free sessions with a professional speaking coach, opera singer Melissa Collom. In the past we’ve limited coaching to first-time speakers, but we’re now able to coach everyone. However, we only have budget guaranteed for the first 20 signups. If you’re speaking at PyGotham, reserve your spot now:

Yasoob Khalid: Filtering & Closing Pull Requests on GitHub using the API


Hi everyone! 👋 In this post, I am going to show you how you can use the GitHub API to query Pull Requests, check the content of a PR and close it.

The motivation for this project came from my personal website. I introduced static comments on the website using Staticman and only after a day or two, got bombarded with spam. I hadn’t enabled Akismet or any honey pot field so it was kinda expected. However, this resulted in me getting 200+ PRs on GitHub for bogus comments which were mainly advertisements for amoxicillin (this was also the first time I found out how famous this medicine is).

I was in no mood for going through the PRs manually so I decided to write a short script which went through them on my behalf and closed the PRs which mentioned certain keywords.

You can see the different PRs opened by staticman. Most of these are spam:

For this project, I decided to use the PyGithub library. It is super easy to install using pip:

pip install pygithub

Now we can go ahead and log in to GitHub using PyGithub. Write the following code in a github_clean.py file:

import argparse
import logging
import sys

from github import Github

def parse_arguments():
    """
    Parses arguments
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--username', 
        required=True, help="GitHub username")
    parser.add_argument('-p', '--password', 
        required=True, help="GitHub password")
    parser.add_argument('-r', '--repository', 
        required=True, help="repository name")
    parsed_args = parser.parse_args()
    if "/" not in parsed_args.repository:
        logging.error("repo name should also contain username like: username/repo_name")
        sys.exit()
    return parsed_args
    
def main():
    args = parse_arguments()
    g = Github(args.username, args.password)
    
if __name__ == '__main__':
    main()

So far I am just using argparse to accept and parse the command line arguments and then using the arguments to create a Github object.

You will be passing in three arguments:

  1. Your GitHub username
  2. Your GitHub password
  3. The repo you want to work with
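For example, assuming the script stays named github_clean.py as above, a run from the terminal might look like this (the username, password, and repository values are placeholders):

python github_clean.py -u your_username -p your_password -r your_username/your_blog_repo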

Next step is to figure out how to loop through all the pull requests and check if their body contains any “spam” words:

repo = g.get_repo(args.repository)
issues = repo.get_issues()

page_num = 0
while True:
    issue_page = issues.get_page(page_num)
    if issue_page == []:
        break
    for issue in issue_page:
        # Do something with the individual issue
        if spam_word in issue.raw_data['body'].lower():
            print("Contains spam word!!")
    page_num += 1

First, we query GitHub for a specific repo using g.get_repo and then we query for issues for that repo using repo.get_issues. It is important to note that all PRs are registered as issues as well so querying for issues will return pull requests as well. GitHub returns a paginated result so we just continue asking for successive issues in a while loop until we get an empty page.

We can check the body of an issue (PR) using issue.raw_data['body']. Two important pieces are missing from the above code. One is the spam_word variable and another is some sort of a mechanism to close an issue.

For the spam_word, I took a look at some issues and created a list of some pretty frequent spam words. This is the list I came up with:

spam_words = ["buy", "amoxi", "order", "tablets", 
"pills", "cheap", "viagra", "forex", "cafergot", 
"kamagra", "hacker", "python training"]

Add this list at the top of your github_clean.py file and modify the if statement like this:

closed = False
if any(spam_word in issue.raw_data['body'].lower() for spam_word in spam_words):
    issue.edit(state="closed")
    closed = True
print(f"{issue.number}, closed: {closed}")

With this final snippet of code, we have everything we need. My favourite function in this code snippet is any. It checks whether any of the elements in the iterable passed to it is truthy.

This is what your whole file should look like:

import argparse
import sys
import re
import logging

from github import Github

spam_words = ["buy", "amoxi", "order", "tablets", 
"pills", "cheap", "viagra", "forex", "cafergot", 
"kamagra", "hacker", "python training"]
logging.basicConfig(level=logging.INFO)

def parse_arguments():
    """
    Parses arguments
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--username', 
        required=True, help="GitHub username")
    parser.add_argument('-p', '--password', 
        required=True, help="GitHub password")
    parser.add_argument('-r', '--repository', 
        required=True, help="repository name")
    parsed_args = parser.parse_args()
    if "/" not in parsed_args.repository:
        logging.error("repo name should also contain username like: username/repo_name")
        sys.exit()
    return parsed_args

def process_issue(issue):
    """
    Processes each issue and closes it 
    based on the spam_words list
    """
    closed = False
    if any(bad_word in issue.raw_data['body'].lower() for bad_word in spam_words):
        issue.edit(state="closed")
        closed = True
    return closed

def main():
    """
    Coordinates the flow of the whole program
    """
    args = parse_arguments()
    g = Github(args.username, args.password)
    logging.info("successfully logged in")
    repo =  g.get_repo(args.repository)

    logging.info("getting issues list")
    issues = repo.get_issues()

    page_num = 0
    while True:
        issue_page = issues.get_page(page_num)
        if issue_page == []:
            logging.info("No more issues to process")
            break
        for issue in issue_page:
            closed = process_issue(issue)
            logging.info(f"{issue.number}, closed: {closed}")
        page_num += 1

    
if __name__ == '__main__':
    main()

I just added a couple of different things to this script, like the logging. If you want, you can create a new command-line argument and use that to control the log level. It isn’t really useful here because we don’t have a lot of different log levels.

Now if you run this script you should see something similar to this:

INFO:root:successfully logged in
INFO:root:getting issues list
INFO:root:No more issues to process

It doesn’t process anything in this run because I have already run this script once and there are no more spam issues left.

So there you go! I hope you had fun making this! If you have any questions/comments/suggestions please let me know in the comments below! See you in the next post 🙂♥

 


Codementor: Why Python is the best-suited programming language for machine learning

Machine Learning is the hottest trend in modern times. According to Forbes, Machine learning patents grew at a 34% rate between 2013 and 2017 and this is only set to increase in the future. And...

Matt Layman: Python alternative to Docker

Deploying a Python app to a server is surprisingly hard. Without blinking, you’ll be dealing with virtual environments and a host of other complications. The landscape of deployment methods is huge. What if I told you that there is a way to build your app into a single file and it isn’t a Docker container? In this article, we’re going to look at common ways of deploying Python apps. We’ll explore the touted benefits of Docker containers to understand why containers are so popular for web apps.

Podcast.__init__: Cultivating The Python Community In Argentina


Summary

The Python community in Argentina is large and active, thanks largely to the motivated individuals who manage and organize it. In this episode Facundo Batista explains how he helped to found the Python user group for Argentina and the work that he does to make it accessible and welcoming. He discusses the challenges of encompassing such a large and distributed group, the types of events, resources, and projects that they build, and his own efforts to make information free and available. He is an impressive individual with a substantial list of accomplishments, as well as exhibiting the best of what the global Python community has to offer.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Facundo Batista about his experiences founding and fostering the Argentinian Python community, working as a core developer, and his career in Python

Interview

  • Introductions
  • How did you get introduced to Python?
  • What was your motivation for organizing a Python user group in Argentina?
  • How does the geography and culture of Argentina influence the focus of the community?
  • Argentina is a fairly large country. What is the reasoning for having the user group encompass the whole nation and how is it organized to provide access to everyone?
  • What are some notable projects that have been built by or for members of PyAr?
    • What are some of the challenges that you faced while building CDPedia and what aspects of it are you most proud of?
  • How did you get started as a core developer?
    • What areas of the language and runtime have you been most involved with?
  • As a core developer, what are some of the most interesting/unexpected/challenging lessons that you have learned?
  • What other languages do you currently use and what is it about Python that has motivated you to spend so much of your attention on it?
  • What are some of the shortcomings in Python that you would like to see addressed in the future?
  • Outside of CPython, what are some of the projects that you are most proud of?
  • How has your involvement with core development and PyAr influenced your life and career?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Talk Python to Me: #230 Python in digital humanities research

You've often heard me talk about Python as a superpower. It can amplify whatever you're interested in or what you have specialized in for your career. This episode is an amazing example of this. You'll meet Cornelis van Lit. He is a scholar of medieval Islamic philosophy and works at Utrecht University in the Netherlands. What he is doing with Python is pretty amazing.

Real Python: How to Convert a Python String to int


Integers are whole numbers. In other words, they have no fractional component. Two data types you can use to store an integer in Python are int and str. These types offer flexibility for working with integers in different circumstances. In this tutorial, you’ll learn how you can convert a Python string to an int. You’ll also learn how to convert an int to a string.

By the end of this tutorial, you’ll understand:

  • How to store integers using str and int
  • How to convert a Python string to an int
  • How to convert a Python int to a string

Let’s get started!

Python Pit Stop: This tutorial is a quick and practical way to find the info you need, so you’ll be back to your project in no time!

Free Bonus: Click here to get a Python Cheat Sheet and learn the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.

Representing Integers in Python

An integer can be stored using different types. Two possible Python data types for representing an integer are:

  1. str
  2. int

For example, you can represent an integer using a string literal:

>>> s = "110"

Here, Python understands you to mean that you want to store the integer 110 as a string. You can do the same with the integer data type:

>>> i = 110

It’s important to consider what you specifically mean by "110" and 110 in the examples above. As a human who has used the decimal number system for your whole life, it may be obvious that you mean the number one hundred and ten. However, there are several other number systems, such as binary and hexadecimal, which use different bases to represent an integer.

For example, you can represent the number one hundred and ten in binary and hexadecimal as 1101110 and 6e respectively.

You can also represent your integers with other number systems in Python using the str and int data types:

>>> binary = 0b1010
>>> hexadecimal = "0xa"

Notice that binary and hexadecimal use prefixes to identify the number system. All integer prefixes are in the form 0?, in which you replace ? with a character that refers to the number system:

  • b: binary (base 2)
  • o: octal (base 8)
  • d: decimal (base 10)
  • x: hexadecimal (base 16)
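For example, here is the number one hundred and ten written as binary, octal, and hexadecimal literals (plain decimal literals need no prefix):

>>> 0b1101110
110
>>> 0o156
110
>>> 0x6e
110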

Technical Detail: The prefix is not required in either an integer or string representation when it can be inferred.

int assumes the literal integer to be decimal:

>>> decimal = 303
>>> hexadecimal_with_prefix = 0x12F
>>> hexadecimal_no_prefix = 12F
  File "<stdin>", line 1
    hexadecimal_no_prefix = 12F
                              ^
SyntaxError: invalid syntax

The string representation of an integer is more flexible because a string holds arbitrary text data:

>>> decimal = "303"
>>> hexadecimal_with_prefix = "0x12F"
>>> hexadecimal_no_prefix = "12F"

Each of these strings represents the same integer.

Now that you have some foundational knowledge about how to represent integers using str and int, you’ll learn how to convert a Python string to an int.

Converting a Python String to an int

If you have a decimal integer represented as a string and you want to convert the Python string to an int, then you just pass the string to int(), which returns a decimal integer:

>>> int("10")
10
>>> type(int("10"))
<class 'int'>

By default, int() assumes that the string argument represents a decimal integer. If, however, you pass a hexadecimal string to int(), then you’ll see a ValueError:

>>> int("0x12F")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '0x12F'

The error message says that the string is not a valid decimal integer.

Note:

It’s important to recognize the difference between two types of failed results of passing a string to int():

  1. Syntax Error: A ValueError will occur when int() doesn’t know how to parse the string using the provided base (10 by default).
  2. Logical Error: int() does know how to parse the string, but not the way you expected.

Here’s an example of a logical error:

>>> binary = "11010010"
>>> int(binary)  # Using the default base of 10, instead of 2
11010010

In this example, you meant for the result to be 210, which is the decimal representation of the binary string. Unfortunately, because you didn’t specify that behavior, int() assumed that the string was a decimal integer.

One good safeguard for this behavior is to always define your string representations using explicit bases:

>>> int("0b11010010")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '0b11010010'

Here, you get a ValueError because int() doesn’t know how to parse the binary string as a decimal integer.

When you pass a string to int(), you can specify the number system that you’re using to represent the integer. The way to specify the number system is to use base:

>>> int("0x12F", base=16)
303

Now, int() understands you are passing a hexadecimal string and expecting a decimal integer.

Technical Detail: The argument that you pass to base is not limited to 2, 8, 10, and 16:

>>> int("10", base=3)
3

Great! Now that you’re comfortable with the ins and outs of converting a Python string to an int, you’ll learn how to do the inverse operation.

Converting a Python int to a String

In Python, you can convert a Python int to a string using str():

>>> str(10)
'10'
>>> type(str(10))
<class 'str'>

By default, str() behaves like int() in that it results in a decimal representation:

>>> str(0b11010010)
'210'

In this example, str() is smart enough to interpret the binary literal and convert it to a decimal string.

If you want a string to represent an integer in another number system, then you use a formatted string, such as an f-string (in Python 3.6+), and an option that specifies the base:

>>> octal = 0o1073
>>> f"{octal}"    # Decimal
'571'
>>> f"{octal:x}"  # Hexadecimal
'23b'
>>> f"{octal:b}"  # Binary
'1000111011'

str is a flexible way to represent an integer in a variety of different number systems.

Conclusion

Congratulations! You’ve learned so much about integers and how to represent and convert them between Python string and int data types.

In this tutorial, you learned:

  • How to use str and int to store integers
  • How to specify an explicit number system for an integer representation
  • How to convert a Python string to an int
  • How to convert a Python int to a string

Now that you know so much about str and int, you can learn more about representing numerical types using float(), hex(), oct(), and bin()!
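For example, hex(), oct(), and bin() return prefixed strings in those number systems:

>>> hex(303)
'0x12f'
>>> oct(59)
'0o73'
>>> bin(210)
'0b11010010'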


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Python Bytes: #148 The ASGI revolution is upon us!

Mike Driscoll: Python Code Kata: Fizzbuzz


A code kata is a fun way for computer programmers to practice coding. They are also used a lot for learning how to implement Test Driven Development (TDD) when writing code. One of the popular programming katas is called FizzBuzz. This is also a popular interview question for computer programmers.

The concept behind FizzBuzz is as follows:

  • Write a program that prints the numbers 1-100, each on a new line
  • For each number that is a multiple of 3, print “Fizz” instead of the number
  • For each number that is a multiple of 5, print “Buzz” instead of the number
  • For each number that is a multiple of both 3 and 5, print “FizzBuzz” instead of the number

Now that you know what you need to write, you can get started!


Creating a Workspace

The first step is to create a workspace or project folder on your machine. For example, you could create a katas folder with a fizzbuzz folder inside of it.

The next step is to install a source control program. One of the most popular is Git, but you could use something else like Mercurial. For the purposes of this tutorial, you will be using Git. You can get it from the Git website.

Now open up a terminal or run cmd.exe if you are a Windows user. Then navigate in the terminal to your fizzbuzz folder. You can use the cd command to do that. Once you are inside the folder, run the following command:


git init

This will initialize the fizzbuzz folder into a Git repository. Any files or folders that you add inside the fizzbuzz folder can now be added to Git and versioned.


The Fizz Test

To keep things simple, you can create your test file inside of the fizzbuzz folder. A lot of people will save their tests in a sub-folder called test or tests and tell their test runner to add the top-level folder to sys.path so that the tests can import it.

Note: If you need to brush up on how to use Python’s unittest library, then you might find Python 3 Testing: An Intro to unittest helpful.

Go ahead and create a file called test_fizzbuzz.py inside your fizzbuzz folder.

Now enter the following into your Python file:

import fizzbuzz
import unittest

class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')

if __name__ == '__main__':
    unittest.main()

Python comes with the unittest library built in. To use it, all you need to do is import it and subclass unittest.TestCase. Then you can create a series of functions that represent the tests that you want to run.

Note that you also import the fizzbuzz module. You haven’t created that module yet, so you will receive a ModuleNotFoundError when you run this test code. You could create this file without even adding any code other than the imports and have a failing test. But for completeness, you go ahead and assert that fizzbuzz.process(6) returns the correct string.

The fix is to create an empty fizzbuzz.py file. This will only fix the ModuleNotFoundError, but it will allow you to run the test and see its output now.

You can run your test by doing this:


python test_fizzbuzz.py

The output will look something like this:


ERROR: test_multiple_of_three (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/michael/Dropbox/code/fizzbuzz/test_fizzbuzz.py", line 7, in test_multiple_of_three
self.assertEqual(fizzbuzz.process(6), 'Fizz')
AttributeError: module 'fizzbuzz' has no attribute 'process'

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

So this tells you that your fizzbuzz module is missing an attribute called process.

You can fix that by adding a process() function to your fizzbuzz.py file:

def process(number):
    if number % 3 == 0:
        return 'Fizz'

This function accepts a number and uses the modulus operator to divide the number by 3 and check to see if there is a remainder. If there is no remainder, then you know that the number is divisible by 3 so you can return the string “Fizz”.

Now when you run the test, the output should look like this:


.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

The period on the first line above means that you ran one test and it passed.

Let’s take a quick step back here. When a test is failing, it is considered to be in a “red” state. When a test is passing, that is a “green” state. This refers to the Test Driven Development (TDD) mantra of red/green/refactor. Most developers will start a new project by creating a failing test (red). Then they will write the code to make the test pass, usually in the simplest way possible (green).

When your tests are green, that is a good time to commit your test and the code change(s). This allows you to have a working piece of code that you can rollback to. Now you can write a new test or refactor the code to make it better without worrying that you will lose your work because now you have an easy way to roll back to a previous version of the code.

To commit your code, you can do the following:


git add fizzbuzz.py test_fizzbuzz.py
git commit -m "First commit"

The first command will add the two new files. You don’t need to commit *.pyc files, just the Python files. There is a handy file called .gitignore that you can add to your Git repository to exclude certain file types or folders, such as *.pyc. GitHub has some default gitignore files for various languages that you can look at if you’d like to see an example.

The second command is how you can commit the code to your local repository. The “-m” is for message followed by a descriptive message about the changes that you’re committing. If you would like to save your changes to Github as well (which is great for backup purposes), you should check out this article.

Now we are ready to write another test!


The Buzz Test

The second test that you can write can be for multiples of five. To add a new test, you can create another method in the TestFizzBuzz class:

import fizzbuzz
import unittest

class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')

    def test_multiple_of_five(self):
        self.assertEqual(fizzbuzz.process(20), 'Buzz')

if __name__ == '__main__':
    unittest.main()

This time around, you want to use a number that is only divisible by 5. When you call fizzbuzz.process(), you should get “Buzz” returned. When you run the test though, you will receive this:


F.
======================================================================
FAIL: test_multiple_of_five (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_fizzbuzz.py", line 10, in test_multiple_of_five
self.assertEqual(fizzbuzz.process(20), 'Buzz')
AssertionError: None != 'Buzz'

----------------------------------------------------------------------
Ran 2 tests in 0.000s

FAILED (failures=1)

Oops! Right now your code only uses the modulus operator to check for a remainder after dividing by 3. Since 20 divided by 3 has a remainder, that branch doesn’t return anything. The default return value of a function is None, so that is why you end up with the failure above.

Go ahead and update the process() function to be the following:

def process(number):
    if number % 3 == 0:
        return 'Fizz'
    elif number % 5 == 0:
        return 'Buzz'

Now you can check for remainders with both 3 and 5. When you run the tests this time, the output should look like this:


..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Yay! Your tests passed and are now green! That means you can commit these changes to your Git repository.

Now you are ready to add a test for FizzBuzz!


The FizzBuzz Test

The next test that you can write will be for when you want to get “FizzBuzz” back. As you may recall, you will get FizzBuzz whenever the number is divisible by 3 and 5. Go ahead and add a third test that does just that:

import fizzbuzz
import unittest

class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')

    def test_multiple_of_five(self):
        self.assertEqual(fizzbuzz.process(20), 'Buzz')

    def test_fizzbuzz(self):
        self.assertEqual(fizzbuzz.process(15), 'FizzBuzz')

if __name__ == '__main__':
    unittest.main()

For this test, test_fizzbuzz, you ask your program to process the number 15. This shouldn’t work right yet, but go ahead and run the test code to check:


F..
======================================================================
FAIL: test_fizzbuzz (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_fizzbuzz.py", line 13, in test_fizzbuzz
self.assertEqual(fizzbuzz.process(15), 'FizzBuzz')
AssertionError: 'Fizz' != 'FizzBuzz'

----------------------------------------------------------------------
Ran 3 tests in 0.000s

FAILED (failures=1)

Three tests were run with one failure. You are now back to red. This time the error is ‘Fizz’ != ‘FizzBuzz’ instead of comparing None to ‘FizzBuzz’. The reason is that your code checks whether 15 is divisible by 3 first, and since it is, it returns “Fizz”.

Since that isn’t what you want to happen, you will need to update your code to check if the number is divisible by 3 and 5 before checking for just 3:

def process(number):
    if number % 3 == 0 and number % 5 == 0:
        return 'FizzBuzz'
    elif number % 3 == 0:
        return 'Fizz'
    elif number % 5 == 0:
        return 'Buzz'

Here you do the divisibility check for 3 and 5 first. Then you check for the other two as before.

Now if you run your tests, you should get the following output:


...
----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK

So far so good. However you don’t have the code working for returning numbers that aren’t divisible by 3 or 5. Time for another test!


The Final Test

The last thing that your code needs to do is return the number when it does have a remainder when divided by 3 and 5. Let’s test it a couple of different ways:

import fizzbuzz
import unittest

class TestFizzBuzz(unittest.TestCase):

    def test_multiple_of_three(self):
        self.assertEqual(fizzbuzz.process(6), 'Fizz')

    def test_multiple_of_five(self):
        self.assertEqual(fizzbuzz.process(20), 'Buzz')

    def test_fizzbuzz(self):
        self.assertEqual(fizzbuzz.process(15), 'FizzBuzz')

    def test_regular_numbers(self):
        self.assertEqual(fizzbuzz.process(2), 2)
        self.assertEqual(fizzbuzz.process(98), 98)

if __name__ == '__main__':
    unittest.main()

For this test, you test normal numbers 2 and 98 with the test_regular_numbers() test. These numbers will always have a remainder when divided by 3 or 5, so they should just be returned.

When you run the tests now, you should get something like this:


...F
======================================================================
FAIL: test_regular_numbers (__main__.TestFizzBuzz)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_fizzbuzz.py", line 16, in test_regular_numbers
self.assertEqual(fizzbuzz.process(2), 2)
AssertionError: None != 2

----------------------------------------------------------------------
Ran 4 tests in 0.000s

FAILED (failures=1)

This time you are back to comparing None to the number, which is what you probably suspected would be the output.

Go ahead and update the process() function as follows:

def process(number):
    if number % 3 == 0 and number % 5 == 0:
        return 'FizzBuzz'
    elif number % 3 == 0:
        return 'Fizz'
    elif number % 5 == 0:
        return 'Buzz'
    else:
        return number

That was easy! All you needed to do at this point was add an else statement that returns the number.

Now when you run the tests, they should all pass:


....
----------------------------------------------------------------------
Ran 4 tests in 0.000s

OK

Good job! Now your code works. You can verify that it works for all the numbers, 1-100, by adding the following to your fizzbuzz.py module:

if __name__ == '__main__':
    for i in range(1, 101):
        print(process(i))

Now when you run fizzbuzz yourself using python fizzbuzz.py, you should see the appropriate output that was specified at the beginning of this tutorial.

This is a good time to commit your code and push it to the cloud.


Wrapping Up

Now you know the basics of using Test Driven Development to drive you to solve a coding kata. Python’s unittest module has many more types of asserts and functionality than is covered in this brief tutorial. You could also modify this tutorial to use pytest, another popular 3rd party Python package that you can use in place of Python’s own unittest module.
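For instance, a pytest version of the same tests is just a set of plain functions with assert statements. This is a sketch of how it might look, not part of the kata above:

# test_fizzbuzz_pytest.py -- hypothetical pytest translation of the unittest cases
import fizzbuzz

def test_multiple_of_three():
    assert fizzbuzz.process(6) == 'Fizz'

def test_multiple_of_five():
    assert fizzbuzz.process(20) == 'Buzz'

def test_fizzbuzz():
    assert fizzbuzz.process(15) == 'FizzBuzz'

def test_regular_numbers():
    assert fizzbuzz.process(2) == 2
    assert fizzbuzz.process(98) == 98

After installing pytest with pip, you would run it with pytest test_fizzbuzz_pytest.py.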

The nice thing about having these tests is that now you can refactor your code and verify you didn’t break anything by running the tests. This also allows you to add new features more easily without breaking existing features. Just be sure to add more tests as you add more features.


Related Reading

The post Python Code Kata: Fizzbuzz appeared first on The Mouse Vs. The Python.


Audrey Roy Greenfeld: Voronoi Mandalas

SciPy has tools for creating Voronoi tessellations. Besides the obvious data science applications, you can use them to make pretty art like this:
I started with Carlos Focil's mandalapy code, modifying the parameters until I had a design I liked. I decided to make the Voronoi diagram show both points and vertices, and I gave it an equal aspect ratio. Carlos' mandalapy code is a port of Antonio Sánchez Chinchón's inspiring work drawing mandalas with R, using the deldir library to plot Voronoi tesselations.
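
If you'd like to experiment with the same building blocks, here is a minimal sketch of a radially symmetric Voronoi plot using SciPy and matplotlib. This is purely an illustration of the SciPy API, not the mandalapy code itself, and the point layout and styling are my own assumptions:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# Lay out points with radial symmetry so the tessellation looks mandala-like
n_rays, n_rings = 12, 8
angles = np.linspace(0, 2 * np.pi, n_rays, endpoint=False)
radii = np.arange(1, n_rings + 1)
points = np.array([(r * np.cos(a), r * np.sin(a)) for r in radii for a in angles])

vor = Voronoi(points)

# Show both the input points and the Voronoi vertices, with an equal aspect ratio
voronoi_plot_2d(vor, show_points=True, show_vertices=True)
plt.gca().set_aspect('equal')
plt.axis('off')
plt.show()

Varying the number of rays and rings changes the character of the design.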

Matt Layman: Get Out, Git! - Building SaaS #33

In this episode, I removed the Git clone from the server. This is some of the final cleanup to streamline the deployment process. Before we could remove the clone completely, we had to decouple the final remaining connections that still depended on the repository clone. The first thing to clean up was the Let's Encrypt certificate fetching process. The load balancer's Ansible playbook had this task:

- name: Create cert
  become: yes
  command: >
    /usr/bin/letsencrypt certonly --webroot --email "{{ secrets.

Wingware Blog: Viewing Arrays and Data Frames in Wing Pro 7


Wing Pro 7 introduced an array and data frame viewer that can be used to inspect data objects in the debugger. Values are transferred to the IDE according to what portion of the data is visible on the screen, so working with large data sets won't slow down the IDE.

The array viewer works with Pandas, numpy, sqlite3, xarray, Python's builtin lists, tuples, and dicts, and other classes that emulate lists, tuples, or dicts.
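
For instance, you might be stepping through a small script like the following in the debugger (the pandas_df name matches the screenshot described below; the data itself is made up for illustration):

import pandas as pd

# A small DataFrame to inspect with the array viewer
pandas_df = pd.DataFrame({
    'city': ['Minneapolis', 'St. Paul', 'Duluth'],
    'population': [429606, 308096, 85618],
    'area_km2': [151.3, 145.5, 178.0],
})

print(pandas_df)  # set a breakpoint on this line and inspect pandas_df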

To use the array viewer, right-click on a value in the Stack Data tool in Wing Pro and select Show Value as Array:

[image: /images/blog/array-viewer/menu.png]

This reveals the array viewer and displays the selected item from the Stack Data tree, in this case the global variable pandas_df:

[image: /images/blog/array-viewer/array.png]

Wing fetches data for display as you move the scroll bars. The Filter can be used to display only matching rows:

[image: /images/blog/array-viewer/filter.png]

The drop down next to the Filter field may be used to select plain text, wildcard, or regular expression searching, to control whether searches are case sensitive, and to select whether to search on all columns or only the visible columns.

If more space is needed to view data, the Stack Data tool's tab can be dragged out of the window to create a separate window for it.



That's it for now! We'll be back soon with more Wing Tips for Wing Python IDE.

Stack Abuse: Solving Sequence Problems with LSTM in Keras: Part 2


This is the second and final part of the two-part series of articles on solving sequence problems with LSTMs. In part 1 of the series, I explained how to solve one-to-one and many-to-one sequence problems using LSTM. In this part, you will see how to solve one-to-many and many-to-many sequence problems via LSTM in Keras.

Image captioning is a classic example of one-to-many sequence problems where you have a single image as input and you have to predict the image description in the form of a word sequence. Similarly, stock market prediction for the next X days, where input is the stock price of the previous Y days, is a classic example of many-to-many sequence problems.

In this article you will see very basic examples of one-to-many and many-to-many problems. However, the concepts learned in this article will lay the foundation for solving advanced sequence problems, such as stock price prediction and automated image captioning that we will see in the upcoming articles.

One-to-Many Sequence Problems

One-to-many sequence problems are the type of sequence problems where input data has one time-step and the output contains a vector of multiple values or multiple time-steps. In this section, we will see how to solve one-to-many sequence problems where the input has a single feature. We will then move on to see how to work with multiple features input to solve one-to-many sequence problems.

One-to-Many Sequence Problems with a Single Feature

Let's first create a dataset and understand the problem that we are going to solve in this section.

Creating the Dataset

The following script imports the required libraries:

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers.merge import Concatenate
from keras.layers import Bidirectional

import pandas as pd
import numpy as np
import re

import matplotlib.pyplot as plt

And the following script creates the dataset:

X = list()
Y = list()
X = [x+3 for x in range(-2, 43, 3)]

for i in X:
    output_vector = list()
    output_vector.append(i+1)
    output_vector.append(i+2)
    Y.append(output_vector)

print(X)
print(Y)

Here is the output:

[1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43]
[[2, 3], [5, 6], [8, 9], [11, 12], [14, 15], [17, 18], [20, 21], [23, 24], [26, 27], [29, 30], [32, 33], [35, 36], [38, 39], [41, 42], [44, 45]]

Our input contains 15 samples with one time-step and one feature value. For each value in the input sample, the corresponding output vector contains the next two integers. For instance, if the input is 4, the output vector will contain values 5 and 6. Hence, the problem is a simple one-to-many sequence problem.

The following script reshapes our data as required by the LSTM:

X = np.array(X).reshape(15, 1, 1)
Y = np.array(Y)

We can now train our models. We will train a simple LSTM, a stacked LSTM, and a bidirectional LSTM.

Solution via Simple LSTM
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(1, 1)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=1000, validation_split=0.2, batch_size=3)

Once the model is trained we can make predictions on the test data:

test_input = array([10])
test_input = test_input.reshape((1, 1, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

The test data contains the value 10. In the output, we should get a vector containing 11 and 12. The output I received is [10.982891 12.109697], which is actually very close to the expected output.

Solution via Stacked LSTM

The following script trains stacked LSTMs on our data and makes prediction on the test points:

model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, 1)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)

test_output = model.predict(test_input, verbose=0)
print(test_output)

The answer is [11.00432 11.99205], which is very close to the actual output.

Solution via Bidirectional LSTM

The following script trains a bidirectional LSTM on our data and then makes a prediction on the test set.

from keras.layers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(1, 1)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)
test_output = model.predict(test_input, verbose=0)
print(test_output)

The output I received is [11.035181 12.082813], which is again close to the expected output.

One-to-Many Sequence Problems with Multiple Features

In this section we will see one-to-many sequence problems where input samples will have one time-step, but two features. The output will be a vector of two elements.

Creating the Dataset

As always, the first step is to create the dataset:

nums = 25

X1 = list()
X2 = list()
X = list()
Y = list()

X1 = [(x+1)*2 for x in range(25)]
X2 = [(x+1)*3 for x in range(25)]

for x1, x2 in zip(X1, X2):
    output_vector = list()
    output_vector.append(x1+1)
    output_vector.append(x2+1)
    Y.append(output_vector)

X = np.column_stack((X1, X2))
print(X)

Our input dataset looks like this:

[[ 2  3]
 [ 4  6]
 [ 6  9]
 [ 8 12]
 [10 15]
 [12 18]
 [14 21]
 [16 24]
 [18 27]
 [20 30]
 [22 33]
 [24 36]
 [26 39]
 [28 42]
 [30 45]
 [32 48]
 [34 51]
 [36 54]
 [38 57]
 [40 60]
 [42 63]
 [44 66]
 [46 69]
 [48 72]
 [50 75]]

You can see each input time-step consists of two features. The output will be a vector which contains the next two elements that correspond to the two features in the time-step of the input sample. For instance, for the input sample [2, 3], the output will be [3, 4], and so on.

Let's reshape our data:

X = np.array(X).reshape(25, 1, 2)
Y = np.array(Y)
Solution via Simple LSTM
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(1, 2)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=1000, validation_split=0.2, batch_size=3)

Let's now create our test point and see how well our algorithm performs:

test_input = array([40, 60])
test_input = test_input.reshape((1, 1, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

The input is [40, 60], so the output should be [41, 61]. The output predicted by our simple LSTM is [40.946873 60.941723], which is very close to the expected output.

Solution via Stacked LSTM
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, 2)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)

test_input = array([40, 60])
test_input = test_input.reshape((1, 1, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

The output in this case is: [40.978477 60.994644]

Solution via Bidirectional LSTM
from keras.layers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(1, 2)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)
test_output = model.predict(test_input, verbose=0)
print(test_output)

The output obtained is: [41.0975 61.159065]

Many-to-Many Sequence Problems

In one-to-many and many-to-one sequence problems, we saw that the output vector can contain multiple values. Depending upon the problem, such an output vector can be considered a single output (since, strictly speaking, it contains data for one time-step) or multiple outputs (since one vector contains multiple values).

However, in some sequence problems, we want multiple outputs divided over time-steps. In other words, for each time-step in the input, we want a corresponding time-step in the output. Such models can be used to solve many-to-many sequence problems with variable lengths.

Encoder-Decoder Model

To solve such sequence problems, the encoder-decoder model was designed. The encoder-decoder model is basically a fancy name for a neural network architecture with two LSTM layers.

The first layer works as an encoder layer and encodes the input sequence. The decoder is also an LSTM layer, which accepts three inputs: the encoded sequence from the encoder LSTM, the previous hidden state, and the current input. During training, the actual output at each time-step is used to train the encoder-decoder model. While making predictions, the encoder output, the current hidden state, and the previous output are used as input to make the prediction at each time-step. These concepts will become clearer when you see them in action in the upcoming sections.

Many-to-Many Sequence Problems with Single Feature

In this section we will solve many-to-many sequence problems via the encoder-decoder model, where each time-step in the input sample will contain one feature.

Let's first create our dataset.

Creating the Dataset
X = list()
Y = list()
X = [x for x in range(5, 301, 5)]
Y = [y for y in range(20, 316, 5)]

X = np.array(X).reshape(20, 3, 1)
Y = np.array(Y).reshape(20, 3, 1)

The input X contains 20 samples where each sample contains 3 time-steps with one feature. One input sample looks like this:

[[[  5]
  [ 10]
  [ 15]]]

You can see that the input sample contain 3 values that are basically 3 consecutive multiples of 5. The corresponding output sequence for the above input sample is as follows:

[[[ 20]
  [ 25]
  [ 30]]]

The output contains the next three consecutive multiples of 5. You can see the output in this case is different from what we have seen in the previous sections. For the encoder-decoder model, the output should also be converted into a 3D format containing the number of samples, time-steps, and features. This is because the decoder generates an output per time-step.

We have created our dataset; the next step is to train our models. We will train stacked LSTM and bidirectional LSTM models in the following sections.

Solution via Stacked LSTM

The following script creates the encoder-decoder model using stacked LSTMs:

from keras.layers import RepeatVector
from keras.layers import TimeDistributed

model = Sequential()

# encoder layer
model.add(LSTM(100, activation='relu', input_shape=(3, 1)))

# repeat vector
model.add(RepeatVector(3))

# decoder layer
model.add(LSTM(100, activation='relu', return_sequences=True))

model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')

print(model.summary())

In the above script, the first LSTM layer is the encoder layer.

Next, we have added the repeat vector to our model. The repeat vector takes the output from encoder and feeds it repeatedly as input at each time-step to the decoder. For instance, in the output we have three time-steps. To predict each output time-step, the decoder will use the value from the repeat vector, the hidden state from the previous output and the current input.

Next we have a decoder layer. Since the output is in the form of a time-step, which is a 3D format, the return_sequences for the decoder model has been set True. The TimeDistributed layer is used to individually predict the output for each time-step.

The model summary for the encoder-decoder model created in the script above is as follows:

Layer (type)                 Output Shape              Param #
=================================================================
lstm_40 (LSTM)               (None, 100)               40800
_________________________________________________________________
repeat_vector_7 (RepeatVecto (None, 3, 100)            0
_________________________________________________________________
lstm_41 (LSTM)               (None, 3, 100)            80400
_________________________________________________________________
time_distributed_7 (TimeDist (None, 3, 1)              101
=================================================================
Total params: 121,301
Trainable params: 121,301
Non-trainable params: 0

You can see that the repeat vector only repeats the encoder output and has no parameters to train.

The following script trains the above encoder-decoder model.

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)

Let's create a test-point and see if our encoder-decoder model is able to predict the multi-step output. Execute the following script:

test_input = array([300, 305, 310])
test_input = test_input.reshape((1, 3, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

Our input sequence contains the three time-step values 300, 305 and 310. The output should be the next three multiples of 5, i.e. 315, 320 and 325. I received the following output:

[[[316.02878]
  [322.27145]
  [328.5536 ]]]

You can see that the output is in 3D format.

Solution via Bidirectional LSTM

Let's now create encoder-decoder model with bidirectional LSTMs and see if we can get better results:

from keras.layers import RepeatVector
from keras.layers import TimeDistributed

model = Sequential()
model.add(Bidirectional(LSTM(100, activation='relu', input_shape=(3, 1))))
model.add(RepeatVector(3))
model.add(Bidirectional(LSTM(100, activation='relu', return_sequences=True)))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)

The above script trains the encoder-decoder model via bidirectional LSTM. Let's now make predictions on the test point i.e. [300, 305, 310].

test_output = model.predict(test_input, verbose=0)
print(test_output)

Here is the output:

[[[315.7526 ]
  [321.47153]
  [327.94025]]]

The output I got via bidirectional LSTMs is better than what I got via the simple stacked LSTM-based encoder-decoder model.

Many-to-Many Sequence Problems with Multiple Features

As you might have guessed by now, in many-to-many sequence problems each time-step in the input sample contains multiple features.

Creating the Dataset

Let's create a simple dataset for our problem:

X = list()
Y = list()
X1 = [x1 for x1 in range(5, 301, 5)]
X2 = [x2 for x2 in range(20, 316, 5)]
Y = [y for y in range(35, 331, 5)]

X = np.column_stack((X1, X2))

In the script above we create two lists, X1 and X2. The list X1 contains all the multiples of 5 from 5 to 300 (inclusive) and the list X2 contains all the multiples of 5 from 20 to 315 (inclusive). Finally, the list Y, which is the output, contains all the multiples of 5 between 35 and 330 (inclusive). The final input list X is a column-wise merger of X1 and X2.

As always, we need to reshape our input X and output Y before they can be used to train LSTM.

X = np.array(X).reshape(20, 3, 2)
Y = np.array(Y).reshape(20, 3, 1)

You can see the input X has been reshaped into 20 samples of three time-steps with 2 features, while the output has been reshaped into similar dimensions but with 1 feature.

The first sample from the input looks like this:

[[ 5 20]
 [10 25]
 [15 30]]

The input contains six consecutive multiples of 5, three in each of the two columns. Here is the corresponding output for the above input sample:

[[35]
 [40]
 [45]]

As you can see, the output contains the next three consecutive multiples of 5.

Let's now train our encoder-decoder model to learn the above sequence. We will first train a simple stacked LSTM-based encoder-decoder.

Solution via Stacked LSTM

The following script trains the stacked LSTM model. You can see that the input shape is now (3, 2) corresponding to three time-steps and two features in the input.

from keras.layers import RepeatVector
from keras.layers import TimeDistributed

model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(3, 2)))
model.add(RepeatVector(3))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)

Let's now create a test point that will be used for making a prediction.

X1 = [300, 305, 310]
X2 = [315, 320, 325]

test_input = np.column_stack((X1, X2))

test_input = test_input.reshape((1, 3, 2))
print(test_input)

The test point looks like this:

[[[300 315]
  [305 320]
  [310 325]]]

The actual output for the above test point is [330, 335, 340]. Let's see what our model predicts:

test_output = model.predict(test_input, verbose=0)
print(test_output)

The predicted output is:

[[[324.5786 ]
  [328.89658]
  [335.67603]]]

The output is far from being correct.

Solution via Bidirectional LSTM

Let's now train an encoder-decoder model based on bidirectional LSTMs and see if we can get improved results. The following script trains the model.

from keras.layers import RepeatVector
from keras.layers import TimeDistributed

model = Sequential()
model.add(Bidirectional(LSTM(100, activation='relu', input_shape=(3, 2))))
model.add(RepeatVector(3))
model.add(Bidirectional(LSTM(100, activation='relu', return_sequences=True)))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3)

The following script makes predictions on the test set:

test_output = model.predict(test_input, verbose=0)
print(test_output)

Here is the output:

[[[330.49133]
  [335.35327]
  [339.64398]]]

The output achieved is pretty close to the actual output i.e. [330, 335, 340]. Hence our bidirectional LSTM outperformed the simple LSTM.

Conclusion

This is the second part of my article on "Solving Sequence Problems with LSTM in Keras" (part 1 here). In this article you saw how to solve one-to-many and many-to-many sequence problems with LSTM. You also saw how the encoder-decoder model can be used to predict multi-step outputs. The encoder-decoder model is used in a variety of natural language processing applications, such as neural machine translation and chatbot development.

In the upcoming article, we will see the application of the encoder-decoder model in NLP.

Will Kahn-Greene: Markus v2.0.0 released! Better metrics API for Python projects.


What is it?

Markus is a Python library for generating metrics.

Markus makes it easier to generate metrics in your program by:

  • providing multiple backends (Datadog statsd, statsd, logging, logging roll-up, and so on) for sending metrics data to different places
  • sending metrics to multiple backends at the same time
  • providing a testing framework for easy metrics generation testing
  • providing a decoupled architecture that makes it easier to write metrics-generating code without worrying about whether a metrics client has been created and configured, similar to the Python logging module in this way

We use it at Mozilla on many projects.
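
As a quick illustration, basic usage looks roughly like this (a sketch following the quickstart pattern in the documentation; see the links below for the authoritative version):

import markus

# Configure once at application startup; here metrics are emitted via the logging backend
markus.configure(backends=[{'class': 'markus.backends.logging.LoggingMetrics'}])

# A module-level metrics interface, similar in spirit to logging.getLogger(__name__)
metrics = markus.get_metrics(__name__)

metrics.incr('app.requests', value=1, tags=['handler:main'])
metrics.gauge('app.queue_depth', value=5)
metrics.timing('app.request_time', value=123)  # milliseconds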

v2.0.0 released!

I released v2.0.0 just now. Changes:

Features

  • Use time.perf_counter() if available. Thank you, Mike! (#34)
  • Support Python 3.7 officially.
  • Add filters for adjusting and dropping metrics getting emitted. See documentation for more details. (#40)

Backwards incompatible changes

  • tags now defaults to [] instead of None, which may affect some expected test output.

  • Adjust internals to run .emit() on backends. If you wrote your own backend, you may need to adjust it.

  • Drop support for Python 3.4. (#39)

  • Drop support for Python 2.7.

    If you're still using Python 2.7, you'll need to pin to <2.0.0. (#42)

Bug fixes

  • Document feature support in backends. (#47)
  • Fix MetricsMock.has_record() example. Thank you, John!

Where to go for more

Changes for this release: https://markus.readthedocs.io/en/latest/history.html#september-19th-2019

Documentation and quickstart here: https://markus.readthedocs.io/en/latest/index.html

Source code and issue tracker here: https://github.com/willkg/markus

Let me know whether this helps you!
