Channel: Planet Python

Yasoob Khalid: Making a Reddit + Facebook Messenger Bot


Hi guys! I haven’t been programming a lot lately because of exams. However, over the past weekend I managed to get hold of my laptop and crank out something useful: a Facebook Messenger bot which serves you fresh memes, motivational posts, jokes and shower thoughts. It was the first time I had delved into bot creation. In this post I will teach you most of the stuff you need to know in order to get your bot off the ground.

First of all some screenshots of the final product:

Tech Stack

We will be making use of the following:

  • Flask for coding up the backend, as it is lightweight and lets us focus on the logic instead of the folder structure.
  • Heroku for hosting our code online for free.
  • Reddit as a data source, because it gets new posts every minute.

1. Getting things ready

Creating a Reddit app

We will be using Facebook, Heroku and Reddit. Firstly, make sure that you have an account on all three of these services. Next you need to create a Reddit application on this link.

In the above image you can already see the “motivation” app which I have created. Click on “create another app…” and follow the on-screen instructions.

The about and redirect URLs will not be used, so it is OK to leave them blank. For production apps it is better to put in something related to your project, so that if you start making a lot of requests and Reddit notices, they can check the about page of your app and act in a more informed manner.

So now that your app is created you need to save the ‘client_id’ and ‘client_secret’ in a safe place.
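One way to keep these credentials out of your source code (and out of Git) is to read them from environment variables. A minimal sketch; the variable names below are my own choice, not from this post:

```python
import os

# Hypothetical variable names -- set them in your shell, or later on Heroku
# with `heroku config:set REDDIT_CLIENT_ID=... REDDIT_CLIENT_SECRET=...`
client_id = os.environ.get("REDDIT_CLIENT_ID", "")
client_secret = os.environ.get("REDDIT_CLIENT_SECRET", "")
print("credentials configured:", bool(client_id and client_secret))
```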

One part of our project is done. Now we need to set up the base for our Heroku app.

Creating an App on Heroku

Go to this dashboard url and create a new application.

On the next page give your application a unique name.

From the next page click on “Heroku CLI” and download the latest Heroku CLI for your operating system. Follow the on-screen install instructions and come back once it has been installed.

Creating a basic Python application

The below code is taken from Konstantinos Tsaprailis’s website.

from flask import Flask, request
import json
import requests

app = Flask(__name__)

# This needs to be filled with the Page Access Token that will be provided
# by the Facebook App that will be created.
PAT = ''

@app.route('/', methods=['GET'])
def handle_verification():
    print "Handling Verification."
    if request.args.get('hub.verify_token', '') == 'my_voice_is_my_password_verify_me':
        print "Verification successful!"
        return request.args.get('hub.challenge', '')
    else:
        print "Verification failed!"
        return 'Error, wrong validation token'

@app.route('/', methods=['POST'])
def handle_messages():
    print "Handling Messages"
    payload = request.get_data()
    print payload
    for sender, message in messaging_events(payload):
        print "Incoming from %s: %s" % (sender, message)
        send_message(PAT, sender, message)
    return "ok"

def messaging_events(payload):
    """Generate tuples of (sender_id, message_text) from the
    provided payload.
    """
    data = json.loads(payload)
    messaging_events = data["entry"][0]["messaging"]
    for event in messaging_events:
        if "message" in event and "text" in event["message"]:
            yield event["sender"]["id"], event["message"]["text"].encode('unicode_escape')
        else:
            yield event["sender"]["id"], "I can't echo this"


def send_message(token, recipient, text):
    """Send the message text to recipient with id recipient.
    """

    r = requests.post("https://graph.facebook.com/v2.6/me/messages",
        params={"access_token": token},
        data=json.dumps({
            "recipient": {"id": recipient},
            "message": {"text": text.decode('unicode_escape')}
        }),
        headers={'Content-type': 'application/json'})
    if r.status_code != requests.codes.ok:
        print r.text

if __name__ == '__main__':
    app.run()

We will be modifying the file according to our needs. So basically a Facebook bot works like this:

  1. Facebook sends a request to our server whenever a user messages our page on Facebook.
  2. We respond to the Facebook’s request and store the id of the user and the message which was sent to our page.
  3. We respond to user’s message through Graph API using the stored user id and message id.

A detailed breakdown of the above code is available on this website. In this post I will mainly be focusing on the Reddit integration and how to use the Postgres database on Heroku.

Before moving further, let’s deploy the above Python code to Heroku. For that you have to create a local Git repository. Follow these steps:

$ mkdir messenger-bot
$ cd messenger-bot
$ touch requirements.txt app.py Procfile

Execute the above commands in a terminal and put the above Python code into the app.py file. Put the following into Procfile:

web: gunicorn app:app 

Now we need to tell Heroku which Python libraries our app will need to function properly. Those libraries will need to be listed in the requirements.txt file. I am going to fast-forward a bit over here and simply copy the requirements from this post. Put the following lines into requirements.txt file and you should be good to go for now.

click==6.6
Flask==0.11
gunicorn==19.6.0
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
requests==2.10.0
Werkzeug==0.11.10

Run the following command in the terminal and you should get a similar output:

$ ls
Procfile      app.py     requirements.txt

Now we are ready to create a Git repository which can then be pushed onto Heroku servers. We will carry out the following steps now:

  • Log in to Heroku
  • Create a new Git repository
  • Commit everything into the new repo
  • Push the repo onto Heroku

The commands required to achieve this are listed below:

$ heroku login
$ git init
$ heroku git:remote -a <app_name>
$ git commit -am "Initial commit"
$ git push heroku master
...
remote: https://<app_name>.herokuapp.com/ deployed to Heroku
...

$ heroku config:set WEB_CONCURRENCY=3

Save the URL printed after “remote:” above; it is the URL of your Heroku app. We will need it in the next step when we create a Facebook app.

Creating a Facebook App

Firstly we need a Facebook page. It is a requirement by Facebook to supplement every app with a relevant page.

Now we need to register a new app. Go to this app creation page and follow the instructions below.

Now head over to your app.py file and replace the PAT string on line 9 with the Page Access Token we saved above.

Commit everything and push the code to Heroku.

$ git commit -am "Added in the PAT"
$ git push heroku master

Now if you go to the Facebook page and send it a message, you will get your own message back as a reply from the page. This shows that everything we have done so far is working. If something does not work, check your Heroku logs; they will give you some clue about what is going wrong, and a quick Google search will usually help you resolve the issue. You can access the logs like this:

$ heroku logs -t -a <app_name>

Note: only your own messages will get a reply from the page. If any other user messages the page, the bot will not reply, because it has not yet been approved by Facebook. However, if you want to let a couple of users test your app, you can add them as testers by going to your Facebook app’s developer page and following the on-screen instructions.

Getting data from Reddit

We will be using data from the following subreddits:

  • GetMotivated
  • memes
  • Jokes
  • Showerthoughts

First of all, let’s install Reddit’s Python library, “praw”. It can easily be done by typing the following command in the terminal:

$ pip install praw

Now let’s test some Reddit goodness in a Python shell. I followed the docs which clearly show how to access Reddit and how to access a subreddit. Now is the best time to grab the “client_id” and “client_secret” which we created in the first part of this post.

$ python
Python 2.7.13 (default, Dec 17 2016, 23:03:43) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import praw
>>> reddit = praw.Reddit(client_id='**********',
... client_secret='*****************',
... user_agent='my user agent')

>>> 
>>> submissions = list(reddit.subreddit("GetMotivated").hot(limit=None))
>>> submissions[-4].title
u'[Video] Hi, Stranger.'

Note: Don’t forget to add in your own client_id and client_secret in place of ****

Let’s discuss the important bits here. I am using limit=None because I want to get back as many posts as I can. Initially this feels like overkill, but you will quickly see that if a user uses the Facebook bot frequently, we will run out of new posts if we limit ourselves to 10 or 20. An additional constraint we will add is that we only use image posts from GetMotivated and memes, and only text posts from Jokes and ShowerThoughts. Because of this constraint, only one or two of the top 10 hot posts might be useful to us, since a lot of video submissions are also made to GetMotivated.
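The image-vs-text constraint can be sketched without touching the network. The toy Submission tuple and sample data below are mine, not praw’s, but real praw submissions expose the same attributes:

```python
from collections import namedtuple

# Toy stand-in for a praw submission (real ones have these attributes too).
Submission = namedtuple("Submission", ["is_self", "url", "title"])

def image_posts(submissions):
    """Lazily yield only direct image links, skipping self posts and videos."""
    for s in submissions:
        if not s.is_self and (".jpg" in s.url or ".png" in s.url):
            yield s

sample = [
    Submission(True, "", "Monday motivation (text post)"),
    Submission(False, "https://v.redd.it/abc", "[Video] Hi, Stranger."),
    Submission(False, "https://i.imgur.com/xyz.jpg", "[Image] Keep going"),
]
first_image = next(image_posts(sample))  # the imgur link; the others are skipped
```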

Now that we know how to access Reddit using the Python library we can go ahead and integrate it into our app.py.

First, add some additional libraries to requirements.txt so that it looks something like this:

$ cat requirements.txt
click==6.6
Flask==0.11
gunicorn==19.6.0
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
requests==2.10.0
Werkzeug==0.11.10
flask-sqlalchemy
psycopg2
praw

Now if we only wanted to send the user an image or text taken from reddit, it wouldn’t have been very difficult. In the “send_message” function we could have done something like this:

import praw
...

def send_message(token, recipient, text):
    """Send the message text to recipient with id recipient.
    """
    if "meme" in text.lower():
        subreddit_name = "memes"
    elif "shower" in text.lower():
        subreddit_name = "Showerthoughts"
    elif "joke" in text.lower():
        subreddit_name = "Jokes"
    else:
        subreddit_name = "GetMotivated"
    ....

    if subreddit_name == "Showerthoughts":
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            payload = submission.url
            break
    ...
    
    r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"attachment": {
                              "type": "image",
                              "payload": {
                                "url": payload
                              }}
            }),
            headers={'Content-type': 'application/json'})
    ...

But there is one issue with this approach: how will we know whether a particular image/text has already been sent to a user? We need some kind of id for each image/text we send so that we don’t send the same post twice. To solve this we are going to use PostgreSQL together with reddit post ids (every post on reddit has a unique id).
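The dedup idea itself can be sketched in memory before introducing Postgres. This is only an illustration with made-up data; the real bot persists the same information in the database so it survives restarts:

```python
# user_id -> set of reddit post ids already sent (in-memory illustration only)
seen = {}

def pick_fresh(user_id, submissions):
    """Return the url of the first submission this user has not seen, or None."""
    sent = seen.setdefault(user_id, set())
    for post_id, url in submissions:
        if post_id not in sent:
            sent.add(post_id)
            return url
    return None

posts = [("abc12", "http://i.imgur.com/1.jpg"), ("def34", "http://i.imgur.com/2.jpg")]
first = pick_fresh("user-1", posts)   # the first post
second = pick_fresh("user-1", posts)  # skips the seen post, returns the next one
```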

We are going to use a Many-to-Many relation. There will be two tables:

  • Users
  • Posts

Let’s first define them in our code and then I will explain how it will work:

from flask_sqlalchemy import SQLAlchemy

...
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL']
db = SQLAlchemy(app)

...
relationship_table=db.Table('relationship_table',                            
    db.Column('user_id', db.Integer,db.ForeignKey('users.id'), nullable=False),
    db.Column('post_id',db.Integer,db.ForeignKey('posts.id'),nullable=False),
    db.PrimaryKeyConstraint('user_id', 'post_id') )
 
class Users(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255),nullable=False)
    posts=db.relationship('Posts', secondary=relationship_table, backref='users' )  

    def __init__(self, name):
        self.name = name
 
class Posts(db.Model):
    id=db.Column(db.Integer, primary_key=True)
    name=db.Column(db.String, unique=True, nullable=False)
    url=db.Column(db.String, nullable=False)

    def __init__(self, name, url):
        self.name = name
        self.url = url

So the Users table has two fields. name will hold the id sent with the Facebook Messenger webhook request, and posts links to the other table, Posts. The Posts table has name and url fields: name will be populated with the reddit submission id and url with the URL of that post. We don’t strictly need the url field; I plan to use it for something else in the future, hence I included it in the code.

So now the way our final code will work is this:

  • We request a list of posts from a particular subreddit. The following code:
    reddit.subreddit(subreddit_name).hot(limit=None)

    returns a generator so we don’t need to worry about memory

  • We will check whether the particular post has already been sent to the user in the past or not
  • If the post has been sent in the past we will continue requesting more posts from Reddit until we find a fresh post
  • If the post has not been sent to the user, we send the post and break out of the loop
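The memory point deserves a tiny illustration. A generator produces one item per next() call, so iterating over hot(limit=None) never loads the whole listing at once; the hot below is a toy stand-in, not the praw API:

```python
def hot(limit=None):
    """Toy stand-in for reddit.subreddit(...).hot(): yields posts lazily."""
    n = 0
    while limit is None or n < limit:
        yield "post-%d" % n
        n += 1

listing = hot(limit=None)  # no work done yet
first = next(listing)      # only now is the first item produced
```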

So the final code of the app.py file is this:

from flask import Flask, request
import json
import requests
from flask_sqlalchemy import SQLAlchemy
import os
import praw

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL']
db = SQLAlchemy(app)
reddit = praw.Reddit(client_id='*************',
                     client_secret='****************',
                     user_agent='my user agent')

# This needs to be filled with the Page Access Token that will be provided
# by the Facebook App that will be created.
PAT = '*********************************************'

quick_replies_list = [{
    "content_type":"text",
    "title":"Meme",
    "payload":"meme",
},
{
    "content_type":"text",
    "title":"Motivation",
    "payload":"motivation",
},
{
    "content_type":"text",
    "title":"Shower Thought",
    "payload":"Shower_Thought",
},
{
    "content_type":"text",
    "title":"Jokes",
    "payload":"Jokes",
}
]
@app.route('/', methods=['GET'])
def handle_verification():
    print "Handling Verification."
    if request.args.get('hub.verify_token', '') == 'my_voice_is_my_password_verify_me':
        print "Verification successful!"
        return request.args.get('hub.challenge', '')
    else:
        print "Verification failed!"
        return 'Error, wrong validation token'

@app.route('/', methods=['POST'])
def handle_messages():
    print "Handling Messages"
    payload = request.get_data()
    print payload
    for sender, message in messaging_events(payload):
        print "Incoming from %s: %s" % (sender, message)
        send_message(PAT, sender, message)
    return "ok"

def messaging_events(payload):
    """Generate tuples of (sender_id, message_text) from the
    provided payload.
    """
    data = json.loads(payload)
    messaging_events = data["entry"][0]["messaging"]
    for event in messaging_events:
        if "message" in event and "text" in event["message"]:
            yield event["sender"]["id"], event["message"]["text"].encode('unicode_escape')
        else:
            yield event["sender"]["id"], "I can't echo this"


def send_message(token, recipient, text):
    """Send the message text to recipient with id recipient.
    """
    if "meme" in text.lower():
        subreddit_name = "memes"
    elif "shower" in text.lower():
        subreddit_name = "Showerthoughts"
    elif "joke" in text.lower():
        subreddit_name = "Jokes"
    else:
        subreddit_name = "GetMotivated"

    myUser = get_or_create(db.session, Users, name=recipient)

    if subreddit_name == "Showerthoughts":
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            if (submission.is_self == True):
                query_result = Posts.query.filter(Posts.name == submission.id).first()
                if query_result is None:
                    myPost = Posts(submission.id, submission.title)
                    myUser.posts.append(myPost)
                    db.session.commit()
                    payload = submission.title
                    break
                elif myUser not in query_result.users:
                    myUser.posts.append(query_result)
                    db.session.commit()
                    payload = submission.title
                    break
                else:
                    continue  

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"text": payload,
                            "quick_replies":quick_replies_list}
            }),
            headers={'Content-type': 'application/json'})
    
    elif subreddit_name == "Jokes":
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            if ((submission.is_self == True) and ( submission.link_flair_text is None)):
                query_result = Posts.query.filter(Posts.name == submission.id).first()
                if query_result is None:
                    myPost = Posts(submission.id, submission.title)
                    myUser.posts.append(myPost)
                    db.session.commit()
                    payload = submission.title
                    payload_text = submission.selftext
                    break
                elif myUser not in query_result.users:
                    myUser.posts.append(query_result)
                    db.session.commit()
                    payload = submission.title
                    payload_text = submission.selftext
                    break
                else:
                    continue  

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"text": payload}
            }),
            headers={'Content-type': 'application/json'})

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"text": payload_text,
                            "quick_replies":quick_replies_list}
            }),
            headers={'Content-type': 'application/json'})
        
    else:
        payload = "http://imgur.com/WeyNGtQ.jpg"
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            if (submission.link_flair_css_class == 'image') or ((submission.is_self != True) and ((".jpg" in submission.url) or (".png" in submission.url))):
                query_result = Posts.query.filter(Posts.name == submission.id).first()
                if query_result is None:
                    myPost = Posts(submission.id, submission.url)
                    myUser.posts.append(myPost)
                    db.session.commit()
                    payload = submission.url
                    break
                elif myUser not in query_result.users:
                    myUser.posts.append(query_result)
                    db.session.commit()
                    payload = submission.url
                    break
                else:
                    continue

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"attachment": {
                              "type": "image",
                              "payload": {
                                "url": payload
                              }},
                              "quick_replies":quick_replies_list}
            }),
            headers={'Content-type': 'application/json'})

    if r.status_code != requests.codes.ok:
        print r.text

def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance

relationship_table=db.Table('relationship_table',                            
    db.Column('user_id', db.Integer,db.ForeignKey('users.id'), nullable=False),
    db.Column('post_id',db.Integer,db.ForeignKey('posts.id'),nullable=False),
    db.PrimaryKeyConstraint('user_id', 'post_id') )
 
class Users(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255),nullable=False)
    posts=db.relationship('Posts', secondary=relationship_table, backref='users' )  

    def __init__(self, name=None):
        self.name = name
 
class Posts(db.Model):
    id=db.Column(db.Integer, primary_key=True)
    name=db.Column(db.String, unique=True, nullable=False)
    url=db.Column(db.String, nullable=False)

    def __init__(self, name=None, url=None):
        self.name = name
        self.url = url

if __name__ == '__main__':
    app.run()

So put this code into the app.py file and push it to Heroku.

$ git commit -am "Updated the code with Reddit feature"
$ git push heroku master

One last thing is still remaining. We need to tell Heroku that we will be using the database. It is simple. Just issue the following command in the terminal:

$ heroku addons:create heroku-postgresql:hobby-dev --app <app_name>

This will create a free hobby database which is enough for our project. Now we only need to initialise the database with the correct tables. In order to do that we first need to run the Python shell on our Heroku server:

$ heroku run python

Now in the Python shell type the following commands:

>>> from app import db
>>> db.create_all()

So now our project is complete. Congrats!

Let me discuss some interesting features of the code. Firstly, I am making use of the “quick-replies” feature of Facebook Messenger Bot API. This allows us to send some pre-formatted inputs which the user can quickly select. They will look something like this:

It is easy to display these quick replies to the user. With every post request to the Facebook graph API we send some additional data:

quick_replies_list = [{
 "content_type":"text",
 "title":"Meme",
 "payload":"meme",
},
{
 "content_type":"text",
 "title":"Motivation",
 "payload":"motivation",
},
{
 "content_type":"text",
 "title":"Shower Thought",
 "payload":"Shower_Thought",
},
{
 "content_type":"text",
 "title":"Jokes",
 "payload":"Jokes",
}]

Another interesting feature of the code is how we determine whether a post is a text, image, or video post. In the GetMotivated subreddit some images don’t have “.jpg” or “.png” in their URL, so we rely on

submission.link_flair_css_class == 'image'

This way we are able to select even those posts which do not have a known image extension in the url.
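Pulled out as a helper (my own refactor of the condition from app.py, with a toy submission type standing in for praw’s), the check reads:

```python
from collections import namedtuple

# Toy stand-in for a praw submission (only the fields the check needs).
Submission = namedtuple("Submission", ["is_self", "url", "link_flair_css_class"])

def is_image_post(s):
    """True for flaired images, or direct links with a known image extension."""
    if s.link_flair_css_class == "image":
        return True
    return not s.is_self and (".jpg" in s.url or ".png" in s.url)

flaired = is_image_post(Submission(False, "https://i.redd.it/cat", "image"))  # True
by_ext = is_image_post(Submission(False, "https://i.imgur.com/a.png", None))  # True
selfpost = is_image_post(Submission(True, "", None))                          # False
```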

You might have noticed this bit of code in the app.py file:

payload = "http://imgur.com/WeyNGtQ.jpg"

It makes sure that if no new posts are found for a particular user (every subreddit has a maximum number of “hot” posts), we still have something to return. Otherwise we would get an undefined-variable error.

Create the user if they don’t exist:

The following function checks whether a user with the given name exists. If so, it selects that user from the db and returns it. If not, it creates the user and returns the newly created record:

myUser = get_or_create(db.session, Users, name=recipient)
...

def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance

I hope you guys enjoyed the post. Please comment below if you have any questions. I am also starting premium advertising on the blog, either in the form of sponsored posts or blog sponsorship for a particular time; I am still fleshing out the details. If your company works with Python and wants to reach out to potential customers, please email me at yasoob (at) gmail.com.

Source: You can get the code from GitHub as well



Yasoob Khalid: Recovering lost Python source code if it’s still resident in-memory


I read this on GitHub Gist the other day. I don’t know whether I will ever use it but I am still putting this on my blog for the sake of bookmarking it. Who knows? Someone from the audience might end up using it!

I screwed up using git (“git checkout --” on the wrong file) and managed to delete the code I had just written… but it was still running in a process in a Docker container. Here’s how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container


Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb

Install pyrasite – this will let you attach a Python shell to the still-running process

pip install pyrasite

Install uncompyle6, which will let you get Python source code back from in-memory code objects

pip install uncompyle6

Find the PID of the process that is still running

ps aux | grep python

Attach an interactive prompt using pyrasite

pyrasite-shell <PID>

Now you’re in an interactive prompt! Import the code you need to recover

>>> from my_package import my_module

Figure out which functions and classes you need to recover

>>> dir(my_module)
['MyClass', 'my_function']

Decompile the function into source code

>>> import uncompyle6
>>> import sys
>>> uncompyle6.main.uncompyle(
    2.7, my_module.my_function.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
function_body = "appears here"

For the class, you’ll need to decompile each method in turn

>>> uncompyle6.main.uncompyle(
    2.7, my_module.MyClass.my_method.im_func.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
class_method_body = "appears here"

I hope you guys like this post. Stay tuned for the next one in the upcoming days.


Yasoob Khalid: Importing with ctypes in Python: fighting overflows


Introduction

On some cold winter night, we decided to refactor a few examples and tests for the Python wrapper in Themis, because things have to be not only efficient and useful, but elegant as well. One thing led to another, and we ended up revamping Themis error codes a bit.

Internal error and status flags sometimes get less attention than crypto-related code: they are internals, for internal use. The problem is, when they fail, they might break something more crucial in a completely invisible way.

Since the best mistakes are those that are not just fixed, but properly analyzed, reflected upon, and recorded, we wrote this small report on a seemingly boring matter: every edge and connection is a challenge. This story is a reflection on a typical situation: different people work on different layers of one large product, then look around to wipe out the technical debt.

Strange tests behavior

Any time we touch Themis wrapper code, we touch the tests, because the pesticide paradox in software development is no small problem.

It all started with Secure Comparator tests:

# test.py
from pythemis.scomparator import scomparator, SCOMPARATOR_CODES

secret = b'some secret'

alice = scomparator(secret)
bob = scomparator(secret)
data = alice.begin_compare()

while (alice.result() == SCOMPARATOR_CODES.NOT_READY and
               bob.result() == SCOMPARATOR_CODES.NOT_READY):
    data = alice.proceed_compare(bob.proceed_compare(data))

assert alice.result() != SCOMPARATOR_CODES.NOT_MATCH
assert bob.result() != SCOMPARATOR_CODES.NOT_MATCH

This test attempts to run Secure Comparator with a constant secret, this way making sure that comparison ends in a positive result (flag is called SCOMPARATOR_CODES.MATCH). If the secret is matched, tests should finish with success.

Secure Comparator can result in either SCOMPARATOR_CODES.NOT_MATCH or SCOMPARATOR_CODES.MATCH.

But why does the assert have to be just a negative comparison if we’re testing a feature with a boolean state? Checking for non-equality with NOT_MATCH does not automatically mean the secrets matched.
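A three-state enum makes the flaw concrete. The values below mirror pythemis’s SCOMPARATOR_CODES:

```python
from enum import IntEnum

class SCOMPARATOR_CODES(IntEnum):
    NOT_READY = 0
    NOT_MATCH = -1
    MATCH = 0xf0f0f0f0

# A comparator stuck in NOT_READY still passes the weak assert:
result = SCOMPARATOR_CODES.NOT_READY
weak_check = result != SCOMPARATOR_CODES.NOT_MATCH  # True, yet nothing matched
strict_check = result == SCOMPARATOR_CODES.MATCH    # False, as it should be
```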

The first reaction is obviously to see if it even works (via example code). It did.

Here, the verification code tests for equality, thankfully:

if comparator.is_equal():
    print("match")
else:
    print("not match")

Fine, so the problem touches only the tests. Let’s rewrite the assert so that it compares scomparator.result() against the correct expected state, SCOMPARATOR_CODES.MATCH:

# test.py
...

assert alice.result() == SCOMPARATOR_CODES.MATCH
assert bob.result() == SCOMPARATOR_CODES.MATCH

… and bump into unexpected error:

# python test.py

Traceback (most recent call last):
  File "test.py", line 23, in <module>
    assert alice.result() == SCOMPARATOR_CODES.MATCH
AssertionError

A routine fix to a test of a perfectly working feature quickly turns into an interesting riddle. We added debug output to see what’s really going on inside:

# test.py
...

print('alice.result(): {}\nNOT_MATCH: {}\nMATCH: {}'.format(
    alice.result(),
    SCOMPARATOR_CODES.NOT_MATCH,
    SCOMPARATOR_CODES.MATCH
))
assert alice.result() == SCOMPARATOR_CODES.MATCH
assert bob.result() == SCOMPARATOR_CODES.MATCH

… and get the completely unexpected:

# python test.py

alice.result(): -252645136
NOT_MATCH: -1
MATCH: 4042322160
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    assert alice.result() == SCOMPARATOR_CODES.MATCH
AssertionError

How come?

>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
...
>>> import ctypes
>>> print(ctypes.sizeof(ctypes.c_int))
4

Even though the OS, Python, and Themis are 64-bit, the PyThemis wrapper is made using ctypes, whose int type is 32-bit.

Accordingly, when it receives 0xf0f0f0f0 from the C side of Themis, ctypes treats it as a signed 32-bit number, so 0xf0f0f0f0 comes out negative. Python, on the other hand, converts the literal 0xf0f0f0f0 (from SCOMPARATOR_CODES) to an integer without any bit-length limit, which gives 4042322160.
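This reinterpretation is easy to reproduce in a plain Python shell; ctypes.c_int32 applies exactly the 32-bit signed view described above:

```python
import ctypes

MATCH = 0xf0f0f0f0  # Python parses the literal as an unbounded int: 4042322160

# Viewing the same 32 bits as a signed int -- what a 32-bit ctypes return
# type does -- flips the value negative:
as_signed = ctypes.c_int32(MATCH).value
print(MATCH, as_signed)  # 4042322160 -252645136
```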

This is strange. Let’s dive a bit into Themis:

src/soter/error.h:


...
/** @brief return type */
typedef int soter_status_t;

/**
 * @addtogroup SOTER
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */
#define SOTER_SUCCESS 0
#define SOTER_FAIL -1
#define SOTER_INVALID_PARAMETER -2
#define SOTER_NO_MEMORY -3
#define SOTER_BUFFER_TOO_SMALL -4
#define SOTER_DATA_CORRUPT -5
#define SOTER_INVALID_SIGNATURE -6
#define SOTER_NOT_SUPPORTED -7
#define SOTER_ENGINE_FAIL -8

...

typedef int themis_status_t;

/**
 * @addtogroup THEMIS
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */
#define THEMIS_SSESSION_SEND_OUTPUT_TO_PEER 1
#define THEMIS_SUCCESS SOTER_SUCCESS
#define THEMIS_FAIL SOTER_FAIL
#define THEMIS_INVALID_PARAMETER SOTER_INVALID_PARAMETER
#define THEMIS_NO_MEMORY SOTER_NO_MEMORY
#define THEMIS_BUFFER_TOO_SMALL SOTER_BUFFER_TOO_SMALL
#define THEMIS_DATA_CORRUPT SOTER_DATA_CORRUPT
#define THEMIS_INVALID_SIGNATURE SOTER_INVALID_SIGNATURE
#define THEMIS_NOT_SUPPORTED SOTER_NOT_SUPPORTED
#define THEMIS_SSESSION_KA_NOT_FINISHED -8
#define THEMIS_SSESSION_TRANSPORT_ERROR -9
#define THEMIS_SSESSION_GET_PUB_FOR_ID_CALLBACK_ERROR -10

src/themis/secure_comparator.h:

...
#define THEMIS_SCOMPARE_MATCH 0xf0f0f0f0
#define THEMIS_SCOMPARE_NO_MATCH THEMIS_FAIL
#define THEMIS_SCOMPARE_NOT_READY 0
...

themis_status_t secure_comparator_destroy(secure_comparator_t *comp_ctx);

themis_status_t secure_comparator_append_secret(secure_comparator_t *comp_ctx, const void *secret_data, size_t secret_data_length);

themis_status_t secure_comparator_begin_compare(secure_comparator_t *comp_ctx, void *compare_data, size_t *compare_data_length);
themis_status_t secure_comparator_proceed_compare(secure_comparator_t *comp_ctx, const void *peer_compare_data, size_t peer_compare_data_length, void *compare_data, size_t *compare_data_length);

themis_status_t secure_comparator_get_result(const secure_comparator_t *comp_ctx);

Now let’s see PyThemis side at src/wrappers/themis/python/pythemis/exception.py.

All values here correspond to C code, numbers are small and fit any bit length limits:

from enum import IntEnum

class THEMIS_CODES(IntEnum):
    NETWORK_ERROR = -2222
    BUFFER_TOO_SMALL = -4
    FAIL = -1
    SUCCESS = 0
    SEND_AS_IS = 1
...

What about the Secure Comparator part? Looking at src/wrappers/themis/python/pythemis/scomparator.py, we see that the values are mostly fine, but SCOMPARATOR_CODES.MATCH is problematic: it becomes negative as a 32-bit int:

...

class SCOMPARATOR_CODES(IntEnum):
    MATCH = 0xf0f0f0f0
    NOT_MATCH = THEMIS_CODES.FAIL
    NOT_READY = 0
...

If we cast it to a signed 4-byte number, we receive -252645136 where we expect 4042322160.

So the problem is on the seams between C and Python, where our code 0xf0f0f0f0 gets misinterpreted.

Possible solutions

The whole problem is a minor offense, easy to fix with a quick hack, but the whole endeavor was about eliminating technical debt, not creating more of it.

Option 1. Add strong type casting when importing variables via ctypes:

An extremely simple hack. Since we know how ctypes acts in this case, we can explicitly make the code treat the return value as unsigned (or widen it), so that 0xf0f0f0f0 read as int64_t equals its uint32_t interpretation. To do that, we would simply:

Add either

themis.secure_comparator_get_result.restype = ctypes.c_int64

or

themis.secure_comparator_get_result.restype = ctypes.c_uint

into src/wrappers/themis/python/pythemis/scomparator.py.

But that looks like an ugly hack, and it additionally requires verifying the behavior of ctypes on a 32-bit machine with a 32-bit Python.
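The effect of the two restype choices can be sketched without calling into Themis; here the raw C return value is simulated with a plain ctypes object (no real secure_comparator_get_result is involved):

```python
import ctypes

# The 4 bytes C Themis hands back for a match, held in a real 32-bit slot.
raw = ctypes.c_uint(0xf0f0f0f0)

# Default restype (c_int): reread those same 4 bytes as signed -> negative.
as_c_int = ctypes.cast(ctypes.pointer(raw),
                       ctypes.POINTER(ctypes.c_int)).contents.value
print(as_c_int)   # -252645136

# With restype = ctypes.c_uint the value matches the Python literal again.
print(raw.value)  # 4042322160 == 0xf0f0f0f0
```

Either c_uint or a widening c_int64 works here, because a 64-bit signed type can hold 0xf0f0f0f0 without touching the sign bit.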

Option 2. Change from one byte representation to another:

Hack number two: remove the implicit interpretation of the hex literal 0xf0f0f0f0 and just give the constant the value it actually arrives as, in this context -252645136. This fixes the problem in the Python wrapper, but we would still need additional verification on a 32-bit system and would have to keep an eye on it in the future.

Avoid this option if you can.

Option 3. Refactor all statuses in C library, never use negative numbers or values near type maximums to avoid overflows.

The easiest would be the second option: since it's just one such error in one wrapper, why even bother? Fix it right away and forget about it. But hitting a problem even once is sometimes enough to see the need for some standardisation.

We took the third path, and re-thought the principle behind status flags a bit:

  • Never use negative numbers: -1 is 0xffffffff in 32 bits and 0xffffffffffffffff in 64 bits, so it is easy to run into an overflow.
  • Use small positive numbers for error codes and statuses. Since Themis is supposed to work across many architectures, and (theoretically) there might be a weird 9-bit kitchen-sink processor out there (they do need more robots to join DoS armies, so take our word for it, it will happen sooner or later), we decided to limit the flag range to 0..127.
  • In the Themis part that directly faces the wrappers, we’ve changed ints to explicit int32_t.
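With flags confined to 0..127, a value survives signed/unsigned reinterpretation at any common width, so no seam between C and a wrapper can distort it. A quick sanity check on the new code values:

```python
import ctypes

# New-style Themis status codes: small non-negative ints in 0..127.
codes = (0, 1, 11, 21, 22, 127)

for code in codes:
    # The value is identical whether the seam reads it as a signed byte,
    # a 32-bit int (signed or unsigned) or a 64-bit unsigned int.
    assert ctypes.c_int8(code).value == code
    assert ctypes.c_int32(code).value == code
    assert ctypes.c_uint32(code).value == code
    assert ctypes.c_uint64(code).value == code

print("all codes survive reinterpretation")
```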

Since changing the error code system in the C library affects all wrappers, whose error codes must be adjusted accordingly, we decided to import the codes from the C code directly via variable export where possible (Go, NodeJS, Java, PHP).

After refactoring, error codes in Themis started to look like:

src/soter/error.h:

...

/** @brief return type */
typedef int soter_status_t;

/**
 * @addtogroup SOTER
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */

#define SOTER_SUCCESS 0 //success code

//error codes
#define SOTER_FAIL          11
#define SOTER_INVALID_PARAMETER     12
#define SOTER_NO_MEMORY         13
#define SOTER_BUFFER_TOO_SMALL      14
#define SOTER_DATA_CORRUPT      15
#define SOTER_INVALID_SIGNATURE     16
#define SOTER_NOT_SUPPORTED         17
#define SOTER_ENGINE_FAIL       18
...

/** @brief return type */
typedef int32_t themis_status_t;

/**
 * @addtogroup THEMIS
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */

//
#define THEMIS_SUCCESS              SOTER_SUCCESS
#define THEMIS_SSESSION_SEND_OUTPUT_TO_PEER     1

//errors
#define THEMIS_FAIL                     SOTER_FAIL
#define THEMIS_INVALID_PARAMETER            SOTER_INVALID_PARAMETER
#define THEMIS_NO_MEMORY                SOTER_NO_MEMORY
#define THEMIS_BUFFER_TOO_SMALL             SOTER_BUFFER_TOO_SMALL
#define THEMIS_DATA_CORRUPT                 SOTER_DATA_CORRUPT
#define THEMIS_INVALID_SIGNATURE            SOTER_INVALID_SIGNATURE
#define THEMIS_NOT_SUPPORTED                SOTER_NOT_SUPPORTED
#define THEMIS_SSESSION_KA_NOT_FINISHED         19
#define THEMIS_SSESSION_TRANSPORT_ERROR         20
#define THEMIS_SSESSION_GET_PUB_FOR_ID_CALLBACK_ERROR   21
#define THEMIS_SCOMPARE_SEND_OUTPUT_TO_PEER         THEMIS_SSESSION_SEND_OUTPUT_TO_PEER
...

src/themis/secure_comparator.h:

...
#define THEMIS_SCOMPARE_MATCH       21
#define THEMIS_SCOMPARE_NO_MATCH    22
#define THEMIS_SCOMPARE_NOT_READY   0
...

… and, accordingly, in PyThemis:

...

class THEMIS_CODES(IntEnum):
    NETWORK_ERROR = 2222
    BUFFER_TOO_SMALL = 14
    FAIL = 11
    SUCCESS = 0
    SEND_AS_IS = 1
...

Note: NETWORK_ERROR is PyThemis specific and is not used in C part, so we kept it the way it was.

src/wrappers/themis/python/pythemis/scomparator.py:

...

class SCOMPARATOR_CODES(IntEnum):
    MATCH = 21
    NOT_MATCH = 22
    NOT_READY = 0
...

For example, this is how direct importing of these flags in Go works:

gothemis/compare/compare.go:

package compare

/*
#cgo LDFLAGS: -lthemis -lsoter...

const int GOTHEMIS_SCOMPARE_MATCH = THEMIS_SCOMPARE_MATCH;
const int GOTHEMIS_SCOMPARE_NO_MATCH = THEMIS_SCOMPARE_NO_MATCH;
const int GOTHEMIS_SCOMPARE_NOT_READY = THEMIS_SCOMPARE_NOT_READY;
*/
import "C"
import (
    "github.com/cossacklabs/themis/gothemis/errors"
    "runtime"
    "unsafe"
)

var (
    COMPARE_MATCH = int(C.GOTHEMIS_SCOMPARE_MATCH)
    COMPARE_NO_MATCH = int(C.GOTHEMIS_SCOMPARE_NO_MATCH)
    COMPARE_NOT_READY = int(C.GOTHEMIS_SCOMPARE_NOT_READY)
)

...

Results

After fixing and refactoring, the new scomparator class looks like:

class SComparator(object):
    # the same
    ...

    def is_compared(self):
        return not (themis.secure_comparator_get_result(self.comparator_ctx) ==
                    SCOMPARATOR_CODES.NOT_READY)

    def is_equal(self):
        return (themis.secure_comparator_get_result(self.comparator_ctx) ==
                SCOMPARATOR_CODES.MATCH)

And the new test code, finally refactored to a decent look:

import unittest

from pythemis import scomparator

class SComparatorTest(unittest.TestCase):
    def setUp(self):
        self.message = b"This is test message"
        self.message1 = b"This is test message2"

    def testComparation(self):
        alice = scomparator.SComparator(self.message)
        bob = scomparator.SComparator(self.message)
        data = alice.begin_compare()
        while not (alice.is_compared() and bob.is_compared()):
            data = alice.proceed_compare(bob.proceed_compare(data))
        self.assertTrue(alice.is_equal())
        self.assertTrue(bob.is_equal())

    def testComparation2(self):
        alice = scomparator.SComparator(self.message)
        bob = scomparator.SComparator(self.message1)
        data = alice.begin_compare()
        while not (alice.is_compared() and bob.is_compared()):
            data = alice.proceed_compare(bob.proceed_compare(data))
        self.assertFalse(alice.is_equal())
        self.assertFalse(bob.is_equal())

# python scomparator_test.py 
..
----------------------------------------------------------------------
Ran 2 tests in 0.064s

OK

Conclusions

We love taking the time to explore minor, boring, trivial matters. Apart from wanting to give everybody a better Themis experience, we use Themis every day to build different tools, and we would like to be extremely confident that behind a nice API, which isolates all the implementation details we might accidentally break, the implementations are correct.

As with any bug, most of the conclusions sound like they come from the Gods of the Copybook Headings once you know them:

  • Use types of explicit sizes (int16_t, int32_t, int8_t) to be less dependent of user architectures.
  • Watch for type overflows in signed types.
  • Try to explicitly test all possible return status flags in tests.
  • !false is true only in boolean representation. Once you encode the states as numbers, don’t rely on one-sided evaluation: if you’re comparing ints that represent the two states, there can be a million reasons why !false is actually kittens, not true. Two mutually exclusive states do not mean your system will not generate N-2 more states because of some error.

Note: This post was written by the people at Cossack Labs. The original post is available here.


Yasoob Khalid: Your first talk


Hi there folks. It’s been a long time since I wrote on this blog. I have been very busy with university applications. A lot has happened recently which I would love to share with you. Firstly, I got news from a friend that my book is being used at McGill University to teach Python programming. That is something I have always wanted: to write a book which is used by well-known universities to teach programming. But this post is not about that. I wanted to share how to deliver a good first talk. People dread sharing their first talk with the world because it is mostly filled with “aaahhh”s and fast-paced speech, but I am sharing mine so that you guys know what “not” to do during your first talk.

1. Keep the slides short

We fail to realize the importance of this. Let me share the theory behind it. When you display a slide with everything written on it people will be able to read it much more quickly than you will be able to explain it. This makes your talk boring because the audience has to listen to everything which they have already read on the slide. A good rule of thumb is to write only one or two short phrases. Don’t write their explanation on the slide. Write those explanations in notes.

2. Use simple transitions

This is also important, especially for tech talks. People are there to look at the quality of content in your presentation not to look at super cool transitions. The major issue with using elaborate transitions is that they take up a lot of time of your talk which could have been used to present a couple of more interesting ideas. Another issue is that they take the focus away from you and the content of your presentation. We need to retain as much attention on you and the content as possible. Hence a short and simple sliding transition is all you need. However, if you really want to use fancy stuff try to keep it brief so that it doesn’t overshadow your main presentation.

3. Speak slowly

In my talk you can clearly see how fast I am speaking. It was due to nervousness. I can safely say that I speak much more slowly now and it is much easier to understand whatever I say. Make sure that you speak slowly during your talk. Most first-time speakers don’t discuss this because they feel they already have it under control. However, I have seen a lot of first-time speakers make the same mistake. Make sure that you pace yourself and speak each word clearly. The pace of your speech can make or break your talk.

4. Vary your tone

At PyCons, specifically, the normal talk length is 30-35 minutes. If you keep your tone the same throughout, it becomes monotonous. You will lose people’s attention very quickly, and in extreme cases it might lead to people abandoning your talk.

5. Don’t Code Live in front of an Audience

Never ever code live if it is within your power. It is because a lot of unforeseen issues can crop up on the day of the presentation. You might become nervous. You might not be able to type as quickly as you want to while standing. You might make typing errors. It is much better to prevent this issue altogether by not coding live.

That’s all for today. If you have some other tips for new speakers please do share them in the comments below. I love hearing from my readers.


Yasoob Khalid: Interesting Python Tutorials


Hi there folks! I have read some interesting Python tutorials lately. I would love to share them with you. Without any further ado let me list them over here:

1. Composing Music With Recurrent Neural Networks

I loved this tutorial. It is a bit old but still worth a read. The author has explained the theory behind his implementation. You will enjoy this tutorial if you are interested in signal processing, machine learning and/or music.

2. Page dewarping using OpenCV

This was an interesting read. I am not well versed in computer vision but still loved to read the theory behind the dewarping of an image of a curled page. The author does a great job at explaining the whole process and the algorithms used.

3. 10 Interesting Python Modules to learn in 2016

This is a good compilation of some of the famous Python libraries and modules. I have personally used almost all of them. I am linking this here because not only does this article list the modules, but it also provides sample code for each module being discussed.

4. Modern face recognition with Deep Learning

This article shows how modern face recognition works. The author takes you from isolating a face in an image to predicting which person that face belongs to. I learned a lot of new stuff from this tutorial. For instance, I had no idea what the HOG algorithm did before I read this tutorial.

5. How to score 0.8134 in Titanic Kaggle Challenge

This was a highly informative read. I learned the basic workflow of a data scientist. The author does a great job of teaching you the basics of data science. He starts with exploratory data analysis of the dataset and ends with hyperparameter tuning of his predictive models.

I hope you will enjoy reading these articles. If there is anything you would like to ask me just know that I am only an email away. I reply to most of the emails I get. Even if you want to discuss any freelancing opportunity just hit me up. This is my email.

Till next time!


Yasoob Khalid: Intermediate Python conquers the World! (Almost)


Hi there folks! I hope you are all fine. It’s been almost a year since I published Intermediate Python. It was my life goal to publish a book which really helps people. Today I saw the stats of the book after a long time. I was pretty ecstatic to learn that the English version of the book (it is also available in Chinese and Russian) has been read in 181 countries. Just 15 countries short of the whole world. It is also being used at various institutes as training material for programmers. If you have ever read this book and can spare two minutes of your valuable time, then I would love to hear your feedback. The length of the feedback can range from one word to a whole page. You can submit your feedback in the comments below or direct it to my email.

Getting to know that my work really helps people motivates me to work harder and do more awesome stuff.

I hope to hear soon from you guys! Stay happy and stay blessed 🙂


Yasoob Khalid: Support me on Patreon


Hi there folks! I have been writing regular blog-posts since 2013. I have been documenting my Python journey since then. Almost every new thing which I learn finds its way to the blog.

I haven’t only been writing blog-posts: I also publish a weekly newsletter and have published a widely read book (Intermediate Python). Over the years I haven’t monetized any of it. I have tried to keep everything free so that the maximum number of people can benefit from my work. Even my book was published under Creative Commons. It has been translated into Chinese and Russian as well.

Now it is becoming increasingly difficult for me to continue churning out great content. It takes time and effort. I hope that this Patreon campaign will help offset most of the costs associated with my work and encourage me to continue posting worth-reading articles. The costs include:

  • Website hosting
  • Domain renewals
  • Personal meals

If you enjoy my work and would love to see me continue producing great work in the future then please support me. Every little bit counts!

If you feel that the rewards on the campaign page can be better formalized then please let me know. I would be more than happy to incorporate your suggestion.

If you have any questions then comment below. I will make sure to reply each and every one of you.

Link to Campaign


Yasoob Khalid: 400+ Free Resources for DevOps & Sysadmins


As a Python advocate and educator, I’m always looking for ways to make my job (and yours) easier. This list, put together by Morpheus Data, offers a ton of great resources for Python users (more than 25 tools specific to Python) as well as other DevOps engineers and sysadmins. Enjoy.

Table of Contents

Source Code Repos

  • bitbucket.org— Free unlimited public and private repos (Git and Mercurial) for up to 5 users
  • chiselapp.com— Unlimited public and private Fossil repositories
  • github.com— Free for an unlimited number of public repositories
  • about.gitlab.com— Unlimited public and private Git repos with unlimited collaborators
  • hub.jazz.net— Unlimited public repos, private repos free for up to 3 accounts
  • visualstudio.com— Free unlimited private repos (Git and TFS) for up to 5 users per team
  • fogcreek.com— Free unlimited public and private repos (hybrid of Git and Mercurial) for 2 users
  • plasticscm.com— Free for individuals, OSS and nonprofits organizations
  • cloud.google.com— Free private Git repositories hosted on Google Cloud Platform. Supports syncing with existing GitHub and Bitbucket repos. Free Beta for up to 500 MB of storage

Tools for Teams and Collaboration

  • scinote.net— scientific data management & team collaboration. One Team with Unlimited number of users, backup and 1GB storage space
  • appear.in— One click video conversations, for free
  • flowdock.com— Chat and inbox, free for teams up to 5
  • slack.com— Free for unlimited users with some feature limitations
  • hipchat.com— Free for unlimited users with some feature limitations
  • gitter.im— Chat, for GitHub. Unlimited public & private rooms, free for teams of up to 25
  • hangouts.google.com— One place for all your Conversations, for free, need a Google account
  • seafile.com— Private or cloud storage, file sharing, sync, discussions. Private version is full. Cloud version has just 1 GB
  • sameroom.io— Free for unlimited users with some feature limitations
  • yammer.com— Private social network standalone or for MS Office 365. Free, just a bit less admin tools and users management features
  • helpmonks.com— Shared inbox for teams, free for Open Source and nonprofit organizations
  • typetalk.in— Share and discuss ideas with your team through instant messaging on the web or on your mobile
  • talky.io— Free group video chat. Anonymous. Peer‑to‑peer. No plugins, signup, or payment required
  • sourcetalk.net— Code discussion tool, free for open code talks
  • helplightning.com— Help over video with augmented reality. Free without analytics, encryption, support
  • evernote.com— Tool for organizing information. Share your notes and work together with others
  • wunderlist.com— Share your lists and work collaboratively on projects with your colleagues, free on iPhone, iPad, Mac, Android, Windows and the web
  • doodle.com— The scheduling tool you’ll actually use. Find a date for a meeting two times faster
  • sendtoinc.com— Share links, notes, files and have discussions. Free for 3 and 100 MB
  • zoom.us— Secure Video and Web conferencing, add-ons available. Free limited to 40 minutes
  • ideascale.com— Allow clients to submit ideas and vote, free for 25 members in 1 community
  • filehero.io— Make it easy to access your company’s file storage from a corporate download page. Free for 5 concurrent downloads
  • wistia.com— Video hosting with viewer analytics, HD video delivery, and marketing tools to help understand your visitors, 25 videos and Wistia branded player
  • cnverg.com— Real-time shared visual workspace, whiteboard, GitHub integration. Free 5 GB, 5 spaces and 5 collaborators, no GitHub repos

Code Quality

  • tachikoma.io— Dependency Update for Ruby, Node.js, Perl projects, free for Open Source
  • gemnasium.com— Dependency Update for Ruby, Node.js projects, free for Open Source
  • deppbot.com— Automated Dependency Updates for Ruby projects, free for Open Source
  • landscape.io— Code Quality for Python projects, free for Open Source
  • codeclimate.com— Automated code review, free for Open Source
  • houndci.com— Comments on GitHub commits about code quality, free for Open Source
  • coveralls.io— Display test coverage reports, free for Open Source
  • scrutinizer-ci.com— Continuous inspection platform, free for Open Source
  • codecov.io— Code coverage tool (SaaS), free for Open Source
  • insight.sensiolabs.com— Code Quality for PHP/Symfony projects, free for Open Source
  • codacy.com— Automated code reviews for PHP, Python, Ruby, Java, JavaScript, Scala, CSS and CoffeeScript, free for Open Source
  • pullreview.com— Automated Code Review for Ruby in GitHub, Bitbucket and GitLab, free for Open Source
  • gocover.io— Code coverage for any Go package
  • goreportcard.com/— Code Quality for Go projects, free for Open Source
  • inch-ci.org— Documentation badges for Ruby, JS and Elixir
  • scan.coverity.com— Static code analysis for Java, C/C++, C# and JavaScript, free for Open Source
  • webceo.com— SEO tools but with also code verifications and different type of advices
  • zoompf.com— Fix the performance of your web sites, detailed analysis
  • websitetest.com— Yotta’s tool to optimize web sites, free limited version online
  • gtmetrix.com— Reports and thorough recommendations to optimize websites
  • browserling.com— Live interactive cross-browser testing, free only 3 min. sessions with MS IE 9 under Vista at 1024 x 768 resolution
  • loadfocus.com— Load and speed tests for websites, mobile apps and APIs, monitoring,… Free 5 tests/month, 120 clients/test, 1 monitor, 1 location,…
  • versioneye.com— Monitor your source code and notify about outdated dependencies. Free for Open Source and public repos
  • beanstalkapp.com— A complete workflow to write, review & deploy code), free account for 1 user and 1 repository, with 100 MB of storage
  • testanywhere.co— Automatic test website or web app continuously and catch bugs in the early stages, free 1,000 tests/month
  • srcclr.com— SourceClear to scan source code for vulnerabilities, multi-languages and OS

Code Search and Browsing

  • sourcegraph.com— Java, Go, Python, Node.js, etc., code search/cross-references, free for Open Source
  • searchcode.com— Comprehensive text-based code search, free for Open Source

CI / CD

  • codeship.com— 100 private builds/month, 5 private projects, unlimited for Open Source
  • circleci.com— Free for one concurrent build
  • travis-ci.org— Free for public GitHub repositories
  • wercker.com— Free for public and private repositories
  • drone.io— CI platform that includes browser testing, free for Open Source
  • semaphoreci.com— 100 private builds/month, unlimited for Open Source
  • shippable.com— Free for 1 build container, private and public repos, unlimited builds
  • snap-ci.com— Free for public repositories, 1 build at the time
  • appveyor.com— CD service for Windows, free for Open Source
  • github.com— Comparison of Continuous Integration services
  • ftploy.com— 1 project with unlimited deployments
  • deployhq.com— 1 project with 10 daily deployments
  • hub.jazz.net— 60 minutes of free build time/month
  • styleci.io— Public GitHub repositories only
  • bitrise.io— iOS CI/CD with 200 free builds/month
  • saucelabs.com— CI with scalable testing for mobile and web apps, free for Open Source
  • buddybuild.com— Build, deploy and gather feedback for your iOS and Android apps in one seamless, iterative system.

Automated Browser Testing

  • gridlastic.com— Selenium Grid testing with free plan up to 4 simultaneous selenium nodes/10 grid starts/4,000 test minutes per month
  • browserstack.com— Manual and automated browser testing, free for Open Source
  • EveryStep-Automation.com— Records and replays all steps made in a web browser and creates scripts,… free with fewer options

Security and PKI

  • threatconnect.com— Threat intelligence: It is designed for individual researchers, analysts, and organizations who are starting to learn about cyber threat intelligence. Free upto 3 Users
  • crypteron.com— Cloud-first, developer-friendly security platform prevents data breaches in .NET and Java applications
  • snyk.io— Snyk found and reported several vulnerabilities in the package.Limited to 1 private project (unlimited for open source projects)
  • vaddy.net— Continuous web security testing with continuous integration (CI) tools. 3 domains, 10 scans history for free
  • letsencrypt.org— Free SSL Certificate Authority with certs trusted by all major browsers
  • globalsign.com— Free SSL certificates for Open Source
  • startssl.com— Free SSL certs
  • wosign.com— Free SSL certs. Up to 5 domain names for 2 years period. China authority
  • soclall.com— Free up to 1,000 users login, post, share through top 20+ social networks
  • stormpath.com— Free user management, authentication, social login, and SSO
  • auth0.com— Hosted free for development SSO
  • getclef.com— New take on auth unlimited free tier for anyone not using premium features
  • ringcaptcha.com— Tools to use phone number as id, available for free
  • ssllabs.com— Very deep analysis of the configuration of any SSL web server
  • qualys.com— Find web app vulnerabilities, audit for OWASP Risks
  • alienvault.com— Uncovers compromised systems in your network
  • duo.com— Two-factor authentication (2FA) for website or app. Free 10 users, all authentication methods, unlimited, integrations, hardware tokens
  • tinfoilsecurity.com— Automated vulnerability scanning. Free plan allows weekly XSS scans
  • acunetix.com— Free vulnerability and network scanning for 3 targets
  • ponycheckup.com— An automated security checkup tool for Django websites
  • foxpass.com— Hosted LDAP and RADIUS. Easy per-user logins to servers, VPNs, and wireless networks. Free for 10 users
  • opswatgears.com— Security Monitoring of computers, devices, applications, configurations,… Free 25 users and 30 days history
  • bitninja.io— Botnet protection through a blacklist, free plan only reports limited information on each attack
  • onelogin.com— Identity as a Service (IDaaS), Single Sign-On Identity Provider, Cloud SSO IdP, 3 company apps and 5 personal apps, unlimited users
  • logintc.com— Two-factor authentication (2FA) by push notifications, free for 10 users, VPN, Websites and SSH
  • report-uri.io— CSP and HPKP violation reporting

Management System

  • bitnami.com— Deploy prepared apps on IaaS. Management of 1 AWS micro instance free
  • visualops.io— 3,600 instance hours/month free

Log Management

Translation Management

Monitoring

  • opbeat.com— Instant performance insights for JS developers. Free with 24 hours data retention
  • checkmy.ws— Free 15 days full demo and 3 websites, forever free for Open Source
  • appneta.com— Free with 1 hour data retention
  • thousandeyes.com— Network and user experience monitoring. 3 locations, plus 20 data feeds of major web services free
  • datadoghq.com— Free for up to 5 nodes
  • stackdriver.com— Free monitoring up to 10 servers/hosted services
  • keymetrics.io— Free for 2 servers with 7 days data retention
  • newrelic.com— Free with 24 hours data retention
  • nodequery.com— Free basic server monitor up to 10 servers
  • watchsumo.com— Free website monitoring, 50 Http(s), Ping or keywords, every 5+ minutes
  • opsgenie.com— Alert management with mobile push. 600 free alerts/month for 2 users
  • runscope.com— Monitor and log API usage. Single user 25,000 requests/month free
  • circonus.com— Free for 20 metrics
  • uptimerobot.com— Website monitoring, 50 monitors free
  • statuscake.com— Website monitoring, unlimited tests free with limitations
  • bmc.com— Free 1 second resolution for up to 10 servers
  • ghostinspector.com— Free website and web application monitoring. Single user, 100 test runs/month
  • java-monitor.com— Free monitoring of JVM’s and uptime
  • sematext.com— Free for 24 hours metrics, unlimited number of servers, 10 custom metrics, 500 K custom metrics data points, unlimited dashboards, users, etc
  • sealion.com— Free up to 2 servers, 3 days data retention, graphs and raw command output history (top, ps, ifconfig, netstat, iostat, free, custom, etc.)
  • stathat.com— Get started with 10 stats for free, no expiration
  • skylight.io— Free for first 100 K requests (Rails only)
  • appdynamics.com— Free for 24 hours metrics, application performance management agents limited to one Java, one .NET, one PHP, and one Node.js
  • deadmanssnitch.com— Monitoring for cron jobs. 1 free snitch (monitor), more if you refer others to sign up
  • librato.com— Free up to 100 metrics at 60 seconds resolution
  • freeboard.io— Free for public projects. Dashboards for your Internet of Things projects
  • loader.io— Free load testing tools with limitations
  • speedchecker.xyz— Performance Monitoring API, checks Ping, DNS, etc
  • blackfire.io— Blackfire is the SaaS-delivered Application Performance Solution. Free Hacker plan (PHP Only)
  • apimetrics.io— Automated API Performance Monitoring, Testing and Analytics. Free Plan, manually make API calls and Run from their West Coast servers
  • opsdash.com— Self-hoster server, clusters and services monitoring, free for 5 servers and 5 services

Crash and Exception Handling

  • rollbar.com— Exception and error monitoring, free plan with 5,000 errors/month, unlimited users, 30 days retention
  • bugsnag.com— Free for up to 2,000 errors/month after the initial trial
  • getsentry.com— Sentry tracks app exceptions in realtime, has a small free plan. Free, unrestricted use if self-hosted

Search

  • algolia.com— Hosted search-as-you-type (instant). Free hacker plan up to 10,000 documents and 100,000 operations. Bigger free plans available for community/Open Source projects
  • swiftype.com— Hosted search solution (API and crawler). Free for a single search engine with up to 1,000 documents. Free upgrade to Premium level for Open Source
  • bonsai.io— Free 1 GB memory and 1 GB storage
  • searchly.com— Free 2 indices and 5 MB storage
  • facetflow.com— Hosted Elasticsearch for Microsoft Azure. Free 5,000 docs and 500 MB
  • indexisto.com— Site search reinvented. Free 10 million document index limit with advertisement block

Email

  • mailinator.com— Mailinator is Free, Public, Email system where you can use ANY inbox you want! … Disposable Email.
  • sparkpost.com— First 100,000 emails/month are free
  • mailgun.com— First 10,000 emails/month are free
  • tinyletter.com— 5,000 subscribers/month are free
  • mailchimp.com— 2,000 subscribers and 12,000 emails/month are free
  • sendloop.com— 2,000 subscribers and 10,000 emails/month are free
  • sendgrid.com— 400 emails/day for free and 25,000 free transactional emails/month for emails sent from a Google compute instance or Microsoft Azure App Service
  • phplist.com— Hosted version allow 300 emails/month for free
  • mailjet.com— 6,000 emails/month for free
  • sendinblue.com— 9,000 emails/month for free
  • mailtrap.io— Fake SMTP server for development, free plan with 1 inbox, 50 messages, no team member, 2 emails/second, no forward rules
  • mailstache.io— 4 mailboxes with 1 GB each for up to 2 custom domains
  • postmarkapp.com— First 25,000 emails are free
  • zoho.com— Free email management and collaboration for up to 10 users
  • domain.yandex.com— Free email and DNS hosting for up to 1,000 users
  • pawnmail.com— 2 GB free email hosting across unlimited users for custom domain. Roundcube webmail, POP3, IMAP, and SMTP access. No paid plans or upgrades
  • moosend.com— Mailing list management service. Free account for 6 months for startups
  • debugmail.io— Easy to use testing mail server for developers
  • mailboxlayer.com— Email validation and verification JSON API for developers. 1,000 free API requests/month
  • mailcatcher.me— Catches mail and serves it through a web interface
  • yopmail.fr— Disposable email addresses
  • kickbox.io— Verify 100 emails free, real time API available
  • inumbo.com— SMTP based spam filter, free for 10 users
  • biz.mail.ru— 5,000 mailboxes with 25 GB each per custom domain with DNS hosting
  • maildocker.com— First 10,000 emails/month are free
  • sendpulse.com— 50 emails free/hour, first 12,000 emails/month are free
  • pepipost.com— Unlimited emails free for first three months, then first 25,000 emails/month are free

CDN and Protection

  • kloudsec.com— Minimal CDN platform targeted at programmers. CDN is free. Optional and free plugins include Page Optimization (Pagespeed), Service Doctor (Website performance analytics and alerts) and One-click Encryption (Auto provision/renew LetsEncrypt certs for HTTPS)
  • cloudflare.com— Basic service is free, good for a blog, Cloudflare also offers a free SSL certificate service
  • bootstrapcdn.com— CDN for bootstrap, bootswatch and font awesome
  • surge.sh— Single–command, bring your own source control web publishing CDN
  • cdnjs.com— CDN for JavaScript libraries, CSS libraries, SWF, images, etc
  • jsdelivr.com— Super-fast CDN of OSS (JS, CSS, fonts) for developers and webmasters, accepts PRs to add more
  • developers.google.com— The Google Hosted Libraries is a content distribution network for the most popular, Open Source JavaScript libraries
  • asp.net— The Microsoft Ajax CDN hosts popular third party JavaScript libraries such as jQuery and enables you to easily add them to your Web application
  • toranproxy.com— Proxy for Packagist and GitHub. Never fail CD. Free for personal use, 1 developer, no support
  • rawgit.com— Free limited traffic, serves raw files directly from GitHub with proper Content-Type headers
  • incapsula.com— Free CDN and DDoS protection
  • fastly.com— Free CDN, all features until USD 50/month is reached, enough for most, then pay or suspended
  • athenalayer.com— Free DDoS protection with unlimited websites
  • section.io— A simple way to spin up and manage a complete Varnish Cache solution. Supposedly free forever for one site
  • netdepot.com— First 100 GB free/month
  • dropigee.com— Dropigee provides CDN + Cloud Storage, get 2 GB of bandwidth and unlimited storage free per month

PaaS

  • cloud.google.com— Google App Engine gives 28 instance hours/day free, 1 GB NoSQL database and more
  • engineyard.com— Engine Yard provides 500 free hours
  • azure.microsoft.com— MS Azure gives USD 200 worth of free usage for a trial
  • appharbor.com— A .Net PaaS that provides 1 free worker
  • shellycloud.com— Platform for hosting Ruby and Ruby on Rails apps, €20 of free credit
  • heroku.com— Host your apps in the cloud, free for single process apps
  • firebase.com— Build realtime apps, free plan has 100 max. connections, 10 GB data transfer, 1 GB data storage, 1 GB hosting storage and 10 GB hosting transfer
  • bluemix.net— IBM PaaS with a monthly free allowance
  • openshift.com— Red Hat PaaS, free tier provides three small gears each with 512 MB memory and 1 GB storage. One-click deployments available
  • outsystems.com— Enterprise web development PaaS for on-premise or cloud, free “personal environment” offering allows for unlimited code and up to 1 GB database
  • platform.telerik.com— Build and deploy mobile applications using JavaScript. Free plan has 100 MB data storage, 1 GB file storage, 5 GB bandwidth, 1 million push notifications for BaaS offering, 100 active devices for analytics
  • scn.sap.com— The in-memory Platform-as-a-Service offering from SAP. Free developer accounts come with 1 GB structured, 1 GB unstructured, 1 GB of Git data and allow you to run HTML5, Java and HANA XS apps
  • mendix.com— Rapid Application Development for Enterprises, unlimited number of free sandbox environments supporting 10 users, 100 MB of files and 100 MB database storage each
  • pythonanywhere.com— Cloud Python app hosting. Beginner account is free, 1 Python web application at your-username.pythonanywhere.com domain, 512 MB private file storage, one MySQL database
  • configure.it— Mobile app development platform, free for 2 projects, limited features but no resource limits
  • elastx.com— Free tier with up to 4 cloudlets, must be renewed every year
  • pagodabox.io— Small worker, web server, cache, and database for free
  • cloudandheat.com— 128 MB of RAM for free, includes support for custom domains for free
  • zeit.co/now– Managed platform for Node.js deployments, featuring dynamic realtime scaling. Includes 20 free deploys/month limited to 1GB storage and 1GB bandwidth for OSS projects (source files are exposed on a public URL)
  • sandstorm.io— Sandstorm is an open source operating system for personal and private clouds. Free plan offers 200 MB storage and 5 grains

BaaS

  • apigee.com— Unlimited trial includes NoSQL data store with 25 GB of storage, user and permission management, geolocation, 10 million push notifications/month, remote configuration, beta and A/B split testing, APM, fully API driven. Accessible and manageable via UI, SDK, and API
  • appacitive.com— Mobile backend, free for the first 3 months with 100 K API calls, push notifications
  • bip.io— A web-automation platform for easily connecting web services. Fully open GPLv3 to power the backend of your Open Source project. Commercial OEM License available
  • blockspring.com— Cloud functions. Free for 5 million runs/month
  • kinvey.com— Mobile backend, starter plan has unlimited requests/second, with 2 GB of data storage, as well as push notifications for up 5 million unique recipients. Enterprise application support
  • konacloud.io— Web and Mobile Backend as a Service, with 5 GB free account
  • layer.com— The full-stack building block for communications
  • quickblox.com— A communication backend for instant messaging, video and voice calling, and push notifications
  • pushbots.com— Push notification service. Free for up to 1.5 million pushes/month
  • dreamfactory.com— DreamFactory is an Open Source backend platform that provides all of the RESTful services you need to build fantastic mobile and web applications
  • onesignal.com— Unlimited free push notifications
  • getstream.io— Build scalable news feeds and activity streams in a few hours instead of weeks, free for 3 million feed updates/month
  • tyk.io— API management with authentication, quotas, monitoring, and analytics. Free cloud offering
  • iron.io— Async task processing (like AWS Lambda) with free tier and 1 month free trial
  • stackhut.com— Async task processing (like AWS Lambda). 10 free private services and unlimited free public services
  • pubnub.com— Free push notifications for up to 1 million messages/month and 100 active daily devices
  • webtask.io— Run code with an HTTP call. No provisioning. No deployment
  • zapier.com— Connect the apps you use, to automate tasks. 5 zaps, every 15 min. and 100 tasks/month
  • stackstorm.com— Event-driven automation for apps, services and workflows, free without flow, access control, LDAP,…
  • simperium.com— Move data everywhere instantly and automatically, multi-platform, unlimited sending and storage of structured data, max. 2,500 users/month
  • stamplay.com— Connect services together with a visual interface. 50 K API calls, 100 GB data transfer, and 1 GB storage for free

Web Hosting

  • closeheat.com— Development Environment in the Cloud for Static Websites with Free Hosting and GitHub integration. 1 free website with custom domain support
  • code.fosshub.com— is a free service offered by FossHub. Free hosting for Open Source projects.
  • sourceforge.net— Find, Create, and Publish Open Source software for free
  • simplybuilt.com— SimplyBuilt offers free website building and hosting for Open Source projects. Simple alternative to GitHub Pages
  • devport.co— Turn GitHub projects, apps, and websites into a personal developer portfolio
  • netlify.com— Builds, deploy and hosts static site or app, free for 100 MB data and 1 GB bandwidth
  • pantheon.io— Drupal and WordPress hosting, automated DevOps, and scalable infrastructure. Free for developers and agencies
  • acquia.com— Hosting for Drupal sites. Free tier for developers. Free development tools (such as Acquia Dev Desktop) also available
  • bitballoon.com— BitBalloon offers hosting for static sites and apps. Free on a subdomain
  • readthedocs.org— Free documentation hosting with versioning, PDF generation and more
  • bubble.is— Visual programming to build web and mobile apps without code, free 100 visitors/month, 2 apps
  • contentful.com— Content as a Service. Content management and delivery APIs in the cloud. 3 users, 3 spaces (repositories) and 100,000 API requests/month for free
  • tilda.cc— One site, 50 pages, 50 MB storage, only the main pre-defined blocks among 170+ available, no fonts, no favicon and no custom domain
  • pubstorm.com— Free static content hosting with global CDN and custom domain support. 10 free sites, each with 2 past revisions

DNS

  • freedns.afraid.org— Free DNS hosting
  • dns.he.net— Free DNS hosting service with Dynamic DNS Support
  • luadns.com— Free DNS hosting, 3 domains, all features with reasonable limits
  • domain.yandex.com— Free email and DNS hosting for up to 1,000 users
  • selectel.com– Free DNS hosting, anycast, 10 geo zones
  • cloudns.net— Free DNS hosting up to 3 domains with unlimited records
  • ns1.com— Data Driven DNS, automatic traffic management, 1 million free queries

IaaS

DBaaS

  • cloudant.com— Hosted database from IBM, free if usage is below USD 50/month
  • orchestrate.io— 1 application free
  • redislabs.com— Redis as a Service, 30 MB and 30 concurrent connections free
  • backand.com— Back-end as a service for AngularJS
  • zenginehq.com— Build business workflow apps in minutes, free for single users
  • redsmin.com— Online real-time monitoring and administration service for Redis, 1 Redis instance free
  • graphstory.com— GraphStory offers Neo4j (a Graph Database) as a service
  • elephantsql.com— PostgreSQL as a service, 20 MB free
  • graphenedb.com— Neo4j as a service, up to 1,000 nodes and 10,000 relations free
  • mongolab.com— MongoDB as a service, 500 MB free
  • scalingo.com— Primarily a PaaS but offers a 512 MB free tier of MySQL, PostgreSQL, or MongoDB
  • skyvia.com— Cloud Data Platform, offers free tier and all plans are completely free while in beta
  • airtable.com— Looks like a spreadsheet, but it’s a relational database, unlimited bases, 1,200 rows/base and 1,000 API requests/month
  • fieldbook.com— Fieldbook lets anyone create a simple tracking database, as easily as a spreadsheet. Automatic API. Unlimited free sheets, share with unlimited users
  • iriscouch.com— CouchDB as a service. Free for developing, prototyping, etc

STUN, WebRTC, Web Socket Servers and Other Routers

  • pusher.com— Hosted Web Sockets broker. Free for up to 20 simultaneous connections and 100 K messages/day
  • stun:stun.l.google.com:19302 — Google STUN
  • stun:global.stun.twilio.com:3478?transport=udp — Twilio STUN
  • segment.com— Hub to translate and route events to other third party services. 100 K events/month free
  • ngrok.com— Expose locally running servers over a tunnel to a public URL
  • cloudamqp.com— RabbitMQ as a Service. Little Lemur plan: max 1 million messages/month, max 20 concurrent connections, max 100 queues, max 10,000 queued messages, multiple nodes in different AZ’s

Issue Tracking and Project Management

  • bitrix24.com— Free intranet and project management tool
  • pivotaltracker.com— Pivotal Tracker, free for public projects
  • atlassian.com— Free Jira etc for Open Source
  • kanbantool.com— Kanban board based project management. Free, paid plans with more options
  • kanbanflow.com— Board based project management. Free, premium version with more options
  • kanbanery.com— Board based project management. Free for 2 users, premium tiers with more options
  • zenhub.io— The only project management solution inside GitHub. Free for public repos, OSS, and nonprofits organizations
  • trello.com— Board based project management. Free
  • producteev.com— Task management tool. Free, premium version with more options. Mobile applications available
  • fogcreek.com— Bug tracking and project management. Free for 2 users
  • waffle.io— Board based project management solution from your existing GitHub Issues, free for Open Source
  • huboard.com— Instant project management for your GitHub issues, free for Open Source
  • taiga.io— Project management platform for startups and agile developers, free for Open Source
  • jetbrains.com— Free hosted YouTrack (InCloud) for FOSS projects, private projects free for 10 users
  • github.com— In addition to its Git storage facility, GitHub offers basic issue tracking
  • asana.com— Free for private project with collaborators
  • acunote.com— Free project management and SCRUM software for up to 5 team members
  • gliffy.com— Online diagrams: flowchart, UML, wireframe,… Also plugins for Jira & Confluence. 5 diagrams and 2 MB free
  • cacoo.com— Online diagrams in real time: flowchart, UML, network. Free max. 15 users/diagram, 25 sheets
  • draw.io— Online diagrams stored locally, in Google Drive, OneDrive or Dropbox. Free for all features and storage levels
  • hub.jazz.net— IBM Bluemix’s project management services. Free for public projects, free for up to 3 users for private projects
  • leankit.com— Kanban board, that visualizes your workflow. Free up to 10 users
  • visualstudio.com— Unlimited free private code repositories; Tracks bugs, work items, feedback and more
  • testlio.com— Issue tracking, test management and beta testing platform. Free for private use
  • vivifyscrum.com— Free tool for Agile project management. Scrum Compatible
  • targetprocess.com— Visual project management, from Kanban and Scrum to almost any operational process. Free for unlimited users, up to 1,000 data entities
  • overv.io— Agile project management for teams who love GitHub
  • taskulu.com— Role based project management. Free up to 5 users. Integration with GitHub/Trello/Dropbox/Google Drive
  • contriber.com— Customizable project management platform, free starter plan, 5 workspaces
  • planitpoker.com— Free online planning poker (estimation tool)

Storage and Media Processing

  • aerofs.com— P2P file syncing, free for up to 30 users
  • bintray.com— Binary File storage, free for Open Source. Includes SSL, CDN and a limited number of REST calls
  • cloudinary.com— Image upload, powerful manipulations, storage, and delivery for sites and apps, with libraries for Ruby, Python, Java, PHP, Objective-C and more. Perpetual free tier includes 7,500 images/month, 2 GB storage, 5 GB bandwidth
  • plot.ly— Graph and share your data. Free tier includes unlimited public files and 10 private files
  • transloadit.com— Handles file uploads & encoding of video, audio, images, documents. Free for Open Source & other do-gooders. Commercial applications get one GB free for test driving
  • podio.com— You can use Podio with a team of up to five people and try out the features of the Basic Plan, except user management
  • shrinkray.io— Free image optimization of GitHub repos
  • imagefly.io— Responsive images on-demand. CDN fronted image resizing, transcoding, and optimizing. 100 MB/month for free
  • kraken.io— Image optimization for website performance as a service, free plan up to 1 MB file size
  • placehold.it— A quick and simple image placeholder service
  • placekitten.com— A quick and simple service for getting pictures of kittens for use as placeholders
  • placepenguin.com— A quick and simple service for placeholder images of penguins
  • embed.ly— Provides APIs for embedding media in a webpage, responsive image scaling, extracting elements from a webpage. Free for up to 5,000 URLs/month at 15 requests/second
  • backhub.co— Backup and archive your GitHub repositories. Free for public repos
  • otixo.com— Encrypt, share, copy and move all your cloud storage files from one place. Basic plan provides unlimited files transfer with 250 MB max. file size and allows 5 encrypted files
  • tinypng.com— API to compress and resize PNG and JPEG images, offers 500 compressions for free each month
  • filestack.com— File picker, transform and deliver, free for 250 files, 500 transformations and 3 GB bandwidth
  • packagecloud.io– Hosted Package Repositories for YUM, APT, RubyGem, and PyPI. Limited free plans, open source plans available via request.

Design and UI

  • pixlr.com— Free online browser editor on the level of commercial ones
  • imagebin.ca— Pastebin for images
  • cloudconvert.com— Convert anything to anything. 208 supported formats including videos to gif
  • resizeappicon.com— A simple service to resize and manage your app icons
  • vectr.com— Free Design App For Web + Desktop
  • walkme.com— Enterprise Class Guidance and Engagement Platform, free plan 3 walk-thrus up to 5 steps/walk
  • marvelapp.com— Design, prototyping and collaboration, free limited for 3 projects

Data Visualization on Maps

  • geocoder.opencagedata.com/— Geocoding API that aggregates OpenStreetMap and other open geo sources. 2,500 free queries/day
  • datamaps.co— A free platform for creating visualizations with data maps
  • geocod.io— Geocoding via API or CSV Upload. 2,500 free queries/day
  • gogeo.io— Maps and geospatial services with an easy to use API and support for big data
  • cartodb.com— Create maps and geospatial APIs from your data and public data
  • giscloud.com— Visualize, analyze and share geo data online
  • latlon.io— Geocoding API + school districts, census geography divisions, and other address-based data. 2,500 free requests/month
  • mapbox.com— Maps, geospatial services, and SDKs for displaying map data

Package Build System

IDE and Code Editing

  • c9.io— IDE in a browser. Incorporates an Ubuntu virtual machine and in-browser terminal access. Integrates with GitHub and BitBucket, but also adds SFTP and generic Git access
  • koding.com— Online cloud-based development environment running on an Ubuntu virtual machine
  • codeanywhere.com— Full IDE in the browser and mobile apps. Access FTP, SFTP, Dropbox, Google Drive, GitHub, and BitBucket. Hosted virtual machines with terminal access. Collaboration features like share links, live editing, permissions, and version tracking
  • codenvy.com— IDE and automated developer workspaces in a browser, collaborative, Git/SVN integration, build and run your app in customizable Docker-based runners (free tier includes: 4 GB RAM, always-on machines, ability to run multiple machines simultaneously), pre-integrated deploy to Google Apps
  • nitrous.io— Private Linux instance(s) with interactive collaboration, free for 2 hours/day
  • visualstudio.com— Fully-featured IDE with thousands of extensions, cross-platform app development (Microsoft extensions available for download for iOS and Android), desktop, web and cloud development, multi-language support (C#, C++, JavaScript, Python, PHP and more)
  • code.visualstudio.com— Build and debug modern web and cloud applications. Code is free, Open Source and available on your favorite platform, Linux, Mac OSX and Windows
  • cloud.sagemath.com— Collaborative mathematics-oriented IDE in a browser, with support for Python, LaTeX, IPython Notebooks, etc
  • wakatime.com— Quantified self metrics about your coding activity, using text editor plugins, limited plan for free
  • apiary.io— Collaborative design API with instant API mock and generated documentation (Free for unlimited API blueprints and unlimited user with one admin account and hosted documentation)
  • mockable.io— Mockable is a simple configurable service to mock out RESTful API or SOAP web-services. This online service allows you to quickly define REST API or SOAP endpoints and have them return JSON or XML data
  • jetbrains.com— Productivity tools, IDEs and deploy tools. Free license for students, teachers, Open Source, and user groups
  • stackhive.com— Cloud based IDE in browser that supports HTML5/CSS3/jQuery/Bootstrap
  • tadpoledb.com— In-browser database IDE. Supports Amazon RDS, Apache Hive, Apache Tajo, CUBRID, MariaDB, MySQL, Oracle, SQLite, MSSQL, PostgreSQL and MongoDB databases
  • sourcelair.com— In-browser IDE for Django, JavaScript, HTML5, Python, and more. Integrates with Git, Mercurial, GitHub, Heroku and more. Free forever for 1 private project
  • codepen.io— CodePen is a playground for the front end side of the web

Analytics, Events and Statistics

  • analytics.google.com— Google Analytics
  • heapanalytics.com— Automatically captures every user action in iOS or web apps. Free for up to 5,000 visits/month
  • sematext.com— Free for up to 50 K actions/month, 1 day data retention, unlimited dashboards, users, etc
  • usabilityhub.com— Test designs and mockups on real people, track visitors. Free for one user, unlimited tests
  • gosquared.com— Track up to 1,000 data points for free
  • mixpanel.com— Free 25,000 points or 200,000 with their badge on your site
  • amplitude.com— 1 million monthly events, up to 2 apps
  • keen.io— Custom Analytics for data collection, analysis and visualization. 50,000 events/month free
  • inspectlet.com— 100 sessions/month free for 1 website
  • mousestats.com— 100 sessions/month free for 1 website
  • metrica.yandex.com— Unlimited free analytics
  • hotjar.com— Per site: 2,000 pages views/day, 3 heatmaps, data stored for 3 months,…
  • imprace.com— Landing page analysis with suggestions to improve bounce rates. Free for 5 landing pages/domain
  • baremetrics.com— Analytics & Insights for stripe
  • optimizely.com— A/B Testing solution, free starter plan, 1 website, 1 iOS and 1 Android app
  • expensify.com— Expense reporting, free personal reporting approval workflow
  • ironSource atom— Atom Data Flow Management is a data pipeline solution, 10M monthly events free
  • botan.io— Free analytics for your Telegram bot.

International Mobile Number Verification API and SDK

  • cognalys.com— Freemium mobile number verification using a more innovative and reliable method than an SMS gateway. Free accounts get 10 tries and 15 verifications/day
  • numverify.com— Global phone number validation & lookup JSON API. 250 API requests/month
  • sumome.com— Heat map and conversion enhancement tools; the free plan lacks a few advanced features

Payment / Billing Integration

  • braintreepayments.com— Credit Card, Paypal, Venmo, Bitcoin, Apple Pay,… integration. Single and Recurrent Payments. First USD 50,000 are free of charge
  • taxratesapi.avalara.com— Get the right sales tax rates to charge for the close to 10,000 sales tax jurisdictions in the USA. Free REST API. Registration required
  • currencylayer.com— Reliable Exchange Rates & Currency Conversion for your Business, 1,000 API requests/month free
  • vatlayer.com— Instant VAT number validation & EU VAT rates API, free 100 API requests/month

Docker Related

  • docker.com— One free private repository, free managed node and unlimited public repositories
  • quay.io— Unlimited free public repositories
  • tutum.co— The Docker platform for Dev and Ops: build, deploy, and manage your apps across any cloud. Free while in beta, with a free developer plan once Tutum is production ready

Vagrant Related

Miscellaneous

  • apichangelog.com— Subscribe to be notified each time API Documentation is updated (Facebook, Twitter, Google,…)
  • docsapp.io— Easiest way to publish documentation, free for Open Source
  • instadiff.com— Compare website versions with highlighted changes before you deploy, free for 100 pages/month
  • fullcontact.com— Help your users know more about their contacts by adding social profile into your app. 500 free Person API matches/month
  • apicastor.com— Convert spreadsheets into URL and monitor access
  • formlets.com— Online forms, unlimited single page forms/month, 100 submissions/month, email notifications
  • superfeedr.com— Real-time PubSubHubbub compliant feeds, export, analytics. Free with less customization
  • screenshotlayer.com— Capture highly customizable snapshots of any website. Free 100 snapshots/month
  • screenshotmachine.com— Capture 100 snapshots/month, png, gif and jpg, including full-length captures, not only home page
  • readme.io— Beautiful documentations made easy, free for Open Source

APIs, Data, and ML

  • monkeylearn.com— Text analysis with Machine Learning, free 100,000 queries/month
  • wit.ai— NLP for developers
  • wolfram.com— Built-in knowledge based algorithms in the cloud
  • parsehub.com— Extract data from dynamic sites, turn dynamic websites into APIs, 5 projects free
  • import.io— Easily turn websites into APIs, completely free for life
  • wrapapi.com— Turn any website into a parameterized API
  • algorithmia.com— Host algorithms for free. Includes free monthly allowance for running algorithms. Now with CLI support
  • bigml.com— Hosted machine learning algorithms. Unlimited free tasks for development, limit of 16 MB data/task
  • mashape.com— API Marketplace And Powerful Tools For Private And Public APIs. With the free tier, some features are limited such as monitoring, alerting and support
  • dominodatalab.com— Data science with support for Python, R, Spark, Hadoop, Matlab, and others
  • havenondemand.com— APIs for machine learning
  • restlet.com— APISpark enables any API, application or data owner to become an API provider in minutes via an intuitive browser interface
  • scrapinghub.com— Data scraping with visual interface and plugins. Free plan includes unlimited scraping on a shared server
  • context.io– Create simple email webhooks and code against a free, RESTful, imap API to leverage email data.

Other Free Resources



Yasoob Khalid: This Month I Inspired 40 Teens to Start Programming


Hi there folks! I have been wanting to write a post on this blog for quite some time now but life always gets in the way. This time it was my exams. Hopefully I will get free after the 4th of June and would get more time to write posts and do stuff which I love and care about.

So enough with the rant. I wanted to write about my latest endeavor. This month, with the help of two people (whom I had never met before), I got the chance to inspire 40 teens to take their first step into programming. The whole event was planned and organized by one badass lady, Elena Sinel, who managed almost everything by herself.

I was asked by Elena to give a motivational + tutorial session to the students of Tech City College, London. I was pretty stoked by this opportunity because I love inspiring people and helping them as much as I can. This has been the theme of my life for quite a bit of time now.

Apart from a couple of hiccups in the live video stream, the session went fairly well. It was coordinated by Charlie Ringer on the other end. He did an awesome job and helped make sure things went smoothly, and he provided hands-on support to the kids whenever they needed it.

The best moment for me was when Elena told me that the kids were pretty inspired by my story of how I got started with programming and what I have achieved through it. Elena has written a short summary of the event over here. Do read it if you want to get some more detailed information about the whole session.

If you are an educator and want me to deliver a session for you at your institute and inspire your students just let me know. I am sure you won’t regret it.

If you have any questions then please let me know in the comments below. I would love to answer as many of them as possible.

Cheers!


Yasoob Khalid: Python Sorted Collections


Hey folks! This is a guest post by Grant Jenks. Let’s give him a warm welcome and get right on into what he has to say. 🙂


Hello all! I’m Grant Jenks and I’m guest-posting about one of my favorite topics: Python Sorted Collections.

Python is a little unusual regarding sorted collection types compared with other programming languages. Three of the top five programming languages in the TIOBE Index include sorted list, sorted dict or sorted set data types. But neither Python nor C includes these. For a language heralded as “batteries included”, that’s a little strange.

The reasoning is a bit circular but boils down to: the standard library covers most use cases; for everything else there’s PyPI, the Python Package Index. But PyPI only works so well. In fact, some peculiarities of the Python community make PyPI’s job quite difficult. For example, Python likes Monty Python references, which many find unusual or obscure. And as Phil Karlton would point out, naming things is hard.

collections.OrderedDict

As an aside, it’s worth noting collections.OrderedDict in the Python standard library. OrderedDict maintains the order that items were added to the dictionary. Sometimes that order is sorted:

>>> from collections import OrderedDict
>>> letters = [('a', 0), ('b', 1), ('c', 2), ('d', 3)]
>>> values = OrderedDict(letters)
>>> print(values)
OrderedDict([('a', 0), ('b', 1), ('c', 2), ('d', 3)])
>>> print(list(values.keys()))
['a', 'b', 'c', 'd']

We can continue editing this OrderedDict. Depending on the key we add, the order may remain sorted.

>>> values['e'] = 4
>>> print(list(values.keys()))
['a', 'b', 'c', 'd', 'e']

But sort order won’t always be maintained. If we remove an existing key and add it back, then we’ll see it appended to the end of the keys.

>>> del values['a']
>>> values['a'] = 0
>>> print(list(values.keys()))
['b', 'c', 'd', 'e', 'a']

Oops! Notice that ‘a’ is now at the end of the list of keys. That’s the difference between ordered and sorted: while OrderedDict maintains order based on insertion, a SortedDict would maintain order based on the sorted order of the keys.
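To see the gap concretely, here is a minimal stdlib-only sketch of the usual workaround: re-sorting the keys of a plain dict on every read. (This is my illustration, not code from SortedContainers; the point is that the sort cost is paid on each access.)

```python
# A plain dict (or OrderedDict) has no notion of sorted order, so the
# common workaround is to re-sort the keys on every read -- an
# O(n log n) cost paid per access.
values = {'a': 0, 'b': 1, 'c': 2, 'd': 3}

del values['a']
values['a'] = 0          # re-inserting 'a' sends it to the back of insertion order

print(sorted(values))    # re-sort on each access: ['a', 'b', 'c', 'd']
```

A sorted container maintains the sorted invariant incrementally on each mutation instead, so reads stay cheap no matter how the mapping was built up.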

SortedContainers

A few years ago I set out to select a sorted collections library from PyPI. I was initially overwhelmed by the options. There are many data types in computer science theory that can be used, and each has various tradeoffs. For example, Red-Black Trees are used in the Linux kernel, while Tries are often more space efficient and used in embedded systems. B-Trees work very well with a huge number of items and are commonly used in databases.

What I really wanted was a pure-Python solution that was fast enough. Finding a solution at the intersection of those requirements was really tough. Most fast implementations were written in C, and many lacked benchmarks or documentation.

I couldn’t find the right answer so I built it: Sorted Containers. The right answer is pure-Python. It’s Python 2 and Python 3 compatible. It’s fast. It’s fully-featured. And it’s extensively tested with 100% coverage and hours of stress testing. SortedContainers includes SortedList, SortedDict, and SortedSet implementations with a familiar API.

>>> from sortedcontainers import SortedList, SortedDict, SortedSet
>>> values = SortedList('zaxycb')
>>> values[0]
'a'
>>> values[-1]
'z'
>>> list(values)  # Sorted order is automatic.
['a', 'b', 'c', 'x', 'y', 'z']
>>> values.add('d')
>>> values[3]
'd'
>>> del values[0]
>>> list(values)  # Sorted order is maintained.
['b', 'c', 'd', 'x', 'y', 'z']

Each of the SortedList, SortedDict, and SortedSet data types looks, swims, and quacks like its built-in counterpart.

>>> items = SortedDict(zip('dabce', range(5)))
>>> list(items.keys())  # Keys iterated in sorted order.
['a', 'b', 'c', 'd', 'e']
>>> items['b']
2
>>> del items['c']
>>> list(items.keys())  # Sorted order is automatic.
['a', 'b', 'd', 'e']
>>> items['c'] = 10
>>> list(items.keys())  # Sorted order is maintained.
['a', 'b', 'c', 'd', 'e']

Each sorted data type also plays nicely with other data types.

>>> keys = SortedSet('dcabef')
>>> list(keys)
['a', 'b', 'c', 'd', 'e', 'f']
>>> 'c' in keys
True
>>> list(keys | 'efgh')
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> list(keys & 'cde')
['c', 'd', 'e']
>>> list(keys & 'yzab')
['a', 'b']

Bonus Features

In addition to the familiar API of the built-ins, maintaining sorted order affords efficient opportunities for searching and indexing.

  • You can very quickly and efficiently lookup the presence or index of a value. What would previously require a linear scan is now done in logarithmic time.
>>> import string
>>> values = SortedList(string.ascii_lowercase)
>>> 'q' in values
True
>>> values.index('r')
17
  • You can slice containers by index or by value. Even mappings and sets support numeric indexing and iteration.
>>> items = SortedDict(zip(string.ascii_lowercase, range(26)))
>>> list(items.irange('g', 'j'))
['g', 'h', 'i', 'j']
>>> items.index('g')
6
>>> items.index('j')
9
>>> list(items.islice(6, 10))
['g', 'h', 'i', 'j']
>>> items.iloc[0]
'a'
>>> items.iloc[5]
'f'
>>> items.iloc[:5]
['a', 'b', 'c', 'd', 'e']
>>> items.iloc[-3:]
['x', 'y', 'z']

Using these features, you can easily duplicate the advanced features found in Pandas DataFrame indexes, SQLite column indexes, and Redis sorted sets.
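For example, a SortedDict can stand in for a simple column index, answering range queries without scanning the whole mapping (the data here is made up for illustration):

```python
from sortedcontainers import SortedDict

# Hypothetical "index": timestamps (as ints) mapped to row ids.
index = SortedDict((ts, 'row-%d' % i)
                   for i, ts in enumerate([5, 1, 9, 3, 7]))

# All rows whose key falls in [3, 7], found in logarithmic time:
print([index[ts] for ts in index.irange(3, 7)])  # ['row-3', 'row-0', 'row-4']
```

This is exactly the kind of range scan a database B-tree index performs.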

Performance

On top of it all, performance is very good across the API, faster than C implementations for many methods. There are extensive benchmarks comparing alternative implementations, load-factors, runtimes, and simulated workloads. SortedContainers has managed to unseat the decade-old incumbent “blist” module and convinced authors of alternatives to recommend SortedContainers over their own packages.

Implementation

How does it work? I’m glad you asked! In addition to the implementation details, I’ll be giving a talk at PyCon 2016 in Portland, Oregon on Python Sorted Collections that will get into the gory details. We’ll see why benchmarks matter most in claims about performance and why the strengths and weakness of modern processors affect how you choose your data structures. It’s possible to write fast code in pure-Python!

Your feedback on the project is welcome!


PyCharm: Remote Development on Raspberry Pi: Analyzing Ping Times (Part 2)


Last week we created a script that records ping times on a regular basis. We developed the script remotely on a Raspberry Pi, and then added it to Cron to make sure that times are recorded every 5 minutes into a PostgreSQL database.

This week we’ll work on visualizing the data we’ve recorded. For this we’ll create a basic Flask app where we use Matplotlib to create a graph. Furthermore, we’ll take a look at some cool PostgreSQL features.

Let’s see some results

It’s no good to just record pings if we can’t see some statistics about them, so let’s write a small Flask app, and use matplotlib to draw a graph of recent ping times. In our Flask app we’ll create two routes:

  • On ‘/’ we’ll list the destinations that we’ve pinged in the last hour with basic stats (min, average, max time in the last hour)
  • On ‘/graphs/<destination>’ we’ll draw a graph of the pings in the last 3 hours
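As a rough sketch of those two routes, with canned data and a placeholder response standing in for the real PostgreSQL queries and Matplotlib PNG (the actual code is on GitHub), the app might look like this:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Canned stats standing in for the real query results:
# (destination, min, avg, max) over the last hour.
HOUR_STATS = [('jetbrains.com', 11.2, 14.8, 30.1)]

@app.route('/')
def index():
    return render_template_string(
        '{% for dest, mn, avg, mx in stats %}'
        '{{ dest }}: {{ mn }}/{{ avg }}/{{ mx }}'
        '{% endfor %}',
        stats=HOUR_STATS)

@app.route('/graphs/<destination>')
def graph(destination):
    # The real route returns a PNG drawn with Matplotlib.
    return 'graph for %s' % destination

print(app.test_client().get('/graphs/jetbrains.com').data.decode())
```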

The first route is simple: we just execute a query to get the data we’re interested in, and pass that to the template. See the full code on GitHub. Let’s make sure that everything works right by putting a breakpoint on the call to render_template:

Debug Flask Thumb

The graph route is a lot more complex: first we have to get the ping averages for the past three hours in reasonably sized bins (say, 10 minutes), and then we have to draw the graph.

To obtain those binned ping times, we could either get all the times from the past three hours and then use a scientific Python library to handle the binning, or we could write a monster SQL query that does everything for us. As I’ve recently read a book about PostgreSQL and got excited about it, I chose the second option.

Querying the Data

So the data we’re looking for is:

  • For each 10 minute period in the last 3 hours
  • Get the minimum, average, and maximum ping time to a specified destination

The first part makes this a fairly complex query. Even though PostgreSQL has support for intervals, date ranges, and a way to generate a series of dates, there is no way to generate a series of ranges (that I know of). One solution to this problem is a common table expression (CTE): a way to execute a subquery that you can later refer to as if it were a real table.

Getting a series of timestamps over the last three hours in 10-minute intervals is easy:

select begin_time from generate_series(now() - interval '3 hours', now(), interval '10 minutes') begin_time;

The generate_series function takes three arguments: begin, end, and step. The function works with numbers and with timestamps, so that makes it easy. If we wanted pings at exactly these times, we’d be done now. However, we need times between the two timestamps. So we can use another bit of SQL magic: window functions. Window functions allow us to do things with rows before or after the row that we’re currently on. So let’s add end_time to our query:

select
 begin_time,
 LEAD(begin_time) OVER (ORDER BY begin_time ASC) as end_time
from generate_series(now() - interval '3 hours', now(), interval '10 minutes') begin_time;

LEAD takes the value of the next row in the results, as ordered in the way specified in the over clause. You can use LAG to get the previous row in a similar way. So now we can wrap this query with WITH intervals as ( … query goes here … ) to make it a CTE. Then we can join our pings table and get the results we’re looking for:

WITH intervals AS (
   SELECT
     begin_time,
     LEAD(begin_time)
     OVER (
       ORDER BY begin_time ) AS end_time
   FROM
         generate_series(
             now() - INTERVAL '3 hours',
             now(),
             INTERVAL '10 minutes'
         ) begin_time
)
SELECT
 i.begin_time AT TIME ZONE 'Europe/Berlin' AS begin_time,
 i.end_time AT TIME ZONE 'Europe/Berlin' AS end_time,
 p.destination,
 count(p.pingtime),
 round(avg(p.pingtime),2) AS avg,
 max(p.pingtime),
 min(p.pingtime)
FROM intervals i LEFT JOIN pings p
ON p.recorded_at >= i.begin_time AND
 p.recorded_at < i.end_time
WHERE
 i.end_time IS NOT NULL
 AND destination = %s
GROUP BY i.begin_time, i.end_time, p.destination
ORDER BY i.begin_time ASC;

Now you might think “That’s nice, but won’t it be incredibly slow?”, so let’s try it out! If you don’t see the ‘execute’ option when you right click an SQL query, you may need to click ‘Attach console’ first to let PyCharm know on which database you’d like to execute your query:

Execute Query Thumb

At the time of writing, my pings table has about 12,500 rows, and this query takes about 200-300 ms. Although we could say that this is acceptable for our use case, let’s see whether we can speed it up. To find out, let’s examine the query plan:

Explain Analyze No Index

EXPLAIN ANALYZE shows us both how PostgreSQL decided to retrieve our results and how long it took. We can see that the query took 471 ms. This is painfully slow, and the query plan shows why: there’s a nested loop, and then a sequential scan. This means that for each of the 18 time buckets (6 buckets per hour, 3 hours), we do a full table scan. Right now the table fits in memory, so we first load the table into memory and then scan it 18 times (you can see loops=18 on the materialize node). Imagine how slow this will be after the Pi has collected a year’s worth of pings.

We can improve on this, though. We’re querying our pings by the recorded_at column, using the ‘>=’ and ‘<’ operators. A standard B-tree index supports these operations on a timestamptz column. So let’s add an index:

CREATE INDEX pings_recorded_at ON pings(recorded_at);

Now let’s look at the output of EXPLAIN ANALYZE again:

Explain Analyze With Index

5.7ms: much better.

Graphing the Data

After getting the data, matplotlib is used to generate a line graph with lines for the minimum, average, and maximum ping time per bin. Matplotlib makes it easy to plot time-based data using the plot_date function.

When the plot is ready, it’s ‘saved’ as a PNG to a StringIO object, which is then used to create an HTTP response. Setting the content_type header to image/png tells the browser to render the response as an image.
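As an aside, the code below is Python 2; in Python 3 the same save-to-buffer pattern uses io.BytesIO, since PNG data is binary. A minimal standalone sketch:

```python
import io
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure()
ax = fig.add_subplot(111)
ax.plot([1, 2, 3], [10, 20, 15])

FigureCanvasAgg(fig)          # attach the Agg (PNG) canvas to the figure
buf = io.BytesIO()            # binary buffer instead of StringIO
fig.savefig(buf, format='png')

png_bytes = buf.getvalue()    # ready to hand to make_response()
print(png_bytes[:4])          # PNG files start with b'\x89PNG'
```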

So let’s take a look at the final result:

Ping page

If you want to see the full code, check out analyze.py on GitHub.

Querying the Data with Pandas

If the query above is a little much for you, you can achieve the same results with a couple of lines of code using the Pandas library. If we’d like to use Pandas, we can use a simple query to obtain the last three hours of pings, and then use the resample method to place the times in 10-minute buckets.

Important: To load PostgreSQL data into a Pandas dataframe, we need to have SQLAlchemy installed as well. Pandas needs SQLAlchemy for all database engines except SQLite.

Then we can do the same as with the SQL query, and use Matplotlib to plot it:

@app.route('/pandas/<destination>')
def pandas(destination):
    engine = create_engine('postgres:///pi')

    with engine.connect() as conn, conn.begin():

        # Use the route's destination parameter instead of a hard-coded host.
        data = pd.read_sql_query(
            "select recorded_at, pingtime from pings "
            "where recorded_at > now() - interval '3 hours' "
            "and destination = %(dest)s;",
            conn,
            params={'dest': destination})

    engine.dispose()

    df = data.set_index(pd.DatetimeIndex(data['recorded_at']))

    # We have this information in the index now, so let's drop it
    del df['recorded_at']

    result = df.resample('10T').agg(['min', 'mean', 'max'])

    fig = Figure()
    ax = fig.add_subplot(111)

    ax.plot_date(
        result.index,
        result['pingtime', 'max'],
        label='max',
        linestyle='solid'
    )

    ax.plot_date(
        result.index,
        result['pingtime', 'mean'],
        label='avg',
        linestyle='solid'
    )

    ax.plot_date(
        result.index,
        result['pingtime', 'min'],
        label='min',
        linestyle='solid'
    )

    ax.xaxis.set_major_formatter(DateFormatter('%H:%M'))

    ax.set_xlabel('Time')
    ax.set_ylabel('Round Trip (ms)')
    ax.set_ylim(bottom=0)

    ax.legend()

    # Output the plot as a PNG
    png_output = StringIO.StringIO()
    fig.set_canvas(FigureCanvasAgg(fig))
    fig.savefig(png_output, transparent=True)

    response = make_response(png_output.getvalue())
    response.headers['content-type'] = 'image/png'
    return response


Now let’s have a look at what’s faster, the large SQL query, or a simple SQL query and Pandas. Pandas uses Numpy for math, which is largely written in native code for high performance. We add the code to get the appropriate data in both ways to benchmark.py, creating two functions: get_with_sql() and get_with_pandas(). We can use the Python standard library’s timeit function to run the methods 1000 times, and then get the total time it took to execute the function.
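The benchmarking harness itself can be as simple as this; the bodies here are trivial stand-ins, since the real get_with_sql() and get_with_pandas() in benchmark.py need the database:

```python
import timeit

# Stand-ins for the two data-access functions from benchmark.py;
# the real ones query PostgreSQL, these just simulate some work.
def get_with_sql():
    return sum(range(100))

def get_with_pandas():
    return sum(range(300))

# Run each function 1000 times and report the total wall-clock time.
sql_total = timeit.timeit(get_with_sql, number=1000)
pandas_total = timeit.timeit(get_with_pandas, number=1000)
print('sql: %.4fs  pandas: %.4fs' % (sql_total, pandas_total))
```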

Let’s open the Python console and take a look:

Timeit Results

Using Pandas it takes about 86 ms to obtain the data, while with the large SQL statement it takes under 24 ms. In other words: Pandas takes 262% longer. We’ve found that with a larger dataset this gap widens further.

In this specific case, it takes about 800ms to generate the graph, with the vast majority of that time taken by Matplotlib. So if we were really looking to improve performance, we’d hand off charting to a JavaScript library and just provide the data as a JSON object from the server.

Final Words

While working on this blog post, there have been a couple of times that I forgot that I was working on a remote computer. After setting up the remote interpreter, PyCharm handles everything in the background.

As you can see, PyCharm makes developing code for a remote server very easy. Let us know in the comments what projects you’re interested in running on remote servers! We’d also appreciate your feedback about SQL, let us know if you’d like to see more SQL content (or less of course) in further blog posts!

Reuven Lerner: Announcing: Three live Python courses


If you’re like many of the Python developers I know, the basics are easy for you: Strings, lists, tuples, dictionaries, functions, and even objects roll off of your fingers and onto your keyboard. Your day-to-day tasks have become significantly easier as a result of Python, and you’re comfortable using it for tasks at work and home.

But some parts of Python remain difficult, mysterious, and outside of your comfort zone:

  • When you want to use a list comprehension, you have to go to Stack Overflow to remember how they work — to say nothing of set and dict comprehensions.
  • You know that there is a difference between functions and methods, but you can’t quite put your finger on what that difference is, or how Python rewrites “self” to be the first argument to every method.
  • You keep hearing about “decorators,” and how they allow you to do all sorts of magical things to functions and classes — but every time you start reading about them, you get confused or distracted.
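To give a taste of that last bullet, a minimal decorator (an illustration, not the course material) looks like this:

```python
import functools

def log_calls(func):
    """A minimal decorator: wrap func and report each call."""
    @functools.wraps(func)  # preserve the wrapped function's name and docs
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r) -> %r' % (func.__name__, args, result))
        return result
    return wrapper

@log_calls
def add(a, b):
    return a + b

total = add(2, 3)  # prints: add((2, 3)) -> 5
```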

Sound familiar? If so, then I want to help.

As you probably know, I spend just about every day at one of the world’s best companies — Apple, Cisco, IBM, PayPal, VMWare, and Western Digital, among others — teaching their engineers how to use Python.

The engineers who learn these techniques benefit by having more “tools in their toolbox,” as I like to put it; when a problem presents itself, they have more options at their disposal. I help them to solve new types of problems, or to solve existing problems more quickly. These engineers become more valuable to their employers, and more valuable on the larger job market.

I’m announcing three courses that you can take, from the comfort of your home or office, using the content I’ve presented to these companies:

  • Tuesday, July 25: Functional programming in Python
    • comprehensions
    • custom sorting
    • passing functions as arguments
    • lambda expressions
    • map, filter, and reduce
  • Wednesday, August 2: Advanced Python objects
    • attributes
    • methods vs. functions
    • class attributes
    • inheritance
    • descriptors
    • dunder methods
  • Thursday, August 3: Python decorators
    • properties and other built-in decorators
    • writing decorators
    • decorating functions, objects, and methods

Each of these classes will run live, for five hours (with two 15-minute breaks):

  • New York: 7 a.m. – 2 p.m.
  • London: 12 noon – 5 p.m.
  • Israel: 2 p.m. – 7 p.m.
  • Mumbai: 5:30 p.m. – 10:30 p.m.

Each will be packed with lectures, accompanied by tons of live-coding examples, many exercises that you’ll be expected to solve (and which we’ll review together when you’re done), and plenty of time for interactions and questions.  Indeed, please come with lots of questions, to make the class more interesting and relevant.

Each course costs $350, and will give you:

  • Access to the live audio/video/chat feed,
  • PDFs of my slides,
  • the Jupyter notebook I use during my live-coding demos,
  • and solutions to all of the exercises

I’m offering discounts to people who buy more than one course:

  • Buy two courses, and save $100, for a total of $600.  Just use the “2sessions” coupon code when purchasing each one.
  • Buy all three courses, and save $250, for a total of $800.  Just use the “3sessions” coupon code when purchasing each one.

As always, I’m also offering a discount to students; e-mail me, and I’ll send you the appropriate discount code.

Convinced?  I hope so!  View the full course descriptions here, and then register for them:

But wait!  If you register before Monday, July 18th, then you can save 15% more, by purchasing an early-bird ticket.

I’m very excited to be offering these courses.  They won’t be my last ones — but I’ll next be teaching other topics, so if these subjects interest you, you should definitely attend.

I hope that you can join me for these live, online courses.

The post Announcing: Three live Python courses appeared first on Lerner Consulting Blog.

Data School: How to launch your data science career (with Python)


Welcome, Data School students! If you're interested in the exciting world of data science, but don't know where to start, Data School is here to help.


Step 0: Figure out what you need to learn

Data science can be an overwhelming field. Many people will tell you that you can't become a data scientist until you master the following: statistics, linear algebra, calculus, programming, databases, distributed computing, machine learning, visualization, experimental design, clustering, deep learning, natural language processing, and more. That's simply not true.

So, what exactly is data science? It's the process of asking interesting questions, and then answering those questions using data. Generally speaking, the data science workflow looks like this:

  • Ask a question
  • Gather data that might help you to answer that question
  • Clean the data
  • Explore, analyze, and visualize the data
  • Build and evaluate a machine learning model
  • Communicate results

This workflow doesn't necessarily require advanced mathematics, a mastery of deep learning, or many of the other skills listed above. But it does require knowledge of a programming language and the ability to work with data in that language. And although you need mathematical fluency to become really good at data science, you only need a basic understanding of mathematics to get started.

It's true that the other specialized skills listed above may one day help you to solve data science problems. However, you don't need to master all of those skills to begin your career in data science. You can begin today, and I'm here to help you!


Step 1: Get comfortable with Python

Python and R are both great choices as programming languages for data science. R tends to be more popular in academia, and Python tends to be more popular in industry, but both languages have a wealth of packages that support the data science workflow. I've taught data science in both languages, and generally prefer Python. (Here's why.)

You don't need to learn both Python and R to get started. Instead, you should focus on learning one language and its ecosystem of data science packages. If you've chosen Python (my recommendation), you may want to consider installing the Anaconda distribution because it simplifies the process of package installation and management on Windows, OSX, and Linux.

You also don't need to become a Python expert to move on to step 2. Instead, you should focus on mastering the following: data types, data structures, imports, functions, conditional statements, comparisons, loops, and comprehensions. Everything else can wait until later!

If you're not sure whether you know "enough" Python, scan through my Python Quick Reference. If most of that material is familiar to you, you can move on to step 2!

If you're looking for a course to help you learn Python, here are a few recommendations:


Step 2: Learn data analysis, manipulation, and visualization with pandas

For working with data in Python, you should learn how to use the pandas library.

pandas provides a high-performance data structure (called a "DataFrame") that is suitable for tabular data with columns of different types, similar to an Excel spreadsheet or SQL table. It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, visualizing data, and so much more. In short, learning pandas will significantly increase your efficiency when working with data.
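A few lines give a taste of that workflow, using made-up data to show handling missing values and filtering rows:

```python
import pandas as pd

# A tiny DataFrame: columns of different types, like a spreadsheet.
df = pd.DataFrame({
    'city': ['Austin', 'Boston', 'Chicago'],
    'temp_f': [95, 71, None],   # one missing value to handle
})

# Fill the missing temperature with the column mean, then filter.
df['temp_f'] = df['temp_f'].fillna(df['temp_f'].mean())
hot = df[df['temp_f'] > 80]
print(hot['city'].tolist())  # ['Austin', 'Chicago']
```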

However, pandas includes an overwhelming amount of functionality, and (arguably) provides too many ways to accomplish the same task. Those characteristics can make it challenging to learn pandas and to discover best practices.

That's why I created a pandas video series (30 videos, 6 hours) that teaches the pandas library from the ground up. Each video answers a question using a real dataset, and the datasets are posted online so you can follow along at home. (I also created a well-commented Jupyter notebook that includes the code from every video.)

"Your videos are extremely helpful. I like that you use actual data sets and try a lot of different applications of the concept being discussed rather than just overly simplistic examples. Your content has helped me immensely!" - Sean Montague

If you would prefer a non-video resource for learning pandas, here are my recommended resources.


Step 3: Learn machine learning with scikit-learn

For machine learning in Python, you should learn how to use the scikit-learn library.

Building "machine learning models" to predict the future or automatically extract insights from data is the sexy part of data science. scikit-learn is the most popular library for machine learning in Python, and for good reason:

  • It provides a clean and consistent interface to tons of different models.
  • It offers many tuning parameters for each model, but also chooses sensible defaults.
  • Its documentation is exceptional, and it helps you to understand the models as well as how to use them properly.
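That consistent interface means most models boil down to the same few calls; here is a minimal sketch on scikit-learn's built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # features and labels

model = KNeighborsClassifier(n_neighbors=5)  # pick a model, set parameters
model.fit(X, y)                              # learn from the data
pred = model.predict(X[:3])                  # predict for new observations
print(pred)
```

Swapping in a different model changes only the import and constructor line; fit and predict stay the same.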

However, machine learning is still a highly complex and rapidly evolving field, and scikit-learn has a steep learning curve. That's why I created a scikit-learn video series (9 videos, 4 hours), which will help you to gain a thorough grasp of both machine learning fundamentals and the scikit-learn workflow. The series doesn't presume any familiarity with machine learning or advanced mathematics. (You can find all of the code from the series on GitHub.)

"Your videos are absolutely incredible. I have just completed the course on Machine Learning with Python and I can say I understood every single thing thanks to your excellent teaching style and skills." - Guillaume B

If you would prefer a non-video resource for learning scikit-learn, I recommend either Python Machine Learning (Amazon / GitHub) or Introduction to Machine Learning with Python (Amazon / GitHub).


Step 4: Understand machine learning in more depth

Machine learning is a complex field. Although scikit-learn provides the tools you need to do effective machine learning, it doesn't directly answer many important questions:

  • How do I know which machine learning model will work "best" with my dataset?
  • How do I interpret the results of my model?
  • How do I evaluate whether my model will generalize to future data?
  • How do I select which features should be included in my model?
  • And so on...

If you want to become great at machine learning, you need to be able to answer those questions, which requires both experience and further study. Here are some resources to help you along that path:


Step 5: Keep learning and practicing

Here is my best advice for improving your data science skills: Find "the thing" that motivates you to practice what you learned and to learn more, and then do that thing. That could be personal data science projects, Kaggle competitions, online courses, reading books, reading blogs, attending meetups or conferences, or something else!

  • Kaggle competitions are a great way to practice data science without coming up with the problem yourself. Don't worry about how high you place, just focus on learning something new with every competition. (Keep in mind that you won't be practicing important parts of the data science workflow: asking questions, gathering data, and communicating results.)
  • If you create your own data science projects, you should share them on GitHub and include writeups. That will help to show others that you know how to do reproducible data science. (If you don't know how to use Git and GitHub, I have a short video series that will help you to master the basics.)
  • There are an overwhelming number of data science blogs, but DataTau will help you to find the latest and greatest content.
  • If you like email newsletters, my favorites are Data Elixir, Data Science Weekly, and Python Weekly.
  • If you want to truly experience the Python community, I highly recommend attending PyCon US. (There are also smaller PyCon conferences elsewhere.) As a data scientist, you should also consider attending SciPy and the nearest PyData conference.

Your data science journey has only begun! There is so much to learn in the field of data science that it would take more than a lifetime to master. Just remember: You don't have to master it all to launch your data science career, you just have to get started!


Join Data School (for free!)

My name is Kevin Markham, and I'm the founder of Data School. I'd be honored if you would join the Data School community by subscribing to the email newsletter:

  1. Fill out your name and email address in the left sidebar, and click "Join the Newsletter."
  2. Find the confirmation email from Data School in your inbox, and click the link to confirm your email address.

As a subscriber, you'll receive priority access to my online courses and live webcasts, and you'll get notified about new Data School tutorials and videos.

Have a question? Please let me know in the comments section below!

Want to follow Data School?

Thank you so much for reading!

Enthought: Webinar: A Tour of Enthought’s Latest Enterprise Python Solutions


When: Thursday, July 20, 2017, 11-11:45 AM CT (Live webcast)

What: A comprehensive overview and live demonstration of Enthought’s latest tools for Python for the enterprise with Enthought’s Chief Technical & Engineering Officer, Didrik Pinte

Who Should Attend: Python users (or those supporting Python users) who are looking for a universal solution set that is reliable and “just works”; scientists, engineers, and data science teams trying to answer the question “how can I more easily build and deploy my applications”; organizations looking for an alternative to MATLAB that is cost-effective, robust, and powerful

REGISTER  (if you can’t attend we’ll send all registrants a recording)


For over 15 years, Enthought has been empowering scientists, engineers, analysts, and data scientists to create amazing new technologies, to make new discoveries, and to do so faster and more effectively than they dreamed possible. Along the way, hand in hand with our customers in aerospace, biotechnology, finance, oil and gas, manufacturing, national laboratories, and more, we’ve continued to “build the science tools we wished we had,” and share them with the world.

For 2017, we’re pleased to announce the release of several major new products and tools, specifically designed to make Python more powerful and accessible for users like you who are building the future of science, engineering, artificial intelligence, and data analysis.

WHAT YOU’LL SEE IN THE WEBINAR

In this webinar, Enthought’s Chief Technical & Engineering Officer will share a comprehensive overview and live demonstration of Enthought’s latest products and how they provide the foundation for scientific computing and artificial intelligence applications with Python, including:

We’ll also walk through specific use cases so you can quickly see how Enthought’s Enterprise Python tools can impact your workflows and productivity.

REGISTER  (if you can’t attend we’ll send all registrants a recording)


Presenter: Didrik Pinte, Chief Technical & Engineering Officer, Enthought


Related Blogs:

Blog: Enthought Announces Canopy 2.1: A Major Milestone Release for the Python Analysis Environment and Package Distribution (June 2017)

Blog: Enthought Presents the Canopy Platform at the 2017 American Institute of Chemical Engineers (AIChE) Spring Meeting (April 2017)

Blog: New Year, New Enthought Products (Jan 2017)

Product pages:

The post Webinar: A Tour of Enthought’s Latest Enterprise Python Solutions appeared first on Enthought Blog.

PyCharm: PyCharm 2017.1.5 Out Now


Kushal Das: Article on Hacker Ethic and Free Software movement


As I have mentioned in the dgplug summer training page, focusing on the Free Software movement is a big part of this year’s training program. A few weeks back there was a tweet from @gnome about travel ban, and many could not figure out why Gnome was writing about this topic. Amongst the many proper replies, Miguel de Icaza’s reply was to the point. This incident made Anwesha and me stop and think; and then made us rethink, about how we wanted to conduct the sessions on the Free Software movement and Software Licensing.

I was born in the beginning of the 80s and Anwesha even later. Our introduction to the movement was from the stories we heard (from many people); from Levy’s famous book, Hackers: Heroes of the Computer Revolution and the seminal Free as in Freedom.

My introduction to the FSF came through ilug-calcutta, and from Sayamindu. Later, at foss.in 2005, I made another friend (for life), Praveen A (he is from the same batch). And even later, throughout various conferences, I was introduced to other members of FSF India. In 2007, I was part of the 4th GPLv3 meet organizing team in Bangalore. That was my introduction to RMS, and his personality (I will write a blog post later about various incidents from that conference). That had a big impact on me.

Coming back to the story of the tweet, we also saw similar ignorance from newcomers, as they never got a chance to learn about the past, nor did get to meet the various people involved (distance and time). So, Anwesha and I, tried to write a brief history, including the hacker ethic, and beginning of the Free Software movement. A lot of stories mentioned in the article are from the books mentioned above. At the very end, I have written about how the different software we use everyday came about initially. I took the help of various FSF bulletins for the same.

This Monday I took a session on the same topic in the #dgplug IRC channel. When I made mention of the GNU C Library and the time Roland McGrath started it, Siddhesh called attention to an announcement he (Roland) made a few days ago (about stepping down from maintainership of that same GNU library). I also pointed out that Siddhesh is now one of the maintainers of glibc. That gave the students a sense of impact and immediacy; a feeling of involvement and ownership.

Today evening from 13:30 UTC, Anwesha took a session on Software Licenses 101 in the #dgplug channel on Freenode. There will be more follow up sessions in the coming days.

Link to the article once again

Damián Avila: We are above 1000 stars!


Github has a way to measure projects popularity through stars.

And those stars are given by the users themselves.

And we are just above a remarkable line...

Read more… (1 min remaining to read)

NumFOCUS: Meet our GSoC Students Part 2: The Julia Cohort

Talk Python to Me: #120 Python in Finance

This week we'll enter the world of stock markets, trades, hedge funds and more. You'll meet Yves Hilpisch, who runs The Python Quants, where Python, open-source, education, and finance intersect.

Links from the show:

  • Yves on Twitter: @dyjh (twitter.com/dyjh)
  • Personal site: hilpisch.com
  • The Python Quants Group: tpq.io
  • Yves on YouTube: youtube.com (search for "yves hilpisch")
  • Quant platform: pqp.io
  • DX Analytics: dx-analytics.com
  • For Python Quants Bootcamp: fpq.io
  • Python for Quant Finance Meetup: pqf.tpq.io
  • Books: books.tpq.io

Continuum Analytics News: Continuum Analytics Named a 2017 Gartner Cool Vendor in Data Science and Machine Learning

Thursday, July 13, 2017

Data Science and AI platform, Anaconda, empowers leading businesses worldwide with solutions to transform data into intelligence 

AUSTIN, Texas (July 13, 2017): Continuum Analytics, the creator and driving force behind Anaconda, the leading data science and AI platform powered by Python, today announced it has been included in the “Cool Vendors in Data Science and Machine Learning, 2017” report by Gartner, Inc.

“We believe the addition of machine learning to Gartner’s Hype Cycle for Emerging Technologies in 2016 highlights the growing importance of data science across the enterprise,” said Scott Collison, chief executive officer of Continuum Analytics. “Data science has shifted from ‘emerging’ to ‘established’ and we’re seeing this evolution first-hand as Anaconda’s active user base of four million continues to grow. We are enabling future innovations; solving some of the world’s biggest challenges and uncovering answers to questions that haven’t even been asked yet.” 

Continuum Analytics recently released its newest version, Anaconda 4.4, featuring a comprehensive platform for Python-centric data science with a single-click installer for Windows, Mac, Linux and Power8. Anaconda 4.4 is also designed to make it easy to work with both Python 2 and Python 3 code. 

Gartner is the world's leading information technology research and advisory company. You can find the full report on Gartner’s site: https://www.gartner.com/document/3706738

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

About Anaconda Powered by Continuum Analytics

Anaconda is the leading Open Data Science platform powered by Python, the fastest growing data science language with more than 13 million downloads and 4 million unique users to date. Continuum Analytics is the creator and driving force behind Anaconda, empowering leading businesses across industries worldwide with solutions to identify patterns in data, uncover key insights and transform data into a goldmine of intelligence to solve the world’s most challenging problems. Learn more at continuum.io.
