Channel: Planet Python

Yasoob Khalid: Making a Reddit + Facebook Messenger Bot


Hi guys! I haven’t been programming a lot lately because of exams. However, over the past weekend I managed to get hold of my laptop and crank out something useful: a Facebook Messenger bot which serves you fresh memes, motivational posts, jokes and shower thoughts. It was the first time I had delved into bot creation. In this post I will teach you most of the stuff you need to know in order to get your bot off the ground.

First of all some screenshots of the final product:

Tech Stack

We will be making use of the following:

  • Flask for coding up the backend, as it is lightweight and lets us focus on the logic instead of the folder structure.
  • Heroku for hosting our code online for free.
  • Reddit as a data source, because it gets new posts every minute.

1. Getting things ready

Creating a Reddit app

We will be using Facebook, Heroku and Reddit. Firstly, make sure that you have an account on all three of these services. Next you need to create a Reddit application on this link.

In the above image you can already see the “motivation” app which I have created. Click on “create another app…” and follow the on-screen instructions.

The about and redirect URLs will not be used, so it is OK to leave them blank. For production apps it is better to put in something related to your project, so that if you start making a lot of requests and Reddit notices, they can check the about page of your app and act in a more informed manner.

So now that your app is created you need to save the ‘client_id’ and ‘client_secret’ in a safe place.
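One way to keep these credentials out of your source code (and out of Git) is to read them from environment variables. A minimal sketch; the variable names below are my own choice, not from this post:

```python
import os

# Hypothetical variable names -- set them in your shell, or later on Heroku
# with `heroku config:set REDDIT_CLIENT_ID=... REDDIT_CLIENT_SECRET=...`
client_id = os.environ.get("REDDIT_CLIENT_ID", "")
client_secret = os.environ.get("REDDIT_CLIENT_SECRET", "")
print("credentials configured:", bool(client_id and client_secret))
```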

One part of our project is done. Now we need to set up the base for our Heroku app.

Creating an App on Heroku

Go to this dashboard url and create a new application.

On the next page give your application a unique name.

From the next page click on “Heroku CLI” and download the latest Heroku CLI for your operating system. Follow the on-screen install instructions and come back once it has been installed.

Creating a basic Python application

The below code is taken from Konstantinos Tsaprailis’s website.

from flask import Flask, request
import json
import requests

app = Flask(__name__)

# This needs to be filled with the Page Access Token that will be provided
# by the Facebook App that will be created.
PAT = ''

@app.route('/', methods=['GET'])
def handle_verification():
    print "Handling Verification."
    if request.args.get('hub.verify_token', '') == 'my_voice_is_my_password_verify_me':
        print "Verification successful!"
        return request.args.get('hub.challenge', '')
    else:
        print "Verification failed!"
        return 'Error, wrong validation token'

@app.route('/', methods=['POST'])
def handle_messages():
    print "Handling Messages"
    payload = request.get_data()
    print payload
    for sender, message in messaging_events(payload):
        print "Incoming from %s: %s" % (sender, message)
        send_message(PAT, sender, message)
    return "ok"

def messaging_events(payload):
    """Generate tuples of (sender_id, message_text) from the
    provided payload.
    """
    data = json.loads(payload)
    messaging_events = data["entry"][0]["messaging"]
    for event in messaging_events:
        if "message" in event and "text" in event["message"]:
            yield event["sender"]["id"], event["message"]["text"].encode('unicode_escape')
        else:
            yield event["sender"]["id"], "I can't echo this"


def send_message(token, recipient, text):
    """Send the message text to recipient with id recipient.
    """

    r = requests.post("https://graph.facebook.com/v2.6/me/messages",
        params={"access_token": token},
        data=json.dumps({
            "recipient": {"id": recipient},
            "message": {"text": text.decode('unicode_escape')}
        }),
        headers={'Content-type': 'application/json'})
    if r.status_code != requests.codes.ok:
        print r.text

if __name__ == '__main__':
    app.run()

We will be modifying the file according to our needs. So basically a Facebook bot works like this:

  1. Facebook sends a request to our server whenever a user messages our page on Facebook.
  2. We respond to the Facebook’s request and store the id of the user and the message which was sent to our page.
  3. We respond to user’s message through Graph API using the stored user id and message id.

A detailed breakdown of the above code is available on this website. In this post I will mainly be focusing on the Reddit integration and how to use the Postgres database on Heroku.

Before moving further, let’s deploy the above Python code to Heroku. For that you have to create a local Git repository. Follow these steps:

$ mkdir messenger-bot
$ cd messenger-bot
$ touch requirements.txt app.py Procfile

Execute the above commands in a terminal and put the above Python code into the app.py file. Put the following into Procfile:

web: gunicorn app:app 

Now we need to tell Heroku which Python libraries our app will need to function properly. Those libraries will need to be listed in the requirements.txt file. I am going to fast-forward a bit over here and simply copy the requirements from this post. Put the following lines into requirements.txt file and you should be good to go for now.

click==6.6
Flask==0.11
gunicorn==19.6.0
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
requests==2.10.0
Werkzeug==0.11.10

Run the following command in the terminal and you should get a similar output:

$ ls
Procfile      app.py     requirements.txt

Now we are ready to create a Git repository which can then be pushed onto Heroku servers. We will carry out the following steps now:

  • Log in to Heroku
  • Create a new Git repository
  • Commit everything into the new repo
  • Push the repo onto Heroku

The commands required to achieve this are listed below:

$ heroku login
$ git init
$ heroku git:remote -a <app_name>
$ git commit -am "Initial commit"
$ git push heroku master
...
remote: https://<app_name>.herokuapp.com/ deployed to Heroku
...

$ heroku config:set WEB_CONCURRENCY=3

Save the URL printed after “remote:” above; it is the URL of your Heroku app. We will need it in the next step when we create a Facebook app.

Creating a Facebook App

Firstly we need a Facebook page. It is a requirement by Facebook to supplement every app with a relevant page.

Now we need to register a new app. Go to this app creation page and follow the instructions below.

Now head over to your app.py file and replace the PAT string on line 9 with the Page Access Token we saved above.

Commit everything and push the code to Heroku.

$ git commit -am "Added in the PAT"
$ git push heroku master

Now if you go to the Facebook page and send it a message, you will get your own message back as a reply from the page. This shows that everything we have done so far is working. If something does not work, check your Heroku logs; they will give you some clue about what is going wrong, and a quick Google search will usually help you resolve the issue. You can access the logs like this:

$ heroku logs -t -a <app_name>

Note: only your own messages will get a reply from the page. If any other user messages the page, the bot will not reply, because it has not yet been approved by Facebook. However, if you want to let a couple of users test your app, you can add them as testers by going to your Facebook app’s developer page and following the on-screen instructions.

Getting data from Reddit

We will be using data from the following subreddits:

  • GetMotivated
  • memes
  • Jokes
  • Showerthoughts

First of all, let’s install Reddit’s Python library, “praw”. It can easily be done by typing the following command in the terminal:

$ pip install praw

Now let’s test some Reddit goodness in a Python shell. I followed the docs which clearly show how to access Reddit and how to access a subreddit. Now is the best time to grab the “client_id” and “client_secret” which we created in the first part of this post.

$ python
Python 2.7.13 (default, Dec 17 2016, 23:03:43) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import praw
>>> reddit = praw.Reddit(client_id='**********',
... client_secret='*****************',
... user_agent='my user agent')

>>> 
>>> submissions = list(reddit.subreddit("GetMotivated").hot(limit=None))
>>> submissions[-4].title
u'[Video] Hi, Stranger.'

Note: Don’t forget to add in your own client_id and client_secret in place of ****

Let’s discuss the important bits here. I am using limit=None because I want to get back as many posts as I can. Initially this feels like overkill, but you will quickly see that if a user uses the Facebook bot frequently, we will run out of new posts if we limit ourselves to 10 or 20. An additional constraint we will add is that we only use image posts from GetMotivated and memes, and only text posts from Jokes and ShowerThoughts. Because of this constraint, only one or two of the top 10 hot posts might be useful to us, since a lot of video submissions are also made to GetMotivated.
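The image-vs-text constraint can be sketched without touching the network. The toy Submission tuple and sample data below are mine, not praw’s, but real praw submissions expose the same attributes:

```python
from collections import namedtuple

# Toy stand-in for a praw submission (real ones have these attributes too).
Submission = namedtuple("Submission", ["is_self", "url", "title"])

def image_posts(submissions):
    """Lazily yield only direct image links, skipping self posts and videos."""
    for s in submissions:
        if not s.is_self and (".jpg" in s.url or ".png" in s.url):
            yield s

sample = [
    Submission(True, "", "Monday motivation (text post)"),
    Submission(False, "https://v.redd.it/abc", "[Video] Hi, Stranger."),
    Submission(False, "https://i.imgur.com/xyz.jpg", "[Image] Keep going"),
]
first_image = next(image_posts(sample))  # the imgur link; the others are skipped
```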

Now that we know how to access Reddit using the Python library we can go ahead and integrate it into our app.py.

First, add some additional libraries to requirements.txt so that it looks something like this:

$ cat requirements.txt
click==6.6
Flask==0.11
gunicorn==19.6.0
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
requests==2.10.0
Werkzeug==0.11.10
flask-sqlalchemy
psycopg2
praw

Now if we only wanted to send the user an image or text taken from reddit, it wouldn’t have been very difficult. In the “send_message” function we could have done something like this:

import praw
...

def send_message(token, recipient, text):
    """Send the message text to recipient with id recipient.
    """
    if "meme" in text.lower():
        subreddit_name = "memes"
    elif "shower" in text.lower():
        subreddit_name = "Showerthoughts"
    elif "joke" in text.lower():
        subreddit_name = "Jokes"
    else:
        subreddit_name = "GetMotivated"
    ....

    if subreddit_name == "Showerthoughts":
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            payload = submission.url
            break
    ...
    
    r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"attachment": {
                              "type": "image",
                              "payload": {
                                "url": payload
                              }}
            }),
            headers={'Content-type': 'application/json'})
    ...

But there is one issue with this approach: how will we know whether a particular image/text has already been sent to a user? We need some kind of id for each image/text we send so that we don’t send the same post twice. To solve this we are going to use PostgreSQL together with reddit post ids (every post on reddit has a unique id).
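The dedup idea itself can be sketched in memory before introducing Postgres. This is only an illustration with made-up data; the real bot persists the same information in the database so it survives restarts:

```python
# user_id -> set of reddit post ids already sent (in-memory illustration only)
seen = {}

def pick_fresh(user_id, submissions):
    """Return the url of the first submission this user has not seen, or None."""
    sent = seen.setdefault(user_id, set())
    for post_id, url in submissions:
        if post_id not in sent:
            sent.add(post_id)
            return url
    return None

posts = [("abc12", "http://i.imgur.com/1.jpg"), ("def34", "http://i.imgur.com/2.jpg")]
first = pick_fresh("user-1", posts)   # the first post
second = pick_fresh("user-1", posts)  # skips the seen post, returns the next one
```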

We are going to use a Many-to-Many relation. There will be two tables:

  • Users
  • Posts

Let’s first define them in our code and then I will explain how it will work:

from flask_sqlalchemy import SQLAlchemy

...
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL']
db = SQLAlchemy(app)

...
relationship_table=db.Table('relationship_table',                            
    db.Column('user_id', db.Integer,db.ForeignKey('users.id'), nullable=False),
    db.Column('post_id',db.Integer,db.ForeignKey('posts.id'),nullable=False),
    db.PrimaryKeyConstraint('user_id', 'post_id') )
 
class Users(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255),nullable=False)
    posts=db.relationship('Posts', secondary=relationship_table, backref='users' )  

    def __init__(self, name):
        self.name = name
 
class Posts(db.Model):
    id=db.Column(db.Integer, primary_key=True)
    name=db.Column(db.String, unique=True, nullable=False)
    url=db.Column(db.String, nullable=False)

    def __init__(self, name, url):
        self.name = name
        self.url = url

So the Users table has two fields. name will hold the id sent with the Facebook Messenger webhook request, and posts links to the other table, Posts. The Posts table has name and url fields: name will be populated with the reddit submission id and url with the URL of that post. We don’t strictly need the url field; I plan to use it for something else in the future, hence I included it in the code.

So now the way our final code will work is this:

  • We request a list of posts from a particular subreddit. The following code:
    reddit.subreddit(subreddit_name).hot(limit=None)

    returns a generator so we don’t need to worry about memory

  • We will check whether the particular post has already been sent to the user in the past or not
  • If the post has been sent in the past we will continue requesting more posts from Reddit until we find a fresh post
  • If the post has not been sent to the user, we send the post and break out of the loop
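The memory point deserves a tiny illustration. A generator produces one item per next() call, so iterating over hot(limit=None) never loads the whole listing at once; the hot below is a toy stand-in, not the praw API:

```python
def hot(limit=None):
    """Toy stand-in for reddit.subreddit(...).hot(): yields posts lazily."""
    n = 0
    while limit is None or n < limit:
        yield "post-%d" % n
        n += 1

listing = hot(limit=None)  # no work done yet
first = next(listing)      # only now is the first item produced
```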

So the final code of the app.py file is this:

from flask import Flask, request
import json
import requests
from flask_sqlalchemy import SQLAlchemy
import os
import praw

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL']
db = SQLAlchemy(app)
reddit = praw.Reddit(client_id='*************',
                     client_secret='****************',
                     user_agent='my user agent')

# This needs to be filled with the Page Access Token that will be provided
# by the Facebook App that will be created.
PAT = '*********************************************'

quick_replies_list = [{
    "content_type":"text",
    "title":"Meme",
    "payload":"meme",
},
{
    "content_type":"text",
    "title":"Motivation",
    "payload":"motivation",
},
{
    "content_type":"text",
    "title":"Shower Thought",
    "payload":"Shower_Thought",
},
{
    "content_type":"text",
    "title":"Jokes",
    "payload":"Jokes",
}
]
@app.route('/', methods=['GET'])
def handle_verification():
    print "Handling Verification."
    if request.args.get('hub.verify_token', '') == 'my_voice_is_my_password_verify_me':
        print "Verification successful!"
        return request.args.get('hub.challenge', '')
    else:
        print "Verification failed!"
        return 'Error, wrong validation token'

@app.route('/', methods=['POST'])
def handle_messages():
    print "Handling Messages"
    payload = request.get_data()
    print payload
    for sender, message in messaging_events(payload):
        print "Incoming from %s: %s" % (sender, message)
        send_message(PAT, sender, message)
    return "ok"

def messaging_events(payload):
    """Generate tuples of (sender_id, message_text) from the
    provided payload.
    """
    data = json.loads(payload)
    messaging_events = data["entry"][0]["messaging"]
    for event in messaging_events:
        if "message" in event and "text" in event["message"]:
            yield event["sender"]["id"], event["message"]["text"].encode('unicode_escape')
        else:
            yield event["sender"]["id"], "I can't echo this"


def send_message(token, recipient, text):
    """Send the message text to recipient with id recipient.
    """
    if "meme" in text.lower():
        subreddit_name = "memes"
    elif "shower" in text.lower():
        subreddit_name = "Showerthoughts"
    elif "joke" in text.lower():
        subreddit_name = "Jokes"
    else:
        subreddit_name = "GetMotivated"

    myUser = get_or_create(db.session, Users, name=recipient)

    if subreddit_name == "Showerthoughts":
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            if (submission.is_self == True):
                query_result = Posts.query.filter(Posts.name == submission.id).first()
                if query_result is None:
                    myPost = Posts(submission.id, submission.title)
                    myUser.posts.append(myPost)
                    db.session.commit()
                    payload = submission.title
                    break
                elif myUser not in query_result.users:
                    myUser.posts.append(query_result)
                    db.session.commit()
                    payload = submission.title
                    break
                else:
                    continue  

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"text": payload,
                            "quick_replies":quick_replies_list}
            }),
            headers={'Content-type': 'application/json'})
    
    elif subreddit_name == "Jokes":
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            if ((submission.is_self == True) and ( submission.link_flair_text is None)):
                query_result = Posts.query.filter(Posts.name == submission.id).first()
                if query_result is None:
                    myPost = Posts(submission.id, submission.title)
                    myUser.posts.append(myPost)
                    db.session.commit()
                    payload = submission.title
                    payload_text = submission.selftext
                    break
                elif myUser not in query_result.users:
                    myUser.posts.append(query_result)
                    db.session.commit()
                    payload = submission.title
                    payload_text = submission.selftext
                    break
                else:
                    continue  

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"text": payload}
            }),
            headers={'Content-type': 'application/json'})

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"text": payload_text,
                            "quick_replies":quick_replies_list}
            }),
            headers={'Content-type': 'application/json'})
        
    else:
        payload = "http://imgur.com/WeyNGtQ.jpg"
        for submission in reddit.subreddit(subreddit_name).hot(limit=None):
            if (submission.link_flair_css_class == 'image') or ((submission.is_self != True) and ((".jpg" in submission.url) or (".png" in submission.url))):
                query_result = Posts.query.filter(Posts.name == submission.id).first()
                if query_result is None:
                    myPost = Posts(submission.id, submission.url)
                    myUser.posts.append(myPost)
                    db.session.commit()
                    payload = submission.url
                    break
                elif myUser not in query_result.users:
                    myUser.posts.append(query_result)
                    db.session.commit()
                    payload = submission.url
                    break
                else:
                    continue

        r = requests.post("https://graph.facebook.com/v2.6/me/messages",
            params={"access_token": token},
            data=json.dumps({
                "recipient": {"id": recipient},
                "message": {"attachment": {
                              "type": "image",
                              "payload": {
                                "url": payload
                              }},
                              "quick_replies":quick_replies_list}
            }),
            headers={'Content-type': 'application/json'})

    if r.status_code != requests.codes.ok:
        print r.text

def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance

relationship_table=db.Table('relationship_table',                            
    db.Column('user_id', db.Integer,db.ForeignKey('users.id'), nullable=False),
    db.Column('post_id',db.Integer,db.ForeignKey('posts.id'),nullable=False),
    db.PrimaryKeyConstraint('user_id', 'post_id') )
 
class Users(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255),nullable=False)
    posts=db.relationship('Posts', secondary=relationship_table, backref='users' )  

    def __init__(self, name=None):
        self.name = name
 
class Posts(db.Model):
    id=db.Column(db.Integer, primary_key=True)
    name=db.Column(db.String, unique=True, nullable=False)
    url=db.Column(db.String, nullable=False)

    def __init__(self, name=None, url=None):
        self.name = name
        self.url = url

if __name__ == '__main__':
    app.run()

So put this code into the app.py file and push it to Heroku.

$ git commit -am "Updated the code with Reddit feature"
$ git push heroku master

One last thing is still remaining. We need to tell Heroku that we will be using the database. It is simple. Just issue the following command in the terminal:

$ heroku addons:create heroku-postgresql:hobby-dev --app <app_name>

This will create a free hobby database which is enough for our project. Now we only need to initialise the database with the correct tables. In order to do that we first need to run the Python shell on our Heroku server:

$ heroku run python

Now in the Python shell type the following commands:

>>> from app import db
>>> db.create_all()

So now our project is complete. Congrats!

Let me discuss some interesting features of the code. Firstly, I am making use of the “quick-replies” feature of Facebook Messenger Bot API. This allows us to send some pre-formatted inputs which the user can quickly select. They will look something like this:

It is easy to display these quick replies to the user. With every post request to the Facebook graph API we send some additional data:

quick_replies_list = [{
 "content_type":"text",
 "title":"Meme",
 "payload":"meme",
},
{
 "content_type":"text",
 "title":"Motivation",
 "payload":"motivation",
},
{
 "content_type":"text",
 "title":"Shower Thought",
 "payload":"Shower_Thought",
},
{
 "content_type":"text",
 "title":"Jokes",
 "payload":"Jokes",
}]

Another interesting feature of the code is how we determine whether a post is a text, image, or video post. In the GetMotivated subreddit some images don’t have “.jpg” or “.png” in their URL, so we rely on

submission.link_flair_css_class == 'image'

This way we are able to select even those posts which do not have a known image extension in the url.
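Pulled out as a helper (my own refactor of the condition from app.py, with a toy submission type standing in for praw’s), the check reads:

```python
from collections import namedtuple

# Toy stand-in for a praw submission (only the fields the check needs).
Submission = namedtuple("Submission", ["is_self", "url", "link_flair_css_class"])

def is_image_post(s):
    """True for flaired images, or direct links with a known image extension."""
    if s.link_flair_css_class == "image":
        return True
    return not s.is_self and (".jpg" in s.url or ".png" in s.url)

flaired = is_image_post(Submission(False, "https://i.redd.it/cat", "image"))  # True
by_ext = is_image_post(Submission(False, "https://i.imgur.com/a.png", None))  # True
selfpost = is_image_post(Submission(True, "", None))                          # False
```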

You might have noticed this bit of code in the app.py file:

payload = "http://imgur.com/WeyNGtQ.jpg"

It makes sure that if no new posts are found for a particular user (every subreddit has a maximum number of “hot” posts), we still have something to return. Otherwise we would get an undefined-variable error.

Create the user if they don’t exist:

The following function checks whether a user with the given name exists. If so, it selects that user from the db and returns it. If not, it creates the user and returns the newly created record:

myUser = get_or_create(db.session, Users, name=recipient)
...

def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance

I hope you guys enjoyed the post. Please comment below if you have any questions. I am also starting premium advertising on the blog, either in the form of sponsored posts or blog sponsorship for a particular time; I am still fleshing out the details. If your company works with Python and wants to reach out to potential customers, please email me at yasoob (at) gmail.com.

Source: You can get the code from GitHub as well



Yasoob Khalid: Recovering lost Python source code if it’s still resident in-memory


I read this on GitHub Gist the other day. I don’t know whether I will ever use it but I am still putting this on my blog for the sake of bookmarking it. Who knows? Someone from the audience might end up using it!

I screwed up using git (“git checkout --” on the wrong file) and managed to delete the code I had just written… but it was still running in a process in a Docker container. Here’s how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container


Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb

Install pyrasite – this will let you attach a Python shell to the still-running process

pip install pyrasite

Install uncompyle6, which will let you get Python source code back from in-memory code objects

pip install uncompyle6

Find the PID of the process that is still running

ps aux | grep python

Attach an interactive prompt using pyrasite

pyrasite-shell <PID>

Now you’re in an interactive prompt! Import the code you need to recover

>>> from my_package import my_module

Figure out which functions and classes you need to recover

>>> dir(my_module)
['MyClass', 'my_function']

Decompile the function into source code

>>> import uncompyle6
>>> import sys
>>> uncompyle6.main.uncompyle(
    2.7, my_module.my_function.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
function_body = "appears here"

For the class, you’ll need to decompile each method in turn

>>> uncompyle6.main.uncompyle(
    2.7, my_module.MyClass.my_method.im_func.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
class_method_body = "appears here"

I hope you guys like this post. Stay tuned for the next one in the upcoming days.


Yasoob Khalid: Importing with ctypes in Python: fighting overflows


Introduction

On some cold winter night, we decided to refactor a few examples and tests for the Python wrapper in Themis, because things have to be not only efficient and useful, but elegant as well. One thing led to another, and we ended up revamping Themis error codes a bit.

Internal error and status flags sometimes get less attention than crypto-related code: they are internals, for internal use. The problem is, when they fail, they might break something more crucial in a completely invisible way.

Since the best mistakes are those that are not just fixed, but properly analyzed, reflected upon, and recorded, we wrote this small report on a seemingly boring matter: every edge and connection is a challenge. This story is a reflection on a typical situation: different people work on different layers of one large product, then look around to wipe out the technical debt.

Strange tests behavior

Any time we touch Themis wrapper code, we touch the tests, because the pesticide paradox in software development is no small problem.

It all started with Secure Comparator tests:

# test.py
from pythemis.scomparator import scomparator, SCOMPARATOR_CODES

secret = b'some secret'

alice = scomparator(secret)
bob = scomparator(secret)
data = alice.begin_compare()

while (alice.result() == SCOMPARATOR_CODES.NOT_READY and
               bob.result() == SCOMPARATOR_CODES.NOT_READY):
    data = alice.proceed_compare(bob.proceed_compare(data))

assert alice.result() != SCOMPARATOR_CODES.NOT_MATCH
assert bob.result() != SCOMPARATOR_CODES.NOT_MATCH

This test attempts to run Secure Comparator with a constant secret, this way making sure that comparison ends in a positive result (flag is called SCOMPARATOR_CODES.MATCH). If the secret is matched, tests should finish with success.

Secure Comparator can result in either SCOMPARATOR_CODES.NOT_MATCH or SCOMPARATOR_CODES.MATCH.

But why does the assert have to be just a negative comparison if we’re testing a feature with a boolean state? Checking for non-equality with NOT_MATCH does not automatically mean the secrets matched.
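A three-state enum makes the flaw concrete. The values below mirror pythemis’s SCOMPARATOR_CODES:

```python
from enum import IntEnum

class SCOMPARATOR_CODES(IntEnum):
    NOT_READY = 0
    NOT_MATCH = -1
    MATCH = 0xf0f0f0f0

# A comparator stuck in NOT_READY still passes the weak assert:
result = SCOMPARATOR_CODES.NOT_READY
weak_check = result != SCOMPARATOR_CODES.NOT_MATCH  # True, yet nothing matched
strict_check = result == SCOMPARATOR_CODES.MATCH    # False, as it should be
```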

The first reaction is obviously to see if it even works (via example code). It did.

Here, the verification code tests for equality, thankfully:

if comparator.is_equal():
    print("match")
else:
    print("not match")

Fine, so the problem touches only the tests. Let’s rewrite the assert so that it compares scomparator.result() against the correct expected state, SCOMPARATOR_CODES.MATCH:

# test.py
...

assert alice.result() == SCOMPARATOR_CODES.MATCH
assert bob.result() == SCOMPARATOR_CODES.MATCH

… and bump into unexpected error:

# python test.py

Traceback (most recent call last):
  File "test.py", line 23, in <module>
    assert alice.result() == SCOMPARATOR_CODES.MATCH
AssertionError

A routine fix to a test of a perfectly working feature quickly turns into an interesting riddle. We added debug output to see what’s really going on inside:

# test.py
...

print('alice.result(): {}\nNOT_MATCH: {}\nMATCH: {}'.format(
    alice.result(),
    SCOMPARATOR_CODES.NOT_MATCH,
    SCOMPARATOR_CODES.MATCH
))
assert alice.result() == SCOMPARATOR_CODES.MATCH
assert bob.result() == SCOMPARATOR_CODES.MATCH

… and get the completely unexpected:

# python test.py

alice.result(): -252645136
NOT_MATCH: -1
MATCH: 4042322160
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    assert alice.result() == SCOMPARATOR_CODES.MATCH
AssertionError

How come?

>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
...
>>> import ctypes
>>> print(ctypes.sizeof(ctypes.c_int))
4

Even though the OS, Python, and Themis are 64-bit, the PyThemis wrapper is made using ctypes, whose int type is 32-bit.

Accordingly, when it receives 0xf0f0f0f0 from the C side of Themis, ctypes treats it as a signed 32-bit number, so 0xf0f0f0f0 comes out negative. Python, on the other hand, converts the literal 0xf0f0f0f0 (from SCOMPARATOR_CODES) to an integer without any bit-length limit, which gives 4042322160.
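This reinterpretation is easy to reproduce in a plain Python shell; ctypes.c_int32 applies exactly the 32-bit signed view described above:

```python
import ctypes

MATCH = 0xf0f0f0f0  # Python parses the literal as an unbounded int: 4042322160

# Viewing the same 32 bits as a signed int -- what a 32-bit ctypes return
# type does -- flips the value negative:
as_signed = ctypes.c_int32(MATCH).value
print(MATCH, as_signed)  # 4042322160 -252645136
```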

This is strange. Let’s dive a bit into Themis:

src/soter/error.h:


...
/** @brief return type */
typedef int soter_status_t;

/**
 * @addtogroup SOTER
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */
#define SOTER_SUCCESS 0
#define SOTER_FAIL -1
#define SOTER_INVALID_PARAMETER -2
#define SOTER_NO_MEMORY -3
#define SOTER_BUFFER_TOO_SMALL -4
#define SOTER_DATA_CORRUPT -5
#define SOTER_INVALID_SIGNATURE -6
#define SOTER_NOT_SUPPORTED -7
#define SOTER_ENGINE_FAIL -8

...

typedef int themis_status_t;

/**
 * @addtogroup THEMIS
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */
#define THEMIS_SSESSION_SEND_OUTPUT_TO_PEER 1
#define THEMIS_SUCCESS SOTER_SUCCESS
#define THEMIS_FAIL SOTER_FAIL
#define THEMIS_INVALID_PARAMETER SOTER_INVALID_PARAMETER
#define THEMIS_NO_MEMORY SOTER_NO_MEMORY
#define THEMIS_BUFFER_TOO_SMALL SOTER_BUFFER_TOO_SMALL
#define THEMIS_DATA_CORRUPT SOTER_DATA_CORRUPT
#define THEMIS_INVALID_SIGNATURE SOTER_INVALID_SIGNATURE
#define THEMIS_NOT_SUPPORTED SOTER_NOT_SUPPORTED
#define THEMIS_SSESSION_KA_NOT_FINISHED -8
#define THEMIS_SSESSION_TRANSPORT_ERROR -9
#define THEMIS_SSESSION_GET_PUB_FOR_ID_CALLBACK_ERROR -10

src/themis/secure_comparator.h:

...
#define THEMIS_SCOMPARE_MATCH 0xf0f0f0f0
#define THEMIS_SCOMPARE_NO_MATCH THEMIS_FAIL
#define THEMIS_SCOMPARE_NOT_READY 0
...

themis_status_t secure_comparator_destroy(secure_comparator_t *comp_ctx);

themis_status_t secure_comparator_append_secret(secure_comparator_t *comp_ctx, const void *secret_data, size_t secret_data_length);

themis_status_t secure_comparator_begin_compare(secure_comparator_t *comp_ctx, void *compare_data, size_t *compare_data_length);
themis_status_t secure_comparator_proceed_compare(secure_comparator_t *comp_ctx, const void *peer_compare_data, size_t peer_compare_data_length, void *compare_data, size_t *compare_data_length);

themis_status_t secure_comparator_get_result(const secure_comparator_t *comp_ctx);

Now let’s see PyThemis side at src/wrappers/themis/python/pythemis/exception.py.

All values here correspond to C code, numbers are small and fit any bit length limits:

from enum import IntEnum

class THEMIS_CODES(IntEnum):
    NETWORK_ERROR = -2222
    BUFFER_TOO_SMALL = -4
    FAIL = -1
    SUCCESS = 0
    SEND_AS_IS = 1
...

What about the Secure Comparator part? Looking at src/wrappers/themis/python/pythemis/scomparator.py, we see that the values are mostly fine, but SCOMPARATOR_CODES.MATCH is problematic: it becomes negative as a 32-bit int:

...

class SCOMPARATOR_CODES(IntEnum):
    MATCH = 0xf0f0f0f0
    NOT_MATCH = THEMIS_CODES.FAIL
    NOT_READY = 0
...

If we cast it to a signed 4-byte number, we receive -252645136 where we expect 4042322160.

So the problem is on the seams between C and Python, where our code 0xf0f0f0f0 gets misinterpreted.

Possible solutions

The whole problem is a minor offense, easy to fix with a quick hack, but the whole endeavor was about eliminating technical debt, not creating more of it.

Option 1. Add strong type casting when importing variables via ctypes:

An extremely simple hack. Since we know how ctypes acts in this case, we can explicitly make the code treat the return value as unsigned (or widen it), so that 0xf0f0f0f0 read as int64_t equals its uint32_t interpretation. To do that, we would simply:

Add either

themis.secure_comparator_get_result.restype = ctypes.c_int64

or

themis.secure_comparator_get_result.restype = ctypes.c_uint

into src/wrappers/themis/python/pythemis/scomparator.py.

But that looks like an ugly hack, and it additionally requires verifying the behavior of ctypes on a 32-bit machine with a 32-bit Python.
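The effect of the two restype choices can be sketched without calling into Themis; here the raw C return value is simulated with a plain ctypes object (no real secure_comparator_get_result is involved):

```python
import ctypes

# The 4 bytes C Themis hands back for a match, held in a real 32-bit slot.
raw = ctypes.c_uint(0xf0f0f0f0)

# Default restype (c_int): reread those same 4 bytes as signed -> negative.
as_c_int = ctypes.cast(ctypes.pointer(raw),
                       ctypes.POINTER(ctypes.c_int)).contents.value
print(as_c_int)   # -252645136

# With restype = ctypes.c_uint the value matches the Python literal again.
print(raw.value)  # 4042322160 == 0xf0f0f0f0
```

Either c_uint or a widening c_int64 works here, because a 64-bit signed type can hold 0xf0f0f0f0 without touching the sign bit.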

Option 2. Change from one byte representation to another:

Hack number two: remove the implicit interpretation of the hex literal 0xf0f0f0f0 and just give the constant the value it actually arrives as, in this context -252645136. This fixes the problem in the Python wrapper, but we would still need additional verification on a 32-bit system and would have to keep an eye on it in the future.

Avoid this option if you can.

Option 3. Refactor all statuses in C library, never use negative numbers or values near type maximums to avoid overflows.

The easiest would be the second option: since it's just one such error in one wrapper, why even bother? Fix it right away and forget about it. But hitting a problem even once is sometimes enough to see the need for some standardisation.

We took the third path, and re-thought the principle behind status flags a bit:

  • Never use negative numbers: -1 is 0xffffffff in 32 bits and 0xffffffffffffffff in 64 bits, so it is easy to run into an overflow.
  • Use small positive numbers for error codes and statuses. Since Themis is supposed to work across many architectures, and (theoretically) there might be a weird 9-bit kitchen-sink processor out there (they do need more robots to join DoS armies, so take our word for it, it will happen sooner or later), we decided to limit the flag range to 0..127.
  • In the Themis part that directly faces the wrappers, we’ve changed ints to explicit int32_t.
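With flags confined to 0..127, a value survives signed/unsigned reinterpretation at any common width, so no seam between C and a wrapper can distort it. A quick sanity check on the new code values:

```python
import ctypes

# New-style Themis status codes: small non-negative ints in 0..127.
codes = (0, 1, 11, 21, 22, 127)

for code in codes:
    # The value is identical whether the seam reads it as a signed byte,
    # a 32-bit int (signed or unsigned) or a 64-bit unsigned int.
    assert ctypes.c_int8(code).value == code
    assert ctypes.c_int32(code).value == code
    assert ctypes.c_uint32(code).value == code
    assert ctypes.c_uint64(code).value == code

print("all codes survive reinterpretation")
```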

Since changing the error code system in the C library affects all wrappers, whose error codes must be adjusted accordingly, we decided to import the codes from the C code directly via variable export where possible (Go, NodeJS, Java, PHP).

After refactoring, error codes in Themis started to look like:

src/soter/error.h:

...

/** @brief return type */
typedef int soter_status_t;

/**
 * @addtogroup SOTER
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */

#define SOTER_SUCCESS 0 //success code

//error codes
#define SOTER_FAIL          11
#define SOTER_INVALID_PARAMETER     12
#define SOTER_NO_MEMORY         13
#define SOTER_BUFFER_TOO_SMALL      14
#define SOTER_DATA_CORRUPT      15
#define SOTER_INVALID_SIGNATURE     16
#define SOTER_NOT_SUPPORTED         17
#define SOTER_ENGINE_FAIL       18
...

/** @brief return type */
typedef int32_t themis_status_t;

/**
 * @addtogroup THEMIS
 * @{
 * @defgroup SOTER_ERROR_CODES status codes
 * @{
 */

//
#define THEMIS_SUCCESS              SOTER_SUCCESS
#define THEMIS_SSESSION_SEND_OUTPUT_TO_PEER     1

//errors
#define THEMIS_FAIL                     SOTER_FAIL
#define THEMIS_INVALID_PARAMETER            SOTER_INVALID_PARAMETER
#define THEMIS_NO_MEMORY                SOTER_NO_MEMORY
#define THEMIS_BUFFER_TOO_SMALL             SOTER_BUFFER_TOO_SMALL
#define THEMIS_DATA_CORRUPT                 SOTER_DATA_CORRUPT
#define THEMIS_INVALID_SIGNATURE            SOTER_INVALID_SIGNATURE
#define THEMIS_NOT_SUPPORTED                SOTER_NOT_SUPPORTED
#define THEMIS_SSESSION_KA_NOT_FINISHED         19
#define THEMIS_SSESSION_TRANSPORT_ERROR         20
#define THEMIS_SSESSION_GET_PUB_FOR_ID_CALLBACK_ERROR   21
#define THEMIS_SCOMPARE_SEND_OUTPUT_TO_PEER         THEMIS_SSESSION_SEND_OUTPUT_TO_PEER
...

src/themis/secure_comparator.h:

...
#define THEMIS_SCOMPARE_MATCH       21
#define THEMIS_SCOMPARE_NO_MATCH    22
#define THEMIS_SCOMPARE_NOT_READY   0
...

… and, accordingly, in PyThemis:

...

class THEMIS_CODES(IntEnum):
    NETWORK_ERROR = 2222
    BUFFER_TOO_SMALL = 14
    FAIL = 11
    SUCCESS = 0
    SEND_AS_IS = 1
...

Note: NETWORK_ERROR is PyThemis specific and is not used in C part, so we kept it the way it was.

src/wrappers/themis/python/pythemis/scomparator.py:

...

class SCOMPARATOR_CODES(IntEnum):
    MATCH = 21
    NOT_MATCH = 22
    NOT_READY = 0
...

For example, this is how direct importing of these flags in Go works:

gothemis/compare/compare.go:

package compare

/*
#cgo LDFLAGS: -lthemis -lsoter...

const int GOTHEMIS_SCOMPARE_MATCH = THEMIS_SCOMPARE_MATCH;
const int GOTHEMIS_SCOMPARE_NO_MATCH = THEMIS_SCOMPARE_NO_MATCH;
const int GOTHEMIS_SCOMPARE_NOT_READY = THEMIS_SCOMPARE_NOT_READY;
*/
import "C"
import (
    "github.com/cossacklabs/themis/gothemis/errors"
    "runtime"
    "unsafe"
)

var (
    COMPARE_MATCH = int(C.GOTHEMIS_SCOMPARE_MATCH)
    COMPARE_NO_MATCH = int(C.GOTHEMIS_SCOMPARE_NO_MATCH)
    COMPARE_NOT_READY = int(C.GOTHEMIS_SCOMPARE_NOT_READY)
)

...

Results

After fixing and refactoring, the new scomparator class looks like:

class SComparator(object):
    # the same
    ...

    def is_compared(self):
        return not (themis.secure_comparator_get_result(self.comparator_ctx) ==
                    SCOMPARATOR_CODES.NOT_READY)

    def is_equal(self):
        return (themis.secure_comparator_get_result(self.comparator_ctx) ==
                SCOMPARATOR_CODES.MATCH)

And the new test code, finally refactored to a decent look:

import unittest

from pythemis import scomparator

class SComparatorTest(unittest.TestCase):
    def setUp(self):
        self.message = b"This is test message"
        self.message1 = b"This is test message2"

    def testComparation(self):
        alice = scomparator.SComparator(self.message)
        bob = scomparator.SComparator(self.message)
        data = alice.begin_compare()
        while not (alice.is_compared() and bob.is_compared()):
            data = alice.proceed_compare(bob.proceed_compare(data))
        self.assertTrue(alice.is_equal())
        self.assertTrue(bob.is_equal())

    def testComparation2(self):
        alice = scomparator.SComparator(self.message)
        bob = scomparator.SComparator(self.message1)
        data = alice.begin_compare()
        while not (alice.is_compared() and bob.is_compared()):
            data = alice.proceed_compare(bob.proceed_compare(data))
        self.assertFalse(alice.is_equal())
        self.assertFalse(bob.is_equal())

# python scomparator_test.py 
..
----------------------------------------------------------------------
Ran 2 tests in 0.064s

OK

Conclusions

We love taking the time to explore minor, boring, trivial matters. Apart from wanting to give everybody a better Themis experience, we use Themis every day to build different tools, and we would like to be extremely confident that behind a nice API, which isolates all the implementation details we might accidentally break, the implementations are correct.

As with any bug, most of the conclusions sound like they come from the Gods of the Copybook Headings once you know them:

  • Use types of explicit sizes (int16_t, int32_t, int8_t) to be less dependent of user architectures.
  • Watch for type overflows in signed types.
  • Try to explicitly test all possible return status flags in tests.
  • !false is true only in boolean representation. Once you encode the states as numbers, don’t rely on one-sided evaluation: if you’re comparing ints that represent the two states, there can be a million reasons why !false is actually kittens, not true. Two mutually exclusive states do not mean your system will not generate N-2 more states because of some error.

Note: This post was written by the people at Cossack Labs. The original post is available here.


Yasoob Khalid: Your first talk


Hi there folks. It’s been a long time since I wrote on this blog. I have been very busy with university applications. A lot has happened recently which I would love to share with you. Firstly, I got news from a friend that my book is being used at McGill University to teach Python programming. That is something I have always wanted: to write a book which is used by well-known universities to teach programming. But this post is not about that. I wanted to share how to deliver a good first talk. People dread sharing their first talk with the world because it is mostly filled with “aaahhh”s and fast-paced speech, but I am sharing mine so that you guys know what “not” to do during your first talk.

1. Keep the slides short

We fail to realize the importance of this. Let me share the theory behind it. When you display a slide with everything written on it people will be able to read it much more quickly than you will be able to explain it. This makes your talk boring because the audience has to listen to everything which they have already read on the slide. A good rule of thumb is to write only one or two short phrases. Don’t write their explanation on the slide. Write those explanations in notes.

2. Use simple transitions

This is also important, especially for tech talks. People are there to look at the quality of content in your presentation not to look at super cool transitions. The major issue with using elaborate transitions is that they take up a lot of time of your talk which could have been used to present a couple of more interesting ideas. Another issue is that they take the focus away from you and the content of your presentation. We need to retain as much attention on you and the content as possible. Hence a short and simple sliding transition is all you need. However, if you really want to use fancy stuff try to keep it brief so that it doesn’t overshadow your main presentation.

3. Speak slowly

In my talk you can clearly see how fast I am speaking. It was due to nervousness. I can safely say that I speak much more slowly now and it is much easier to understand whatever I say. Make sure that you speak slowly during your talk. Most first-time speakers don’t discuss this because they feel they already have it under control. However, I have seen a lot of first-time speakers make the same mistake. Make sure that you pace yourself and speak each word clearly. The pace of your speech can make or break your talk.

4. Vary your tone

At PyCons, specifically, the normal talk length is 30-35 minutes. If you keep your tone the same throughout, it becomes monotonous. You will lose people’s attention very quickly, and in extreme cases it might lead to people abandoning your talk.

5. Don’t Code Live in front of an Audience

Never ever code live if it is within your power. It is because a lot of unforeseen issues can crop up on the day of the presentation. You might become nervous. You might not be able to type as quickly as you want to while standing. You might make typing errors. It is much better to prevent this issue altogether by not coding live.

That’s all for today. If you have some other tips for new speakers please do share them in the comments below. I love hearing from my readers.


Yasoob Khalid: Interesting Python Tutorials


Hi there folks! I have read some interesting Python tutorials lately. I would love to share them with you. Without any further ado let me list them over here:

1. Composing Music With Recurrent Neural Networks

I loved this tutorial. It is a bit old but still worth a read. The author has explained the theory behind his implementation. You will enjoy this tutorial if you are interested in signal processing, machine learning and/or music.

2. Page dewarping using OpenCV

This was an interesting read. I am not well versed in computer vision but still loved to read the theory behind the dewarping of an image of a curled page. The author does a great job at explaining the whole process and the algorithms used.

3. 10 Interesting Python Modules to learn in 2016

This is a good compilation of some of the famous Python libraries and modules. I have personally used almost all of them. I am linking this here because not only does this article list the modules, but it also provides sample code for each module being discussed.

4. Modern face recognition with Deep Learning

This article shows how modern face recognition works. The author takes you from isolating a face in an image to predicting which person that face belongs to. I learned a lot of new stuff from this tutorial. For instance, I had no idea what the HOG algorithm did before I read this tutorial.

5. How to score 0.8134 in Titanic Kaggle Challenge

This was a highly informative read. I learned the basic workflow of a data scientist. The author does a great job of teaching you the basics of data science. He starts with exploratory data analysis of the dataset and ends with hyperparameter tuning of his predictive models.

I hope you will enjoy reading these articles. If there is anything you would like to ask me just know that I am only an email away. I reply to most of the emails I get. Even if you want to discuss any freelancing opportunity just hit me up. This is my email.

Till next time!


Yasoob Khalid: Intermediate Python conquers the World! (Almost)


Hi there folks! I hope you are all fine. It’s been almost a year since I published Intermediate Python. It was my life goal to publish a book which really helps people. Today I saw the stats of the book after a long time. I was pretty ecstatic to learn that the English version of the book (it is also available in Chinese and Russian) has been read in 181 countries. Just 15 countries short of the whole world. It is also being used at various institutes as training material for programmers. If you have ever read this book and can spare two minutes of your valuable time, then I would love to hear your feedback. The length of the feedback can range from one word to a whole page. You can submit your feedback in the comments below or direct it to my email.

Getting to know that my work really helps people motivates me to work harder and do more awesome stuff.

I hope to hear soon from you guys! Stay happy and stay blessed 🙂


Yasoob Khalid: Support me on Patreon


Hi there folks! I have been writing regular blog-posts since 2013. I have been documenting my Python journey since then. Almost every new thing which I learn finds its way to the blog.

I haven’t only been writing blog-posts: I also publish a weekly newsletter and have published a widely read book (Intermediate Python). Over the years I haven’t monetized any of it. I have tried to keep everything free so that the maximum number of people can benefit from my work. Even my book was published under Creative Commons. It has been translated into Chinese and Russian as well.

Now it is becoming increasingly difficult for me to continue churning out great content. It takes time and effort. I hope that this Patreon campaign will help offset most of the costs associated with my work and encourage me to continue posting worth-reading articles. The costs include:

  • Website hosting
  • Domain renewals
  • Personal meals

If you enjoy my work and would love to see me continue producing great work in the future then please support me. Every little bit counts!

If you feel that the rewards on the campaign page can be better formalized then please let me know. I would be more than happy to incorporate your suggestion.

If you have any questions then comment below. I will make sure to reply each and every one of you.

Link to Campaign


Yasoob Khalid: 400+ Free Resources for DevOps & Sysadmins


As a Python advocate and educator, I’m always looking for ways to make my job (and yours) easier. This list, put together by Morpheus Data, offers a ton of great resources for Python users (more than 25 tools specific to Python) as well as other DevOps engineers and sysadmins. Enjoy.

Table of Contents

Source Code Repos

  • bitbucket.org— Free unlimited public and private repos (Git and Mercurial) for up to 5 users
  • chiselapp.com— Unlimited public and private Fossil repositories
  • github.com— Free for an unlimited number of public repositories
  • about.gitlab.com— Unlimited public and private Git repos with unlimited collaborators
  • hub.jazz.net— Unlimited public repos, private repos free for up to 3 accounts
  • visualstudio.com— Free unlimited private repos (Git and TFS) for up to 5 users per team
  • fogcreek.com— Free unlimited public and private repos (hybrid of Git and Mercurial) for 2 users
  • plasticscm.com— Free for individuals, OSS and nonprofits organizations
  • cloud.google.com— Free private Git repositories hosted on Google Cloud Platform. Supports syncing with existing GitHub and Bitbucket repos. Free Beta for up to 500 MB of storage

Tools for Teams and Collaboration

  • scinote.net— scientific data management & team collaboration. One Team with Unlimited number of users, backup and 1GB storage space
  • appear.in— One click video conversations, for free
  • flowdock.com— Chat and inbox, free for teams up to 5
  • slack.com— Free for unlimited users with some feature limitations
  • hipchat.com— Free for unlimited users with some feature limitations
  • gitter.im— Chat, for GitHub. Unlimited public & private rooms, free for teams of up to 25
  • hangouts.google.com— One place for all your Conversations, for free, need a Google account
  • seafile.com— Private or cloud storage, file sharing, sync, discussions. Private version is full. Cloud version has just 1 GB
  • sameroom.io— Free for unlimited users with some feature limitations
  • yammer.com— Private social network standalone or for MS Office 365. Free, just a bit less admin tools and users management features
  • helpmonks.com— Shared inbox for teams, free for Open Source and nonprofit organizations
  • typetalk.in— Share and discuss ideas with your team through instant messaging on the web or on your mobile
  • talky.io— Free group video chat. Anonymous. Peer‑to‑peer. No plugins, signup, or payment required
  • sourcetalk.net— Code discussion tool, free for open code talks
  • helplightning.com— Help over video with augmented reality. Free without analytics, encryption, support
  • evernote.com— Tool for organizing information. Share your notes and work together with others
  • wunderlist.com— Share your lists and work collaboratively on projects with your colleagues, free on iPhone, iPad, Mac, Android, Windows and the web
  • doodle.com— The scheduling tool you’ll actually use. Find a date for a meeting two times faster
  • sendtoinc.com— Share links, notes, files and have discussions. Free for 3 and 100 MB
  • zoom.us— Secure Video and Web conferencing, add-ons available. Free limited to 40 minutes
  • ideascale.com— Allow clients to submit ideas and vote, free for 25 members in 1 community
  • filehero.io— Make it easy to access your company’s file storage from a corporate download page. Free for 5 concurrent downloads
  • wistia.com— Video hosting with viewer analytics, HD video delivery, and marketing tools to help understand your visitors, 25 videos and Wistia branded player
  • cnverg.com— Real-time shared visual workspace, whiteboard, GitHub integration. Free 5 GB, 5 spaces and 5 collaborators, no GitHub repos

Code Quality

  • tachikoma.io— Dependency Update for Ruby, Node.js, Perl projects, free for Open Source
  • gemnasium.com— Dependency Update for Ruby, Node.js projects, free for Open Source
  • deppbot.com— Automated Dependency Updates for Ruby projects, free for Open Source
  • landscape.io— Code Quality for Python projects, free for Open Source
  • codeclimate.com— Automated code review, free for Open Source
  • houndci.com— Comments on GitHub commits about code quality, free for Open Source
  • coveralls.io— Display test coverage reports, free for Open Source
  • scrutinizer-ci.com— Continuous inspection platform, free for Open Source
  • codecov.io— Code coverage tool (SaaS), free for Open Source
  • insight.sensiolabs.com— Code Quality for PHP/Symfony projects, free for Open Source
  • codacy.com— Automated code reviews for PHP, Python, Ruby, Java, JavaScript, Scala, CSS and CoffeeScript, free for Open Source
  • pullreview.com— Automated Code Review for Ruby in GitHub, Bitbucket and GitLab, free for Open Source
  • gocover.io— Code coverage for any Go package
  • goreportcard.com/— Code Quality for Go projects, free for Open Source
  • inch-ci.org— Documentation badges for Ruby, JS and Elixir
  • scan.coverity.com— Static code analysis for Java, C/C++, C# and JavaScript, free for Open Source
  • webceo.com— SEO tools but with also code verifications and different type of advices
  • zoompf.com— Fix the performance of your web sites, detailed analysis
  • websitetest.com— Yotta’s tool to optimize web sites, free limited version online
  • gtmetrix.com— Reports and thorough recommendations to optimize websites
  • browserling.com— Live interactive cross-browser testing, free only 3 min. sessions with MS IE 9 under Vista at 1024 x 768 resolution
  • loadfocus.com— Load and speed tests for websites, mobile apps and APIs, monitoring,… Free 5 tests/month, 120 clients/test, 1 monitor, 1 location,…
  • versioneye.com— Monitor your source code and notify about outdated dependencies. Free for Open Source and public repos
  • beanstalkapp.com— A complete workflow to write, review & deploy code), free account for 1 user and 1 repository, with 100 MB of storage
  • testanywhere.co— Automatic test website or web app continuously and catch bugs in the early stages, free 1,000 tests/month
  • srcclr.com— SourceClear to scan source code for vulnerabilities, multi-languages and OS

Code Search and Browsing

  • sourcegraph.com— Java, Go, Python, Node.js, etc., code search/cross-references, free for Open Source
  • searchcode.com— Comprehensive text-based code search, free for Open Source

CI / CD

  • codeship.com— 100 private builds/month, 5 private projects, unlimited for Open Source
  • circleci.com— Free for one concurrent build
  • travis-ci.org— Free for public GitHub repositories
  • wercker.com— Free for public and private repositories
  • drone.io— CI platform that includes browser testing, free for Open Source
  • semaphoreci.com— 100 private builds/month, unlimited for Open Source
  • shippable.com— Free for 1 build container, private and public repos, unlimited builds
  • snap-ci.com— Free for public repositories, 1 build at the time
  • appveyor.com— CD service for Windows, free for Open Source
  • github.com— Comparison of Continuous Integration services
  • ftploy.com— 1 project with unlimited deployments
  • deployhq.com— 1 project with 10 daily deployments
  • hub.jazz.net— 60 minutes of free build time/month
  • styleci.io— Public GitHub repositories only
  • bitrise.io— iOS CI/CD with 200 free builds/month
  • saucelabs.com— CI with scalable testing for mobile and web apps, free for Open Source
  • buddybuild.com— Build, deploy and gather feedback for your iOS and Android apps in one seamless, iterative system.

Automated Browser Testing

  • gridlastic.com— Selenium Grid testing with free plan up to 4 simultaneous selenium nodes/10 grid starts/4,000 test minutes per month
  • browserstack.com— Manual and automated browser testing, free for Open Source
  • EveryStep-Automation.com— Records and replays all steps made in a web browser and creates scripts,… free with fewer options

Security and PKI

  • threatconnect.com— Threat intelligence: It is designed for individual researchers, analysts, and organizations who are starting to learn about cyber threat intelligence. Free upto 3 Users
  • crypteron.com— Cloud-first, developer-friendly security platform prevents data breaches in .NET and Java applications
  • snyk.io— Snyk found and reported several vulnerabilities in the package.Limited to 1 private project (unlimited for open source projects)
  • vaddy.net— Continuous web security testing with continuous integration (CI) tools. 3 domains, 10 scans history for free
  • letsencrypt.org— Free SSL Certificate Authority with certs trusted by all major browsers
  • globalsign.com— Free SSL certificates for Open Source
  • startssl.com— Free SSL certs
  • wosign.com— Free SSL certs. Up to 5 domain names for 2 years period. China authority
  • soclall.com— Free up to 1,000 users login, post, share through top 20+ social networks
  • stormpath.com— Free user management, authentication, social login, and SSO
  • auth0.com— Hosted free for development SSO
  • getclef.com— New take on auth unlimited free tier for anyone not using premium features
  • ringcaptcha.com— Tools to use phone number as id, available for free
  • ssllabs.com— Very deep analysis of the configuration of any SSL web server
  • qualys.com— Find web app vulnerabilities, audit for OWASP Risks
  • alienvault.com— Uncovers compromised systems in your network
  • duo.com— Two-factor authentication (2FA) for website or app. Free 10 users, all authentication methods, unlimited, integrations, hardware tokens
  • tinfoilsecurity.com— Automated vulnerability scanning. Free plan allows weekly XSS scans
  • acunetix.com— Free vulnerability and network scanning for 3 targets
  • ponycheckup.com— An automated security checkup tool for Django websites
  • foxpass.com— Hosted LDAP and RADIUS. Easy per-user logins to servers, VPNs, and wireless networks. Free for 10 users
  • opswatgears.com— Security Monitoring of computers, devices, applications, configurations,… Free 25 users and 30 days history
  • bitninja.io— Botnet protection through a blacklist, free plan only reports limited information on each attack
  • onelogin.com— Identity as a Service (IDaaS), Single Sign-On Identity Provider, Cloud SSO IdP, 3 company apps and 5 personal apps, unlimited users
  • logintc.com— Two-factor authentication (2FA) by push notifications, free for 10 users, VPN, Websites and SSH
  • report-uri.io— CSP and HPKP violation reporting

Management System

  • bitnami.com— Deploy prepared apps on IaaS. Management of 1 AWS micro instance free
  • visualops.io— 3,600 instance hours/month free

Log Management

Translation Management

Monitoring

  • opbeat.com— Instant performance insights for JS developers. Free with 24 hours data retention
  • checkmy.ws— Free 15 days full demo and 3 websites, forever free for Open Source
  • appneta.com— Free with 1 hour data retention
  • thousandeyes.com— Network and user experience monitoring. 3 locations, plus 20 data feeds of major web services free
  • datadoghq.com— Free for up to 5 nodes
  • stackdriver.com— Free monitoring up to 10 servers/hosted services
  • keymetrics.io— Free for 2 servers with 7 days data retention
  • newrelic.com— Free with 24 hours data retention
  • nodequery.com— Free basic server monitor up to 10 servers
  • watchsumo.com— Free website monitoring, 50 Http(s), Ping or keywords, every 5+ minutes
  • opsgenie.com— Alert management with mobile push. 600 free alerts/month for 2 users
  • runscope.com— Monitor and log API usage. Single user 25,000 requests/month free
  • circonus.com— Free for 20 metrics
  • uptimerobot.com— Website monitoring, 50 monitors free
  • statuscake.com— Website monitoring, unlimited tests free with limitations
  • bmc.com— Free 1 second resolution for up to 10 servers
  • ghostinspector.com— Free website and web application monitoring. Single user, 100 test runs/month
  • java-monitor.com— Free monitoring of JVM’s and uptime
  • sematext.com— Free for 24 hours metrics, unlimited number of servers, 10 custom metrics, 500 K custom metrics data points, unlimited dashboards, users, etc
  • sealion.com— Free up to 2 servers, 3 days data retention, graphs and raw command output history (top, ps, ifconfig, netstat, iostat, free, custom, etc.)
  • stathat.com— Get started with 10 stats for free, no expiration
  • skylight.io— Free for first 100 K requests (Rails only)
  • appdynamics.com— Free for 24 hours metrics, application performance management agents limited to one Java, one .NET, one PHP, and one Node.js
  • deadmanssnitch.com— Monitoring for cron jobs. 1 free snitch (monitor), more if you refer others to sign up
  • librato.com— Free up to 100 metrics at 60 seconds resolution
  • freeboard.io— Free for public projects. Dashboards for your Internet of Things projects
  • loader.io— Free load testing tools with limitations
  • speedchecker.xyz— Performance Monitoring API, checks Ping, DNS, etc
  • blackfire.io— Blackfire is the SaaS-delivered Application Performance Solution. Free Hacker plan (PHP Only)
  • apimetrics.io— Automated API Performance Monitoring, Testing and Analytics. Free Plan, manually make API calls and Run from their West Coast servers
  • opsdash.com— Self-hoster server, clusters and services monitoring, free for 5 servers and 5 services

Crash and Exception Handling

  • rollbar.com— Exception and error monitoring, free plan with 5,000 errors/month, unlimited users, 30 days retention
  • bugsnag.com— Free for up to 2,000 errors/month after the initial trial
  • getsentry.com— Sentry tracks app exceptions in realtime, has a small free plan. Free, unrestricted use if self-hosted

Search

  • algolia.com— Hosted search-as-you-type (instant). Free hacker plan up to 10,000 documents and 100,000 operations. Bigger free plans available for community/Open Source projects
  • swiftype.com— Hosted search solution (API and crawler). Free for a single search engine with up to 1,000 documents. Free upgrade to Premium level for Open Source
  • bonsai.io— Free 1 GB memory and 1 GB storage
  • searchly.com— Free 2 indices and 5 MB storage
  • facetflow.com— Hosted Elasticsearch for Microsoft Azure. Free 5,000 docs and 500 MB
  • indexisto.com— Site search reinvented. Free 10 million document index limit with advertisement block

Email

  • mailinator.com— Mailinator is Free, Public, Email system where you can use ANY inbox you want! … Disposable Email.
  • sparkpost.com— First 100,000 emails/month are free
  • mailgun.com— First 10,000 emails/month are free
  • tinyletter.com— 5,000 subscribers/month are free
  • mailchimp.com— 2,000 subscribers and 12,000 emails/month are free
  • sendloop.com— 2,000 subscribers and 10,000 emails/month are free
  • sendgrid.com— 400 emails/day for free and 25,000 free transactional emails/month for emails sent from a Google compute instance or Microsoft Azure App Service
  • phplist.com— Hosted version allow 300 emails/month for free
  • mailjet.com— 6,000 emails/month for free
  • sendinblue.com— 9,000 emails/month for free
  • mailtrap.io— Fake SMTP server for development, free plan with 1 inbox, 50 messages, no team member, 2 emails/second, no forward rules
  • mailstache.io— 4 mailboxes with 1 GB each for up to 2 custom domains
  • postmarkapp.com— First 25,000 emails are free
  • zoho.com— Free email management and collaboration for up to 10 users
  • domain.yandex.com— Free email and DNS hosting for up to 1,000 users
  • pawnmail.com— 2 GB free email hosting across unlimited users for custom domain. Roundcube webmail, POP3, IMAP, and SMTP access. No paid plans or upgrades
  • moosend.com— Mailing list management service. Free account for 6 months for startups
  • debugmail.io— Easy to use testing mail server for developers
  • mailboxlayer.com— Email validation and verification JSON API for developers. 1,000 free API requests/month
  • mailcatcher.me— Catches mail and serves it through a web interface
  • yopmail.fr— Disposable email addresses
  • kickbox.io— Verify 100 emails free, real time API available
  • inumbo.com— SMTP based spam filter, free for 10 users
  • biz.mail.ru— 5,000 mailboxes with 25 GB each per custom domain with DNS hosting
  • maildocker.com— First 10,000 emails/month are free
  • sendpulse.com— 50 emails free/hour, first 12,000 emails/month are free
  • pepipost.com— Unlimited emails free for first three months, then first 25,000 emails/month are free

CDN and Protection

  • kloudsec.com— Minimal CDN platform targeted at programmers. CDN is free. Optional and free plugins include Page Optimization (Pagespeed), Service Doctor (Website performance analytics and alerts) and One-click Encryption (Auto provision/renew LetsEncrypt certs for HTTPS)
  • cloudflare.com— Basic service is free, good for a blog, Cloudflare also offers a free SSL certificate service
  • bootstrapcdn.com— CDN for bootstrap, bootswatch and font awesome
  • surge.sh— Single–command, bring your own source control web publishing CDN
  • cdnjs.com— CDN for JavaScript libraries, CSS libraries, SWF, images, etc
  • jsdelivr.com— Super-fast CDN of OSS (JS, CSS, fonts) for developers and webmasters, accepts PRs to add more
  • developers.google.com— The Google Hosted Libraries is a content distribution network for the most popular, Open Source JavaScript libraries
  • asp.net— The Microsoft Ajax CDN hosts popular third party JavaScript libraries such as jQuery and enables you to easily add them to your Web application
  • toranproxy.com— Proxy for Packagist and GitHub. Never fail CD. Free for personal use, 1 developer, no support
  • rawgit.com— Free limited traffic, serves raw files directly from GitHub with proper Content-Type headers
  • incapsula.com— Free CDN and DDoS protection
  • fastly.com— Free CDN, all features until USD 50/month is reached, enough for most, then pay or suspended
  • athenalayer.com— Free DDoS protection with unlimited websites
  • section.io— A simple way to spin up and manage a complete Varnish Cache solution. Supposedly free forever for one site
  • netdepot.com— First 100 GB free/month
  • dropigee.com— Dropigee provides CDN + Cloud Storage, get 2 GB of bandwidth and unlimited storage free per month

PaaS

  • cloud.google.com— Google App Engine gives 28 instance hours/day free, 1 GB NoSQL database and more
  • engineyard.com— Engine Yard provides 500 free hours
  • azure.microsoft.com— MS Azure gives USD 200 worth of free usage for a trial
  • appharbor.com— A .Net PaaS that provides 1 free worker
  • shellycloud.com— Platform for hosting Ruby and Ruby on Rails apps, €20 of free credit
  • heroku.com— Host your apps in the cloud, free for single process apps
  • firebase.com— Build realtime apps, free plan has 100 max. connections, 10 GB data transfer, 1 GB data storage, 1 GB hosting storage and 10 GB hosting transfer
  • bluemix.net— IBM PaaS with a monthly free allowance
  • openshift.com— Red Hat PaaS, free tier provides three small gears each with 512 MB memory and 1 GB storage. One-click deployments available
  • outsystems.com— Enterprise web development PaaS for on-premise or cloud, free “personal environment” offering allows for unlimited code and up to 1 GB database
  • platform.telerik.com— Build and deploy mobile applications using JavaScript. Free plan has 100 MB data storage, 1 GB file storage, 5 GB bandwidth, 1 million push notifications for BaaS offering, 100 active devices for analytics
  • scn.sap.com— The in-memory Platform-as-a-Service offering from SAP. Free developer accounts come with 1 GB structured, 1 GB unstructured, 1 GB of Git data and allow you to run HTML5, Java and HANA XS apps
  • mendix.com— Rapid Application Development for Enterprises, unlimited number of free sandbox environments supporting 10 users, 100 MB of files and 100 MB database storage each
  • pythonanywhere.com— Cloud Python app hosting. Beginner account is free, 1 Python web application at your-username.pythonanywhere.com domain, 512 MB private file storage, one MySQL database
  • configure.it— Mobile app development platform, free for 2 projects, limited features but no resource limits
  • elastx.com— Free tier with up to 4 cloudlets, must be renewed every year
  • pagodabox.io— Small worker, web server, cache, and database for free
  • cloudandheat.com— 128 MB of RAM for free, includes support for custom domains for free
  • zeit.co/now– Managed platform for Node.js deployments, featuring dynamic realtime scaling. Includes 20 free deploys/month limited to 1GB storage and 1GB bandwidth for OSS projects (source files are exposed on a public URL)
  • sandstorm.io— Sandstorm is an open source operating system for personal and private clouds. Free plan offers 200 MB storage and 5 grains

BaaS

  • apigee.com— Unlimited trial includes NoSQL data store with 25 GB of storage, user and permission management, geolocation, 10 million push notifications/month, remote configuration, beta and A/B split testing, APM, fully API driven. Accessible and manageable via UI, SDK, and API
  • appacitive.com— Mobile backend, free for the first 3 months with 100 K API calls, push notifications
  • bip.io— A web-automation platform for easily connecting web services. Fully open GPLv3 to power the backend of your Open Source project. Commercial OEM License available
  • blockspring.com— Cloud functions. Free for 5 million runs/month
  • kinvey.com— Mobile backend, starter plan has unlimited requests/second, with 2 GB of data storage, as well as push notifications for up 5 million unique recipients. Enterprise application support
  • konacloud.io— Web and Mobile Backend as a Service, with 5 GB free account
  • layer.com— The full-stack building block for communications
  • quickblox.com— A communication backend for instant messaging, video and voice calling, and push notifications
  • pushbots.com— Push notification service. Free for up to 1.5 million pushes/month
  • dreamfactory.com— DreamFactory is an Open Source backend platform that provides all of the RESTful services you need to build fantastic mobile and web applications
  • onesignal.com— Unlimited free push notifications
  • getstream.io— Build scalable news feeds and activity streams in a few hours instead of weeks, free for 3 million feed updates/month
  • tyk.io— API management with authentication, quotas, monitoring, and analytics. Free cloud offering
  • iron.io— Async task processing (like AWS Lambda) with free tier and 1 month free trial
  • stackhut.com— Async task processing (like AWS Lambda). 10 free private services and unlimited free public services
  • pubnub.com— Free push notifications for up to 1 million messages/month and 100 active daily devices
  • webtask.io— Run code with an HTTP call. No provisioning. No deployment
  • zapier.com— Connect the apps you use, to automate tasks. 5 zaps, every 15 min. and 100 tasks/month
  • stackstorm.com— Event-driven automation for apps, services and workflows, free without flow, access control, LDAP,…
  • simperium.com— Move data everywhere instantly and automatically, multi-platform, unlimited sending and storage of structured data, max. 2,500 users/month
  • stamplay.com— Connect services together with a visual interface. 50 K API calls, 100 GB data transfer, and 1 GB storage for free

Web Hosting

  • closeheat.com— Development Environment in the Cloud for Static Websites with Free Hosting and GitHub integration. 1 free website with custom domain support
  • code.fosshub.com— is a free service offered by FossHub. Free hosting for Open Source projects.
  • sourceforge.net— Find, Create, and Publish Open Source software for free
  • simplybuilt.com— SimplyBuilt offers free website building and hosting for Open Source projects. Simple alternative to GitHub Pages
  • devport.co— Turn GitHub projects, apps, and websites into a personal developer portfolio
  • netlify.com— Builds, deploy and hosts static site or app, free for 100 MB data and 1 GB bandwidth
  • pantheon.io— Drupal and WordPress hosting, automated DevOps, and scalable infrastructure. Free for developers and agencies
  • acquia.com— Hosting for Drupal sites. Free tier for developers. Free development tools (such as Acquia Dev Desktop) also available
  • bitballoon.com— BitBalloon offers hosting for static sites and apps. Free on a subdomain
  • readthedocs.org— Free documentation hosting with versioning, PDF generation and more
  • bubble.is— Visual programming to build web and mobile apps without code, free 100 visitors/month, 2 apps
  • contentful.com— Content as a Service. Content management and delivery APIs in the cloud. 3 users, 3 spaces (repositories) and 100,000 API requests/month for free
  • tilda.cc— One site, 50 pages, 50 MB storage, only the main pre-defined blocks among 170+ available, no fonts, no favicon and no custom domain
  • pubstorm.com— Free static content hosting with global CDN and custom domain support. 10 free sites, each with 2 past revisions

DNS

  • freedns.afraid.org— Free DNS hosting
  • dns.he.net— Free DNS hosting service with Dynamic DNS Support
  • luadns.com— Free DNS hosting, 3 domains, all features with reasonable limits
  • domain.yandex.com— Free email and DNS hosting for up to 1,000 users
  • selectel.com– Free DNS hosting, anycast, 10 geo zones
  • cloudns.net— Free DNS hosting up to 3 domains with unlimited records
  • ns1.com— Data Driven DNS, automatic traffic management, 1 million free queries

IaaS

DBaaS

  • cloudant.com— Hosted database from IBM, free if usage is below USD 50/month
  • orchestrate.io— 1 application free
  • redislabs.com— Redis as a Service, 30 MB and 30 concurrent connections free
  • backand.com— Back-end as a service for AngularJS
  • zenginehq.com— Build business workflow apps in minutes, free for single users
  • redsmin.com— Online real-time monitoring and administration service for Redis, 1 Redis instance free
  • graphstory.com— GraphStory offers Neo4j (a Graph Database) as a service
  • elephantsql.com— PostgreSQL as a service, 20 MB free
  • graphenedb.com— Neo4j as a service, up to 1,000 nodes and 10,000 relations free
  • mongolab.com— MongoDB as a service, 500 MB free
  • scalingo.com— Primarily a PaaS but offers a 512 MB free tier of MySQL, PostgreSQL, or MongoDB
  • skyvia.com— Cloud Data Platform, offers free tier and all plans are completely free while in beta
  • airtable.com— Looks like a spreadsheet, but it’s a relational database, unlimited bases, 1,200 rows/base and 1,000 API requests/month
  • fieldbook.com— Fieldbook lets anyone create a simple tracking database, as easily as a spreadsheet. Automatic API. Unlimited free sheets, share with unlimited users
  • iriscouch.com— CouchDB as a service. Free for developing, prototyping, etc

STUN, WebRTC, Web Socket Servers and Other Routers

  • pusher.com— Hosted Web Sockets broker. Free for up to 20 simultaneous connections and 100 K messages/day
  • stun:stun.l.google.com:19302 — Google STUN
  • stun:global.stun.twilio.com:3478?transport=udp — Twilio STUN
  • segment.com— Hub to translate and route events to other third party services. 100 K events/month free
  • ngrok.com— Expose locally running servers over a tunnel to a public URL
  • cloudamqp.com— RabbitMQ as a Service. Little Lemur plan: max 1 million messages/month, max 20 concurrent connections, max 100 queues, max 10,000 queued messages, multiple nodes in different AZ’s

Issue Tracking and Project Management

  • bitrix24.com— Free intranet and project management tool
  • pivotaltracker.com— Pivotal Tracker, free for public projects
  • atlassian.com— Free Jira etc for Open Source
  • kanbantool.com— Kanban board based project management. Free, paid plans with more options
  • kanbanflow.com— Board based project management. Free, premium version with more options
  • kanbanery.com— Board based project management. Free for 2 users, premium tiers with more options
  • zenhub.io— The only project management solution inside GitHub. Free for public repos, OSS, and nonprofits organizations
  • trello.com— Board based project management. Free
  • producteev.com— Task management tool. Free, premium version with more options. Mobile applications available
  • fogcreek.com— Bug tracking and project management. Free for 2 users
  • waffle.io— Board based project management solution from your existing GitHub Issues, free for Open Source
  • huboard.com— Instant project management for your GitHub issues, free for Open Source
  • taiga.io— Project management platform for startups and agile developers, free for Open Source
  • jetbrains.com— Free hosted YouTrack (InCloud) for FOSS projects, private projects free for 10 users
  • github.com— In addition to its Git storage facility, GitHub offers basic issue tracking
  • asana.com— Free for private project with collaborators
  • acunote.com— Free project management and SCRUM software for up to 5 team members
  • gliffy.com— Online diagrams: flowchart, UML, wireframe,… Also plugins for Jira & Confluence. 5 diagrams and 2 MB free
  • cacoo.com— Online diagrams in real time: flowchart, UML, network. Free max. 15 users/diagram, 25 sheets
  • draw.io— Online diagrams stored locally, in Google Drive, OneDrive or Dropbox. Free for all features and storage levels
  • hub.jazz.net— IBM Bluemix’s project management services. Free for public projects, free for up to 3 users for private projects
  • leankit.com— Kanban board, that visualizes your workflow. Free up to 10 users
  • visualstudio.com— Unlimited free private code repositories; Tracks bugs, work items, feedback and more
  • testlio.com— Issue tracking, test management and beta testing platform. Free for private use
  • vivifyscrum.com— Free tool for Agile project management. Scrum Compatible
  • targetprocess.com— Visual project management, from Kanban and Scrum to almost any operational process. Free for unlimited users, up to 1,000 data entities
  • overv.io— Agile project management for teams who love GitHub
  • taskulu.com— Role based project management. Free up to 5 users. Integration with GitHub/Trello/Dropbox/Google Drive
  • contriber.com— Customizable project management platform, free starter plan, 5 workspaces
  • planitpoker.com— Free online planning poker (estimation tool)

Storage and Media Processing

  • aerofs.com— P2P file syncing, free for up to 30 users
  • bintray.com— Binary File storage, free for Open Source. Includes SSL, CDN and a limited number of REST calls
  • cloudinary.com— Image upload, powerful manipulations, storage, and delivery for sites and apps, with libraries for Ruby, Python, Java, PHP, Objective-C and more. Perpetual free tier includes 7,500 images/month, 2 GB storage, 5 GB bandwidth
  • plot.ly— Graph and share your data. Free tier includes unlimited public files and 10 private files
  • transloadit.com— Handles file uploads & encoding of video, audio, images, documents. Free for Open Source & other do-gooders. Commercial applications get one GB free for test driving
  • podio.com— You can use Podio with a team of up to five people and try out the features of the Basic Plan, except user management
  • shrinkray.io— Free image optimization of GitHub repos
  • imagefly.io— Responsive images on-demand. CDN fronted image resizing, transcoding, and optimizing. 100 MB/month for free
  • kraken.io— Image optimization for website performance as a service, free plan up to 1 MB file size
  • placehold.it— A quick and simple image placeholder service
  • placekitten.com— A quick and simple service for getting pictures of kittens for use as placeholders
  • placepenguin.com— A quick and simple service for placeholder images of penguins
  • embed.ly— Provides APIs for embedding media in a webpage, responsive image scaling, extracting elements from a webpage. Free for up to 5,000 URLs/month at 15 requests/second
  • backhub.co— Backup and archive your GitHub repositories. Free for public repos
  • otixo.com— Encrypt, share, copy and move all your cloud storage files from one place. Basic plan provides unlimited files transfer with 250 MB max. file size and allows 5 encrypted files
  • tinypng.com— API to compress and resize PNG and JPEG images, offers 500 compressions for free each month
  • filestack.com— File picker, transform and deliver, free for 250 files, 500 transformations and 3 GB bandwidth
  • packagecloud.io– Hosted Package Repositories for YUM, APT, RubyGem, and PyPI. Limited free plans, open source plans available via request.

Design and UI

  • pixlr.com— Free online browser editor on the level of commercial ones
  • imagebin.ca— Pastebin for images
  • cloudconvert.com— Convert anything to anything. 208 supported formats including videos to gif
  • resizeappicon.com— A simple service to resize and manage your app icons
  • vectr.com— Free Design App For Web + Desktop
  • walkme.com— Enterprise Class Guidance and Engagement Platform, free plan 3 walk-thrus up to 5 steps/walk
  • marvelapp.com— Design, prototyping and collaboration, free limited for 3 projects

Data Visualization on Maps

  • geocoder.opencagedata.com/— Geocoding API that aggregates OpenStreetMap and other open geo sources. 2,500 free queries/day
  • datamaps.co— A free platform for creating visualizations with data maps
  • geocod.io— Geocoding via API or CSV Upload. 2,500 free queries/day
  • gogeo.io— Maps and geospatial services with an easy to use API and support for big data
  • cartodb.com— Create maps and geospatial APIs from your data and public data
  • giscloud.com— Visualize, analyze and share geo data online
  • latlon.io— Geocoding API + school districts, census geography divisions, and other address-based data. 2,500 free requests/month
  • mapbox.com— Maps, geospatial services, and SDKs for displaying map data

Package Build System

IDE and Code Editing

  • c9.io— IDE in a browser. Incorporates an Ubuntu virtual machine and in-browser terminal access. Integrates with GitHub and BitBucket, but also adds SFTP and generic Git access
  • koding.com— Online cloud-based development environment running on an Ubuntu virtual machine
  • codeanywhere.com— Full IDE in the browser and mobile apps. Access FTP, SFTP, Dropbox, Google Drive, GitHub, and BitBucket. Hosted virtual machines with terminal access. Collaboration features like share links, live editing, permissions, and version tracking
  • codenvy.com— IDE and automated developer workspaces in a browser, collaborative, Git/SVN integration, build and run your app in customizable Docker-based runners (free tier includes: 4 GB RAM, always-on machines, ability to run multiple machines simultaneously), pre-integrated deploy to Google Apps
  • nitrous.io— Private Linux instance(s) with interactive collaboration, free for 2 hours/day
  • visualstudio.com— Fully-featured IDE with thousands of extensions, cross-platform app development (Microsoft extensions available for download for iOS and Android), desktop, web and cloud development, multi-language support (C#, C++, JavaScript, Python, PHP and more)
  • code.visualstudio.com— Build and debug modern web and cloud applications. Code is free, Open Source and available on your favorite platform, Linux, Mac OSX and Windows
  • cloud.sagemath.com— Collaborative mathematics-oriented IDE in a browser, with support for Python, LaTeX, IPython Notebooks, etc
  • wakatime.com— Quantified self metrics about your coding activity, using text editor plugins, limited plan for free
  • apiary.io— Collaborative design API with instant API mock and generated documentation (Free for unlimited API blueprints and unlimited user with one admin account and hosted documentation)
  • mockable.io— Mockable is a simple configurable service to mock out RESTful API or SOAP web-services. This online service allows you to quickly define REST API or SOAP endpoints and have them return JSON or XML data
  • jetbrains.com— Productivity tools, IDEs and deploy tools. Free license for students, teachers, Open Source, and user groups
  • stackhive.com— Cloud based IDE in browser that supports HTML5/CSS3/jQuery/Bootstrap
  • tadpoledb.com— In-browser database IDE. Supports Amazon RDS, Apache Hive, Apache Tajo, CUBRID, MariaDB, MySQL, Oracle, SQLite, MSSQL, PostgreSQL and MongoDB databases
  • sourcelair.com— In-browser IDE for Django, JavaScript, HTML5, Python, and more. Integrates with Git, Mercurial, GitHub, Heroku and more. Free forever for 1 private project
  • codepen.io— CodePen is a playground for the front end side of the web

Analytics, Events and Statistics

  • analytics.google.com— Google Analytics
  • heapanalytics.com— Automatically captures every user action in iOS or web apps. Free for up to 5,000 visits/month
  • sematext.com— Free for up to 50 K actions/month, 1 day data retention, unlimited dashboards, users, etc
  • usabilityhub.com— Test designs and mockups on real people, track visitors. Free for one user, unlimited tests
  • gosquared.com— Track up to 1,000 data points for free
  • mixpanel.com— Free 25,000 points or 200,000 with their badge on your site
  • amplitude.com— 1 million monthly events, up to 2 apps
  • keen.io— Custom Analytics for data collection, analysis and visualization. 50,000 events/month free
  • inspectlet.com— 100 sessions/month free for 1 website
  • mousestats.com— 100 sessions/month free for 1 website
  • metrica.yandex.com— Unlimited free analytics
  • hotjar.com— Per site: 2,000 pages views/day, 3 heatmaps, data stored for 3 months,…
  • imprace.com— Landing page analysis with suggestions to improve bounce rates. Free for 5 landing pages/domain
  • baremetrics.com— Analytics & Insights for stripe
  • optimizely.com— A/B Testing solution, free starter plan, 1 website, 1 iOS and 1 Android app
  • expensify.com— Expense reporting, free personal reporting approval workflow
  • ironSource atom— Atom Data Flow Management is a data pipeline solution, 10M monthly events free
  • botan.io— Free analytics for your Telegram bot.

International Mobile Number Verification API and SDK

  • cognalys.com— Freemium mobile number verification using a more innovative and reliable method than an SMS gateway. Free accounts get 10 tries and 15 verifications/day
  • numverify.com— Global phone number validation & lookup JSON API. 250 API requests/month
  • sumome.com— Heat map and conversion enhancement tools; the free plan lacks a few advanced features

Payment / Billing Integration

  • braintreepayments.com— Credit Card, Paypal, Venmo, Bitcoin, Apple Pay,… integration. Single and Recurrent Payments. First USD 50,000 are free of charge
  • taxratesapi.avalara.com— Get the right sales tax rates to charge for the close to 10,000 sales tax jurisdictions in the USA. Free REST API. Registration required
  • currencylayer.com— Reliable Exchange Rates & Currency Conversion for your Business, 1,000 API requests/month free
  • vatlayer.com— Instant VAT number validation & EU VAT rates API, free 100 API requests/month

Docker Related

  • docker.com— One free private repository, free managed node and unlimited public repositories
  • quay.io— Unlimited free public repositories
  • tutum.co— The Docker platform for Dev and Ops: build, deploy, and manage your apps across any cloud. Free while in beta, with a free developer plan once Tutum is production ready

Vagrant Related

Miscellaneous

  • apichangelog.com— Subscribe to be notified each time API Documentation is updated (Facebook, Twitter, Google,…)
  • docsapp.io— Easiest way to publish documentation, free for Open Source
  • instadiff.com— Compare website versions with highlighted changes before you deploy, free for 100 pages/month
  • fullcontact.com— Help your users know more about their contacts by adding social profile into your app. 500 free Person API matches/month
  • apicastor.com— Convert spreadsheets into URL and monitor access
  • formlets.com— Online forms, unlimited single page forms/month, 100 submissions/month, email notifications
  • superfeedr.com— Real-time PubSubHubbub compliant feeds, export, analytics. Free with less customization
  • screenshotlayer.com— Capture highly customizable snapshots of any website. Free 100 snapshots/month
  • screenshotmachine.com— Capture 100 snapshots/month, png, gif and jpg, including full-length captures, not only home page
  • readme.io— Beautiful documentations made easy, free for Open Source

APIs, Data, and ML

  • monkeylearn.com— Text analysis with Machine Learning, free 100,000 queries/month
  • wit.ai— NLP for developers
  • wolfram.com— Built-in knowledge based algorithms in the cloud
  • parsehub.com— Extract data from dynamic sites, turn dynamic websites into APIs, 5 projects free
  • import.io— Easily turn websites into APIs, completely free for life
  • wrapapi.com— Turn any website into a parameterized API
  • algorithmia.com— Host algorithms for free. Includes free monthly allowance for running algorithms. Now with CLI support
  • bigml.com— Hosted machine learning algorithms. Unlimited free tasks for development, limit of 16 MB data/task
  • mashape.com— API Marketplace And Powerful Tools For Private And Public APIs. With the free tier, some features are limited such as monitoring, alerting and support
  • dominodatalab.com— Data science with support for Python, R, Spark, Hadoop, Matlab, and others
  • havenondemand.com— APIs for machine learning
  • restlet.com— APISpark enables any API, application or data owner to become an API provider in minutes via an intuitive browser interface
  • scrapinghub.com— Data scraping with visual interface and plugins. Free plan includes unlimited scraping on a shared server
  • context.io– Create simple email webhooks and code against a free, RESTful, imap API to leverage email data.

Other Free Resources



Yasoob Khalid: This Month I Inspired 40 Teens to Start Programming


Hi there folks! I have been wanting to write a post on this blog for quite some time now but life always gets in the way. This time it was my exams. Hopefully I will get free after the 4th of June and would get more time to write posts and do stuff which I love and care about.

So enough with the rant. I wanted to write about my latest endeavor. This month, with the help of two people (whom I had never met before), I got the chance to inspire 40 teens to take their first step into programming. The whole event was planned and organized by one badass lady, Elena Sinel, who managed almost everything by herself.

I was asked by Elena to give a motivational + tutorial session to the students of Tech City College, London. I was pretty stoked by this opportunity because I love inspiring people and helping them as much as I can. This has been the theme of my life for quite a bit of time now.

Apart from a couple of hiccups in the live video stream, the session went fairly well. It was coordinated by Charlie Ringer on the other end. He did an awesome job and helped make sure things went smoothly, and he provided hands-on support to the kids whenever they needed it.

The best moment for me was when Elena told me that the kids were pretty inspired by my story of how I got started with programming and what I have achieved through it. Elena has written a short summary of the event over here. Do read it if you want to get some more detailed information about the whole session.

If you are an educator and want me to deliver a session for you at your institute and inspire your students just let me know. I am sure you won’t regret it.

If you have any questions then please let me know in the comments below. I would love to answer as many of them as possible.

Cheers!


Yasoob Khalid: Python Sorted Collections


Hey folks! This is a guest post by Grant Jenks. Let’s give him a warm welcome and get right on into what he has to say. 🙂


Hello all! I’m Grant Jenks and I’m guest-posting about one of my favorite topics: Python Sorted Collections.

Python is a little unusual regarding sorted collection types compared with other programming languages. Three of the top five programming languages in the TIOBE Index include sorted list, sorted dict or sorted set data types. But neither Python nor C includes these. For a language heralded as “batteries included”, that’s a little strange.

The reasoning is a bit circular but boils down to: the standard library covers most use cases; for everything else there’s PyPI, the Python Package Index. But PyPI only works so well. In fact, some peculiarities of the Python community make PyPI’s job quite difficult. For example, Python likes Monty Python references, which many find unusual or obscure. And as Phil Karlton would point out, naming things is hard.

collections.OrderedDict

As an aside, it’s worth noting collections.OrderedDict in the Python standard library. OrderedDict maintains the order that items were added to the dictionary. Sometimes that order is sorted:

>>> from collections import OrderedDict
>>> letters = [('a', 0), ('b', 1), ('c', 2), ('d', 3)]
>>> values = OrderedDict(letters)
>>> print(values)
OrderedDict([('a', 0), ('b', 1), ('c', 2), ('d', 3)])
>>> print(list(values.keys()))
['a', 'b', 'c', 'd']

We can continue editing this OrderedDict. Depending on the key we add, the order may remain sorted.

>>> values['e'] = 4
>>> print(list(values.keys()))
['a', 'b', 'c', 'd', 'e']

But sort order won’t always be maintained. If we remove an existing key and add it back, then we’ll see it appended to the end of the keys.

>>> del values['a']
>>> values['a'] = 0
>>> print(list(values.keys()))
['b', 'c', 'd', 'e', 'a']

Oops! Notice that ‘a’ is now at the end of the list of keys. That’s the difference between ordered and sorted: while OrderedDict maintains order based on insertion, a SortedDict would maintain order based on the sorted order of the keys.
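To see the gap concretely, here is a minimal stdlib-only sketch of the usual workaround: re-sorting the keys of a plain dict on every read. (This is my illustration, not code from SortedContainers; the point is that the sort cost is paid on each access.)

```python
# A plain dict (or OrderedDict) has no notion of sorted order, so the
# common workaround is to re-sort the keys on every read -- an
# O(n log n) cost paid per access.
values = {'a': 0, 'b': 1, 'c': 2, 'd': 3}

del values['a']
values['a'] = 0          # re-inserting 'a' sends it to the back of insertion order

print(sorted(values))    # re-sort on each access: ['a', 'b', 'c', 'd']
```

A sorted container maintains the sorted invariant incrementally on each mutation instead, so reads stay cheap no matter how the mapping was built up.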

SortedContainers

A few years ago I set out to select a sorted collections library from PyPI. I was initially overwhelmed by the options. There are many data types in computer science theory that can be used, and each has various tradeoffs. For example, Red-Black Trees are used in the Linux kernel, while Tries are often more space efficient and used in embedded systems. B-Trees work very well with a huge number of items and are commonly used in databases.

What I really wanted was a pure-Python solution that was fast enough. Finding a solution at the intersection of those requirements was really tough. Most fast implementations were written in C, and many lacked benchmarks or documentation.

I couldn’t find the right answer so I built it: Sorted Containers. The right answer is pure-Python. It’s Python 2 and Python 3 compatible. It’s fast. It’s fully-featured. And it’s extensively tested with 100% coverage and hours of stress testing. SortedContainers includes SortedList, SortedDict, and SortedSet implementations with a familiar API.

>>> from sortedcontainers import SortedList, SortedDict, SortedSet
>>> values = SortedList('zaxycb')
>>> values[0]
'a'
>>> values[-1]
'z'
>>> list(values)  # Sorted order is automatic.
['a', 'b', 'c', 'x', 'y', 'z']
>>> values.add('d')
>>> values[3]
'd'
>>> del values[0]
>>> list(values)  # Sorted order is maintained.
['b', 'c', 'd', 'x', 'y', 'z']

Each of the SortedList, SortedDict, and SortedSet data types looks, swims, and quacks like its built-in counterpart.

>>> items = SortedDict(zip('dabce', range(5)))
>>> list(items.keys())  # Keys iterated in sorted order.
['a', 'b', 'c', 'd', 'e']
>>> items['b']
2
>>> del items['c']
>>> list(items.keys())  # Sorted order is automatic.
['a', 'b', 'd', 'e']
>>> items['c'] = 10
>>> list(items.keys())  # Sorted order is maintained.
['a', 'b', 'c', 'd', 'e']

Each sorted data type also plays nicely with other data types.

>>> keys = SortedSet('dcabef')
>>> list(keys)
['a', 'b', 'c', 'd', 'e', 'f']
>>> 'c' in keys
True
>>> list(keys | 'efgh')
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> list(keys & 'cde')
['c', 'd', 'e']
>>> list(keys & 'yzab')
['a', 'b']

Bonus Features

In addition to the familiar API of the built-ins, maintaining sorted order affords efficient opportunities for searching and indexing.

  • You can very quickly and efficiently lookup the presence or index of a value. What would previously require a linear scan is now done in logarithmic time.
>>> import string
>>> values = SortedList(string.ascii_lowercase)
>>> 'q' in values
True
>>> values.index('r')
17
  • You can slice containers by index or by value. Even mappings and sets support numeric indexing and iteration.
>>> items = SortedDict(zip(string.ascii_lowercase, range(26)))
>>> list(items.irange('g', 'j'))
['g', 'h', 'i', 'j']
>>> items.index('g')
6
>>> items.index('j')
9
>>> list(items.islice(6, 10))
['g', 'h', 'i', 'j']
>>> items.iloc[0]
'a'
>>> items.iloc[5]
'f'
>>> items.iloc[:5]
['a', 'b', 'c', 'd', 'e']
>>> items.iloc[-3:]
['x', 'y', 'z']

Using these features, you can easily duplicate the advanced features found in Pandas DataFrame indexes, SQLite column indexes, and Redis sorted sets.
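For example, a SortedDict can stand in for a simple column index, answering range queries without scanning the whole mapping (the data here is made up for illustration):

```python
from sortedcontainers import SortedDict

# Hypothetical "index": timestamps (as ints) mapped to row ids.
index = SortedDict((ts, 'row-%d' % i)
                   for i, ts in enumerate([5, 1, 9, 3, 7]))

# All rows whose key falls in [3, 7], found in logarithmic time:
print([index[ts] for ts in index.irange(3, 7)])  # ['row-3', 'row-0', 'row-4']
```

This is exactly the kind of range scan a database B-tree index performs.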

Performance

On top of it all, performance is very good across the API, faster than C implementations for many methods. There are extensive benchmarks comparing alternative implementations, load-factors, runtimes, and simulated workloads. SortedContainers has managed to unseat the decade-old incumbent “blist” module and convinced authors of alternatives to recommend SortedContainers over their own packages.

Implementation

How does it work? I’m glad you asked! In addition to the implementation details, I’ll be giving a talk at PyCon 2016 in Portland, Oregon on Python Sorted Collections that will get into the gory details. We’ll see why benchmarks matter most in claims about performance and why the strengths and weakness of modern processors affect how you choose your data structures. It’s possible to write fast code in pure-Python!

Your feedback on the project is welcome!


PyCharm: Remote Development on Raspberry Pi: Analyzing Ping Times (Part 2)


Last week we created a script that records ping times on a regular basis. We developed the script remotely on a Raspberry Pi, and then added it to Cron to make sure that times are recorded every 5 minutes into a PostgreSQL database.

This week we’ll work on visualizing the data we’ve recorded. For this we’ll create a basic Flask app where we use Matplotlib to create a graph. Furthermore, we’ll take a look at some cool PostgreSQL features.

Let’s see some results

It’s no good to just record pings if we can’t see some statistics about them, so let’s write a small Flask app, and use matplotlib to draw a graph of recent ping times. In our Flask app we’ll create two routes:

  • On ‘/’ we’ll list the destinations that we’ve pinged in the last hour with basic stats (min, average, max time in the last hour)
  • On ‘/graphs/<destination>’ we’ll draw a graph of the pings in the last 3 hours
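As a rough sketch of those two routes, with canned data and a placeholder response standing in for the real PostgreSQL queries and Matplotlib PNG (the actual code is on GitHub), the app might look like this:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Canned stats standing in for the real query results:
# (destination, min, avg, max) over the last hour.
HOUR_STATS = [('jetbrains.com', 11.2, 14.8, 30.1)]

@app.route('/')
def index():
    return render_template_string(
        '{% for dest, mn, avg, mx in stats %}'
        '{{ dest }}: {{ mn }}/{{ avg }}/{{ mx }}'
        '{% endfor %}',
        stats=HOUR_STATS)

@app.route('/graphs/<destination>')
def graph(destination):
    # The real route returns a PNG drawn with Matplotlib.
    return 'graph for %s' % destination

print(app.test_client().get('/graphs/jetbrains.com').data.decode())
```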

The first route is simple: we just execute a query to get the data we’re interested in, and pass that to the template. See the full code on GitHub. Let’s make sure that everything works right by putting a breakpoint on the call to render_template:

Debug Flask Thumb

The graph route is a lot more complex: first we have to get the ping averages for the past three hours in reasonably sized bins (say, 10 minutes), and then we have to draw the graph.

To obtain those binned ping times, we could either get all the times from the past three hours and then use a scientific Python library to handle the binning, or we could write a monster SQL query that does everything for us. As I’ve recently read a book about PostgreSQL and got excited about it, I chose the second option.

Querying the Data

So the data we’re looking for is:

  • For each 10 minute period in the last 3 hours
  • Get the minimum, average, and maximum ping time to a specified destination

The first part makes this a fairly complex query. Even though PostgreSQL has support for intervals, date ranges, and a way to generate a series of dates, there is no way to generate a series of ranges (that I know of). One solution to this problem is a common table expression (CTE): a way to execute a subquery that you can later refer to as if it were a real table.

Getting a series of timestamps over the last three hours in 10-minute intervals is easy:

select begin_time from generate_series(now() - interval '3 hours', now(), interval '10 minutes') begin_time;

The generate_series function takes three arguments: begin, end, and step. The function works with numbers and with timestamps, so that makes it easy. If we wanted pings at exactly these times, we’d be done now. However, we need times between the two timestamps. So we can use another bit of SQL magic: window functions. Window functions allow us to do things with rows before or after the row that we’re currently on. So let’s add end_time to our query:

select
 begin_time,
 LEAD(begin_time) OVER (ORDER BY begin_time ASC) as end_time
from generate_series(now() - interval '3 hours', now(), interval '10 minutes') begin_time;

LEAD takes the value of the next row in the results, as ordered in the way specified in the over clause. You can use LAG to get the previous row in a similar way. So now we can wrap this query with WITH intervals as ( … query goes here … ) to make it a CTE. Then we can join our pings table and get the results we’re looking for:

WITH intervals AS (
   SELECT
     begin_time,
     LEAD(begin_time)
     OVER (
       ORDER BY begin_time ) AS end_time
   FROM
         generate_series(
             now() - INTERVAL '3 hours',
             now(),
             INTERVAL '10 minutes'
         ) begin_time
)
SELECT
 i.begin_time AT TIME ZONE 'Europe/Berlin' AS begin_time,
 i.end_time AT TIME ZONE 'Europe/Berlin' AS end_time,
 p.destination,
 count(p.pingtime),
 round(avg(p.pingtime),2) AS avg,
 max(p.pingtime),
 min(p.pingtime)
FROM intervals i LEFT JOIN pings p
ON p.recorded_at >= i.begin_time AND
 p.recorded_at < i.end_time
WHERE
 i.end_time IS NOT NULL
 AND destination = %s
GROUP BY i.begin_time, i.end_time, p.destination
ORDER BY i.begin_time ASC;

Now you might think “That’s nice, but won’t it be incredibly slow?”, so let’s try it out! If you don’t see the ‘execute’ option when you right click an SQL query, you may need to click ‘Attach console’ first to let PyCharm know on which database you’d like to execute your query:

Execute Query Thumb

At the time of writing, my pings table has about 12,500 rows, and this query takes about 200-300 ms. Although we could say that this is acceptable for our use case, let’s see whether we can speed it up. To find out, let’s examine the query plan:

Explain Analyze No Index

EXPLAIN ANALYZE shows us both how PostgreSQL decided to retrieve our results and how long it took. We can see that the query took 471 ms. This is painfully slow, and the query plan shows why: there’s a nested loop, and then a sequential scan. This means that for each of the 18 time buckets (6 buckets per hour, 3 hours), we do a full table scan. Right now the table fits in memory, so we first load the table into memory and then scan it 18 times (you can see loops=18 on the materialize node). Imagine how slow this will be after the Pi has collected a year’s worth of pings.

We can improve on this, though. We’re querying our pings by the recorded_at column, using the ‘>=’ and ‘<’ operators. A standard B-tree index supports these operations on a timestamptz column. So let’s add an index:

CREATE INDEX pings_recorded_at ON pings(recorded_at);

Now let’s look at the output of EXPLAIN ANALYZE again:

Explain Analyze With Index

5.7ms: much better.

Graphing the Data

After getting the data, matplotlib is used to generate a line graph with lines for the minimum, average, and maximum ping time per bin. Matplotlib makes it easy to plot time-based data using the plot_date function.

When the plot is ready, it’s ‘saved’ as a PNG to a StringIO object, which is then used to create an HTTP response. Setting the content_type header to image/png tells the browser to render the response as an image.
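As an aside, the code below is Python 2; in Python 3 the same save-to-buffer pattern uses io.BytesIO, since PNG data is binary. A minimal standalone sketch:

```python
import io
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure()
ax = fig.add_subplot(111)
ax.plot([1, 2, 3], [10, 20, 15])

FigureCanvasAgg(fig)          # attach the Agg (PNG) canvas to the figure
buf = io.BytesIO()            # binary buffer instead of StringIO
fig.savefig(buf, format='png')

png_bytes = buf.getvalue()    # ready to hand to make_response()
print(png_bytes[:4])          # PNG files start with b'\x89PNG'
```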

So let’s take a look at the final result:

Ping page

If you want to see the full code, check out analyze.py on GitHub.

Querying the Data with Pandas

If the query above is a little much for you, you can achieve the same results with a couple of lines of code using the Pandas library. If we’d like to use Pandas, we can use a simple query to obtain the last three hours of pings, and then use the resample method to place the times in 10-minute buckets.

Important: To load PostgreSQL data into a Pandas dataframe, we need to have SQLAlchemy installed as well. Pandas needs SQLAlchemy for all database engines except SQLite.

Then we can do the same as with the SQL query, and use Matplotlib to plot it:

@app.route('/pandas/<destination>')
def pandas(destination):
    engine = create_engine('postgres:///pi')

    with engine.connect() as conn, conn.begin():

        # Use the route's destination parameter instead of a hard-coded host.
        data = pd.read_sql_query(
            "select recorded_at, pingtime from pings "
            "where recorded_at > now() - interval '3 hours' "
            "and destination = %(dest)s;",
            conn,
            params={'dest': destination})

    engine.dispose()

    df = data.set_index(pd.DatetimeIndex(data['recorded_at']))

    # We have this information in the index now, so let's drop it
    del df['recorded_at']

    result = df.resample('10T').agg(['min', 'mean', 'max'])

    fig = Figure()
    ax = fig.add_subplot(111)

    ax.plot_date(
        result.index,
        result['pingtime', 'max'],
        label='max',
        linestyle='solid'
    )

    ax.plot_date(
        result.index,
        result['pingtime', 'mean'],
        label='avg',
        linestyle='solid'
    )

    ax.plot_date(
        result.index,
        result['pingtime', 'min'],
        label='min',
        linestyle='solid'
    )

    ax.xaxis.set_major_formatter(DateFormatter('%H:%M'))

    ax.set_xlabel('Time')
    ax.set_ylabel('Round Trip (ms)')
    ax.set_ylim(bottom=0)

    ax.legend()

    # Output the plot as a PNG
    png_output = StringIO.StringIO()
    fig.set_canvas(FigureCanvasAgg(fig))
    fig.savefig(png_output, transparent=True)

    response = make_response(png_output.getvalue())
    response.headers['content-type'] = 'image/png'
    return response


Now let’s have a look at what’s faster, the large SQL query, or a simple SQL query and Pandas. Pandas uses Numpy for math, which is largely written in native code for high performance. We add the code to get the appropriate data in both ways to benchmark.py, creating two functions: get_with_sql() and get_with_pandas(). We can use the Python standard library’s timeit function to run the methods 1000 times, and then get the total time it took to execute the function.
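The benchmarking harness itself can be as simple as this; the bodies here are trivial stand-ins, since the real get_with_sql() and get_with_pandas() in benchmark.py need the database:

```python
import timeit

# Stand-ins for the two data-access functions from benchmark.py;
# the real ones query PostgreSQL, these just simulate some work.
def get_with_sql():
    return sum(range(100))

def get_with_pandas():
    return sum(range(300))

# Run each function 1000 times and report the total wall-clock time.
sql_total = timeit.timeit(get_with_sql, number=1000)
pandas_total = timeit.timeit(get_with_pandas, number=1000)
print('sql: %.4fs  pandas: %.4fs' % (sql_total, pandas_total))
```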

Let’s open the Python console and take a look:

Timeit Results

Using Pandas it takes about 86 ms to obtain the data, while with the large SQL statement it takes under 24 ms. In other words: Pandas takes 262% longer. We’ve found that with a larger dataset this gap widens further.

In this specific case, it takes about 800ms to generate the graph, with the vast majority of that time taken by Matplotlib. So if we were really looking to improve performance, we’d hand off charting to a JavaScript library and just provide the data as a JSON object from the server.

Final Words

While working on this blog post, there have been a couple of times that I forgot that I was working on a remote computer. After setting up the remote interpreter, PyCharm handles everything in the background.

As you can see, PyCharm makes developing code for a remote server very easy. Let us know in the comments what projects you’re interested in running on remote servers! We’d also appreciate your feedback about SQL, let us know if you’d like to see more SQL content (or less of course) in further blog posts!

Reuven Lerner: Announcing: Three live Python courses


If you’re like many of the Python developers I know, the basics are easy for you: Strings, lists, tuples, dictionaries, functions, and even objects roll off of your fingers and onto your keyboard. Your day-to-day tasks have become significantly easier as a result of Python, and you’re comfortable using it for tasks at work and home.

But some parts of Python remain difficult, mysterious, and outside of your comfort zone:

  • When you want to use a list comprehension, you have to go to Stack Overflow to remember how they work — to say nothing of set and dict comprehensions.
  • You know that there is a difference between functions and methods, but you can’t quite put your finger on what that difference is, or how Python rewrites “self” to be the first argument to every method.
  • You keep hearing about “decorators,” and how they allow you to do all sorts of magical things to functions and classes — but every time you start reading about them, you get confused or distracted.
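To give a taste of that last bullet, a minimal decorator (an illustration, not the course material) looks like this:

```python
import functools

def log_calls(func):
    """A minimal decorator: wrap func and report each call."""
    @functools.wraps(func)  # preserve the wrapped function's name and docs
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r) -> %r' % (func.__name__, args, result))
        return result
    return wrapper

@log_calls
def add(a, b):
    return a + b

total = add(2, 3)  # prints: add((2, 3)) -> 5
```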

Sound familiar? If so, then I want to help.

As you probably know, I spend just about every day at one of the world’s best companies — Apple, Cisco, IBM, PayPal, VMWare, and Western Digital, among others — teaching their engineers how to use Python.

The engineers who learn these techniques benefit by having more “tools in their toolbox,” as I like to put it; when a problem presents itself, they have more options at their disposal. I help them to solve new types of problems, or to solve existing problems more quickly. These engineers become more valuable to their employers, and more valuable on the larger job market.

I’m announcing three courses that you can take, from the comfort of your home or office, using the content I’ve presented to these companies:

  • Tuesday, July 25: Functional programming in Python
    • comprehensions
    • custom sorting
    • passing functions as arguments
    • lambda expressions
    • map, filter, and reduce
  • Wednesday, August 2: Advanced Python objects
    • attributes
    • methods vs. functions
    • class attributes
    • inheritance
    • descriptors
    • dunder methods
  • Thursday, August 3: Python decorators
    • properties and other built-in decorators
    • writing decorators
    • decorating functions, objects, and methods

Each of these classes will run live, for five hours (with two 15-minute breaks):

  • New York: 7 a.m. – 2 p.m.
  • London: 12 noon – 5 p.m.
  • Israel: 2 p.m. – 7 p.m.
  • Mumbai: 5:30 p.m. – 10:30 p.m.

Each will be packed with lectures, accompanied by tons of live-coding examples, many exercises that you’ll be expected to solve (and which we’ll review together when you’re done), and plenty of time for interactions and questions.  Indeed, please come with lots of questions, to make the class more interesting and relevant.

Each course costs $350, and will give you:

  • Access to the live audio/video/chat feed,
  • PDFs of my slides,
  • the Jupyter notebook I use during my live-coding demos,
  • and solutions to all of the exercises

I’m offering discounts to people who buy more than one course:

  • Buy two courses, and save $100, for a total of $600.  Just use the “2sessions” coupon code when purchasing each one.
  • Buy all three courses, and save $250, for a total of $800.  Just use the “3sessions” coupon code when purchasing each one.

As always, I’m also offering a discount to students; e-mail me, and I’ll send you the appropriate discount code.

Convinced?  I hope so!  View the full course descriptions here, and then register for them:

But wait!  If you register before Monday, July 18th, then you can save 15% more, by purchasing an early-bird ticket.

I’m very excited to be offering these courses.  They won’t be my last ones — but I’ll next be teaching other topics, so if these subjects interest you, you should definitely attend.

I hope that you can join me for these live, online courses.

The post Announcing: Three live Python courses appeared first on Lerner Consulting Blog.

Data School: How to launch your data science career (with Python)


Welcome, Data School students! If you're interested in the exciting world of data science, but don't know where to start, Data School is here to help.


Step 0: Figure out what you need to learn

Data science can be an overwhelming field. Many people will tell you that you can't become a data scientist until you master the following: statistics, linear algebra, calculus, programming, databases, distributed computing, machine learning, visualization, experimental design, clustering, deep learning, natural language processing, and more. That's simply not true.

So, what exactly is data science? It's the process of asking interesting questions, and then answering those questions using data. Generally speaking, the data science workflow looks like this:

  • Ask a question
  • Gather data that might help you to answer that question
  • Clean the data
  • Explore, analyze, and visualize the data
  • Build and evaluate a machine learning model
  • Communicate results

This workflow doesn't necessarily require advanced mathematics, a mastery of deep learning, or many of the other skills listed above. But it does require knowledge of a programming language and the ability to work with data in that language. And although you need mathematical fluency to become really good at data science, you only need a basic understanding of mathematics to get started.

It's true that the other specialized skills listed above may one day help you to solve data science problems. However, you don't need to master all of those skills to begin your career in data science. You can begin today, and I'm here to help you!


Step 1: Get comfortable with Python

Python and R are both great choices as programming languages for data science. R tends to be more popular in academia, and Python tends to be more popular in industry, but both languages have a wealth of packages that support the data science workflow. I've taught data science in both languages, and generally prefer Python. (Here's why.)

You don't need to learn both Python and R to get started. Instead, you should focus on learning one language and its ecosystem of data science packages. If you've chosen Python (my recommendation), you may want to consider installing the Anaconda distribution because it simplifies the process of package installation and management on Windows, OSX, and Linux.

You also don't need to become a Python expert to move on to step 2. Instead, you should focus on mastering the following: data types, data structures, imports, functions, conditional statements, comparisons, loops, and comprehensions. Everything else can wait until later!

If you're not sure whether you know "enough" Python, scan through my Python Quick Reference. If most of that material is familiar to you, you can move on to step 2!

If you're looking for a course to help you learn Python, here are a few recommendations:


Step 2: Learn data analysis, manipulation, and visualization with pandas

For working with data in Python, you should learn how to use the pandas library.

pandas provides a high-performance data structure (called a "DataFrame") that is suitable for tabular data with columns of different types, similar to an Excel spreadsheet or SQL table. It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, visualizing data, and so much more. In short, learning pandas will significantly increase your efficiency when working with data.
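A few lines give a taste of that workflow, using made-up data to show handling missing values and filtering rows:

```python
import pandas as pd

# A tiny DataFrame: columns of different types, like a spreadsheet.
df = pd.DataFrame({
    'city': ['Austin', 'Boston', 'Chicago'],
    'temp_f': [95, 71, None],   # one missing value to handle
})

# Fill the missing temperature with the column mean, then filter.
df['temp_f'] = df['temp_f'].fillna(df['temp_f'].mean())
hot = df[df['temp_f'] > 80]
print(hot['city'].tolist())  # ['Austin', 'Chicago']
```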

However, pandas includes an overwhelming amount of functionality, and (arguably) provides too many ways to accomplish the same task. Those characteristics can make it challenging to learn pandas and to discover best practices.

That's why I created a pandas video series (30 videos, 6 hours) that teaches the pandas library from the ground up. Each video answers a question using a real dataset, and the datasets are posted online so you can follow along at home. (I also created a well-commented Jupyter notebook that includes the code from every video.)

"Your videos are extremely helpful. I like that you use actual data sets and try a lot of different applications of the concept being discussed rather than just overly simplistic examples. Your content has helped me immensely!" - Sean Montague

If you would prefer a non-video resource for learning pandas, here are my recommended resources.


Step 3: Learn machine learning with scikit-learn

For machine learning in Python, you should learn how to use the scikit-learn library.

Building "machine learning models" to predict the future or automatically extract insights from data is the sexy part of data science. scikit-learn is the most popular library for machine learning in Python, and for good reason:

  • It provides a clean and consistent interface to tons of different models.
  • It offers many tuning parameters for each model, but also chooses sensible defaults.
  • Its documentation is exceptional, and it helps you to understand the models as well as how to use them properly.
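That consistent interface means most models boil down to the same few calls; here is a minimal sketch on scikit-learn's built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # features and labels

model = KNeighborsClassifier(n_neighbors=5)  # pick a model, set parameters
model.fit(X, y)                              # learn from the data
pred = model.predict(X[:3])                  # predict for new observations
print(pred)
```

Swapping in a different model changes only the import and constructor line; fit and predict stay the same.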

However, machine learning is still a highly complex and rapidly evolving field, and scikit-learn has a steep learning curve. That's why I created a scikit-learn video series (9 videos, 4 hours), which will help you to gain a thorough grasp of both machine learning fundamentals and the scikit-learn workflow. The series doesn't presume any familiarity with machine learning or advanced mathematics. (You can find all of the code from the series on GitHub.)

"Your videos are absolutely incredible. I have just completed the course on Machine Learning with Python and I can say I understood every single thing thanks to your excellent teaching style and skills." - Guillaume B

If you would prefer a non-video resource for learning scikit-learn, I recommend either Python Machine Learning (Amazon / GitHub) or Introduction to Machine Learning with Python (Amazon / GitHub).


Step 4: Understand machine learning in more depth

Machine learning is a complex field. Although scikit-learn provides the tools you need to do effective machine learning, it doesn't directly answer many important questions:

  • How do I know which machine learning model will work "best" with my dataset?
  • How do I interpret the results of my model?
  • How do I evaluate whether my model will generalize to future data?
  • How do I select which features should be included in my model?
  • And so on...

If you want to become great at machine learning, you need to be able to answer those questions, which requires both experience and further study. Here are some resources to help you along that path:


Step 5: Keep learning and practicing

Here is my best advice for improving your data science skills: Find "the thing" that motivates you to practice what you learned and to learn more, and then do that thing. That could be personal data science projects, Kaggle competitions, online courses, reading books, reading blogs, attending meetups or conferences, or something else!

  • Kaggle competitions are a great way to practice data science without coming up with the problem yourself. Don't worry about how high you place, just focus on learning something new with every competition. (Keep in mind that you won't be practicing important parts of the data science workflow: asking questions, gathering data, and communicating results.)
  • If you create your own data science projects, you should share them on GitHub and include writeups. That will help to show others that you know how to do reproducible data science. (If you don't know how to use Git and GitHub, I have a short video series that will help you to master the basics.)
  • There are an overwhelming number of data science blogs, but DataTau will help you to find the latest and greatest content.
  • If you like email newsletters, my favorites are Data Elixir, Data Science Weekly, and Python Weekly.
  • If you want to truly experience the Python community, I highly recommend attending PyCon US. (There are also smaller PyCon conferences elsewhere.) As a data scientist, you should also consider attending SciPy and the nearest PyData conference.

Your data science journey has only begun! There is so much to learn in the field of data science that it would take more than a lifetime to master. Just remember: You don't have to master it all to launch your data science career, you just have to get started!


Join Data School (for free!)

My name is Kevin Markham, and I'm the founder of Data School. I'd be honored if you would join the Data School community by subscribing to the email newsletter:

  1. Fill out your name and email address in the left sidebar, and click "Join the Newsletter."
  2. Find the confirmation email from Data School in your inbox, and click the link to confirm your email address.

As a subscriber, you'll receive priority access to my online courses and live webcasts, and you'll get notified about new Data School tutorials and videos.

Have a question? Please let me know in the comments section below!

Want to follow Data School?

Thank you so much for reading!

Enthought: Webinar: A Tour of Enthought’s Latest Enterprise Python Solutions


When: Thursday, July 20, 2017, 11-11:45 AM CT (Live webcast)

What: A comprehensive overview and live demonstration of Enthought’s latest tools for Python for the enterprise with Enthought’s Chief Technical & Engineering Officer, Didrik Pinte

Who Should Attend: Python users (or those supporting Python users) who are looking for a universal solution set that is reliable and “just works”; scientists, engineers, and data science teams trying to answer the question “how can I more easily build and deploy my applications”; organizations looking for an alternative to MATLAB that is cost-effective, robust, and powerful

REGISTER  (if you can’t attend we’ll send all registrants a recording)


For over 15 years, Enthought has been empowering scientists, engineers, analysts, and data scientists to create amazing new technologies, to make new discoveries, and to do so faster and more effectively than they dreamed possible. Along the way, hand in hand with our customers in aerospace, biotechnology, finance, oil and gas, manufacturing, national laboratories, and more, we’ve continued to “build the science tools we wished we had,” and share them with the world.

For 2017, we’re pleased to announce the release of several major new products and tools, specifically designed to make Python more powerful and accessible for users like you who are building the future of science, engineering, artificial intelligence, and data analysis.

WHAT YOU’LL SEE IN THE WEBINAR

In this webinar, Enthought’s Chief Technical & Engineering Officer will share a comprehensive overview and live demonstration of Enthought’s latest products and how they provide the foundation for scientific computing and artificial intelligence applications with Python, including:

We’ll also walk through specific use cases so you can quickly see how Enthought’s Enterprise Python tools can impact your workflows and productivity.

REGISTER  (if you can’t attend we’ll send all registrants a recording)


Presenter: Didrik Pinte, Chief Technical & Engineering Officer, Enthought


Related Blogs:

Blog: Enthought Announces Canopy 2.1: A Major Milestone Release for the Python Analysis Environment and Package Distribution (June 2017)

Blog: Enthought Presents the Canopy Platform at the 2017 American Institute of Chemical Engineers (AIChE) Spring Meeting (April 2017)

Blog: New Year, New Enthought Products (Jan 2017)

Product pages:

The post Webinar: A Tour of Enthought’s Latest Enterprise Python Solutions appeared first on Enthought Blog.

PyCharm: PyCharm 2017.1.5 Out Now


Kushal Das: Article on Hacker Ethic and Free Software movement


As I have mentioned in the dgplug summer training page, focusing on the Free Software movement is a big part of this year’s training program. A few weeks back there was a tweet from @gnome about travel ban, and many could not figure out why Gnome was writing about this topic. Amongst the many proper replies, Miguel de Icaza’s reply was to the point. This incident made Anwesha and me stop and think; and then made us rethink, about how we wanted to conduct the sessions on the Free Software movement and Software Licensing.

I was born in the beginning of the 80s and Anwesha even later. Our introduction to the movement was from the stories we heard (from many people); from Levy’s famous book, Hackers: Heroes of the Computer Revolution and the seminal Free as in Freedom.

My introduction to the FSF came through ilug-calcutta, and from Sayamindu. Later, at foss.in 2005, I made another friend (for life), Praveen A (he is from the same batch). And even later, throughout various conferences, I was introduced to other members of FSF India. In 2007, I was part of the 4th GPLv3 meet organizing team in Bangalore. That was my introduction to RMS, and his personality (I will write a blog post later about various incidents from that conference). That had a big impact on me.

Coming back to the story of the tweet, we also saw similar ignorance from newcomers, as they never got a chance to learn about the past, nor did get to meet the various people involved (distance and time). So, Anwesha and I, tried to write a brief history, including the hacker ethic, and beginning of the Free Software movement. A lot of stories mentioned in the article are from the books mentioned above. At the very end, I have written about how the different software we use everyday came about initially. I took the help of various FSF bulletins for the same.

This Monday I took a session on the same topic in the #dgplug IRC channel. When I made mention of the GNU C Library and the time Roland McGrath started it, Siddhesh called attention to an announcement he (Roland) made a few days ago (about stepping down from maintainership of that same GNU library). I also pointed out that Siddhesh is now one of the maintainers of glibc. That gave the students a sense of impact and immediacy; a feeling of involvement and ownership.

Today evening from 13:30 UTC, Anwesha took a session on Software Licenses 101 in the #dgplug channel on Freenode. There will be more follow up sessions in the coming days.

Link to the article once again

Damián Avila: We are above 1000 stars!


Github has a way to measure projects popularity through stars.

And those stars are given by the users themselves.

And we are just above a remarkable line...

Read more… (1 min remaining to read)

NumFOCUS: Meet our GSoC Students Part 2: The Julia Cohort

Talk Python to Me: #120 Python in Finance

This week we'll enter the world of stock markets, trades, hedge funds and more. You'll meet Yves Hilpisch, who runs The Python Quants, where Python, open-source, education, and finance intersect.

Links from the show:

  • Yves on Twitter: @dyjh (twitter.com/dyjh)
  • Personal site: hilpisch.com
  • The Python Quants Group: tpq.io
  • Yves on YouTube: youtube.com (search for "yves hilpisch")
  • Quant platform: pqp.io
  • DX Analytics: dx-analytics.com
  • For Python Quants Bootcamp: fpq.io
  • Python for Quant Finance Meetup: pqf.tpq.io
  • Books: books.tpq.io

Continuum Analytics News: Continuum Analytics Named a 2017 Gartner Cool Vendor in Data Science and Machine Learning

Thursday, July 13, 2017

Data Science and AI platform, Anaconda, empowers leading businesses worldwide with solutions to transform data into intelligence 

AUSTIN, Texas (July 13, 2017): Continuum Analytics, the creator and driving force behind Anaconda, the leading data science and AI platform powered by Python, today announced it has been included in the “Cool Vendors in Data Science and Machine Learning, 2017” report by Gartner, Inc.

“We believe the addition of machine learning to Gartner’s Hype Cycle for Emerging Technologies in 2016 highlights the growing importance of data science across the enterprise,” said Scott Collison, chief executive officer of Continuum Analytics. “Data science has shifted from ‘emerging’ to ‘established’ and we’re seeing this evolution first-hand as Anaconda’s active user base of four million continues to grow. We are enabling future innovations; solving some of the world’s biggest challenges and uncovering answers to questions that haven’t even been asked yet.” 

Continuum Analytics recently released its newest version, Anaconda 4.4, featuring a comprehensive platform for Python-centric data science with a single-click installer for Windows, Mac, Linux and Power8. Anaconda 4.4 is also designed to make it easy to work with both Python 2 and Python 3 code. 

Gartner is the world's leading information technology research and advisory company. You can find the full report on Gartner’s site: https://www.gartner.com/document/3706738

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

About Anaconda Powered by Continuum Analytics

Anaconda is the leading Open Data Science platform powered by Python, the fastest growing data science language with more than 13 million downloads and 4 million unique users to date. Continuum Analytics is the creator and driving force behind Anaconda, empowering leading businesses across industries worldwide with solutions to identify patterns in data, uncover key insights and transform data into a goldmine of intelligence to solve the world’s most challenging problems. Learn more at continuum.io.
