Real Python: Introduction to Python SQL Libraries

February 24, 2020, 6:00 am

≫ Next: Anarcat: The CLA Denial-Of-Service attack

≪ Previous: Roberto Alsina: Episodio 23: Androides Linuxeros

All software applications interact with data, most commonly through a database management system (DBMS). Some programming languages come with modules that you can use to interact with a DBMS, while others require the use of third-party packages. In this tutorial, you’ll explore the different Python SQL libraries that you can use. You’ll develop a straightforward application to interact with SQLite, MySQL, and PostgreSQL databases.

In this tutorial, you’ll learn how to:

Connect to different database management systems with Python SQL libraries
Interact with SQLite, MySQL, and PostgreSQL databases
Perform common database queries using a Python application
Develop applications across different databases using a Python script

To get the most out of this tutorial, you should have knowledge of basic Python, SQL, and working with database management systems. You should also be able to download and import packages in Python and know how to install and run different database servers locally or remotely.

Free PDF Download:Python 3 Cheat Sheet

Understanding the Database Schema

In this tutorial, you’ll develop a very small database for a social media application. The database will consist of four tables:

users
posts
comments
likes

A high-level diagram of the database schema is shown below:

Image may be NSFW.
Clik here to view.

Both users and posts will have a one-to-many relationship since one user can like many posts. Similarly, one user can post many comments, and one post can also have multiple comments. So, both users and posts will also have one-to-many relationships with the comments table. This also applies to the likes table, so both users and posts will have a one-to-many relationship with the likes table.

Using Python SQL Libraries to Connect to a Database

Before you interact with any database through a Python SQL Library, you have to connect to that database. In this section, you’ll see how to connect to SQLite, MySQL, and PostgreSQL databases from within a Python application.

Note: You’ll need MySQL and PostgreSQL servers up and running before you execute the scripts in the MySQL and PostgreSQL database sections. For a quick intro on how to start a MySQL server, check out the MySQL section of Starting a Django Project. To learn how to create a database in PostgreSQL, check out the Setting Up a Database section of Preventing SQL Injection Attacks With Python.

It’s recommended that you create three different Python files, so you have one for each of the three databases. You’ll execute the script for each database in its corresponding file.

SQLite

SQLite is probably the most straightforward database to connect to with a Python application since you don’t need to install any external Python SQL modules to do so. By default, your Python installation contains a Python SQL library named sqlite3 that you can use to interact with an SQLite database.

What’s more, SQLite databases are serverless and self-contained, since they read and write data to a file. This means that, unlike with MySQL and PostgreSQL, you don’t even need to install and run an SQLite server to perform database operations!

Here’s how you use sqlite3 to connect to an SQLite database in Python:

 1 importsqlite3 2 fromsqlite3importError 3  4 defcreate_connection(path): 5 connection=None 6 try: 7 connection=sqlite3.connect(path) 8 print("Connection to SQLite DB successful") 9 exceptErrorase:10 print(f"The error {e} occurred")11 12 returnconnection

Here’s how this code works:

Lines 1 and 2 import sqlite3 and the module’s Error class.
Line 4 defines a function .create_connection() that accepts the path to the SQLite database.
Line 7 uses .connect() from the sqlite3 module and takes the SQLite database path as a parameter. If the database exists at the specified location, then a connection to the database is established. Otherwise, a new database is created at the specified location, and a connection is established.
Line 8 prints the status of the successful database connection.
Line 9 catches any exception that might be thrown if .connect() fails to establish a connection.
Line 10 displays the error message in the console.

sqlite3.connect(path) returns a connection object, which is in turn returned by create_connection(). This connection object can be used to execute queries on an SQLite database. The following script creates a connection to the SQLite database:

connection=create_connection("E:\\sm_app.sqlite")

Once you execute the above script, you’ll see that a database file sm_app.sqlite is created in the root directory. Note that you can change the location to match your setup.

MySQL

Unlike SQLite, there’s no default Python SQL module that you can use to connect to a MySQL database. Instead, you’ll need to install a Python SQL driver for MySQL in order to interact with a MySQL database from within a Python application. One such driver is mysql-connector-python. You can download this Python SQL module with pip:

$ pip install mysql-connector-python

Note that MySQL is a server-based database management system. One MySQL server can have multiple databases. Unlike SQLite, where creating a connection is tantamount to creating a database, a MySQL database has a two-step process for database creation:

Make a connection to a MySQL server.
Execute a separate query to create the database.

Define a function that connects to the MySQL database server and returns the connection object:

 1 importmysql.connector 2 frommysql.connectorimportError 3  4 defcreate_connection(host_name,user_name,user_password): 5 connection=None 6 try: 7 connection=mysql.connector.connect( 8 host=host_name, 9 user=user_name,10 passwd=user_password11 )12 print("Connection to MySQL DB successful")13 exceptErrorase:14 print(f"The error {e} occurred")15 16 returnconnection17 18 connection=create_connection("localhost","root","")

In the above script, you define a function create_connection() that accepts three parameters:

host_name
user_name
user_password

The mysql.connector Python SQL module contains a method .connect() that you use in line 7 to connect to a MySQL database server. Once the connection is established, the connection object is returned to the calling function. Finally, in line 18 you call create_connection() with the host name, username, and password.

So far, you’ve only established the connection. The database is not yet created. To do this, you’ll define another function create_database() that accepts two parameters:

connection is the connection object to the database server that you want to interact with.
query is the query that creates the database.

Here’s what this function looks like:

defcreate_database(connection,query):cursor=connection.cursor()try:cursor.execute(query)print("Database created successfully")exceptErrorase:print(f"The error {e} occurred")

To execute queries, you use the cursor object. The query to be executed is passed to cursor.execute() in string format.

Create a database named sm_app for your social media app in the MySQL database server:

create_database_query="CREATE DATABASE sm_app"create_database(connection,create_database_query)

Now you’ve created a database sm_app on the database server. However, the connection object returned by the create_connection() is connected to the MySQL database server. You need to connect to the sm_app database. To do so, you can modify create_connection() as follows:

 1 defcreate_connection(host_name,user_name,user_password,db_name): 2 connection=None 3 try: 4 connection=mysql.connector.connect( 5 host=host_name, 6 user=user_name, 7 passwd=user_password, 8 database=db_name 9 )10 print("Connection to MySQL DB successful")11 exceptErrorase:12 print(f"The error {e} occurred")13 14 returnconnection

You can see in line 8 that create_connection() now accepts an additional parameter called db_name. This parameter specifies the name of the database that you want to connect to. You can pass in the name of the database you want to connect to when you call this function:

connection=create_connection("localhost","root","","sm_app")

The above script successfully calls create_connection() and connects to the sm_app database.

PostgreSQL

Like MySQL, there’s no default Python SQL library that you can use to interact with a PostgreSQL database. Instead, you need to install a third-party Python SQL driver to interact with PostgreSQL. One such Python SQL driver for PostgreSQL is psycopg2. Execute the following command on your terminal to install the psycopg2 Python SQL module:

$ pip install psycopg2

Like with the SQLite and MySQL databases, you’ll define create_connection() to make a connection with your PostgreSQL database:

importpsycopg2frompsycopg2importOperationalErrordefcreate_connection(db_name,db_user,db_password,db_host,db_port):connection=Nonetry:connection=psycopg2.connect(database=db_name,user=db_user,password=db_password,host=db_host,port=db_port)print("Connection to PostgreSQL DB successful")exceptOperationalErrorase:print(f"The error {e} occurred")returnconnection

You use psycopg2.connect() to connect to a PostgreSQL server from within your Python application.

You can then use create_connection() to create a connection to a PostgreSQL database. First, you’ll make a connection with the default database postgres by using the following string:

connection=create_connection("postgres","postgres","abc123","127.0.0.1","5432")

Next, you have to create the database sm_app inside the default postgres database. You can define a function to execute any SQL query in PostgreSQL. Below, you define create_database() to create a new database in the PostgreSQL database server:

defcreate_database(connection,query):connection.autocommit=Truecursor=connection.cursor()try:cursor.execute(query)print("Query executed successfully")exceptOperationalErrorase:print(f"The error {e} occurred")create_database_query="CREATE DATABASE sm_app"create_database(connection,create_database_query)

Once you run the script above, you’ll see the sm_app database in your PostgreSQL database server.

Before you execute queries on the sm_app database, you need to connect to it:

connection=create_connection("sm_app","postgres","abc123","127.0.0.1","5432")

Once you execute the above script, a connection will be established with the sm_app database located in the postgres database server. Here, 127.0.0.1 refers to the database server host IP address, and 5432 refers to the port number of the database server.

Creating Tables

In the previous section, you saw how to connect to SQLite, MySQL, and PostgreSQL database servers using different Python SQL libraries. You created the sm_app database on all three database servers. In this section, you’ll see how to create tables inside these three databases.

As discussed earlier, you’ll create four tables:

users
posts
comments
likes

You’ll start with SQLite.

SQLite

To execute queries in SQLite, use cursor.execute(). In this section, you’ll define a function execute_query() that uses this method. Your function will accept the connection object and a query string, which you’ll pass to cursor.execute().

.execute() can execute any query passed to it in the form of string. You’ll use this method to create tables in this section. In the upcoming sections, you’ll use this same method to execute update and delete queries as well.

Note: This script should be executed in the same file where you created the connection for your SQLite database.

Here’s your function definition:

defexecute_query(connection,query):cursor=connection.cursor()try:cursor.execute(query)connection.commit()print("Query executed successfully")exceptErrorase:print("The error '"+str(e)+"' occurred")

This code tries to execute the given query and prints an error message if necessary.

Next, write your query:

create_users_table="""CREATE TABLE IF NOT EXISTS users (                                        id INTEGER PRIMARY KEY AUTOINCREMENT,                                        name TEXT NOT NULL,                                        age INTEGER,                                        gender TEXT,                                        nationality TEXT                                    ); """

This says to create a table users with the following five columns:

id
name
age
gender
nationality

Finally, you’ll call execute_query() to create the table. You’ll pass in the connection object that you created in the previous section, along with the create_users_table string that contains the create table query:

execute_query(connection,create_users_table)

The following query is used to create the posts table:

create_posts_table="""CREATE TABLE IF NOT EXISTS posts(                                        id INTEGER PRIMARY KEY AUTOINCREMENT,                                        title TEXT NOT NULL,                                        description TEXT NOT NULL,                                        user_id INTEGER NOT NULL,                                        FOREIGN KEY (user_id)                                        REFERENCES users (id)                                    ); """

Since there’s a one-to-many relationship between users and posts, you can see a foreign key user_id in the posts table that references the id column in the users table. Execute the following script to create the posts table:

execute_query(connection,create_posts_table)

Finally, you can create the comments and likes tables with the following script:

create_comments_table="""CREATE TABLE IF NOT EXISTS comments (                                        id INTEGER PRIMARY KEY AUTOINCREMENT,                                        text TEXT NOT NULL,                                        user_id INTEGER NOT NULL,                                        post_id INTEGER NOT NULL,                                        FOREIGN KEY (user_id)                                        REFERENCES users (id)                                        FOREIGN KEY (post_id)                                        REFERENCES posts (id)                                    ); """create_likes_table="""CREATE TABLE IF NOT EXISTS likes (                                        id INTEGER PRIMARY KEY AUTOINCREMENT,                                        user_id INTEGER NOT NULL,                                        post_id integer NOT NULL,                                        FOREIGN KEY (user_id)                                        REFERENCES users (id)                                        FOREIGN KEY (post_id)                                        REFERENCES posts (id)                                    ); """execute_query(connection,create_comments_table)execute_query(connection,create_likes_table)

You can see that creating tables in SQLite is very similar to using raw SQL. All you have to do is store the query in a string variable and then pass that variable to cursor.execute().

MySQL

You’ll use the mysql-connector-python Python SQL module to create tables in MySQL. Just like with SQLite, you need to pass your query to cursor.execute(), which is returned by calling .cursor() on the connection object. You can create another function execute_query() that accepts the connection and query string:

 1 defexecute_query(connection,query): 2 cursor=connection.cursor() 3 try: 4 cursor.execute(query) 5 connection.commit() 6 print("Query executed successfully") 7 exceptErrorase: 8 print("The error '"+str(e)+"' occurred")

In line 4, you pass the query to cursor.execute().

Now you can create your users table using this function:

create_users_table="""CREATE TABLE IF NOT EXISTS users                        (                        id INT AUTO_INCREMENT,                        name TEXT NOT NULL,                        age INT,                        gender TEXT,                        nationality TEXT,                        PRIMARY KEY (id)) ENGINE=InnoDB """execute_query(connection,create_users_table)

The query for implementing the foreign key relation is slightly different in MySQL as compared to SQLite. What’s more, MySQL uses the AUTO_INCREMENT keyword (compared to the SQLite AUTOINCREMENT keyword) to create columns where the values are automatically incremented when new records are inserted.

The following script creates the posts table, which contains a foreign key user_id that references the id column of the users table:

create_posts_table="""CREATE TABLE IF NOT EXISTS posts                        (                        id INT AUTO_INCREMENT,                        title TEXT NOT NULL,                        description TEXT NOT NULL,                        user_id INTEGER NOT NULL,                        FOREIGN KEY fk_user_id (user_id) REFERENCES users(id),                        PRIMARY KEY (id)) ENGINE=InnoDB """execute_query(connection,create_posts_table)

Similarly, to create the comments and likes tables, you can pass the corresponding CREATE queries to execute_query().

PostgreSQL

Like with SQLite and MySQL databases, the connection object that’s returned by psycopg2.connect() contains a cursor object. You can use cursor.execute() to execute Python SQL queries on your PostgreSQL database.

Define a function execute_query():

defexecute_query(connection,query):connection.autocommit=Truecursor=connection.cursor()try:cursor.execute(query)print("Query executed successfully")exceptOperationalErrorase:print("The error '"+str(e)+"' occurred")

You can use this function to create tables, insert records, modify records, and delete records in your PostgreSQL database.

Now create the users table inside the sm_app database:

create_users_table="""CREATE TABLE IF NOT EXISTS users                        (                        id SERIAL PRIMARY KEY,                        name TEXT NOT NULL,                        age INTEGER,                        gender TEXT,                        nationality TEXT                        ) """execute_query(connection,create_users_table)

You can see that the query to create the users table in PostgreSQL is slightly different than SQLite and MySQL. Here, the keyword SERIAL is used to create columns that increment automatically. Recall that MySQL uses the keyword AUTO_INCREMENT.

In addition, foreign key referencing is also specified differently, as shown in the following script that creates the posts table:

create_posts_table="""CREATE TABLE IF NOT EXISTS posts                        (                        id SERIAL PRIMARY KEY,                        title TEXT NOT NULL,                        description TEXT NOT NULL,                        user_id INTEGER REFERENCES users(id)                        ) """execute_query(connection,create_posts_table)

To create the comments table, you’ll have to write a CREATE query for the comments table and pass it to execute_query(). The process for creating the likes table is the same. You only have to modify the CREATE query to create the likes table instead of the comments table.

Inserting Records

In the previous section, you saw how to create tables in your SQLite, MySQL, and PostgreSQL databases by using different Python SQL modules. In this section, you’ll see how to insert records into your tables.

SQLite

To insert records into your SQLite database, you can use the same execute_query() function that you used to create tables. First, you have to store your INSERT INTO query in a string. Then, you can pass the connection object and query string to execute_query(). Let’s insert five records into the users table:

create_users=""" INSERT INTO users (name, age, gender, nationality)                   VALUES ('James', 25, 'male', 'USA'),                   ('Leila', 32, 'female', 'France'),                   ('Brigitte', 35, 'female', 'England'),                   ('Mike', 40, 'male', 'Denmark'),                   ('Elizabeth', 21, 'female', 'Canada');"""execute_query(connection,create_users)

Since you set the id column to auto-increment, you don’t need to specify the value of the id column for these users. The users table will auto-populate these five records with id values from 1 to 5.

Now insert six records into the posts table:

create_posts=""" INSERT INTO posts (title, description, user_id)                   VALUES ('Happy', 'I am feeling very happy today', 1),                   ('Hot Weather', 'The weather is very hot today', 2),                   ('Help', 'I need some help with my work', 2),                   ('Great News', 'I am getting married', 1),                   ('Interesting Game', 'It was a fantastic game of tennis', 5),                   ('Party', 'Anyone up for a late night party today?', 3);"""execute_query(connection,create_posts)

It’s important to mention that the user_id column of the posts table is a foreign key that references the id column of the users table. This means that the user_id column must contain a value that already exists in the id column of the users table. If it doesn’t exist, then you’ll see an error.

Similarly, the following script inserts records into the comments and likes tables:

create_comments=""" INSERT INTO comments (text, user_id, post_id)                   VALUES ('Count me in', 1, 6),                   ('What sort of help?', 5, 3),                   ('Congrats buddy', 2, 4),                   ('I was rooting for Nadal though', 4, 5),                   ('Help with your thesis?', 2, 3),                   ('Many congratulations', 5, 4);"""create_likes=""" INSERT INTO likes (user_id, post_id)                   VALUES (1, 6),                   (2, 3),                   (1, 5),                   (5, 4),                   (2, 4),                   (4, 2),                   (3, 6);"""execute_query(connection,create_comments)execute_query(connection,create_likes)

In both cases, you store your INSERT INTO query as a string and execute it with execute_query().

MySQL

There are two ways to insert records into MySQL databases from a Python application. The first approach is similar to SQLite. You can store the INSERT INTO query in a string and then use cursor.execute() to insert records.

Earlier, you defined a wrapper function execute_query() that you used to insert records. You can use this same function now to insert records into your MySQL table. The following script inserts records into the users table using execute_query():

create_users=""" INSERT INTO `users` (`name`, `age`, `gender`, `nationality`)                   VALUES ('James', 25, 'male', 'USA'),                   ('Leila', 32, 'female', 'France'),                   ('Brigitte', 35, 'female', 'England'),                   ('Mike', 40, 'male', 'Denmark'),                   ('Elizabeth', 21, 'female', 'Canada');"""execute_query(connection,create_users)

The second approach uses cursor.executemany(), which accepts two parameters:

The query string containing placeholders for the records to be inserted
The list of records that you want to insert

Look at the following example, which inserts two records into the likes table:

sql="INSERT INTO likes ( user_id, post_id ) VALUES ( %s, %s )"val=[(4,5),(3,4)]cursor=connection.cursor()cursor.executemany(sql,val)connection.commit()

It’s up to you which approach you choose to insert records into your MySQL table. If you’re an expert in SQL, then you can use .execute(). If you’re not much familiar with SQL, then it may be more straightforward for you to use .executemany(). With either of the two approaches, you can successfully insert records into the posts, comments, and likes tables.

PostgreSQL

In the previous section, you saw two approaches for inserting records into SQLite database tables. The first uses an SQL string query, and the second uses .executemany(). psycopg2 follows this second approach, though .execute() is used to execute a placeholder-based query.

You pass the SQL query with the placeholders and the list of records to .execute(). Each record in the list will be a tuple, where tuple values correspond to the column values in the database table. Here’s how you can insert user records into the users table in a PostgreSQL database:

users=[('James',25,'male','USA'),('Leila',32,'female','France'),('Brigitte',35,'female','England'),('Mike',40,'male','Denmark'),('Elizabeth',21,'female','Canada')]user_records=','.join(['%s']*len(users))insert_query="INSERT INTO users (name, age, gender, nationality)VALUES{}".format(user_records)connection.autocommit=Truecursor=connection.cursor()cursor.execute(insert_query,users)

The script above creates a list users that contains five user records in the form of tuples. Next, you create a placeholder string with five placeholder elements (%s) that correspond to the five user records. The placeholder string is concatenated with the query that inserts records into the users table. Finally, the query string and the user records are passed to .execute(). The above script successfully inserts five records into the users table.

Take a look at another example of inserting records into a PostgreSQL table. The following script inserts records into the posts table:

posts=[('Happy','I am feeling very happy today',1),('Hot Weather','The weather is very hot today',2),('Help','I need some help with my work',2),('Great News','I am getting married',1),('Interesting Game','It was a fantastic game of tennis',5),('Party','Anyone up for a late-night party today?',3)]post_records=','.join(['%s']*len(posts))insert_query="INSERT INTO posts (title, description, user_id)VALUES{}".format(post_records)connection.autocommit=Truecursor=connection.cursor()cursor.execute(insert_query,posts)

You can insert records into the comments and likes tables with the same approach.

Selecting Records

In this section, you’ll see how to select records from database tables using the different Python SQL modules. In particular, you’ll see how to perform SELECT queries on your SQLite, MySQL, and PostgreSQL databases.

SQLite

To select records using SQLite, you can again use cursor.execute(). However, after you’ve done this, you’ll need to call .fetchall(). This method returns a list of tuples where each tuple is mapped to the corresponding row in the retrieved records.

To simplify the process, you can create a function execute_read_query():

defexecute_read_query(connection,query):cursor=connection.cursor()result=Nonetry:cursor.execute(query)result=cursor.fetchall()returnresultexceptErrorase:print("The error '"+str(e)+"' occurred")

This function accepts the connection object and the SELECT query and returns the selected record.

`SELECT`

Let’s now select all the records from the users table:

select_users="SELECT * from users"users=execute_read_query(connection,select_users)foruserinusers:print(user)

In the above script, the SELECT query selects all the users from the users table. This is passed to the execute_read_query(), which returns all the records from the users table. The records are then traversed and printed to the console.

Note: It’s not recommended to use SELECT * on large tables since it can result in a large number of I/O operations that increase the network traffic.

The output of the above query looks like this:

(1, 'James', 25, 'male', 'USA')
(2, 'Leila', 32, 'female', 'France')
(3, 'Brigitte', 35, 'female', 'England')
(4, 'Mike', 40, 'male', 'Denmark')
(5, 'Elizabeth', 21, 'female', 'Canada')

In the same way, you can retrieve all the records from the posts table with the below script:

select_users_posts="""SELECT users.id, users.name, posts.description                 FROM posts                 INNER JOIN users ON users.id = posts.user_id"""users_posts=execute_read_query(connection,select_users_posts)forusers_postinusers_posts:print(users_post)

The output looks like this:

(1, 'James', 'I am feeling very happy today')
(2, 'Leila', 'The weather is very hot today')
(2, 'Leila', 'I need some help with my work')
(1, 'James', 'I am getting married')
(5, 'Elizabeth', 'It was a fantastic game of tennis')
(3, 'Brigitte', 'Anyone up for a late-night party today?')

The result shows all the records in the posts table.

`JOIN`

You can also execute complex queries involving JOIN operations to retrieve data from two related tables. For instance, the following script returns the user ids and names, along with the description of the posts that these users posted:

select_users_posts="""SELECT users.id, users.name, posts.description                 FROM posts                 INNER JOIN users ON users.id = posts.user_id"""users_posts=execute_read_query(connection,select_users_posts)forusers_postinusers_posts:print(users_post)

Here’s the output:

(1, 'James', 'I am feeling very happy today')
(2, 'Leila', 'The weather is very hot today')
(2, 'Leila', 'I need some help with my work')
(1, 'James', 'I am getting married')
(5, 'Elizabeth', 'It was a fantastic game of tennis')
(3, 'Brigitte', 'Anyone up for a late night party today?')

You can also select data from three related tables by implementing multiple JOIN operators. The following script returns all posts, along with the comments on the posts and the names of the users who posted the comments:

select_posts_comments_users="""SELECT posts.description as post,                                 text as comment, name                                 FROM posts                                 INNER JOIN comments                                 ON                                 posts.id = comments.post_id                                 INNER JOIN users                                 ON                                 users.id = comments.user_id"""posts_comments_users=execute_read_query(connection,select_posts_comments_users)forposts_comments_userinposts_comments_users:print(posts_comments_user)

The output looks like this:

('Anyone up for a late night party today?', 'Count me in', 'James')
('I need some help with my work', 'What sort of help?', 'Elizabeth')
('I am getting married', 'Congrats buddy', 'Leila')
('It was a fantastic game of tennis', 'I was rooting for Nadal though', 'Mike')
('I need some help with my work', 'Help with your thesis?', 'Leila')
('I am getting married', 'Many congratulations', 'Elizabeth')

You can see from the output that the column names are not being returned by .fetchall(). To return column names, you can use the .description attribute of the cursor object. For instance, the following list returns all the column names for the above query:

cursor=connection.cursor()cursor.execute(select_posts_comments_users)cursor.fetchall()column_names=[description[0]fordescriptionincursor.description]print(column_names)

The output looks like this:

['post', 'comment', 'name']

You can see the names of the columns for the given query.

`WHERE`

Now you’ll execute a SELECT query that returns the post, along with the total number of likes that the post received:

select_post_likes="""SELECT description as Post, COUNT(likes.id) as Likes                        FROM likes, posts                        WHERE posts.id = likes.post_id                        GROUP BY likes.post_id"""post_likes=execute_read_query(connection,select_post_likes)forpost_likeinpost_likes:print(post_like)

The output is as follows:

('The weather is very hot today', 1)
('I need some help with my work', 1)
('I am getting married', 2)
('It was a fantastic game of tennis', 1)
('Anyone up for a late night party today?', 2)

By using a WHERE clause, you’re able to return more specific results.

MySQL

The process of selecting records in MySQL is absolutely identical to selecting records in SQLite. You can use cursor.execute() followed by .fetchall(). The following script creates a wrapper function execute_read_query() that you can use to select records:

defexecute_read_query(connection,query):cursor=connection.cursor()result=Nonetry:cursor.execute(query)result=cursor.fetchall()returnresultexceptErrorase:print("The error '"+str(e)+"' occurred")

Now select all the records from the users table:

select_users="SELECT * from users"users=execute_read_query(connection,select_users)foruserinusers:print(user)

The output will be similar to what you saw with SQLite.

PostgreSQL

The process of selecting records from a PostgreSQL table with the psycopg2 Python SQL module is similar to what you did with SQLite and MySQL. Again, you’ll use cursor.execute() followed by .fetchall() to select records from your PostgreSQL table. The following script selects all the records from the users table and prints them to the console:

defexecute_read_query(connection,query):cursor=connection.cursor()result=Nonetry:cursor.execute(query)result=cursor.fetchall()returnresultexceptOperationalErrorase:print("The error '"+str(e)+"' occurred")select_users="SELECT * from users"users=execute_read_query(connection,select_users)foruserinusers:print(user)

Again, the output will be similar to what you’ve seen before.

Updating Table Records

In the last section, you saw how to select records from SQLite, MySQL, and PostgreSQL databases. In this section, you’ll cover the process for updating records using the Python SQL libraries for SQLite, PostgresSQL, and MySQL.

SQLite

Updating records in SQLite is pretty straightforward. You can again make use of execute_query(). As an example, you can update the description of the post with an id of 2. First, SELECT the description of this post:

select_post_description="""SELECT description from posts                WHERE id = 2"""post_description=execute_read_query(connection,select_post_description)fordescriptioninpost_description:print(description)

You should see the following output:

('The weather is very hot today',)

The following script updates the description:

update_post_description="""UPDATE posts                             set description = "The weather has become pleasant now"                             WHERE id = 2"""execute_query(connection,update_post_description)

Now, if you execute the SELECT query again, you should see the following result:

('The weather has become pleasant now',)

The output has been updated.

MySQL

The process of updating records in MySQL with mysql-connector-python is also a carbon copy of the sqlite3 Python SQL module. You need to pass the string query to cursor.execute(). For example, the following script updates the description of the post with an id of 2:

update_post_description="""UPDATE posts                             set description =  "The weather has become pleasant now"                             WHERE id = 2"""execute_query(connection,update_post_description)

Again, you’ve used your wrapper function execute_query() to update the post description.

PostgreSQL

The update query for PostgreSQL is similar to what you’ve seen with SQLite and MySQL. You can use the above scripts to update records in your PostgreSQL table.

Deleting Table Records

In this section, you’ll see how to delete table records using the Python SQL modules for SQLite, MySQL, and PostgreSQL databases. The process of deleting records is uniform for all three databases since the DELETE query for the three databases is the same.

SQLite

You can again use execute_query() to delete records from YOUR SQLite database. All you have to do is pass the connection object and the string query for the record you want to delete to execute_query(). Then, execute_query() will create a cursor object using the connection and pass the string query to cursor.execute(), which will delete the records.

As an example, try to delete the comment with an id of 5:

delete_comment="""DELETE FROM comments                             WHERE id = 5"""execute_query(connection,delete_comment)

Now, if you select all the records from the comments table, you’ll see that the fifth comment has been deleted.

MySQL

The process for deletion in MySQL is also similar to SQLite, as shown in the following example:

delete_comment="""DELETE FROM comments                             WHERE id = 2"""execute_query(connection,delete_comment)

Here, you delete the second comment from the sm_app database’s comments table in your MySQL database server.

PostgreSQL

The delete query for PostgreSQL is also similar to SQLite and MySQL. You can write a delete query string by using the DELETE keyword and then passING the query and the connection object to execute_query(). This will delete the specified records from your PostgreSQL database.

Conclusion

In this tutorial, you’ve learned how to use three common Python SQL libraries. sqlite3, mysql-connector-python, and psycopg2 allow you to connect a Python application to SQLite, MySQL, and PostgreSQL databases, respectively.

Now you can:

Interact with SQLite, MySQL, or PostgreSQL databases
Use three different Python SQL modules
Execute SQL queries on various databases from within a Python application

However, this is just the tip of the iceberg! There are also Python SQL libraries for object-relational mapping, such as SQLAlchemy and Django ORM, that automate the task of database interaction in Python. You’ll learn more about these libraries in an upcoming tutorial.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

↧

Anarcat: The CLA Denial-Of-Service attack

February 24, 2020, 7:32 am

≫ Next: Stack Abuse: Introduction to Image Processing in Python with OpenCV

≪ Previous: Real Python: Introduction to Python SQL Libraries

I just stumbled upon this weird mind bender this morning. I have found what I believe is a simple typo in the Ganeti documentation which has a trivial fix. But then, before I submitted a PR to fix it, I remembered that I had trouble getting stuff merged in Ganeti before. That's because they require a CLA (which is already annoying enough) that requires a Google account to sign (which is simply unacceptable). So that patch has been sitting there for months, unused and I haven't provided a patch for the other issue because of this very problem.

But that got me thinking. If I would want to mess things up real bad in a CLA-using project I don't like and:

find a critical bug
figure out a patch for the bug
publish the patch in their issue tracker
forever refuse to sign the CLA

Then my patch, and any derivative, would be unmergeable. If the bug is trivial enough, it might even be impossible to fix it without violating the letter of the law, or at least the process that project as adhered to.

Obviously, there's a flaw in that logic. A CLA is an agreement between a project and a (new) contributor. A project does not absolutely requires the contributor to sign the agreement to accept its contributions, in theory. It's the reverse: for the contributor to have their patch accepted, they need to accept the CLA. But the project could accept contributions without CLA without violating the law.

But it seems that projects sometimes end up doing a DOS on themselves by refusing perfectly fine contributions from drive-by contributors who don't have time to waste filling forms on all projects they stumble upon.

In the case of this typo, I could have submitted a patch, but because I didn't sign a CLA, again, the project couldn't have merged it without breaking their own rules, even if someone else submits the same patch, after agreeing to the CLA. So, in effect, I would have DOS'd the project by providing the patch, so I just opened an issue which strangely — and hopefully — isn't covered by the CLA.

Feels kind of stupid, really...

Instances of known self-imposed CLA DOS attacks:

Ganeti (ganeti-instance-debootstrap)
Kubernetes (kubespray, note that k8s "discussed" switching to DCO and the conclusion was that "DCO tooling needs improvement" so it can't be used)

↧

Stack Abuse: Introduction to Image Processing in Python with OpenCV

February 24, 2020, 9:43 am

≫ Next: Andre Roberge: From a rejected Pycon talk to a new project.

≪ Previous: Anarcat: The CLA Denial-Of-Service attack

Introduction

In this tutorial, we are going to learn how we can perform image processing using the Python language. We are not going to restrict ourselves to a single library or framework; however, there is one that we will be using the most frequently, the Open CV library. We will start off by talking a little about image processing and then we will move on to see different applications/scenarios where image processing can come in handy. So, let's begin!

What is Image Processing?

It is important to know what exactly image processing is and what is its role in the bigger picture before diving into its how's. Image Processing is most commonly termed as 'Digital Image Processing' and the domain in which it is frequently used is 'Computer Vision'. Don't be confused - we are going to talk about both of these terms and how they connect. Both Image Processing algorithms and Computer Vision (CV) algorithms take an image as input; however, in image processing, the output is also an image, whereas in computer vision the output can be some features/information about the image.

Why do we need it?

The data that we collect or generate is mostly raw data, i.e. it is not fit to be used in applications directly due to a number of possible reasons. Therefore, we need to analyze it first, perform the necessary pre-processing, and then use it.

For instance, let's assume that we were trying to build a cat classifier. Our program would take an image as input and then tell us whether the image contains a cat or not. The first step for building this classifier would be to collect hundreds of cat pictures. One common issue is that all the pictures we have scraped would not be of the same size/dimensions, so before feeding them to the model for training, we would need to resize/pre-process them all to a standard size.

This is just one of many reasons why image processing is essential to any computer vision application.

Prerequisites

Before going any further, let's discuss what you need to know in order to follow this tutorial with ease. Firstly, you should have some basic programming knowledge in any language. Secondly, you should know what machine learning is and the basics of how it works, as we will be using some machine learning algorithms for image processing in this article. As a bonus, it would help if you have had any exposure to, or basic knowledge of, Open CV before going on with this tutorial. But this is not required.

One thing you should definitely know in order to follow this tutorial is how exactly an image is represented in memory. Each image is represented by a set of pixels i.e. a matrix of pixel values. For a grayscale image, the pixel values range from 0 to 255 and they represent the intensity of that pixel. For instance, if you have an image of 20 x 20 dimensions, it would be represented by a matrix of 20x20 (a total of 400-pixel values).

If you are dealing with a colored image, you should know that it would have three channels - Red, Green, and Blue (RGB). Therefore, there would be three such matrices for a single image.

Installation

Note: Since we are going to use OpenCV via Python, it is an implicit requirement that you already have Python (version 3) already installed on your workstation.

Windows

$ pip install opencv-python

MacOS

$ brew install opencv3 --with-contrib --with-python3

Linux

$ sudo apt-get install libopencv-dev python-opencv

To check if your installation was successful or not, run the following command in either a Python shell or your command prompt:

import cv2

Some Basics You Should Know

Before we move on to using Image Processing in an application, it is important to get an idea of what kind of operations fall into this category, and how to do those operations. These operations, along with others, would be used later on in our applications. So, let's get to it.

For this article we'll be using the following image:

Image may be NSFW.
Clik here to view. original image used for basic image processing

Note: The image has been scaled for the sake of displaying it in this article, but the original size we are using is about 1180x786.

You probably noticed that the image is currently colored, which means it is represented by three color channels i.e. Red, Green, and Blue. We will be converting the image to grayscale, as well as splitting the image into its individual channels using the code below.

Finding Image Details

After loading the image with the imread() function, we can then retrieve some simple properties about it, like the number of pixels and dimensions:

import cv2

img = cv2.imread('rose.jpg')

print("Image Properties")
print("- Number of Pixels: " + str(img.size))
print("- Shape/Dimensions: " + str(img.shape))

Output:

Image Properties
- Number of Pixels: 2782440
- Shape/Dimensions: (1180, 786, 3)

Splitting an Image into Individual Channels

Now we'll split the image in to its red, green, and blue components using OpenCV and display them:

from google.colab.patches import cv2_imshow

blue, green, red = cv2.split(img) # Split the image into its channels
img_gs = cv2.imread('rose.jpg', cv2.IMREAD_GRAYSCALE) # Convert image to grayscale

cv2_imshow(red) # Display the red channel in the image
cv2_imshow(blue) # Display the red channel in the image
cv2_imshow(green) # Display the red channel in the image
cv2_imshow(img_gs) # Display the grayscale version of image

For brevity, we'll just show the grayscale image.

Grayscale Image:

Image may be NSFW.
Clik here to view. flower image in greyscale

Image Thresholding

The concept of thresholding is quite simple. As discussed above in the image representation, pixel values can be any value between 0 to 255. Let's say we wish to convert an image into a binary image i.e. assign a pixel either a value of 0 or 1. To do this, we can perform thresholding. For instance, if the Threshold (T) value is 125, then all pixels with values greater than 125 would be assigned a value of 1, and all pixels with values lesser than or equal to that would be assigned a value of 0. Let's do that through code to get a better understanding.

Image used for Thresholding:

Image may be NSFW.
Clik here to view. image used for thresholding

import cv2

# Read image
img = cv2.imread('image.png', 0)

# Perform binary thresholding on the image with T = 125
r, threshold = cv2.threshold(img, 125, 255, cv2.THRESH_BINARY)
cv2_imshow(threshold)

Output:

Image may be NSFW.
Clik here to view. image thresholding output

As you can see, in the resultant image, two regions have been established, i.e. the black region (pixel value 0) and white region (pixel value 1). Turns out, the threshold we set was right in the middle of the image, which is why the black and white values are divided there.

Applications

#1: Removing Noise from an Image

Now that you have got a basic idea of what image processing is and what it is used for, let's go ahead and learn about some of its specific applications.

In most cases, the raw data that we gather has noise in it i.e. unwanted features that makes the image hard to perceive. Although these images can be used directly for feature extraction, the accuracy of the algorithm would suffer greatly. This is why image processing is applied to the image before passing it to the algorithm to get better accuracy.

There are many different types of noise, like Gaussian noise, salt and pepper noise, etc. We can remove that noise from an image by applying a filter which removes that noise, or at the very least, minimizes its effect. There are a lot of options when it comes to filters as well, each of them has different strengths, and hence is the best for a specific kind of noise.

To understand this properly, we are going to add 'salt and pepper' noise to the grayscale version of the rose image that we considered above, and then try to remove that noise from our noisy image using different filters and see which one is best-fit for that type.

import numpy as np

# Adding salt & pepper noise to an image
def salt_pepper(prob):
      # Extract image dimensions
      row, col = img_gs.shape

      # Declare salt & pepper noise ratio
      s_vs_p = 0.5
      output = np.copy(img_gs)

      # Apply salt noise on each pixel individually
      num_salt = np.ceil(prob * img_gs.size * s_vs_p)
      coords = [np.random.randint(0, i - 1, int(num_salt))
            for i in img_gs.shape]
      output[coords] = 1

      # Apply pepper noise on each pixel individually
      num_pepper = np.ceil(prob * img_gs.size * (1. - s_vs_p))
      coords = [np.random.randint(0, i - 1, int(num_pepper))
            for i in img_gs.shape]
      output[coords] = 0
      cv2_imshow(output)

      return output

# Call salt & pepper function with probability = 0.5
# on the grayscale image of rose
sp_05 = salt_pepper(0.5)

# Store the resultant image as 'sp_05.jpg'
cv2.imwrite('sp_05.jpg', sp_05)

Alright, we have added noise to our rose image, and this is what it looks like now:

Noisy Image:

Image may be NSFW.
Clik here to view. image with noise

Lets now apply different filters on it and note down our observations i.e. how well each filter reduces the noise.

Arithmetic Filter with Sharpening Kernel

# Create our sharpening kernel, the sum of all values must equal to one for uniformity
kernel_sharpening = np.array([[-1,-1,-1],
                              [-1, 9,-1],
                              [-1,-1,-1]])

# Applying the sharpening kernel to the grayscale image & displaying it.
print("\n\n--- Effects on S&P Noise Image with Probability 0.5 ---\n\n")

# Applying filter on image with salt & pepper noise
sharpened_img = cv2.filter2D(sp_05, -1, kernel_sharpening)
cv2_imshow(sharpened_img)

The resulting image, from applying arithmetic filter on the image with salt and pepper noise, is shown below. Upon comparison with the original grayscale image, we can see that it brightens the image too much and is unable to highlight the bright spots on the rose as well. Hence, it can be concluded that arithmetic filter fails to remove salt and pepper noise.

Arithmetic Filter Output:

Image may be NSFW.
Clik here to view. image without noise via arithmetic filter

Midpoint Filter

from scipy.ndimage import maximum_filter, minimum_filter

def midpoint(img):
    maxf = maximum_filter(img, (3, 3))
    minf = minimum_filter(img, (3, 3))
    midpoint = (maxf + minf) / 2
    cv2_imshow(midpoint)

print("\n\n---Effects on S&P Noise Image with Probability 0.5---\n\n")
midpoint(sp_05)

The resulting image, from applying th Midpoint Filter on the image with salt and pepper noise, is shown below. Upon comparison with the original grayscale image, we can see that, like the kernel method above, brightens the image too much; however, it is able to highlight the bright spots on the rose. Therefore, we can say that it is a better choice than the arithmetic filter, but still it does not recover the original image completely.

Midpoint Filter Output:

Image may be NSFW.
Clik here to view. image without noise via midpoint filter

Contraharmonic Mean Filter

Note: The implementations of these filters can be found online easily and how exactly they work is out of scope for this tutorial. We will be looking at the applications from an abstract/higher level.

def contraharmonic_mean(img, size, Q):
    num = np.power(img, Q + 1)
    denom = np.power(img, Q)
    kernel = np.full(size, 1.0)
    result = cv2.filter2D(num, -1, kernel) / cv2.filter2D(denom, -1, kernel)
    return result

print("\n\n--- Effects on S&P Noise Image with Probability 0.5 ---\n\n")
cv2_imshow(contraharmonic_mean(sp_05, (3,3), 0.5))

The resulting image, from applying Contraharmonic Mean Filter on the image with salt and pepper noise, is shown below. Upon comparison with the original grayscale image, we can see that it has reproduced pretty much the exact same image as the original one. Its intensity/brightness level is the same and it highlights the bright spots on the rose as well. Hence, we can conclude that contraharmonic mean filter is very effective in dealing with salt and pepper noise.

Contraharmonic Mean Filter Output:

Image may be NSFW.
Clik here to view. image without noise via contraharmonic filter

Now that we have found the best filter to recover the original image from a noisy one, we can move on to our next application.

#2: Edge Detection using Canny Edge Detector

The rose image that we have been using so far has a constant background i.e. black, therefore, we will be using a different image for this application to better show the algorithm's capabilities. The reason is that if the background is constant, it makes the edge detection task rather simple, and we don't want that.

We talked about a cat classifier earlier in this tutorial, let's take that example forward and see how image processing plays an integral role in that.

In a classification algorithm, the image is first scanned for 'objects' i.e. when you input an image, the algorithm would find all the objects in that image and then compare them against the features of the object that you are trying to find. In case of a cat classifier, it would compare all objects found in an image against the features of a cat image, and if a match is found, it tells us that the input image contains a cat.

Since we are using the cat classifier as an example, it is only fair that we use a cat image going forward. Below is the image we will be using:

Image used for Edge Detection:

Image may be NSFW.
Clik here to view. image used for edge detection

import cv2
import numpy as np
from matplotlib import pyplot as plt

# Declaring the output graph's size
plt.figure(figsize=(16, 16))

# Convert image to grayscale
img_gs = cv2.imread('cat.jpg', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('gs.jpg', img_gs)

# Apply canny edge detector algorithm on the image to find edges
edges = cv2.Canny(img_gs, 100,200)

# Plot the original image against the edges
plt.subplot(121), plt.imshow(img_gs)
plt.title('Original Gray Scale Image')
plt.subplot(122), plt.imshow(edges)
plt.title('Edge Image')

# Display the two images
plt.show()

Edge Detection Output:

Image may be NSFW.
Clik here to view. edge detection output

As you can see, the part of the image which contains an object, which in this case is a cat, has been dotted/separated through edge detection. Now you must be wondering, what is the Canny Edge Detector and how did it make this happen; so let's discuss that now.

To understand the above, there are three key steps that need to be discussed. First, it performs noise reduction on the image in a similar manner that we discussed previously. Second, it uses the first derivative at each pixel to find edges. The logic behind this is that the point where an edge exists, there is an abrupt intensity change, which causes a spike in the first derivative's value, hence making that pixel an 'edge pixel'.

At the end, it performs hysteresis thresholding; we said above that there's a spike in the value of first derivative at an edge, but we did not state 'how high' the spike needs to be for it to be classified as an edge - this is called a threshold! Earlier in this tutorial we discussed what simple thresholding is. Hysteresis thresholding is an improvement on that, it makes use of two threshold values instead of one. The reason behind that is, if the threshold value is too high, we might miss some actual edges (true negatives) and if the value is too low, we would get a lot of points classified as edges that actually are not edges (false positives). One threshold value is set high, and one is set low. All points which are above the 'high threshold value' are identified as edges, then all points which are above the low threshold value but below the high threshold value are evaluated; the points which are close to, or are neighbors of, points which have been identified as edges, are also identified as edges and the rest are discarded.

These are the underlying concepts/methods that Canny Edge Detector algorithm uses to identify edges in an image.

Conclusion

In this article, we learned how to install OpenCV, the most popular library for image processing in Python, on different platforms like Windows, MacOS, and Linux, as well as how to verify that the installation was successful.

We went on to discuss what Image Processing is and its uses in the computer vision domain of Machine Learning. We talked about some common types of noise and how we can remove it from our images using different filters, before using the images in our applications.

Furthermore, we learned how image processing plays an integral part in high-end applications like Object Detection or classification. Do note that this article was just the tip of the iceberg, and Digital Image Processing has a lot more in the store that cannot possibly be covered in a single tutorial. Reading this should enable you to dive deeper and learn about other advanced concepts related to image processing. Good Luck!

↧

Andre Roberge: From a rejected Pycon talk to a new project.

February 24, 2020, 8:38 am

≫ Next: PyBites: Talking to API's and goodlooking tools

≪ Previous: Stack Abuse: Introduction to Image Processing in Python with OpenCV

Like many others, my talk proposal (early draft here) for Pycon US was rejected. So, I decided to spend some time putting everything in a new project instead. (Documentation here.) It is still a rough draft, but usable ... and since I've mentioned it in a few other places, I thought I should mention it here as well.

Image may be NSFW.
Clik here to view.

↧

PyBites: Talking to API's and goodlooking tools

February 24, 2020, 12:39 pm

≫ Next: PyBites: Productivity Mondays - Are You Producing Enough Value? 5 Tips to Boost Your Deep Work

≪ Previous: Andre Roberge: From a rejected Pycon talk to a new project.

One of my go-to locations for security news had a thread recently about a tool called VTScan. I really liked the idea of not having to go through the browser overhead to check files against multiple scan engines.

Although the tool (which is itself a basic vt-cli spinoff) already existed, I was looking for a new challenge, I decided to roll my own and add a few cool features! I'll have a thorough look at how python talks to API's with requests and I look at turning all this API data into a nice GUI application with click. I hope to give you some idea's for CLI styling in the future so I can see more awesome tools by you all!

You can find the full code on my github.

Index

Requirements

PyWin32
Click
Requests
Win10Toast
Colorama (optional, but prettier)

REST

What is REST?

According to Wikipedia:

Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style, called RESTful Web services, provide interoperability between computer systems on the Internet. RESTful Web services allow the requesting systems to access and manipulate textual representations of Web resources by using a uniform and predefined set of stateless operations. Other kinds of Web services, such as SOAP Web services, expose their own arbitrary sets of operations.

So in human language, a REST API is just a web-based endpoint that we can send HTTP requests to. This endpoint in turn will query an application on the backend and will return some data based on what the application does.

In our example, we will post a file to a webserver, and the webserver will send the file to a number of anti-virus scanners. The results of all these scans will be put in a report to indicate if a file has been flagged as a virus or not.

Note: API Key Protection

Before we get into the action, I want to leave a small note about protecting your API keys. Your API key is a unique identifier and authorization mechanism to allow you to access certain services (like REST API's) with just a single key.

If anyone manages to get a hold of this unique string, people WILL be able to query the service as if they are you.

Something very common is that developers accidentally push their code including API keys to github, and thus everyone can access the service as that developer.

Bob mentioned a common way to tackle this problem in his mentoring session digest where he uses os.getenv.

I personally tend to create a sensitive.py file, which I then add to my .gitignore.

This allows me to import my sensitive data like: from sensitive import APIKEY.

(Of course, the API key is still stored in a file, so Bob's way of using os.getenv is waaay more foolproof!)

Either way, hide those keys!

The Setup

Get a free API key at VirusTotal by creating an account and getting the API-key from your profile:

Image may be NSFW.
Clik here to view. API Menu

Install the required packages:

pip3 install pywin32 click requests win10toast

(Optional): Install the colorama package if you would like colors!

pip3 install colorama

This package is used by click to draw terminal colors, but click can run perfectly without it.

The VirusTotal API

The first thing you should do when implementing an API that you don't know, is to Read the Manual!

A lot of times, sample code is provided to get you started, and usually, API documentation lists all endpoints you can query to get information. Besides, your tool acts like a view for the API, so you need to know what you have to send, what you can expect, and what is required to make requests.

Image may be NSFW.
Clik here to view. RTFM!

So go ahead and have a quick look the 2 links below and just read through them.
I'll be focusing on /file/scan and on /file/report for the purpose of this article!

The VT API: /file/scan

The endpoint is described as follows:

This endpoint allows you to send a file for scanning with VirusTotal. Before performing your submissions we encourage you to retrieve the latest report on the file, if it is recent enough you might want to save time and bandwidth by making use of it. File size limit is 32MB, in order to submit files up to 200MB in size you must request a special upload URL using the /file/scan/upload_url endpoint.

The python example looks like this:

importrequestsurl='https://www.virustotal.com/vtapi/v2/file/scan'params={'apikey':'<apikey>'}files={'file':('myfile.exe',open('myfile.exe','rb'))}response=requests.post(url,files=files,params=params)print(response.json())

What is going on? We make a very simple HTTP Post request to the VirusTotal scan endpoint. As parameter, we send our API-key for authorization We add the file to our body We print out the response in json.

Let's run this and have a look at the data that's being returned:

{"scan_id":"8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6-1582211127""sha1":"552d86c190fb6ad0f4734f44e59dce91fc364230""resource":"8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6""response_code":1"sha256":"8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6""permalink":"https://www.virustotal.com/file/8fbc375f08b4cb9b55c64f14b32891f9703ab3e69ca13f504deec7655fcd13b6/ana ...""md5":"f9615c7e8528ed16b213a796af2ef31b""verbose_msg":"Scan request successfully queued, come back later for the report"}

Looking good, although we don't have the results we're looking for yet, in terms of positives per scanner.

The VT API: /file/report

This endpoint is described as follows:

The resource argument can be the MD5, SHA-1 or SHA-256 of a file for which you want to retrieve the most recent antivirus report. You may also specify a scan_id returned by the /file/scan endpoint.

Again, the python code is straightforward:

importrequestsurl='https://www.virustotal.com/vtapi/v2/file/report'params={'apikey':'<apikey>','resource':'<resource>'}response=requests.get(url,params=params)print(response.json())

What is going on? We make a HTTP GET Request to the /file/report endpoint this time. As parameter, we send our API-key for authorization, and a resource. * According to the documentation, this resource can be either and MD5, SHA1, SHA256 or Resource value * We have obtained all these values above through the /file/scan endpoint

VT API: Chaining the endpoints together

What we ultimately want to reach, is that we can run the script, pass a file to it, and it uploads and scans the file and prints the result without our intervention. By now we know a few things: In order to get a scan report, we need to request the report from /file/report based on the resource-id We can get a resource-id for a file by posting the file to the /file/scan endpoint.

Here's the function I wrote to contact the /file/scan endpoint:

defscan_single_file(file):url='https://www.virustotal.com/vtapi/v2/file/scan'withopen(file,"rb")as_f:withrequests.Session()as_sess:response=_sess.post(url,files={'file':_f},params=params)json_resp=response.json()resource=json_resp['resource']# Extract the "resource" value from the JSON data_print_prefixed_message('*','yellow',f'Getting Scan Result for {file}')generate_scan_report(resource)# Call function to generate scan report based on the resource

The first thing you might notice is that I do not have a declaration for params in this function.

That is because params has to be sent to the endpoint every time. So if we want to reuse it in every function and it never changes, we can easily set a global variable at the top of our code that will act as a constant.

A lot of people dislike working with global variables because it might not always be clear where the values are coming from.

fromsensitiveimportAPIKEYparams={'apikey':APIKEY}def...

Because we are writing a single, non-object-oriented script, this is perfectly fine, and it should still be clear where this variable is coming from.

Apart from that, we're pretty much doing the same as the example script, but we're making sure our files and sessions get closed properly by using the with .. as .. :-format. Don't worry too much about _print_prefixed_message(), we'll get to that later when we discuss the magic that Click is!

If you don't understand json_resp['resource'], remember that Json is just another dict in python, and dicts have keys!

In[1]:example_json={'id':1,'name':'Jarvis'}In[2]:type(example_json)Out[2]:dictIn[3]:example_json.keys()Out[3]:dict_keys(['id','name'])In[4]:example_json['name']Out[4]:'Jarvis'

So now we have a function to upload the file and get the resource-id. Next we'll need a function to get the scan report based on this resource-id!

defgenerate_scan_report(resource_id):url='https://www.virustotal.com/vtapi/v2/file/report'local_params={'resource':resource_id}full_params=dict()full_params.update(params)full_params.update(local_params)withrequests.Session()as_sess:response=_sess.get(url,params=full_params)json_resp=response.json()forkeyinjson_resp.keys():ifkey=="scans":vendor_table=json_resp[key]elifkey=="verbose_msg":result_message=json_resp[key]elifkey=="total":total_scans=json_resp[key]elifkey=="positives":total_positives=json_resp[key]elifkey=="permalink":permalink=json_resp[key]elifkey=="scan_date":scan_date=json_resp[key]print_scan_report(vendor_table,permalink,scan_date,result_message,total_scans,total_positives)

What is going on? Why are you updating dicts? * We have to send a GET request, with our resource-id as a parameter * We also (still) have to pass our API-key parameter! * Parameters to requests POST or GET, are dicts. * By calling one_dict.update(other_dict) we can put 2 dicts together into a single one! After the parameter-preparation is done, we make our request and capture the response We get some interesting values from the scan report We pass it on to get printed (Next up!)

If we look at the data that's coming back from the report endpoint we see it looks like this:

{'scans':{'Bkav':{'detected':False,'version':'1.3.0.9899',...

Awesome that's exactly what we need but in it's current state, the tool is not really usable.

However, in terms of what we need to know from the API, we're all done. Very often, creating a tool that chains API endpoints together can be done with just a few lines of code and the requests library.

You can go out there right now, find an API of your liking, and start getting that data!

When you've done all that, you can come back here and we'll get to making things pretty and usable!

Making pretty CLI Tools with Click

If you've read my previous post, you know I like my data pretty!

So the end goal in this chapter will be to go from this:

Image may be NSFW.
Clik here to view. Ewwww Ugly

To this:

Image may be NSFW.
Clik here to view. Woaahhh Pretty

Helpers, they really do help

Earlier, I promised an explanation for that _print_prefixed_message() function. The click library provides 2 basic print functions click.echo and click.secho

There's the option to add style() to click.echo, but secho already does this for us.

From the docs:

The combination of echo() and style() is also available in a single function called secho()

Here's the prototype:

click.secho(message=None, file=None, nl=True, err=False, color=None, **styles)

Do you see those little [:)]'s and [!!]'s in their respective colors?

In the beginning they were all individually printed, so I had multiple calls actually doing the same. When you're duplicating code, there's probably a way to throw it in a function:

def_print_prefix(character,color):click.secho('[',nl=False)click.secho(character,fg=color,nl=False)click.secho('] ',nl=False)def_print_prefixed_message(character,color,message):_print_prefix(character,color)click.secho(message)

What's going on?

We're echo'ing a bracket, without starting a new line
We print the characters that our _print_prefix received as an argument, together with the color of our choice, again no new line!
We close off our prefix with a closing bracket, still no new line!
In _print_prefixed_message, we simple call a function that does the above, and we add a message!

Instead of having to write 4 click.secho's every time I want to print a message, I can simply call:

_print_prefixed_message('*', 'yellow', f'Yellow prefixed message for you!')

If you have the feeling you're duplicating code and just making minor changes, take the time to look at what you're doing and sometimes, you can make a helper to help you!

Here's an example use of _print_prefix() in case you're rightfully wondering why I broke those up.

defprint_vendor_table(vendordict):forvdinvendordict.keys():ifvendordict[vd]['detected']:_print_prefix('!!','red')click.secho(vd,fg='red')else:_print_prefix(':)','green')click.secho(vd,fg='green')

And even here we have some duplicate pieces that we could optimize. Everything in these helpers could probably also have been done with decorators. But the code is functional, readable and works, so that's enough for now (feel free to submit a PR if you like!)

Options and flags

Again, what we want to achieve, is a tool where we can simply go:

python tool.py -f /file/to/scan.exe

To make this type of behavior easier to implement, click offers a couple of decorators, here's the head of my main() function to give you an idea:

@click.command()@click.option("-w","--watcher",default=False,is_flag=True)@click.option("-D","--directory",type=str,default=None)@click.option("-f","--file",type=str,default=None)defmain(**kwargs):...

What's going on? - First we're saying that the following function is our command, click automatically adds a --help to commands.

We add a number of -X or --Y options and specify their types and defaults.
- These options will be stored in Y (the second argument, stripped off --)
We define the w option as a flag, which means it can be set or unset, but no value has to be specified.
- For options we have to do --option VALUE
- For flags we can simply say --flag and it's toggled True
The wrappers pass the options as named arguments to main

So if I now run:

python vtscan.py --file ./myfile.exe

the ./myfile.exe string is stored and passed to main
directory gets a default value of None
the watcher flag is set to it's default False
main receives this as main(file="./myfile.exe", directory=None, watcher=False)

This is where I'll close up around click, if you want to know more about this awesome library be sure to read the docs!

More functionality: Adding the Watcher

This part covers how I added a directory watcher and some of the challenges I faced.

Aside from the normal vt-cli behavior, I wanted to also be able to drop files in a directory and have them scanned automatically.

We'll have a look at a script I found to do the actual directory matching, and we'll look at the module used: pywin32.

I won't go too much in depth on the Win32 API because that's a whole different writeup.

Additional work on the arguments

When the program is running interactively with -f and it receives the file, it can't start it's watcher loop. If the program is running as a watcher, the --directory has to be specified and we shouldn't prompt the user for more details to prevent interruption.

I also wanted to add my own messages so I could use the same format on my errors as I did for the rest of the scan reports, like this:

Image may be NSFW.
Clik here to view. Pretty Errors

defparse_cli_options(**kwargs):globalis_watcherwatcher_opt=kwargs['watcher']dir_opt=kwargs['directory']file_opt=kwargs['file']ifwatcher_opt:_print_prefixed_message("*","cyan","Running as watcher!")is_watcher=Trueifdir_optisNone:_print_prefixed_message("E","red","You need to specify a directory to watch when running as Watcher!")_print_prefixed_message("i","cyan","Run VTScan.py --help for more info")exit()eliffile_optisNone:_print_prefixed_message("E","red","You must specify a file when running interactively")_print_prefixed_message("i","cyan","Run VTScan.py --help for more info")exit()returnwatcher_opt,dir_opt,file_opt@click.command()@click.option("-w","--watcher",default=False,is_flag=True)@click.option("-D","--directory",type=str,default=None)@click.option("-f","--file",type=str,default=None)defmain(**kwargs):watcher_opt,dir_opt,file_opt=parse_cli_options(**kwargs)ifwatcher_opt:run_as_watcher(dir_opt)else:scan_single_file(file_opt)

What's going on?

First, I expose the global variable is_watcher
- global variables can always be read
- if you want to write to a global variable, your function needs a line that exposes it: global is_watcher
- depending on the -w flag, we want to toggle our watcher behavior.
Next, I get the function arguments out of kwargs, these all have a default value, so they all exist.
- If the watcher option is set:
  - We change our global is_watcher (which is False by default)
  - we verify that the directory was provided
- Otherwise we check that the file name was provided.
Now we're done with these extra prints and checks, we return the options to main.
- in main we check our watcher option and choose the correct path with it's respective argument.

Here's an example use of that global variable to toggle some functionality:

ifnotis_watcher:click.secho('Show Detail? [y/n]: ',nl=False)c=click.getchar()click.echo()ifc.upper()=='Y':print_vendor_table(vendordict)ifc.upper()=='N':click.secho("Exiting!")exit()

If we're running as a watcher, we don't need to ask the user to show details. Very convenient!

Watching a directory: Win32

Credit where credit is due, for this part, I only slightly modified the code I found here.

It elegantly uses pywin32 to access the Win32 API and loops ReadDirectoryChangesW to check a directory for changes. Perfect for our directory watcher!

defrun_as_watcher(directory):path_to_watch=directoryhDir=win32file.CreateFile(path_to_watch,FILE_LIST_DIRECTORY,win32con.FILE_SHARE_READ|win32con.FILE_SHARE_WRITE|win32con.FILE_SHARE_DELETE,None,win32con.OPEN_EXISTING,win32con.FILE_FLAG_BACKUP_SEMANTICS,None)try:while1:results=win32file.ReadDirectoryChangesW(hDir,1024,True,win32con.FILE_NOTIFY_CHANGE_FILE_NAME|win32con.FILE_NOTIFY_CHANGE_DIR_NAME|win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES|win32con.FILE_NOTIFY_CHANGE_SIZE|win32con.FILE_NOTIFY_CHANGE_LAST_WRITE|win32con.FILE_NOTIFY_CHANGE_SECURITY,None,None)foraction,fileinresults:full_filename=os.path.join(path_to_watch,file)ifACTION.get(action,"Unknown")=="Created":time.sleep(1)scan_single_file(full_filename)exceptKeyboardInterrupt:print("Exiting!")exit()

What's going on?

First we create a handle hDir to a directory using CreateFile (Win32 API Reference)
- Windows uses a lot of handles in it's API.
- ReadDirectoryChangesW needs this handle to check that directory for changes.
- We pass our directory function argument to the file handle along with the required permissions and options.
Next, an infinite loop is started, wrapped with a try catch to check for CTRL+C
This runs ReadDirectoryChangesW over and over again (Microsoft docs)
- When the contents of a directory changed, we check what change occured (Creation, deletion, modification)
  - If a file is created, it means a new file got pasted.
  - We're only interested in new files
- When a file gets pasted, the operating system first Creates an empty file, and then copies the original data to the copy.
  - This means that when the CREATE action occurs, our file will be in use
  - So we sleep for a bit so Windows has time to finish the paste (or we wont have read access)
  - finally, we just run the single_scan part again, and our is_watcher global flag will take care of the rest!

One way to replace sleep would be to wait for the file to no longer be in use, which is something for the future. Right now, the program will fail with an "Access Denied" if you paste a large file that takes longer than 1 second.

And that's all there's to it!

Image may be NSFW.
Clik here to view. Watcher CLI

Adding Toast messages: Win10Toast

A final library I added so I wouldn't have to go back to the CLI log everytime, was win10toast.

Again, it's awesome how little code is needed to create a pretty toast message!

# Show results in Toast!toaster=ToastNotifier()toaster.show_toast("Scan Complete!",f"Positives: {positives} / {total}",duration=5,icon_path=".\\favicon.ico")

And we have a fully operational tool for our day to day job!

Image may be NSFW.
Clik here to view. Toast in Action GIF

Thanks for reading, I hope you enjoyed it as much as I enjoyed writing it. If you have any remarks or questions, you can likely find me on the Pybites Slack Channel as 'Jarvis'.

Keep calm and code in Python!

-- Cedric

↧

PyBites: Productivity Mondays - Are You Producing Enough Value? 5 Tips to Boost Your Deep Work

February 24, 2020, 12:40 pm

≫ Next: pgcli: Welcome IRedis

≪ Previous: PyBites: Talking to API's and goodlooking tools

Here is another edition of Productivity Mondays geared towards getting you closer towards your goals. This weekend I picked up Deep work again. Every time I read it is a revelation. The better you manage your time, the more successful you will become. It all comes down to the amount of value you can produce. And for that deep work is essential.

"If you don’t produce, you won’t thrive—no matter how skilled or talented you are." (Cal Newport - Deep Work)

Here are 5 tips to get more deep work done. Don't just read this, please comment below which tips you use or are going to use on a daily basis from now on. Or share ones of your own. Here we go:

Plan out your week during the weekend, and your day the night before.
Schedule large blocks of uninterrupted time on the tasks (80/20) that move you closer towards your goals. If you don't do this, it's so easy to let social media, unimportant meetings, and other interruptions take over your schedule.
You are an ASSET, protect your time!
Persistence. When you work on an important task sit with it till it's complete.
This will not only increase your output / value, this will seriously boost your level of confidence.
Environment is half of the battle.
We find it's best to schedule your deep work early in the morning. Part of the world (kids!) is still asleep and as the day progresses, the amount of interrupts increases.
It's also important to realize that willpower is a finite resource, you have way more of it earlier in the day!
Ideally you do your one or two most important tasks (MITs) first. It's enormously empowering to cross those off of your list. It's also THE way to beat imposter syndrome because these tasks are often related to getting onto the court, playing the game!
The other thing to pay close attention to is your physiology. For example:
- Are you drinking enough water? No? Have a water bottle on your desk at all times.
- Eating healthy food? No? Whatever is in your kitchen gets consumed, don't buy groceries when hungry.
- Are you getting enough exercise? No? Get some steps in when you wake up and set a daily alarm to go to the gym (the correlation between exercise / fitness and overall performance is just too high to ignore this!)
All these things make it easier to get the work done that will ultimately matter.
Motivation. Ask yourself why:
- Why am I doing this?
- Who am I serving and why?
- Why is this important to me?
These are great questions to get back on track. Specially when you struggle with procrastination or you find yourself aimlessly clicking on Facebook.
When that happens stop and grab a notebook. Go back to your WHY, maybe something is off in the stories you tell yourself (identify).
Honestly I was already lining up two resources, Eat that frog! and Maker's Schedule, Manager's Schedule, which are great reads, but sometimes the best tool is a notebook or your journal. Remember, it all starts in your MIND. Thoughts -> actions -> results.
"Who you are, what you think, feel, and do, what you love—is the sum of what you focus on." (Cal Newport - Deep Work)
Review your weeks / months. Here is the template we use every weekend. Why is this important? A plethora of reasons but mainly because:
- It keeps you focused on your goals and next actions you need to take
- It lets you reflect on the week (be honest with yourself) and course correct, which prevents you from making the same mistakes again.

Now go crush it this week and share in the comments below how this week is going so far ...

-- Bob

With so many avenues to pursue in Python it can be tough to know what to do. If you're looking for some direction or want to take your Python code and career to the next level, book a strategy session with us. We can help you!

↧

pgcli: Welcome IRedis

February 24, 2020, 12:00 am

≫ Next: Podcast.__init__: Reducing The Friction Of Embedded Software Development With PlatformIO

≪ Previous: PyBites: Productivity Mondays - Are You Producing Enough Value? 5 Tips to Boost Your Deep Work

We are happy to welcome IRedis to the dbcli org.

IRedis is A Terminal Client for Redis with AutoCompletion and Syntax Highlighting.

IRedis is written in python using the wonderful prompt-toolkit library. It is cross-platform compatible and it is tested on Linux, MacOS and Windows.

IRedis ships with a lot of user-friendly features. One new innovative feature is the ability to pipe the output of a redis command to a unix command. Here's an example of piping JSON to jq:

The project is lead by 赖信涛.

IRedis is the latest addition to the DBCLI suite of tools.

↧

Podcast.init: Reducing The Friction Of Embedded Software Development With PlatformIO

February 24, 2020, 7:25 pm

≫ Next: tryexceptpass: Episode 4 - 7 Practices for High Quality Maintainable Code

≪ Previous: pgcli: Welcome IRedis

Embedded software development is a challenging endeavor due to a fragmented ecosystem of tools. Ivan Kravets experienced the pain of programming for different hardware platforms when embroiled in a home automation project. As a result he built the PlatformIO ecosystem to reduce the friction encountered by engineers working with multiple microcontroller architectures. In this episode he describes the complexities associated with targeting multiple platforms, the tools that PlatformIO offers to simplify the workflow, and how it fits into the development process. If you are feeling the pain of working with different editing environments and build toolchains for various microcontroller vendors then give this interview a listen and then try it out for yourself.

Summary

Announcements

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, and a brand new managed Kubernetes platform, all controlled by a convenient API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Strata Data in San Jose, and PyCon US in Pittsburgh. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
Your host as usual is Tobias Macey and today I’m interviewing Ivan Kravets about PlatformIO, an open source ecosystem for IoT development including a cross-platform IDE, unified debugger, remote unit testing, and firmware updates.

Interview

Introductions
How did you get introduced to Python?
Can you start by describing what PlatformIO is?
- What was your motivation for creating it?
- What are the aspects of embedded development that keep you interested and engaged in this space?
What are some of the types of projects that someone might use PlatformIO to build?
What are some of the common challenges that a developer might encounter when working on embedded systems?
- What are the additional complexities that get introduced as more hardware targets get added to a project?
What is the workflow for someone using PlatformIO for embedded systems development?
What are the different elements of PlatformIO and how do they simplify the work of building embedded systems projects?
How is PlatformIO implemented and how has the system design evolved since you first began working on it?
- What was your reason for selecting Python as the implementation language?
- If you were to start over today what would you do differently?
How has the embedded hardware and software landscape changed since you first started work on PlatformIO?
- How has that impacted your product direction?
How do developers handle testing and validation of their applications?
How does PlatformIO help with updating deployed devices with new firmware?
What have been some of the most interesting/unexpected/innovative projects that you have seen built with PlatformIO?
What have been some of the most interesting/unexpected/challenging aspects of building and maintaining PlatformIO?
How are you approaching sustainability of the project and business?
What do you have planned for the future of PlatformIO?

Keep In Touch

LinkedIn
Website
ivankravets on GitHub
@ikravets on Twitter

Picks

Tobias
- UMass Amherst Making Electricity From Thin Air
Ivan
- Don’t focus on the money side of your project, just focus on building a great product.

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com) with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Image may be NSFW.
Clik here to view.

↧

tryexceptpass: Episode 4 - 7 Practices for High Quality Maintainable Code

February 23, 2020, 8:00 pm

≫ Next: Programiz: Python Programming

≪ Previous: Podcast.__init__: Reducing The Friction Of Embedded Software Development With PlatformIO

Code is complicated, hard to test, difficult to understand and can frustrate others.
Writing cleaner code can save you from reimplementing software simply that you cannot understand.
It’s an iterative process and there’s several principles to help you do that.
Keep it Simple Stupid (KISS) tells us to avoid unnecessary complexity and reduce moving parts. The idea is to write for maintainability.
Don’t Repeat Yourself (DRY) is about avoiding redundant implementations of the same function. You should think about refactoring.
You Aren’t Gonna Need It (YAGNI), an Extreme Programming principle, says we should stick with the requirements and avoid adding unneeded features or functions.
Composition over Inheritance asks us to take care when applying classes an inheritance in your design because it can lead to inflexible code.
Favoring Readability reminds us that writing software is like writing prose. Organize your code as if you’re writing a novel.
Practice Consistency tells us to stick with our decisions throughout the project. Keep the same format, implementation flow and design principles.
Consider How to Test a solution before writing it, or at least while writing. It helps you avoid traps that can unnecessarily complicate the code base.

↧

Programiz: Python Programming

February 25, 2020, 12:54 am

≫ Next: Catalin George Festila: Python 3.7.6 : The new concepts of execution in python 3 - part 001.

≪ Previous: tryexceptpass: Episode 4 - 7 Practices for High Quality Maintainable Code

↧

Catalin George Festila: Python 3.7.6 : The new concepts of execution in python 3 - part 001.

February 24, 2020, 6:55 pm

≫ Next: Real Python: How to Work With a PDF in Python

≪ Previous: Programiz: Python Programming

The main goal of these tutorials series is learning to deal with python source code using the new concepts of execution in python 3. When two or more events are concurrent it means that they are happening at the same time. Concurrent programming is not equivalent to parallel execution. In computing, concurrency is the execution of pieces of work or tasks by a computer at the same time.

↧

Real Python: How to Work With a PDF in Python

February 25, 2020, 6:00 am

≫ Next: Data School: How to merge DataFrames in pandas (video)

≪ Previous: Catalin George Festila: Python 3.7.6 : The new concepts of execution in python 3 - part 001.

The Portable Document Format or PDF is a file format that can be used to present and exchange documents reliably across operating systems. While the PDF was originally invented by Adobe, it is now an open standard that is maintained by the International Organization for Standardization (ISO). You can work with a preexisting PDF in Python by using the PyPDF2 package.

PyPDF2 is a pure-Python package that you can use for many different types of PDF operations.

By the end of this course, you’ll know how to:

Extract document information from a PDF in Python
Rotate pages
Merge PDFs
Split PDFs
Add watermarks
Encrypt a PDF

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

↧

Data School: How to merge DataFrames in pandas (video)

February 25, 2020, 8:56 am

≫ Next: PyCoder’s Weekly: Issue #409 (Feb. 25, 2020)

≪ Previous: Real Python: How to Work With a PDF in Python

Image may be NSFW.
Clik here to view. How to merge DataFrames in pandas (video)

In my new pandas video, you're going to learn how to use the "merge" function so that you can combine multiple datasets into a single DataFrame.

Merging (also known as "joining") can be tricky to do correctly, which is why I'll walk you through the process in great detail. By the end of the video, you'll be fully prepared to merge your own DataFrames!

"This, by far, is the best explanation of these concepts." - M. Schuer

Click on a timestamp below to jump to a particular section:

1:21 Selecting a function (merge/join/concat/append)
3:36 Details of the merge process
12:07 Handling common merge issues
17:01 Comparing the four types of joins (inner/outer/left/right)

If you want to follow along with the code, you can download the Jupyter notebook and the datasets from GitHub.

Related Resources

pandas documentation for merge and concat
My video series: Easier data analysis in Python with pandas
My videos on the pandas index: Part 1 and Part 2 (includes concat)
My pandas tricks for merging: Using the indicator and validate parameters
My pandas course on DataCamp: Analyzing Police Activity with pandas

If you have any questions, please let me know in the comments below!

↧

PyCoder’s Weekly: Issue #409 (Feb. 25, 2020)

February 25, 2020, 11:30 am

≫ Next: Codementor: Build Systems with Speed and Confidence by Closing the Loop First!

≪ Previous: Data School: How to merge DataFrames in pandas (video)

#409 – FEBRUARY 25, 2020
View in Browser »

Image may be NSFW.
Clik here to view.

Analysing NBA Assists: How to Visualize Hidden Relationships in Data With Python

Using basketball as the background setting, the author discusses several different strategies for uncovering relationships and producing beautiful visualizations with Python.
JP HWANG

PyCon US 2020 Packaging Summit: Registration and Topic Proposal

Registration is open for the PyCon US 2020 Packaging Summit. Topic proposals are also being accepted. Both registration and topic proposals close on March 7, 2020.
PYTHON.ORG

Python Developers Are in Demand on Vettery

Image may be NSFW.
Clik here to view.

Vettery is an online hiring marketplace that’s changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today →
VETTERYsponsor

Django Security Vulnerability: CVE-2020-7471

Django 1.11 before 1.11.28, 2.2 before 2.2.10, and 3.0 before 3.0.3 allows SQL Injection if untrusted data is used as a StringAgg delimiter (e.g., in Django applications that offer downloads of data as a series of rows with a user-specified column delimiter).
MITRE.ORG

Working With PDFs in Python

In this step-by-step course, you’ll learn how to work with a PDF in Python. You’ll see how to extract metadata from preexisting PDF files. You’ll also learn how to merge, split, watermark, and rotate pages in PDFs using Python and the PyPDF2 library.
REAL PYTHONvideo

Python in Production

Hynek Schlawack feels that discussions of Python web applications in production are missing from Python conferences. He is offering to mentor people who are interested in proposing conference talks on the subject
HYNEK SCHLAWACK

Null in Python: Understanding Python’s `NoneType` Object

Learn about the NoneType object None, which acts as the “null” in Python. This object represents emptiness, and you can use it to mark default parameters and even show when you have no result.
REAL PYTHON

PEP 584 PR Merged (Dictionary Union)

This will add the following dictionary operations: dict1 | dict2 (copy + update) and dict1 |= dict2 (update). See PEP 584 for example use cases.
GITHUB.COM/PYTHON

Discussions

Scene From Werner Herzog’s “Programming in Python” ;-)

“I see the lie in front of me – import time, and I am appalled – how can a machine offer such a promise, such a lie, the ability to import time as if it were a simple commodity. Once again, the vile snake has bitten me.”
TWITTER.COM/MVATTUONE

Python Jobs

Articles & Tutorials

Pycel: Compiling Excel Spreadsheets to Python and Making Pretty Pictures [2011]

Author describes how he compiled Excel spreadsheets with formulas into Python code in order to optimize the calculations and visualize results. Very interesting read!
DIRK GORISSEN

Better Python Tracebacks With Rich

“I’ve never found Python tracebacks to be a great debugging aid beyond telling me what the exception was, and where it occurred. In a recent update to Rich, I’ve tried to refresh the humble traceback to give enough context to diagnose errors before switching back to the editor.”
WILL MCGUGAN

Monitor Python Application Metrics and Distribute Traces in Real Time With Datadog APM

Image may be NSFW.
Clik here to view.

Datadog’s APM generates detailed flame graphs that will help your teams identify bottlenecks and latency. If an error is spotted, you can easily pivot to related logs and metrics in seconds to troubleshoot without switching tools or contexts. Visualize Python metrics end-to-end with a free trial →
DATADOGsponsor

Introduction to Python SQL Libraries

Learn how to connect to different database management systems by using various Python SQL libraries. You’ll interact with SQLite, MySQL, and PostgreSQL databases and perform common database queries using a Python application.
REAL PYTHON

A Brief Network Analysis of Symbolism in Blake’s Poetry

The author explains how she used the spaCy and NetworkX libraries to analyze William Blake’s 18th century poetry collection The Songs of Innocence and of Excellence.
MARTA PALANDRI

Python Packaging Metadata

“Since this topic keeps coming up, I’d like to briefly share my thoughts on Python package metadata because it’s – as always – more complex than it seems.”
HYNEK SCHLAWACK

How Python Became the Popular Choice

“With the popularity of Python with programmers still growing, we tried to understand how it became one of the most impactful languages in the world.”
JUN WU

How to Add a `robots.txt` to Your Django Site

robots.txt is a standard file to communicate to “robot” crawlers, such as Google’s Googlebot, which pages they should not crawl.
ADAM JOHNSON

Automate Your Dating Life With 100 Lines of Python

Author used a Python-based man-in-the-middle proxy to deconstruct network calls made by the Hinge app and then built a service to automatically “swipe right” on dating profiles.
ELI MERNIT

Learn Python for Data Science in 4 Weeks

Learn the foundational Python programming and statistics skills needed for a job in data science in as little as 4 weeks. Work 1:1 with a data science mentor to master the skills needed to get started in your journey to a data science role. Enroll in Springboard’s data science career track prep course today.
SPRINGBOARDsponsor

Managing Kindle Highlights With Python and GitHub

Author writes a Python script to build a GitHub repo for storing Kindle book highlights in an organized way.
DUARTE O.CARMO

How to Cheat at Unit Tests With Pytest and Black

Some tips for quickly writing rough initial implementations for test cases and then iterating on them.
SIMON WILLISON

Introduction to Image Processing in Python With OpenCV

MUHAMMAD JUNAID KHALID

Projects & Code

PayloadsAllTheThings: List of Useful Pentesting/CTF Payloads

GITHUB.COM/SWISSKYREPO

Carnets: Standalone Jupyter Notebooks Implementation for iOS

HOLZSCHU.GITHUB.IO

Pycel: Compile Excel Spreadsheets to Python Code & Visualize Them

GITHUB.COM/DGORISSEN

darknet.py: TOR Proxy Written in Python

GITHUB.COM/MULTIVERSECODER

dg: A Python With a Haskell Syntax

PYOS.GITHUB.IO

HiPlot: High-Dimensional Interactive Plots Made Easy

FACEBOOK.COM

Django Security: PyCharm Python Security Plugin

PYCHARM-SECURITY.READTHEDOCS.IO

DeepSpeed: Deep Learning Optimization Library

GITHUB.COM/MICROSOFT

Events

JupyterCon 2020

August 10–14 in Berlin, Germany.
NUMFOCUS.ORG

Python Vienna Meetup

February 29, 2020
MEETUP.COM

Python Mauritius User Group Meetup

February 29, 2020
MEETUP.COM

PyDelhi User Group Meetup

February 29, 2020
MEETUP.COM

Melbourne Python Users Group, Australia

March 2, 2020
J.MP

Happy Pythoning!
This was PyCoder’s Weekly Issue #409.
View in Browser »

Image may be NSFW.
Clik here to view. alt

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

↧

Codementor: Build Systems with Speed and Confidence by Closing the Loop First!

February 25, 2020, 12:48 pm

≫ Next: Talk Python to Me: #253 Moon base geekout

≪ Previous: PyCoder’s Weekly: Issue #409 (Feb. 25, 2020)

A completely finished “loop” is when you can provide the required input to your system, and it produces the desired output (or side effects, if that’s how you like it). The “Close the loop first” technique is about closing this loop as fast as possible by creating a barebones version of it first, providing all or some required inputs, and generating a partial form of the desired output. Once we have closed this barebones loop, we can then begin implementing behaviours from the inside out, so that with each new change our loop starts looking more like the actual system we want.

↧

Talk Python to Me: #253 Moon base geekout

February 24, 2020, 4:00 pm

≫ Next: Python Bytes: #170 Visualize this: Visualizing Python's visualization ecosystem

≪ Previous: Codementor: Build Systems with Speed and Confidence by Closing the Loop First!

↧

Python Bytes: #170 Visualize this: Visualizing Python's visualization ecosystem

February 25, 2020, 12:00 am

≫ Next: Python Insider: Python 3.8.2 and 3.9.0a4 are now available

≪ Previous: Talk Python to Me: #253 Moon base geekout

↧

Python Insider: Python 3.8.2 and 3.9.0a4 are now available

February 25, 2020, 12:02 pm

≫ Next: Roberto Alsina: Episodio 24: I like Windows!

≪ Previous: Python Bytes: #170 Visualize this: Visualizing Python's visualization ecosystem

On behalf of the entire Python development community, and the currently serving Python release team in particular, I’m pleased to announce the release of two of the latest Python editions.

Python 3.8.2

Python 3.8.2 is the second maintenance release of Python 3.8 and contains two months worth of bug fixes. Detailed information about all changes made in 3.8.2 can be found inits change log. Note that compared to 3.8.1, version 3.8.2 also contains the changes introduced in 3.8.2rc1 and 3.8.2rc2.

The Python 3.8 series is the newest feature release of the Python language, and it contains many new features and optimizations. You can find Python 3.8.2 here:
https://www.python.org/downloads/release/python-382/

See the “What’s New in Python 3.8” document for more information about features included in the 3.8 series.

Maintenance releases for the 3.8 series will continue at regular bi-monthly intervals, with 3.8.3 planned for April 2020 (at thePyCon US sprints).

Python 3.9.0a4

An early developer preview of Python 3.9 is also ready:
https://www.python.org/downloads/release/python-390a4/

Python 3.9 is still in development. This releasee, 3.9.0a4 is the fourth of six planned alpha releases. Alpha releases are intended to make it easier to test the current state of new features and bug fixes and to test the release process. During the alpha phase, features may be added up until the start of the beta phase (2020-05-18) and, if necessary, may be modified or deleted up until the release candidate phase (2020-08-10). Please keep in mind that this is a preview release and its use is not recommended for production environments.

We hope you enjoy both!

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.
https://www.python.org/psf/

Your friendly release team,

Ned Deily
Steve Dower
Łukasz Langa

Image may be NSFW.
Clik here to view.

↧

Roberto Alsina: Episodio 24: I like Windows!

February 25, 2020, 3:02 pm

≫ Next: The No Title® Tech Blog: Book review - Machine Learning with Python for Everyone, By Mark E. Fenner

≪ Previous: Python Insider: Python 3.8.2 and 3.9.0a4 are now available

Me puse a hacer un window manager de juguete en Python. Lo basé en HackWM. Hack. Hacker. Carlín! Se llama carlin_wm y realmente no sirve para nada, excepto como experiencia educativa. O sea, es re útil... para mí!

HackWm: https://github.com/kathamer/HackWM
Carlin: https://gitlab.com/ralsina/carlin

↧

The No Title® Tech Blog: Book review - Machine Learning with Python for Everyone, By Mark E. Fenner

February 25, 2020, 6:10 pm

≫ Next: PyBites: How to Write a Guest Article for PyBites

≪ Previous: Roberto Alsina: Episodio 24: I like Windows!

Machine learning, one of the hottest tech topics of today, is being used more and more. Sometimes as the best tool for the job, other times perhaps as a buzzword that is mainly used as a way to make a product look cooler. However, without knowing what ML is and how it works behind the scenes, it’s very easy to get lost. But this book does a great job in guiding you all the way up from very simple math concepts to some sophisticated machine learning techniques.

↧