NOTE: You can also watch a video walkthrough of the common code covered in this blogpost here.
Introduction
In this final installment of a (currently) two-part series introducing Python developers to building on Google APIs, we'll extend the simple API example from the first post (part 1), published just over a month ago. Those first snippets showed some skeleton code and a short, real working sample that demonstrates accessing a public (Google) API with an API key (querying public Google+ posts). An API key, however, does not grant applications access to authorized data.
Authorized data, including user information such as personal files on Google Drive and YouTube playlists, requires additional security steps before access is granted. Sharing and hardcoding credentials such as usernames and passwords is not only insecure, it's also a thing of the past. A more modern approach leverages token exchange, authenticated API calls, and standards such as OAuth2.
In this post, we'll demonstrate how to use Python to access authorized Google APIs using OAuth2, specifically listing the files (and folders) in your Google Drive. In order to better understand the example, we strongly recommend you check out the OAuth2 guides (general OAuth2 info, OAuth2 as it relates to Python and its client library) in the documentation to get started.
The docs describe the OAuth2 flow: making a request for authorized access, having the user grant access to your app, and obtaining an access token with which to sign and make authorized API calls. Getting started begins nearly the same way as for simple API access; the process diverges once you reach the Credentials page in the steps below.
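To make the first step of that flow concrete, here is a minimal sketch (Python 3, standard library only) of the authorization request URL an installed app sends the user's browser to. The parameter names come from the OAuth2 specification (RFC 6749, section 4.1.1); the endpoint, client ID, and redirect URI in the usage lines are placeholders, not values to copy. In practice, oauth2client builds this URL for you.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_auth_url(auth_endpoint, client_id, redirect_uri, scopes, state=None):
    """Return an OAuth2 authorization-code request URL (RFC 6749, sec. 4.1.1)."""
    if not isinstance(scopes, str):
        scopes = ' '.join(scopes)  # multiple scopes travel as one space-delimited string
    params = {
        'response_type': 'code',   # ask the server for an authorization code
        'client_id': client_id,    # identifies your app (from client_secret.json)
        'redirect_uri': redirect_uri,
        'scope': scopes,           # what the consent dialog will ask the user for
    }
    if state is not None:
        params['state'] = state    # opaque value echoed back to the app; helps prevent CSRF
    return auth_endpoint + '?' + urlencode(params)


# Usage with placeholder values (hypothetical endpoint and client ID):
url = build_auth_url('https://auth.example.com/authorize', 'YOUR_CLIENT_ID',
                     'http://localhost:8080/',
                     ['https://www.googleapis.com/auth/drive.metadata.readonly'])
query = parse_qs(urlparse(url).query)  # decode it again to inspect the parameters
print(query['scope'])
```

The user approves the request in the browser, and the authorization code that comes back is then exchanged for the access token the rest of this post relies on.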
Google API access
To get authorized Google API access, follow these instructions (the first three of which are roughly the same as for simple API access):
- Go to the Google Developers Console and log in.
- Use your Gmail or Google credentials; create an account if needed
- Click the "Create Project" button
- Enter a Project Name (mutable, human-friendly string only used in the console)
- Enter a Project ID (immutable, must be unique and not already taken)
- Once the project has been created, click the "Enable an API" button
- You can toggle on any API(s) that support(s) simple or authorized API access.
- For the code example below, we use the Google Drive API.
- Other ideas: YouTube Data API, Google+ API, Google Prediction API, etc.
- Find more APIs (and the version numbers you'll need) at the OAuth Playground.
- Select "Credentials" in the left navigation under "APIs & auth"
- In the top half labeled "OAuth2", click "Create new Client ID"
- In the new dialog, select your application type — we're building a command-line script which is an "Installed application"
- In the bottom part of that same dialog, specify the type of installed application; choose "Other" (command-line scripts are neither web nor mobile apps)
- Click "Create Client ID" to generate your credentials
- Finally, click "Download JSON" to save the new credentials to your computer... perhaps choose a shorter name like "client_secret.json"
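Before wiring the downloaded file into any code, it can help to peek inside it. The sketch below assumes the layout Google uses for installed-application client secrets: a top-level "installed" object holding client_id, client_secret, auth_uri, and token_uri ("web" for web-application credentials). The function names are our own, and the sample values are made up. oauth2client reads this file for you, so this is only for orientation.

```python
import json


def parse_client_secrets(text):
    """Return (client_id, client_secret, auth_uri, token_uri) from client-secrets JSON text."""
    data = json.loads(text)
    # Assumed layout: {"installed": {...}} for installed apps, {"web": {...}} for web apps
    info = data.get('installed') or data.get('web')
    if info is None:
        raise ValueError('unrecognized client secrets layout')
    return info['client_id'], info['client_secret'], info['auth_uri'], info['token_uri']


def load_client_secrets(path='client_secret.json'):
    """Read the downloaded file from disk and parse it."""
    with open(path) as fp:
        return parse_client_secrets(fp.read())


# Usage with made-up sample contents:
sample = json.dumps({'installed': {
    'client_id': 'abc123.apps.googleusercontent.com',
    'client_secret': 'not-a-real-secret',
    'auth_uri': 'https://accounts.google.com/o/oauth2/auth',
    'token_uri': 'https://accounts.google.com/o/oauth2/token',
}})
client_id, client_secret, auth_uri, token_uri = parse_client_secrets(sample)
```

Treat the real file like a password: the client secret identifies your app, so keep it out of version control.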
Accessing Google APIs from Python
In order to access authorized Google APIs from Python, you still need the Google APIs Client Library for Python, so follow the installation instructions from part 1.
We will again use the apiclient.discovery.build() function, which creates a service endpoint for interacting with an API, authorized or otherwise. However, for authorized data access, we need additional resources, namely the httplib2 and oauth2client packages. Here are the first five lines of the new boilerplate code for authorized access:
from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
CLIENT_SECRET = 'client_secret.json'  # downloaded JSON file
SCOPES = '...'  # placeholder: one or more scopes (strings), see below
After the imports are some global variables, starting with CLIENT_SECRET. This is the credentials file you saved when you clicked "Download JSON" in the instructions above. SCOPES is a critical variable: it represents the set of authorization scopes an app wants to obtain (and then use to access data) on behalf of its user(s). What does a scope look like?
Each scope is a single character string, specifically a URL. Here are some examples:
- 'https://www.googleapis.com/auth/plus.me': access your personal Google+ settings
- 'https://www.googleapis.com/auth/drive.metadata.readonly': read-only access to your Google Drive file and folder metadata
- 'https://www.googleapis.com/auth/youtube': access your YouTube playlists and other personal information
To request more than one scope, say both Google+ and YouTube access, SCOPES can be a single space-delimited string:
SCOPES = 'https://www.googleapis.com/auth/plus.me https://www.googleapis.com/auth/youtube'
or an easier-to-read tuple of strings:
SCOPES = (
'https://www.googleapis.com/auth/plus.me',
'https://www.googleapis.com/auth/youtube',
)
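As far as we know, oauth2client's client.flow_from_clientsecrets() accepts SCOPES either as one string or as an iterable of strings, and both end up as the same space-delimited value sent to the server. The helper below is hypothetical (our own name, not a library function) and only demonstrates that equivalence.

```python
def scopes_to_string(scopes):
    """Normalize SCOPES (a string or an iterable of strings) to one space-delimited string."""
    if isinstance(scopes, str):
        return scopes           # already in wire format
    return ' '.join(scopes)     # tuple/list form: join with single spaces


AS_STRING = ('https://www.googleapis.com/auth/plus.me '
             'https://www.googleapis.com/auth/youtube')
AS_TUPLE = (
    'https://www.googleapis.com/auth/plus.me',
    'https://www.googleapis.com/auth/youtube',
)
# Both spellings describe the same request.
assert scopes_to_string(AS_STRING) == scopes_to_string(AS_TUPLE)
```

Pick whichever form reads better; the tuple is easier to edit when the list of scopes grows.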
Our example command-line script will just list the files on your Google Drive, so we only need the read-only Drive metadata scope, meaning our SCOPES variable will be just this:
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
The next section of boilerplate represents the security code:
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets(CLIENT_SECRET, SCOPES)
    creds = tools.run(flow, store)
Once the user has authorized your app to access their personal data, a special "access token" is given to your app. This precious resource must be stored somewhere local for the app to use; in our case, we store it in a file called "storage.json". The lines setting the store and creds variables attempt to get a valid access token with which to make authorized API calls.
If the credentials are missing or invalid (for example, expired), the authorization flow (using the client secret you downloaded along with a set of requested scopes) must be created (by client.flow_from_clientsecrets()) and executed (by tools.run()) to ensure possession of valid credentials. If you don't have credentials at all, the user must explicitly grant permission. I'm sure you've all seen the OAuth2 dialog describing the type of access an app is requesting (remember those scopes?). Once the user clicks "Accept" to grant permission, a valid access token is returned and saved into the storage file (because you passed a handle to it when you called tools.run()).
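The get-or-authorize logic in those lines boils down to a small caching pattern. Here is a simplified, standard-library-only sketch of it, assuming our own JSON layout with an expires_at timestamp (not oauth2client's actual storage format) and a caller-supplied run_flow function standing in for tools.run(). Unlike oauth2client, it does not try a refresh token before falling back to the full consent flow.

```python
import json
import os
import time


def load_token(path):
    """Return the cached token dict, or None if nothing has been stored yet."""
    if not os.path.exists(path):
        return None
    with open(path) as fp:
        return json.load(fp)


def is_invalid(token, now=None):
    """Treat a missing token, or one past its (assumed) expires_at time, as invalid."""
    now = time.time() if now is None else now
    return token is None or token.get('expires_at', 0) <= now


def get_token(path, run_flow):
    """Mirror of: creds = store.get(); if not creds or creds.invalid: creds = tools.run(...)"""
    token = load_token(path)
    if is_invalid(token):
        token = run_flow()          # user sees the consent dialog and clicks "Accept"
        with open(path, 'w') as fp:
            json.dump(token, fp)    # cache it so the next run skips the dialog
    return token
```

On a second run, get_token() finds the cached, unexpired token and never calls run_flow(), which is why you only see the consent dialog the first time.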
Note: tools.run() deprecated by tools.run_flow()
At the time of this writing, the tools.run() function has been deprecated by tools.run_flow(). We'll explain this in more detail in a future blogpost, but for now, you can use either. The caveats for both: tools.run() is "easier" but outdated and requires another package to download, while tools.run_flow() requires more code and a recent version of Python.
Why is using tools.run() "easier"? It does mean less code, but it also requires the 'gflags' library, so if you go that route, install it with "pip install -U python-gflags". The good news with tools.run_flow() is that it does not need this library; the bad news is that you do need to create an argparse.ArgumentParser object (which stands in for the missing 'gflags'), meaning you need Python 2.7 or newer. If you wish to be modern and use tools.run_flow(), read more here in the docs.
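If you go the tools.run_flow() route, the argparse piece looks roughly like this. oauth2client ships a parser, tools.argparser, that is meant to be used as a parent parser; since this sketch avoids the library, auth_parent below is a hypothetical stand-in that defines only one of its flags, --noauth_local_webserver. With the real library you would pass parents=[tools.argparser] and then call creds = tools.run_flow(flow, store, flags) in place of tools.run(flow, store).

```python
import argparse

# Hypothetical stand-in for oauth2client's tools.argparser. Parent parsers are
# built with add_help=False so their -h/--help does not clash with the child's.
auth_parent = argparse.ArgumentParser(add_help=False)
auth_parent.add_argument(
    '--noauth_local_webserver', action='store_true',
    help='do not run a local web server to catch the authorization response')


def parse_flags(argv=None):
    """Build the script's own parser, inheriting the auth flags via parents=[...]."""
    parser = argparse.ArgumentParser(description='List files in Google Drive',
                                     parents=[auth_parent])
    return parser.parse_args(argv)


flags = parse_flags([])  # an explicit empty list here; pass None to read sys.argv
```

The parents mechanism lets your script add its own options next to the auth flags without redefining them.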
Once the user grants access and valid credentials are saved, you can create one or more endpoints to the secure service(s) desired with apiclient.discovery.build(), just like with simple API access. The call looks slightly different: rather than passing an API key, you sign your HTTP requests with your credentials:
DRIVE = build(API, VERSION, http=creds.authorize(Http()))
In our example, we're going to list the files and folders in your Google Drive, so for API, use the string 'drive'. The API is currently at version 2, so use 'v2' for VERSION:
DRIVE = build('drive', 'v2', http=creds.authorize(Http()))
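What does "signing" a request mean here? Conceptually, the authorized Http object attaches the access token to each outgoing request as an OAuth2 bearer token (RFC 6750), and oauth2client also takes care of refreshing it. The sketch below builds (but does not send) such a request with Python 3's standard library; the token value is a placeholder and the function name is our own.

```python
from urllib.request import Request

DRIVE_FILES_URL = 'https://www.googleapis.com/drive/v2/files'  # Drive v2 files collection


def authorized_request(url, access_token):
    """Return a Request carrying the token in an 'Authorization: Bearer ...' header."""
    req = Request(url)
    req.add_header('Authorization', 'Bearer ' + access_token)
    return req


req = authorized_request(DRIVE_FILES_URL, 'PLACEHOLDER_TOKEN')
# urllib.request.urlopen(req) would send it; we deliberately don't here.
```

This is also why the token must be guarded: anyone holding it can make the same calls until it expires.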
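If you want to get comfortable with OAuth2, what its flow is and how it works, we recommend that you experiment at the OAuth Playground. There you can choose from any number of APIs to access and experience first-hand how your app must be authorized to access personal data.
Going back to our working example, once you have an established service endpoint, you can use the list() method of the files service to request the file data: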
files = DRIVE.files().list().execute().get('items', [])
If all goes well, the JSON response payload will contain a sequence of files (the .get('items', []) call falls back to an empty list if that key is missing) that we can loop over, displaying file names and types:
for f in files:
    print f['title'], f['mimeType']
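Just like in the previous blogpost, we're using the print statement here in Python 2, but a pro tip to start getting ready for Python 3 is to add this import to the top of your script (it has no effect in 3.x) so you can use the print() function instead, writing the loop body as print(f['title'], f['mimeType']):
from __future__ import print_function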
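One thing the simple example glosses over: files().list() results are paginated, so a single call returns only the first page of files. In the Drive v2 API, a response may include a nextPageToken that you pass back as the pageToken parameter to get the next page. Here is a sketch of that loop; list_all_files() is our own helper, and it takes a fetch function so it can be exercised without network access.

```python
def list_all_files(fetch_page):
    """Collect 'items' from every page, following 'nextPageToken' until it runs out.

    fetch_page(token) must return one response dict; token is None for the first page.
    """
    files, token = [], None
    while True:
        page = fetch_page(token)
        files.extend(page.get('items', []))
        token = page.get('nextPageToken')
        if not token:               # no token means this was the last page
            return files


# Fake two-page response, shaped like Drive v2 output, for a dry run:
FAKE_PAGES = {
    None: {'items': [{'title': 'out1.txt'}], 'nextPageToken': 'p2'},
    'p2': {'items': [{'title': 'taskqueue.py'}]},
}
titles = [f['title'] for f in list_all_files(FAKE_PAGES.get)]
```

Against the real service, the fetch function would be something like lambda tok: DRIVE.files().list(pageToken=tok).execute() (not tested here).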
Conclusion
To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for files().list(). For more information on what other operations you can execute with the Google Drive API, take a look at the reference docs and check out the companion video for this code sample. That's it!
Below is the entire script for your convenience:
#!/usr/bin/env python
from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
CLIENT_SECRET = 'client_secret.json'
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
# reuse cached credentials from storage.json, or run the OAuth2 flow to get new ones
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets(CLIENT_SECRET, SCOPES)
    creds = tools.run(flow, store)
# create a Drive API endpoint whose HTTP requests are signed with those credentials
DRIVE = build('drive', 'v2', http=creds.authorize(Http()))
# fetch the file list, then print each name with its MIME type
files = DRIVE.files().list().execute().get('items', [])
for f in files:
    print f['title'], f['mimeType']
When you run it, you should see pretty much what you'd expect: a list of file or folder names followed by their MIME types. I named my script drive_list.py:
$ python drive_list.py
Google Maps demo application/vnd.google-apps.spreadsheet
Overview of Google APIs - Sep 2014 application/vnd.google-apps.presentation
tiresResearch.xls application/vnd.google-apps.spreadsheet
6451_Core_Python_Schedule.doc application/vnd.google-apps.document
out1.txt application/vnd.google-apps.document
tiresResearch.xls application/vnd.ms-excel
6451_Core_Python_Schedule.doc application/msword
out1.txt text/plain
Maps and Sheets demo application/vnd.google-apps.spreadsheet
ProtoRPC Getting Started Guide application/vnd.google-apps.document
gtaskqueue-1.0.2_public.tar.gz application/x-gzip
Pull Queues application/vnd.google-apps.folder
gtaskqueue-1.0.1_public.tar.gz application/x-gzip
appengine-java-sdk.zip application/zip
taskqueue.py text/x-python-script
Google Apps Security Whitepaper 06/10/2010.pdf application/pdf
Obviously your output will be different, depending on what files are in your Google Drive. But that's it... hope this is useful. You can now customize this code for your own needs and/or to access other Google APIs. Thanks for reading!
EXTRA CREDIT: To test your skills, add functionality to this code that also displays the last-modified timestamp and the file size in bytes, and perhaps shorten the MIME type a bit, as it's slightly hard to read in its entirety... perhaps take just the final path element? One last challenge: in the output above, we have both Microsoft Office documents and their auto-converted Google Apps versions... perhaps show each filename only once, with a double entry for the file types!
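If you want to check your work afterwards, here is one possible take on these challenges, operating only on file dicts you have already fetched. It assumes the Drive v2 file fields modifiedDate and fileSize (Google-format documents have no fileSize, hence .get()); the sample data is made up in the shape of the output above, and the function names are our own.

```python
from collections import OrderedDict


def short_mime(mime_type):
    """Keep only the final path element, e.g. 'text/plain' -> 'plain'."""
    return mime_type.rsplit('/', 1)[-1]


def summarize(files):
    """Group Drive v2 file dicts by title, preserving first-seen order."""
    rows = OrderedDict()
    for f in files:
        row = rows.setdefault(f['title'], {'types': [], 'modified': None, 'size': None})
        row['types'].append(short_mime(f['mimeType']))
        # Keep the first non-empty value; Google-format files carry no fileSize.
        row['modified'] = row['modified'] or f.get('modifiedDate')
        row['size'] = row['size'] or f.get('fileSize')
    return rows


def format_rows(rows):
    """One line per title: name, MIME types, last-modified time, size in bytes."""
    return ['%s  %s  %s  %s' % (title, ', '.join(r['types']),
                                r['modified'] or '-', r['size'] or '-')
            for title, r in rows.items()]


# Made-up sample in the shape of the listing above:
SAMPLE = [
    {'title': 'tiresResearch.xls', 'mimeType': 'application/vnd.google-apps.spreadsheet'},
    {'title': 'out1.txt', 'mimeType': 'text/plain', 'fileSize': '12'},
    {'title': 'tiresResearch.xls', 'mimeType': 'application/vnd.ms-excel', 'fileSize': '2048'},
]
for line in format_rows(summarize(SAMPLE)):
    print(line)
```

Grouping by title is a simplification: two unrelated files that happen to share a name would also be merged.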