I'm in the midst of rewriting a big app that currently uses AWS S3 and will soon be switched over to Google Cloud Storage. This blog post is a rough attempt to log various activities in both Python libraries:
Disclaimer: I'm manually copying these snippets from a real project and I have to manually scrub the code clean of unimportant quirks, hacks, and other unrelated things that would just add noise.
Install
boto3
$ pip install boto3
$ emacs ~/.aws/credentials
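For reference, that's the standard AWS shared credentials file. Mine looks roughly like this (values made up, obviously):

[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx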
google-cloud-storage
$ pip install google-cloud-storage
$ cat ./google_service_account.json
Note: You need to create a service account, which gives you a .json file that you download; make sure you pass its path when you create a client.
I suspect there are more/other ways to do this with environment variables alone but I haven't got there yet.
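For what it's worth, I believe the environment-variable route is to point GOOGLE_APPLICATION_CREDENTIALS at the same .json file and let the client pick it up on its own via Application Default Credentials. I haven't verified this in my project yet, but it would look something like:

$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/google_service_account.json

from google.cloud import storage

# With GOOGLE_APPLICATION_CREDENTIALS set, the client finds the
# service account file by itself (Application Default Credentials).
client = storage.Client()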
Making a "client"
boto3
Note, there are easier shortcuts for this but with this pattern you have full control over things like read_timeout, connect_timeout, etc. via that config_params keyword.
import boto3
from botocore.config import Config


def get_s3_client(region_name=None, **config_params):
    options = {"config": Config(**config_params)}
    if region_name:
        options["region_name"] = region_name
    session = boto3.session.Session()
    return session.client("s3", **options)
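That config_params keyword is the whole point; it lets you do something like this (the region and timeout values here are made up):

s3_client = get_s3_client(region_name="us-west-2", read_timeout=5, connect_timeout=5)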
google-cloud-storage
from google.cloud import storage


def get_gcs_client():
    return storage.Client.from_service_account_json(
        settings.GOOGLE_APPLICATION_CREDENTIALS_PATH
    )
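(settings.GOOGLE_APPLICATION_CREDENTIALS_PATH is just my app's own setting whose value is the path to that downloaded .json file mentioned above.)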
Checking if a bucket exists and if you have access to it
boto3
(for s3_client here, see above)
from botocore.exceptions import ClientError, EndpointConnectionError

try:
    s3_client.head_bucket(Bucket=bucket_name)
except ClientError as exception:
    if exception.response["Error"]["Code"] in ("403", "404"):
        raise BucketHardError(
            f"Unable to connect to bucket={bucket_name!r} "
            f"ClientError ({exception.response!r})"
        )
    else:
        raise
except EndpointConnectionError:
    raise BucketSoftError(
        f"Unable to connect to bucket={bucket_name!r} "
        f"EndpointConnectionError"
    )
else:
    print("It exists and we have access to it.")
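By the way, BucketHardError and BucketSoftError (and KeyHardError further down) are exception classes defined in my own app, not something that comes out of botocore or google-cloud-storage.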
google-cloud-storage
from google.api_core.exceptions import BadRequest

try:
    gcs_client.get_bucket(bucket_name)
except BadRequest as exception:
    raise BucketHardError(
        f"Unable to connect to bucket={bucket_name!r}, "
        f"because bucket not found due to {exception}"
    )
else:
    print("It exists and we have access to it.")
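As far as I can tell, get_bucket raises google.api_core.exceptions.NotFound when the bucket doesn't exist and Forbidden when you don't have access to it, so a more explicit variant of the same check might look like this (a sketch based on my reading of the docs, not battle-tested):

from google.api_core.exceptions import Forbidden, NotFound

try:
    gcs_client.get_bucket(bucket_name)
except NotFound:
    raise BucketHardError(f"bucket={bucket_name!r} not found")
except Forbidden:
    raise BucketHardError(f"No access to bucket={bucket_name!r}")
else:
    print("It exists and we have access to it.")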
Checking if an object exists
boto3
from botocore.exceptions import ClientError


def key_existing(client, bucket_name, key):
    """return a tuple of (
        key's size if it exists or 0,
        S3 key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    try:
        response = client.head_object(Bucket=bucket_name, Key=key)
        return response["ContentLength"], response.get("Metadata")
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "404":
            return 0, None
        raise
Note, if you do this a lot and often find that the object doesn't exist, then using list_objects_v2 is probably faster.
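Something along these lines (a rough sketch; the function name is made up and, unlike head_object, a listing only gives you the size, not the object's metadata):

def key_existing_cheaper(client, bucket_name, key):
    # Listing with the key as prefix doesn't raise when there's no match,
    # which can be cheaper than eating a 404 from head_object every time.
    response = client.list_objects_v2(Bucket=bucket_name, Prefix=key, MaxKeys=1)
    for obj in response.get("Contents", []):
        if obj["Key"] == key:
            return obj["Size"], None
    return 0, None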
google-cloud-storage
def key_existing(client, bucket_name, key):
    """return a tuple of (
        key's size if it exists or 0,
        S3 key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    bucket = client.get_bucket(bucket_name)
    blob = bucket.get_blob(key)
    if blob:
        return blob.size, blob.metadata
    return 0, None
Uploading a file with a special Content-Encoding
Note: You have to use your imagination with regards to the source. In this example, I'm assuming that the source is a file on disk and that it might have already been compressed with gzip.
boto3
def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)
    metadata = metadata or {}

    # boto3 will raise a botocore.exceptions.ParamValidationError
    # error if you try to do something like:
    #
    #    s3.put_object(Bucket=..., Key=..., Body=..., ContentEncoding=None)
    #
    # ...because apparently 'NoneType' is not a valid type.
    # We /could/ set it to something like '' but that feels like an
    # actual value/opinion. Better just avoid it if it's not something
    # really real.
    extras = {}
    if content_type:
        extras["ContentType"] = content_type
    if compressed:
        extras["ContentEncoding"] = "gzip"
    if metadata:
        extras["Metadata"] = metadata

    with open(file_path, "rb") as f:
        s3_client.put_object(Bucket=bucket_name, Key=key_name, Body=f, **extras)
google-cloud-storage
def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)
    metadata = metadata or {}

    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.blob(key_name)

    if content_type:
        blob.content_type = content_type
    if compressed:
        blob.content_encoding = "gzip"
    blob.metadata = metadata

    with open(file_path, "rb") as f:
        blob.upload_from_file(f)
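In both versions, get_key_content_type is a little helper from my own app that isn't included here, so don't go looking for it in boto3 or google-cloud-storage. Usage looks something like this (file and bucket names made up):

upload("/tmp/2019-02-02.json.gz", "my-bucket", "2019-02-02.json", compressed=True)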
Downloading and uncompressing a gzipped object
boto3
from io import BytesIO
from gzip import GzipFile

from botocore.exceptions import ClientError

from .utils import iter_lines


def get_stream(bucket_name, key_name):
    try:
        response = s3_client.get_object(Bucket=bucket_name, Key=key_name)
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "NoSuchKey":
            raise KeyHardError("key not in bucket")
        raise

    stream = response["Body"]
    # But if the content encoding is gzip we have to re-wrap the stream.
    if response.get("ContentEncoding") == "gzip":
        body = response["Body"].read()
        bytestream = BytesIO(body)
        stream = GzipFile(None, "rb", fileobj=bytestream)

    for line in iter_lines(stream):
        yield line.decode("utf-8")
google-cloud-storage
from io import BytesIO

from .utils import iter_lines


def get_stream(bucket_name, key_name):
    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.get_blob(key_name)
    if blob is None:
        raise KeyHardError("key not in bucket")

    bytestream = BytesIO()
    blob.download_to_file(bytestream)
    bytestream.seek(0)

    for line in iter_lines(bytestream):
        yield line.decode("utf-8")
Note that here blob.download_to_file works a bit like requests.get() in that it automatically notices the Content-Encoding metadata and does the gunzip on the fly.
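Also, iter_lines in these snippets comes from my app's own utils module which I scrubbed out of the examples. If you're curious, a minimal stand-in (not my exact code) could look something like this:

def iter_lines(stream, chunk_size=16 * 1024):
    # Yield newline-delimited lines (as bytes) from a binary file-like
    # object without reading the whole thing into memory at once.
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            yield line
    if pending:
        yield pending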
Conclusion
It's not fair to compare them on style because I think boto3 came out of boto, which probably started back in the day when Google was just web search and webmail.
I wanted to include a section about how to unit test against these. Especially how to mock them. But what I had for a draft was getting ugly. Yes, it works for the testing needs I have in my app but it's very personal taste (aka. appropriate for the context) and admittedly quite messy.