
Peter Bengtsson: Switching from AWS S3 (boto3) to Google Cloud Storage (google-cloud-storage) in Python


I'm in the midst of rewriting a big app that currently uses AWS S3 and will soon be switched over to Google Cloud Storage. This blog post is a rough attempt to log various activities in both Python libraries:

Disclaimer: I'm copying these snippets by hand from a real project, and I have to manually scrub the code clean of unimportant quirks, hacks, and other unrelated things that would just add noise.

Install

boto3

$ pip install boto3
$ emacs ~/.aws/credentials

google-cloud-storage

$ pip install google-cloud-storage
$ cat ./google_service_account.json

Note: You need to create a service account, which gives you a .json file to download. Make sure you pass that file's path when you create a client.

I suspect there are more/other ways to do this with environment variables alone but I haven't got there yet.
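For what it's worth, I believe you can skip the explicit path and rely on the GOOGLE_APPLICATION_CREDENTIALS environment variable instead. A rough, untested sketch:

import os

from google.cloud import storage

# Assumption: GOOGLE_APPLICATION_CREDENTIALS points at the downloaded
# service account .json file, so the client can find it on its own.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./google_service_account.json"
gcs_client = storage.Client()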

Making a "client"

boto3

Note, there are easier shortcuts for this, but with this pattern you have full control over things like read_timeout, connect_timeout, etc. via the config_params keyword arguments.

import boto3
from botocore.config import Config


def get_s3_client(region_name=None, **config_params):
    options = {"config": Config(**config_params)}
    if region_name:
        options["region_name"] = region_name
    session = boto3.session.Session()
    return session.client("s3", **options)
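For example, a hypothetical call might look like this (the region name is made up; read_timeout and connect_timeout are regular botocore Config options):

# Client with tighter timeouts than the botocore defaults
s3_client = get_s3_client(region_name="us-west-2", read_timeout=5, connect_timeout=5)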

google-cloud-storage

from google.cloud import storage


def get_gcs_client():
    return storage.Client.from_service_account_json(
        settings.GOOGLE_APPLICATION_CREDENTIALS_PATH
    )

Checking if a bucket exists and if you have access to it

boto3 (for s3_client here, see above)

from botocore.exceptions import ClientError, EndpointConnectionError

try:
    s3_client.head_bucket(Bucket=bucket_name)
except ClientError as exception:
    if exception.response["Error"]["Code"] in ("403", "404"):
        raise BucketHardError(
            f"Unable to connect to bucket={bucket_name!r} "
            f"ClientError ({exception.response!r})"
        )
    else:
        raise
except EndpointConnectionError:
    raise BucketSoftError(
        f"Unable to connect to bucket={bucket_name!r} "
        f"EndpointConnectionError"
    )
else:
    print("It exists and we have access to it.")

google-cloud-storage

from google.api_core.exceptions import BadRequest

try:
    gcs_client.get_bucket(bucket_name)
except BadRequest as exception:
    raise BucketHardError(
        f"Unable to connect to bucket={bucket_name!r}, "
        f"because bucket not found due to {exception}"
    )
else:
    print("It exists and we have access to it.")
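Note, depending on your version of the library, a missing bucket might surface as google.api_core.exceptions.NotFound and a permission problem as Forbidden. An untested sketch that catches those explicitly:

from google.api_core.exceptions import Forbidden, NotFound

try:
    gcs_client.get_bucket(bucket_name)
except NotFound:
    # The bucket doesn't exist at all.
    raise BucketHardError(f"bucket={bucket_name!r} not found")
except Forbidden:
    # The bucket exists but we're not allowed to touch it.
    raise BucketHardError(f"no access to bucket={bucket_name!r}")
else:
    print("It exists and we have access to it.")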

Checking if an object exists

boto3

from botocore.exceptions import ClientError


def key_existing(client, bucket_name, key):
    """Return a tuple of (
        key's size if it exists or 0,
        S3 key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    try:
        response = client.head_object(Bucket=bucket_name, Key=key)
        return response["ContentLength"], response.get("Metadata")
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "404":
            return 0, None
        raise

Note, if you do this a lot and often find that the object doesn't exist, then using list_objects_v2 is probably faster.
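Something like this hypothetical (untested) variant, which trades the metadata away for speed:

def key_existing_cheaper(client, bucket_name, key):
    # Hypothetical variant of key_existing built on list_objects_v2.
    # Listing doesn't return per-object metadata, only the size.
    response = client.list_objects_v2(Bucket=bucket_name, Prefix=key, MaxKeys=1)
    for object_ in response.get("Contents", []):
        if object_["Key"] == key:
            return object_["Size"], None
    return 0, None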

google-cloud-storage

def key_existing(client, bucket_name, key):
    """Return a tuple of (
        key's size if it exists or 0,
        key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    bucket = client.get_bucket(bucket_name)
    blob = bucket.get_blob(key)
    if blob:
        return blob.size, blob.metadata
    return 0, None

Uploading a file with a special Content-Encoding

Note: You have to use your imagination with regards to the source. In this example, I'm assuming that the source is a file on disk and that it might have already been compressed with gzip.
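For example, if the source file wasn't already compressed, a little helper like this (hypothetical, not from the actual app) could produce it:

import gzip
import shutil


def gzip_file(source_path, destination_path):
    # Compress a file on disk so it can be uploaded with compressed=True.
    with open(source_path, "rb") as f_in:
        with gzip.open(destination_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)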

boto3

def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)
    metadata = metadata or {}

    # boto3 will raise a botocore.exceptions.ParamValidationError
    # error if you try to do something like:
    #
    #   s3.put_object(Bucket=..., Key=..., Body=..., ContentEncoding=None)
    #
    # ...because apparently 'NoneType' is not a valid type.
    # We /could/ set it to something like '' but that feels like an
    # actual value/opinion. Better just avoid if it's not something
    # really real.
    extras = {}
    if content_type:
        extras["ContentType"] = content_type
    if compressed:
        extras["ContentEncoding"] = "gzip"
    if metadata:
        extras["Metadata"] = metadata

    with open(file_path, "rb") as f:
        s3_client.put_object(Bucket=bucket_name, Key=key_name, Body=f, **extras)

google-cloud-storage

def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)
    metadata = metadata or {}

    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.blob(key_name)

    if content_type:
        blob.content_type = content_type
    if compressed:
        blob.content_encoding = "gzip"
    blob.metadata = metadata

    with open(file_path, "rb") as f:
        blob.upload_from_file(f)

Downloading and uncompressing a gzipped object

boto3

from io import BytesIO
from gzip import GzipFile

from botocore.exceptions import ClientError

from .utils import iter_lines


def get_stream(bucket_name, key_name):
    try:
        response = s3_client.get_object(Bucket=bucket_name, Key=key_name)
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "NoSuchKey":
            raise KeyHardError("key not in bucket")
        raise

    stream = response["Body"]
    # But if the content encoding is gzip we have to re-wrap the stream.
    if response.get("ContentEncoding") == "gzip":
        body = response["Body"].read()
        bytestream = BytesIO(body)
        stream = GzipFile(None, "rb", fileobj=bytestream)

    for line in iter_lines(stream):
        yield line.decode("utf-8")

google-cloud-storage

from io import BytesIO

from .utils import iter_lines


def get_stream(bucket_name, key_name):
    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.get_blob(key_name)
    if blob is None:
        raise KeyHardError("key not in bucket")

    bytestream = BytesIO()
    blob.download_to_file(bytestream)
    bytestream.seek(0)

    for line in iter_lines(bytestream):
        yield line.decode("utf-8")

Note that here, blob.download_to_file works a bit like requests.get() in that it automatically notices the Content-Encoding metadata and does the gunzip on the fly.
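So both implementations can be consumed the same way (bucket and key names made up):

# Prints every line of the gzipped object, decompressed and decoded
for line in get_stream("some-bucket", "2019/03/01/data.log.gz"):
    print(line)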

Conclusion

It's not fair to compare them on style, because I think boto3 came out of boto, which probably started back in the day when Google was just web search and webmail.

I wanted to include a section about how to unit test against these. Especially how to mock them. But what I had for a draft was getting ugly. Yes, it works for the testing needs I have in my app but it's very personal taste (aka. appropriate for the context) and admittedly quite messy.

