This should have been obvious to me for a long time, but until earlier today I did not really realize the severity of the issues caused by str.format on untrusted user input. It came up as a way to bypass the Jinja2 sandbox that would permit retrieving information you should not have access to, which is why I just pushed out a security release for it.
However I think the general issue is quite severe and needs to be discussed because most people are most likely not aware of how easy it is to exploit.
The Core Issue
Starting with Python 2.6 a new format string syntax landed, inspired by .NET. The same syntax is also supported by Rust and some other programming languages. It's available behind the .format() method on byte and unicode strings (on Python 3 just on unicode strings) and it's also mirrored in the more customizable string.Formatter API.
One of the features of it is that you can address both positional and keyword arguments to the string formatting and you can explicitly reorder items at all times. However the bigger feature is that you can access attributes and items of objects. The latter is what is causing the problem here.
Essentially one can do things like the following:
>>> 'class of {0} is {0.__class__}'.format(42)
"class of 42 is <class 'int'>"
In essence: whoever controls the format string can access potentially internal attributes of objects.
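To make the attribute and item access more concrete, here is a small sketch. The Point class and the field_names helper are made up for illustration. The helper only relies on the documented Formatter.parse method to list which fields a format string refers to, which can be handy when reviewing strings that come from somewhere else.

```python
from string import Formatter


class Point(object):
    # Hypothetical class, only here to have attributes to look up.
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Attribute access, string-keyed item access and integer item access.
# Note that item keys inside a format string are written without quotes.
rendered = '{p.x} {d[key]} {items[0]}'.format(
    p=Point(1, 2), d={'key': 'v'}, items=['a', 'b'])


def field_names(format_string):
    """Return the raw field names a format string refers to."""
    return [name for _, name, _, _ in Formatter().parse(format_string)
            if name is not None]


names = field_names('class of {0} is {0.__class__}')
```

Here rendered ends up as '1 v a' and names contains both '0' and '0.__class__', so the attribute lookup is visible before anything gets formatted.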
Where does it Happen?
The first question is why anyone would control the format string. There are a few places where it shows up:
- untrusted translators on string files. This is a big one because many applications that are translated into multiple languages will use new-style Python string formatting and not everybody will vet all the strings that come in.
- user exposed configuration. On some systems users might be permitted to configure some behavior and that might be exposed as format strings. In particular I have seen it where users can configure notification mails, log message formats or other basic templates in web applications.
Levels of Danger
As long as only C interpreter objects are passed to the format string you are somewhat safe, because the worst you can discover is some internal reprs, like the fact that something is an integer class in the example above.
However it becomes tricky once Python objects are passed in. The reason for this is that the amount of stuff that is exposed from Python functions is pretty crazy. Here is an example from a hypothetical web application setup that would leak the secret key:
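The difference between the two kinds of objects can be checked directly. This is a rough sketch assuming Python 3, and the Plain class is made up for the comparison. Methods of built-in types are implemented in C and carry no __globals__, while methods written in Python do.

```python
class Plain(object):
    # Hypothetical class whose __init__ is an ordinary Python function.
    def __init__(self):
        pass


# Methods of built-in types are implemented in C and have no __globals__.
builtin_has_globals = hasattr((42).__init__, '__globals__')

# Methods written in Python expose the globals of the module defining them.
python_has_globals = hasattr(Plain().__init__, '__globals__')

# Trying to walk to __globals__ from an integer fails inside str.format.
try:
    '{0.__init__.__globals__}'.format(42)
    reached_globals = True
except AttributeError:
    reached_globals = False
```

So from an integer there is no path to module globals, but from an instance of a Python class there is.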
CONFIG = {
    'SECRET_KEY': 'super secret key'
}

class Event(object):
    def __init__(self, id, level, message):
        self.id = id
        self.level = level
        self.message = message

def format_event(format_string, event):
    return format_string.format(event=event)
If the user can inject format_string here they could discover the secret string like this:
{event.__init__.__globals__[CONFIG][SECRET_KEY]}
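To see the leak end to end, the following sketch repeats the hypothetical setup so that it runs on its own as a module. The event values and the harmless format string are made up for the demonstration.

```python
CONFIG = {'SECRET_KEY': 'super secret key'}


class Event(object):
    def __init__(self, id, level, message):
        self.id = id
        self.level = level
        self.message = message


def format_event(format_string, event):
    return format_string.format(event=event)


event = Event(1, 'info', 'user logged in')

# What the developer had in mind: plain attribute access on the event.
harmless = format_event('[{event.level}] {event.message}', event)

# What an attacker can send instead: walk from the bound method to the
# globals of the module that defines it, then index into CONFIG.
malicious = '{event.__init__.__globals__[CONFIG][SECRET_KEY]}'
leaked = format_event(malicious, event)
```

After this runs, leaked holds the secret key even though format_event was only ever handed the event object.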
Sandboxing Formatting
So what do you do if you do need to let someone else provide format strings? You can use the somewhat undocumented internals to change the behavior.
from string import Formatter

try:
    from collections.abc import Mapping
except ImportError:
    # Python 2 keeps the ABCs directly in collections
    from collections import Mapping


class MagicFormatMapping(Mapping):
    """This class implements a dummy wrapper to fix a bug in the Python
    standard library for string formatting.

    See http://bugs.python.org/issue13598 for information about why
    this is necessary.
    """

    def __init__(self, args, kwargs):
        self._args = args
        self._kwargs = kwargs
        self._last_index = 0

    def __getitem__(self, key):
        if key == '':
            idx = self._last_index
            self._last_index += 1
            try:
                return self._args[idx]
            except LookupError:
                pass
            key = str(idx)
        return self._kwargs[key]

    def __iter__(self):
        return iter(self._kwargs)

    def __len__(self):
        return len(self._kwargs)


# This is a necessary API but it's undocumented and moved around
# between Python releases
try:
    from _string import formatter_field_name_split
except ImportError:
    formatter_field_name_split = lambda \
        x: x._formatter_field_name_split()


class SafeFormatter(Formatter):

    def get_field(self, field_name, args, kwargs):
        first, rest = formatter_field_name_split(field_name)
        obj = self.get_value(first, args, kwargs)
        for is_attr, i in rest:
            if is_attr:
                obj = safe_getattr(obj, i)
            else:
                obj = obj[i]
        return obj, first


def safe_getattr(obj, attr):
    # Expand the logic here.  For instance on 2.x you will also need
    # to disallow func_globals, on 3.x you will also need to hide
    # things like cr_frame and others.  So ideally have a list of
    # objects that are entirely unsafe to access.
    if attr[:1] == '_':
        raise AttributeError(attr)
    return getattr(obj, attr)


def safe_format(_string, *args, **kwargs):
    formatter = SafeFormatter()
    kwargs = MagicFormatMapping(args, kwargs)
    return formatter.vformat(_string, args, kwargs)
Now you can use the safe_format function as a replacement for str.format:
>>> '{0.__class__}'.format(42)
"<type 'int'>"
>>> safe_format('{0.__class__}', 42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __class__
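If the templates only ever need plain names, a stricter variant is to refuse attribute and item access altogether. This is a minimal sketch, not what the Jinja2 sandbox does. FlatFormatter and flat_format are hypothetical names, and the only API used is the documented get_field hook of string.Formatter.

```python
from string import Formatter


class FlatFormatter(Formatter):
    """Hypothetical formatter that only allows plain field names."""

    def get_field(self, field_name, args, kwargs):
        # Any "." or "[" in a field name means the format string wants
        # to traverse into the object, which is rejected outright.
        if '.' in field_name or '[' in field_name:
            raise ValueError(
                'attribute and item access is not allowed: %r' % field_name)
        return Formatter.get_field(self, field_name, args, kwargs)


def flat_format(_string, *args, **kwargs):
    return FlatFormatter().vformat(_string, args, kwargs)


ok = flat_format('{name} has {count:d} items', name='cart', count=3)

try:
    flat_format('{event.__init__}', event=object())
    rejected = False
except ValueError:
    rejected = True
```

The trade-off is that callers have to pass in every value explicitly as its own argument, but in exchange nothing the caller did not pass in can be reached from the format string.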