Suppose you have a piece of text but you don't know what language it's written in. If you speak English and the text looks English, it's easy. But what about "Den snabba bruna räven hoppar över den lata hunden" or "haraka kahawia mbweha anaruka juu ya mbwa wavivu" or "A ligeira raposa marrom ataca o cão preguiçoso"? Can you guess?
MeaningCloud can guess. They have a Language Identification API that you can use for free. Their freemium plan allows for 40,000 API requests per month.
To get started, you have to register, verify your email and sign in to get your "license key". Once you have that, you simply use it like this:
>>> import requests
>>> url = 'http://api.meaningcloud.com/lang-1.1'
>>> payload = {'key': 'b49....................ee',
...            'txt': 'Den snabba bruna räven hoppar över den lata hunden'}
>>> requests.post(url, data=payload).json()
{'status': {'remaining_credits': '39999', 'credits': '1', 'msg': 'OK', 'code': '0'}, 'lang_list': ['sv', 'da', 'no', 'es']}
If you look at the lang_list list, the first entry is sv, for Swedish.
If you want the full name of a language code, look it up in the "ISO 639-1 Code" table.
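If you only care about a handful of languages, a small lookup dictionary saves a trip to the table. This is a minimal sketch: ISO_639_1_NAMES, language_name and describe are names made up for this post, not part of MeaningCloud's API, and the dictionary only covers the codes that show up in the examples here.

```python
# Hand-picked subset of ISO 639-1 codes (only the ones seen in this post).
# This is NOT the full table; extend it with whatever codes you need.
ISO_639_1_NAMES = {
    'sv': 'Swedish',
    'da': 'Danish',
    'no': 'Norwegian',
    'es': 'Spanish',
    'pt': 'Portuguese',
    'ro': 'Romanian',
    'sw': 'Swahili',
}


def language_name(code):
    """Return the English name for a code, or the code itself if unknown."""
    return ISO_639_1_NAMES.get(code, code)


def describe(lang_list):
    """Translate a list of codes (like lang_list) into readable names."""
    return [language_name(code) for code in lang_list]
```

Calling describe(['sv', 'da', 'no', 'es']) gives you the names in the same order as the codes. Unknown codes fall through unchanged, so nothing blows up if the API returns a language you haven't listed.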
Let's do the other ones too:
>>> payload['txt'] = 'A ligeira raposa marrom ataca o cão preguiçoso'
>>> # Portuguese
>>> requests.post(url, data=payload).json()
{'status': {'remaining_credits': '39998', 'credits': '1', 'msg': 'OK', 'code': '0'}, 'lang_list': ['pt', 'ro']}
>>> payload['txt'] = 'haraka kahawia mbweha anaruka juu ya mbwa wavivu'
>>> # Swahili
>>> requests.post(url, data=payload).json()
{'status': {'remaining_credits': '37363', 'credits': '1', 'msg': 'OK', 'code': '0'}, 'lang_list': ['sw']}
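If you do this a lot, you probably want a tiny helper that pulls the top guess out of the decoded JSON. Here's a rough sketch. The name best_guess is made up, and it rests on assumptions drawn only from the responses above: that a status code of '0' means success, that lang_list is ordered best guess first, and that the list might be empty or missing when the service has no guess.

```python
def best_guess(result):
    """Return the first language code from a decoded API response.

    Assumptions (based only on the sample responses in this post):
    - status['code'] == '0' means the call succeeded
    - lang_list is ordered with the most likely language first
    - lang_list may be empty or missing, in which case None is returned
    """
    status = result.get('status', {})
    if status.get('code') != '0':
        # Surface the API's own code and message rather than guessing.
        raise RuntimeError(
            'API error {}: {}'.format(status.get('code'), status.get('msg'))
        )
    lang_list = result.get('lang_list') or []
    return lang_list[0] if lang_list else None
```

You'd feed it the same thing the sessions above print, for example best_guess(requests.post(url, data=payload).json()).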
The service isn't perfect. It struggles with shorter texts written in non-Western alphabets. But it's pretty easy to use and delivers pretty good results.