Suppose you need to scrape data from a website, but the page must first be translated into English. Google Chrome has a built-in feature that translates pages written in many foreign languages. If you speak only English and want to extract data from a website that offers no English version, this article shows how to make Chrome translate the page while you scrape it with Selenium in R and Python.
What is Selenium?
You may not be familiar with Selenium, so it helps to understand some background first. Selenium is an open-source tool for automating web browsers and is very popular in software testing. It lets you write scripts in several programming languages, and it can be used from both R and Python.
Translating a Web Page While Scraping in R and Python
In R, Selenium is available through the RSelenium package; in Python, install the selenium package. The table below lists languages Chrome supports along with their language codes. You need these codes to tell Chrome which language to translate the web page from and which language to translate it into.
Language | Code |
---|---|
Amharic | am |
Arabic | ar |
Basque | eu |
Bengali | bn |
English (UK) | en-GB |
Portuguese (Brazil) | pt-BR |
Bulgarian | bg |
Catalan | ca |
Cherokee | chr |
Croatian | hr |
Czech | cs |
Danish | da |
Dutch | nl |
English (US) | en |
Estonian | et |
Filipino | fil |
Finnish | fi |
French | fr |
German | de |
Greek | el |
Gujarati | gu |
Hebrew | iw |
Hindi | hi |
Hungarian | hu |
Icelandic | is |
Indonesian | id |
Italian | it |
Japanese | ja |
Kannada | kn |
Korean | ko |
Latvian | lv |
Lithuanian | lt |
Malay | ms |
Malayalam | ml |
Marathi | mr |
Norwegian | no |
Polish | pl |
Portuguese (Portugal) | pt-PT |
Romanian | ro |
Russian | ru |
Serbian | sr |
Chinese (PRC) | zh-CN |
Slovak | sk |
Slovenian | sl |
Spanish | es |
Swahili | sw |
Swedish | sv |
Tamil | ta |
Telugu | te |
Thai | th |
Chinese (Taiwan) | zh-TW |
Turkish | tr |
Urdu | ur |
Ukrainian | uk |
Vietnamese | vi |
Welsh | cy |
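Here is a minimal Python sketch of how these codes might be passed to Chrome through Selenium's ChromeOptions. It assumes Chrome still honours its internal translation preferences: translate, plus a source-to-target mapping stored under translate_whitelists in older builds and, apparently, translate_allowlists in newer ones. These are Chrome internals, not a documented Selenium API, and may change between Chrome versions. The helper names build_translate_prefs and launch_translating_chrome are hypothetical, and only a few codes from the table are included.

```python
"""Sketch: ask Chrome (driven by Selenium) to translate pages from one language to another.

Assumption: Chrome reads the profile preferences "translate" and a
source-code -> target-code mapping. Older Chrome builds are known to read
"translate_whitelists"; newer builds appear to use "translate_allowlists".
Both keys are set here. These are Chrome internals, not a Selenium API.
"""

# A few codes copied from the table above; extend as needed.
LANGUAGE_CODES = {
    "French": "fr",
    "German": "de",
    "Hindi": "hi",
    "Japanese": "ja",
    "Chinese (PRC)": "zh-CN",
    "English (US)": "en",
    "English (UK)": "en-GB",
}


def build_translate_prefs(source: str, target: str = "en") -> dict:
    """Return Chrome profile prefs that request translation from source to target.

    `source` and `target` may be language names from LANGUAGE_CODES or raw codes.
    """
    src = LANGUAGE_CODES.get(source, source)
    tgt = LANGUAGE_CODES.get(target, target)
    return {
        "translate": {"enabled": True},
        # Assumed pref names (see module docstring); an unused key is expected to be ignored.
        "translate_whitelists": {src: tgt},
        "translate_allowlists": {src: tgt},
    }


def launch_translating_chrome(source: str, target: str = "en"):
    """Start Chrome with the translation prefs applied (needs `pip install selenium`)."""
    # Imported lazily so build_translate_prefs can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_translate_prefs(source, target))
    # Set the browser UI language to the target language as well.
    options.add_argument(f"--lang={LANGUAGE_CODES.get(target, target)}")
    return webdriver.Chrome(options=options)
```

For example, calling driver = launch_translating_chrome("French") and then driver.get(url) should open the page in a Chrome session configured to translate French into English, after which the translated text can be read from the page as usual. Keeping the preference-building step separate from the browser launch makes the configuration easy to inspect without starting Chrome. The same prefs structure can be passed to chromeOptions in RSelenium.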