Channel: Planet Python

Real Python: Natural Language Processing With spaCy in Python


If you want to do natural language processing (NLP) in Python, then look no further than spaCy, a free and open-source library with a lot of built-in capabilities. It’s becoming increasingly popular for processing and analyzing data in the field of NLP.

Unstructured text is produced by companies, governments, and the general population at an incredible scale. There's often far more of it than humans could ever read, so it's important to automate its processing and analysis. To do that, you need to represent the text in a format that computers can understand, and spaCy can help.
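
To make "a format that computers can understand" more concrete, here's a deliberately naive sketch that uses only the standard library, not spaCy. It splits a sample sentence (made up for illustration) into tokens with a regular expression, maps each distinct token to an integer ID, and counts how often each token appears. spaCy's real tokenizer applies language-specific rules and exceptions, so treat this only as an illustration of the idea.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
text = (
    "Unstructured text is produced at an incredible scale. "
    "Computers can't read text, so text has to be turned into numbers."
)

# Naive tokenizer: lowercase the text and extract runs of letters,
# keeping apostrophes so contractions like "can't" stay in one piece.
tokens = re.findall(r"[a-z']+", text.lower())

# Map each distinct token to an integer ID in order of first appearance.
vocab = {}
for token in tokens:
    vocab.setdefault(token, len(vocab))
ids = [vocab[token] for token in tokens]

# Count how often each token occurs: a simple bag-of-words view of the text.
counts = Counter(tokens)

print(ids)
print(counts.most_common(3))
```

Once text is a list of integers and counts, it can feed statistics or models; libraries like spaCy do the same kind of conversion with far more linguistic care.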

In this tutorial, you’ll learn how to:

  • Implement NLP in spaCy
  • Customize and extend built-in functionalities in spaCy
  • Perform basic statistical analysis on a text
  • Create a pipeline to process unstructured text
  • Parse a sentence and extract meaningful insights from it

If you’re new to NLP, don’t worry! Before you start using spaCy, you’ll first learn about the foundational terms and concepts in NLP. You should be familiar with the basics of Python, though. The code in this tutorial uses dictionaries, lists, tuples, for loops, comprehensions, object-oriented programming, and lambda functions, among other fundamental Python concepts.


Introduction to NLP and spaCy

NLP is a subfield of artificial intelligence, and it’s all about allowing computers to comprehend human language. NLP involves analyzing, quantifying, understanding, and deriving meaning from natural languages.

Note: Currently, the most powerful NLP models are transformer-based. BERT from Google and the GPT family from OpenAI are examples of such models.

Since the release of version 3.0, spaCy supports transformer-based models. The examples in this tutorial use a smaller, CPU-optimized model, but you can run them with a transformer model instead. Transformer models from Hugging Face can be used with spaCy through the spacy-transformers package.

NLP helps you extract insights from unstructured text and has many practical use cases.

spaCy is a free, open-source library for NLP in Python written in Cython. spaCy is designed to make it easy to build systems for information extraction or general-purpose natural language processing.

Installation of spaCy

In this section, you’ll install spaCy into a virtual environment and then download data and models for the English language.

You can install spaCy using pip, a Python package manager. It’s a good idea to use a virtual environment to avoid depending on system-wide packages. To learn more about virtual environments and pip, check out Using Python’s pip to Manage Your Projects’ Dependencies and Python Virtual Environments: A Primer.

First, you’ll create a new virtual environment, activate it, and install spaCy. The commands differ slightly by operating system. On Windows, run the following in PowerShell:

PS> python -m venv venv
PS> ./venv/Scripts/activate
(venv) PS> python -m pip install spacy

On Linux or macOS, run the following in your terminal:

$ python -m venv venv
$ source ./venv/bin/activate
(venv)$ python -m pip install spacy
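
Because `python -m pip` installs into whichever interpreter `python` resolves to, it can be worth confirming that the virtual environment is actually active before installing packages. Here's a minimal standard-library sketch, assuming the environment was created with the `venv` module as above: inside such an environment, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base Python installation. The helper name `in_virtualenv()` is hypothetical and not part of spaCy or pip.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True if the running interpreter belongs to a venv."""
    # In an environment created by the venv module, sys.prefix is the venv
    # directory, while sys.base_prefix remains the base installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

Running this with the environment's `python` should report `True`; if it reports `False`, activate the environment again before running `pip install`.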

With spaCy installed in your virtual environment, you’re almost ready to get started with NLP. But there’s one more thing you’ll have to install:

(venv)$ python -m spacy download en_core_web_sm

There are spaCy models for many different languages. The small English model, en_core_web_sm, is a common starting point and the one used in this tutorial. Because the models are quite large, they're installed separately from spaCy itself; bundling every language into one package would make the download far too big.
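
Since `python -m spacy download` installs each model as an ordinary Python package, you can check whether a model is present, and which version you have, with the standard library alone. The following is a minimal sketch under that assumption; the helper `installed_model_version()` is hypothetical and not part of spaCy's API.

```python
from importlib import metadata, util
from typing import Optional


def installed_model_version(package_name: str) -> Optional[str]:
    """Hypothetical helper: installed version of a model package, or None."""
    # Downloaded spaCy models are regular Python packages, so the import
    # system can locate them without importing spaCy itself.
    if util.find_spec(package_name) is None:
        return None
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata was found.
        return None


version = installed_model_version("en_core_web_sm")
print(f"en_core_web_sm: {version or 'not installed'}")
```

Checking the version can be useful because each model release is built for a specific range of spaCy versions.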

Once the en_core_web_sm model has finished downloading, open up a Python REPL and verify that the installation has been successful:

>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")

If these lines run without any errors, then it means that spaCy was installed and that the models and data were successfully downloaded. You’re now ready to dive into NLP with spaCy!
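
In the REPL, a missing package or model shows up as a traceback. In a script, you might prefer a clearer message. Here's a minimal sketch of that idea: it relies on `spacy.load()` raising `OSError` when it can't find the named pipeline, and on `import spacy` raising `ImportError` when spaCy isn't installed. The helper name `load_pipeline_or_none()` is hypothetical.

```python
def load_pipeline_or_none(name="en_core_web_sm"):
    """Hypothetical helper: load a spaCy pipeline, or return None with a hint."""
    try:
        # Imported here so that a missing spaCy install produces a hint
        # instead of an unhandled ImportError at module import time.
        import spacy
    except ImportError:
        print("spaCy isn't installed. Run: python -m pip install spacy")
        return None
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load() raises OSError when it can't find the named pipeline.
        print(f"Couldn't find {name!r}. Run: python -m spacy download {name}")
        return None


nlp = load_pipeline_or_none()
if nlp is not None:
    # pipe_names lists the processing components in the loaded pipeline.
    print("Loaded pipeline components:", nlp.pipe_names)
```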

The Doc Object for Processed Text

Read the full article at https://realpython.com/natural-language-processing-spacy-python/ »



