
DataCamp: IPython Or Jupyter?



For learners as well as for more advanced data scientists, the Jupyter Notebook is one of the most popular data science tools out there: the interactive environment is ideal not only for teaching, learning, and sharing your work with peers, but also for ensuring reproducible research. Yet, as you’re discovering how to work with this notebook, you’ll often bump into IPython.

The two seem to be synonyms in some cases and you’ll agree with me when I say that it’s very confusing when you want to dig deeper: are magics part of Jupyter or IPython? Is saving and loading notebooks a feature of IPython or Jupyter?

You can probably keep on going with the questions.

Today’s blog post intends to illustrate some of the core differences between the two more explicitly: it starts from the origins of both to explain how they relate, and then covers some specific features that are part of one or the other, so that it will be easier for you to make the distinction!

Consider also reading DataCamp’s Definitive Guide to Jupyter Notebook for tips and tricks, best practices, examples, and much more. 

The Origins of IPython / Jupyter

To fully understand what the Jupyter Notebook is and how it differs from IPython, it might be interesting to first read a bit more about how these two fit into the history and the future of computational notebooks.

The Start of Computational Notebooks: MATLAB, Mathematica & Maple

In the mid-1980s, MATLAB was released by The MathWorks, founded by Jack Little, Steve Bangert, and Cleve Moler.

Let’s go to the late 1980s, 1987 to be exact. Theodore Gray started working on what was to become the Mathematica notebook frontend, and a year later it was released to the public. The GUI allowed for the interactive creation and editing of notebook documents that contain pretty-printed program code, formatted text, and a wealth of other content such as typeset mathematics, graphics, GUI components, tables, and sounds. Standard word processing capabilities were there, such as real-time multilingual spell checking, and you could output the documents in a slideshow environment for presentations.

When you look at how these notebooks were structured, you notice straight away that they depended on a hierarchy of cells that allowed for the outlining and sectioning of documents, which you now also find in Jupyter notebooks.

Also in the late 1980s, in 1989, Maple introduced their first notebook-style GUI. It was included with version 4.3 for the Macintosh. Versions of the new interface for X11 and Windows followed in 1990.

These notebooks would all become an inspiration for others to develop what would later be called “data science notebooks”.

The Rise of Data Science Notebooks

There have been many computational notebooks between those early ones and the notebooks that are now widely used for interactive data science. This section will focus on the ones that have been most notable in the rise of data science notebooks.

Sage Notebook

The Sage notebook, a browser-based system, was first released in the mid-2000s; in 2007, a new version followed that was more powerful, had user accounts, and could be used to make documents public. It resembled the Google Docs UI, since the layout of the Sage notebook was based on the layout of Google notebooks.

The creators of the Sage notebook have confirmed that they were avid users of the Mathematica notebooks and Maple worksheets. Other important drivers in the development of the Sage notebook were the developers’ close contact with the team behind IPython, their experience with failed attempts at GUIs for IPython, and the rise of “AJAX” web applications, which didn’t require users to refresh the whole page with every action.

IPython and Jupyter Notebook

In late 2001, twenty years after Guido van Rossum began to work on Python at the National Research Institute for Mathematics and Computer Science in the Netherlands, Fernando Pérez started developing IPython. The project was heavily influenced by the Mathematica notebooks and Maple worksheets, just like the Sage notebook and many other projects that followed.

In 2005, Robert Kern and Fernando Pérez attempted to build a notebook system. Unfortunately, the prototype never became fully usable.

Fast forward two years: the team had kept on working, and in 2007, they made another attempt at implementing a notebook-type system. By October 2010, there was a prototype of a web notebook; in the summer of 2011, this prototype was incorporated into the project and released with IPython 0.12 on December 21, 2011. In subsequent years, the team received awards, such as the Award for the Advancement of Free Software for Fernando Pérez on March 23, 2013 and the Jolt Productivity Award, as well as funding from the Alfred P. Sloan Foundation, among others.

Lastly, in 2014, Project Jupyter started as a spin-off project from IPython.

The last release of IPython before the split contained an interactive shell, the notebook server, the Qt console, and more. The project had grown big, with tools that were increasingly becoming distinct projects that just happened to live under the same roof. After the Jupyter project started, the language-agnostic parts of the IPython project, such as the notebook format, the message protocol, the Qt console, and the notebook web application, were moved into the Jupyter project.

This is called “The Big Split”.

IPython now has only two roles to fulfill: being the Python backend to the Jupyter Notebook, which is also known as the kernel, and being an interactive Python shell. But this is not all: within the IPython ecosystem, you’ll also find a parallel computing framework. You’ll read more about this later on!

And just like IPython, Project Jupyter is actually one name for a bunch of projects: the three applications that it harbors are the Notebook itself, a Console, and a Qt console, but there are also subprojects such as JupyterHub to support notebook deployment, nbgrader for educational purposes, etc. You can see an overview of the Jupyter architecture here.

Note that it’s exactly the evolution of this project that explains the confusion many Pythonistas have when it comes to IPython and Jupyter: since one came out of the other quite recently, some still have difficulties adopting the right names for the concepts. An even more complicating factor is the considerable overlap between IPython and Jupyter Notebook features, which are sometimes hard to tell apart!

How to distinguish between the two will become clear in the next sections of this post.

If you want to know more details about how the development of IPython came about, check out the personal accounts of Fernando Pérez and William Stein about the history of their notebooks.

R Notebooks

R Markdown and Jupyter notebooks share the goal of delivering a reproducible workflow: both weave code, output, and text together in a single document, support interactive widgets, and output to multiple formats.

However, the two also differ: the former focuses on reproducible batch execution, a plain-text representation, version control, production output, and offers the same editor and tools that you use for R scripts. The latter focuses on output inline with the code, caching output across sessions, and sharing code and output in a single file. Notebooks put an emphasis on an interactive execution model, and they don’t use a plain-text representation but a structured data representation, such as JSON.

That all explains the purpose of RStudio’s notebook application: it combines all the advantages of R Markdown with the good things that computational notebooks have to offer.

To learn more about how you can work with R notebooks and what the exact differences are between Jupyter and R Markdown notebooks in terms of notebook sharing, project management, version control and more, check out DataCamp’s Jupyter and R: Notebooks with R post.

Other Data Science Notebooks

Of course, there are even more notebooks that you can consider when you’re getting into data science. In recent years, a lot of new alternatives have found their way to data scientists and data science enthusiasts: think not only of Beaker Notebook, Apache Zeppelin, Spark Notebook, DataBricks Cloud, etc., but also of other tools such as the Rodeo IDE, which also makes your data science analyses interactive and reproducible.

The Future of Notebooks

And notebooks seem to be here to stay. Recently, the next generation of the Jupyter Notebook has been introduced to the community: JupyterLab. It includes not only support for notebooks but also a file manager, a text editor, a terminal emulator, a monitor for running Jupyter processes, an IPython cluster manager, and a pager to display help.

The rich toolset of the Jupyter Notebook has evolved organically, driven by the needs of its users and developers. JupyterLab is a next-generation architecture that supports all these tools, but with a flexible and responsive UI, offering a user-controlled layout that ties them together.

Read more about it here.


IPython or Jupyter?

The evolution of the project and the consequent “Big Split” are the foundation for understanding the true differences between the two. But, as the two are inherently connected, you’ll sometimes find yourself doubting what is part of what.

The following section will go over some features that are part of either the IPython ecosystem or the Jupyter Project.

Up to you to select the right answer and discover more about each feature!

Kernels?

Kernels are a feature of the Jupyter Notebook Application. A kernel is a program that runs and introspects the user’s code: it provides computation and communication with the frontend interfaces, such as notebooks. The Jupyter Notebook Application has three main kernels: the IPython, IRkernel and IJulia kernels.

Since the name “Jupyter” is actually short for “Julia, Python and R”, that really doesn’t come as too much of a surprise. The IPython kernel is maintained by the Jupyter team, as a result of the evolution of the project.

However, you can also run many other languages, such as Scala, JavaScript, Haskell, Ruby, and more in the Jupyter Notebook Application. Those are community maintained kernels.
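If you’re ever unsure which kernels are available on your machine, you can list them from a terminal; the output will of course depend on what you have installed:

# List the kernels registered with Jupyter on this machine
jupyter kernelspec list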

Notebook Deployment?

Deploying notebooks is something that you’ll typically find or look into when you’re working with the Jupyter notebooks. There are quite a number of packages that will help you to deploy your notebooks and that are part of the Jupyter ecosystem.

Here are some of them:

  • docker-stacks will come in handy when you need stacks of Jupyter applications and kernels as Docker containers.
  • ipywidgets provides interactive HTML & JavaScript widgets (such as sliders, checkboxes, text boxes, charts, etc.) for the Jupyter architecture: front-end controls coupled to a Jupyter kernel.
  • jupyter-drive allows IPython to use Google Drive for file management.
  • jupyter-sphinx-theme to add a Jupyter theme to your Sphinx documentation. It will make it easier to create intelligent and beautiful documentation.
  • kernel_gateway is a web server that supports different mechanisms for spawning and communicating with Jupyter kernels. Look here to see some use cases in which this package can come in handy.
  • nbviewer to share your notebooks. Check out the gallery here.
  • tmpnb to create temporary Jupyter Notebook servers using Docker containers. Try it out for yourself here.
  • traitlets is a framework that lets Python classes have attributes with type checking, dynamically calculated default values, and ‘on change’ callbacks. You can also use the package for configuration purposes, to load values from files or from command line arguments. traitlets powers the configuration system of IPython and Jupyter and the declarative API of IPython interactive widgets.
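As a quick taste of that last item, here is a minimal sketch of traitlets in action; the Job class and its attributes are made up for illustration:

# A hypothetical class with type-checked attributes and an 'on change' callback
from traitlets import HasTraits, Int, Unicode, observe

class Job(HasTraits):
    name = Unicode("untitled")
    retries = Int(0)

    @observe("retries")
    def _on_retries_change(self, change):
        # Fires whenever `retries` is assigned a new value
        print("retries: {} -> {}".format(change["old"], change["new"]))

job = Job()
job.retries = 3        # triggers the callback
job.retries = "many"   # raises a TraitError: type checking in action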

System Shell Usage?

It is possible to adapt IPython for system shell usage with magics: lines that start with ! are passed directly to the system shell. For example, !ls will run ls in the current directory. You can assign the result of a system command to a Python variable with the syntax myfiles=!ls.

However, if you want to get the result of an ls command explicitly printed out as a list of strings, without assigning it to a variable, use two exclamation marks (!!ls) or the %sx magic command without an assignment.

# Assign the result to `myfiles`
myfiles = !ls

# Explicit `ls`
!!ls

# Or with magics
%sx ls

# Assign the magic's result
myfiles = %sx ls

Note that !!commands cannot be assigned to a variable, but that the result of a magic (as long as it returns a value) can be assigned to a variable.

IPython also allows you to expand the value of Python variables when making system calls: just wrap your variables or expressions in braces ({}). Also, in a shell command with ! or !!, any Python variable prefixed with $ is expanded. In the code chunk below, you’ll see that you echo the argv attribute of the sys module. Note that you can combine these syntaxes with assignment to capture system output in Python variables, which you can later use for further scripting.

To pass a literal $ to the shell, use a double $$. You’ll need this literal $ if you want to access the shell and environment variables like $PATH:

# Import and initialize
import math
import sys
x = 4

# System call with variable
!echo {math.factorial(x)}

# Expand a variable
!echo $sys.argv

# Use $$ for a literal $
!echo "A system variable: $$HOME"

Read more here.

Note that besides IPython, there are also other kernels that have magics to make sure that lines are executed as shell commands!

Also, there are aliases that you can define for system commands. These aliases are basically shortcuts to shell commands; internally, an alias is stored as a tuple, such as (“showTheDirectory”, “ls”). Run %alias? to get more information! Tip: use %rehashx to load all of your $PATH as IPython aliases.
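Here’s a minimal sketch of aliases in an IPython session, reusing the alias name from above:

# Define an alias for a shell command
%alias showTheDirectory ls

# Use it like any other command
showTheDirectory

# Call %alias without arguments to list all defined aliases
%alias

# Load every executable on your $PATH as an alias
%rehashx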

Magics?

If you have gone through DataCamp’s Definitive Guide to Jupyter Notebook or if you have already worked with Jupyter, you might already know the so-called “magic commands”. Magics usually consist of a syntax element that is not valid in the underlying language plus some kind of word that implies a command. Under the hood, magic functions are actually Python functions.

The IPython kernel uses, as you might already know, the % syntax element because it’s not a valid unary operator in Python. However, lines that begin with %% signal a cell magic: they take as arguments not only the rest of the current line, but all lines below them as well, in the current execution block. Cell magics can in fact make arbitrary modifications to the input they receive, which need not even be valid Python code at all. They receive the whole block as a single string.
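Because magic functions are plain Python functions under the hood, you can even register your own. Here’s a minimal sketch, where the magic names shout and countlines are made up for illustration:

# Register a custom line magic and a custom cell magic
from IPython.core.magic import register_line_magic, register_cell_magic

@register_line_magic
def shout(line):
    # A line magic receives the rest of its line as a single string
    return line.upper()

@register_cell_magic
def countlines(line, cell):
    # A cell magic receives the rest of the first line plus the whole cell body
    return len(cell.splitlines())

After running this in a code cell, %shout hello returns 'HELLO', and a cell that starts with %%countlines returns the number of lines below that first line.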

Magics are specific to and provided by kernels, and they are designed to make your work and experience within the Jupyter Notebook a lot more interactive. Whether magic commands are available in a certain kernel depends on the kernel developer(s) and varies from kernel to kernel. You can already see it: magics are a kernel feature.

When you’re using IPython, the Python backend to the Jupyter Notebook that is also known as the kernel, you might want to make use of the following tricks to gain access to functionality that will make your programming faster, easier, and more interactive. Note that the list below is not meant to be exhaustive; check out this list of built-in magic commands for a complete overview.

Plotting

One major feature of the IPython kernel is the ability to display plots that are the output of running code cells. The kernel is designed to work seamlessly with the matplotlib data visualization library to provide this functionality. To make use of it, use the magic command %matplotlib.

By default, your plot will then be displayed in a separate window. You can also specify a backend, such as inline or qt, so that the output of your plotting commands is shown inline in the notebook or through a different GUI backend. You can read more about it here.
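A minimal plotting session with the inline backend might look like this:

# Show figures inline in the notebook instead of a separate window
%matplotlib inline

import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
plt.title("A quick inline plot")
plt.show()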

File System Navigation

The kernel’s magic commands also provide a way to navigate through your file system. The magics %cd and %bookmark can be used to either change directory or to bookmark a folder to have faster access to directories you often use.
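A short sketch; the bookmark name scratch is arbitrary:

# Change the working directory
%cd /tmp

# Bookmark the current directory under a name of your choice
%bookmark scratch

# List all bookmarks you have defined
%bookmark -l

# Later, jump straight back to the bookmarked directory
%cd -b scratch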

Debugger Access

Next, you can also use the %pdb magic to call up the Python debugger every time there is an uncaught exception. This will direct you to the part of the code that triggered the exception, which makes it possible to rapidly find the source of a bug.

You can also use the %run magic command with the -d option to run scripts under the Python debugger’s control. It will automatically set up initial breakpoints for you. Lastly, you can also use the %debug magic for even easier debugger access.
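In practice, that looks like this; myscript.py stands in for a script of your own:

# Drop into the debugger automatically on any uncaught exception
%pdb on

# Run a script under the debugger's control, with initial breakpoints set
%run -d myscript.py

# Post-mortem debugging of the most recent exception
%debug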

IPython Extensions

You can use the %load_ext magic to load an IPython extension by its module name. IPython extensions are Python modules that modify the behaviour of the shell: extensions can register magics, define variables, and generally modify the user namespace to provide new features for use within code cells.
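Before looking at some existing extensions, here is a minimal, hypothetical sketch of what such an extension module can look like; save it as my_extension.py somewhere on your Python path and load it with %load_ext my_extension:

# my_extension.py -- a minimal, made-up IPython extension
def load_ipython_extension(ipython):
    # Called by `%load_ext my_extension`; `ipython` is the active shell
    # As a trivial example, inject a variable into the user namespace
    ipython.push({"greeting": "Hello from my_extension!"})

def unload_ipython_extension(ipython):
    # Called by `%unload_ext my_extension`; clean up here if needed
    pass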

Here are some examples:

  • Use %load_ext oct2py.ipython to seamlessly call M-files and Octave functions from Python,
  • Use %load_ext rpy2.ipython to use an interface to R running embedded in a Python process,
  • Use %load_ext Cython to use a Python to C compiler,
  • Use sympy.init_printing() to pretty print SymPy Basic objects automatically, and
  • To use Fortran in your interactive session, you can use %load_ext fortranmagic.
  • … There are many more! You can create your own IPython extensions and register them on PyPI: this also means that there are many other user-defined extensions and magics out there! One example is ipython_unittest, but also check out this Extensions Index.

One of the other extensions that you should keep an eye out on is sparkmagic, a set of tools for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. The sparkmagic library provides a %%spark magic that you can use to easily run code against a remote Spark cluster from a normal IPython notebook.

# Load in sparkmagic
%load_ext sparkmagic.magics

# Set the endpoint
%manage_spark

# Ask for help
%spark?

Go here for more examples of how you can make use of these magics to work with Spark clusters interactively.

Note that besides %load_ext, IPython has two other magics that allow you to manage extensions from within your Jupyter Notebook: %reload_ext, which unloads and reimports an extension, and %unload_ext, which simply unloads it.

Different Kernels, Other Magics

However, in other languages, the syntax element used for the magic commands might already have a meaning in the language itself.

The R kernel, IRkernel, doesn’t have a magics system. To execute bash commands, for example, you’ll use R functions such as system() to invoke an OS command. An example would be system("head -5 *.csv", intern=TRUE). Note that by including the intern argument, you specify that you want to capture the output of the command as a character vector in R. To display Markdown input, you make use of display_markdown(), to which you pass the Markdown code as a character vector.

Likewise, the Julia Kernel IJulia also doesn’t use “magics”. Instead, other syntaxes to accomplish the same goals are more natural in Julia, work in environments outside of IJulia code cells, and are often more powerful. However, the developers of the IJulia kernel have made sure that whenever you enter an IPython magic command in an IJulia code cell, you will see a printout with help that explains how you can achieve a similar effect in Julia if possible.

For example, the analog of IPython’s %load in IJulia is IJulia.load().

On the other hand, there are kernels, such as the Scala kernel IScala, that do support magic commands, similarly to IPython. However, the set of magics is different, as it has to match the specifics of Scala and the JVM. Magic commands consist of a percent sign % followed by an identifier and optional input. Some of the most notable magics are:

# Type Information
%type 1

# Library Management 
%libraryDependencies
%update

As you have read above, the sparkmagic library also provides a set of Scala and Python kernels that allow you to automatically connect to a remote Spark cluster, run code and SQL queries, manage your Livy server and Spark job configuration, and generate automatic visualizations, all without writing any connection boilerplate yourself.

For example, you can easily execute SparkSQL queries with %%sql or access Spark application information and logs via %%info magic.

If you’re working with another kernel and you wonder if you can make use of magic commands, it might be handy to know that there are some kernels that build on the metakernel project and that will use, in most cases, the same magics that you’ll also find in the IPython kernel. You can find a list of the metakernel magics here. The metakernel is a Jupyter/IPython kernel template which includes core magic functions.

For example, when you’re using the MATLAB kernel, you’ll have the following magics available:

Available line magics:
%cd  %connect_info  %download  %edit  %get  %help  %html  %install  %install_magic  %javascript  %kernel  %kx  %latex  %load  %ls  %lsmagic  %magic  %parallel  %plot  %pmap  %px  %python  %reload_magics  %restart  %run  %set  %shell  %spell

Available cell magics:
%%debug  %%file  %%help  %%html  %%javascript  %%kx  %%latex  %%processing  %%px  %%python  %%shell  %%show  %%spell 

If you look at the chunk that is printed above, you’ll see that some of those magic commands seem very familiar. For those who don’t know the magics so well, compare the above chunk to the magics that you have available in the IPython kernel by default and you’ll see that some of the magics are the same:

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

In essence, you can use one question to distinguish which magics are specific to IPython and which ones can be used in other kernels: is this functionality Python-specific or is it a general thing that can also be used in the language that you’re working with?

For example, %pdb (the Python debugger) and %matplotlib are specific to Python and wouldn’t make sense when you’re working with the JavaScript kernel. However, changing directories with %cd is something very general that should work in any language. So, chances are that such a magic can be used in other kernels. Of course, you’ll still need to check whether your kernel makes use of magics at all.

Conversion and Formatting Notebooks?

Converting and formatting notebooks are features that you’ll find in the Jupyter ecosystem. Two tools that you’ll typically find for these tasks are nbconvert and nbformat.

You can use the former to convert notebooks to various other formats to present information in familiar formats, to publish research and to embed notebooks in papers, to collaborate with others and to share content with a larger audience.

The latter basically contains the Jupyter Notebook format and is the key to understanding that notebook files are simple JSON documents that contain: metadata (such as the kernel or language info), the version of the notebook format (major and minor), and the cells in which all text, code, etc. is stored.
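As a small sketch of both tools, where analysis.ipynb is a hypothetical notebook file:

# Inspect a notebook file with nbformat
import nbformat

nb = nbformat.read("analysis.ipynb", as_version=4)
print(nb.nbformat, nb.nbformat_minor)    # notebook format version (major, minor)
print(nb.metadata.get("kernelspec"))     # kernel / language info
print(len(nb.cells), "cells")

# Convert the same notebook to HTML with nbconvert
!jupyter nbconvert --to html analysis.ipynb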

Saving and Loading Notebooks?

Saving and loading notebooks is a feature of the Jupyter Notebook Application. You can load notebooks that other people have created, saved as files with an .ipynb filename extension, by downloading and opening the file in the Jupyter application. More specifically, you can create a new notebook and then choose to open the file by clicking on the “File” tab, clicking “Open”, and selecting your downloaded notebook.

Conversely, you can also save your own notebooks by clicking on the same “File” tab and selecting “Download as” to get your hands on your own notebook file, or you can choose to save the file and set a checkpoint. This is very handy when you want to do some minor version control and maybe revert to an earlier version of your notebook. Of course, your modifications are saved automatically every few minutes, so there isn’t always a need to do this explicitly.

Note that you can also opt to not save any changes to an original notebook by making a copy of it and saving all changes to that copy!

Keyboard Shortcuts & Multicursor Support?

Selecting multiple cells, toggling the cell output, inserting new cells, etc. For all these actions, you have keyboard shortcuts that are part of the Jupyter Notebook. You can find a list of keyboard shortcuts under the menu at the top: go to the “Help” tab and select “Keyboard Shortcuts”. 

Also, the multi cursor support is a feature that you’ll find in the Jupyter Notebook!

Parallel Computing Network?

The parallel computing network was part of the IPython project, but as of IPython 4.0, it’s a standalone package called ipyparallel. The package is basically a collection of tools and CLI scripts for controlling clusters for Jupyter.

Even though it has been split off, it’s still a powerful, yet generally overlooked, component of the IPython ecosystem: instead of running a single Python kernel, it allows you to start many distributed kernels over many machines.

Typical use cases for ipyparallel are, for example, cases in which you need to run models many different times to estimate the distributions of its outputs or how they vary with input parameters. When the runs of the model are independent, you can speed up the process by running them in parallel across multiple computers in a cluster. Think about distributed model training or simulations.
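A minimal sketch, assuming you’ve started a local cluster first (for example with ipcluster start -n 4):

# Connect to the running cluster and get a view over all engines
import ipyparallel as ipp

rc = ipp.Client()
view = rc[:]

def square(x):
    return x * x

# Run the function across the engines in parallel and wait for the results
results = view.map_sync(square, range(8))
print(results)   # [0, 1, 4, 9, 16, 25, 36, 49]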


Terminal?

This feature is part of the Jupyter ecosystem: you have the Jupyter Console and a Jupyter terminal application. However, since the start, the name IPython has also been used for the original interactive command-line terminal for Python. It offers an enhanced read-eval-print loop (REPL) environment particularly well adapted to scientific computing, and it was the standard before 2011, when the Notebook tool was introduced and started offering a modern and powerful web interface to Python.

Next, you also had the IPython console, which started two processes: the original IPython terminal shell and a kernel, by default the Python one, which gets started if not otherwise specified.

The IPython console is now deprecated and if you want to start it, you’ll need to use the Jupyter Console, which is a terminal-based console frontend for Jupyter kernels. This code is based on the single-process IPython terminal. The Jupyter Console provides the interactive client-side experience of IPython at the terminal, but with the ability to connect to any Jupyter kernel instead of only to IPython.

This lets you test any Jupyter kernel you have installed at the terminal, without needing to fire up a full-blown notebook for it. The Console allows for console-based interaction with other Jupyter kernels, such as IJulia and IRkernel.
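Starting it is a one-liner; the second command assumes you have the IRkernel installed:

# Start a console attached to the default (IPython) kernel
jupyter console

# Or attach to another installed kernel, such as the R kernel
jupyter console --kernel=ir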

Lastly, the Jupyter Notebook Application also has a Terminal Application: a simple bash shell terminal that runs in your browser. You can easily find it when you start the application and select a new terminal from the dropdown menu.

Qt Console?

The Qt console used to be a part of the IPython project, but it has now moved to the Jupyter project. It’s a lightweight application that largely feels like a terminal but provides a number of enhancements only possible in a GUI, such as inline figures, proper multi-line editing with syntax highlighting, graphical call tips, and much more. The Qt console can use any Jupyter kernel.

Conclusion

Today’s blog post was an addition to DataCamp’s Definitive Guide: it covered the history of computational notebooks in more detail, together with some of the main features of the IPython and Jupyter projects, so that you can understand the evolution of and the difference between the two more clearly. The goal was to show that the distinction between the two can be hard to make if you don’t take the historical perspective of the two projects into account. In some cases, there is a gray zone, an “in between”, that isn’t easily classified.

