Channel: Planet Python

Codementor: A Python Import Tutorial for Beginners

What?! Another Import Tutorial?

Well, yeah. But there are a lot of overly technical, incomplete, incorrect or just wrong ones out there. As a beginner Python developer I was once faced with sifting through a million tutorials to get my head around this stuff, and I’ve been asked about it enough times that it’s pretty obvious that something is missing in tutorial space. So what is this about?

Any monologue about the strengths of a popular language is going to touch on the strength of the tools accessible to that language, and Python is no different. Python gets a lot of its power from the packages it installs by default and those that you can install yourself. This tutorial goes over the mechanism of importing those packages - making extra functionality (maybe someone else’s code) accessible to your code. Once we’ve covered the basics of importing, we’ll talk about version conflicts and introduce a common tool used for avoiding such conflicts - the virtual environment. Virtual environments, while definitely worth using, do cause some confusion, so I’ll then take you through a little bit of how virtual environments change the import system.

What’s an Import?

To use any package in your code, you must first make it accessible. You have to import it. You can’t use anything in Python before it is defined. Some things are built in, for example the basic types (like int, float, etc) can be used whenever you want. But most things you will want to do will need a little more than that. Try typing the following into a Python console:

oTime = datetime.datetime.now()

You’ll get a NameError. Apparently Python doesn’t know what datetime means - datetime is not defined. You’ll get a similar result for doing this:

print my_undefined_variable

For datetime (or anything really) to be considered defined, it has to be accessible from the current scope. For this to be true it has to satisfy one of the following conditions:

  • it is a part of the default Python environment, like int, list, __name__ and object. Try typing those in an interpreter and see what happens
  • it has been defined in the current program flow (as in you wrote a def or a class or just a plain ol’ assignment statement to make it mean something). This statement is a bit of a simplification and I’ll expand on it pretty soon
  • it exists as a separate package and you imported that package by executing a suitable import statement.

Try this out:

import datetime
oTime = datetime.datetime.now()
print oTime.isoformat()

That worked a whole lot better. First we import datetime. Then we use the now function to create an object and assign it to the oTime variable. We can then access the functions and attributes attached to oTime.

Importing the datetime package made it accessible in the current scope.
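
As a rough sketch of the before-and-after, here is the same experiment as one script. It is written for Python 3 (where print is a function, unlike the Python 2 snippets in this article), and the names first_attempt and moment are just illustrative.

```python
# Before the import, the name `datetime` means nothing in this scope.
try:
    moment = datetime.datetime.now()
    first_attempt = "worked"
except NameError as error:
    first_attempt = "NameError: " + str(error)
print(first_attempt)

# The import statement binds the name `datetime` in the current scope.
import datetime

moment = datetime.datetime.now()
print(moment.isoformat())
```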

Ok… but What’s a Scope?

The word scope seems to evoke a lot of fear in new developers so I’ll talk a little bit about it here. The basic idea is that if you want to make use of any object at any point in a program, that object needs to be defined before you use it. Some things are always defined no matter where you are in your program (eg: int); these are said to exist in the built-in scope.

Other things need to be defined in an explicit way. This implies actually executing a statement or statements to define the object - to give it a name in the current scope. Accessible objects are accessed through their names: an object exists in a scope if it has been given a name within that scope. And giving something a name always implies actually executing a statement. Now that I’ve paraphrased myself a million times, let’s move on to some examples.

print x   # NameError
x = 3     # associates the name `x` with 3
print x   # works fine this time

Packages, classes and functions work in the same way. Take a look at the datetime examples above.

So that’s pretty straightforward, right? Wrong! There is still the matter of enclosing scopes.

Try this out…

y = 3                                    # first we associate the name 'y' with the number 3

def print_stuff():              # then we associate the name print_stuff with this function
    print "calling print_stuff"          #  (***)
    print y                                                                              
    z = 4     
    print z                
    print "exiting print_stuff"
                                                                                                             
print_stuff()                  # we call print_stuff and the program execution goes to (***)
print y                                       # works fine
print z                                       # NameError!!!

So y was defined outside print_stuff and was accessible both inside and outside print_stuff. And z was defined inside print_stuff and was only accessible within print_stuff. There are two separate scopes here!

So let’s extend our understanding to: an object exists in a scope if it has been given a name within that scope OR an enclosing scope.

Now for something a little different… if you are using an interpreter, put this in a fresh one (exit and open a new one).

# no more y here...

def print_stuff():    
    print "calling print_stuff"  
    print y                
    z = 4      
    print z     
    print "exiting print_stuff"                                                                 

print_stuff()     # NameError. this shouldn't surprise you since we haven't yet set y 
y = 3             # only now do we associate the name 'y' with the number 3
print_stuff()     # the rest works fine
print y

I like to think of scopes in terms of nested dictionaries. The picture I’m drawing here is not entirely accurate, but it’s a nice way of thinking about things…

program_scope = {
        'y'  : 3,        # y was defined outside of print_stuff,
                         #         and is accessible inside and outside
        'print_stuff' : {
            'z' : 4               # z was defined inside of print_stuff, 
                                  #           and is ONLY accessible inside
        },
}
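
If you want to peek at the real thing rather than my hand-drawn dictionary, Python can show you its own name tables. The sketch below (Python 3 syntax) uses the built-in functions locals() and globals(), which this article doesn't otherwise cover, so treat it as an optional aside. It assumes you run it as a script or paste it into a fresh interpreter.

```python
y = 3

def print_stuff():
    z = 4
    # locals() returns the names defined in the function's own scope.
    return dict(locals())

inner_names = print_stuff()
print(inner_names)          # {'z': 4}

# globals() holds the names defined at the top level of the program.
print("y" in globals())     # True
print("z" in globals())     # False
```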

Brace yourself for the last example in this section…

a = [1,2,3]             # first we give some names to some stuff, 
b = [4,5,6]             #     thus making it accessible in this scope and enclosed scopes
c = [8,9,10]

def do_things():
    a = "new a"
    b.append(7)
    c = "new c"
    print a
    print b
    print c

do_things()            #nothing surprising in the output...

print a                # [1,2,3]
print b                # [4, 5, 6, 7]
print c                # [8,9,10]

When executing a statement that can create a new name in the scope, Python just goes ahead and makes it happen. When adding something to a scope Python does not look at enclosing scopes. But when trying to access an object Python will check the current scope and all enclosing scopes recursively until it either finds it or runs out of scopes (and raises a NameError).
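
One consequence of this rule catches a lot of people out, so here is a small sketch of it (Python 3 syntax; the function names read_only and assign_first are made up for the example). Because assignment never looks at enclosing scopes, a function that assigns to a name treats that name as its own everywhere in its body, even on the right-hand side of the assignment.

```python
counter = 10

def read_only():
    # Lookup only: counter is not defined here, so Python finds the outer one.
    return counter + 1

def assign_first():
    # The assignment makes counter a name in this function's own scope,
    # so the right-hand side tries to read a name that has no value yet.
    counter = counter + 1
    return counter

print(read_only())          # 11

try:
    assign_first()
    problem = None
except UnboundLocalError as error:
    problem = type(error).__name__
print(problem)              # UnboundLocalError
print(counter)              # 10
```

UnboundLocalError is a more specific kind of NameError, so this is the same "name not defined" failure we have been seeing all along.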

Drawing out the scope hierarchy as a dictionary like before, we have something like this:

program_scope = {
    'a' : [1,2,3],             
    'b' : [4,5,6,7], 
    'c' : [8,9,10],
    'do_things': {
       'a' : 'new a',    #the enclosed scope has its own a and c but no b!!!
       'c' : 'new c'
    }
}

The import mechanism and the scope, and what’s a package anyway

A package is just a directory tree with some Python files in it. Nothing magical. If you want to tell Python that a certain directory is a package then create a file called __init__.py and just stick it in there. Seriously, that’s all it takes. To make a package that is actually useful you would need to do somewhat more. A package can contain other packages but to be useful it must contain modules somewhere in its hierarchy. A module is just a script that contains a bunch of definitions and declarations (you define classes and functions, and declare variables). The whole point of this system is to be able to organise code in such a way as to make it easy to leverage existing code in new projects without excessive use of copy-paste.
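
To see the difference between a package and a module from the inside, you can look at two things that ship with Python. This is a sketch in Python 3 syntax; it relies on the __path__ attribute, which this article doesn't discuss, and on json being a package and datetime a single module in the standard CPython layout. The helper name is_package is my own.

```python
import datetime   # a single module
import json       # a package (a directory containing __init__.py)

def is_package(module):
    # Packages carry a __path__ attribute saying where their contents live;
    # plain modules do not have one.
    return hasattr(module, "__path__")

print(is_package(json))       # True
print(is_package(datetime))   # False
```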

The Python documentation has a wonderful tutorial on the subject that can fill you in on the finer points.

In this section we’ll move through the basic motions of creating and using packages and modules.

To start off, make yourself a directory to work in (I’ll call it your working directory for the remainder of this text) and follow the instructions below. Now create a file called my_funcs.py that looks a little something like:

def print_a():
    print "a"
    
def print_b():
    print "b"

Now open up a console and cd into your newly created working directory and launch the Python interpreter. Type in the statements below:

print my_funcs         # NameError
import my_funcs        # from this point on my_funcs is in the scope
print my_funcs         # prints information about the module
my_funcs.print_a()     # call the function print_a that is a member of the my_funcs module
my_funcs.print_b()

Isn’t that exciting? To get to something contained directly within a module you can just import the module and use a dot. But it looks a little verbose. Exit the interpreter, launch a fresh one and try this out:

from my_funcs import print_a,print_b      # this time we add the two functions to the scope      
print_a()         # it works!
print_b()
print my_funcs    # this raises a NameError since we didn't make 
                  #              my_funcs accessible, only its members

Now create a directory within your working directory called my_pkg and move my_funcs.py into my_pkg. Create an empty file called __init__.py and stick it in there too. Now you have a directory structure like this:

working_dir\
             my_pkg\
                 __init__.py
                 my_funcs.py

Exit and relaunch your interpreter and type in the following:

print my_pkg             # NameError
print my_funcs           # NameError

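import my_pkg.my_funcs   # my_pkg is now defined in the scope, and its my_funcs module is loaded
                         #   (a plain `import my_pkg` would not load my_funcs, since __init__.py is empty)

print my_pkg             # isn't that nice
print my_funcs           # NameError
print my_pkg.my_funcs    # module information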

my_pkg.my_funcs.print_a() #works fine
my_pkg.my_funcs.print_b()

Again, all the dots are a little verbose. We can do things a little differently. Play with these:

from my_pkg import my_funcs

from my_pkg.my_funcs import print_a, print_b

from my_pkg import my_funcs.print_a, my_funcs.print_b   # SyntaxError: no dots allowed after "import" here

from my_pkg.my_funcs import print_a as the_coolest_function_ever, print_b as not_the_brightest_crayon

The take home message here is that you can be very specific about what parts of a package or module you want to import, and can even decide what names they will be given in the scope.
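
The same import forms work on packages that ship with Python, so here is a sketch you can run without creating any files (Python 3 syntax). The aliases filesystem_path and join_path are arbitrary names I picked for the example.

```python
import os.path
from os import path as filesystem_path            # a module, under a new name
from os.path import basename, join as join_path   # two functions, one renamed

combined = join_path("my_pkg", "my_funcs.py")
print(basename(combined))                 # my_funcs.py

# The aliases are just extra names for the very same objects.
print(filesystem_path is os.path)         # True
print(join_path is os.path.join)          # True
```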

Now exit the interpreter and cd out of your working directory and launch a new interpreter. Try importing from my_pkg. You can’t. And that sucks. But we can import datetime. So what gives?

sys.path gives

Python is very aware of what is known as the current working directory. If you try to import something then the current working directory is where it looks first; where it looks second, third and fourth depends a bit on your Python installation. Try this:

import sys
print sys.path

This prints out a list of directories. When you tell Python to import something, it looks in each of the listed locations in order. In an interactive session the first item in the list is an empty string, which indicates the current working directory (when you run a script, the first item is the script’s directory instead). If you were to make a package or module in your current directory and name it datetime then Python’s standard datetime functionality would be out of reach.
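
If you are curious which of those locations a given import would actually be served from, the sketch below asks Python directly (Python 3 syntax). It uses importlib.util.find_spec, which isn't covered in this article; the output depends entirely on your installation, so I haven't shown any.

```python
import importlib.util
import sys

# sys.path is an ordinary list of strings, searched from first to last.
for entry in sys.path:
    print(repr(entry))

# find_spec reports where a module would be loaded from.
spec = importlib.util.find_spec("datetime")
print(spec.name)
print(spec.origin)
```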

Try this out:

sys.path.append('/path/to/your/working/directory') #the directory that contains my_pkg
import my_pkg

Brilliant! Now we can use my_pkg again.
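
Here is the whole experiment as one self-contained sketch (Python 3 syntax). To avoid touching your real working directory it builds a throwaway package in a temporary directory; the names demo_pkg and demo_funcs are made up for the example, and the function returns a string instead of printing so the result is easy to check.

```python
import importlib
import os
import sys
import tempfile

# Build <temp dir>/demo_pkg/__init__.py and <temp dir>/demo_pkg/demo_funcs.py
working_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(working_dir, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w"):
    pass  # an empty file is enough to mark the directory as a package
with open(os.path.join(pkg_dir, "demo_funcs.py"), "w") as handle:
    handle.write("def print_a():\n    return 'a'\n")

# The temporary directory is not on sys.path yet, so the import fails.
try:
    import demo_pkg.demo_funcs
    found_before = True
except ImportError:
    found_before = False
print(found_before)                       # False

# Once the directory is on sys.path, the same import works.
sys.path.append(working_dir)
importlib.invalidate_caches()
import demo_pkg.demo_funcs
print(demo_pkg.demo_funcs.print_a())      # a
```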

But it would be kind of annoying to have to add new entries to sys.path for every package we want to make use of. It would be better to just store Python packages at standard locations and point the path there. Luckily Python’s package installation mechanisms handle that sort of thing. It’s a bit outside the scope of this text though. The point here is that you can have any number of packages made available to your script through use of sys.path and that is terribly convenient. Except when it isn’t…

Version conflicts

Let’s assume for a moment that version conflicts are horrible things. What if you’ve gone and started working on two different projects (A and B) that require two different sets of packages, say project A requires you to install C1 and C2; and project B requires you to install packages D1 and D2. And that’s it. Nice and tidy. There are no conflicts there.

But what if C1 requires E version 2 and project B requires E version 3? You could just continually install and uninstall different versions of E as needed whenever you need to run the scripts… but that sounds really horrible.

You could package the correct E versions within the projects that need them, ie in their respective working directories… but what if there are dependency issues you aren’t yet aware of? They could spring up the next time you install any new dependency.

Considering your shiny new knowledge of the import path, we could put each of the E versions in different locations and then alter sys.path to include the correct location. For example in A we would need to do something like:

import sys
sys.path.append('path/to/E_version_2')

Which might seem clever at first but really isn’t. That sort of approach would make installing our packages tedious and brittle as everything would need to be put in the right place. And anyway, it does not address the problem of future issues.
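
To get a feel for how fragile this is, here is a sketch (Python 3 syntax) with two fake copies of a module on sys.path at once. The module name demo_e and its VERSION attribute are invented for the example. Python simply takes the first match it finds, so which version you get depends on nothing more than the order of the list.

```python
import importlib
import os
import sys
import tempfile

def make_copy(version):
    # Write a tiny module that only records which version it is.
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, "demo_e.py"), "w") as handle:
        handle.write("VERSION = %d\n" % version)
    return directory

dir_v2 = make_copy(2)
dir_v3 = make_copy(3)

# Both copies are now on the path; version 2 comes first in the list.
sys.path.append(dir_v2)
sys.path.append(dir_v3)
importlib.invalidate_caches()

import demo_e
print(demo_e.VERSION)    # 2
```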

Version conflicts are a pretty horrible problem to have to solve; good thing we don’t have to. Enter virtual environments.

Virtual Environments

A virtual environment is a group of files and configuration that can be activated (and deactivated). A virtual environment is associated with its own Python executable and its own sys.path, and thus its own combination of installed packages.

Here is how I usually kick off a new project (this is bash, not Python):

virtualenv venv                # 1. creates the virtual environment
source venv/bin/activate       # 2. activates the virtual environment

# whatever you want
deactivate  # 3. deactivates the virtual environment  (you can also just close the terminal)

Line 1 creates a virtual environment called venv. It’s just a directory structure containing a bunch of Python stuff and some configuration (including a new sys.path). This command has a lot of options; you can even pick which Python you are keen on including in the environment (if you have multiple Python versions installed on your system).

Line 2 activates the environment. If the environment is active and you launch Python then you’ll be launching the Python interpreter that lives inside the virtual environment. If you install a package while the environment is active then the package will not go to the place where system wide packages go, it will rather get installed inside the environment (directory structure).

Line 3 deactivates the environment and makes things normal again. All the stuff you installed in your environment will be accessible the next time you activate it.
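
If you ever lose track of whether an environment is active, you can ask the interpreter itself. The following is a sketch in Python 3 syntax; it assumes an environment made with a recent virtualenv or with Python 3's built-in venv module, where sys.prefix and sys.base_prefix differ inside an environment. The helper name in_virtual_environment is my own.

```python
import sys

def in_virtual_environment():
    # Inside an environment sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the Python it was created from.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base

print(sys.executable)              # the interpreter that is actually running
print(sys.prefix)
print(in_virtual_environment())
```

Run it once with the environment active and once after deactivate, and compare the three lines.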

Neat, eh?

Conclusion

We’ve covered quite a lot here. We started off with the basics of scope, then proceeded to packages and the import mechanism. We then covered how virtual environments could be used to overcome version conflicts. But the rabbit hole goes a whole lot deeper. If you are interested in taking the import system further then it would be worth checking out the __import__ built-in function and the importlib module. Also, there is more to importing than just regular import statements (with relative imports you can move up a package tree instead of down it, which is occasionally quite useful). You may have noticed the appearance of .pyc files during our experiments, and those are pretty cool on their own. We also touched on the fact that Python has standard package installation mechanisms; these are really worth knowing about if you intend to deploy or give away any significant piece of code.
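
As a small taste of importlib, here is a sketch (Python 3 syntax) that imports a module whose name is only known as a string at run time. The variable names are illustrative.

```python
import importlib

module_name = "datetime"                  # could come from a config file or user input
module = importlib.import_module(module_name)
print(module.datetime.now().isoformat())

# It is the same module object a regular import statement gives you.
import datetime
print(module is datetime)                 # True
```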

