Python Modules
Overview
So far we've looked at the core Python language; everything we've seen is built into Python. One of Python's major features, though, is the very broad range of modules shipped with the standard distribution. These modules do everything from cryptography to parsing web pages and sending email, and as you learn to develop in Python you will learn more about their capabilities.
The Python standard library is documented in the Module Index section of the Python manual. You'll find chapters on all of the modules that include examples of how to use them; it's well worth skimming through it and exploring anything that looks interesting so that when you're working on a problem, you remember that there is a relevant module. In this course we'll only use a few modules and probably only a small part of each of them.
In some cases, the problem you have isn't solved in the standard library but someone has published a third-party module that does the job. The Python Package Index (PyPI, once nicknamed the Cheese Shop) is a register of third-party modules, and installing them is easy thanks to the pip tool, which will download, build and install a module from the command line.
To use a module in your application, you need to import it into your program. You do this with the `import` statement before you use anything from the module. For example, to use the `listdir` procedure from the `os` module (which returns a list of files in a directory) we would write:
import os
print(os.listdir("."))
(The argument to `listdir` is the directory to list; I gave it "." which stands for the current directory.) Note the dot notation between the module name `os` and the procedure name. The dot is used in Python as a general way to refer to components of objects; we saw it when we looked at strings and sequences (remember `numbers.sort()` and `str1.upper()`).
In this case we're referring to a procedure inside a module. Sometimes you'll see multiple dots because modules can be nested, e.g. `os.path.exists()`.
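As a small illustration of nested dot notation, here is a sketch that uses only `os.path.exists` from the text to check whether the current directory exists. The variable name `current_dir_exists` is just for illustration.

```python
import os

# "os" is the module, "path" is a module nested inside it,
# and "exists" is a procedure inside os.path.
current_dir_exists = os.path.exists(".")
print(current_dir_exists)
```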
Sometimes you just want one procedure from a module. In this case you can use the `from xxx import yyy` notation. This allows you to use the raw name of the procedure without the module name:
from os import listdir
print(listdir("."))
You can import more than one name:
from os import listdir, getcwd
print("Current directory:", getcwd())
print(listdir("."))
You can even import all the exported names from a module (`from os import *`). However, this is discouraged because it generally means that you didn't think it through; the danger is that you import a name that clashes with something in your own program.
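One concrete clash is worth seeing: the `os` module exports a procedure named `open`, which is a low-level procedure quite different from Python's built-in `open`. The sketch below runs the star import inside a scratch dictionary so that it doesn't disturb the names of the program running it. The names `scratch` and `star_import_replaced_open` are made up for this example.

```python
import os

# Run the star import in a scratch namespace (a plain dict) so that this
# example does not pollute the names of the program running it.
scratch = {}
exec("from os import *", scratch)

# The name "open" now refers to os.open, the low-level file-descriptor
# procedure, rather than the built-in open() used to read and write files.
star_import_replaced_open = scratch["open"] is os.open
print(star_import_replaced_open)
```

In a real program the same thing would happen silently to your own `open` calls, which is why `import os` is usually the safer style.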
Finally, you can change the names that you import. You might do this if the name you were importing might clash with another name in your program or from another module:
from os import listdir as showMeTheFiles
print(showMeTheFiles("."))
The rest of this chapter will briefly describe some useful Python modules.
The os Module
This module contains various procedures for accessing operating system services in a platform-independent manner. For our purposes, we will mainly use it to access the file system, although the module also provides access to running processes. A few useful procedures are:
`os.listdir(dirname)`
: returns a list of the names of the entries (files and subdirectories) in the directory `dirname`.
`os.getcwd()`
: returns the current working directory that your script is running from.
`os.mkdir(dirname)`
: creates a new directory with the given name.
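The three procedures above can be tried out together. The following is a minimal sketch, not the only way to do it: it works inside a throwaway directory created with the standard `tempfile` module (not covered in this chapter) so that nothing is left behind, and the directory name "reports" is made up for illustration. It also uses `os.path.join`, which is described later in this section.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as scratch_dir:
    print("Script is running from:", os.getcwd())

    # Create a new directory inside the scratch directory ...
    new_dir = os.path.join(scratch_dir, "reports")
    os.mkdir(new_dir)

    # ... and confirm that listdir now reports it.
    names = os.listdir(scratch_dir)
    print(names)
```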
A sub-module, `os.path`, provides many procedures to help manipulate file names (path names including the directory names) in a platform-neutral way.
`os.path.join(dirname, filename)`
: joins the two names together with the right directory separator for the platform, that is, a forward slash (/) on Mac and Linux and a backslash (\) on Windows. You should use this to manipulate path names so that your code will work on any platform.
`os.path.split(pathname)`
: returns a tuple `(head, tail)` where `tail` is everything after the last directory separator and `head` is everything before it.
`os.path.basename(pathname)`
: returns the last part of the path name, after the last directory separator.
`os.path.exists(filename)`
: returns True if the name refers to an existing file or directory, False if not.
`os.path.isdir(name)`
: returns True if the name is an existing directory, False otherwise.
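To see how these `os.path` procedures fit together, here is a short sketch. The names "data" and "report.txt" are hypothetical and don't need to exist on your machine; only the last two calls actually look at the file system.

```python
import os

# "data" and "report.txt" are hypothetical names used only for illustration.
pathname = os.path.join("data", "report.txt")
print(pathname)  # the separator shown depends on the platform

# split() undoes join(): head is the directory part, tail is the file name.
head, tail = os.path.split(pathname)
print(head, tail)

# basename() gives just the last part of the path.
print(os.path.basename(pathname))

# These two consult the file system, so the results depend on where you run it.
print(os.path.exists(pathname))
print(os.path.isdir("."))
```

Because the path is built with `os.path.join` rather than by gluing strings together with "/", the same code should behave sensibly on Windows, Mac and Linux.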
import re
def string_has_money(text):
    """Return True if this text string
    contains a dollar amount, False otherwise."""
    dollar_re = r'\$[0-9,]+'
    if re.search(dollar_re, text):
        return True
    else:
        return False
Next I might want to find what the dollar amounts are in a string. For
this I need to interrogate the return value of the `re.search`
procedure. As I said, this is an object and we can call the `group()`
method to find what was matched:
import re
def get_money_from_string(text):
    """Return the first dollar amount from the
    text string or None if none is found."""
    dollar_re = r'\$[0-9,]+'
    match = re.search(dollar_re, text)
    if match:
        return match.group()
    else:
        return None
I might also want to find all of the matches to this pattern. There are
two ways to achieve this. The procedure `re.findall` returns a list of
strings that match the pattern and the procedure `re.finditer` allows
you to iterate over the matches with a for loop. So, I can find all
dollar amounts like this:
import re
def get_all_money_from_string(text):
    """Return a list of the dollar amounts from the
    text string or the empty list if none is found."""
    dollar_re = r'\$[0-9,]+'
    return re.findall(dollar_re, text)
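The other option mentioned above, `re.finditer`, gives you a match object for each match rather than just the matched string. As a sketch of why that can be useful, the procedure below records where each amount starts as well as what it is. The name `get_money_positions` and the sample sentence are made up for this example.

```python
import re

def get_money_positions(text):
    """Return a list of (amount, start) pairs, one for each
    dollar amount found in the text string."""
    dollar_re = r'\$[0-9,]+'
    results = []
    for match in re.finditer(dollar_re, text):
        # group() is the matched text, start() is its index in the string
        results.append((match.group(), match.start()))
    return results

positions = get_money_positions("Lunch was $12 and dinner was $1,250")
print(positions)
```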
Finally, I might want to censor a text by replacing any dollar amount
with \$\$\$\$CAPITALISM\_SUCKS!!!!. I could do this with the `re.sub`
procedure:
import re
def censor_text(text):
    """Replaces any dollar amount in the text
    string with a suitable anti-capitalist message.
    Returns the resulting string."""
    dollar_re = r'\$[0-9,]+'
    message = "$$$$CAPITALISM_SUCKS!!!!"
    return re.sub(dollar_re, message, text)
These are just a few of the things you can do with the regular
expression library. In particular the match object that is returned by
`re.search` has many more capabilities that are useful for more
complicated patterns.
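As one example of those extra capabilities, a pattern can contain parentheses to mark parts of the match that you want to pull out separately. Parentheses (groups) are a standard feature of regular expressions that isn't otherwise covered in this chapter, so treat the following as a sketch. The procedure name `split_dollars_and_cents` and the sample text are made up for illustration.

```python
import re

def split_dollars_and_cents(text):
    """Return (dollars, cents) as strings for the first amount like
    $12.50 in the text string, or None if none is found."""
    # Each pair of parentheses captures one part of the match.
    amount_re = r'\$([0-9]+)\.([0-9][0-9])'
    match = re.search(amount_re, text)
    if match:
        # group(1) is the first parenthesised part, group(2) the second
        return match.group(1), match.group(2)
    else:
        return None

parts = split_dollars_and_cents("The ticket costs $12.50 today")
print(parts)
```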
### More on Regular Expression Patterns
Here are the most useful parts of the regular expression language. A
more complete reference can be found [in the Python
documentation](http://docs.python.org/library/re.html#regular-expression-syntax).
'.'
: Matches any character except newline
'\[\]'
: Used to indicate a set of characters and matches any one of
those characters. You can include ranges like \[a-z\], \[0-4\],
\[A-F\] as well as explicit sets like \[abc\]. Special characters
like . and + lose their meaning inside of square brackets so
\[0-9.\] matches either a digit or a period.
'\\d'
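: Matches a decimal digit. For ASCII text this is equivalent to '\[0-9\]'; with Python 3 strings it also matches digits from other scripts unless the `re.ASCII` flag is used.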
'\\s'
: Matches any whitespace character: space, tab, newline and so on.
'\\S'
: Matches any non-whitespace character, the opposite of \\s.
'\\w'
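: Matches any alphanumeric character or underscore. For ASCII text this is
equivalent to \[A-Za-z0-9\_\]; with Python 3 strings it also matches letters
and digits from other scripts unless the `re.ASCII` flag is used.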
'\*'
: A modifier which means that the previous pattern matches zero or
more times.
'+'
: A modifier which means that the previous pattern matches one or
more times.
'?'
: A modifier which means that the previous pattern is optional
(matches zero or one time).
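To get a feel for how these pieces combine, here is a short sketch that tries a few patterns built from the elements listed above. The patterns and sample strings are made up for illustration; try changing them to see what happens.

```python
import re

# Each pattern below combines elements from the list above.
# The sample strings are made up for illustration.
examples = [
    (r'\d+', "room 42"),            # one or more digits
    (r'colou?r', "color"),          # the 'u' is optional
    (r'[0-9.]+', "pi is 3.14"),     # one or more digits or periods
    (r'\w+\s\w+', "hello world"),   # two words separated by whitespace
    (r'ab*', "a"),                  # 'a' followed by zero or more 'b'
]

matched = []
for pattern, sample in examples:
    match = re.search(pattern, sample)
    matched.append(match.group())
print(matched)
```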