So far we've looked at the core Python language, everything we've seen is built into Python. One of the major features of Python though is the very broad range of modules that are shipped with the standard distribution. These modules do everything from cryptography to parsing web pages and sending email and as you learn to develop in Python you will learn more about their capabilities.
The Python standard library is documented in the Module Index section of the Python manual. You'll find chapters on all of the modules that include examples of how to use them; it's well worth skimming through it and exploring anything that looks interesting so that when you're working on a problem, you remember that there is a relevant module. In this course we'll only use a few modules and probably only a small part of each of them.
In some cases, the problem you have isn't solved in the standard library but someone has published a third party module that does the job. The Cheese Shop is a register of third party modules and installing them is easy thanks to the pip tool that will download, build and install a module from the command line.
To use a module in your application, you need to import it into
your program. You do this with the
before you use anything from the module. For example, to use the
listdir procedure from the
(which returns a list of files in a directory) we would write:
import os print(os.listdir("."))
(The argument to
listdir is the directory to list, I gave it
"." which stands for the current directory). Note the dot notation between
the module name
os and the procedure name. The dot is used in
Python as a general way to refer to components of objects, we saw it
when we looked at strings and sequences (remember
str1.upper()). In this case the we're referring to
a procedure inside a module. Sometimes you'll see multiple dots because
modules can be nested, e.g.
Sometimes you just want one procedure from a module, in this case you
from xxx import yyy notation. This allows you to
use the raw name of the procedure without the module name:
from os import listdir print(listdir("."))
You can import more than one name:
from os import listdir, getcwd print("Current directory:", getcwd()) print(listdir("."))
You can even import all the exported names from a module
from os import *). However this
is discouraged because it generally means that you didn't think it through
- the danger is that you import a name that clashed with something in your
Finally, you can change the names that you import. You might do this if the name you were importing might clash with another name in your program of from another module.
from os import listdir as showMeTheFiles print(showMeTheFiles("."))
The rest of this chapter will briefly describe some useful Python modules.
This module contains various procedures for accessing operating system services in a platform independant manner. For our purposes, the main things we will use are to access the file system although the module also provides access to running processes. A few useful procedures are:
os.listdir(dirname)returns a list of the names of the files in the directory
os.getcwd()returns the current working directory that your script is running from.
os.mkdir(dirname)creates a new directory with the given name
os.path provides many procedures to help
manipulate file names (path names including the directory names) in a
platform neutral way.
os.path.join(dirname, filename)joins the two names together with the right directory separator for the platform, that is, use a forward slash (/) on Mac and Linux and a backward slash (\) on Windows. You should use this to manipulate path names so that your code will work on any platform.
os.path.split(pathname)returns a tuple (head, tail) where
tailis everything after the last directory separator and
headis everything before it.
os.path.basename(pathname)returns the last part of the path name, after the last directory separator.
os.path.exists(filename)returns True if the filename is an existing file, False if not.
os.path.isdir(name)returns True if the name is an existing directory, False otherwise.
reModule: Regular Expressions
Regular expressions are patterns that match all or part of a string, they are
very useful in any kind of text processing application and every programmer should
understand how and when to use them. The Python
re module provides
support for various operations using regular expressions, this section gives
some details of the most useful parts of this module.
A regular expression is a pattern specified in a special language where
certain symbols affect the way that strings are matched. For example,
the '.' character will match any character in a string, so the regular
'...' will match any sequence of three characters,
whatever they are. The full language is quite complicated but the most useful
parts are simple to master. We'll first look at a simple example pattern
and show some of the things you can do with it, then describe the most
useful parts of the regular expression language.
If I want to find all of the dollar amounts in a text I can write a simple regular expression that matches things like $4,000, $3, $200 etc. (for simplicity, I'm ignoring amounts with a decimal point). Basically I'm looking for a dollar sign, followed by one or more digits, possibly with commas. That last sentence gives a model for the regular expression I need to write, I just need to encode that in the regular expression language:
'\$'the dollar sign is special in the regular expression language, it means match at the end of a string, so we need to preceed it with a backslash to just match a dollar.
'\$[0-9,]+'The square bracket notation matches any character from the set in brackets. In this case I include the digits 0-9 (shorthand for including all the digits) and the comma character. The + sign after the brackets means do this one or more times.
So the final pattern we have is
'\$[0-9,]+', this should
match any dollar amount in a text. Let's see how to apply this to search for
occurrences of dollar amounts in a text.
re.search procedure finds matches to a pattern in a string.
The returned value from the
re.search method is a match object
which contains details of the matching text. If there is no match then
None. So, the first use of this procedure
is to simply test whether the pattern is present or not, since None
evaluates to False and the match object to True in a test, we can write:
import re def string_has_money(text): """Return True if this text string contains a dollar amount, False otherwise.""" dollar_re = r'\$[0-9,]+' if re.search(dollar_re, text): return True else: return False
Next I might want to find what the dollar amounts are in a string. For this
I need to interrogate the return value of the
As I said, this is an object and we can call the
to find what was matched:
import re def get_money_from_string(text): """Return the first dollar amount from the text string or None if none is found.""" dollar_re = r'\$[0-9,]+' match = re.search(dollar_re, text) if match: return match.group() else: return None
I might also want to find all of the matches to this pattern. There are
two ways to achieve this. The procedure
re.findall returns a list
of strings that match the pattern and the procedure
allows you to iterate over the matches with a for loop. So, I can
find all dollar amounts like this:
import re def get_all_money_from_string(text): """Return a list of the dollar amounts from the text string or the empty list if none is found.""" dollar_re = r'\$[0-9,]+' return re.findall(dollar_re, text)
Finally, I might want to censor a text by replacing any dollar
amount with $$$$CAPITALISM_SUCKS!!!!. I could do this with the
import re def censor_text(text): """Replaces any dollar amount in the text string with a suitable anti-capitalist message. Returns the resulting string.""" dollar_re = r'\$[0-9,]+' message = "$$$$CAPITALISM_SUCKS!!!!" return re.sub(dollar_re, message, text)
These are just a few of the things you can do with the regular
expression library. In particular the match object that is returned
re.search has many more capabilities that are useful
for more complicated patterns.
Here are the most useful parts of the regular expression langauge. A more complete reference can be found in the Python documentation.
remove_htmlwhich takes a string containing HTML text and returns the text with all of the HTML tags removed. Remember that HTML tags can be upper or lower case and can opening tags can contain attributes.