Using SQLite from Python
Introducing SQLite
SQLite is a lightweight relational database system implemented as a shared library that can be embedded into applications. It is widely used in applications by Apple, Adobe, Google and many others, it is ideal if you want the benifits of a real SQL database without the overhead of installing a large database application. For web development, it provides a lightweight alternative to larger production databases like MySQL, Oracle etc. Since it supports most of the standard SQL language, applications can be developed using SQLite and easily ported to a larger, production database system when it is required.
SQLite runs on most platforms and can be downloaded and installed easily. The standard installation provides a command line application that will let you create and query databases. There are also a number of applications both free and commercial that provide graphical interfaces to SQLite databases, a list is maintained on the SQLite Wiki. You may want to install one of these if you want to experiment with creating databases and using SQL. As we will discuss later, there are also a number of interfaces between various programming languages and SQLite including Python, Java and others.
SQLite supports most of the standard SQL language. We won't provide an introduction to SQL here as there are many available on the Web and in various texts. We will include many examples as we go through the features of SQLite and the Python interface; for many simple applications, variations on these examples will be all that you need.
One important point of difference between SQLite and most other relational database systems is that SQLite stores data in a single file that is portable between different systems. If you create a database with MySQL or Microsoft SQL Server, the database is stored somewhere on your hard drive in an undisclosed location and access is mediated by the server software. With SQLite, the database is stored in a single file that you can manage just like all your other files. Where most database systems require you to configure the name of the server, the port used for communication and the username and password required to access it, SQLite only needs to know the name of the database file. If you want to back up your database, you can just take a copy of the file. If you want a copy to carry around with you, just put it on your flash drive.
Here's a simple example of creating a database, inserting some data and
querying the database using the command line SQLite client. You can
start this on Windows by installing the software from the link above and
running sqlite3
in a Command Prompt window or on Mac OS X using
Terminal.app. In this example I've used my Mac:
Macintosh> sqlite3 test.db
SQLite version 3.4.0
Enter ".help" for instructions
sqlite> create table people (first varchar, last varchar, age integer);
sqlite> insert into people values ("Steve", "Cassidy", 21);
sqlite> insert into people values ("Mary", "Mullins", 23);
sqlite> select * from people;
Steve|Cassidy|21
Mary|Mullins|23
sqlite> select * from people where age>21;
Mary|Mullins|23
This example creates a database in a file called test.db
with a simple
table containing three fields for first and last name and age. Two of
these are strings of arbitrary length (varchar) and the last is an
integer value. Two rows are inserted into the table and a query is run
to select all of the rows. The second query finds only those rows where
the age field is greater than 21.
The Python SQLite Interface
The most common use of SQLite is from application programs as an embedded datbase. SQLite itself is written as a system library in C which means that it is relatively easy to port to new environments (eg. your mobile phone) and can be linked into applications written in many languages. For our purposes, the Python SQLite interface is most relevant and there are a number of alternatives to choose from. We will make use of the most popular of these which is pysqlite.
Pysqlite provides an interface between SQL and Python such that we can query the database using SQL and get the results back as Python data structures. There are some bells and whistles that allow you to change the way that you deal with results etc but the core of the interface is simple. Here are the main concepts that you need to understand. One good thing is that these concepts are shared with most other database interfaces so that when you want to convert your code to MySQL in the future you don't need to learn a new interface.
The first thing that is needed is a connection to the database system. A connection object embodies all the required information to send queries to the database and get responses back. For SQLite all that is needed to create a connection is the name of the database file.
>>> import sqlite3
>>> db = sqlite3.connect("test.db")
The next step is to create a cursor object from the connection; a cursor is used to issue queries and retrieve results. Once we have a cursor we can issue a query; this example reproduces the database creation shown earlier.
cur = db.cursor()
cur.execute("create table people (first varchar, last varchar, age integer)")
cur.execute('insert into people values ("Steve", "Cassidy", 21)')
cur.execute('insert into people values ("Mary", "Mullins", 23)')
db.commit()
Note the use of single quotes to enclose the SQL query that contains
double quote characters. The above code creates the database and inserts
two rows in the same way as the first example. The call to
db.commit()
commits the changes that have been made, effectively
ensuring that they are saved to the database. Note that the commit
method is called via the connection object, not the cursor.
Running queries on the database is done in a similar manner with the
execute
method of the cursor object. However, the most important thing
here is to get at the results of the query. This is done with one of the
methods fetchall
, fetchone
or fetchmany
. As you might guess these
return all, one or a given number of rows from the result. The result of
fetchall
and fetchmany
is a list of sequences of the returned
matches from the query. The result of fetchone
is a single sequence.
Here are some examples.
>>> cur.execute("select first,last,age from people")
<sqlite3.Cursor object at 0x6e980>
>>> cur.fetchone()
('Steve', 'Cassidy', 21)
>>> cur.fetchone()
('Mary', 'Mullins', 23)
In the first example, fetchone
is used to return the next row from the
result; successive calls to fetchone
return successive rows, when
there are no more None
is returned.
>>> cur.execute("select first,last,age from people")
<sqlite3.Cursor object at 0x6e980>
>>> cur.fetchall()
[('Steve', 'Cassidy', 21), ('Mary', 'Mullins', 23)]
In this example, fetchall
is used to get all results rows as a list.
Note that this returns a list (square brackets) of tuples (round
brackets) where each tuple contains three values corresponding to the
variables first, last and age in the query.
In both of these examples I've explicitly named the values I want
returned rather than using select * from people
. It's a good idea to
get into the habit of doing this since you might change your database
schema to add more fields and doing so could then introduce subtle bugs
into your code. If you name the fields, you know which the first second
and third elements are going to be in your result.
In the examples above, the result of a query is a tuple or a list of
tuples. This is fine but means that when we want to access parts of
the result we need to use a numerical index - eg. result[2]
to get the
third element of a result tuple. The Python SQLite interface offers
a more usable option to return Row
objects which behave like
dictionaries. To use this we configure the database connection
as follows:
db = sqlite3.connect("mydbase.db")
db.row_factory = sqlite3.Row
Then the results of a query can be accessed in this way:
>>> cursor.execute("select first,last,age from people")
<sqlite3.Cursor object at 0x106fd1ea0>
>>> row = cursor.fetchone()
>>> row['first']
'Steve'
>>> row['last']
'Cassidy'
>>> row['age']
21
Using these methods we can write some simple code that runs a query and generates a string containing an HTML table:
cur = db.cursor()
cur.execute("select first, last, age from people")
table = "<table>"
table += "<tr><th>First</th><th>Last</th><th>Age</th></tr>"
for row in cur.fetchall():
table += "<tr><td>" + row['first'] + "</td>""
table += "<td>" + row['last'] + "</td>"
table += "<td>" + row['age'] + "</td></tr>"
table += "</table>"
All of the above conforms to the stanard Python database interface
specification (dbapi
) and will work with most databases you can access
from Python. Pysqlite has a few extensions which make working with it a
little easier but which are non-standard. For the most part, any code
that you write to work with SQLite will also work with many other
database systems such as Postgres and MYSQL - the main difference
is the way that connections are created. There are minor differences
in the SQL syntax understood by these systems but most simple
queries will be the same.
Adding Parameters to Queries
Quite often you will want to use the value of a Python variable as part of an SQL query. For example if we have the last name of a person in a variable and we want to find their age, the temptation is to construct an SQL query using the standard Python string functions:
lastname = "Cassidy"
# note this is the _wrong_ way to do this!
query = "select age from people where last='"+lastname+"'"
cur = db.execute(query)
...
While this will work in many cases it has a problem which stems from the
fact that we don't know in advance what the string lastname
contains
(since it probably came from user input or via the web). In the above
example all is well but if the value of lastname
were "D'Arcy"
then
the query we would construct would be:
"select age from people where last='D'Arcy'"
which is badly formed
because of the extra single quote. Even worse, if a malicious user sent
a last name of "x'; delete from people where 'a'='a"
the query would
be
"select age from people where last='x'; delete from people where 'a'='a'"
which would delete every row from your database. This is the basis of
the so-called SQL Injection attack that is a very common exploit on the
web.
So, we need a better way to introduce variable values into an SQL
string. Pysqlite provides a suitable mechanism which ensures that any
special characters (like ') in the string are quoted before the SQL is
run. So, "D'Arcy"
would be inserted as "D\'Arcy"
. The mechanism for
this is a second argument to execute
to carry the values to be
inserted and question marks in the query string to show where the values
should be inserted. Here's the above example done right:
lastname = "Cassidy"
query = "select age from people where last=?"
cur = db.execute(query, [lastname])
...
Note that no quotes are used around the question mark in the query
string, they are inserted automatically. The second argument to
execute
must be a sequence (list or tuple, here I've used a list) and
the question marks are substituted in order. Here's an example with two
values:
cur = db.execute("select age from people where first=? and last=?", [first, last])
You should always use this mechanism to substitute values into your SQL queries. Not doing so will lead to bugs and may make your web scripts insecure.
Summary
I've only covered a small amount of SQLite and the Python interface here but this should be enough to enable you to write some very useful scripts to store and manipulate data. Further details are available on the web for SQLite and the Python SQLite module.