[ login ]

Chapter Table of Contents

The Database Abstraction Layer

DAL

Dependencies

web2py comes with a Database Abstraction Layer (DAL), an API that maps Python objects into database objects such as queries, tables, and records. The DAL dynamically generates the SQL in real time using the specified dialect for the database back end, so that you do not have to write SQL code or learn different SQL dialects (the term SQL is used generically), and the application will be portable among different types of databases. At the time of this writing, the supported databases are SQLite (which comes with Python and thus web2py), PostgreSQL, MySQL, Oracle, MSSQL, FireBird, DB2, Informix, and Ingres and (partially) the Google App Engine (GAE). GAE is treated as a particular case in Chapter 11.

The Windows binary distribution works out of the box with SQLite and MySQL. The Mac binary distribution works out of the box with SQLite. To use any other database back-end, run from the source distribution and install the appropriate driver for the required back end.

database drivers

Once the proper driver is installed, start web2py from source, and it will find the driver. Here is a list of drivers:

databasedriver (source)
SQLitesqlite3 or pysqlite2 or zxJDBC 56 (on Jython)
PostgreSQLpsycopg2 57 or zxJDBC 56 (on Jython)
MySQLMySQLdb 58
Oraclecx_Oracle 59
MSSQLpyodbc 60
FireBirdkinterbasdb 61
DB2pyodbc 60
Informixinformixdb 62
Ingresingresdbi 63
web2py defines the following classes that make up the DAL:

DAL represents a database connection. For example:

1.
db = DAL('sqlite://storage.db')

define_table
Table represents a database table. You do not directly instantiate Table; instead, DAL.define_table instantiates it.
1.
db.define_table('mytable', Field('myfield'))

The most important methods of a Table are:

.insert, .truncate, .drop, and .import_from_csv_file.

Field
Field represents a database field. It can be instantiated and passed as an argument to DAL.define_table.

Rows
DAL Rows
Row
is the object returned by a database select. It can be thought of as a list of Row rows:
1.
rows = db(db.mytable.myfield!=None).select()

Row
Row contains field values.
1.
2.
for row in rows:
print row.myfield

Query
Query is an object that represents an SQL "where" clause:
1.
myquery = (db.mytable.myfield != None) | (db.mytable.myfield > 'A')

Set
Set is an object that represents a set of records. Its most important methods are count, select, update, and delete. For example:
1.
2.
3.
4.
myset = db(myquery)
rows = myset.select()
myset.update(myfield='somevalue')
myset.delete()

Expression
Expression is something like an orderby or an groupby expressions. The Field class is derived from Expression. Here is an example.
1.
2.
myorder = db.mytable.myfield.upper() | db.mytable.id
db().select(db.table.ALL, orderby=myorder)

Connection Strings

connection strings

A connection with the database is established by creating an instance of the DAL object:

1.
>>> db = DAL('sqlite://storage.db', pool_size=0)
db is not a keyword; it is a local variable that stores the connection object DAL. You are free to give it a different name. The constructor of DAL requires a single argument, the connection string. The connection string is the only web2py code that depends on a specific back-end database. Here are examples of connection strings for specific types of supported back-end databases (in all cases, we assume the database is running from localhost on its default port and is named "test"):

SQLite sqlite://storage.db
MySQL mysql://username:password@localhost/test
PostgreSQLpostgres://username:password@localhost/test
MSSQL mssql://username:password@localhost/test
FireBird firebird://username:password@localhost/test
Oracle oracle://username:password@test
DB2 db2://username:password@test
Ingres ingres://username:password@localhost/test
DB2 db2://username:password@test
Informix informix://username:password@test
Google App Enginegae
Notice that in SQLite the database consists of a single file. If it does not exist, it is created. This file is locked every time it is accessed. In the case of MySQL, PostgreSQL, MSSQL, FireBird, Oracle, DB2, Informix the database "test" must be created outside web2py. Once the connection is established, web2py will create, alter, and drop tables appropriately.

It is also possible to set the connection string to None. In this case DAL will not connect to any back-end database, but the API can still be accessed for testing. Examples of this will be discussed in Chapter 7.

Connection Pooling

connection pooling

The second argument of the DAL constructor is the pool_size; it defaults to 0.

Since it is slow to establish a new database connection for each request. To avoid this, web2py implements a mechanism of connection pooling. When a connection is established, after the page has been served and the transaction completed, the connection is not closed, but it goes into a pool. When the next http request arrives, web2py tries to pick a connection from the pool and use that one for a new transaction. If there are no available connections from the pool, a new connection is established.

The pool_size parameter is ignored by SQLite and GAE.

Connections in the pools are shared sequentially among threads, in the sense that they may be used by two different but not simultaneous threads. There is only one pool for each web2py process.

When web2py starts, the pool is always empty. The pool grows up to the minimum between the value of pool_size and the max number of concurrent requests. This means that if pool_size=10 but our server never receives more than 5 concurrent requests, then the actual pool size will only grow to 5. If pool_size=0 then connection pooling is not used.

Connection pooling is ignored for SQLite, since it would not yield any benefit.

Connection Failures

If web2py fails to connect to the database it waits 1 seconds and tries again up to 5 times before declaring a failure. In case of connection pooling it is possible that a pooled connection that stays open but unused for some time is closed by the database end. Thanks to the retry feature web2py tries to re-establish these dropped connections.

Replicated Databases

The first argument of DAL(...) can be a list of URIs. In this case web2py tries to connect to each of them. The main purpose for this is to deal with multiple database servers and distribute the workload among them). Here is a typical use case:

1.
db = DAL(['mysql://...1','mysql://...2','mysql://...3'])

In this case the DAL tries to connect to the first and, on failure, it will they the second and the first. This can also be used distribute load in a database master-slave configuration. We will talk more about this in Chapter 11 in the context of scalability.

Reserved Keywords

reserved Keywords

There is also another argument that can be passed to the DAL constructor to check table names and column names against reserved SQL keywords in target back-end databases.

This argument is check_reserved and it defaults to None.

This is a list of strings that contain the database back-end adapter names.

The adapter name is the same as used in the DAL connection string. So if you want to check against PostgreSQL and MSSQL then your connection string would look as follows:

1.
2.
db = DAL('sqlite://storage.db',
check_reserved=['postgres', 'mssql'])

The DAL will scan the keywords in the same order as of the list.

There are two extra options "all" and "common". If you specify all, it will check against all known SQL keywords. If you specify common, it will only check against common SQL keywords such as SELECT, INSERT, UPDATE, etc.

For supported back-ends you may also specify if you would like to check against the non-reserved SQL keywords as well. In this case you would append _nonreserved to the name. For example:

1.
check_reserved=['postgres', 'postgres_nonreserved']

The following database backends support reserved words checking.

PostgreSQLpostgres(_nonreserved)
MySQLmysql
FireBirdfirebird(_nonreserved)
MSSQLmssql
Oracleoracle

DAL, Table, Field

The best way to understand the DAL API is to try each function yourself. This can be done interactively via the web2py shell, although ultimately, DAL code goes in the models and controllers.

Start by creating a connection. For the sake of example, you can use SQLite. Nothing in this discussion changes when you change the back-end engine.

DAL
SQLite
MySQL
PostgresSQL
Oracle
MSSQL
FireBird
DB2
Informix
1.
>>> db = DAL('sqlite://storage.db')

The database is now connected and the connection is stored in the global variable db.

At any time you can retrieve the connection string.

_uri
1.
2.
>>> print db._uri
sqlite://storage.db

and the database name

_dbname
1.
2.
>>> print db._dbname
sqlite

The connection string is called a _uri because it is an instance of a Uniform Resource Identifier.

The DAL allows multiple connections with the same database or with different databases, even databases of different types. For now, we will assume the presence of a single database since this is the most common situation.

define_table
Field
type
length
default
requires
required
unique
notnull
ondelete
uploadfield
uploadseparate
migrate
sql.log

The most important method of a DAL is define_table:

1.
>>> db.define_table('person', Field('name'))

It defines, stores and returns a Table object called "person" containing a field (column) "name". This object can also be accessed via db.person, so you do not need to catch the return value.

Record Representation

It is optional but recommended to specify a format representation for records:

1.
>>> db.define_table('person', Field('name'), format='%(name)s')

or

1.
>>> db.define_table('person', Field('name'), format='%(name)s %(id)s')

or even more complex ones using a function:

1.
2.
>>> db.define_table('person', Field('name'),
format=lambda r: r.name or 'anonymous')

The format attribute will be used for two purposes:

  • To represent referenced records in select/option drop-downs.
  • To set the db.othertable.person.represent attribute for all fields referencing this table. This means that SQLTABLE will not show references by id but will use the format preferred representation instead.

Field constructor
These are the default values of a Field constructor:
1.
2.
3.
4.
5.
6.
7.
8.
Field(name, 'string', length=None, default=None,
required=False, requires='<default>',
ondelete='CASCADE', notnull=False, unique=False,
uploadfield=True, widget=None, label=None, comment=None,
writable=True, readable=True, update=None, authorize=None,
autodelete=False, represent=None, compute=None,
uploadfolder=os.path.join(request.folder,'uploads'),
uploadseparate=None)

Not all of them are relevant for every field. "length" is relevant only for fields of type "string". "uploadfield" and "authorize" are relevant only for fields of type "upload". "ondelete" is relevant only for fields of type "reference" and "upload".

  • length sets the maximum length of a "string", "password" or "upload" field. If length is not specified a default value is used but the default value is not guaranteed to be backward compatible. To avoid unwanted migrations on upgrades, we recommend that you always specify the length for string, password and upload fields.
  • default sets the default value for the field. The default value is used when performing an insert if a value is not explicitly specified. It is also used to pre-populate forms built from the table using SQLFORM.
  • required tells the DAL that no insert should be allowed on this table if a value for this field is not explicitly specified.
  • requires is a validator or a list of validators. This is not used by the DAL, but it is used by SQLFORM. The default validators for the given types are shown in the following table:
field typedefault field validators
stringIS_LENGTH(length)
textNone
blobNone
booleanNone
integerIS_INT_IN_RANGE(-1e100, 1e100)
doubleIS_FLOAT_IN_RANGE(-1e100, 1e100)
decimal(n,m)IS_DECIMAL_IN_RANGE(-1e100, 1e100)
dateIS_DATE()
timeIS_TIME()
datetimeIS_DATETIME()
passwordNone
uploadNone
reference <table> IS_IN_DB(db,'<table>.id')
list:stringNone
list:integerNone
list:reference <table> IS_IN_DB(db,'<table>.id',multiple=True)
Decimal requires and returns values as Decimal objects, as defined in the Python decimal module. SQLite does not handle the decimal type so internally we treat it as a double. The (n,m) and the number of digits before and after the decimal point.

The list: fields are special because they are designed to take advantage of certain denormalization features on NoSQL (in the case of Google App Engine, the field types ListProperty and StringListProperty) and back-port them all the other supported relational databases. On relational databases lists are stored as a text field. The items are separated by a | and each | in string item is escaped as a ||. They are discussed in their own section.

Notice that requires=... is enforced at the level of forms, required=True is enforced at the level of the DAL (insert), while notnull, unique and ondelete are enforced at the level of the database. While they sometimes may seem redundant, it is important to maintain the distinction when programming with the DAL.
  • ondelete translates into the "ON DELETE" SQL statement. By default "CASCADE" tells the database that when it deletes a record, it should also delete all records that refer to it.
  • notnull=True translates into the "NOT NULL" SQL statement. It prevents the database from inserting null values for the field.
  • unique=True translates into the "UNIQUE" SQL statement and it makes sure that values of this field are unique within the table. It is enforced at the database level.
  • uploadfield applies only to fields of type "upload". A field of type "upload" stores the name of a file saved somewhere else, by default on the filesystem under the application "uploads/" folder. If uploadfield is set, then the file is stored in a blob field within the same table and the value of uploadfield is the name of the blob field. This will be discussed in more detail later in the context of SQLFORM.
  • uploadfolder defaults to the application's "uploads/" folder. If set to a different path, files will uploaded to a different folder. For example, uploadfolder=os.path.join(request.folder,'static/temp') will upload files to the web2py/applications/myapp/static/temp folder.
  • uploadseparate if set to True will upload files under different subfolders of the uploadfolder folder. This is optimized to avoid too many files under the same folder/subfolder. ATTENTION: You cannot change the value of uploadseparate from True to False without breaking the system. web2py either uses the separate subfolders or it does not. Changing the behavior after files have been uploaded will prevent web2py from being able to retrieve those files. If this happens it is possible to move files and fix the problem but this is not described here.
  • widget must be one of the available widget objects, including custom widgets, for example: SQLFORM.widgets.string.widget. A list of available widgets will be discussed later. Each field type has a default widget.
  • label is a string (or something that can be serialized to a string) that contains the label to be used for this field in autogenerated forms.
  • comment is a string (or something that can be serialized to a string) that contains a comment associated with this field, and will be displayed to the right of the input field in the autogenerated forms.
  • writable if a field is writable, it can be edited in autogenerated create and update forms.
  • readable if a field is readable, it will be visible in readonly forms. If a field is neither readable nor writable, it will not be displayed in create and update forms.
  • update contains the default value for this field when the record is updated.
  • compute is an optional function. If a record is inserted and there is no value for a field that has a compute attribute, the value is calculated by passing the record (as a dict) to the compute function.
  • authorize can be used to require access control on the corresponding field, for "upload" fields only. It will be discussed more in detail in the context of Authentication and Authorization.
  • autodelete determines if the corresponding uploaded file should be deleted when the record referencing the file is deleted. For "upload" fields only.
  • represent can be None or can point to a function that takes a field value and returns an alternate representation for the field value. Examples:
1.
2.
3.
4.
db.mytable.name.represent = lambda name: name.capitalize()
db.mytable.other_id.represent = lambda id: db.other(id).myfield
db.mytable.some_uploadfield.represent = lambda value: \
A('get it', _href=URL('download', args=value))

blob
"blob" fields are also special. By default, binary data is encoded in base64 before being stored into the actual database field, and it is decoded when extracted. This has the negative effect of using 25% more storage space than necessary in blob fields, but has two advantages. On average it reduces the amount of data communicated between web2py and the database server, and it makes the communication independent of back-end-specific escaping conventions.

You can query the database for existing tables:

tables
1.
2.
>>> print db.tables
['person']

You can also query a table for existing fields:

fields
1.
2.
>>> print db.person.fields
['id', 'name']

Do not declare a field called "id", because one is created by web2py anyway. Every table has a field called "id" by default. It is an auto-increment integer field (starting at 1) used for cross-reference and for making every record unique, so "id" is a primary key. (Note: the id's starting at 1 is back-end specific. For example, this does not apply to the Google App Engine (GAE).)

named id field
Optionally you can define a field of type='id' and web2py will use this field as auto-increment id field. This is not recommended except when accessing legacy database tables. With some limitation, you can also use different primary keys and this is discussed in the section on "Legacy databases and keyed tables".

You can query for the type of a table:

Table
1.
2.
>>> print type(db.person)
<
class 'gluon.sql.Table'>

and you can access a table from the DAL connection using:

1.
2.
>>> print type(db['person'])
<
class 'gluon.sql.Table'>

Similarly you can access fields from their name in multiple equivalent ways:

1.
2.
3.
4.
5.
6.
>>> print type(db.person.name)
<
class 'gluon.sql.Field'>
>>>
print type(db.person['name'])
<
class 'gluon.sql.Field'>
>>>
print type(db['person']['name'])
<
class 'gluon.sql.Field'>

Given a field, you can access the attributes set in its definition:

1.
2.
3.
4.
5.
6.
7.
8.
>>> print db.person.name.type
string
>>> print db.person.name.unique
False
>>> print db.person.name.notnull
False
>>> print db.person.name.length
32

including its parent table, tablename, and parent connection:

1.
2.
3.
4.
5.
6.
>>> db.person.name._table == db.person
True
>>> db.person.name._tablename == 'person'
True
>>> db.person.name._db == db
True

Migrations

migrations

define_table checks whether or not the corresponding table exists. If it does not, it generates the SQL to create it and executes the SQL. If the table does exist but differs from the one being defined, it generates the SQL to alter the table and executes it. If a field has changed type but not name, it will try to convert the data(If you do not want this, you need to redefine the table twice, the first time, letting web2py drop the field by removing it, and the second time adding the newly defined field so that web2py can create it.). If the table exists and matches the current definition, it will leave it alone. In all cases it will create the db.person object that represents the table.

We refer to this behavior as a "migration". web2py logs all migrations and migration attempts in the file "databases/sql.log".

The first argument of define_table is always the table name. The other unnamed arguments are the fields (Field). The function also takes an optional last argument called "migrate" which must be referred to explicitly by name as in:

1.
>>> db.define_table('person', Field('name'), migrate='person.table')

The value of migrate is the filename (in the "databases" folder for the application) where web2py stores internal migration information for this table. These files are very important and should never be removed except when the entire database is dropped. In this case, the ".table" files have to be removed manually. By default, migrate is set to True. This causes web2py to generate the filename from a hash of the connection string. If migrate is set to False, the migration is not performed, and web2py assumes that the table exists in the datastore and it contains (at least) the fields listed in define_table. The best practice is to give an explicit name to the migrate table.

There may not be two tables in the same application with the same migrate filename.

Fixing Broken Migrations

fake_migrate

There are two common problems with migrations and there are ways to recover from them.

One problem is specific with SQLite. SQLite does not enforce column types and cannot drop columns. This means that if you have a column of type string and you remove it, it is not really removed. If you add the column again with a different type (for example datetime) you end up with a datetime column that contains strings (junk for practical purposes). web2py does not complain about this because does not know what is in the database, until it tries to retrieve records and it fails.

If web2py returns an error in the gluon.sql.parse function when selecting records, this is the problem: corrupted data in a column because of the above issue.

The solution consists in updating all records of the table and updating the values in the column in question with None.

The other problem is more generic but typical with MySQL. MySQL does not allow more than one ALTER TABLE in a transaction. This means that web2py must break complex transactions into smaller ones (one ALTER TABLE at the time) and commit one piece at the time. It is therefore possible that part of a complex transaction gets committed and one part fails leaving web2py in a corrupted state. Why would part of a transaction fail? Because, for example, it involves altering a table and converting a string column into a datetime column, web2py tries to convert the data, but the data cannot be converted. What happens to web2py? It gets confused about what exactly is the table structure actually stored in the database.

The solution consists of disabling migrations for all tables and enabling fake migrations:

1.
db.define_table(....,migrate=False,fake_migrate=True)

This will rebuild web2py metadata about the table according to the table definition. Try multiple table definitions to see which one works (the one before the failed migration and the one after the failed migration). Once successful remove the fake_migrate=True attribute.

Before attempting to fix migration problems it is prudent to make a copy of "applications/yourapp/databases/*.table" files.

insert

Given a table, you can insert records

insert
1.
2.
3.
4.
>>> db.person.insert(name="Alex")
1
>>> db.person.insert(name="Bob")
2

Insert returns the unique "id" value of each record inserted.

You can truncate the table, i.e., delete all records and reset the counter of the id.

truncate
1.
>>> db.person.truncate()

Now, if you insert a record again, the counter starts again at 1 (this is back-end specific and does not apply to GAE):

1.
2.
>>> db.person.insert(name="Alex")
1

bulk_insert
web2py also provides a /ft bulk_insert method
1.
2.
>>> db.person.bulk_insert([{'name':'Alex'}, {'name':'John'}, {'name':'Tim'}])
[
3,4,5]

It takes a list of dictionaries of fields to be inserted and performs multiple inserts at once. It returns the IDs of the inserted records. On the supported relational databases there is no advantage in using this function as opposed to looping and performing individual inserts but on Google App Engine, there is a major speed advantage.

commit and rollback

No create, drop, insert, truncate, delete, or update operation is actually committed until you issue the commit command

commit
1.
>>> db.commit()

To check it let's insert a new record:

1.
2.
>>> db.person.insert(name="Bob")
2

and roll back, i.e., ignore all operations since the last commit:

rollback
1.
>>> db.rollback()

If you now insert again, the counter will again be set to 2, since the previous insert was rolled back.

1.
2.
>>> db.person.insert(name="Bob")
2

Code in models, views and controllers is enclosed in web2py code that looks like this:

1.
2.
3.
4.
5.
6.
7.
8.
9.
try:
execute models, controller function and view
except:
rollback all connections
log the traceback
send a ticket to the visitor
else:
commit all connections
save cookies, sessions and return the page

There is no need to ever call commit or rollback explicitly in web2py unless one needs more granular control.

Raw SQL

executesql

The DAL allows you to explicitly issue SQL statements.

executesql
1.
2.
>>> print db.executesql('SELECT * FROM person;')
[(
1, u'Massimo'), (2, u'Massimo')]

In this case, the return values are not parsed or transformed by the DAL, and the format depends on the specific database driver. This usage with selects is normally not needed, but it is more common with indexes. executesql takes two optional arguments: placeholders and as_dict placeholders is an optional sequence of values to be substituted in or, if supported by the DB driver, a dictionary with keys matching named placeholders in your SQL.

If as_dict is set to True, and the results cursor returned by the DB driver will be converted to a sequence of dictionaries keyed with the db field names. Results returned with as_dict = True are the same as those returned when applying .to_list() to a normal select.

1.
[{field1: value1, field2: value2}, {field1: value1b, field2: value2b}]

_lastsql

Whether SQL was executed manually using executesql or was SQL generated by the DAL, you can always find the SQL code in db._lastsql. This is useful for debugging purposes:

_lastdb
1.
2.
3.
>>> rows = db().select(db.person.ALL)
>>>
print db._lastsql
SELECT person.id, person.name FROM person;

web2py never generates queries using the "*" operator. web2py is always explicit when selecting fields.

drop

Finally, you can drop tables and all data will be lost:

drop
1.
>>> db.person.drop()

Indexes

Currently the DAL API does not provide a command to create indexes on tables, but this can be done using the executesql command. This is because the existence of indexes can make migrations complex, and it is better to deal with them explicitly. Indexes may be needed for those fields that are used in recurrent queries.

Here is an example of how to create an index using SQL in SQLite:

1.
2.
3.
>>> db = DAL('sqlite://storage.db')
>>>
db.define_table('person', Field('name'))
>>>
db.executesql('CREATE INDEX IF NOT EXISTS myidx ON person name;)

Other database dialects have very similar syntaxes but may not support the optional "IF NOT EXISTS" directive.

Legacy Databases and Keyed Tables

web2py can connect to legacy databases under some conditions.

The easiest way is when these conditions are met:

  • Each table must have a unique auto-increment integer field called "id"
  • Records must be referenced exclusively using the "id" field.
When accessing an existing table, i.e., a table not created by web2py in the current application, always set migrate=False.
If the legacy table has a auto-increment integer field but it is not called "id", web2py can still access it but the table definition must contain explicitly as Field('....','id') where ... is the name of auto-increment integer field.

Finally if the legacy table uses a primary key that is not an auto-increment id field it is possible to use a "keyed table", for example:

1.
2.
3.
4.
5.
6.
db.define_table('account',
Field('accnum','integer'),
Field('acctype'),
Field('accdesc'),
primarykey=['accnum','acctype'],
migrate=False)

In this example the primarykey attribute is a list of field that constitute the primary key. At the time of writing, we cannot guarantee that the primarykey attribute works with every existing legacy table and every supported database backend. For simplicity, we recommend, if possible, creating a database view that has an auto-increment id field.

Distributed Transaction

distributed transactions
At the time of writing this feature is only supported by PostgreSQL, MySQL and Firebird, since they expose API for two-phase commits.
Assuming you have two (or more) connections to distinct PostgreSQL databases, for example:
1.
2.
db_a = DAL('postgres://...')
db_b = DAL('postgres://...')

In your models or controllers, you can commit them concurrently with:

1.
DAL.distributed_transaction_commit(db_a, db_b)

On failure, this function rolls back and raises an Exception.

In controllers, when one action returns, if you have two distinct connections and you do not call the above function, web2py commits them separately. This means there is a possibility that one of the commits succeeds and one fails. The distributed transaction prevents this from happening.

Manual Uploads

Consider the following model:

1.
db.define_table('myfile',Field('image','upload'))

Normally an insert is handled automatically via a SQLFORM or a crud form (which is a SQLFORM) but occasionally you already have the file on the filesystem and want to upload it programmatically. This can be done in this way:

1.
2.
stream = open(filename,'rb')
db.myfile.insert(image=db.myfile.image.store(stream,filename))

The store method of the upload field object takes a file stream and a filename. It uses the filename to determine the extension (type) of the file, creates a new temp name for the file (according to web2py upload mechanism) and loads the file content in this new temp file (under the uploads folder unless specified otherwise). It returns the new temp name that is to be stores into the image field of the db.myfile table.

Query, Set, Rows

Let's consider again the table defined (and dropped) previously and insert three records:

1.
2.
3.
4.
5.
6.
7.
>>> db.define_table('person', Field('name'))
>>>
db.person.insert(name="Alex")
1
>>> db.person.insert(name="Bob")
2
>>> db.person.insert(name="Carl")
3

You can store the table in a variable. For example, with variable person, you could do:

Table
1.
>>> person = db.person

You can also store a field in a variable such as name. For example, you could also do:

Field
1.
>>> name = person.name

You can even build a query (using operators like ==, !=, <, >, <=, >=, like, belongs) and store the query in a variable q such as in:

Query
1.
>>> q = name=='Alex'

When you call db with a query, you define a set of records. You can store it in a variable s and write:

Set
1.
>>> s = db(q)

Notice that no database query has been performed so far. DAL + Query simply define a set of records in this db that match the query. web2py determines from the query which table (or tables) are involved and, in fact, there is no need to specify that.

select

Given a Set, s, you can fetch the records with the command select:

Rows
select
1.
>>> rows = s.select()

Row
It returns an iterable object of class gluon.sql.Rows whose elements are Row objects. gluon.sql.Row objects act like dictionaries, but their elements can also be accessed as attributes, like gluon.storage.Storage.The former differ from the latter because its values are readonly.

The Rows object allows looping over the result of the select and printing the selected field values for each row:

1.
2.
3.
>>> for row in rows:
print row.id, row.name
1 Alex

You can do all the steps in one statement:

1.
2.
3.
>>> for row in db(db.person.name=='Alex').select():
print row.name
Alex

ALL

The select command can take arguments. All unnamed arguments are interpreted as the names of the fields that you want to fetch. For example, you can be explicit on fetching field "id" and field "name":

1.
2.
3.
4.
5.
>>> for row in db().select(db.person.id, db.person.name):
print row.name
Alex
Bob
Carl

The table attribute ALL allows you to specify all fields:

1.
2.
3.
4.
5.
>>> for row in db().select(db.person.ALL):
print row.name
Alex
Bob
Carl

Notice that there is no query string passed to db. web2py understands that if you want all fields of the table person without additional information then you want all records of the table person.

An equivalent alternative syntax is the following:

1.
2.
3.
4.
5.
>>> for row in db(db.person.id > 0).select():
print row.name
Alex
Bob
Carl

and web2py understands that if you ask for all records of the table person (id > 0) without additional information, then you want all the fields of table person.

Shortcuts

DAL shortcuts

The DAL supports various code-simplifying shortcuts. In particular:

1.
myrecord = db.mytable[id]

returns the record with the given id if it exists. If the id does not exist, it returns None. The above statement is equivalent to

1.
myrecord = db(db.mytable.id==id).select().first()

You can delete records by id:

1.
del db.mytable[id]

and this is equivalent to

1.
db(db.mytable.id==id).delete()

and deletes the record with the given id, if it exists.

You can insert records:

1.
db.mytable[0] = dict(myfield='somevalue')

It is equivalent to

1.
db.mytable.insert(myfield='somevalue')

and it creates a new record with field values specified by the dictionary on the right hand side.

You can update records:

1.
db.mytable[id] = dict(myfield='somevalue')

which is equivalent to

1.
db(db.mytable.id==id).update(myfield='somevalue')

and it updates an existing record with field values specified by the dictionary on the right hand side.

Fetching a Row

Yet another convenient syntax is the following:

1.
2.
3.
record = db.mytable(id)
record = db.mytable(db.mytable.id==id)
record = db.mytable(id,myfield='somevalue')

Apparently similar to db.mytable[id] the above syntax is more flexible and safer. First of all it checks whether id is an int (or str(id) is an int) and returns None if not (it never raises an exception). It also allows to specify multiple conditions that the record must meet. If they are not met, it also returns None.

Recursive selects

recursive selects

Consider the previous table person and a new table "dog" referencing a "person":

1.
>>> db.define_table('dog', Field('name'), Field('owner',db.person))

and a simple select from this table:

1.
>>> dogs = db(db.dog.id>0).select()

For each Row of dogs it is possible to fetch not just fields from the selected table (dog) but also from linked tables (recursively):

1.
>>> for dog in dogs: print dog.info, dog.owner.name

Here dog.owner.name requires one database select for each dog in dogs and it is therefore inefficient. We suggest using joins whenever possible instead of recursive selects, nevertheless this is convenient and practical when accessing individual records.

You can also do it backwards, by selecting the dogs referenced by a person:

1.
2.
3.
person =  db.person(id)
for dog in person.dog.select(orderby=db.dog.name):
print person.name, 'owns', dog.name

In this last expressions person.dog is a shortcut for

1.
db(db.dog.owner==person.id)

i.e. the Set of dogs referenced by the current person. This syntax breaks down if the referencing table has multiple referenced to the referenced table. In this case one needs to be more explicit and use a full Query.

Serializing Rows in Views

SQLTABLE
The result of a select can be displayed in a view with the following syntax:
1.
2.
3.
{{extend 'layout.html'}}
<h1>Records</h2>
{{=db().select(db.person.ALL)}}

and it is automatically converted into an HTML table with a header containing the column names and one row per record. The rows are marked as alternating class "even" and class "odd". Under the hood, the Rows is first converted into a SQLTABLE object (not to be confused with Table) and then serialized. The values extracted from the database are also formatted by the validators associated to the field and then escaped. (Note: Using a db in this way in a view is usually not considered good MVC practice.)

Yet it is possible and sometimes convenient to call SQLTABLE explicitly.

The SQLTABLE constructor takes the following optional arguments:

  • linkto the URL or an action to be used to link reference fields (default to None)
  • upload the URL or the download action to allow downloading of uploaded files (default to None)
  • headers a dictionary mapping field names to their labels to be used as headers (default to {}). It can also be an instruction. Currently we support headers='fieldname:capitalize'.
  • truncate the number of characters for truncating long values in the table (default is 16)
  • columns the list of fieldnames to be shown as columns. Those not listed are not displayed. (defaults to all).
  • **attributes generic helper attributes to be passed to the most external TABLE object.

Here is an example:

1.
2.
3.
4.
5.
6.
7.
{{extend 'layout.html'}}
<h1>Records</h2>
{{=SQLTABLE(db().select(db.person.ALL),
headers='fieldname:capitalize',
truncate=100,
upload=URL('download'))
}}

orderby, groupby, limitby, distinct

The select command takes five optional arguments: orderby, groupby, limitby, left and cache. Here we discuss the first three.

You can fetch the records sorted by name:

orderby
1.
2.
3.
4.
5.
>>> for row in db().select(db.person.ALL, orderby=db.person.name):
print row.name
Alex
Bob
Carl

You can fetch the records sorted by name in reverse order (notice the tilde):

1.
2.
3.
4.
5.
>>> for row in db().select(db.person.ALL, orderby=db.person.name):
print row.name
Carl
Bob
Alex

And you can sort the records according to multiple fields by concatenating them with a "|":

1.
2.
3.
4.
5.
>>> for row in db().select(db.person.ALL, orderby=db.person.name|db.person.id):
print row.name
Carl
Bob
Alex

Using groupby together with orderby, you can group records with the same value for the specified field (this is backend specific, and is not on the GAE):

1.
2.
3.
4.
5.
6.
>>> for row in db().select(db.person.ALL, orderby=db.person.name,
groupby=db.person.name):
print row.name
Alex
Bob
Carl

distinct

With the argument distinct=True, you can specify that you only want to select distinct records. This has the same effect as grouping using all specified fields except that it does not require sorting. When using distinct it is important not to select ALL fields, and in particular not to select the "id" field, else all records will always be distinct.

Here is an example:

1.
2.
3.
4.
5.
>>> for row in db().select(db.person.name, distinct=True):
print row.name
Alex
Bob
Carl

With limitby, you can select a subset of the records (in this case, the first two starting at zero):

limitby
1.
2.
3.
4.
>>> for row in db().select(db.person.ALL, limitby=(0, 2)):
print row.name
Alex
Bob

Currently, "limitby" is only partially supported on MSSQL since the Microsoft database does not provide a mechanism to fetch a subset of records not starting at 0.

Logical Operators

Queries can be combined using the binary AND operator "&":

and
or
not
1.
2.
3.
>>> rows = db((db.person.name=='Alex') & (db.person.id>3)).select()
>>>
for row in rows: print row.id, row.name
4 Alex

and the binary OR operator "|":

1.
2.
3.
>>> rows = db((db.person.name=='Alex') | (db.person.id>3)).select()
>>>
for row in rows: print row.id, row.name
1 Alex

You can negate a query (or sub-query) with the != binary operator:

1.
2.
3.
4.
>>> rows = db((db.person.name!='Alex') | (db.person.id>3)).select()
>>>
for row in rows: print row.id, row.name
2 Bob
3 Carl

or by explicit negation with the | unary operator:

1.
2.
3.
4.
>>> rows = db((db.person.name=='Alex') | (db.person.id>3)).select()
>>>
for row in rows: print row.id, row.name
2 Bob
3 Carl

Due to Python restrictions in overloading "AND" and "OR" operators, these cannot be used in forming queries. The binary operators must be used instead.

count, delete, update

You can count records in a set:

count
1.
2.
>>> print db(db.person.id > 0).count()
3

You can delete records in a set:

delete
1.
>>> db(db.person.id > 3).delete()

And you can update all records in a set by passing named arguments corresponding to the fields that need to be updated:

update
1.
>>> db(db.person.id > 3).update(name='Ken')

Expressions

The value assigned an update statement can be an expression. For example consider this model

1.
2.
3.
4.
5.
>>> db.define_table('person',
Field('name'),
Field('visits', 'integer', default=0))
>>>
db(db.person.name == 'Massimo').update(
visits = db.person.visits + 1)

The values used in queries can also be expressions

1.
2.
3.
4.
5.
>>> db.define_table('person',
Field('name'),
Field('visits', 'integer', default=0),
Field('clicks', 'integer', default=0))
>>>
db(db.person.visits == db.person.clicks + 1).delete()

update_record

update_record
web2py also allows updating a single record that is already in memory using update_record

1.
2.
3.
>>> rows = db(db.person.id > 2).select()
>>>
row = rows[0]
>>>
row.update_record(name='Curt')

This should not be confused with

1.
>>> row.update(name='Curt')

because for a single row, the method update updates the row object but not the database record, as in the case of update_record.

first and last

first
last

Given a Rows object containing records:

1.
2.
3.
>>> rows = db(query).select()
>>>
first_row = rows.first()
>>>
last_row = rows.last()

are equivalent to

1.
2.
>>> first_row = rows[0] if len(rows)>0 else None
>>> last_row = rows[-1] if len(rows)>0 else None

as_dict and as_list

as_list
as_dict

A Row object can be serialized into a regular dictionary using the as_dict() method and a Rows object can be serialized into a list of dictionaries using the as_list() method. Here are some examples:

1.
2.
3.
>>> rows = db(query).select()
>>>
rows_list = rows.as_list()
>>>
first_row_dict = rows.first().as_dict()

These methods are convenient for passing Rows to generic views and or to store Rows in sessions (since Rows objects themselves cannot be serialized since contain a reference to an open DB connection):

1.
2.
3.
>>> rows = db(query).select()
>>>
session.rows = rows # not allowed!
>>> session.rows = rows.as_list() # allowed!

find, exclude, sort

find
exclude
sort

There are times when one needs to perform two selects and one contains a subset of a previous select. In this case it is pointless to access the database again. The find, exclude and sort objects allow you to manipulate a Rows objects and generate another one without accessing the database. More specifically:

  • find returns a new set of Rows filtered by a condition and leaves the original unchanged.
  • exclude returns a new set of Rows filtered by a condition and removes them from the original Rows.
  • sort returns a new set of Rows sorted by a condition and leaves the original unchanged.

All these methods take a single argument, a function that acts on each individual row.

Here is an example of usage:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
>>> db.define_table('person',Field('name'))
>>>
db.insert(name='John')
>>>
db.insert(name='Max')
>>>
db.insert(name='Alex')
>>>
rows = db(db.person.id>0).select()
>>>
for row in rows.find(lambda row: row.name[0]=='M'): print row.name
Max
>>> print len(rows)
3
>>> for row in rows.extract(lambda row: row.name[0]=='M'): print row.name
Max
>>> print len(rows)
2
>>> for row in rows.sort(lambda row: row.name): print row.name
Alex
John

They can be combined:

1.
2.
3.
4.
5.
>>> rows = db(db.person.id>0).select()
>>>
rows = rows.find(lambda row: 'x' in row.name).sort(lambda row: row.name)
>>>
for row in rows: print row.name
Alex
Max

Computed Fields

compute

DAL fields may have a compute attribute. This must be a function (or lambda) that takes a Row object and returns a value for the field. When a new record is inserted (or updated), if a value for the field is not provided, web2py tries to compute from the other field values using the compute function. Here is an example:

1.
2.
3.
4.
5.
6.
7.
>>> db.define_table('item',
Field('unit_price','double'),
Field('quantity','integer'),
Field('total_price',compute=lambda r: r['unit_price']*r['quantity']))
>>>
r = db.item.insert(unit_price=1.99, quantity=5)
>>>
print r.total_price
9.95

Notice that the computed value is stored in the db and it is not computed on retrieval, as in the case of virtual fields, described later. Two typical applications of computed fields are:

  • in wiki applications, to store the processed input wiki text as html, to avoid re-processing on every request
  • for searching, to compute normalized values for a field, to be used for searching.

Virtual Fields

virtualfields

Virtual fields are also computed fields (as in the previous subsection) but they differ from those because they are "virtual" in the sense that they are not stored in the db and they are computed every time records are extracted from the database. they can be used to simplify the user's code without using additional storage but they do cannot be used for searching.

In order to define one or more virtual field, you have to define a container class, instantiate it and link it to a table or to a select. For example, consider the following table:

1.
2.
3.
>>> db.define_table('item',
Field('unit_price','double'),
Field('quantity','integer'),

One can define a virtual field total price as

1.
2.
3.
4.
>>> class MyVirtualFields:
def total_price(self):
return self.item.unit_price*self.item.quantity
>>> db.item.virtualfields.append(MyVirtualFields())

Notice that each method of the class that takes a single argument (self) is a new virtual fields. self refers to each one row of the select. Field values are referred by full path as in self.item.unit_price. The table is linked to the virtual fields by appending an instance of the class to the table's virtualfields attribute.

Virtual fields can also access recursive fields as in

1.
2.
3.
4.
5.
6.
7.
8.
9.
>>> db.define_table('item',
Field('unit_price','double'))
>>>
db.define_table('order_item',
Field('item',db.item),
Field('quantity','integer'))
>>>
class MyVirtualFields:
def total_price(self):
return self.order_item.item.unit_price*self.order_item.quantity
>>> db.order_item.virtualfields.append(MyVirtualFields())

Notice the recursive field access self.order_item.item.unit_price where self is the looping record.

They can also act on the result of a JOIN

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
>>> db.define_table('item',
Field('unit_price','double'))
>>>
db.define_table('order_item',
Field('item',db.item),
Field('quantity','integer'))
>>>
rows = db(db.order_item.item==db.item.id).select()
>>>
class MyVirtualFields:
def total_price(self):
return self.item.unit_price*self.order_item.quantity
>>> rows.setvirtualfields(order_item=MyVirtualFields())
>>>
for row in rows: print row.order_item.total_price

Notice how in this case the syntax is different. The virtual field accesses both self.item.unit_price and self.order_item.quantity which belong to the join select. The virtual field is attached to the rows of the table using the setvirtualfields method of the rows object. This method takes an arbitrary number of named arguments and can be used to set multiple virtual fields, defined in multiple classes, and attach them to multiple tables:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
>>> class MyVirtualFields1:
def discounted_unit_price(self):
return self.item.unit_price*0.90
>>> class MyVirtualFields2:
def total_price(self):
return self.item.unit_price*self.order_item.quantity
def discounted_total_price(self):
return self.item.discounted_unit_price*self.order_item.quantity
>>> rows.setvirtualfields(item=MyVirtualFields1(),order_item=MyVirtualFields2())
>>>
for row in rows: print row.order_item.discounted_total_price

Virtual fields can be lazy, all they need to do is return a function and accessed by calling the function:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
>>> db.define_table('item',
Field('unit_price','double'),
Field('quantity','integer'),
>>>
class MyVirtualFields:
def lazy_total_price(self):
def lazy(self=self):
return self.item.unit_price*self.item.quantity
return lazy
>>> db.item.virtualfields.append(MyVirtualFields())
>>>
for item in db(db.item.id>0).select(): print item.lazy_total_price()

or shorter using a lambda function:

1.
2.
3.
>>> class MyVirtualFields:
def lazy_total_price(self):
return lambda self=self: self.item.unit_price*self.item.quantity

One to Many Relation

one to many

To illustrate how to implement one to many relations with the web2py DAL, define another table "dog" that refers to the table "person" which we redefine here:

1.
2.
3.
4.
5.
6.
7.
>>> db.define_table('person',
Field('name'),
format='%(name)s')
>>>
db.define_table('dog',
Field('name'),
Field('owner', db.person),
format='%(name)s')

Table "dog" has two fields, the name of the dog and the owner of the dog. When a field type is another table, it is intended that the field reference the other table by its id. In fact, you can print the actual type value and get:

1.
2.
>>> print db.dog.owner.type
reference person

Now, insert three dogs, two owned by Alex and one by Bob:

1.
2.
3.
4.
5.
6.
>>> db.dog.insert(name='Skipper', owner=1)
1
>>> db.dog.insert(name='Snoopy', owner=1)
2
>>> db.dog.insert(name='Puppy', owner=2)
3

You can select as you did for any other table:

1.
2.
3.
4.
>>> for row in db(db.dog.owner==1).select():
print row.name
Skipper
Snoopy

Because a dog has a reference to a person, a person can have many dogs, so a record of table person now acquires a new attribute dog, which is a Set, that defines the dogs of that person. This allows looping over all persons and fetching their dogs easily:

referencing
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
>>> for person in db().select(db.person.ALL):
print person.name
for dog in person.dog.select():
print ' ', dog.name
Alex
Skipper
Snoopy
Bob
Puppy
Carl

Inner Joins

Another way to achieve a similar result is by using a join, specifically an INNER JOIN. web2py performs joins automatically and transparently when the query links two or more tables as in the following example:

Rows
inner join
join
1.
2.
3.
4.
5.
6.
>>> rows = db(db.person.id==db.dog.owner).select()
>>>
for row in rows:
print row.person.name, 'has', row.dog.name
Alex has Skipper
Alex has Snoopy
Bob has Puppy

Observe that web2py did a join, so the rows now contain two records, one from each table, linked together. Because the two records may have fields with conflicting names, you need to specify the table when extracting a field value from a row. This means that while before you could do:

1.
row.name

and it was obvious whether this was the name of a person or a dog, in the result of a join you have to be more explicit and say:

1.
row.person.name

or:

1.
row.dog.name

Left Outer Join

Notice that Carl did not appear in the list above because he has no dogs. If you intend to select on persons (whether they have dogs or not) and their dogs (if they have any), then you need to perform a LEFT OUTER JOIN. This is done using the argument "left" of the select command. Here is an example:

Rows
left outer join
outer join
1.
2.
3.
4.
5.
6.
7.
>>> rows=db().select(db.person.ALL, db.dog.ALL, left=db.dog.on(db.person.id==db.dog.owner))
>>>
for row in rows:
print row.person.name, 'has', row.dog.name
Alex has Skipper
Alex has Snoopy
Bob has Puppy
Carl has None

where:

1.
left = db.dog.on(...)

does the left join query. Here the argument of db.dog.on is the condition required for the join (the same used above for the inner join). In the case of a left join, it is necessary to be explicit about which fields to select.

Grouping and Counting

When doing joins, sometimes you want to group rows according to certain criteria and count them. For example, count the number of dogs owned by every person. web2py allows this as well. First, you need a count operator. Second, you want to join the person table with the dog table by owner. Third, you want to select all rows (person + dog), group them by person, and count them while grouping:

grouping
1.
2.
3.
4.
5.
>>> count = db.person.id.count()
>>>
for row in db(db.person.id==db.dog.owner).select(db.person.name, count, groupby=db.person.id):
print row.person.name, row[count]
Alex 2
Bob 1

Notice the count operator (which is built-in) is used as a field. The only issue here is in how to retrieve the information. Each row clearly contains a person and the count, but the count is not a field of a person nor is it a table. So where does it go? It goes into the storage object representing the record with a key equal to the query expression itself.

Many to Many

many-to-many
In the previous examples, we allowed a dog to have one owner but one person could have many dogs. What if Skipper was owned by Alex and Curt? This requires a many-to-many relation, and it is realized via an intermediate table that links a person to a dog via an ownership relation.

Here is how to do it:

1.
2.
3.
4.
5.
6.
7.
>>> db.define_table('person',
Field('name'))
>>>
db.define_table('dog',
Field('name'))
>>>
db.define_table('ownership',
Field('person', db.person),
Field('dog', db.dog))

the existing ownership relationship can now be rewritten as:

1.
2.
3.
>>> db.ownership.insert(person=1, dog=1) # Alex owns Skipper
>>> db.ownership.insert(person=1, dog=2) # Alex owns Snoopy
>>> db.ownership.insert(person=2, dog=3) # Bob owns Puppy

Now you can add the new relation that Curt co-owns Skipper:

1.
>>> db.ownership.insert(person=3, dog=1) # Curt owns Skipper too

Because you now have a three-way relation between tables, it may be convenient to define a new set on which to perform operations:

1.
>>> persons_and_dogs = db((db.person.id==db.ownership.person) | (db.dog.id==db.ownership.dog))

Now it is easy to select all persons and their dogs from the new Set:

1.
2.
3.
4.
5.
6.
>>> for row in persons_and_dogs.select():
print row.person.name, row.dog.name
Alex Skipper
Alex Snoopy
Bob Puppy
Curt Skipper

Similarly, you can search for all dogs owned by Alex:

1.
2.
3.
4.
>>> for row in persons_and_dogs(db.person.name=='Alex').select():
print row.dog.name
Skipper
Snoopy

and all owners of Skipper:

1.
2.
3.
4.
>>> for row in persons_and_dogs(db.dog.name=='Skipper').select():
print row.owner.name
Alex
Curt

A lighter alternative to Many 2 Many relations is a tagging. Tagging is discussed in the context of the IS_IN_DB validator. Tagging works even on database backends that does not support JOINs like the Google App Engine.

Many to Many, list:<type>, and contains

list:string
list:integer
list:reference
contains
multiple
tags

web2py provides the following special field types:

1.
2.
3.
list:string
list:integer
list:reference <table>

They can contain lists of strings, of integers and of references respectively.

On Google App Engine list:string is mapped into StringListProperty, the other two are mapped into ListProperty(int). On relational databases they all mapped into text fields which contain the list of items separated by |. For example [1,2,3] is mapped into |1|2|3|.

For lists of string the items are escaped so that any | in the item is replaced by a ||. Anyway this is an internal representation and it is transparent to the user.

You can use list:string, for example, in the following way:

1.
2.
3.
4.
5.
6.
7.
8.
>>> db.define_table('product',
Field('name'),
Field('colors','list:string'))
>>>
db.product.colors.requires=IS_IN_SET(('red','blue','green'))
>>>
db.product.insert(name='Toy Car',colors=['red','green'])
>>>
products = db(db.product.colors.contains('red')).select()
>>>
for item in products: print item.name, item.colors
Toy Car ['red', 'green']

list:integer works in the same way but the items must be integers.

As usual the requirements are enforced at the level of forms, not at the level of insert.

For list:<type> fields the contains(value) operator maps into a non trivial query that checks for lists containing the value. The contains operator also works for regular string and text fields and it maps into a LIKE '%value%'.
The list:reference and the contains(value) operator are particularly useful to de-normalize many-to-many relations. Here is an example:
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
>>> db.define_table('tag',Field('name'),format='%(name)s')
>>>
db.define_table('product',
Field('name'),
Field('tags','list:reference tag'))
>>>
a = db.tag.insert(name='red')
>>>
b = db.tag.insert(name='green')
>>>
c = db.tag.insert(name='blue')
>>>
db.product.insert(name='Toy Car',tags=[a, b, c])
>>>
products = db(db.product.tags.contains(b)).select()
>>>
for item in products: print item.name, item.tags
Toy Car [1, 2, 3]
>>>
for item in products:
print item.name, db.product.tags.represent(item.tags)
Toy Car red, green, blue

Notice that a list:reference tag field get a default constraint

1.
requires = IS_IN_DB(db,'tag.id',db.tag._format,multiple=True)

that produces a SELECT/OPTION multiple drop-box in forms.

Also notice that this field gets a default represent attribute which represents the list of references as a comma-separated list of formatted references. This is used in read forms and SQLTABLEs.

While list:reference has a default validator and a default representation, list:integer and list:string do not. So these two need a IS_IN_SET or a IS_IN_DB validator if you want to use them in forms.

Other Operators

web2py has other operators that provide an API to access equivalent SQL operators. Let's define another table "log" to store security events, their event_time and severity, where the severity is an integer number.

date
datetime
time
1.
2.
3.
>>> db.define_table('log', Field('event'),
Field('event_time', 'datetime'),
Field('severity', 'integer'))

As before, insert a few events, a "port scan", an "xss injection" and an "unauthorized login". For the sake of the example, you can log events with the same event_time but with different severities (1, 2, 3 respectively).

1.
2.
3.
4.
5.
6.
7.
8.
>>> import datetime
>>> now = datetime.datetime.now()
>>>
print db.log.insert(event='port scan', event_time=now, severity=1)
1
>>> print db.log.insert(event='xss injection', event_time=now, severity=2)
2
>>> print db.log.insert(event='unauthorized login', event_time=now, severity=3)
3

like, startswith, contains, upper, lower

like
startswith
contains
upper
lower

Fields have a like operator that you can use to match strings:

1.
2.
3.
>>> for row in db(db.log.event.like('port%')).select():
print row.event
port scan

Here "port%" indicates a string starting with "port". The percent sign character, "%", is a wild-card character that means "any sequence of characters".

web2py also provides some shortcuts:

1.
2.
db.mytable.myfield.startswith('value')
db.mytable.myfield.contains('value')

which are equivalent respectively to

1.
2.
db.mytable.myfield.like('value%')
db.mytable.myfield.like('%value%')

Notice that contains has a special meaning for list:<type> fields and it was discussed in a previous section.

Similarly, you can use upper and lower methods to convert the value of the field to upper or lower case, and you can also combine them with the like operator.

upper
lower
1.
2.
3.
>>> for row in db(db.log.event.upper().like('PORT%')).select():
print row.event
port scan

year, month, day, hour, minutes, seconds

hour
minutes
seconds
day
month
year

The date and datetime fields have day, month and year methods. The datetime and time fields have hour, minutes and seconds methods. Here is an example:

1.
2.
3.
4.
5.
>>> for row in db(db.log.event_time.year()==2009).select():
print row.event
port scan
xss injection
unauthorized login

belongs

The SQL IN operator is realized via the belongs method which returns true when the field value belongs to the specified set (list of tuples):

belongs
1.
2.
3.
4.
>>> for row in db(db.log.severity.belongs((1, 2))).select():
print row.event
port scan
xss injection

The DAL also allows a nested select as the argument of the belongs operator. The only caveat is that the nested select has to be a _select, not a select, and only one field has to be selected explicitly, the one that defines the set.

nested select
1.
2.
3.
4.
5.
6.
>>> bad_days = db(db.log.severity==3)._select(db.log.event_time)
>>>
for row in db(db.log.event_time.belongs(bad_days)).select():
print row.event
port scan
xss injection
unauthorized login

sum
Previously, you have used the count operator to count records. Similarly, you can use the sum operator to add (sum) the values of a specific field from a group of records. As in the case of count, the result of a sum is retrieved via the store object:
1.
2.
3.
>>> sum = db.log.severity.sum()
>>>
print db().select(sum).first()[sum]
6

Generating raw SQL

raw SQL

Sometimes you need to generate the SQL but not execute it. This is easy to do with web2py since every command that performs database IO has an equivalent command that does not, and simply returns the SQL that would have been executed. These commands have the same names and syntax as the functional ones, but they start with an underscore:

Here is _insert

_insert
1.
2.
>>> print db.person._insert(name='Alex')
INSERT INTO person(name) VALUES ('Alex');

Here is _count

_count
1.
2.
>>> print db(db.person.name=='Alex')._count()
SELECT count(*) FROM person WHERE person.name='Alex';

Here is _select

_select
1.
2.
>>> print db(db.person.name=='Alex')._select()
SELECT person.id, person.name FROM person WHERE person.name='Alex';

Here is _delete

_delete
1.
2.
>>> print db(db.person.name=='Alex')._delete()
DELETE FROM person WHERE person.name='Alex';

And finally, here is _update

_update
1.
2.
>>> print db(db.person.name=='Alex')._update()
UPDATE person SET WHERE person.name='Alex';

Moreover you can always use db._lastsql to return the most recent SQL code, whether it was executed manually using executesql or was SQL generated by the DAL.

Exporting and Importing Data

export
import

CSV (one table at a time)

When a DALRows object is converted to a string it is automatically serialized in CSV:

csv
1.
2.
3.
4.
5.
6.
>>> rows = db(db.person.id==db.dog.owner).select()
>>>
print rows
person.id,person.name,dog.id,dog.name,dog.owner
1,Alex,1,Skipper,1
1
,Alex,2,Snoopy,1
2
,Bob,3,Puppy,2

You can serialize a single table in CSV and store it in a file "test.csv":

1.
>>> open('test.csv', 'w').write(str(db(db.person.id).select()))

and you can easily read it back with:

1.
>>> db.person.import_from_csv_file(open('test.csv', 'r'))

When importing, web2py looks for the field names in the CSV header. In this example, it finds two columns: "person.id" and "person.name". It ignores the "person." prefix, and it ignores the "id" fields. Then all records are appended and assigned new ids. Both of these operations can be performed via the appadmin web interface.

CSV (all tables at once)

In web2py, you can backup/restore an entire database with two commands:

To export:

1.
>>> db.export_to_csv_file(open('somefile.csv', 'wb'))

To import:

1.
>>> db.import_from_csv_file(open('somefile.csv', 'rb'))

This mechanism can be used even if the importing database is of a different type than the exporting database. The data is stored in "somefile.csv" as a CSV file where each table starts with one line that indicates the tablename, and another line with the fieldnames:

1.
2.
TABLE tablename
field1, field2, field3, ...

Two tables are separated \r\n\r\n. The file ends with the line

1.
END

The file does not include uploaded files if these are not stored in the database. In any case it is easy enough to zip the "uploads" folder separately.

When importing, the new records will be appended to the database if it is not empty. In general the new imported records will not have the same record id as the original (saved) records but web2py will restore references so they are not broken, even if the id values may change.

If a table contains a field called "uuid", this field will be used to identify duplicates. Also, if an imported record has the same "uuid" as an existing record, the previous record will be updated.

CSV and Remote Database Synchronization

Consider the following model:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
db = DAL('sqlite:memory:')
db.define_table('person',
Field('name'),
format='%(name)s')
db.define_table('dog',
Field('owner', db.person),
Field('name'),
format='%(name)s')

if not db(db.person.id>0).count():
id = db.person.insert(name="Massimo")
db.dog.insert(owner=id, name="Snoopy")

Each record is identified by an ID and referenced by that ID. If you have two copies of the database used by distinct web2py installations, the ID is unique only within each database and not across the databases. This is a problem when merging records from different databases.

In order to make a record uniquely identifiable across databases, they must:

  • have a unique id (UUID),
  • have a event_time (to figure out which one is more recent if multiple copies),
  • reference the UUID instead of the id.

This can be achieved without modifying web2py. Here is what to do:

1. Change the above model into:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
db.define_table('person',
Field('uuid', length=64, default=uuid.uuid4()),
Field('modified_on', 'datetime', default=now),
Field('name'),
format='%(name)s')

db.define_table('dog',
Field('uuid', length=64, default=uuid.uuid4()),
Field('modified_on', 'datetime', default=now),
Field('owner', length=64),
Field('name'),
format='%(name)s')

db.dog.owner.requires = IS_IN_DB(db,'person.uuid','%(name)s')

if not db(db.person.id).count():
id = uuid.uuid4()
db.person.insert(name="Massimo", uuid=id)
db.dog.insert(owner=id, name="Snoopy")

2. Create a controller action to export the database:

1.
2.
3.
4.
5.
def export():
s = StringIO.StringIO()
db.export_to_csv_file(s)
response.headers['Content-Type'] = 'text/csv'
return s.getvalue()

3. Create a controller action to import a saved copy of the other database and sync records:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
def import_and_sync():
form = FORM(INPUT(_type='file', _name='data'), INPUT(_type='submit'))
if form.accepts(request.vars):
db.import_from_csv_file(form.vars.data.file,unique=False)
# for every table
for table in db.tables:
# for every uuid, delete all but the latest
items = db(db[table].id>0).select(db[table].id,
db[table].uuid,
orderby=db[table].modified_on,
groupby=db[table].uuid)
for item in items:
db((db[table].uuid==item.uuid)&\
(db[table].id!=item.id)).delete()
return dict(form=form)

4. Create an index manually to make the search by uuid faster.

Notice that steps 2 and 3 work for every database model; they are not specific for this example.

Alternatively, you can use XML-RPC to export/import the file.

If the records reference uploaded files, you also need to export/import the content of the uploads folder. Notice that files therein are already labeled by UUIDs so you do not need to worry about naming conflicts and references.

HTML/XML (one table at a time)

DALRows objects also have an xml method (like helpers) that serializes it to XML/HTML:

HTML
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
>>> rows = db(db.person.id > 0).select()
>>>
print rows.xml()
<
table>
<
thead>
<
tr>
<
th>person.id</th>
<
th>person.name</th>
<
th>dog.id</th>
<
th>dog.name</th>
<
th>dog.owner</th>
</
tr>
</
thead>
<
tbody>
<
tr class="even">
<
td>1</td>
<
td>Alex</td>
<
td>1</td>
<
td>Skipper</td>
<
td>1</td>
</
tr>
...
</
tbody>
</
table>

If you need to serialize the DALRows in any other XML format with custom tags, you can easily do that using the universal TAG helper and the * notation:

XML

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
>>> rows = db(db.person.id > 0).select()
>>>
print TAG.result(*[TAG.row(*[TAG.field(r[f], _name=f) \
for f in db.person.fields]) for r in rows])
<
result>
<
row>
<
field name="id">1</field>
<
field name="name">Alex</field>
</
row>
...
</
result>

Data Representation

The export_to_csv_file function accepts a keyword argument named represent. When True it will use the columns represent function while exporting the data instead of the raw data.

The function also accepts a keyword argument named colnames that should contain a list of column names one wish to export. It defaults to all columns.

Both export_to_csv_file and import_from_csv_file accept keyword arguments that tell the csv parser the format to save/load the files:

  • delimiter: delimiter to separate values (default ',')
  • quotechar: character to use to quote string values (default to double quotes)
  • quoting: quote system (default csv.QUOTE_MINIMAL)

Here is some example usage:

1.
2.
3.
4.
5.
6.
>>> import csv
>>> db.export_to_csv_file(open('/tmp/test.txt', 'w'),
delimiter='|',
quotechar='"',
quoting=csv.QOUTE_NONNUMERIC)
quoting=csv.QUOTE_NONNUMERIC)

Which would render something similar to

1.
"hello"|35|"this is the text description"|"2009-03-03"

For more information consult the official Python documentation 64

Caching Selects

The select method also takes a cache argument, which defaults to None. For caching purposes, it should be set to a tuple where the first element is the cache model (cache.ram, cache.disk, etc.), and the second element is the expiration time in seconds.

In the following example, you see a controller that caches a select on the previously defined db.log table. The actual select fetches data from the back-end database no more frequently than once every 60 seconds and stores the result in cache.ram. If the next call to this controller occurs in less than 60 seconds since the last database IO, it simply fetches the previous data from cache.ram.

cache select
1.
2.
3.
def cache_db_select():
logs = db().select(db.log.ALL, cache=(cache.ram, 60))
return dict(logs=logs)

The results of a select are complex, un-pickleable objects; they cannot be stored in a session and cannot be cached in any other way than the one explained here.

Self-Reference and Aliases

It is possible to define tables with fields that refer to themselves although the usual notation may fail. The following code would be wrong because it uses a variable db.person before it is defined:

1.
2.
3.
4.
db.define_table('person',
Field('name'),
Field('father_id', db.person),
Field('mother_id', db.person))

The solution consists of using an alternate notation

1.
2.
3.
4.
db.define_table('person',
Field('name'),
Field('father_id', 'reference person'),
Field('mother_id', 'reference person'))

In fact db.tablename and "reference tablename" are equivalent field types.

If the table refers to itself, then it is not possible to perform a JOIN to select a person and its parents without use of the SQL "AS" keyword. This is achieved in web2py using the with_alias. Here is an example:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
>>> Father = db.person.with_alias('father')
>>>
Mother = db.person.with_alias('mother')
>>>
db.person.insert(name='Massimo')
1
>>> db.person.insert(name='Claudia')
2
>>> db.person.insert(name='Marco', father_id=1, mother_id=2)
3
>>> rows = db().select(db.person.name, Father.name, Mother.name,
left=(Father.on(Father.id==db.person.father_id),
Mother.on(Mother.id==db.person.mother_id)))
>>>
for row in rows:
print row.person.name, row.father.name, row.mother.name
Massimo None None
Claudia None None
Marco Massimo Claudia

Notice that we have chosen to make a distinction between:

  • "father_id": the field name used in the table "person";
  • "father": the alias we want to use for the table referenced by the above field; this is communicated to the database;
  • "Father": the variable used by web2py to refer to that alias.

The difference is subtle, and there is nothing wrong in using the same name for the three of them:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
db.define_table('person',
Field('name'),
Field('father', 'reference person'),
Field('mother', 'reference person'))
>>>
father = db.person.with_alias('father')
>>>
mother = db.person.with_alias('mother')
>>>
db.person.insert(name='Massimo')
1
>>> db.person.insert(name='Claudia')
2
>>> db.person.insert(name='Marco', father=1, mother=2)
3
>>> rows = db().select(db.person.name, father.name, mother.name,
left=(father.on(father.id==db.person.father),
mother.on(mother.id==db.person.mother)))
>>>
for row in rows:
print row.person.name, row.father.name, row.mother.name
Massimo None None
Claudia None None
Marco Massimo Claudia

But it is important to have the distinction clear in order to build correct queries.

Table Inheritance

inheritance

It is possible to create a table that contains all the fields from another table. It is sufficient to pass the other table in place of a field to define_table. For example

1.
2.
db.define_table('person', Field('name'))
db.define_table('doctor', db.person, Field('specialization'))

It is also possible to define a dummy table that is not stored in a database in order to reuse it in multiple other places. For example:

1.
2.
3.
4.
5.
6.
7.
signature = db.Table(db, 'signature',
Field('created_on', 'datetime', default=request.now),
Field('created_by', db.auth_user, default=auth.user_id),
Field('updated_on', 'datetime', default=request.now),
Field('updated_by', db.auth_user, update=auth.user_id))

db.define_table('payment', signature, Field('amount', 'double'))

This example assumes that standard web2py authentication is enabled.

© 2008-2010 by Massimo Di Pierro - All rights reserved - Powered by web2py - design derived from a theme by the earlybird
The content of this book is released under the Artistic License 2.0 - Modified content cannot be reproduced.