Database Indexing Strategies for FarCry

Overview

This document is a guide for those wishing to index the tables used within the Farcry Content Management System, and will also serve as a basic primer in database indexing.

The purpose of indexing

Tables are indexed in a database for the same reason as there are indexes in the backs of books: to facilitate the fast lookup of the contents of the table. For this reason, they are most effective on tables with large numbers of rows. There is little point indexing a table with less than 100 rows.

Farcry applications currently support 3 different database platforms (MSSQL, Oracle, and MySQL). Optimal indexing may differ between these platforms, so I will address one at a time. My core competence is with MSSQL, so I will deal with that platform first, and then discuss the ways in which that platform differs from the others.

MSSQL Server

There are 2 types of indexes in MSSQL Server: Clustered and Non-clustered. We will look at the non-clustered variety first, because they cover most cases, and are simpler to understand.

Non-Clustered Indexes

Because of the normalised nature of the tables in most read/write databases (and the ones in Farcry), a lot of the queries used to get data out of them use joins to do so.

For example,

select * from refobjects r
inner join dmHTML h
on r.objectID = h.objectID

or

select * from refobjects r, dmHTML h
where r.objectID = h.objectID

Both queries do the same thing: the syntax is different, and MSSQL favours the first version for performance reasons, but that is another story.

The above queries have to join the 2 tables on all the rows where an objectID in refObjects matches an objectID in dmHTML. There might be 2000 rows in one table and 500 in the other, so without an index to aid it, the database has to go through both sets of objectids, and match them up, which as you might imagine is a fairly intensive process. If however, you indexed the joining columns (the 2 objectID cols), it would make it much easier on the db, and faster at the front end.

In the same way, when you have a WHERE clause that follows the format WHERE <columnname> = SOMETHING, an index can be mighty handy. Consider the following query:

select * from refobjects r
where typename = 'dmArchive'

To optimise the performance of the query, you would index the typename column. In both the above examples, the type of index you would use is non-clustered.

Non-clustered indexes are best for joining columns and where clauses that use = as the operator. Non-clustered indexes are pointers to where in the table the particular item can be found, just as the index in the back of a book will tell you the page on which a word or subject may be found.

Non-clustered indexes are most effective when used on columns with a reasonable degree of uniqueness of content. By this I mean that if you have a column where every row is unique (e.g. an objectID which is of the type UUID), then an index of this type will be very effective.

However, if your column has a low degree of uniqueness, then it won't help much (e.g. you have a column called type, and type can be 1,2,3 or 4, and there are 2000 rows in your table).

If you were to go through your database and create a non-clustered index on all such columns for all tables that have large numbers of rows in them, most of your indexing work would be done.

Clustered Indexes

There is another type of index, which is the clustered index. This type is used in ORDER BY clauses, and BETWEEN clauses and WHERE clauses where the operators are < or >.

If we go back to the book analogy, this type is more your table of contents: it deals with the order of things, ranges and the like. If you apply a clustered index to a table, it stores the data in that order. So for instance, consider the following table definition:

CREATE TABLE DMSEMINARS (
startDate datetime NULL ,
Cost varchar (255) NULL ,
Location varchar (255) NULL ,
Teaser ntext NULL ,
status varchar (255) NOT NULL ,
Body ntext NULL ,
Title varchar (255) NULL ,
endDate datetime NULL ,
closingDate datetime NULL ,
Participants varchar (255) NULL ,
Venue varchar (255) NULL ,
createdby nvarchar (512) NOT NULL ,
locked int NOT NULL ,
datetimelastupdated datetime NOT NULL ,
label nvarchar (512) NULL ,
datetimecreated datetime NOT NULL ,
ObjectID varchar (50) NOT NULL ,
lastupdatedby nvarchar (512) NOT NULL ,
lockedBy nvarchar (512) NULL ,
seminarEmail varchar (255) NULL ,
emailResponse ntext NULL ,
expiryDate datetime NULL ,
publishDate datetime NULL
)

Let's also say that the following is a typical query of this table:

Select * from dmSeminars
Order by startDate

Because the startDate column is often used to order the result, if you apply a clustered index to it, it will take away the need for the db engine to order the result, as the data is already ordered that way.

By the same token, if your typical query was

Select * from dmSeminars
Where startDate between '2001-01-01' and '2003-01-01'

The clustered index would once again be useful.

Only one clustered index can be applied to a table, as it physically reorders the
data. On the other hand, you can create as many non-clustered indexes as you like on a table, as these are pointers.

After having read all this, you may think it is a good idea to race out and create as many indexes as possible on every table in the db. Not so. For as useful as indexes are when the db engine is reading from the tables in the db, they get in the way when writing is going on, because each new row in an indexed table not only needs to be written to the table, but also written to the index(es) attached to the table.

So you need to use them sparingly if your db is being written to (this probably means you), and only after reflection.

To recap:

  • if the table has less than 100 rows, leave it alone
  • only index if the table is regularly being used in queries. Work from the tables with most use and most rows down to those with least use and least rows
  • if the table is regularly being joined to other tables, non-clustered index on the joining cols
  • if the columns are being regularly used in where clauses, and are reasonably
    unique, non-clustered index those cols
  • if the col is being regularly used in order by clauses, clustered index the col
  • if the col is being regularly used in between or < or > clauses, clustered index the col
  • you can have one clustered index per table, many non-clustered indexes per table
  • on a read-write db, don't over-index. Keep the number below 6 if possible.

Oracle

All indexes created in Oracle are equivalent to non clustered indexes in MSSQL Server. The general rules of indexing apply, and more info can be found Oracle Tuning FAQ.

MySQL

MySQL has very fast retrieval of rows, so in terms of performance, you are already ahead of the game if you use MySQL. The general rules of indexing apply, and more info can be found on the mySQL site under Indexes and Table and Indexes.

Implementation

The following are indexing recommendations specific to Farcry. Depending on the
version of your installation, some of them may already be in place.

Due to the difference in content requirements from installation to installation,
the numbers of rows will differ, and therefore the requirement to index will change, so you will have to remove from the list any table with less than 100 rows in your
installation. Also, due to the existence of userdefined tables in Farcry, it is not possible to give exact indexing instructions. The recommendations will cover the basic idea; specific installations may require more analysis (using the guidelines as explained above).

TableName ColumnName Type of Index
Categories CategoryID Non-Clustered
Container objectID Non-Clustered
Container label Non-Clustered
*_aObjectIDs, *_aRules,
*_aRelatedIDs
objectID Non-Clustered
*_aObjectIDs, *_aRules,
*_aRelatedIDs
Data Non-Clustered
Dm* (type table) objectID Non-Clustered
Dm* (type table) One of the date columns Clustered
Dm* (type table) Title (if defined) Non-Clustered
FqAudit objectID Non-Clustered
Nested_Tree_Obejcts objectID Non-Clustered
Nested_Tree_Obejcts nLeft Clustered
Nested_Tree_Objects nRight Clustered
Nested_Tree_Obejcts parentID Non-Clustered
RefCategories objectID Non-Clustered
RefCategories CategoryID Non-Clustered
RefContainers objectID Non-Clustered
RefContainers ContainerID Non-Clustered
RefObjects ObjectID Non-Clustered
RefObjects Typename Non-Clustered
Rule* objectID Non-Clustered
Rule* label Non-Clustered
Stats LogDateTime Clustered
Stats PageID Non-Clustered
Stats navID Non-Clustered