Monday, December 17, 2007

Indexing in SQLSERVER

Indexes in Sql Server

A database index is similar to an index in a book – it is comprised of a lookup value, and a number identifier that corresponds to the row number in a table. In SQL Server, there are two kinds of indexes – clustered and non-clustered. Clustered Indexes require that the data in the table is physically sorted in the order of the index. Because the data in a table can be physically sorted only one way, there can be at most only one clustered index per table. Non clustered index do not require that data be physically sorted, so there can be more that one non-clustered index per table. In fact SQL Server allows up to 249 non-clustered indexes per table. Because data is not physically sorted, range searches using a non clustered index are not very efficient.

The command for creating an index in T-SQL is

CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name ON { table | view } ( column [ ASC | DESC ] [ ,...n ] ) [ WITH <> [ ,...n] ] [ ON filegroup ] <> :: = { PAD_INDEX | FILLFACTOR = fillfactor | IGNORE_DUP_KEY | DROP_EXISTING | STATISTICS_NORECOMPUTE | SORT_IN_TEMPDB }

Relational databases like SQL Server use indexes to find data quickly when a query is processed. Creating and removing indexes from a database schema will rarely result in changes to an application's code; indexes operate 'behind the scenes' in support of the database engine. However, creating the proper index can drastically increase the performance of an application.

We can create the same index using the following SQL. The command specifies the name of the index (IDX_UnitPrice), the table name (Products), and the column to index (UnitPrice).

 CREATE INDEX [IDX_UnitPrice] ON Products (UnitPrice)

To verify that the index is created, use the following stored procedure to see a list of all indexes on the Products table:

 EXEC sp_helpindex Customers 

How It Works

The database takes the columns specified in a CREATE INDEX command and sorts the values into a special data structure known as a B-tree. A B-tree structure supports fast searches with a minimum amount of disk reads, allowing the database engine to quickly find the starting and stopping points for the query we are using.

Conceptually, we may think of an index as shown in the diagram below. On the left, each index entry contains the index key (UnitPrice). Each entry also includes a reference (which points) to the table rows which share that particular value and from which we can retrieve the required information.

Much like the index in the back of a book helps us to find keywords quickly, so the database is able to quickly narrow the number of records it must examine to a minimum by using the sorted list of UnitPrice values stored in the index. We have avoided a table scan to fetch the query results. Given this sketch of how indexes work, lets examine some of the scenarios where indexes offer a benefit.

Taking Advantage of Indexes

The database engine can use indexes to boost performance in a number of different queries. Sometimes these performance improvements are dramatic. An important feature of SQL Server 2000 is a component known as the query optimizer. The query optimizer's job is to find the fastest and least resource intensive means of executing incoming queries. An important part of this job is selecting the best index or indexes to perform the task. In the following sections we will examine the types of queries with the best chance of benefiting from an index.

Searching For Records

The most obvious use for an index is in finding a record or set of records matching a WHERE clause. Indexes can aid queries looking for values inside of a range (as we demonstrated earlier), as well as queries looking for a specific value. By way of example, the following queries can all benefit from an index on UnitPrice:

DELETE FROM Products WHERE UnitPrice = 1 
UPDATE Products SET Discontinued = 1 WHERE UnitPrice > 15 

Indexes work just as well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements.

Sorting Records

When we ask for a sorted dataset, the database will try to find an index and avoid sorting the results during execution of the query. We control sorting of a dataset by specifying a field, or fields, in an ORDER BY clause, with the sort order as ASC (ascending) or DESC (descending). For example, the following query returns all products sorted by price:

 SELECT * FROM Products ORDER BY UnitPrice ASC 

With no index, the database will scan the Products table and sort the rows to process the query. However, the index we created on UnitPrice (IDX_UnitPrice) earlier provides the database with a presorted list of prices. The database can simply scan the index from the first entry to the last entry and retrieve the rows in sorted order.

The same index works equally well with the following query, simply by scanning the index in reverse.


Grouping Records

We can use a GROUP BY clause to group records and aggregate values, for example, counting the number of orders placed by a customer. To process a query with a GROUP BY clause, the database will often sort the results on the columns included in the GROUP BY. The following query counts the number of products at each price by grouping together records with the same UnitPrice value.

SELECT Count(*), UnitPrice FROM Products GROUP BY UnitPrice 

The database can use the IDX_UnitPrice index to retrieve the prices in order. Since matching prices appear in consecutive index entries, the database is able count the number of products at each price quickly. Indexing a field used in a GROUP BY clause can often speed up a query.

Maintaining a Unique Column

Columns requiring unique values (such as primary key columns) must have a unique index applied. There are several methods available to create a unique index. Marking a column as a primary key will automatically create a unique index on the column. We can also create a unique index by checking the Create UNIQUE checkbox in the dialog shown earlier. The screen shot of the dialog displayed the index used to enforce the primary key of the Products table. In this case, the Create UNIQUE checkbox is disabled, since an index to enforce a primary key must be a unique index. However, creating new indexes not used to enforce primary keys will allow us to select the Create UNIQUE checkbox. We can also create a unique index using SQL with the following command:

CREATE UNIQUE INDEX IDX_ProductName On Products (ProductName)

The above SQL command will not allow any duplicate values in the ProductName column, and an index is the best tool for the database to use to enforce this rule. Each time an application adds or modifies a row in the table, the database needs to search all existing records to ensure none of values in the new data duplicate existing values. Indexes, as we should know by now, will improve this search time.

Index Drawbacks

There are tradeoffs to almost any feature in computer programming, and indexes are no exception. While indexes provide a substantial performance benefit to searches, there is also a downside to indexing. Let's talk about some of those drawbacks now.

Indexes and Disk Space

Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users. To see the space required for a table, use the sp_spaceused system stored procedure in a query window.

 EXEC sp_spaceused Orders 

Given a table name (Orders), the procedure will return the amount of space used by the data and all indexes associated with the table, like so:

Name    rows     reserved    data    index_size  unused       
------- -------- ----------- ------  ----------  -------
Orders  830      504 KB      160 KB  320 KB      24 KB

According to the output above, the table data uses 160 kilobytes, while the table indexes use twice as much, or 320 kilobytes. The ratio of index size to table size can vary greatly, depending on the columns, data types, and number of indexes on a table.

Indexes and Data Modification

Another downside to using an index is the performance implication on data modification statements. Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. As we discussed earlier, indexing can help the database during data modification statements by allowing the database to quickly locate the records to modify, however, we now caveat the discussion with the understanding that providing too many indexes to update can actually hurt the performance of data modifications. This leads to a delicate balancing act when tuning the database for performance.

In decision support systems and data warehouses, where information is stored for reporting purposes, data remains relatively static and report generating queries outnumber data modification queries. In these types of environments, heavy indexing is commonplace in order to optimize the reports generated. In contrast, a database used for transaction processing will see many records added and updated. These types of databases will use fewer indexes to allow for higher throughput on inserts and updates.

Every application is unique, and finding the best indexes to use for a specific application usually requires some help from the optimization tools offered by many database vendors. SQL Server 2000 and Access include the Profiler and Index Tuning Wizard tools to help tweak performance.

Now we have enough information to understand why indexes are useful and where indexes are best applied. It is time now to look at the different options available when creating an index and then address some common rules of thumb to use when planning the indexes for your database.

  • Clustered indexes define the physical sorting of a database table’s rows in the storage media. For this reason, each database table may have only one clustered index. If a PRIMARY KEY constraint is created for a database table and no clustered index currently exists for that table, SQL Server automatically creates a clustered index on the primary key.
  • Non-clustered indexes are created outside of the database table and contain a sorted list of references to the table itself. SQL Server 2000 supports a maximum of 249 non-clustered indexes per table. However, it’s important to keep in mind that non-clustered indexes slow down the data modification and insertion process, so indexes should be kept to a minimum

One of the hardest tasks facing database administrators is the selection of appropriate columns for non-clustered indexes. You should consider creating non-clustered indexes on any columns that are frequently referenced in the WHERE clauses of SQL statements. Other good candidates are columns referenced by JOIN and GROUP BY operations.

You may wish to also consider creating non-clustered indexes that cover all of the columns used by certain frequently issued queries. These queries are referred to as “covered queries” and experience excellent performance gains.

SQL Server provides a wonderful facility known as the Index Tuning Wizard which greatly enhances the index selection process. To use this tool, first use SQL Profiler to capture a trace of the activity for which you wish to optimize performance. You may wish to run the trace for an extended period of time to capture a wide range of activity. Then, using Enterprise Manager, start the Index Tuning Wizard and instruct it to recommend indexes based upon the captured trace. It will not only suggest appropriate columns for queries but also provide you with an estimate of the performance increase you’ll experience after making those changes!