Issues
There are a number of issues to discuss before
actually creating the code for the stored
procedures. It is difficult to discuss them in a
vacuum so to illustrate the issues involved in
making the CRUD procedures let's
use the following theoretical product table:
setuser
GO
EXEC sp_addtype N'AppUSERID', N'varchar (20)', N'not null'
GO
CREATE TABLE Product (
ProductID int NOT NULL identity (1,1) PRIMARY KEY
, ProductName varchar(100) DEFAULT 'Product unnamed'
, [Description] varchar(2000) NULL
, UnitCost money NOT NULL-- Cost of making it
, UnitsInStock int NOT NULL -- units in the inventory
, InventoryCost as UnitCost * UnitsInStock -- Computed column
, CreatedDT datetime NOT NULL DEFAULT getdate()
, CreatedByUSERID AppUSERID DEFAULT user_name()
, LastUdDT datetime NULL
, LastUpdByUSERID AppUSERID NULL
, RowVrsn rowversion NOT NULL
)
I have kept the Product table deliberately short;
each column is there to illustrate a particular
point. You will find a similar table in the
Northwind sample database and in many other
databases that store product information.
Primary Keys
For the CRUD
operations to work, every table must have a primary
key or at least one unique index. Of course, most
tables have primary keys anyway but sometimes there
are a few tables that do not really need a primary
key. For example, some tables used for historical
reporting do not have any unique combinations of
fields because they record non-unique situations. In
thesecases, an identity column must be added
to the table to serve as the primary key. Without it
our stored procedures will not work.
Identity Columns
Many tables use identity columns as the primary
key. Whether you prefer the use of natural keys or
create synthetic keys, such as identity columns, for
every table, they are necessary in some
circumstances, such as the one mentioned above and
our Create procedure should return
it to the caller.
If the table has an identity column, the
Create stored procedure is responsible for
returning it to the application program. SQL Server
2000 makes this easier than it was in previous
versions by adding the SCOPE_IDENTITY() function. We
will use it whenever a table has an identity column.
Concurrency Control and Timestamps
When a row of data is read by an application and
presented to the user for possible modification,
most applications do not hold a lock on the row.
That is good because if they did hold locks while
the user contemplated making changes, many lockout
situations would occur. If fact, unless the
application used the Read Uncommitted isolation
level, other users could not even look at the data
until the lock is released.
But if there are no locks held, how does one
prevent incorrect successive updates to the same
row? Let's say that a shipping application is
updating the UnitsInStock column for the Products
table. What we want to prevent is this sequence of
updates:
- User A reads row X with UnitsInStock of 6
- User B reads row X with UnitsInStock of 5
- User B updates row X changing UnitsInStock
to 1
- User A updates row X changing UnitsInStock
to 2
The problem with allowing A's attempted update to
succeed is that anything that B changed is ignored.
B reduced the inventory level by 5 to 1. A's change
to UnitsInStock column should not be allowed in this
situation. Not only is A's change incorrect but I
probably should not have been allowed at all because
the total units shipped would be greater than the
inventory.
The answer to preventing this problem lies in the
WHERE clause of the UPDATE statement. As we have
seen, the WHERE clause is used to identify the
primary key to the row. It can also be used to
identify the data that the client application thinks
that it is updating. If the row has been changed
since the client application read the row, it should
not be allowed to apply updates. This is
accomplished in one of two ways:
- A clause specifying the original value of
every column is added to the WHERE clause
- A check on the timestamp column is added to
the WHERE clause.
Many systems, including ADO, will write the check
for the original value of every column. That
technique works in every database management system,
not just SQL Server. SQL Server offers the timestamp
data type, which can be used to track when a row
changes. Every time a row is updated, a timestamp (a.k.a
rowversion) is updated to a new unique value. This
simplifies writing the stored procedures because we
only have to pass in the timestamp, not the original
value of every column.
timestamp and rowversion
The SQL Server Books on-line discusses
Microsoft's intent to change the timestamp data
type to be in line with the SQL-92 standard,
which calls for the timestamp to contain a date
and time like the current SQL Server datetime
data type. To accommodate this future change the
rowversion data type has been added as a synonym
to timestamp. It will be there when timestamp is
changed. For that reason, I have used rowversion
in the table and procedure definitions in this
article.
Computed Columns
SQL Server allows columns to be defined as a
computation on other columns in the same table.
Using a User-Defined function, the computation can
even extend to accessing data in other tables. In
the product table, the computed column is
InventoryValue, which is defined with the line:
, InventoryCost as UnitCost * UnitsInStock -- Computed column
The Create, Read,
and Update stored procedures must
return the value of computed columns in the table.
They will be OUTPUT parameters in each of the
CRU procedures. The Delete
procedure can ignore them.
Special Fields
For tables that might be edited by a user I add
four fields to the server as a sort of audit trail.
They capture the datetime and user when the row was
inserted and when it was last updated. The Product
table has them defined this way:
, CreatedDT datetime NOT NULL DEFAULT getdate()
, CreatedByUSERID AppUSERID NOT NULL DEFAULT user_name()
, LastUdDT datetime NULL
, LastUpdByUSERID AppUSERID NULL
I have supplied defaults for the CreatedDT and
CreatedByUSERID columns. The user_name() default
returns the current user. If you're using an
application role, the user_name() function doesn't
help because it returns the name of the application
role. In this and similar situations the application
should supply an identity, not the database. Also,
if your users are distributed across more than one
time zone, using the database time is a good idea.
That prevents some confusion about the order in
which rows were created or modified.
Conclusion
This article has covered the rational and
implementation issues surrounding the implementation
of the CRUD operations in SQL
Server. I hope that you're convinced that using
stored procedures instead of ad hoc SQL statements
is a good idea and that you've had a chance to think
about some of the issues that will come up when we
write the procedures.
Next month, I will go into detail about how to
write each of the procedures, taking into
consideration SQL Server features such as computed
columns, timestamps and identity columns.
Writing those stored procedures can quickly add
up to quite a task. However, because every such
procedure in nearly identical, they are easy to
generate with a program. That has been my practice
for the past several years. There are many programs,
both free and commercial, that will generate the
procedure for you. Next month's article describes
some of them and discusses what is involved in
generating the procedures so you do not have to do
it by hand. |