| Business Application Development with: SQL Server, C#, VB, VB.Net, ASP, ASP.Net, and XML |
| N | S |
Novick Software Management • Design • Programming • Training • Consulting |
|
|
News Links Schedule Site Map Contact |
|
Using Inline UDFs for Precise PagingAndrew Novick, January, 2003 In a typical data-driven Web site, many pages include tables of data that have been retrieved from a database. How much data should Web server scripts retrieve from the database when showing a table that might be a hundred or more lines long and span several pages? My answer is: as little as possible. To support that strategy, I've found SQL Server 2000's Inline User-Defined Functions are a great tool for managing paging. I've been using Inline User-Defined Functions (UDFs) as an alternate to SQL Views. Inline UDFs really are Views, but with the addition of parameters. By combining parameters with the SQL TOP clause, the minimal number of rows can be retrieved for each page. Before going into Inline UDFs and paging, I want to lay some groundwork for different types of UDFs and how they can be used. After that we'll continue discussing Inline UDFs and show how to write them to support paging operations. What are User-Defined Functions?While other relational database management systems such as Oracle and Access have provided ways to create functions in their scripting languages, SQL Server never has. I first encountered this limitation in 1996 when I wrote my first system based on Sybase System 11, back when Microsoft and Sybase were still sharing SQL Server. Having used Oracle for some time, I had planned on using functions to perform unit system conversions to and from metric units and U.S. standard units. Boy was I surprised when I found no way to add a function to SQL Server. Of course, I overcame the problem, but I've always hoped for a way to create functions in Transact-SQL. SQL Server 2000 introduces three forms of user-defined functions, and they can each be a great addition to your SQL repertoire. The types are:
The following sections describe each type of UDF and show how to use them. Scalar UDFsScalar UDFs are very similar to functions in other procedural languages. They take one or more parameters and return a single value. Along the way they can execute multiple T-SQL statements that could involve anything from very simple computations to very complex logic. Here's an example of a very simple scalar UDF:
As you can see, the function computes the area by multiplying @Length by @Width and returning the product as the result. Although the database owner has permission to execute the function without any additional GRANTs, you must GRANT EXECUTE permission to a user or group before they are allowed to use the function. In the script above, EXECUTE permission is given to PUBLIC. Here's how you might use a scalar UDF in a SELECT list:
Notice that the function name of a Scalar UDF must be qualified with the owner name. This is mandatory for Scalar UDFs but doesn't effect other types of UDFs. Of course, the parameters to the function can also be column names. This script creates the Rectangle table, populates it, and then computes the area of the rectangles:
In addition to T-SQL logic and calculations, a scalar UDF can read data using a SELECT statement, but all UDFs are prohibited from SQL statements that perform INSERTs, UPDATEs, or DELETEs. There are other limitations on UDFs, most of them involve limitations that prevent functions from changing the database. The next example uses the Northwind sample database that comes with SQL Server. The function illustrates the ability to retrieve information using a SELECT statement to compute the results of a scalar UDF:
The logic of ufd_EmpTerritoryCOUNT is pretty simple, retrieve the count of territories for just the EmployeeID given by the @EmployeeID parameter and return it. Now, let's use the function in several places within the same query to illustrate how a scalar UDF can be used:
As you can see, udf_EmpTerritoryCOUNT is used in the SELECT list, the WHERE clause, and the ORDER BY clause. This is far from efficient, but it shows most of the places where a scalar UDF can work. By the way, the TOP 3 clause is applied here just to limit the size of the output for the purpose of this article. The TOP clause is even more useful later in Inline UDFs. The two scalar UDFs in this section could have been eliminated. It's possible to rewrite the SELECT statements shown above without using the functions. What the scalar function gives the developer is a way to simplify the coding process and to create code that can be easily reused. Scalar UDFs don't have to be so simple. They can involve much more complex logic using looping , cursors, and other conditional logic. The SQL Server Books-Online has at least one more example, and you'll find additional samples in the directory \Program Files\Microsoft SQL Server\80\Tools\DevTools\Samples\UDF. In addition, my Web site has the archives of the free T-SQL UDF of the Week newsletter at http://www.NovickSoftware.com/UdfOfWeek/UDFofWeek.htm. There you'll find several more examples of UDFs with a discussion of why and how to use each one. Now, let's move onto Inline UDFs. Although they're called functions, they're very different from scalar UDFs. Inline User-Defined FunctionsAn Inline User-Defined Function takes parameters like a scalar UDF. That's about where the similarities end. The function body of an Inline UDF consists of one and only one SELECT statement. Sound familiar? Of course it sounds familiar; an Inline UDF is a VIEW. The difference is that it has parameters, and these parameters can be inserted in the SELECT statement to limit what is returned by the function. The next example comes from the Northwind database. It returns a table of information for all the products in a category:
As with all Inline UDFs, udf_ProductsInCategoryTAB returns a TABLE. The columns returned by the function come from the SELECT list. In this case, they all happen to be drawn from the Products table. The parameter, @CategoryName, is used in this WHERE clause: Categories.CategoryName = @CategoryName to restrict the results to one category. There are a couple other things to notice about the script:
Now, lets try out udf_ProductsInCategory. Inline UDFs are invoked in the FROM clause of a data manipulation (DML) SQL statement. Here's our query:
The SELECT statement has its own TOP clause, select list, and ORDER BY clause. The rows it selects from are taken from the results of udf_ProductsInCategory. The Inline UDF is just another rowset-producing object that can be used in the FROM clause like a table, view, or OPENROWSET clause. It can be joined to other rowsets as this query, which joins the results of udf_ProductsInCategory with an inline select clause that computes the total number of items produced for each product, illustrates:
I hope that gives you an idea of what can be done with Inline UDFs. The next section discusses multistatement table valued UDFs. After that, I return to Inline UDFs and show how to make them a welcome addition to Web page table paging. Multistatement Table Valued User-Defined FunctionsMultistatement Table Valued User-Defined Function is a mouthful. I'll refer to them as MTV UDFs. They're sort of a cross between an Inline UDF, a stored procedure, and a scalar UDF. Like Inline UDFs, they produce rowsets and are used in the FROM clause of a SQL DML statement such as SELECT. Like a stored procedure, they contain multiple lines of T-SQL including IF statements, WHILE loops, and cursors. Like a scalar UDF, the T-SQL that they contain cannot change the state of the database. The following UDF, udf_DT_WeeksBtwnTAB, returns one row for each week that falls in a date range. First, let's look at the function, then I'll show how to use it.
udf_DT_WeeksBtwnTAB constructs a table of weeks. Here's a query to test it out:
Typically I'll use udf_DT_WeeksBtwnTAB when reporting some activity at the week level. By using it to define the weeks for reporting, I'm assured of not missing a week that has no activity, as would happen with a GROUP BY clause. Now that we've discussed the types of UDFs, it's time to turn our attention to using UDFs for Web page generation. In particular, the Inline UDF works well for minimizing the amount of data retrieved for each page. Retrieving Minimal Data for Each Web PageWhether using ASP or ASP.NET, JSP, PHP, CGI or some other programming tool to generate data-driven Web pages, we must face the issues of how much data to show for each request and how to show additional pages. Once you've decided how much data to show, you then have to decide how much data to retrieve and how to let users surf beyond the first page. A typical query might result in just a few or many thousands of results. Once the number of results grows beyond the number of rows that can be shown on the page, sending additional rows becomes counter productive. Sending a large number of rows to the browser slows down the time to complete the page display. So how do you go about retrieving just the right amount of data? Some pointers I've picked up about creating data-driven Web sites are:
Sometimes this has motivated me to cache data in the ASP/ASP.Net level, but I've usually found this to be disappointing. The percentage of times that users travel beyond the first page of data is pretty low, so I only retrieve the minimal amount of data on a page. What's the minimum? Showing more than fifteen or so rows on a page requires users to scroll to see all their results. A little bit of scrolling isn't so bad. After all, it can be done with the page down key or the mouse wheel. But after one or two page down keys, users, only occasionally, ever look at the data. I've come to the conclusion that it's best to keep pages pretty short, about the amount that can be shown on the screen. If the aim is to retrieve only the rows needed to display on a single page, then the Inline UDF works very well. As I described above, an Inline UDF is essentially a View with parameters. Creating the Inline UDFBefore we create the UDF, let's decide what should be on our page. To illustrate, we'll use the Northwind database and base the query on this scenario:
So the query to retrieve this data in the desired order is:
This is the same information that was retrieved in a previous example, with the exception that we're not limiting the data to a single category. In many of the queries so far, you may have noticed the TOP clause. The TOP clause restricts the result set to the number of rows specified. It was used previously in this article to limit the size of the query results in the interest of saving space. When you want to retrieve just enough rows for your page, it's important because it tells the SQL optimizer that it can stop after just a few rows. Adding the TOP clause to the query changes the first line to: SELECT TOP 15 P.ProductID, P.ProductName, P.UnitsInStock How many rows should you retrieve? Since the TOP clause must be a constant, I use the largest number that could fit on a page. Since the data transmission between the SQL Server and the Web creation engine is usually over a very fast connection, use the largest number that might ever fit on a page and don't worry if a few extra rows are sent. Next, it's essential that we are able to retrieve rows for display on pages after the first one. To do this it's necessary to save one or more columns that identify where we are in the paging process. Exactly which columns to save depends on the order of the query. The columns used to identify position must uniquely identify the last row displayed on the Web page. It may be necessary to add additional columns to the ORDER BY clause to provide uniqueness. In fact, for our sample query, the UnitsInStock column doesn't provide uniqueness, and we must add an additional column or columns. While there might be some benefit to using [Total Sales] as the second sort column, it's a field that could actually change between pages. We're better off using a combination of ProductName and ProductID. Why two? Because in the Products table of the Northwind database, ProductName isn't guaranteed to be unique. Most of the time ProductName provides a very understandable and useful ordering. But in rare occasions where a page with two products with the same quantity have the same name and fall on the exact end of a page, we might produce an error if we don't also use the ProductID. So the ORDER BY clause in our query becomes:
In the Web page generation code, we'll have to save three scalar values -- one for each of the sort variables: UnitsInStock, ProductName, and ProductID. In ASP or ASP.Net these values can safely be saved in the SESSION object. They'll be used when the second and subsequent pages are retrieved. The ASP/ASP.Net code must hand these back to the Inline UDF that embodies the query as parameters, which can then be used in the WHERE clause. The declaration of the parameters is:
Each of them has a default value that retrieves the first page. Providing the defaults simplifies retrieval of the first page. Just use default for the parameter value. The WHERE clause gets a little tricky. Of course the " P.ProductID = S.ProductID" condition must remain in the query. And the three parameters must be compared to the corresponding columns in each of the rows so that we start where we left off. My first instinct is to code these comparisons as:
But that's wrong! The problem is that it only returns rows with ProductName columns that are greater than or equal to the last ProductName, even if they have lower UnitsInStock values. And the same problem holds for ProductIDs. The correct coding of the WHERE conditions for positioning the results is:
This retrieves rows that are after the last shown row. Using the less-than-or-equal and greater-than-or-equal (>=) comparison operators gives us one row of overlap between pages. Use just the less-than (<) or greater-than (>) comparison operators for no overlap. Now we pull these changes together for the function creation script:
I name all UDFs with a prefix of "udf_" to distinguish them from other database objects and from system functions, which are named with the prefix "fn_". The next part of the name identifies the UDF as one used for Paging. The name "ProductsByUnits" identifies the page that the function is for. Finally, "Forward" tells us the direction of paging. More on paging back later. To retrieve the rows for the first page, the SELECT statement -- shown with the results of the query -- is:
The parameters were not supplied for the first page because the defaults could be used. To retrieve the rows for the next page the SELECT statement shown with the first few rows of the results is:
Now you might ask, "Do I really need an Inline UDF to accomplish this?" My answer is that the Inline UDF is not essential. You could put the query inline in the Web page creation script, but using the Inline UDF has important advantages:
It's the latter reason that is most important. Separating the SQL logic from other page creation logic is a big simplifying step that pays many times over in a reduction in complexity and thus in maintenance. For this reason I almost always move my SQL into stored procedures or UDFs and out of the Web-creation script. So far I've put off discussing paging backward. I know of two approaches to paging back based on the Inline UDF technique:
The first method requires that you save the key values for the top of each page. Once you're saving a set of values, you might as well save an array. But this approach involves more coding on the Web page creation side, and it has the additional disadvantage of possibly missing a row or more if row insertion was going on at the same time as paging. The Inline UDF to page in reverse is very similar to the forward UDF with the addition of an extra sort operation. Here's the create function script:
The inline SELECT RowsOnPreviousPage grabs the fifteen previous rows. This uses similar logic to forward paging with the exception that the WHERE clause selects rows that are less than or equal to the first row on the last page. The outer SELECT is used to re-sort the rows into the desired order. A TOP clause is required when using an ORDER BY clause in an Inline UDF. "TOP 100 PERCENT WITH TIES" is used in udf_Paging_ProductsByUnit_REVERSE to sort without excluding any rows from the result. By the way, to use the reverse paging function, the Web page creation code must save the three key values from the first row on the page. In ASP or ASP.NET these can be saved in the SESSION object in the same way the key values from the last row are saved. SummaryUser-Defined Functions are valuable to the SQL Server 2000 programmer. This article has introduced the three types of UDFs and shown how to use them. The Inline UDF is essentially a SQL View with parameters. For Web page creation, it can be put to effective use to retrieve just the right number of rows to display on each Web page and no more. This has proven to be an effective strategy for efficient data retrieval. About the AuthorAndrew Novick is Principal of Novick Software where he develops business applications as a consultant using ASP/ASP.NET, VB/VB .NET, XML and SQL Server. He recently co-authored SQL Server 2000 XML Distilled, which was published by Curlingstone in October of 2002. 2003 marks his 32nd year of computer programming, starting in High School with a PDP-8 and moving onto a degree in Computer Science, an MBA, and then programming mainframes, minicomputers, and for the last 17 years, PCs. When not programming, he enjoys coaching Little League baseball, woodworking, mowing the lawn, and the occasional movie with his wife. He can be reached at anovick@NovickSoftware.com.
|
|
|
Copyright © 2003-2008 Novick Software, Inc. | Terms of Use | Privacy Policy | Nice Things People Say| |