| Business Application Development with: SQL Server, C#, VB, VB.Net, ASP, ASP.Net, and XML |
| N | S |
Novick Software Management • Design • Programming • Training • Consulting |
|
|
News Links Schedule Site Map Contact |
|
SQL Server T-SQL User-Defined Function of the WeekComparing the Performance of Rot13 ImplementationsVolume 2 Number 44 November 9, 2004Check out the UDF Frequently Asked Questions at:
http://www.novicksoftware.com/UdfOfWeek/UDF_FAQ.htm
Transact-SQL User-Defined Functions has been published! Take a look at it right now! The previous two issues have been devoted to alternative implementation of the Rot13 cipher. This is a character substitution cipher that is used to obscure text strings in the Window registry and in many web cookies. Both implementations were contributed by Luke Schollmeyer. The first version, ufn_Rot13, from issue #42 uses a loop to translate each character. The second version, ufn_Rot13Redux, from issue #43 puts the loop into a SELECT statement so that it's executed in the SQL data engine as part of the select process rather than in an explicit loop. Take a look back at those issues if you'd like to see the details. This issue is devoted to comparing the performance of the two versions of the function. As I write this I have no idea how it's going to come out but I thought it would be interesting to see. The test of performance of the two functions uses a methodology similar to the one used in Volume 1 #10 of this newsletter. The methodology is also used in Chapter 10 of Transact-SQL User-Defined functions. We're going to construct a table with 8192 rows of random character data. Then we're going to pin the table into memory to eliminate I/O related differences in the trials. Finally we're going to execute the two functions on the 8192 rows and use the MAX aggregation function to minimize the amount of data sent to the client program. I'm pretty confident that this test gives a fair comparison of the functions. If you think there's a flaw in my reasoning, please let me know at udf@novicksoftware.com. In order to create the data we're going to need this issue's UDF of the week. udf_Txt_XlateChars modifies a string by translating one character to another. Here's the CREATE FUNCTION script:
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
CREATE FUNCTION dbo.udf_Txt_XlateChars (
@sInput varchar(8000) -- string to remove chars from
, @sFromChars varchar(255) -- Characters to remove
, @sToChars varchar(255) -- Characters to replace with
) RETURNS varchar(8000) -- @sInput with characters in
--@sFromChars replaced with the character in @sToChars
WITH SCHEMABINDING
/*
* Modifies @sInput by replacing any character in @sFromChars
* with the corresponding character in @sToChars.
* Works from left to right in @sFromChars so chars may be
* translated twice if a character appears in @sToChars
* to the left of where it appears in @sFromChars.
*
select dbo.udf_Txt_XlateChars(',AB=D-EIJ/\NOP-Q'
, '-,\/.;:<>()#$@!%^&*+="''', '......................')
* Test:
PRINT 'Test 1 ' + CASE WHEN '.ABC.D.EFGHIJ..XKL.M.NOP.' =
dbo.udf_Txt_XlateChars ('>ABC=D-EFGHIJ/\XKL"M&NOP-'
, '-\/.;:<>()#$@!%^&*+="''', '......................')
THEN 'Worked' ELSE 'ERROR' END
*
* © Copyright 2003 Andrew Novick http://www.NovickSoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain
* other unrelated database objects. You may not publish this
* UDF either in print or electronically.
* Published in the T-SQL UDF of the Week Vol 2 #44 11/9/04
http://www.NovickSoftware.com/UDFofWeek/UDFofWeek.htm
****************************************************************/
AS BEGIN
DECLARE @sReplaced varchar(8000) -- Working result
, @nNum2Rep int -- how many characters to replace
, @nPos int -- postion in @sFromChars
, @RemoveChar CHAR(1) -- Char to replace this time
SELECT @nNum2Rep = LEN(@sFromChars)
, @nPos = 1
, @sReplaced = @sInput
WHILE @nPos <= @nNum2Rep BEGIN
SET @sReplaced = REPLACE(@sReplaced
, SUBSTRING(@sFromChars, @nPos,1)
, SUBSTRING(@sToChars, @nPos, 1)
)
SET @nPos = @nPos + 1
END -- WHILE
RETURN @sReplaced
END
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
GRANT EXEC ON dbo.udf_Txt_XlateChars to PUBLIC
GO
Like ufn_Rot13, udf_Txt_XlateChars has a WHILE loop that is executed once for each character in a string. Instead of being executed once for each character in the input, it's executed once for each character in the translation strings and the REPLACE function is used the change all the characters of one type to characters of another. This leads to a limitation of udf_Txt_XlateChars that prevents it's use for replacing ufn_Rot13. Because the result of replacing each character is used as the input of the next replacement, if one of the characters in the @sFromChars parameter is also in the @sToChars that it could get translated twice. I use this function mostly for translating special characters into a more acceptable alternative and so I've been able to use this function with that limitation. Now lets build our test data so we can compare the performance of these UDFs. The next stored procedure was used in Volume 1 #10 to build a table of number strings. Here's the CREATE PROC script:
CREATE PROC usp_CreateExampleNumberString
@Loops int = 20 -- creates POWER (2, @Loops) Rows
-- 20 Creates 1,048,576 rows
AS
DECLARE @LC int -- Loop counter
-- delete an existing ExampleNumberString table
if exists (select * from dbo.sysobjects
where id = object_id(N'dbo.ExampleNumberString')
and OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE dbo.ExampleNumberString
CREATE TABLE ExampleNumberString (
ID int identity (1,1)
, BigNum Numeric (38, 0)
, NumberString varchar(128) NULL
)
INSERT INTO ExampleNumberString (BigNum, NumberString)
VALUES( CONVERT (numeric(38,0), rand() * 9999999999999)
, ' ') -- preallocate
SELECT @LC = 0
WHILE @LC < @Loops BEGIN
INSERT INTO ExampleNumberString (BigNum, NumberString)
SELECT BigNum * RAND(@LC + 1) * 2
, ' '
FROM ExampleNumberString
SELECT @LC = @LC + 1
END -- WHILE
UPDATE ExampleNumberString
SET NumberString = convert(varchar(128)
, convert(numeric(38,0), 9834311) * bignum)
Once you've created the procedure, run it to create as many rows as you like. The parameter to usp_CreateExampleNumberString is use as the exponent of 2 to calculate the number of rows desired. So 3 creates 8 rows and 20 creates 1,048,576 rows. I suggest that you use at least 8192 rows (@Loops = 13) to get meaningful results as I do in the script that follows: exec usp_CreateExampleNumberString 13 -- for 8192 rows select count(*) from ExampleNumberString -- verify the count select top 10 * from ExampleNumberString -- check out a few rows go Now since we're translating letters instead of numbers, we need a different table that has letters. That's where udf_Txt_XlateChars comes in. Here we translate the numbers to letters and make a new table out of it:
-- Create ExampleLetterString
SELECT dbo.udf_Txt_XlateChars (Numberstring, '0123456789'
, 'ABCDEVWXYZ') Letters
into ExampleLetterString
FROM ExampleNumberString
go
Since Rot13 divides the alphabet at the 13'th letter, the statement above translates half the numbers into letters below the split and half to letters above the split. Since we're going to run two queries against the Letters column to test the time it takes to use a UDF on the 8192 rows, we should try and make the circumstances for the two queries as close as possible. One way to do that is to force all the rows that we'll query into memory and keep them there. If we don't have all the desired rows in memory then we might be comparing one query that reads a 8192 rows from disk with another query that reads the same 8192 rows from SQL Server's page cache. SQL Server provides the DBCC PINTABLE to force all pages from a table to remain in the cache once they've been read the first time. Use it with caution. It can fill SQL Server's cache and cause the query engine to lock up to the point where you have to shut it down and restart it. So only do this if you have adequate RAM. There's no point using virtual memory as a substitute RAM. That just substitutes one form of disk I/O (paging) for the one we're trying to eliminate (reading pages from disk). This next script pins ExampleLetterString in memory. On my test system with a 8192 row table it forced SQL Server to consume 32 megabytes of RAM. The test system has 1000 megabytes of RAM and I can use the task manager to see that memory isn't full. If your system is short on RAM, reduce the number of rows in the test to eliminate paging.
-- Pin ExampleLetterString into memory
-- Be sure that you have enough memory available before you do
-- this. My SQL Server Process grew to 32 Megabytes when I ran
-- this script.
DECLARE @db_id int, @tbl_id int
SET @db_id = DB_ID('
The stage is set for comparing the two queries using the two Rot13 functions. Before we run the queries, I want to throw in one more wrinkle. Any query that returns a million rows to SQL Query Analyzer's results window is going do a lot of work on sending, receiving, and displaying the results. To eliminate most of that work, I've used the MAX function to just return biggest result. You'll see this function in the queries below. Query #1:
SELECT MAX( LEN(dbo.ufn_Rot13(Letters))) as [Max Len]
FROM ExampleLetterString
GO
This took 26 seconds on my system. Query #2:
SELECT MAX( LEN(dbo.ufn_Rot13Redux(Letters))) as [Max Len]
FROM ExampleLetterString
GO
This took 11 seconds on my system. So the explicit WHILE loop takes about 2 and a half times the version using the implicit loop in the select statement. That's a significant but not overwhelming difference. It might make this technique worth exploring in your queries. However, if you step back for a second and see that it takes on average 1.3 milliseconds to execute ufn_Rot13Redux, you must approach using a UDF this way with caution. 1.3 milliseconds is fine if you're executing the UDF once to prepare a cookie for a web page. If you're processing a million rows with a UDF, watch out. That 1.3 milliseconds would add up to 1300 seconds or about 22 minutes. Do you have a great UDF that you'd like to share? Or maybe you
have a T-SQL problem that you think could be solved by a UDF
but you don't know how? Send them to:
UDF@NovickSoftware.com
|
|
|
Copyright © 2003-2008 Novick Software, Inc. | Terms of Use | Privacy Policy | Nice Things People Say| |