N S

Novick Software Management • Design • Programming • Training • Consulting

   

 

 

SQL Server T-SQL User-Defined Function of the Week

Writing UDFs to Handle UNICODE

Volume 1 Number 52   November 12, 2003

Check out the UDF Frequently Asked Questions at: http://www.novicksoftware.com/UdfOfWeek/UDF_FAQ.htm

To ask a UDF related question or to contribute a UDF, send e-mail to udf@NovickSoftware.com

Sign up for this newsletter at: http://www.novicksoftware.com/UdfOfWeek/UDFofWeekSignup.htm

Transact-SQL User-Defined Functions has been published!  Take a look at it right now!


This week's UDF is a rather simple text handling UDF that removes multiple spaces from between the works in a character string. Before returning its result, leading and trailing spaces are also removed.

You might use this to clean data entry input where multiple spaces aren't valid. For example, in a name or address.

The function is named udf_TxtN_SingleSpace. It's in the TxtN group instead of the Txt group because it handles UNICODE character strings rather than just ASCII strings.

Gone are the days when the largest string that SQL Server could handle was 255 characters. For text handling functions I usually try and handle strings up to the maximum size. For ASCII strings (char, varchar) that's 8000 characters. Since each UNICODE character takes two bytes, the maximum size for a UNICODE string is 4000 bytes. The nchar and nvarchar data types are used to store UNICODE.

Even the smaller 4000 byte UNICODE strings are large enough for most purposes. If you need to store large text files, web pages, reports or other character based data that could be larger than 8000 ASCII characters or 4000 UNICODE characters, you'll need the text or ntext data types.

Here's the CREATE FUNCTION script for udf_TxtN_SingleSpace:


SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO

CREATE FUNCTION dbo.udf_TxtN_SingleSpace (

    @sInput nvarchar(4000) -- String to modify to have only single spaces.
)   RETURNS nvarchar(4000) -- @sInput with only single spaces between words.
    WITH SCHEMABINDING -- 
/* 
* Transforms a string by changing multiple spaces between words
* into single spaces.  Also trims leading and trailing spaces.
*
* Example:
select dbo.udf_TxtN_SingleSpace('   A  B C  D  EFGHI  J KlN    OP  ')
*
* Related code: udf_Txt_SingleSpace is an ASCII version.
*
* Test:
PRINT 'Test 1    ' + CASE WHEN 'A B C D EFGHI J KlN OP ' =
      dbo.udf_TxtN_SingleSpace ('  A  B C  D  EFGHI  J KlN    OP  ')
                     THEN 'Worked' ELSE 'ERROR' END
* History:
* When          Who     Description
* ------------- ------- -----------------------------------------
* 2003-11-01           ASN        Initial Coding
*
* © Copyright 2003 Andrew Novick http://www.NovickSoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain 
* other unrelated database objects. You may not publish this 
* UDF either in print or electronically.
* Published in T-SQL UDF of the Week Newsletter Vol 1 #52 
http://www.NovickSoftware.com/UDFofWeek/UDFofWeek.htm
****************************************************************/
AS BEGIN

DECLARE @Result nvarchar(4000)
DECLARE @Pos int -- Position in the @Input String
      , @Mark int -- Marker where searching from
      , @Len int -- Length of the input

SELECT @Pos = CHARINDEX (N'  ', @sInput, 1)
     , @Result = N'' -- zero length non-null string
     , @Mark = 1
     , @Len = LEN(@sInput)

WHILE @Pos > 0 BEGIN

    SELECT @Result = @Result 
                   + SUBSTRING (@sInput, @Mark, @Pos - @Mark + 1)
         , @Mark = @Pos + 1
                  + PATINDEX (N'%[^ ]%'
                             , substring(@sInput, @Pos+2 , @Len))

    SET @Pos = CHARINDEX(N'  ', @sInput, @Mark)

END -- WHILE

-- Now add from the end of the mark to the end of the string
SET @Result = LTRIM(RTRIM(@Result 
                              + SUBSTRING(@sInput, @Mark, @Len) ))

RETURN @Result
END
GO

GRANT EXEC, REFERENCES ON [dbo].[udf_TxtN_SingleSpace] 
        TO [PUBLIC]
GO

Let's try out the function on a few test cases:


SELECT '->' + dbo.udf_TxtN_SingleSpace('A  B C  D  EKlN    OP  ')
                 + '<-'
     , '->' + dbo.udf_TxtN_SingleSpace('   OP ') + '<-'
     , '->' + dbo.udf_TxtN_SingleSpace('P ') + '<-'
 
GO

\-------Stop copying above this line ----/
(Results)

--------------------  ------  --------------
->A B C D EKlN OP<-   ->OP<-  ->P<-

(End of Results)

I've bracketed each result with arrows to be sure that you see where the results begin and end.

The amount of support that you want to give to UNICODE characters depends on your situation. The more support that you give to them, the more widely used your database can be. However, since a UNICODE string takes twice the space of an ASCII string, you might want to hold off until you know that they're necessary.

And don't forget, supporting UNICODE takes more than just a few changes in your database. The other tiers of your application must also support it. That is'nt very difficult. Visual Basic 6 supports UNICODE as do all .Net languages such as C# and VB.Net, as well as Java and many other development environments.


Do you have a great UDF that you'd like to share?  Or maybe you have a T-SQL problem that you think could be solved by a UDF but you don't know how? Send them to: UDF@NovickSoftware.com

and they might be published in this newsletter.  I try and respond to every request that I get. 

Thanks, 
Andrew Novick


RSS as HTML

Personal Blog

 
New Tips:

Loading SQL Profiler trace (.trc) files with fn_trace_gettrace

Upcoming
Presentations:

Big Data:
Working with Terabytes
in SQL Server

July 9 '08

Nov 19-21
SQL Pass 2008


Full Schedule