Novick Software
SQL Server Consulting Design Programming Tuning

  andy novick is a sql server mvp

 

 

Creating a User Defined Aggregate with SQL Server 2005

by Andrew Novick

 Sample code:

Note: There is a comment on the Product aggregate by Joe Celko and more information on
Aggregates in SQL in the article A Note on SQL Aggregates 9/4/05

SQL Server doesn't have a Product aggregate function.  A recent blog post by Karen Watterson pointed out that Microsoft Knowledge Base article Q89656: Simulating a PRODUCT() Aggregate Function discusses how to achieve the Product aggregate functionality using the POWER function.  We e-mailed about it and she challenged me to create a Product aggregate in SQL Server 2005.  I decided to accept the challenge. 

SQL Server 2005 allows the creation of User Defined Aggregates in VB.Net or C#.   I started with a new  Visual Basic/Database type project and I selected SQL Server Project.  That created the project without any classes.  The next step is to add a User Defined Aggregate Class to the project and code it.  So here it is, my first attempt at a udagg:

Imports System
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports System.Runtime.InteropServices

''' <summary>
''' Sample user-defined aggregate that takes the product
''' of multiple values.  NULLs are ignored, except that if there
''' are no non-null values, the result is NULL.
'''
</summary>
''' <remarks>This is sample code created with a Beta version of
''' Visual Studio 2005. Do not use this in a production application.
'''
''' by Andrew Novick  http://www.NovickSoftware.com
'''
</remarks>
<Serializable()> _
<StructLayout(LayoutKind.Sequential)> _
<SqlUserDefinedAggregate(Format.Native, IsInvariantToNulls:=True, _
                    IsInvariantToOrder:=True, IsNullIfEmpty:=True, _
                    IsInvariantToDuplicates:=False)> _
Public Class Product

    Private m_Accumulator As SqlTypes.SqlDouble
    Private m_HasNonNull As Boolean ' Tracks that any non-nulls
    ' have been passed to the Accumulate or Merge methods.

    ''' <summary>
    ''' Called once when the class is being initialized or reused.
    ''' </summary>
    ''' <remarks></remarks>
    Public Sub Init()
        m_Accumulator = 1.0
        m_HasNonNull = False
    End Sub

    ''' <summary>
    ''' Called once for each row.  value could be NULL.
    ''' </summary>
    ''' <param name="value">New value to aggregate by
    ''' multiplication</param>
    ''' <remarks></remarks>
    Public Sub Accumulate(ByVal value As SqlDouble)
     
       if value.isnull then exit sub     

        m_Accumulator *= value
        m_HasNonNull = True
    End Sub

    ''' <summary>
    ''' Joins two Product aggregates.
    ''' </summary>
    ''' <param name="value">The Product aggregate to join to this one.</param>
    ''' <remarks></remarks>
    Public Sub Merge(ByVal value As Product)

        if value is nothing then exit sub
        if not value.m_HasNonNull then exit sub
        m_Accumulator *= value.m_Accumulator
        m_HasNonNull = True
       
    End Sub

    ''' <summary>
    ''' Returns the result.
    ''' </summary>
    ''' <returns>Product of all non-null arguments.</returns>
    ''' <remarks></remarks>
    Public Function Terminate() As SqlDouble
        if not m_HasNonNull then
            return sqldouble.null
        else
            Return m_Accumulator
        end if
    End Function
End
Class

A user defined aggregate must implement four methods:

Init Initializes the aggregate and reinitializes it when SQL Server chooses to reuse an aggregate class instead of creating a new one. The Product.Init method resets m_Accumulator to 1 and the m_HasNonNull boolean back to false.  These put the aggregate back to it starting point.
Accumulate Called once per row to be aggregated. Product.Aggregate keeps track of non-NULL arguments and does the multiplication.
Merge SQL Server may decide to use multiple threads to perform aggregation and each would use its own ugagg class.  Then the results accumulated into one of the classes is merged into another to combine the results.  Notice that Product.Merge is handed a Product aggregate, not a SQLDouble.  Private members are available to it, which makes the merge process simple.
Terminate Finally, the terminate method returns the results.  Product.Terminate checks to see if any non-NULL values were aggregated.  If not, it returns null.  If non-NULLs were encountered, the product is returned.

There are a couple of choices that have to be made in the design of the Product aggregate:

  • What sort of precision should be maintained?

  • How to handle NULLs?

I chose to use SQLDouble as the data type for the aggregate.  For starters, There's only a limited choice of types that can be when using the Format.Native attribute.  SQLDouble is one of them.  Without Format.Native, the aggregate must implement the IBinarySerializable interface and I preferred not to do that for a first cut at a udagg.

The artribute IsNullIfEmpty:=True covers the case where there are no values to aggregate by returning NULL.  But what's the Product of only NULL values?  I decided it was NULL.  I suppose that an argument could be made for 1.  After all, one raised to the zero power is one.  And you might even argue for zero.  In the end, I decided to mimic the behavior of the SUM aggregate which returns NULL when given only NULL values.  However, the point of udagg's is that these decisions are now in our hands.

Visual Studio 2005 takes care of deploying the assembly to SQL Server 2005 (f.k.a Yukon).  If the assembly is added to SQL Server manually, a CREATE AGGREGATE statement is required.  Here's a sample of what it looks like with an appropriate GRANT statement:

CREATE AGGREGATE [Product](@Value float)
                             Returns [float]
       EXTERNAL NAME [SampleAggregate].[SampleAggregate.Product]
go
GRANT EXEC ON dbo.Product TO PUBLIC
go

After using the Build -> Deploy menu command you're ready to try out Product.  My first instinct was to run my tests in the the new SQL Server Management Studio, which takes over the function of Query Analyzer, but I decided to stick to Visual Studio.  My understanding is that they're actually the same program, just configured differently.  The Visual Studio 2005 Solution Explorer has a node for TestScripts.  All you have to do is pick Add Test Script from its context menu.  I added a script called Test.sql.  Here it is:

-- Simple case with 3 integer sample numbers.
SELECT dbo.product(sample) [Three integers]
    FROM (           SELECT CAST(1.0 as float) as Sample
           UNION ALL SELECT 3
           UNION ALL SELECT 5
         ) Numbers

-- Case with 3 floating point sample numbers and a NULL value
SELECT    dbo.Product(Sample) [Three floats and one NULL]
    FROM (           SELECT CAST(1.745 as float) as Sample
           UNION ALL SELECT 20.7
           UNION ALL SELECT NULL -- <<< NULL value
           UNION ALL SELECT 23.4
         ) Numbers

-- Case of just a NULL value.
SELECT dbo.product(sample) [Just a NULL]
    FROM (           SELECT CAST(NULL as float) as Sample
         ) Numbers

I set Test.sql as the "Default Debug Script" using its context menu and pressed F5, equivalent to Debug->Start.  Here are the results:

(1 row(s) affected)
Three integers 
---------------
15             
(1 row(s) returned)
(1 row(s) affected)
Three floats and one NULL
---------------
845.2431       
(1 row(s) returned)
(1 row(s) affected)
Just a NULL    
---------------
1              
(1 row(s) returned)
Finished running sp_executesql.

There seem to be a few more system messages than in SQL Server 2005 than in 2000 but everything seems in order.  Integers were converted to float by SQL Server.  NULLs were handled the way that I intended.  It worked.

Okay.  It didn't really happen so easily.  I spent a fair amount of time going back and forth between the documentation and the book A First Look at SQL Server 2000 for Developers and my sample code.  All-in-all it wasn't that difficult.

I'll be writing more about programming with the .Net CLR in SQL Server 2005 in coming months.  CLR programming will be the theme for my Coding-in-SQL Newsletter in  September and October so you might want to sign up.  Also in September, I'm giving a presentations on the topic at Microsoft's Code Camp IV (Sept 24-25). If you're in the Boston area you might want to attend.


RSS as HTML

Personal Blog

 
New Tips:

Use dsinit to set the SQL Server instance for Windows Azure dev storage

Upcoming
Presentations:

SQL PASS
Nov 7, '12
Biggest
Loser: DB
Edition


Full Schedule