N S

Novick Software Management • Design • Programming • Training • Consulting

   

 

 

SQL Server T-SQL User-Defined Function of the Week

Percentile in SQL Server

Volume 1 Number 45     September 23, 2003

Check out the UDF Frequently Asked Questions at:
http://www.novicksoftware.com/UdfOfWeek/UDF_FAQ.htm

To ask a UDF related question or to contribute a UDF,
send e-mail to udf@NovickSoftware.com

Sign up for this newsletter at:
http://www.novicksoftware.com/UdfOfWeek/UDFofWeekSignup.htm
A couple of readers on the experts-exchange have asked for a 
percentile function.  If you're not familiar with percentile, 
the American Heritage Dictionary defines it as:

    One of a set of points on a scale arrived at by dividing a 
    group into parts in order of magnitude. For example, a score 
    equal to or greater than 97 percent of those attained on an 
    examination is said to be in the 97th percentile.

Percentile is used to show how an individual compares to a group.  
So my son's physician usually tells us that Tommy is in the 75th 
percentile by height and the 55th percentile of weight for 
6 year olds.  That is to say that he's taller than 74 percent of 
six year olds and heavier than 54 percent of them.  

There are several ways that you might write a percentile 
function.  For starters, let's assume that there is a set of 
measurements defined by the Measurement table.  Here's the 
CREATE TABLE script:

/------- Start copying below this line -------------------------\
CREATE TABLE Measurement (
        AgeGroup int NOT NULL-- group of measurements
      , SubjectID int NOT NULL -- individual measured
      , IndividualMeasurement float NOT NULL
      , CONSTRAINT Measurement_PK 
                      PRIMARY KEY (AgeGroup, SubjectID)
      )
GO
\-------Stop copying above this line ---------------------------/
     
A function could answer questions about percentile stated in the 
following way: Given the Measurement table for MeasurementID 1,
"What percentile is new measurement X compared to the group?"  
For example, assuming MeasurementID 6 is for the height of 
six year olds, if Tommy is 48 inches, what percentile is he 
compared to other six year olds?
 
Another way to put the question is, "What score would someone 
have to achieve to score in the Yth percentile?  Let's say what 
score would someone have to get to score in the 80th percentile?

The problem with writing these into UDFs is the difficulty 
generalizing the function.  That's because of the restrictions 
that SQL Server places on UDFs.  The most relent restrictions on 
UDFs are that they can't:

    - Execute dynamic SQL
    - Call any stored procedures

You'll find additional information on the restrictions placed on
UDFs in Chapter 5 of my book T-SQL User-Defined Functions, to be
published by Wordware in the fall of 2003.

Given the restrictions on UDFs, the easiest way, and probably the
best, is to write a percentile function that's specific to the 
table in question.  This has the disadvantage that you have to 
rewrite it for each table. There really isn't a good way around
it and each one is very easy to write.

Now, to write a UDF for the Measurement table that is defined 
above.  Let's use the first approach to defining a percentile 
function: Given measurement X what percentile is X in the 
Measurements table for an AgeGroup?  

Here's the CREATE FUNCTION Script:
/------- Start copying below this line -------------------------\
SET QUOTED_IDENTIFIER ON 
SET ANSI_NULLS ON 
GO

CREATE FUNCTION dbo.udf_Measurement_Percentile (

    @AgeGroup int -- which group to measurement
 ,  @Measure float -- measurement to check
)   RETURNS float
    -- NO SCHEMABINDING due to use of a table
/* 
* Returns the percentile that @Measure is in the Measurement
* table for @AgeGroup.
*
* Example:
select dbo.udf_Measurement_Percentile (6, 48) as [Percentile]
*
* © Copyright 2003 Andrew Novick http://www.NovickSoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain 
* other unrelated database objects. You may not publish this 
* UDF either in print or electronically.
* Published in T-SQL UDF of the Week Newsletter Vol 1 #45 9/23/03
http://www.NovickSoftware.com/UDFofWeek/UDFofWeek.htm
****************************************************************/
AS BEGIN

DECLARE @NumberBelow float      -- # with lower measurements
      , @TotalMeasurements float -- how many in the group
  
SELECT @NumberBelow=SUM(CASE WHEN IndividualMeasurement <@Measure
                             THEN 1 ELSE 0 END
                       )
     , @TotalMeasurements = COUNT(*)
    FROM Measurement
    WHERE AgeGroup = @AgeGroup

  RETURN CASE WHEN @TotalMeasurements = 0 
              THEN 0
              ELSE 100.0 * (@NumberBelow/@TotalMeasurements)
              END

END
GO

GRANT EXEC on dbo.udf_Measurement_Percentile to PUBLIC
GO
\-------Stop copying above this line ---------------------------/

To test the UDF, we'll need some data.  I've written the 
following stored procedure to populate the Measurement table:

/------- Start copying below this line -------------------------\
CREATE PROC usp_Measurement_Populate

    @AgeGroup int -- the ID to populate
  , @NumSamples int -- how many
  , @MinMeasure float -- lowest
  , @MaxMeasure float -- highest
/* 
* Creates a sample set of random data using a uniform
* distribution.
* Example
DECLARE @RC int
EXEC @RC = usp_Measurement_Populate 6, 1000, 30, 60
****************************************************************/
AS 

DECLARE @I INT -- LOOP COUNTER
DECLARE @Measure float -- sample measurement

SET NOCOUNT ON 

DELETE FROM Measurement WHERE AgeGroup = @AgeGroup

SET @I = 1
WHILE @I <= @NumSamples BEGIN

    SET @Measure = @MinMeasure
                   + (@MaxMeasure-@MinMeasure) * RAND()  
    INSERT INTO Measurement  
       (AgeGroup, SubjectID, IndividualMeasurement)
       VALUES (@AgeGroup, @I, @Measure)

    SET @I = @I + 1
END -- LOOP
\-------Stop copying above this line ---------------------------/
      

Now, execute the sp for the age group with this script:
/------- Start copying below this line -------------------------\
DECLARE @RC int
EXEC @RC = usp_Measurement_Populate 6, 1000, 30, 60
\-------Stop copying above this line ---------------------------/

Finally, we can use the UDF with this script:
/------- Start copying below this line -------------------------\
select dbo.udf_Measurement_Percentile (6, 48) as [Percentile] 
\-------Stop copying above this line ---------------------------/
(Results)
Percentile                                            
----------------------------------------------------- 
                                   58.199999999999996 

(1 row(s) affected)

The other UDF that could be written to answer the question, "What
measurement is needed to be in at or above the 60th percentile?"
is more complicated.  I'll leave that for another issue.

Please share this newsletter with anyone interested in SQL Server.

Do you have a great UDF that you'd like to share?  Or maybe you
have a T-SQL problem that you think could be solved by a UDF
but you don't know how? Send them to:

UDF@NovickSoftware.com 

and they might be published in this newsletter.  I try and 
respond to every request that I get.  

Thanks,  
Andrew Novick
+--------------------------------------------------------------+


RSS as HTML

Personal Blog

 
New Tips:

Loading SQL Profiler trace (.trc) files with fn_trace_gettrace

Upcoming
Presentations:

Big Data:
Working with Terabytes
in SQL Server

July 9 '08

Nov 19-21
SQL Pass 2008


Full Schedule