|
|
SQL Server T-SQL User-Defined Function of the Week
Percentile in SQL Server
Volume 1 Number 45 September 23, 2003
Check out the UDF Frequently Asked Questions at:
http://www.novicksoftware.com/UdfOfWeek/UDF_FAQ.htm
To ask a UDF related question or to contribute a UDF,
send e-mail to udf@NovickSoftware.com
Sign up for this newsletter at:
http://www.novicksoftware.com/UdfOfWeek/UDFofWeekSignup.htm
A couple of readers on the experts-exchange have asked for a
percentile function. If you're not familiar with percentile,
the American Heritage Dictionary defines it as:
One of a set of points on a scale arrived at by dividing a
group into parts in order of magnitude. For example, a score
equal to or greater than 97 percent of those attained on an
examination is said to be in the 97th percentile.
Percentile is used to show how an individual compares to a group.
So my son's physician usually tells us that Tommy is in the 75th
percentile by height and the 55th percentile of weight for
6 year olds. That is to say that he's taller than 74 percent of
six year olds and heavier than 54 percent of them.
There are several ways that you might write a percentile
function. For starters, let's assume that there is a set of
measurements defined by the Measurement table. Here's the
CREATE TABLE script:
/------- Start copying below this line -------------------------\
CREATE TABLE Measurement (
AgeGroup int NOT NULL-- group of measurements
, SubjectID int NOT NULL -- individual measured
, IndividualMeasurement float NOT NULL
, CONSTRAINT Measurement_PK
PRIMARY KEY (AgeGroup, SubjectID)
)
GO
\-------Stop copying above this line ---------------------------/
A function could answer questions about percentile stated in the
following way: Given the Measurement table for MeasurementID 1,
"What percentile is new measurement X compared to the group?"
For example, assuming MeasurementID 6 is for the height of
six year olds, if Tommy is 48 inches, what percentile is he
compared to other six year olds?
Another way to put the question is, "What score would someone
have to achieve to score in the Yth percentile? Let's say what
score would someone have to get to score in the 80th percentile?
The problem with writing these into UDFs is the difficulty
generalizing the function. That's because of the restrictions
that SQL Server places on UDFs. The most relent restrictions on
UDFs are that they can't:
- Execute dynamic SQL
- Call any stored procedures
You'll find additional information on the restrictions placed on
UDFs in Chapter 5 of my book T-SQL User-Defined Functions, to be
published by Wordware in the fall of 2003.
Given the restrictions on UDFs, the easiest way, and probably the
best, is to write a percentile function that's specific to the
table in question. This has the disadvantage that you have to
rewrite it for each table. There really isn't a good way around
it and each one is very easy to write.
Now, to write a UDF for the Measurement table that is defined
above. Let's use the first approach to defining a percentile
function: Given measurement X what percentile is X in the
Measurements table for an AgeGroup?
Here's the CREATE FUNCTION Script:
/------- Start copying below this line -------------------------\
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE FUNCTION dbo.udf_Measurement_Percentile (
@AgeGroup int -- which group to measurement
, @Measure float -- measurement to check
) RETURNS float
-- NO SCHEMABINDING due to use of a table
/*
* Returns the percentile that @Measure is in the Measurement
* table for @AgeGroup.
*
* Example:
select dbo.udf_Measurement_Percentile (6, 48) as [Percentile]
*
* © Copyright 2003 Andrew Novick http://www.NovickSoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain
* other unrelated database objects. You may not publish this
* UDF either in print or electronically.
* Published in T-SQL UDF of the Week Newsletter Vol 1 #45 9/23/03
http://www.NovickSoftware.com/UDFofWeek/UDFofWeek.htm
****************************************************************/
AS BEGIN
DECLARE @NumberBelow float -- # with lower measurements
, @TotalMeasurements float -- how many in the group
SELECT @NumberBelow=SUM(CASE WHEN IndividualMeasurement <@Measure
THEN 1 ELSE 0 END
)
, @TotalMeasurements = COUNT(*)
FROM Measurement
WHERE AgeGroup = @AgeGroup
RETURN CASE WHEN @TotalMeasurements = 0
THEN 0
ELSE 100.0 * (@NumberBelow/@TotalMeasurements)
END
END
GO
GRANT EXEC on dbo.udf_Measurement_Percentile to PUBLIC
GO
\-------Stop copying above this line ---------------------------/
To test the UDF, we'll need some data. I've written the
following stored procedure to populate the Measurement table:
/------- Start copying below this line -------------------------\
CREATE PROC usp_Measurement_Populate
@AgeGroup int -- the ID to populate
, @NumSamples int -- how many
, @MinMeasure float -- lowest
, @MaxMeasure float -- highest
/*
* Creates a sample set of random data using a uniform
* distribution.
* Example
DECLARE @RC int
EXEC @RC = usp_Measurement_Populate 6, 1000, 30, 60
****************************************************************/
AS
DECLARE @I INT -- LOOP COUNTER
DECLARE @Measure float -- sample measurement
SET NOCOUNT ON
DELETE FROM Measurement WHERE AgeGroup = @AgeGroup
SET @I = 1
WHILE @I <= @NumSamples BEGIN
SET @Measure = @MinMeasure
+ (@MaxMeasure-@MinMeasure) * RAND()
INSERT INTO Measurement
(AgeGroup, SubjectID, IndividualMeasurement)
VALUES (@AgeGroup, @I, @Measure)
SET @I = @I + 1
END -- LOOP
\-------Stop copying above this line ---------------------------/
Now, execute the sp for the age group with this script:
/------- Start copying below this line -------------------------\
DECLARE @RC int
EXEC @RC = usp_Measurement_Populate 6, 1000, 30, 60
\-------Stop copying above this line ---------------------------/
Finally, we can use the UDF with this script:
/------- Start copying below this line -------------------------\
select dbo.udf_Measurement_Percentile (6, 48) as [Percentile]
\-------Stop copying above this line ---------------------------/
(Results)
Percentile
-----------------------------------------------------
58.199999999999996
(1 row(s) affected)
The other UDF that could be written to answer the question, "What
measurement is needed to be in at or above the 60th percentile?"
is more complicated. I'll leave that for another issue.
Please share this newsletter with anyone interested in SQL Server.
Do you have a great UDF that you'd like to share? Or maybe you
have a T-SQL problem that you think could be solved by a UDF
but you don't know how? Send them to:
UDF@NovickSoftware.com
and they might be published in this newsletter. I try and
respond to every request that I get.
Thanks,
Andrew Novick
+--------------------------------------------------------------+
|
|