Wednesday, March 28, 2012

Persisting data in custom aggregate

I want to create a custom aggregate to calculate percentiles.
For example:
The 50th percentile is calculated by extracting the value(s) in the centre
of a sorted dataset.
What I need in this case is a sorted dataset so that I can extract values at
specific indexes to be able to calculate different percentiles.
1) At the moment I'm using a Private ArrayList to persist the intermediate
data.
2) The Accumalate method is used to add the data to the ArrayList.
3) I use the Terminate method to sort the ArrayList and extract the values I
need at specific index(es)
4) The Merge method appends two ArrayList's
Unfortunately I have no idea what to do in the Read and Write methods
concerning the ArrayList.
The way I understand it is that the Write method should write the ArratList
in a binary format which the Read are able to consume.
I am quite new at this so any ideas would be appreciated.
JR MalherbeJames
Does it relate somehow to SQL Server?
If it does ,please post DDL+ sample data+ expected result.
"JamesM" <JamesM@.discussions.microsoft.com> wrote in message
news:BE227508-8478-4758-903B-5DDB2E25D723@.microsoft.com...
>I want to create a custom aggregate to calculate percentiles.
> For example:
> The 50th percentile is calculated by extracting the value(s) in the centre
> of a sorted dataset.
> What I need in this case is a sorted dataset so that I can extract values
> at
> specific indexes to be able to calculate different percentiles.
> 1) At the moment I'm using a Private ArrayList to persist the intermediate
> data.
> 2) The Accumalate method is used to add the data to the ArrayList.
> 3) I use the Terminate method to sort the ArrayList and extract the values
> I
> need at specific index(es)
> 4) The Merge method appends two ArrayList's
> Unfortunately I have no idea what to do in the Read and Write methods
> concerning the ArrayList.
> The way I understand it is that the Write method should write the
> ArratList
> in a binary format which the Read are able to consume.
> I am quite new at this so any ideas would be appreciated.
> --
> JR Malherbe|||JamesM wrote:
> I want to create a custom aggregate to calculate percentiles.
> For example:
> The 50th percentile is calculated by extracting the value(s) in the centre
> of a sorted dataset.
> What I need in this case is a sorted dataset so that I can extract values
at
> specific indexes to be able to calculate different percentiles.
> 1) At the moment I'm using a Private ArrayList to persist the intermediate
> data.
> 2) The Accumalate method is used to add the data to the ArrayList.
> 3) I use the Terminate method to sort the ArrayList and extract the values
I
> need at specific index(es)
> 4) The Merge method appends two ArrayList's
> Unfortunately I have no idea what to do in the Read and Write methods
> concerning the ArrayList.
> The way I understand it is that the Write method should write the ArratLis
t
> in a binary format which the Read are able to consume.
> I am quite new at this so any ideas would be appreciated.
> --
> JR Malherbe
SQL Server 2005? Use the NTILE or RANK aggregate functions. I don't see
why you would need a user-defined aggregate but if you still think you
do then please give us a better spec.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||"Uri Dimant" wrote:

> James
> Does it relate somehow to SQL Server?
> If it does ,please post DDL+ sample data+ expected result.
This relates to extending SQL2005 with CLR functions (in this case a user
defined aggregate)
I'm trying to write a aggregate which is doing the same as the PERSENTILE
function in Excel
I'll post my code and expected results with David Portas|||"David Portas" wrote:
> SQL Server 2005? Use the NTILE or RANK aggregate functions. I don't see
> why you would need a user-defined aggregate but if you still think you
> do then please give us a better spec.
Unfortunately not
The function needs to do what the PERCENTILE function in Excel does
example 1 (uneven number of records)
You have a sorted list of values: (20,23,34,56,58,61,67,70,72,84,85)
In this case the the 50th percentile is the center value = 61
example 2 (even number of records)
You have a sorted list of values: (20,23,34,56,58,61,67,70,72,84,85,88)
In this case the the 50th percentile is the average of the two center values
= (61+67)/2 = 64
Here is my code:
using System;
using System.Data;
using System.Data.Sql;
using System.Data.SqlTypes;
using System.Collections;
using Microsoft.SqlServer.Server;
using System.IO;
[Serializable]
[SqlUserDefinedAggregate(
Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = false,
MaxByteSize = 8000)
]
public class Median : IBinarySerialize
{
private ArrayList intermediateResult;
public void Init()
{
intermediateResult = new ArrayList();
}
public void Accumulate(SqlDouble Value)
{
if (Value.IsNull)
{
return;
}
intermediateResult.Add(Value.Value);
}
public void Merge(Median Group)
{
intermediateResult.InsertRange(0, Group.intermediateResult);
}
public SqlDouble Terminate()
{
double output = double.NaN;
if (intermediateResult != null && intermediateResult.Count > 0)
{
double cntItem = intermediateResult.Count;
if (cntItem > 10)
{
intermediateResult.Sort();
double pos_relative = (cntItem - 1) * 0.5; //median
int pos1 = Convert.ToInt32(Math.Floor(pos_relative));
double pos_fraction = pos_relative - pos1;
if (pos_fraction == 0)
output = Convert.ToDouble(intermediateResult[pos1]);
else
output = Convert.ToDouble(intermediateResult[pos1]) +
pos_fraction * (Convert.ToDouble(intermediateResult[pos1 + 1]) -
Convert.ToDouble(intermediateResult[pos1]));
}
}
return new SqlDouble(output);
}
public void Read(BinaryReader r)
{
//need to read binary to intermediateResult from the format that was
used below
}
public void Write(BinaryWriter w)
{
//need to write intermediateResult to a binary format
}
}|||James
CREATE TABLE #Test
(
num INT
)
INSERT INTO #Test VALUES (20)
INSERT INTO #Test VALUES (23)
INSERT INTO #Test VALUES (34)
INSERT INTO #Test VALUES (56)
INSERT INTO #Test VALUES (58)
INSERT INTO #Test VALUES (61)
INSERT INTO #Test VALUES (67)
INSERT INTO #Test VALUES (70)
INSERT INTO #Test VALUES (72)
INSERT INTO #Test VALUES (84)
INSERT INTO #Test VALUES (85)
INSERT INTO #Test VALUES (88)
SELECT AVG( b3.num )
FROM (
SELECT MAX( b1.num )
FROM (
SELECT TOP 50 PERCENT b.num
FROM #Test AS b
ORDER BY b.num ASC
) AS b1
UNION ALL
SELECT MIN( b2.num )
FROM (
SELECT TOP 50 PERCENT b.num
FROM #Test AS b
ORDER BY b.num DESC
) AS b2
) AS b3( num )
"JamesM" <JamesM@.discussions.microsoft.com> wrote in message
news:7945781D-56E4-4C5B-B915-2AD4284FBA04@.microsoft.com...
> "David Portas" wrote:
> Unfortunately not
>
> The function needs to do what the PERCENTILE function in Excel does
> example 1 (uneven number of records)
> You have a sorted list of values: (20,23,34,56,58,61,67,70,72,84,85)
> In this case the the 50th percentile is the center value = 61
> example 2 (even number of records)
> You have a sorted list of values: (20,23,34,56,58,61,67,70,72,84,85,88)
> In this case the the 50th percentile is the average of the two center
> values
> = (61+67)/2 = 64
>
> Here is my code:
> using System;
> using System.Data;
> using System.Data.Sql;
> using System.Data.SqlTypes;
> using System.Collections;
> using Microsoft.SqlServer.Server;
> using System.IO;
> [Serializable]
> [SqlUserDefinedAggregate(
> Format.UserDefined,
> IsInvariantToNulls = true,
> IsInvariantToDuplicates = false,
> IsInvariantToOrder = false,
> MaxByteSize = 8000)
> ]
> public class Median : IBinarySerialize
> {
> private ArrayList intermediateResult;
> public void Init()
> {
> intermediateResult = new ArrayList();
> }
> public void Accumulate(SqlDouble Value)
> {
> if (Value.IsNull)
> {
> return;
> }
> intermediateResult.Add(Value.Value);
> }
> public void Merge(Median Group)
> {
> intermediateResult.InsertRange(0, Group.intermediateResult);
> }
> public SqlDouble Terminate()
> {
> double output = double.NaN;
> if (intermediateResult != null && intermediateResult.Count > 0)
> {
> double cntItem = intermediateResult.Count;
> if (cntItem > 10)
> {
> intermediateResult.Sort();
> double pos_relative = (cntItem - 1) * 0.5; //median
> int pos1 = Convert.ToInt32(Math.Floor(pos_relative));
> double pos_fraction = pos_relative - pos1;
> if (pos_fraction == 0)
> output = Convert.ToDouble(intermediateResult[pos1]);
> else
> output = Convert.ToDouble(intermediateResult[pos1]) +
> pos_fraction * (Convert.ToDouble(intermediateResult[pos1 + 1]) -
> Convert.ToDouble(intermediateResult[pos1]));
> }
> }
> return new SqlDouble(output);
> }
> public void Read(BinaryReader r)
> {
> //need to read binary to intermediateResult from the format that
> was
> used below
> }
> public void Write(BinaryWriter w)
> {
> //need to write intermediateResult to a binary format
> }
> }
>|||Thanks Uri
We also have workarouds in SQL, which is very tedious if you are calculating
different percentiles (10th, 25th, 50th,...) in different columns and at
different break levels.
We chose to go the route of a custom aggregate since this will save us lots
of work in the next 10 months.
All I need to know is how to persist an ArrayList for a user defined
aggregate.

No comments:

Post a Comment