Analytics in Writing: Turning Score into Probabilistic Grouping: The Fraction Option In Proc Rank

The SAS code below turns a raw score into probability-based groups:

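"
/* Name of the score variable to rank */
%let rankme = crscore;

/* FRACTION divides each rank by the number of nonmissing values;
   TIES=MEAN gives tied scores the mean of their ranks */
proc rank data=indsn(keep=&rankme.) fraction ties=mean out=outdsn;
    var &rankme.;
    ranks &rankme._ranked;
run;

/* Summary statistics for the raw score and its fractional rank */
proc means data=outdsn n nmiss min mean median max range std;
run;
"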

Variable	N	N Miss	Minimum	Mean	Median	Maximum	Range	Std Dev
CrScore	39779	0	365	493.69683	495	610	245	28.819612
crscore_ranked	39779	0	2.5139E-05	0.5000126	0.506637	1	0.999975	0.288662

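The FRACTION option is an alternative to the GROUPS= option, which is the most often used and longest-established way to bin ranks. FRACTION allows for probability-based grouping: it divides each rank by the number of nonmissing observations, so the ranked variable falls in the interval (0, 1], as the minimum of 1/39779 and the maximum of 1 in the output above show. With TIES=MEAN, tied scores share the mean of their ranks. One variation of the FRACTION option is NPLUS1, which divides by n + 1 instead of n and yields similar results.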
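
The summary statistics above can be checked by hand. In the sketch below, the symbols are introduced here for illustration only: r_i is the rank of observation i (the mean rank within a tie group, since TIES=MEAN is in effect), n is the number of nonmissing values, and f_i is the fractional rank.

```latex
% Fractional rank under FRACTION and under NPLUS1
f_i^{\text{FRACTION}} = \frac{r_i}{n},
\qquad
f_i^{\text{NPLUS1}} = \frac{r_i}{n+1}

% TIES=MEAN preserves the sum of ranks, so the mean fractional rank is
\bar{f} = \frac{1}{n}\sum_{i=1}^{n}\frac{r_i}{n}
        = \frac{1}{n^2}\cdot\frac{n(n+1)}{2}
        = \frac{n+1}{2n}

% With n = 39779:
\bar{f} = \frac{39780}{79558} \approx 0.5000126,
\qquad
f_{\min} = \frac{1}{39779} \approx 2.5139 \times 10^{-5}
```

Both values agree with the PROC MEANS output. The minimum equals 1/n because the lowest score is held by a single observation with rank 1.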

In this case, the original 39,779 observations collapse into 257 groups, one for each distinct score value. The following is a portion of the group distribution:

crscore_ranked	Frequency

0.952097841	162
0.955805827	133
0.959099022	129
0.962191106	117
0.965119787	116
0.967809648	98
0.970310968	101
0.97277458	95
0.974936524	77
0.976947636	83
0.978757636	61
0.980316247	63
0.982101109	79
0.983785414	55
0.98501722	43
0.986110762	44
0.987242012	46
0.988297846	38
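
A distribution like the one above can be produced with PROC FREQ. This is a minimal sketch, assuming the output data set from the PROC RANK step is named outdsn and the ranked variable is crscore_ranked. The NLEVELS option is added here to report the number of distinct groups.

```sas
/* Count observations at each distinct fractional rank.
   NLEVELS reports how many distinct values (groups) exist. */
proc freq data=outdsn nlevels;
    tables crscore_ranked / nocum nopercent;
run;
```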

Computationally, FRACTION and NPLUS1 are among the PROC RANK options supported by SAS in-database technology. As of this writing (February 2, 2013), the supported databases include Oracle, Teradata, Netezza and DB2. The probabilistic grouping can therefore be executed inside tables in a supported database, without having to query and move big data into the SAS environment.
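
As a rough sketch of what an in-database run might look like, the example below assumes a Teradata connection through SAS/ACCESS. The libref tdlib, the server name, the credentials and the table names are all hypothetical placeholders. Whether the procedure actually runs in the database depends on the SAS release, the licensed products and the site configuration.

```sas
/* Hypothetical connection values: replace server, user, password
   and table names with those of your own environment. */
libname tdlib teradata server="myserver" user="myuser" password="XXXXXXXX";

/* Ask SAS to generate SQL so that eligible procedures run in the database. */
options sqlgeneration=dbms;

/* Optional: write the generated SQL to the log to confirm
   that the ranking was pushed down to the database. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Same ranking step as before, reading from and writing to database tables. */
proc rank data=tdlib.indsn fraction ties=mean out=tdlib.outdsn;
    var crscore;
    ranks crscore_ranked;
run;
```

If the log shows the generated SQL rather than a row-by-row fetch, the ranking was computed on the database side.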

Saturday, February 2, 2013