"
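/* Variable to be ranked */
%let rankme = crscore;

/* FRACTION divides each rank by the number of nonmissing values; tied values share the mean rank */
proc rank data=indsn(keep=&rankme.) fraction ties=mean out=outdsn;
  var &rankme.;
  ranks &rankme._ranked;
run;

/* Summary statistics for the original and the ranked variable */
proc means data=outdsn n nmiss min mean median max range std;
run;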
"
Variable | N | N Miss | Minimum | Mean | Median | Maximum | Range | Std Dev |
CrScore | 39779 | 0 | 365 | 493.69683 | 495 | 610 | 245 | 28.819612 |
crscore_ranked | 39779 | 0 | 2.5139E-05 | 0.5000126 | 0.506637 | 1 | 0.999975 | 0.288662 |
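The FRACTION option is an alternative to the GROUPS= option, which has been around longest and is used most often. FRACTION divides each rank by the number of nonmissing observations, so the ranked values fall between 0 and 1 (here from 1/39,779 to 1) and can be read as empirical cumulative probabilities. The grouping is therefore probability based, and the ranked variable is spread roughly uniformly over that interval. One variation of FRACTION is NPLUS1, which divides by n + 1 instead of n and yields similar results.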
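As a rough illustration of the NPLUS1 variation, the step below reruns the ranking with NPLUS1 in place of FRACTION, and a short data step then checks two values against the PROC MEANS output above. The output dataset name outdsn_np1 is made up for this sketch. The checks assume TIES=MEAN, and for the minimum, that the lowest score is not tied.

```sas
/* Same ranking as before, but dividing by n + 1 instead of n */
/* outdsn_np1 is a hypothetical output dataset name */
proc rank data=indsn(keep=crscore) nplus1 ties=mean out=outdsn_np1;
  var crscore;
  ranks crscore_ranked;
run;

/* Sanity check on the FRACTION results for n = 39,779 */
data _null_;
  n = 39779;
  min_frac  = 1 / n;             /* smallest fractional rank when the minimum is untied */
  mean_frac = (n + 1) / (2 * n); /* mean of ranks 1..n is (n + 1)/2, then divided by n */
  put min_frac= best12. mean_frac= best12.;
run;
```

With TIES=MEAN the sum of the ranks is unchanged by ties, which is why the mean of the fractional ranks depends only on n.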
In this case, the original 39,779 observations collapse into 257 groups, one for each distinct ranked value. The following is a portion of the group distribution:
crscore_ranked | Frequency |
0.952097841 | 162 |
0.955805827 | 133 |
0.959099022 | 129 |
0.962191106 | 117 |
0.965119787 | 116 |
0.967809648 | 98 |
0.970310968 | 101 |
0.97277458 | 95 |
0.974936524 | 77 |
0.976947636 | 83 |
0.978757636 | 61 |
0.980316247 | 63 |
0.982101109 | 79 |
0.983785414 | 55 |
0.98501722 | 43 |
0.986110762 | 44 |
0.987242012 | 46 |
0.988297846 | 38 |
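A distribution like the one above can be produced along the following lines. This is a sketch that assumes the ranked output sits in a dataset named outdsn with the variable crscore_ranked. The original post does not show which step was actually used.

```sas
/* Frequency of each distinct fractional rank */
proc freq data=outdsn;
  tables crscore_ranked / nocum nopercent;
run;

/* Number of distinct groups the observations collapse into */
proc sql;
  select count(distinct crscore_ranked) as n_groups
  from outdsn;
quit;
```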
Computationally, FRACTION and NPLUS1 are among the PROC RANK options supported by SAS in-database technology. As of February 2, 2013, the supported databases include Oracle, Teradata, Netezza and DB2. The probabilistic grouping can therefore be executed inside tables in a supported database, without having to query big data and move it into the SAS environment.
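A minimal sketch of what an in-database run might look like, assuming a Teradata connection through SAS/ACCESS. The libref tdlib and every connection value are placeholders, and whether a given combination of options is actually pushed down depends on the SAS release and the database, so the log should be checked.

```sas
/* Hypothetical connection details: replace every value below */
libname tdlib teradata server="myserver" user=myuser password="XXXXXXXX" database=mydb;

/* Ask in-database-enabled procedures to generate SQL for the DBMS */
options sqlgeneration=dbms;

/* Write the SQL that SAS passes to the database to the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Input and output both live in the database, so no rows need to move to SAS */
proc rank data=tdlib.indsn(keep=crscore) fraction ties=mean out=tdlib.outdsn;
  var crscore;
  ranks crscore_ranked;
run;
```

If the trace shows the rows being selected back into SAS rather than ranked by generated SQL, the step ran in SAS and not in the database.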