Proc Logistic is one of the most popular and widely used procedures in SAS products for building logistic regression models. Covering all the major big-data facilities in the new Proc HPLOGISTIC would likely yield a big paper. This writing focuses on one key aspect, the optimization techniques for maximum likelihood estimation, and includes some excerpts from the HPLOGISTIC user guide where its presentation and explanation are best.
Under Proc Logistic, the default optimization technique is Fisher scoring. One can change it to Newton-Raphson (NR), and one can also set the RIDGING= option to ABSOLUTE or RELATIVE; all of this is done on the Model statement. Under Proc HPLOGISTIC, Fisher scoring disappears entirely. The default optimization technique is Newton-Raphson with ridging, or NRRIDG. The table lists all the options.
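To make the contrast concrete, here is a minimal sketch of how the two procedures are typically invoked. The dataset `mydata`, response `y`, and predictors `x1`-`x3` are placeholders of mine, not from any real example, so adapt the names to your own data.

```sas
/* Proc Logistic: Fisher scoring is the default; TECHNIQUE= and
   RIDGING= on the Model statement override the defaults */
proc logistic data=mydata;
   model y(event='1') = x1 x2 x3 / technique=newton ridging=absolute;
run;

/* Proc HPLOGISTIC: no Fisher scoring; NRRIDG is the default
   and TECHNIQUE= selects an alternative */
proc hplogistic data=mydata technique=nrridg;
   model y(event='1') = x1 x2 x3;
run;
```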
"For many optimization problems, computing the gradient takes more computer time than computing the function value. Computing the Hessian sometimes takes much more computer time and memory than computing the gradient, especially when there are many decision variables. Unfortunately, optimization techniques that do not use some kind of Hessian approximation usually require many more iterations than techniques that do use a Hessian matrix, and, as a result, the total run time of these techniques is often longer. Techniques that do not use the Hessian also tend to be less reliable. For example, they can terminate more […]".
The time taken to compute the gradient, the function value, and (where applicable) the Hessian, along with the number of decision variables involved, are among the key factors in choosing a technique.
- Second-derivative methods include TRUREG, NEWRAP, and NRRIDG (best for small problems for which the Hessian matrix is not expensive to compute). Note this does not necessarily mean that computing the Hessian is cheap for every small problem; 'small problems' still vary a lot.
- If you want to replicate an old model fit with Fisher scoring, you can use NRRIDG. When your target is binary, you may get identical results, because for the binary logit model the observed and expected information matrices coincide, so Fisher scoring and Newton-Raphson take the same steps. Otherwise, results may differ slightly (mainly in the estimated coefficients).
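As a sketch of the replication check (dataset and variable names are placeholders of mine): fit the old binary model under the Proc Logistic Fisher-scoring default, refit under the Proc HPLOGISTIC NRRIDG default, and compare the parameter estimates.

```sas
/* Old model: Fisher scoring (the Proc Logistic default) */
proc logistic data=mydata;
   model y(event='1') = x1 x2 x3;
   ods output ParameterEstimates=pe_logistic;
run;

/* New model: NRRIDG (the Proc HPLOGISTIC default) */
proc hplogistic data=mydata;
   model y(event='1') = x1 x2 x3;
   ods output ParameterEstimates=pe_hplogistic;
run;

/* For a binary logit target the estimates should agree
   to within convergence tolerance */
proc compare base=pe_logistic compare=pe_hplogistic criterion=1e-6;
run;
```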
- First-derivative methods include QUANEW and DBLDOG (best for medium-sized problems for which the objective function and the gradient can be evaluated much faster than the Hessian). In general, the QUANEW and DBLDOG algorithms require more iterations than the second-derivative methods above, but each iteration can be much faster. The QUANEW and DBLDOG algorithms require only the gradient to update an approximate Hessian, and they require slightly less memory than TRUREG or NEWRAP.
- Because CONGRA requires memory only on the order of p double words (where p is the number of parameters), many large applications can be solved only by CONGRA. However, I personally feel the computational beauty of CONGRA may actually be a bit overstated.
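A hedged sketch of scaling the technique to the problem size, using a hypothetical wide dataset `wide_data` with predictors x1-x500 (my placeholder names):

```sas
/* Medium-sized problem: quasi-Newton needs only the gradient
   to maintain an approximate Hessian */
proc hplogistic data=wide_data technique=quanew;
   model y(event='1') = x1-x500;
run;

/* Very large problem: conjugate gradient keeps memory on the
   order of p double words */
proc hplogistic data=wide_data technique=congra;
   model y(event='1') = x1-x500;
run;
```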
- All these insights and guidelines must of course be vetted and weighed against other key aspects such as selection criteria. (Selection in HPLOGISTIC, by the way, has become a separate statement under the procedure, unlike Proc Logistic, where SELECTION= is a Model statement option.)
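To illustrate the syntax change (dataset and variable names are placeholders): selection moves from a Model statement option in Proc Logistic to its own statement in Proc HPLOGISTIC.

```sas
/* Proc Logistic: selection is a Model statement option */
proc logistic data=mydata;
   model y(event='1') = x1 x2 x3 / selection=stepwise;
run;

/* Proc HPLOGISTIC: selection is its own statement */
proc hplogistic data=mydata;
   model y(event='1') = x1 x2 x3;
   selection method=stepwise;
run;
```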
My next writing on SAS logistic regression will cover selection criteria.