Disclosure risk reduction for generalized linear model output in a remote analysis system

https://www.sciencedirect.com/science/article/pii/S0169023X16301483

Remote analysis systems allow analysts to obtain statistical results without direct access to confidential data stored on a secure server. An attacking analyst, however, could send queries to the remote server and use the resulting statistical outputs to mount a disclosure attack. Statistical disclosure control (SDC) methods modify the outputs of a remote analysis system (RAS) to protect confidential information, and perturbation is one of the most commonly adopted of these methods. In generalized linear modelling, random noise is added either to the estimated coefficients or to the associated estimating equation before that equation is solved for the estimates. This inflates the variances of the estimators, so some of their efficiency and utility is lost. Consequently, any perturbation-based SDC method could yield an inefficient estimator and risks producing worthless inferences. To date, little attention has been paid to systematically controlling both disclosure risk and utility in SDC methods for RAS. In this paper, we develop a framework for perturbing estimating equations that enables an RAS to release modified generalized linear model output in such a way that disclosure risk is reduced while good utility is maintained. Finally, we present empirical results demonstrating the application of our framework for obtaining estimates from perturbed estimating equations of binary and count response models.
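To make the variance-inflation point concrete, here is a minimal pure-Python sketch, assuming Gaussian noise with standard deviation `sigma` is added to the logistic-regression score U(beta) = sum_i (1, x_i)^T (y_i - mu_i), and the perturbed equation U(beta) + e = 0 is then solved by Newton-Raphson. This is not the paper's framework (which is designed to control risk and utility jointly). The function names, the single-covariate model, the true coefficients (0.5, 1.0) and `sigma = 5` are all hypothetical choices for illustration only.

```python
import math
import random
import statistics


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve_perturbed_logistic(x, y, noise, iters=50, tol=1e-10):
    """Solve U(beta) + noise = 0 for beta = (b0, b1) by Newton-Raphson.

    U(beta) = sum_i (1, x_i)^T (y_i - mu_i) is the logistic-regression score,
    with mu_i = sigmoid(b0 + b1 * x_i). noise = (0, 0) gives the ordinary MLE.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1 = noise            # perturbed estimating function starts at the noise
        h00 = h01 = h11 = 0.0     # Fisher information I(beta) = -dU/dbeta
        for xi, yi in zip(x, y):
            mu = sigmoid(b0 + b1 * xi)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            w = mu * (1.0 - mu)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step for U(beta) + noise = 0: beta <- beta + I(beta)^{-1} (U + noise)
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def score(x, y, beta):
    """Unperturbed score U(beta), used to check the solutions."""
    b0, b1 = beta
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        r = yi - sigmoid(b0 + b1 * xi)
        g0 += r
        g1 += r * xi
    return g0, g1


def simulate(reps=200, n=200, sigma=5.0, seed=1):
    """Sample variance of the slope estimate without and with noise in the equation."""
    rng = random.Random(seed)
    plain, perturbed = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [1 if rng.random() < sigmoid(0.5 + 1.0 * xi) else 0 for xi in x]
        noise = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        plain.append(solve_perturbed_logistic(x, y, (0.0, 0.0))[1])
        perturbed.append(solve_perturbed_logistic(x, y, noise)[1])
    return statistics.variance(plain), statistics.variance(perturbed)


if __name__ == "__main__":
    var_plain, var_pert = simulate()
    print(f"Var(slope), unperturbed: {var_plain:.4f}")
    print(f"Var(slope), perturbed:   {var_pert:.4f}")
```

Because the perturbed root shifts by roughly I(beta)^{-1} e, its variance is approximately the unperturbed variance plus sigma^2 times the corresponding diagonal entry of I(beta)^{-2}; larger `sigma` gives more protection but less efficient estimates, which is the trade-off the abstract describes.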
