Title: | A Tool for Automating the Statistical Disclosure Control of Research Outputs |
---|---|
Description: | Assists researchers and output checkers by distinguishing between research output that is safe to publish, output that requires further analysis, and output that cannot be published because of substantial disclosure risk. A paper about the tool was presented at the UNECE Expert Meeting on Statistical Data Confidentiality 2023; see <https://uwe-repository.worktribe.com/output/11060964>. |
Authors: | Jim Smith [cre, ctb], Maha Albashir [aut, ctb], Richard John Preen [aut, ctb] |
Maintainer: | Jim Smith <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2024-11-20 13:31:58 UTC |
Source: | https://github.com/ai-sdc/acro-r |
Add comments to outputs
acro_add_comments(name, comment)
name |
The name of the output. |
comment |
The comment. |
No return value, called for side effects
Adds an exception request to an output.
acro_add_exception(name, reason)
name |
The name of the output. |
reason |
The reason for the exception request.
No return value, called for side effects
Compute a simple cross tabulation of two (or more) factors.
acro_crosstab(index, columns, values = NULL, aggfunc = NULL)
index |
Values to group by in the rows. |
columns |
Values to group by in the columns. |
values |
Array of values to aggregate according to the factors. Requires aggfunc to be specified. |
aggfunc |
If specified, requires values to be specified as well. |
Cross tabulation of the data
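As an illustration, the sketch below tabulates two columns of the bundled nursery_data. It assumes that acro and its Python environment are already installed, and that the dataset has columns named `health` and `finance`. Those column names are not listed in this manual, so check `names(nursery_data)` before relying on them.

```r
# Sketch only: the acro calls run only when the package is installed.
acro_ready <- requireNamespace("acro", quietly = TRUE)

if (acro_ready) {
  library(acro)
  data(nursery_data)

  # Start a session without automatic suppression.
  acro_init(suppress = FALSE)

  # "health" and "finance" are assumed column names of nursery_data.
  counts <- acro_crosstab(
    index = nursery_data[, "health"],
    columns = nursery_data[, "finance"]
  )
  print(counts)
}
```

To aggregate a numeric column instead of counting, pass it as `values` together with an `aggfunc`, since the two arguments depend on each other.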
Adds an unsupported output to the results dictionary
acro_custom_output(filename, comment = NULL)
filename |
The name of the file that will be added to the list of the outputs. |
comment |
An optional comment. |
No return value, called for side effects
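A minimal sketch of registering a file that was produced outside acro. The file name and its contents are hypothetical and exist only for this example; the sketch also assumes acro and its Python environment are installed.

```r
# Sketch only: the acro calls run only when the package is installed.
acro_ready <- requireNamespace("acro", quietly = TRUE)

# Write a small file that stands in for an output created by other code.
# The name "manual_summary.txt" is hypothetical.
note_file <- file.path(tempdir(), "manual_summary.txt")
writeLines("Summary produced outside acro.", note_file)

if (acro_ready) {
  library(acro)
  acro_init(suppress = FALSE)

  # Add the file to the session's outputs, with an optional comment.
  acro_custom_output(note_file, comment = "Text summary written by hand.")

  # Show what is currently recorded.
  acro_print_outputs()
}
```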
Creates a results file for checking.
acro_finalise(path, ext)
path |
Name of a folder to save outputs. |
ext |
Extension of the results file. Valid extensions are json or xlsx. |
No return value, called for side effects
Fits a logit or probit model.
acro_glm(formula, data, family)
formula |
The formula specifying the model. |
data |
The data for the model. |
family |
Whether to fit a logit or probit model. |
Regression Results Wrapper
Histogram
acro_hist(data, column, breaks = 10, freq = TRUE, col = NULL, filename = "histogram.png")
data |
The object holding the data. |
column |
The column that will be used to plot the histogram. |
breaks |
Number of histogram bins to be used. |
freq |
If FALSE, the result will contain the number of samples in each bin. If TRUE, the result is the value of the probability density function at the bin. |
col |
The color of the plot. |
filename |
The name of the file where the plot will be saved. |
The histogram.
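The following sketch plots a histogram from the bundled lung dataset. It assumes acro and its Python environment are installed, and that lung has a numeric column named `age`; that column name and the output file name are assumptions made for this example.

```r
# Sketch only: the acro calls run only when the package is installed.
acro_ready <- requireNamespace("acro", quietly = TRUE)

if (acro_ready) {
  library(acro)
  data(lung)

  acro_init(suppress = FALSE)

  # "age" is an assumed column name; "age_hist.png" is a hypothetical file name.
  acro_hist(
    data = lung,
    column = "age",
    breaks = 20,
    filename = "age_hist.png"
  )
}
```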
Initialise an ACRO object
acro_init(suppress = FALSE)
suppress |
Whether to automatically apply suppression. |
No return value, called for side effects
Fits an ordinary least squares regression.
acro_lm(formula, data)
formula |
The formula specifying the model. |
data |
The data for the model. |
Regression Results Wrapper.
Pivot table
acro_pivot_table(data, values = NULL, index = NULL, columns = NULL, aggfunc = "mean")
data |
The data to operate on. |
values |
Column to aggregate, optional. |
index |
Keys to group by on the pivot table index. If an array is passed, it must be the same length as the data and is used in the same manner as column values. A list can contain any of the other types (except list). |
columns |
Keys to group by on the pivot table column. If an array is passed, it must be the same length as the data and is used in the same manner as column values. A list can contain any of the other types (except list). |
aggfunc |
If a list of strings is passed, the resulting pivot table will have hierarchical columns whose top level contains the function names. |
Cross tabulation of the data.
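A minimal sketch of a pivot table on the bundled lung dataset, assuming acro and its Python environment are installed. The column names `age`, `sex` and `status` are assumptions about lung that this manual does not confirm.

```r
# Sketch only: the acro calls run only when the package is installed.
acro_ready <- requireNamespace("acro", quietly = TRUE)

if (acro_ready) {
  library(acro)
  data(lung)

  acro_init(suppress = FALSE)

  # Mean of one column, grouped by two others.
  # "age", "sex" and "status" are assumed column names.
  mean_age <- acro_pivot_table(
    data = lung,
    values = "age",
    index = "sex",
    columns = "status",
    aggfunc = "mean"
  )
  print(mean_age)
}
```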
Prints the current results dictionary.
acro_print_outputs()
No return value, called for side effects
Remove outputs
acro_remove_output(name)
name |
Key specifying which output to remove, e.g., 'output_0'. |
No return value, called for side effects
Rename outputs
acro_rename_output(old, new)
old |
The old name of the output. |
new |
The new name of the output. |
No return value, called for side effects
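The output-management functions are typically used together. The sketch below creates one output, renames it, annotates it, and writes the results for checking. It assumes acro and its Python environment are installed, that the first output of a fresh session is stored under the key `output_0`, and that nursery_data has columns named `health` and `finance`. The new output name, the comment text, and the folder name are hypothetical.

```r
# Sketch only: the acro calls run only when the package is installed.
acro_ready <- requireNamespace("acro", quietly = TRUE)

# Folder for the results file; a hypothetical location chosen for this sketch.
out_dir <- file.path(tempdir(), "acro_outputs_demo")

if (acro_ready) {
  library(acro)
  data(nursery_data)

  acro_init(suppress = FALSE)

  # Create an output; it is assumed to be recorded as "output_0".
  acro_crosstab(nursery_data[, "health"], nursery_data[, "finance"])

  # Give the output a descriptive name and annotate it.
  acro_rename_output("output_0", "health_by_finance")
  acro_add_comments("health_by_finance", "Counts of health by finance.")
  acro_add_exception("health_by_finance", "Needed for the project report.")

  # Review what is recorded, then write the results file as JSON.
  acro_print_outputs()
  acro_finalise(path = out_dir, ext = "json")
}
```

An output that is no longer needed can be dropped with `acro_remove_output()` before finalising.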
Survival analysis
acro_surv_func(time, status, output, filename = "kaplan-meier.png")
time |
An array of times (censoring times or event times). |
status |
Status at the event time. |
output |
A string determining the type of output. Available options are table or plot. |
filename |
The name of the file where the plot will be saved. |
The survival table or plot.
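A minimal sketch of requesting a survival table from the bundled lung dataset. It assumes acro and its Python environment are installed, and that lung has columns named `time` and `status`, which this manual describes but does not name.

```r
# Sketch only: the acro calls run only when the package is installed.
acro_ready <- requireNamespace("acro", quietly = TRUE)

if (acro_ready) {
  library(acro)
  data(lung)

  acro_init(suppress = FALSE)

  # "time" and "status" are assumed column names of lung.
  km_table <- acro_surv_func(
    time = lung$time,
    status = lung$status,
    output = "table"
  )
  print(km_table)
}
```

Passing `output = "plot"` instead saves a plot under the name given in `filename`.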
Compute a simple cross tabulation of two (or more) factors.
acro_table(index, columns, dnn = NULL, deparse.level = 0, ...)
index |
Values to group by in the rows. |
columns |
Values to group by in the columns. |
dnn |
The names to be given to the dimensions in the result. |
deparse.level |
Controls how the default dnn is constructed. |
... |
Any other parameters. |
Cross tabulation of the data
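A short sketch of acro_table() with explicit dimension names, assuming acro and its Python environment are installed. The column names `health` and `finance` are assumptions about nursery_data, and the `dnn` labels simply repeat them.

```r
# Sketch only: the acro calls run only when the package is installed.
acro_ready <- requireNamespace("acro", quietly = TRUE)

# Labels for the two dimensions of the result (assumed column names).
dim_labels <- c("health", "finance")

if (acro_ready) {
  library(acro)
  data(nursery_data)

  acro_init(suppress = FALSE)

  counts <- acro_table(
    index = nursery_data[, dim_labels[1]],
    columns = nursery_data[, dim_labels[2]],
    dnn = dim_labels,
    deparse.level = 0
  )
  print(counts)
}
```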
Create a python virtual environment
create_virtualenv(...)
... |
Any other parameters. |
No return value, called for side effects
Install acro
install_acro(envname = "r-acro", ...)
envname |
The name of the Python virtual environment. |
... |
Any other parameters. |
No return value, called for side effects
The lung dataset contains information about lung cancer survival.
lung
A data frame with columns:
Institutional identification.
Survival time in months.
Survival status (1 = death, 0 = censored).
Age of the patient at the start of the study.
Gender of the patient.
Performance status (Eastern Cooperative Oncology Group).
'Karnofsky' performance status.
'Karnofsky' performance status as assessed by the patient.
Daily caloric intake at the start of the study.
Weight loss in the last six months.
data(lung)
This dataset originates from a hierarchical decision model created to evaluate applications for nursery schools.
nursery_data
A data frame with 12960 rows and 9 columns:
Parents' occupation
Child's nursery
Form of the family
Number of children
Housing conditions
Financial standing of the family
Social conditions
Health conditions
The ranking of applications for nursery schools
https://www.openml.org/search?type=data&status=active&id=26&sort=runs
data(nursery_data)