Title: | General to Specific Modeling and Indicator Saturation in 2SLS Models |
---|---|
Description: | Provides facilities for general-to-specific model selection of exogenous regressors in 2SLS models. Furthermore, indicator saturation methods can be used to detect outliers and structural breaks in the sample. |
Authors: | Kurle Jonas [aut, cre] |
Maintainer: | Kurle Jonas <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.2 |
Built: | 2024-10-13 04:49:58 UTC |
Source: | https://github.com/jkurle/ivgets |
A data set containing the dependent variable, endogenous and exogenous regressors, and excluded instruments for 2SLS models. The structural error is also stored, even though it is not observed in practice.
artificial2sls
A data frame with 100 observations (rows) and 16 variables (columns):
name | variable description |
---|---|
y | dependent variable |
x1 | intercept |
x2 | relevant exogenous regressor |
x3 | irrelevant exogenous regressor |
x4 | irrelevant exogenous regressor |
x5 | irrelevant exogenous regressor |
x6 | irrelevant exogenous regressor |
x7 | irrelevant exogenous regressor |
x8 | irrelevant exogenous regressor |
x9 | irrelevant exogenous regressor |
x10 | irrelevant exogenous regressor |
x11 | relevant endogenous regressor |
u | structural error (in practice unobserved) |
z11 | excluded instrument |
z12 | excluded instrument |
id | unique observation identifier |
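As a minimal sketch of how these columns fit together, the snippet below estimates a small 2SLS model on this data set with ivreg::ivreg(). The particular formula (only x2 as exogenous regressor, both z11 and z12 as excluded instruments) is an illustrative choice rather than something prescribed by the package, and the guard simply skips the example when ivgets or ivreg is not installed.

```r
# Skip quietly when the required packages are not installed.
if (requireNamespace("ivgets", quietly = TRUE) &&
    requireNamespace("ivreg", quietly = TRUE)) {
  # Load the data set into the current environment.
  utils::data("artificial2sls", package = "ivgets", envir = environment())

  # x1 is the intercept column, so the automatic intercept is removed with -1.
  # Left of `|`: structural regressors; right of `|`: all instruments.
  fml <- y ~ -1 + x1 + x2 + x11 | -1 + x1 + x2 + z11 + z12
  fit <- ivreg::ivreg(formula = fml, data = artificial2sls)

  print(coef(fit))
}
```

Because u is stored in the data, the estimated coefficients can be compared with a simulation's known structure, which is not possible with real data.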
A data set containing the dependent variable, endogenous and exogenous regressors, and excluded instruments for 2SLS models. The structural error is also stored, even though it is not observed in practice. Some errors are contaminated, making these observations outliers.
artificial2sls_contaminated
A data frame with 100 observations (rows) and 16 variables (columns):
name | variable description |
---|---|
y | dependent variable |
x1 | intercept |
x2 | relevant exogenous regressor |
x3 | irrelevant exogenous regressor |
x4 | irrelevant exogenous regressor |
x5 | irrelevant exogenous regressor |
x6 | irrelevant exogenous regressor |
x7 | irrelevant exogenous regressor |
x8 | irrelevant exogenous regressor |
x9 | irrelevant exogenous regressor |
x10 | irrelevant exogenous regressor |
x11 | relevant endogenous regressor |
u | structural error (in practice unobserved) |
z11 | excluded instrument |
z12 | excluded instrument |
id | unique observation identifier |
The data frame has two additional attributes that store the indices of the outliers, "outliers", and their magnitudes, "magnitude".
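The two attributes can be read with base R's attr(). The sketch below only prints them; it makes no assumption about their exact type or layout beyond the attribute names given above, and it is skipped when ivgets is not installed.

```r
# Skip quietly when the package is not installed.
if (requireNamespace("ivgets", quietly = TRUE)) {
  utils::data("artificial2sls_contaminated", package = "ivgets",
              envir = environment())

  # Attribute names as documented: "outliers" and "magnitude".
  outlier_idx <- attr(artificial2sls_contaminated, "outliers")
  outlier_mag <- attr(artificial2sls_contaminated, "magnitude")

  print(outlier_idx)
  print(outlier_mag)
}
```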
Artificial data set without outliers, prepared for the shiny application.
artificial2sls_shiny
A data frame with 100 observations (rows) and 17 variables (columns):
name | variable description |
---|---|
y | dependent variable |
x1 | intercept |
x2 | relevant exogenous regressor |
x3 | irrelevant exogenous regressor |
x4 | irrelevant exogenous regressor |
x5 | irrelevant exogenous regressor |
x6 | irrelevant exogenous regressor |
x7 | irrelevant exogenous regressor |
x8 | irrelevant exogenous regressor |
x9 | irrelevant exogenous regressor |
x10 | irrelevant exogenous regressor |
x11 | relevant endogenous regressor |
u | structural error (in practice unobserved) |
z11 | excluded instrument |
z12 | excluded instrument |
id | unique observation identifier |
is.outlier | factor variable indicating whether the observation is an outlier (1) or not (0) |
extract_variables takes a formula object for ivreg::ivreg(), i.e. in a format of y ~ x1 + x2 | x1 + z2, and extracts the different elements in a list.
extract_variables(formula)
formula |
A formula for the ivreg::ivreg function, i.e. in format
|
extract_variables returns a list with three components: $yvar stores the name of the dependent variable, $first the names of the regressors of the first stage, and $second the names of the second stage regressors.
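To illustrate the idea of taking a two-part formula apart, here is a base-R sketch. The helper split_iv_formula() and its component names are hypothetical and are not the package's implementation; it only shows how the parts on either side of `|` can be recovered from the parsed formula.

```r
# Hypothetical helper (not part of ivgets): split y ~ a + b | c + d into parts.
split_iv_formula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)
  rhs <- formula[[3L]]
  # `~` binds more loosely than `|`, so the right-hand side is a call to `|`.
  stopifnot(is.call(rhs), identical(rhs[[1L]], as.name("|")))
  list(
    yvar        = all.vars(formula[[2L]]),  # dependent variable
    structural  = all.vars(rhs[[2L]]),      # variables left of `|`
    instruments = all.vars(rhs[[3L]])       # variables right of `|`
  )
}

parts <- split_iv_formula(y ~ x1 + x2 | x1 + z2)
print(parts)
```

Working on the parsed call rather than on a deparsed string avoids problems with long formulas that deparse() would break across several strings.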
factory_indicators creates a function that takes the name of an indicator and returns the corresponding indicator to be used in a regression. For user-specified indicators, it extracts the corresponding column from the uis matrix.
factory_indicators(n)
n |
An integer specifying the length of the indicators. |
Argument n should equal the number of observations in the data set which will be augmented with the indicators.
The created function takes a name of an indicator and the original uis argument that was used in indicator saturation and returns the indicator.
factory_indicators returns a function called creator().
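The function-factory pattern described here can be sketched in base R as follows. Everything in this block is illustrative: make_indicator_creator() is a hypothetical stand-in, and the naming scheme (a type prefix such as "iis" or "sis" followed by the observation index) is an assumption about how indicator names look, not a statement about the package's internals.

```r
# Hypothetical factory (not the ivgets implementation).
make_indicator_creator <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1L, n >= 1)
  positions <- seq_len(n)  # captured by the closure below

  function(name, uis = NULL) {
    # User-specified indicators: return the matching column of `uis`.
    if (!is.null(uis) && name %in% colnames(uis)) {
      return(as.numeric(uis[, name]))
    }
    # Assumed naming scheme: type prefix followed by the observation index.
    type  <- sub("[0-9]+$", "", name)
    index <- as.integer(sub("^[^0-9]*", "", name))
    switch(type,
      iis = as.numeric(positions == index),  # impulse: 1 at one observation
      sis = as.numeric(positions >= index),  # step: 1 from that observation on
      stop("unknown indicator type: ", type)
    )
  }
}

creator <- make_indicator_creator(5)
impulse <- creator("iis3")
step    <- creator("sis4")

# A user-specified indicator stored as a named matrix column.
my_uis <- matrix(c(1, 0, 0, 0, 1), ncol = 1, dimnames = list(NULL, "myind"))
custom <- creator("myind", uis = my_uis)
```

The closure keeps n fixed, so every indicator it returns has the same length as the data set it will be added to.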
gets.ivreg conducts general-to-specific model selection on an ivreg object returned by ivreg::ivreg().
## S3 method for class 'ivreg'
gets(x, gum.result = NULL, t.pval = 0.05, wald.pval = t.pval, do.pet = TRUE,
  ar.LjungB = NULL, arch.LjungB = NULL, normality.JarqueB = NULL,
  include.gum = FALSE, include.1cut = FALSE, include.empty = FALSE,
  max.paths = NULL, turbo = FALSE, tol = 1e-07, max.regs = NULL,
  print.searchinfo = TRUE, alarm = FALSE, keep_exog = NULL, overid = NULL,
  weak = NULL, ...)
x |
An object of class |
gum.result |
a |
t.pval |
|
wald.pval |
|
do.pet |
|
ar.LjungB |
a two element |
arch.LjungB |
a two element |
normality.JarqueB |
|
include.gum |
|
include.1cut |
|
include.empty |
|
max.paths |
|
turbo |
|
tol |
numeric value ( |
max.regs |
|
print.searchinfo |
|
alarm |
|
keep_exog |
A numeric vector of indices or a character vector of names
corresponding to the exogenous regressors in the |
overid |
|
weak |
|
... |
Further arguments passed to or from other methods. |
Returns a list of class "ivgets" with three named elements. $selection stores the selection results from getsFun (including paths, terminal models, and best specification). $final stores the ivreg model object of the best specification or NULL if the general unrestricted model (GUM) does not pass all diagnostics. $keep stores the names of the regressors that were not selected over, including the endogenous regressors, which are always kept.
isat.ivreg conducts indicator saturation model selection on an ivreg object returned by ivreg::ivreg().
## S3 method for class 'ivreg'
isat(y, iis = TRUE, sis = FALSE, tis = FALSE, uis = FALSE, blocks = NULL,
  ratio.threshold = 0.8, max.block.size = 30, t.pval = 1/NROW(data),
  wald.pval = t.pval, do.pet = FALSE, ar.LjungB = NULL, arch.LjungB = NULL,
  normality.JarqueB = NULL, info.method = c("sc", "aic", "hq"),
  include.1cut = FALSE, include.empty = FALSE, max.paths = NULL,
  parallel.options = NULL, turbo = FALSE, tol = 1e-07, max.regs = NULL,
  print.searchinfo = TRUE, plot = NULL, alarm = FALSE, overid = NULL,
  weak = NULL, fast = FALSE, ...)
y |
An object of class |
iis |
logical. If |
sis |
logical. If |
tis |
logical. If |
uis |
a matrix of regressors, or a list of matrices. If a list, the matrices must have named columns that should not overlap with column names of any other matrices in the list. |
blocks |
|
ratio.threshold |
Minimum ratio of variables in each block to total observations to determine the block size, default=0.8. Only relevant if blocks = |
max.block.size |
Maximum size of block of variables to be selected over, default=30. The block size used is the maximum given by either ratio.threshold or max.block.size |
t.pval |
numeric value between 0 and 1. The significance level used for the two-sided regressor significance t-tests |
wald.pval |
numeric value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs) |
do.pet |
logical. If |
ar.LjungB |
a two-item list with names |
arch.LjungB |
a two-item list with names |
normality.JarqueB |
|
info.method |
character string, "sc" (default), "aic" or "hq", which determines the information criterion to be used when selecting among terminal models. The abbreviations are short for the Schwarz or Bayesian information criterion (sc), the Akaike information criterion (aic) and the Hannan-Quinn (hq) information criterion |
include.1cut |
logical. If |
include.empty |
logical. If |
max.paths |
|
parallel.options |
|
turbo |
logical. If |
tol |
numeric value (default = 1e-07). The tolerance for detecting linear dependencies in the columns of the regressors (see |
max.regs |
integer. The maximum number of regressions along a deletion path. It is not recommended that this is altered |
print.searchinfo |
logical. If |
plot |
NULL or logical. If |
alarm |
logical. If |
overid |
|
weak |
|
fast |
A logical value indicating whether to speed up the 2SLS
estimation at the cost of providing fewer details. Requires |
... |
Further arguments passed to or from other methods. |
Returns a list of class "ivisat" with two named elements. $selection stores the selection results from isat (including paths, terminal models, and best specification). $final stores the ivreg model object of the best specification or NULL if the GUM does not pass all diagnostics.
ivDiag provides several diagnostic tests for 2SLS models that can be used during model selection. Currently, a weak instrument F-test of the first stage(s) and the Sargan test of overidentifying restrictions on the validity of the instruments are implemented.
ivDiag(x, weak = FALSE, overid = FALSE)
x |
A list containing the estimation results of the 2SLS model. Must
contain an entry |
weak |
A logical value indicating whether to conduct weak instrument tests. |
overid |
A logical value indicating whether to conduct the Sargan test of overidentifying restrictions. |
The resulting matrix also has an attribute named "is.reject.bad", which is a logical vector of length m, the number of rows of the matrix. Each entry records whether a rejection of the test means that the diagnostics have failed or vice versa. The first entry refers to the first row, the second entry to the second row, etc. However, this attribute is not used in the following estimations. Instead, the decision rule is specified inside the user.fun argument of gets::diagnostics(), which allows for a named entry $is.reject.bad.
Returns a matrix with three columns named "statistic", "df", and "p-value" and m rows. Each row records these results for one of the tests, so the number of rows varies with the arguments specified and the model (e.g. how many first-stage equations there are).
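For intuition about the Sargan test, the following base-R sketch computes it by hand on simulated data and stores the result in a one-row matrix with the three column names described above. The data-generating process, all variable names, and the helper project_on_Z() are made up for illustration; this is the textbook form of the statistic (n times the R-squared from regressing the 2SLS residuals on all instruments), not necessarily the exact computation that ivDiag performs.

```r
set.seed(42)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
v  <- rnorm(n)
x_endog <- z1 + z2 + v        # endogenous regressor driven by two instruments
u  <- 0.5 * v + rnorm(n)      # structural error, correlated with x_endog via v
y  <- 1 + 2 * x_endog + u

X <- cbind(const = 1, x_endog = x_endog)  # structural regressors
Z <- cbind(const = 1, z1 = z1, z2 = z2)   # included + excluded instruments

# Projection of the columns of M onto the column space of Z.
project_on_Z <- function(M) Z %*% solve(crossprod(Z), crossprod(Z, M))

# 2SLS: replace X by its first-stage fitted values.
X_hat <- project_on_Z(X)
beta  <- solve(crossprod(X_hat, X), crossprod(X_hat, y))
res   <- y - X %*% beta

# Sargan statistic: n * R^2 from regressing the 2SLS residuals on Z.
# The residuals have mean zero here because both X and Z contain a constant.
res_fit <- project_on_Z(res)
sargan  <- n * sum(res_fit^2) / sum(res^2)
df      <- ncol(Z) - ncol(X)  # number of overidentifying restrictions
pval    <- pchisq(sargan, df = df, lower.tail = FALSE)

diag_out <- matrix(c(sargan, df, pval), nrow = 1,
                   dimnames = list("Sargan", c("statistic", "df", "p-value")))
print(diag_out)
```

For this test, a rejection is evidence against instrument validity, which is the kind of information an "is.reject.bad" entry is meant to encode.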
General-to-specific modeling for 2SLS models
ivgets(formula, data, gum.result = NULL, t.pval = 0.05, wald.pval = t.pval,
  do.pet = TRUE, ar.LjungB = NULL, arch.LjungB = NULL,
  normality.JarqueB = NULL, include.gum = FALSE, include.1cut = FALSE,
  include.empty = FALSE, max.paths = NULL, turbo = FALSE, tol = 1e-07,
  max.regs = NULL, print.searchinfo = TRUE, alarm = FALSE, keep_exog = NULL,
  overid = NULL, weak = NULL)
formula |
A formula in the format |
data |
A data frame with all necessary variables y, x, and z. |
gum.result |
a |
t.pval |
|
wald.pval |
|
do.pet |
|
ar.LjungB |
a two element |
arch.LjungB |
a two element |
normality.JarqueB |
|
include.gum |
|
include.1cut |
|
include.empty |
|
max.paths |
|
turbo |
|
tol |
numeric value ( |
max.regs |
|
print.searchinfo |
|
alarm |
|
keep_exog |
A numeric vector of indices or a character vector of names
corresponding to the exogenous regressors in the |
overid |
|
weak |
|
Returns a list of class "ivgets" with three named elements. $selection stores the selection results from getsFun (including paths, terminal models, and best specification). $final stores the ivreg model object of the best specification or NULL if the GUM does not pass all diagnostics. $keep stores the names of the regressors that were not selected over, including the endogenous regressors, which are always kept.
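A possible call, sketched under the assumption that the artificial2sls data set documented earlier is used and that the intercept column x1 should be protected from selection via keep_exog. The formula and the choice of arguments are illustrative; the example is skipped when ivgets is not installed.

```r
# Skip quietly when the package is not installed.
if (requireNamespace("ivgets", quietly = TRUE)) {
  utils::data("artificial2sls", package = "ivgets", envir = environment())

  # General unrestricted model: all exogenous regressors x1-x10 plus the
  # endogenous x11, instrumented by z11 and z12. x1 is the intercept column.
  fml <- y ~ -1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 |
    -1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + z11 + z12

  sel <- ivgets::ivgets(formula = fml, data = artificial2sls, t.pval = 0.05,
                        keep_exog = "x1", print.searchinfo = FALSE)

  print(sel$keep)   # regressors that were not selected over
  print(sel$final)  # ivreg object of the selected model (or NULL)
}
```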
Indicator saturation modeling for 2SLS models
ivisat(formula, data, iis = TRUE, sis = FALSE, tis = FALSE, uis = FALSE,
  blocks = NULL, ratio.threshold = 0.8, max.block.size = 30,
  t.pval = 1/NROW(data), wald.pval = t.pval, do.pet = FALSE,
  ar.LjungB = NULL, arch.LjungB = NULL, normality.JarqueB = NULL,
  info.method = c("sc", "aic", "hq"), include.1cut = FALSE,
  include.empty = FALSE, max.paths = NULL, parallel.options = NULL,
  turbo = FALSE, tol = 1e-07, max.regs = NULL, print.searchinfo = TRUE,
  plot = NULL, alarm = FALSE, overid = NULL, weak = NULL, fast = FALSE)
formula |
A formula in the format |
data |
A data frame with all necessary variables y, x, and z. |
iis |
logical. If |
sis |
logical. If |
tis |
logical. If |
uis |
a matrix of regressors, or a list of matrices. If a list, the matrices must have named columns that should not overlap with column names of any other matrices in the list. |
blocks |
|
ratio.threshold |
Minimum ratio of variables in each block to total observations to determine the block size, default=0.8. Only relevant if blocks = |
max.block.size |
Maximum size of block of variables to be selected over, default=30. The block size used is the maximum given by either ratio.threshold or max.block.size |
t.pval |
numeric value between 0 and 1. The significance level used for the two-sided regressor significance t-tests |
wald.pval |
numeric value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs) |
do.pet |
logical. If |
ar.LjungB |
a two-item list with names |
arch.LjungB |
a two-item list with names |
normality.JarqueB |
|
info.method |
character string, "sc" (default), "aic" or "hq", which determines the information criterion to be used when selecting among terminal models. The abbreviations are short for the Schwarz or Bayesian information criterion (sc), the Akaike information criterion (aic) and the Hannan-Quinn (hq) information criterion |
include.1cut |
logical. If |
include.empty |
logical. If |
max.paths |
|
parallel.options |
|
turbo |
logical. If |
tol |
numeric value (default = 1e-07). The tolerance for detecting linear dependencies in the columns of the regressors (see |
max.regs |
integer. The maximum number of regressions along a deletion path. It is not recommended that this is altered |
print.searchinfo |
logical. If |
plot |
NULL or logical. If |
alarm |
logical. If |
overid |
|
weak |
|
fast |
A logical value indicating whether to speed up the 2SLS
estimation at the cost of providing fewer details. Requires |
Returns a list of class "ivisat" with two named elements. $selection stores the selection results from isat (including paths, terminal models, and best specification). $final stores the ivreg model object of the best specification or NULL if the GUM does not pass all diagnostics.
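A possible use on the contaminated data set, sketched with impulse indicator saturation only. The reduced formula and the significance level of 1/100 (matching the default 1/NROW(data) for 100 observations) are illustrative choices; the example is skipped when ivgets is not installed.

```r
# Skip quietly when the package is not installed.
if (requireNamespace("ivgets", quietly = TRUE)) {
  utils::data("artificial2sls_contaminated", package = "ivgets",
              envir = environment())

  # x1 is the intercept column, so the automatic intercept is removed with -1.
  fml <- y ~ -1 + x1 + x2 + x11 | -1 + x1 + x2 + z11 + z12

  sat <- ivgets::ivisat(formula = fml, data = artificial2sls_contaminated,
                        iis = TRUE, t.pval = 1 / 100,
                        print.searchinfo = FALSE)

  print(sat$final)  # ivreg object including any retained indicators (or NULL)

  # Known outlier positions stored with the data, for comparison.
  print(attr(artificial2sls_contaminated, "outliers"))
}
```

Comparing the retained indicators with the stored "outliers" attribute gives a quick check of how well the search recovers the contaminated observations.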
ivregFun calls ivreg::ivreg() in a format that is suitable for the model selection function gets::getsFun() and for the indicator saturation function gets::isat().
ivregFun(y, x, z, formula, tests, fast = FALSE)
y |
A numeric vector with no missing values. |
x |
A matrix or |
z |
A numeric vector or matrix. |
formula |
A formula in the format |
tests |
A logical value whether to calculate the
|
fast |
A logical value indicating whether to speed up the 2SLS estimation at the
cost of providing fewer details. Requires |
For the required outputs of user-specified estimators, see the article "User-Specified General-to-Specific and Indicator Saturation Methods" by Genaro Sucarrat, published in the R Journal: https://journal.r-project.org/archive/2021/RJ-2021-024/index.html
A list with entries needed for model selection via gets::getsFun() or gets::isat().
new_formula takes a formula object for ivreg::ivreg(), i.e. in a format of y ~ x1 + x2 | x1 + z2, and returns a list with elements suitable for model selection. For example, it updates the data by creating an intercept if specified in the formula, checks for collinearity among the regressors, and updates the formula accordingly.
new_formula(formula, data, keep_exog)
formula |
A formula for the ivreg::ivreg function, i.e. in format
|
data |
A data frame. |
keep_exog |
A numeric vector of indices or a character vector of names
corresponding to the exogenous regressors in the |
A list with several named elements. Component $fml stores the new baseline formula that will be used for model selection. Components y, x, and z store the data of the dependent variable, structural regressors, and excluded instruments. The entries $depvar, $x1, $x2, $z1, and $z2 contain the names of the dependent variable, endogenous and exogenous regressors, included and excluded instruments. $dx1, $dx2, $dz1, and $dz2 store the dimensions of the respective variables. Finally, $keep and $keep.names contain the indices and names of the regressors that will not be selected over.