Methods Meetup: Session 1
September 5, 2023
\[ \sqrt{(1-\frac{n}{N})\frac{variance}{n}} \]
survey
and svrep
packages
survey
is the R workhorse for survey data analysissvrep
is the only package which supports multi-stage, unequal probability sampling without replacement and large fractions of populationdesign_object <- svydesign(
ids = ~ county + venue_2 + record_id,
probs = ~ county_probability + venue_probability + final_individual_probability,
fpc = ~ county_probability + venue_probability + final_individual_probability,
pps = "brewer",
data = design_data
)
options(list(survey.lonely.psu = "adjust", survey.replicates.mse = TRUE))
replicate_weights_design <- as_bootstrap_design(
design = design_object,
replicates = 100,
type = "Rao-Wu-Yue-Beaumont",
mse = TRUE,
samp_method_by_stage = c("PPSWOR", "PPSWOR", "SRSWOR")
)
Call: svrepdesign.default(ids = ~record_id, weights = ~academic_weight,
repweights = "academic_rep_weight*", type = "bootstrap",
mse = TRUE, data = surveys)
Survey bootstrap with 100 replicates and MSE variances.
Call: svyglm(formula = episode_length ~ race_7cat, survey_object)
Coefficients:
(Intercept) race_7catNH White
40.366 9.158
race_7catNH Black race_7catNH AAPI
-2.819 16.096
race_7catNH Native American/Alaskan race_7catNH Multiracial
9.247 -2.846
race_7catNH Other
15.916
Degrees of Freedom: 3139 Total (i.e. Null); 93 Residual
(59 observations deleted due to missingness)
Null Deviance: 11750000
Residual Deviance: 11650000 AIC: 35910
library(srvyr)
survey_object %>%
as_survey() %>%
group_by(race_7cat) %>%
summarise(survey_mean(episode_length, na.rm = TRUE))
# A tibble: 8 × 3
race_7cat coef `_se`
<fct> <dbl> <dbl>
1 Latinx/Hispanic/Indigenous 40.4 2.50
2 NH White 49.5 3.56
3 NH Black 37.5 3.03
4 NH AAPI 56.5 6.98
5 NH Native American/Alaskan 49.6 5.63
6 NH Multiracial 37.5 2.23
7 NH Other 56.3 15.3
8 <NA> 34.2 5.59
use "${bhhi_shared_drive}/statewide_survey_processed_data/latest/statewide_survey_processed.dta"
keep if survey_count == 1
svyset _n [pweight=academic_weight], ///
bsrweight(academic_rep_weight*) ///
vce(bootstrap) ///
mse
Sampling weights: academic_weight
VCE: bootstrap
MSE: on
Bootstrap weights: academic_rep_weight_1 .. academic_rep_weight_100
Single unit: missing
Strata 1: <one>
Sampling unit 1: <observations>
FPC 1: <zero>
(running regress on estimation sample)
Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
Survey: Linear regression Number of obs = 3,140
Population size = 115,572.49
Replications = 100
Wald chi2(6) = 17.75
Prob > chi2 = 0.0069
R-squared = 0.0078
---------------------------------------------------------------------------------------------
| Observed Bstrap *
episode_length | coefficient std. err. z P>|z| [95% conf. interval]
----------------------------+----------------------------------------------------------------
race_7cat |
NH White | 9.157548 4.801016 1.91 0.056 -.2522703 18.56737
NH Black | -2.81859 4.02318 -0.70 0.484 -10.70388 5.066699
NH AAPI | 16.09569 7.197665 2.24 0.025 1.988528 30.20286
NH Native American/Alaskan | 9.246603 5.991962 1.54 0.123 -2.497426 20.99063
NH Multiracial | -2.846081 2.898077 -0.98 0.326 -8.526207 2.834045
NH Other | 15.91599 16.3178 0.98 0.329 -16.06631 47.89828
|
_cons | 40.36602 2.482923 16.26 0.000 35.49958 45.23246
---------------------------------------------------------------------------------------------
(running mean on estimation sample)
Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
Survey: Mean estimation Number of obs = 3,140
Population size = 115,572.49
Replications = 100
-----------------------------------------------------------------------------
| Observed Bstrap *
| mean std. err. [95% conf. interval]
----------------------------+------------------------------------------------
c.episode_length@race_7cat |
Latinx/Hispanic/Indigenous | 40.36602 2.482923 35.49958 45.23246
NH White | 49.52357 3.537419 42.59036 56.45678
NH Black | 37.54743 3.010734 31.6465 43.44836
NH AAPI | 56.46171 6.94129 42.85703 70.06639
NH Native American/Alaskan | 49.61262 5.600275 38.63629 60.58896
NH Multiracial | 37.51994 2.218442 33.17188 41.86801
NH Other | 56.28201 15.18007 26.52963 86.03439
-----------------------------------------------------------------------------
survey
Packagesvrep
PackageBHHI Methods Meetup Session 1: Replicate Weights