New Mexico Primary Care Physician Accessibility Models
Preliminary Analysis with R - 2002 Data
Larry Spear, UNM (10/15/2018)
This preliminary analysis using R will eventually compare all the results from
the generalized two-step methods and the one-step models. Various distance decay
functions (exponential, power, Gaussian, and DGR power) have been used. Several
analytical techniques will be employed, including exploratory data analysis
(graphics), ANOVA and related diagnostics, Moran’s I test for spatial
autocorrelation, and spatially oriented ANOVA. Only the results and a brief
discussion are presented here. A more comprehensive version including the data
and R code will be prepared later using Jupyter Notebook. Also, an ArcGIS Online
Story Map with a more in-depth discussion of results will be developed in the
future.
The following Group or Item names have been used to designate the individual
methods:
2SEE - Two Step Hybrid Zonal, Exponential Function
2SEG - Two Step Hybrid Zonal, Gaussian Function
2SEP - Two Step Hybrid Zonal, Std. Power Function
2SED - Two Step Hybrid Zonal, DGR Power Function
1SED - One Step Hybrid Zonal, DGR Power Function
1SEE - One Step Hybrid Zonal, Exponential Function
1SEG - One Step Hybrid Zonal, Gaussian Function
1SEP - One Step Hybrid Zonal, Std. Power Function
Two-Step Methods Compared with One-Step (DGR) Method
This preliminary analysis using R
will be based on a comparison of the generalized two-step methods using
exponential, power, and Gaussian distance decay with the one-step method using
the DGR power distance decay method. Additional comparisons of two-step methods
with the other one-step methods (exponential, power, and Gaussian distance
decay) will also be presented later after the analysis procedures have first
been tested and refined here.
Summary Statistics - the table shows the resulting means, standard deviations,
and minimum and maximum values (physicians per 1,000 population) for the
two-step and one-step (DGR) accessibility models (ACC Methods):
  Group count  mean    sd       min   max
  <ord> <int> <dbl> <dbl>     <dbl> <dbl>
1 2SEE    499 0.988 0.892 0.0000290  7.36
2 2SEG    499 0.990 1.10  0         10.3
3 2SEP    499 1.00  0.772 0.227      5.79
4 1SED    499 0.630 0.305 0.0437     2.74
Note: The one-step method (1SED) with the DGR power decay has both a lower mean
(0.630) and a smaller standard deviation (0.305) than all the two-step methods.
There are two other important mean values to be considered. The overall
statewide mean, derived by dividing the estimated number of primary care
physicians in 2002 (1,167) by the state population estimate for 2002
(1,874,591) and expressing the result per 1,000 population, is 0.62235. The
county-based service area (COSVAR) mean is 0.437665. The mean values closest to
the statewide mean are those derived using the one-step methods (the small
difference is primarily due to round-off).
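The statewide rate quoted in the note is physicians per 1,000 population, so the arithmetic can be sketched in a few lines of R. This is only an illustrative check using the figures printed in this report; the variable names are hypothetical and are not taken from the author's code.

```r
# Figures quoted in the note (2002 estimates).
physicians <- 1167
population <- 1874591

# Physicians per 1,000 population for the state as a whole.
statewide_rate <- physicians / population * 1000

# Group means copied from the summary statistics table.
group_means <- c(`2SEE` = 0.988, `2SEG` = 0.990, `2SEP` = 1.00, `1SED` = 0.630)

# Absolute gap between each method's mean and the statewide rate.
gap <- abs(group_means - statewide_rate)

print(round(statewide_rate, 4))
print(round(gap, 3))
print(names(which.min(gap)))  # the method whose mean is closest
```

With these rounded inputs the rate comes out at about 0.6225, very close to the reported 0.62235; the small gap may simply reflect rounding in the published counts.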
Boxplots, Histograms – and related plots are useful for
visualizing the differences between the one-step (DGR) and the two-step methods:





Note: These plots indicate that there may be a significant difference between
the two-step method results and the one-step method results. The median values,
interquartile ranges, and outliers (maximum values) are very different. It is
also apparent from the histograms that none of the resulting distributions
appears to be normally distributed.
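Plots of the kind described above could be produced along the following lines in base R. This is a minimal sketch on synthetic, right-skewed stand-in values (the report's data are not included here); `demo_df` and its contents are assumptions made purely for illustration.

```r
set.seed(1)  # synthetic stand-in values; NOT the report's data
groups <- c("2SEE", "2SEG", "2SEP", "1SED")
demo_df <- data.frame(
  Group = factor(rep(groups, each = 499), levels = groups),
  Phys_per_P = c(rlnorm(499 * 3, meanlog = -0.4, sdlog = 0.8),
                 rlnorm(499, meanlog = -0.6, sdlog = 0.45))
)

pdf(NULL)  # null graphics device so the sketch also runs non-interactively

# Side-by-side boxplots: medians, interquartile ranges, and outliers per method.
bp <- boxplot(Phys_per_P ~ Group, data = demo_df,
              ylab = "Physicians per 1,000 population")

# One histogram per method to inspect the shape of each distribution.
op <- par(mfrow = c(2, 2))
for (g in groups) {
  hist(demo_df$Phys_per_P[demo_df$Group == g], main = g,
       xlab = "Physicians per 1,000 population")
}
par(op)
dev.off()

# bp$stats holds the boxplot statistics per group (row 3 is the median).
print(round(bp$stats[3, ], 3))
```

In an interactive session the `pdf(NULL)` and `dev.off()` lines can be dropped so that the plots appear on screen.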
ANOVA (one-way) – test and related results
are shown below. The Null Hypothesis (H0) is that the means from the various
methods are the same. The Alternative Hypothesis (Ha) is that at least one of
the method means differs from the others.
              Df Sum Sq Mean Sq F value Pr(>F)
Group          3     54  17.998   53.75 <2e-16 ***
Residuals   1992    667   0.335
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = Phys_per_P ~ Group, data = data_test1_df)
$`Group`
                   diff         lwr         upr     p adj
2SEG-2SEE  0.0000242485 -0.09416672  0.09421522 1.0000000
2SEP-2SEE  0.0006913828 -0.09349959  0.09488236 0.9999976
1SED-2SEE -0.3795959920 -0.47378697 -0.28540502 0.0000000
2SEP-2SEG  0.0006671343 -0.09352384  0.09485811 0.9999978
1SED-2SEG -0.3796202405 -0.47381121 -0.28542927 0.0000000
1SED-2SEP -0.3802873747 -0.47447835 -0.28609640 0.0000000
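Output of the form shown above comes from `aov()` followed by `TukeyHSD()`, both in base R's stats package. The sketch below reproduces that workflow on synthetic stand-in values, since the report's data are not included here; `demo_df` and the simulated means and spreads are assumptions chosen only to resemble the summary table loosely.

```r
set.seed(42)  # synthetic stand-in values; NOT the report's data
groups <- c("2SEE", "2SEG", "2SEP", "1SED")
demo_df <- data.frame(
  Group = factor(rep(groups, each = 499), levels = groups),
  Phys_per_P = c(rnorm(499 * 3, mean = 1.0, sd = 0.9),
                 rnorm(499, mean = 0.63, sd = 0.3))
)

# One-way ANOVA: H0 is that all group means are equal.
fit <- aov(Phys_per_P ~ Group, data = demo_df)
anova_tab <- anova(fit)
print(anova_tab)

# The F value is the ratio of the group mean square to the residual mean square.
f_by_hand <- anova_tab[["Mean Sq"]][1] / anova_tab[["Mean Sq"]][2]

# Tukey honest significant differences for all six pairwise contrasts.
tukey <- TukeyHSD(fit, "Group", conf.level = 0.95)
print(tukey)
```

The same ratio can be read off the reported table: 17.998 / 0.335 is roughly 53.7, matching the printed F value up to rounding of the mean squares.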
Levene's Test for Homogeneity of Variance (center = median)
        Df F value    Pr(>F)
group    3  59.671 < 2.2e-16 ***
      1992
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
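The header of the output above matches the format printed by `leveneTest()` in the car package, although the report does not show the call. The median-centered version of Levene's test is equivalent to a one-way ANOVA on absolute deviations from each group's median, which can be sketched in base R as follows. The data are synthetic stand-ins and `demo_df` is a hypothetical name.

```r
set.seed(7)  # synthetic stand-in values; NOT the report's data
groups <- c("2SEE", "2SEG", "2SEP", "1SED")
demo_df <- data.frame(
  Group = factor(rep(groups, each = 499), levels = groups),
  Phys_per_P = c(rnorm(499 * 3, mean = 1.0, sd = 0.9),
                 rnorm(499, mean = 0.63, sd = 0.3))
)

# Absolute deviation of each value from its own group's median.
group_median <- ave(demo_df$Phys_per_P, demo_df$Group, FUN = median)
demo_df$abs_dev <- abs(demo_df$Phys_per_P - group_median)

# Levene's test (center = median) is a one-way ANOVA on those deviations.
levene_tab <- anova(lm(abs_dev ~ Group, data = demo_df))
print(levene_tab)
```

A small p-value here indicates that the typical spread around the median differs between groups, which is the situation the report describes for the one-step versus two-step results.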
One-way analysis of means (not assuming equal variances)
data:  Phys_per_P and Group
F = 103.65, num df = 3.0, denom df = 1032.2, p-value < 2.2e-16
Pairwise comparisons using t tests with non-pooled SD
data:  data_test1_df$Phys_per_P and data_test1_df$Group
     2SEE   2SEG   2SEP
2SEG 1      -      -
2SEP 1      1      -
1SED <2e-16 <2e-16 <2e-16
P value adjustment method: BH
Shapiro-Wilk normality test
data:  aov_residuals
W = 0.85362, p-value < 2.2e-16
Kruskal-Wallis rank sum test
data:  Phys_per_P by Group
Kruskal-Wallis chi-squared = 170.93, df = 3, p-value < 2.2e-16
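The four diagnostics printed above correspond to standard functions in base R's stats package. The following sketch shows how such tests are typically called; it uses synthetic stand-in values, and `demo_df` is a hypothetical name rather than the author's data frame.

```r
set.seed(11)  # synthetic stand-in values; NOT the report's data
groups <- c("2SEE", "2SEG", "2SEP", "1SED")
demo_df <- data.frame(
  Group = factor(rep(groups, each = 499), levels = groups),
  Phys_per_P = c(rnorm(499 * 3, mean = 1.0, sd = 0.9),
                 rnorm(499, mean = 0.63, sd = 0.3))
)

# Welch's one-way test: compares means without assuming equal variances.
welch <- oneway.test(Phys_per_P ~ Group, data = demo_df, var.equal = FALSE)

# Pairwise t tests with non-pooled SD and Benjamini-Hochberg (BH) adjustment.
pairs <- pairwise.t.test(demo_df$Phys_per_P, demo_df$Group,
                         pool.sd = FALSE, p.adjust.method = "BH")

# Shapiro-Wilk test on the residuals of the classical ANOVA fit.
aov_residuals <- residuals(aov(Phys_per_P ~ Group, data = demo_df))
shapiro <- shapiro.test(aov_residuals)

# Kruskal-Wallis rank sum test: a rank-based alternative to one-way ANOVA.
kw <- kruskal.test(Phys_per_P ~ Group, data = demo_df)

print(welch)
print(pairs)
print(shapiro)
print(kw)
```

Because the Kruskal-Wallis test works on ranks, it does not rely on normality or equal variances, which is why it serves as a cross-check when those ANOVA assumptions are in doubt.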
Note: Because the p-value (<2e-16 ***) is so small, the ANOVA test indicates
that the Null Hypothesis (H0) can be rejected in favor of the Alternative
Hypothesis (Ha). There appears to be a significant difference between at least
one of the method means and the others. However, there are three important
assumptions or requirements that should be considered when applying ANOVA:
1) the data are independent and obtained randomly from the population; 2) the
data are normally distributed; and 3) the data have common variances. None of
these assumptions has been met here, so these results should be interpreted
with caution. The results are not independent, nor were they obtained from a
random experiment; there is evidence of more than moderate spatial
autocorrelation (see Moran’s I test). The previous histograms show that the
data are not normally distributed. The summary statistics show a lack of common
variances. Regardless, it is important to present these results using standard
ANOVA and related diagnostic techniques. If necessary, routine measures such as
data transformations can be employed subsequently. Also, research is underway
to eventually conduct a spatial ANOVA test to see if there is any noticeable
change in the results.
The additional routine diagnostic tests confirm the initial observations and
the standard ANOVA results. The Tukey multiple comparison of means indicates
that the one-step method always has a very low adjusted p-value (reported as
0.0000000) when compared with any of the two-step methods, whereas the two-step
methods do not differ significantly from one another (adjusted p-values near
1). Levene’s test for homogeneity of variance also has a low p-value
(< 2.2e-16 ***), which suggests that the variances are not common across
methods. The pairwise t test with no assumption of equal variance also
indicates that the one-step method is significantly different from the two-step
methods (p-values < 2e-16). The Shapiro-Wilk normality test p-value
(< 2.2e-16) also indicates a lack of normality. The Kruskal-Wallis rank sum
test (non-parametric), which can be used when ANOVA assumptions are not met,
does not change the outcome: its p-value (< 2.2e-16) confirms that the Null
Hypothesis (H0) can be rejected in favor of the Alternative Hypothesis (Ha).
The need for caution in interpreting the ANOVA results is further confirmed by
the Normal QQ plot of standardized residuals, which should be approximately
normally distributed. The residuals deviate considerably from a straight line,
confirming a lack of the desired normality.

Moran’s I – global test for spatial
autocorrelation using a queen’s case neighbors list and row standardization.
Results for each method are shown below:
Neighbour list object:
Queen’s case
Number of regions: 499
Number of nonzero links: 2960
Percentage nonzero weights: 1.18875
Average number of links: 5.931864
Weights style: W
Weights constants summary:
    n     nn  S0       S1      S2
W 499 249001 499 185.3664 2095.07
moran.range(Results.lw)
[1] -0.7214727  1.0623680
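The quantities in the neighbour list summary follow directly from the contiguity matrix. The sketch below builds queen's case, row-standardized weights for a hypothetical 4 x 4 lattice in base R (not the 499 New Mexico areas, and not the spdep objects used for the report) and then repeats the same arithmetic on the counts printed above.

```r
# Toy 4 x 4 lattice of regions (hypothetical); NOT the report's study areas.
n_side <- 4
n <- n_side^2
cell <- expand.grid(row = seq_len(n_side), col = seq_len(n_side))

# Queen's case: two cells are neighbours if they touch at an edge or a corner,
# i.e. the larger of their row and column index gaps is exactly one.
row_gap <- abs(outer(cell$row, cell$row, "-"))
col_gap <- abs(outer(cell$col, cell$col, "-"))
adj <- (pmax(row_gap, col_gap) == 1) * 1

# Row standardisation (weights style "W"): each row of weights sums to one.
w <- adj / rowSums(adj)

links <- sum(adj)                 # number of nonzero links
pct_nonzero <- 100 * links / n^2  # percentage nonzero weights
avg_links <- links / n            # average number of links
s0 <- sum(w)                      # S0, which equals n under style "W"

# The same arithmetic applied to the counts printed in the report.
report_pct <- 100 * 2960 / 499^2
report_avg <- 2960 / 499

print(c(links = links, pct_nonzero = pct_nonzero, avg_links = avg_links, S0 = s0))
print(c(report_pct = report_pct, report_avg = report_avg))
```

Row standardisation is why S0 equals the number of regions (499) in the summary: every region's weights sum to one regardless of how many neighbours it has.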
Moran I test under randomisation
data:  Results_Pop_Phys_spdf$Phys_2SEE
weights: Results.lw
Moran I statistic standard deviate = 17.861, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.4786685352     -0.0020080321      0.0007242812

Moran I test under randomisation
data:  Results_Pop_Phys_spdf$Phys_2SEG
weights: Results.lw
Moran I statistic standard deviate = 17.715, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.4746291857     -0.0020080321      0.0007239152

Moran I test under randomisation
data:  Results_Pop_Phys_spdf$Phys_2SEP
weights: Results.lw
Moran I statistic standard deviate = 17.825, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.4776201197     -0.0020080321      0.0007239826

Moran I test under randomisation
data:  Results_Pop_Phys_spdf$Phys_1SED
weights: Results.lw
Moran I statistic standard deviate = 21.084, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.5673121641     -0.0020080321      0.0007291042
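The standard deviates reported above can be recovered from the printed sample estimates: each is the Moran's I statistic minus its expectation, divided by the square root of its variance. The short check below uses only numbers copied from the output; the vector names are hypothetical.

```r
# Values copied from the Moran I test output above.
moran_i <- c(`2SEE` = 0.4786685352, `2SEG` = 0.4746291857,
             `2SEP` = 0.4776201197, `1SED` = 0.5673121641)
variance <- c(`2SEE` = 0.0007242812, `2SEG` = 0.0007239152,
              `2SEP` = 0.0007239826, `1SED` = 0.0007291042)
n <- 499  # number of regions

# Expected Moran's I under no spatial autocorrelation: -1 / (n - 1).
expectation <- -1 / (n - 1)

# Standard deviate (z score) and one-sided p-value ("alternative: greater").
z <- (moran_i - expectation) / sqrt(variance)
p_value <- pnorm(z, lower.tail = FALSE)

print(round(z, 3))
print(signif(p_value, 3))
```

The expectation of -1 / 498 reproduces the -0.0020080321 shown in every test, and z scores of this size correspond to p-values far below the 2.2e-16 reporting threshold.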
Note: There is significant spatial autocorrelation for the one-step and all the
two-step methods (similar Moran’s I statistics, very low p-values, and large
standard deviates). These results indicate strong clustering, and it is
extremely unlikely (less than 1%) that these clustered patterns could be the
result of random chance. The one-step method appears to be even more clustered
(a larger Moran’s I statistic, 0.567 versus about 0.48) than the two-step
methods. This lack of independence violates a major assumption of standard
ANOVA. Spatial ANOVA is an alternative test method, though not widely used or
well documented, that can take non-independence or spatial autocorrelation into
consideration. This method is currently being researched and results will be
available soon.
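To make the statistic behind this note concrete, Moran's I can be computed directly from its definition. The sketch below does so in base R for a hypothetical 6 x 6 lattice with a built-in spatial trend, and it judges significance with a simple permutation test. This is an illustrative substitute, not the analytical randomisation test reported above, and none of the names or values come from the report.

```r
set.seed(3)  # hypothetical lattice and values; NOT the report's data
n_side <- 6
n <- n_side^2
cell <- expand.grid(row = seq_len(n_side), col = seq_len(n_side))

# Queen's case adjacency, then row-standardised weights.
adj <- (pmax(abs(outer(cell$row, cell$row, "-")),
             abs(outer(cell$col, cell$col, "-"))) == 1) * 1
w <- adj / rowSums(adj)

# Values with a smooth trend across columns plus noise, so that
# neighbouring cells tend to resemble each other.
x <- cell$col + rnorm(n, sd = 0.5)

# Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x).
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

observed <- morans_i(x, w)

# Permutation reference distribution: shuffling the values across cells
# breaks any spatial pattern while keeping the same set of values.
permuted <- replicate(999, morans_i(sample(x), w))
p_perm <- (sum(permuted >= observed) + 1) / (999 + 1)

print(round(observed, 3))
print(p_perm)
```

A clearly positive statistic with a small permutation p-value is the toy counterpart of the clustering reported for the accessibility results, and it illustrates why such values cannot be treated as independent observations in a standard ANOVA.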
Spatial ANOVA (one-way) – currently
being prepared!
Note:
Two-Step Exponential Method Compared with One-Step Exponential Method
currently being prepared!
Two-Step Power Method Compared with One-Step Power Method
currently being prepared!
Two-Step Gaussian Method Compared with One-Step Gaussian Method
currently being prepared!
Summary of Results
currently being prepared!
Larry Spear
Sr. Research Scientist (Ret.)
Division of Government Research
University of New Mexico