terrykrohe

terrykrohe OP t1_jde845j wrote

Keats, "Ode on a Grecian Urn" –

Thou shalt remain, ... a friend to man, to whom thou say'st,
"Beauty is truth, truth beauty,
– that is all Ye know on earth, and all ye need to know."

... as for "Nor easily understood"
I agree, I do not understand how police killings and suicides are correlated:

The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. He to whom the emotion is a stranger, who can no longer pause to wonder and stand wrapped in awe, is as good as dead; his eyes are closed.

−7

terrykrohe OP t1_jde6diq wrote

Purpose

Police killings and suicides, it seems, are different spheres of activity and would/should have NO connection/correlation (excepting suicide-by-cop). But the data says there IS correlation.

That is, why would a state with a high suicide rate also have a high police killing rate?
Conversely, why would a state with a low suicide rate also have a low police killing rate?

How? Why?

0

terrykrohe OP t1_jde3aja wrote

1
top, left: shows the fifty states separated by their 2020 election vote (red/Rep and blue/Dem); the source data is worked up to determine the police killings per 100,000 pop and tabulated; the ranked table is visually presented ... identifying the states is not important: the importance is in the non-random, top/bottom pattern of the data
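The "worked up" step can be sketched as follows. This is a minimal Python illustration, not the Mathematica code behind the plots; the state names, counts, and populations are hypothetical placeholders, and `rate_per_100k` is a name made up for this example.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Events per 100,000 residents."""
    return count / population * 100_000

# Hypothetical (state, killings, population) rows; NOT the source figures.
rows = [("A", 12, 1_200_000), ("B", 30, 6_000_000), ("C", 9, 450_000)]

# Rank states from the highest rate to the lowest, as in the ranked table.
ranked = sorted(
    ((name, rate_per_100k(killings, pop)) for name, killings, pop in rows),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, rate in ranked:
    print(f"{name}: {rate:.2f} per 100,000")
```

Ranking by rate rather than by raw count is what allows small and large states to share one table.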

2
top, right: a visualization of the source data for suicides per 100,000 pop

3
– the means and standard deviations of the Rep and Dem data are represented by the dashed lines and the shaded boxes
– the t-test compares means of Sample populations: low t-test p-values indicate that the difference between the means is unlikely to be due to random data fluctuations
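A rough sketch of the quantity behind a two-sample t-test, in Python with only the standard library. It assumes the unequal-variance (Welch) form, which may not match what the Mathematica call used, and the two samples are hypothetical numbers. It stops at the t statistic and degrees of freedom; turning those into a p-value needs the t-distribution CDF, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-100,000 rates for two groups of states (assumption).
rep = [18.1, 20.4, 22.0, 19.5, 24.3]
dem = [12.2, 14.0, 11.5, 15.1, 13.3]

t, df = welch_t(rep, dem)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large |t| for the given degrees of freedom corresponds to a small p-value, which is the number reported on the plots.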

4
the bottom plot
– plots the (suicide, police killings) coordinates for each state
– a best fit line is determined for the Rep and Dem coordinates
– the Pearson correlation measures how closely the data follow the best-fit line (0.81 and 0.75 are strong correlations)
– the "P-value" is the probability of seeing an "r-value" at least this large from random fluctuation alone, if there were no real correlation
– smaller P-values indicate less random character of Sample data
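The best-fit line and the Pearson r-value described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Mathematica workflow: `fit_and_correlate` is a name invented here, an ordinary least-squares fit is assumed, and the coordinates are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_and_correlate(xs, ys):
    """Return (slope, intercept, Pearson r) for paired data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx              # least-squares slope
    intercept = my - slope * mx    # line passes through the means
    r = sxy / sqrt(sxx * syy)      # Pearson correlation
    return slope, intercept, r

# Hypothetical (suicide rate, police killing rate) pairs (assumption).
suicide = [10.0, 12.5, 15.0, 17.5, 21.0]
killings = [0.20, 0.28, 0.31, 0.40, 0.47]

slope, intercept, r = fit_and_correlate(suicide, killings)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.3f}")
```

Running the function once on the Rep coordinates and once on the Dem coordinates would give the two lines and the two r-values.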

5
the "impact value" uses the r-value and the P-value to quantify the data fit
– just looking at the plot coordinates, the best fit lines, and the Pearson values ... it is (for me) hard to see that the Dem correlation is so much stronger than the Rep correlation; but the impact value informs me so.

1

terrykrohe OP t1_jddyrpe wrote

other comments for "police killings VS suicide"

1
... there is a non-random, top/bottom, RepDem pattern in the "police killings" data and in the suicide data.
This pattern has been seen in previous posts: GDP (posted 06May2021), state+local ed spending (posted 20May2021), suicide rate (13May2021), state taxes (posted 17Jun2021), opioid dispensing rate (01Jul2021), life expectancy (29Jul2021), infant mortality (05Aug2021), incarceration rate (posted 19Aug2021)
... only 'missing persons' (posted 28Oct221) showed randomness
... drug overdose deaths (posted 23Feb2023) was 50/50 random/non-random probability

... always, the Rep states were on the negative side of the metric: less GDP, less ed spending, more suicides, lower state taxes, more opioids dispensed, shorter life expectancy, more infant mortality, higher incarceration rates.

2
Suicides correlate strongly with gun ownership (posted 02Mar2023)
– Rep correlation, impact value = 42,100; Dem correlation, impact value = 27,500
now, suicides correlate more strongly with police killings than with gun ownership
– Rep correlation, impact value = 71,600; Dem correlation, impact value = 962,800
(... curious that Dem states have a much stronger police killings/suicide correlation than Rep states)

3
There is a "rationale" that 'explains' the suicide/gun ownership correlation: gun availability.
What rationale would explain the police killings VS suicide data?

(" a coincidence"? nah, "... there are no coincidences," says Detective)

1

terrykrohe OP t1_jddxmf3 wrote

sources

police killings, US
https://www.washingtonpost.com/graphics/investigations/police-shootings-database/
suicide rate
https://www.cdc.gov/nchs/pressroom/sosmap/suicide-mortality/suicide.htm

tool: Mathematica

***************

– the dashed lines are the means; the 'boxes' are ± one standard deviation (SD) from the mean
– the parenthetical percent is the "relative standard deviation" (RSD)
– for the bottom plot
...red/blue lines represent the 'best-fit' through the Rep/Dem states' data points; the states' coordinate points are colored according to the 2020 Electoral College vote
– the ellipses are centered on the Rep/Dem means; the standard deviations are represented by the ellipses' axes
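One way to read the ellipse description is sketched below in Python. It assumes axis-aligned ellipses whose semi-axes equal one SD in each direction; the plots may use a different scaling. The function name `sd_ellipse` and the means and SDs are hypothetical.

```python
from math import cos, sin, tau

def sd_ellipse(mean_x, mean_y, sd_x, sd_y, n=64):
    """Points on an axis-aligned ellipse centered on the means, SD semi-axes."""
    return [
        (mean_x + sd_x * cos(tau * k / n), mean_y + sd_y * sin(tau * k / n))
        for k in range(n)
    ]

# Hypothetical means and SDs for one group of states (assumption).
outline = sd_ellipse(14.0, 0.35, 3.0, 0.10)
print(outline[0])  # rightmost point: mean_x + sd_x, mean_y
```

Plotting the list of points as a closed line gives the outline drawn around each group's mean.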

"Statistic" is the "Pearson r-value"
the r-value is a measure of the "strength" of the correlation;
the p-value is the probability of seeing an r-value at least this large from random fluctuations of the data alone (that is, a small p-value would characterize non-random data)

1

terrykrohe OP t1_j9um4po wrote

... yeah, there is the thought that 2020 was unique (some 'causative' event) and that, therefore, the data is just 'coincidental' because the non-random, top/bottom Rep/Dem pattern would not be so noticeable

BUT

Mencken did a ranking of the states in 1930 using multiple data sources and multiple metrics, Politico did so in 2014, and there was a 2022 International Economic Review paper doing a 'well-ness' survey ––– all with similar results showing the pattern (05Jan2023 post).

Ninety years, different investigators, different (mostly) metrics: same results ...

... go figure

I think you are right: "probably political" ... finding the metrics which would define "political" ... now there is a quest

2

terrykrohe OP t1_j9ua7t2 wrote

"trick people"

"tricks" ... successful tricks create an illusion of fairness: for example, a card trick requires the illusion that a "fair" deck is "fairly" shuffled and "fairly" dealt.

This post presents four data metrics. The tabular data is presented visually. The plot of missing persons illustrates definite random distribution; the plot of drug overdose deaths shows a 50/50 maybe yes/maybe no "fair" deal; the suicide and life expectancy plots definitely show a top/bottom distribution. The means and SDs quantify the random/non-random character of the deals.

The random/nonrandom, top/bottom, Rep/Dem pattern is mysterious; especially as it is repeated for other data metrics; e.g. "obesity, suicide, infant mortality, accidental deaths, incarceration rate, murder rate, violent crime, etc.".

The point: except for missing persons and (likely) drug overdose deaths, the data is being unfairly shuffled and dealt (assuming a fair deck). Who is this Trickster? Twain's "Mysterious Stranger"? or is it Maxwell's Demon operating politically?

I don't think so: the deck is not a fair deck: the deck is "stacked". The data is evidence of Systemic Bias, not the work of a Trickster presenting an illusion of fairness....

the illusion of fairness is a delusion:
The majority of men prefer delusion to truth. It [delusion] soothes. It is easy to grasp. Above all, it fits more snugly than the truth into a universe of false appearances – of complex and irrational phenomena, defectively grasped.
H.L. Mencken

3

terrykrohe OP t1_j9s8jj1 wrote

other comments for "missing persons and drug overdose death rate"
(compared with suicide rate and life expectancy)
1 "compare and contrast"
... the top two plots show random data: missing persons t-test = 0.96; overdose death t-test = 0.46. Note the SD overlaps.
... the bottom two plots show non-random data. Note the smaller t-test p-values.
What is the same about the top two? What is the same about the bottom two?
What is it about the top two that makes them different from the bottom two?
2
... the curious aspect: the top two are "atypical" because of the greater "random" character of the data.
(Other data sets showing similar atypicality have not been found.)
and the bottom two are "typical" of other non-random, top/bottom, Rep/Dem data sets:
obesity, suicide, infant mortality, accidental deaths, incarceration rate, murder rate, violent crime, etc.
(summary of "typical" metrics posted 14Apr2022)
3
– the difference between "random" and "non-random" data is Systemic Bias
– Systemic Bias is either genetic or environmental
– How did 150 million voters separate the fifty states into the two distinct non-random, top/bottom, Rep/Dem groupings which exhibit quantifiable different character?

2

terrykrohe OP t1_j9s8334 wrote

sources

missing persons
https://namus.nij.ojp.gov
drug overdose death rate
https://www.cdc.gov/drugoverdose/deaths/2020.html
suicide rate
https://www.cdc.gov/nchs/pressroom/sosmap/suicide-mortality/suicide.htm
life expectancy https://www.cdc.gov/nchs/pressroom/sosmap/life_expectancy/life_expectancy.htm

tool: Mathematica

***************

– the dashed lines are the means; the 'boxes' are ± one standard deviation (SD) from the mean
– the parenthetical percent is the "relative standard deviation" (RSD)

3

terrykrohe OP t1_j66kodi wrote

... yeah, the t-test is analysis

It validates the separation of Rep and Dem states into distinct Sample populations; which permits the bottom plot: correlation of ed spending vs evangelical %, considered for Rep and Dem states separately.

(that "NO" violated the "avoid absolutes" dictum)

0

terrykrohe OP t1_j65asln wrote

1
I do not think that the "lay person" has trouble understanding the presentation:
i) Dem state residents spend $300 more per person on education than do Rep state residents
ii) Rep states are more evangelical than are Dem states
iii) for both Dem and Rep states, as the evangelical % increases, the state+local ed spending decreases

2
I do not think that the "lay person" mis-understands why a state is labelled Rep or Dem (note the "2020 election" in title)

3
I do not think that the "lay person" cares about the t-test reporting (the issue is a "tempest in a tea-pot"). I have never had a non-"lay person" ask if the t-test is the statistic or the p-value; the Mathematica documentation notes: "By default, a probability value or p-value is returned."

4
The data is a visualization of tabular data presented by the source. The data is visualized using the Mathematica function "ListPlot":
https://reference.wolfram.com/language/ref/ListPlot.html

5
... you object to the "grouping at the state level": it is the way that the source presents the data.

6
There is NO analysis being done here; just data presentation. Inferences are the Reader's prerogative.

0

terrykrohe OP t1_j643917 wrote

"Is every data point a state?"
50 states = 50 plot points

"Are you classifying a state as "Democrat" or "Republican" based on majority vote for president in some election?"
red = Rep states in 2020 election
blue = Dem states in 2020 election

"Is that the t-statistic or the p-value?"
t-tests are usually reported using the p-value

"And why doesn't the top right graph have a number, especially when it looks more likely to have a statistical difference?"
... the t-test is sensitive to small mean variations: the top right plot shows the means separated by a SD, which is NOT a small difference (t-test = 0.000015).

1

terrykrohe OP t1_j62imh0 wrote

best-fit lines, correlations: state+local ed spending VS evangelical

Purpose
In order to 'understand' the non-random, top/bottom, Rep/Dem differentiation of metric values, eight "response" metrics are correlated with three "predictor" metrics. This post presents the 'response' variable state+local ed spending vs the evangelical 'predictor' metric....
the eight "response" metrics: GDP, state taxes; suicide rate, opioids; life expectancy, infant mortality; incarceration, state+local ed spending
... the three "predictor" metrics: 'rural-urban', evangelical, diversity*

the "big picture"
i) There is a non-random, top/bottom, Dem/Rep pattern. Patterns have reasons/causes and are mathematical.
ii) Rep states are always on the negative side (less GDP, more suicides, lower life expectancy, etc).
iii) How did 150 million voters, acting individually, separate the fifty states into two such disparate groups?
iv) is there a "predictive" metric or combination of metrics which can be used to explain the characteristic Rep/Dem differences seen in the data?

other comments
i) the t-test value of 0.10 for EdSpending is the largest t-test value of the eight Response metrics – indicating that the data has a 10% probability that the sample means represent the same Population
ii) the Ed Spending metric shows 'typical' Response to evangelical Predictor: increasing evangelical population correlates with decreased ed spending
iii) however, the impact values are small – indicating that the Evangelical-EdSpending relationship is not important
iv) that curious Rep state with the smallest evangelical population? Utah

3

terrykrohe OP t1_j62i7ct wrote

sources

state+local ed spending
https://www.usgovernmentspending.com/compare_state_spending_2019b20a#copypaste
evangelical population
https://www.pewforum.org/religious-landscape-study/religious-tradition/evangelical-protestant/
tool: Mathematica

***************

top two plots:
– dashed lines are the mean values; the 'boxes' show one standard deviation from the mean
– "3400 ± 630 (18%)" represents mean ± 1 SD (relative SD); "relative SD" = SD/mean
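The "mean ± 1 SD (relative SD)" label can be reproduced with a short Python sketch. It assumes the sample standard deviation (n − 1 denominator); whether the plots use the sample or the population form is not stated. The function names and the spending figures are hypothetical.

```python
from statistics import mean, stdev

def summarize(values):
    """Return (mean, sample SD, relative SD = SD / mean)."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator), an assumption
    return m, sd, sd / m

def label(values):
    """Format a summary in the 'mean ± SD (RSD%)' style used on the plots."""
    m, sd, rsd = summarize(values)
    return f"{m:.0f} ± {sd:.0f} ({rsd:.0%})"

# Hypothetical per-person spending values for a group of states (assumption).
spending = [2900, 3100, 3400, 3600, 4000]
print(label(spending))
```

The relative SD makes spreads comparable across metrics with very different units, such as dollars and deaths per 100,000.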

bottom plot:
– the ellipses are centered on the Rep/Dem means; the standard deviations are represented by the ellipses' axes
– the 50 plot points represent the (evangelical, state+local ed spending) coordinates for each state; and are colored according to their 2020 Electoral College vote
– "r" is the Pearson correlation value
– the lines are the 'best-fit' lines thru the Dem and Rep data

3

terrykrohe OP t1_j55yef5 wrote

... "politics" I used to brush aside as "he said, she said" discussions; then I noticed in 2016, in a table of "obesity" data, that the most obese states were Rep. In 2020 the observation was repeated (14 of most obese states were Rep, posted 29Apr21).

... really?! it should be 'random' ... what about other metrics?

... other metrics: NOT random (summary post, 14Apr2022). suicide rate, incarceration, ed spending, life expectancy, infant mortality, accidental deaths, GDP, state taxes, gun ownership, murder rate, violent crime, and others were NOT random.

I suspect that the reason for the Systemic Bias of the data is not a matter of "opinion".

**********

Max Boot has suggested, WaPost, 26Oct2022:
There are many reasons, from history to geography, why per capita GDP in the United Kingdom ($47,334) is so much higher than in Russia ($12,172) or China ($12,556), but I would argue it ultimately comes down to governance. Britain, as a liberal democracy, has long been run for the benefit of its people, while Russia and China have always been run primarily for the benefit of their rulers.

I think that he is correct: in this case, Rep governance ("rulers" = "of the few, by the few, for the few") is the reason for the data differentiation.

2

terrykrohe OP t1_j53uvnd wrote

other comments: heart disease mortality; with GDP and life expectancy

i) the overall pattern is of interest: a non-random, top/bottom Rep/Dem pattern is seen
ii) the Rep/Dem means are a SD apart: the t-test indicates systemic bias or different Sample populations
iii) And ... there are the questions:
Why are the Republican states always on the 'negative' side of the metrics? (excepting 'missing persons')
and
... how did 150 million voters separate the fifty states in such a distinct manner?
iv) The systemic bias: is it genetic in origin? or is it a consequence of environmental factors?
v) Which raises the question: what would be the environmental factors which result in the differentiated data?

1

terrykrohe OP t1_j53up5x wrote

sources:

2020 heart disease mortality
https://www.cdc.gov/nchs/pressroom/sosmap/heart_disease_mortality/heart_disease.htm
2019 GDP
https://apps.bea.gov/regional/downloadzip.cfm
2018 life expectancy
https://www.cdc.gov/nchs/pressroom/sosmap/life_expectancy/life_expectancy.htm
tool: Mathematica

***************

– the dashed lines are the means; the 'boxes' are ± one standard deviation (SD) from the mean
– the parenthetical percent is the "relative standard deviation" (RSD)

1