
MrMobster t1_j3lcilx wrote

There are some claims that sounds in languages adapt to environmental conditions (such as the development of sounds that carry over long distances in rural areas, or the paper on tones and humidity mentioned in another post), but I personally remain unconvinced. In my opinion, these results are obtained with too much statistical hand-waving and too little actual substance. I've spent quite some time discussing this back and forth with the authors of the climate and tonal languages paper, but hey, science is all about differences of opinion :)

Djinn_and_juice OP t1_j3lrh0t wrote

Do you believe there is no link at all, or just nothing that warrants discussion without that substance?

sjiveru t1_j3o8z6y wrote

Not OP, but the general opinion in linguistics is, I think, fairly well reflected by this Language Log post from a good decade ago, written in response to a paper about high altitudes correlating with ejectives:

> Still, the (presumably) spurious correlations of the two word-order variables with altitude remind us of the possibility for false findings here. (...)

> Whether or not the altitude/ejective correlation reveals a causal connection, we can expect the near future to bring us a large number of spurious correlational analyses, along with a few meaningful ones. There are three reasons for this:

> (1) The existence of digital datasets makes it increasingly easy to perform quantitative checks on hypotheses about possible relationships between linguistic and non-linguistic variables;

> (2) The astronomically large number of such possible relationships guarantees that many of them should exhibit a strong pair-wise connection by chance, even if all of the distributions were statistically independent;

> (3) The distributions are not statistically independent, due to factors such as cultural and geographical diffusion.

> Note that the "file drawer effect" strongly undermines the often-made argument "But I/we made the hypothesis before we checked, we didn't just dredge for correlations and then try to explain them". The data-dredging (and the associated multiple comparisons) can (and do) occur across many unconnected investigations, with only the "significant" ones getting published.
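
Point (2) can be illustrated with a small simulation. This is a hypothetical sketch, not anything from the Language Log post: it assumes 200 "languages", one random "environmental" variable and 1,000 random "linguistic" features, all normally distributed and independent by construction, and it approximates the critical correlation for a two-sided p < 0.05 with Fisher's z-transform. The helper names (`pearson_r`, `critical_r`, `count_chance_hits`) are made up for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, z=1.96):
    """Approximate |r| threshold for two-sided p < 0.05 via Fisher's z-transform."""
    return math.tanh(z / math.sqrt(n - 3))


def count_chance_hits(n_languages=200, n_features=1000, seed=0):
    """Correlate one random 'environmental' variable with many independent
    random 'linguistic' features and count how many look significant."""
    rng = random.Random(seed)
    env = [rng.gauss(0, 1) for _ in range(n_languages)]
    threshold = critical_r(n_languages)
    hits = 0
    for _ in range(n_features):
        # Each feature is generated independently of env, so any "hit" is chance.
        feature = [rng.gauss(0, 1) for _ in range(n_languages)]
        if abs(pearson_r(env, feature)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_chance_hits()
    print(f"{hits} of 1000 unrelated features look 'significant' at p < 0.05")
```

Although nothing in the simulated data is actually related, roughly 5% of the features (on the order of 50 out of 1,000) should still clear the threshold. That is the multiple-comparisons problem in miniature, before the file drawer effect hides the failed attempts.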

In short, such correspondences aren't impossible, but it takes a lot of effort to show that they're not just random coincidences. Languages are incredibly complex systems and aren't independent of each other, which makes them extremely difficult to do statistics on. Personally, I think that for a lot of purposes (including these), the set of all human languages isn't a large enough sample: the systems are too complex and too interrelated for only about seven thousand data points to be anywhere near enough to show clear trends above the background noise.
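
The point about languages not being independent (reason (3) in the quote) can be sketched the same way. This is again a hypothetical illustration under stated assumptions: languages are grouped into equal-sized "families", each language inherits its family's values for both an environmental variable and a linguistic feature plus a little individual noise, the two variables are unrelated by construction, and the test naively treats every language as an independent data point. The function names are invented for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(n_families, per_family, within_sd, trials=200, seed=1):
    """Fraction of trials in which two unrelated variables look correlated
    at a nominal two-sided p < 0.05 (Fisher z approximation)."""
    rng = random.Random(seed)
    n = n_families * per_family
    # Naive threshold: treats all n languages as independent observations.
    threshold = math.tanh(1.96 / math.sqrt(n - 3))
    hits = 0
    for _ in range(trials):
        env, feature = [], []
        for _ in range(n_families):
            fam_env = rng.gauss(0, 1)   # family-level (inherited/areal) environment value
            fam_feat = rng.gauss(0, 1)  # family-level feature, independent of fam_env
            for _ in range(per_family):
                env.append(fam_env + rng.gauss(0, within_sd))
                feature.append(fam_feat + rng.gauss(0, within_sd))
        if abs(pearson_r(env, feature)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("independent:", false_positive_rate(200, 1, within_sd=1.0))
    print("clustered:  ", false_positive_rate(20, 10, within_sd=0.3))
```

With one language per family, the false-positive rate should sit near the nominal 5%. With 20 families of 10 closely related languages it comes out many times higher, because the effective sample is closer to 20 families than to 200 languages, even though the naive test counts all 200.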
