OfficialWireGrind

OfficialWireGrind OP t1_j16ebqn wrote

In this context, origin is not the same thing as originator. I looked up every one of these words, and, in every instance, the references cited usage in the Spanish Language. It could be that the French Language acquired many of them at about the same time and from the same or from another source.

−4

OfficialWireGrind OP t1_j16bc4m wrote

I am sorry that any of this is displeasing. I would fully agree with anyone who claims that many, if not the majority, of the listed words were not invented by people who identify as Spanish speakers. Regarding the chart though, the word "origin" is not intended to mean the inventor of a particular word. The intention is to refer to the most direct source or the source of the most direct parent word. Also, the term "Spanish" is intended to refer to the Spanish Language and not Spain.

2

OfficialWireGrind OP t1_j163136 wrote

The plot shows English words of Spanish origin and the number of times each appears in English Wikipedia.

Sources:

Spanish-origin English words were obtained from Wikipedia's List of English words of Spanish origin

Number of mentions was derived from an analysis of English Wikipedia's database dump

Tools: Python, Matplotlib
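
A minimal sketch of how the counting step could look, assuming the dump has already been reduced to plain text. The function name `count_mentions`, the tokenizing regex, and the sample sentence are illustrative assumptions, not the code used for the plot:

```python
import re
from collections import Counter

def count_mentions(text, words):
    """Count case-insensitive whole-word occurrences of each target word."""
    targets = {w.lower() for w in words}
    # Split the text into runs of letters (Unicode-aware, no digits or underscores).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t in targets)
    # Report every requested word, including those that never appear.
    return {w: counts[w.lower()] for w in words}

# Hypothetical sample text standing in for a slice of the dump.
sample = "The tornado passed the plaza. A second tornado hit the ranch."
mentions = count_mentions(sample, ["tornado", "plaza", "canyon"])
print(mentions)
```

For a full dump, the same function could be applied article by article and the per-article results summed, so that the whole file never has to sit in memory at once.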

−10

OfficialWireGrind t1_j0ibb3p wrote

data.gov is another source. There's also /r/datasets/, which might be a good place to ask this.

For me, sourcing raw data is actually part of the problem, and finding it can involve some amount of creativity. A lot of it is hiding in plain sight, but it has to be realized as such.

2

OfficialWireGrind OP t1_ixwxsfb wrote

One thing I'm seeing is that there aren't a whole lot of songs above 170 words per minute (WPM). The songs "7 Rings," "Roses Imanbek," "The Box," "STAY," "Rich Flex," and "Levitating" are all in the range of 158-165. Kendrick Lamar's "N95" comes in around 210, but among 30 songs I've looked at, it's a bit of an outlier.

Another observation is that the highest WPM is more than twice that of the lowest. Comparing "N95" to Yahritza y Su Esencia's "Soy el Unico" increases the ratio to 3.6, a spread that reflects a relatively broad range of vocal styles.

All of this is a fairly preliminary analysis though. I would imagine that if the input data is selected more thoughtfully, then patterns will emerge in the plot.

6

OfficialWireGrind OP t1_ixwkzkh wrote

The plot shows duration in minutes, total word count, and words per minute for a selection of 15 songs. The plot is done in the style of a topographic map, but with words per minute instead of elevation. The song selection is Spotify's top-5 most streamed songs for the years 2019, 2020, and 2021.

Sources: genius.com, azlyrics.com, musixmatch.com, en.wikipedia.org

Tools: Python, Matplotlib
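
As a rough illustration of the words-per-minute figure, here is a sketch that assumes the lyrics are available as a plain string and the duration in seconds. The helper name `words_per_minute` and the whitespace-based word split are assumptions, and the sample values are made up:

```python
def words_per_minute(lyrics, duration_seconds):
    """Return total words divided by song length in minutes."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Treat any whitespace-separated token as one word.
    word_count = len(lyrics.split())
    return word_count / (duration_seconds / 60.0)

# Hypothetical song: 330 words over 3 minutes.
wpm = words_per_minute("la " * 330, 180)
print(round(wpm, 1))
```

A whitespace split counts ad-libs and repeated choruses as words, so the result depends on how complete the lyric transcription is.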

4

OfficialWireGrind OP t1_ixf7vfe wrote

The bar chart shows how word usage in 2022 compares with that in 2021. Word frequency counts were made using Reddit comment datasets from Pushshift. All posts were made during June of 2022 and June of 2021. Each percentage indicates the change in a word's absolute count (after adjusting counts to reflect datasets of slightly different sizes). The bar chart was made with Python and Matplotlib.
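
One way the size adjustment could work, sketched under the assumption that each dataset's total word count is known: scale the 2022 count to the size of the 2021 dataset, then take the percentage change. The function name `percent_change` and the numbers below are hypothetical:

```python
def percent_change(count_new, total_new, count_old, total_old):
    """Percentage change in a word's count after rescaling the newer
    count to the size of the older dataset."""
    if count_old == 0 or total_new == 0:
        raise ValueError("count_old and total_new must be nonzero")
    # Rescale so both counts refer to datasets of equal size.
    scaled_new = count_new * (total_old / total_new)
    return 100.0 * (scaled_new - count_old) / count_old

# Hypothetical counts: the 2022 dataset is 10% larger than the 2021 one.
change = percent_change(count_new=1320, total_new=1_100_000,
                        count_old=1000, total_old=1_000_000)
print(round(change, 1))
```

Without the rescaling, the raw counts here would suggest a 32% rise, part of which only reflects the larger dataset.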

17