earnest_dad

earnest_dad OP t1_isxvvrx wrote

Sorry to disappoint. In case it eases the sorrow, there were 5 babies in 2007 named "Lilylynn", which is similarly difficult to say. But I agree -- it's not quite as exciting as the much-discussed but never actually documented "Lilly-lee".

1

earnest_dad OP t1_isxvmrp wrote

Interesting question! (Similarly, I'm more familiar with "ara" prefixes; from my identification strategy, though, "Ara" isn't a common standalone name in the way "Ada" is).

As it turns out, this isn't a bug --

"Adalynn" has been in use (with n>5 instances) since 1996, and has really been on the rise since ~2007. In 2017, there were 2,651 female sex babies given the name "Adalynn".

"Adamary" isn't given as frequently as "Adalynn", but we're seeing its use along a similar timeline -- the name first crossed the (n>5) threshold in 1998, and it has been used at least that many times every year since. While its usage has declined recently, in the early 2000s it was typically given ~40 times per year.

Similar stories with "Adabelle" and "Adabella", though the timelines are different. "Adabell" is a *much* older name: it was given to a handful of female sex babies starting in the early 20th century. We see n>5 uses quite frequently from 1900 to 1931, then it falls off the radar until 2006.

"Adabella" looks more like "Adamary" -- it wasn't really in vogue (if you can even say that about a VERY rare name) until 2008.

1

earnest_dad OP t1_isxugk0 wrote

I think you may be onto an interesting piece of this, but I do want to be precise about how the names here are identified (this is described carefully in the source / tools comment above). Note that syllable count isn't a feature here.

I (personally) think it's interesting to identify names that can be sub-divided into standalone names. There's some complexity around whether to include names that can technically be subdivided but that we don't think of as compound names. As an example that generated a lot of discussion in a previous version, think about something like "Elizabeth": it can technically be subdivided into the standalone names "Eliza" and "Beth", but we don't really think of it as a compound name in the sense you describe.

I think you raise an interesting question that gets at what the central interpretation of the plot is; there's a question here about whether this strategy maps cleanly onto true compound names, and I think you're right that there are some we'd want to hand-edit (or otherwise identify) if that's the main goal. To me, it's a tricky thing to decide.

1

earnest_dad OP t1_isxtgkg wrote

Can you say a bit more about the concern here? The code I used does the following:

(1) identifies the "maximal proportion" as the greatest share (of all female names in that year) a name receives in any year. Note that these maximal proportions are quite small -- the greatest value represented here is "Joanne" with 0.00420; the smallest values in this chart are less than 10^-5.

(2) converts to "1 name per..." by finding 1/maximal proportion. Note that by this measure, "Joanne" is roughly 1 per 238 names; the very uncommon names (e.g. "Lilylynn") are roughly 1 per 400,000.

(3) uses a log-scale gradient to plot
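
To make the arithmetic in steps (1) to (3) concrete, here is a minimal Python sketch. It is an illustration only, not the R code behind the plot: the counts are toy values chosen so that "Joanne" peaks at a share of 0.0042, and the function names (`max_proportion`, `one_per`, `log_scale_value`) are hypothetical.

```python
import math

# Toy counts, NOT real SSA data: {year: {name: number of female babies}}.
# "Other" stands in for every other female name given that year.
toy_counts = {
    1950: {"Joanne": 42, "Other": 9958},  # share = 42 / 10000 = 0.0042
    1990: {"Joanne": 5, "Other": 9995},   # share = 5 / 10000 = 0.0005
}


def max_proportion(counts_by_year, name):
    """Step (1): greatest share of all female names the name reaches in any year."""
    best = 0.0
    for names in counts_by_year.values():
        total = sum(names.values())
        if total:
            best = max(best, names.get(name, 0) / total)
    return best


def one_per(proportion):
    """Step (2): express a proportion as '1 name per N names'."""
    return 1.0 / proportion


def log_scale_value(proportion):
    """Step (3): the value a log10 colour gradient would be driven by."""
    return math.log10(one_per(proportion))


joanne_share = max_proportion(toy_counts, "Joanne")
joanne_one_per = one_per(joanne_share)  # roughly 238, as in the comment above
```

On a log scale, the gap between roughly 1 per 238 and roughly 1 per 400,000 shrinks to a little over three units of log10, which is why the gradient stays readable across both common and very rare names.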

2

earnest_dad OP t1_issyqyv wrote

These are *excellent* questions. With the data we have, it's much easier to examine the first question you posed. It seems totally doable to create an indicator for whether a name is a combination (like these), and look at the proportion of all names that satisfy this property over time. I'm guessing you're right that there's some regional variation, but unfortunately the babynames library I used doesn't connect names to distinct geographies. Would be very cool to examine that, though!
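
A rough sketch of what that indicator could look like, in Python rather than the R used for the plot. Everything here is an assumption for illustration: the toy counts, the tiny standalone-name set, the helper names, and the choice to weight the proportion by number of babies rather than by distinct names.

```python
def is_combination(name, standalone):
    """True if the name splits into two standalone names (e.g. ada + lynn)."""
    lower = name.lower()
    return any(
        lower[:i] in standalone and lower[i:] in standalone
        for i in range(1, len(lower))
    )


def combined_share_by_year(counts_by_year, standalone):
    """Share of babies per year whose name is a combination of standalone names."""
    shares = {}
    for year, names in counts_by_year.items():
        total = sum(names.values())
        combined = sum(
            n for name, n in names.items() if is_combination(name, standalone)
        )
        shares[year] = combined / total if total else 0.0
    return shares


# Toy inputs, NOT real SSA data.
toy_standalone = {"ada", "lynn", "lily", "mary"}
toy_counts = {
    1996: {"Adalynn": 6, "Mary": 94},
    2017: {"Adalynn": 30, "Lilylynn": 10, "Mary": 60},
}

shares = combined_share_by_year(toy_counts, toy_standalone)
```

Plotting `shares` against year would give the trend over time; a regional breakdown would need a data source that ties names to geography.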

Thanks for the comment!

9

earnest_dad OP t1_isswtbh wrote

Source: babynames library (R package): https://cran.r-project.org/web/packages/babynames/index.html

Note: this package draws data from the US Social Security Administration

Tools used: R

data preprocessing: tidyverse

visualization: ggplot2

Additional notes:

(1) identify "standalone" names by finding top 1000 female names

(2) identify names that are composed of two standalone names combined

(3) identify common "prefix" and "suffix" names by finding the maximum (annual) proportion of names from (2); restrict to instances where log(max frequency) > -8.5

(4) restrict attention to combined names composed of the names from (3)

(5) hand-edit (that is, remove) unusual prefixes and suffixes: (redditors objected to the inclusion of "eliza-beth" and "elisa-beth"; also hand-remove "ina" and "ora")

Note: an earlier draft of this plot did not filter to female names only, and so incidentally included the name "Josue", a male name which is composed of the common female standalone names, "Jo" and "Sue"
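
For readers who want to see how steps (2) to (5) fit together, here is a minimal Python sketch. It is one reading of the notes above, not the R/tidyverse code that produced the plot. Assumptions: step (1) has already produced the standalone set, `log` means the natural log, the maximal proportions are toy values, and all function and variable names are hypothetical.

```python
import math


def find_splits(name, standalone):
    """Step (2): every (prefix, suffix) split where both halves are standalone names."""
    lower = name.lower()
    return [
        (lower[:i], lower[i:])
        for i in range(1, len(lower))
        if lower[:i] in standalone and lower[i:] in standalone
    ]


def common_parts(max_prop, standalone, log_cutoff=-8.5):
    """Step (3): prefixes/suffixes seen in combined names with log(max proportion) > cutoff."""
    prefixes, suffixes = set(), set()
    for name, p in max_prop.items():
        if p > 0 and math.log(p) > log_cutoff:
            for prefix, suffix in find_splits(name, standalone):
                prefixes.add(prefix)
                suffixes.add(suffix)
    return prefixes, suffixes


def compound_names(max_prop, standalone, removed_splits=(), removed_parts=()):
    """Steps (4)-(5): keep names built from common parts, minus hand-removed ones."""
    prefixes, suffixes = common_parts(max_prop, standalone)
    kept = {}
    for name in max_prop:
        for prefix, suffix in find_splits(name, standalone):
            if prefix not in prefixes or suffix not in suffixes:
                continue
            if (prefix, suffix) in removed_splits:
                continue
            if prefix in removed_parts or suffix in removed_parts:
                continue
            kept[name] = (prefix, suffix)
            break
    return kept


# Toy inputs, NOT real SSA data: a tiny standalone set and made-up maximal proportions.
toy_standalone = {"ada", "lynn", "lily", "anne", "jo", "eliza", "beth", "mary"}
toy_max_prop = {
    "Adalynn": 0.0007,
    "Lilyanne": 0.0003,
    "Lilylynn": 0.0000025,
    "Joanne": 0.0042,
    "Elizabeth": 0.02,
    "Mary": 0.05,
}

result = compound_names(
    toy_max_prop,
    toy_standalone,
    removed_splits={("eliza", "beth"), ("elisa", "beth")},
    removed_parts={"ina", "ora"},
)
```

In this toy run "Lilylynn" survives even though it is far below the cutoff itself, because "lily" and "lynn" each qualify as common parts through other names; "Elizabeth" splits cleanly but is dropped by the hand-edit list.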

1