Submitted by N3XT191 t3_ycb3np in dataisbeautiful
Comments
draypresct t1_itl53tp wrote
Nice! Interesting to see the range of fantasy novel lengths.
I wonder how many of the pages in the historical books come from the reference section.
N3XT191 OP t1_itl5fgm wrote
Yeah, most fantasy books are in the 400-700 range but at the lower end I’ve got a couple novellas and at the upper end I’ve got Sanderson, Tad Williams and Steven Erikson 😅
The “Historical” section is actual “Historical Fiction” (bad label I just realized), so no reference. The “History” genre is probably about 15% references on average.
N3XT191 OP t1_itl5ktz wrote
Nope, not normalized. There’s definitely quite some range in word count / page, I know that The Power Broker is about 15% longer (by word count) than Oathbringer but it has 100 fewer pages.
The very short novellas in particular sometimes have only half as many words/page, or even fewer.
Sadly word count data is not nicely available…
draypresct t1_itl6gyd wrote
Whups! I should have spotted history/historical. Thanks!
N3XT191 OP t1_itl6qzm wrote
I’ve got them grouped into fiction and non-fiction internally, so in my database it does make sense (:
stone_chestnut t1_itlc1x6 wrote
That's good! I think you could go even further with some statistical tests, like ANOVA for instance. It can give you a more rigorous way to test for relationships between book genres.
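For anyone curious what ANOVA would compute here, below is a minimal sketch of the one-way ANOVA F statistic using only the standard library. The function name `one_way_anova_f` and the page counts are made up for illustration and are not from the OP's data. A large F only suggests that the group means differ; turning it into a p-value needs the F distribution (for example from SciPy), which this sketch leaves out.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups (lists of numbers)."""
    k = len(groups)                          # number of groups (genres)
    n_total = sum(len(g) for g in groups)    # total number of observations (books)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of the values around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up page counts for three hypothetical genres.
example_groups = [[400, 500, 600], [450, 550, 650], [700, 800, 900]]
f_stat = one_way_anova_f(example_groups)
```

The function needs at least two groups and some variation inside the groups, otherwise one of the divisions is by zero.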
doesnothingtohirt t1_itlcv11 wrote
I like the way you show median and mean.
N3XT191 OP t1_itlddjr wrote
But the mean isn’t even shown? 😅
Box plots show 25th, 50th (median) and 75th percentile plus the full range (minus outliers)
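As a rough sketch of what those box plot numbers are, the snippet below computes the quartiles and the whisker ends with the standard library. It assumes the common convention (also Matplotlib's default) that whiskers reach to the most extreme data point within 1.5 times the interquartile range of the box. The function name `box_plot_summary` and the page counts are made up for illustration.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Return quartiles, whisker ends and outliers for a list of at least two numbers."""
    data = sorted(values)
    # 25th, 50th and 75th percentiles, interpolating linearly between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme data points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up page counts: five ordinary books and one very long one.
summary = box_plot_summary([100, 200, 300, 400, 500, 1200])
```

In this example the 1200-page book falls outside the upper fence, so it is reported as an outlier and the upper whisker stops at 500.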
doesnothingtohirt t1_itldha3 wrote
Oh thanks for clarifying that.
N3XT191 OP t1_itlf3jp wrote
That would be interesting, but the data is inherently biased by my selection and only 3 genres have enough numbers for conclusions anyway.
So actual conclusions would be very hard to make!
PFhelpmePlan t1_itm9006 wrote
Any chance you could share your code for doing the boxplots with the individual data points included like that?
N3XT191 OP t1_itm9t4u wrote
Sure: https://pastebin.com/raw/kd1WgRza
The data file is just a CSV with pagecount,genre_id.
I start by creating filtered_pagecounts, which is just a list of genres, each genre being a list of y-values.
Add some random x-offsets (line 36) and then plot 1 scatter plot per genre and the box plot on top.
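For readers who can't open the pastebin, here is a minimal sketch of the approach described above. It is not the OP's actual script: the helper names `load_pagecounts`, `jittered_x` and `plot_genres`, the sample CSV and the output file name are all assumptions for illustration. Only the `pagecount,genre_id` column layout and the name `filtered_pagecounts` come from the description.

```python
import csv
import io
import random
from collections import defaultdict

# Made-up sample data in the described `pagecount,genre_id` layout.
SAMPLE_CSV = "pagecount,genre_id\n450,1\n700,1\n520,1\n120,2\n300,2\n"


def load_pagecounts(csv_file):
    """Group page counts by genre_id; returns (genre_ids, list of lists of page counts)."""
    by_genre = defaultdict(list)
    for row in csv.reader(csv_file):
        if len(row) < 2 or not row[0].strip().isdigit():
            continue  # skip blank lines and a header row, if there is one
        by_genre[row[1].strip()].append(int(row[0]))
    genre_ids = sorted(by_genre)
    return genre_ids, [by_genre[g] for g in genre_ids]


def jittered_x(position, count, spread=0.15, rng=random):
    """Return `count` x-values scattered uniformly around `position`."""
    return [position + rng.uniform(-spread, spread) for _ in range(count)]


def plot_genres(genre_ids, filtered_pagecounts, out_path="pagecounts.png"):
    """Draw one jittered scatter per genre with the box plots on top."""
    # Imported here so the data helpers above also work without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for i, pages in enumerate(filtered_pagecounts, start=1):
        ax.scatter(jittered_x(i, len(pages)), pages, s=12, alpha=0.5, zorder=1)
    # Box plots sit at x = 1..N by default; fliers are hidden because the
    # scatter already shows every individual book.
    ax.boxplot(filtered_pagecounts, showfliers=False, zorder=2)
    ax.set_xticks(range(1, len(genre_ids) + 1))
    ax.set_xticklabels(genre_ids)
    ax.set_ylabel("Pages")
    fig.savefig(out_path)
```

Usage would look like `genre_ids, filtered_pagecounts = load_pagecounts(open("books.csv", newline=""))` followed by `plot_genres(genre_ids, filtered_pagecounts)`, where `books.csv` is a hypothetical file name.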
Important_Ice_1080 t1_itmiwpy wrote
What’s the 750 page Sci-fi? I like long hard reads daddy.
N3XT191 OP t1_itmj8sz wrote
Death's End: 736
The Relentless Moon: 704
Dune: 687
Important_Ice_1080 t1_itmjdn5 wrote
Read Dune, classic. I’ll check out the other two. Thanks OP 👍🏻
N3XT191 OP t1_itmjn7k wrote
Both are the 3rd book in a trilogy, so you gotta earn it! ;)
InsuranceToTheRescue t1_itmwplv wrote
What's the break between History & Current Events? Like, what's the cutoff year for something passing from CE to History?
N3XT191 OP t1_itmxdsy wrote
No hard cutoff, but CA is topics like climate change, covid, the Theranos scandal while the most recent books included in History are a book on Watergate and a book on The Troubles.
PFhelpmePlan t1_itmxzxs wrote
Awesome, thank you for the explanation! I really like how the offset points look as well.
N3XT191 OP t1_itmy5xb wrote
Ideally they’d be evenly distributed so the width of the point cloud represents the density (like in a violin plot), but that was too annoying to implement. Maybe next time!
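One simple way to approximate that, sketched below, is to scale each point's maximum x-offset by how many other points sit near it on the y-axis. This is a crude neighbour count, not a proper kernel density estimate, and the offsets are still random rather than evenly spaced. The function name `density_jitter`, the `bandwidth` of 50 pages and the sample values are assumptions for illustration.

```python
import random


def density_jitter(values, bandwidth=50.0, max_width=0.3, seed=0):
    """Return one x-offset per value; denser regions get a wider spread."""
    if not values:
        return []
    rng = random.Random(seed)
    # For each point, count the points (itself included) within `bandwidth` on the y-axis.
    counts = [sum(1 for other in values if abs(other - v) <= bandwidth) for v in values]
    peak = max(counts)
    # Points in the densest region may move up to max_width; isolated points barely move.
    return [rng.uniform(-1.0, 1.0) * max_width * c / peak for c in counts]


# Made-up page counts: a cluster around 300 pages and one isolated long book.
offsets = density_jitter([300, 310, 320, 330, 900])
```

Adding these offsets to a genre's x-position gives a point cloud that is wide where many books share a similar page count and narrow around outliers.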
RestlessAmbivert t1_itnqolk wrote
You can get those Fantasy numbers way up if you get into The Wheel of Time, lot of chonkers in there. Sanderson did a great job of helping to finish the series off.
mimprocesstech t1_ito573b wrote
You need more books. I hear 30 is the minimum for a statistically relevant sample size lol.
lisiate t1_ito98fo wrote
The last five novels in Erikson's Malazan Book of the Fallen series are all over 1,200 pages in paperback.
holdenontoyoubooks t1_itpftyx wrote
A few comments:
I really like this idea especially for books owned, rather than read, just because it removes any timeline, or expectation of "reading more books is good". This is a really cool idea.
I wish I had done this before I purged most of my books (except ones that I like to display)
The outliers are fun, because it makes sense that longer books end up getting collected.
What is the low outlier in Science?
N3XT191 OP t1_itl3ycb wrote
EDIT: “Historical” (4th genre) is Historical Fiction! “History” (7th) is non-fiction!
Data: My own shelves, exported from my own app (https://shelf.li, you can explore the data by clicking "Try the demo")
Tools: Matplotlib
Genres: Assigned manually, sometimes obviously debatable...
Books over 1100 pages: