Comments

draypresct t1_itl53tp wrote

Nice! Interesting to see the range of fantasy novel lengths.

I wonder how many of the pages in the historical books come from the reference section.

14

N3XT191 OP t1_itl5fgm wrote

Yeah, most fantasy books are in the 400-700 range but at the lower end I’ve got a couple novellas and at the upper end I’ve got Sanderson, Tad Williams and Steven Erikson 😅

The “Historical” section is actual “Historical Fiction” (bad label I just realized), so no reference. The “History” genre is probably about 15% references on average.

8

draypresct t1_itl6gyd wrote

Whups! I should have spotted history/historical. Thanks!

5

N3XT191 OP t1_itl6qzm wrote

I’ve got them grouped into fiction and non-fiction internally, so in my database it does make sense (:

3

N3XT191 OP t1_itl3ycb wrote

EDIT: “Historical” (4th genre) is Historical Fiction! “History” (7th) is non-fiction!

Data: My own shelves, exported from my own app (https://shelf.li, you can explore the data by clicking "Try the demo")

Tools: Matplotlib

Genres: Assigned manually, sometimes obviously debatable...

Books over 1100 pages:

  • Oathbringer (Brandon Sanderson): 1331
  • The Stand (Stephen King): 1325
  • The Count of Monte Cristo: 1312
  • Die Arena (Stephen King, German): 1279
  • The Power Broker (Robert A. Caro): 1246
  • World without End (Ken Follett): 1240
  • Rhythm of War (Brandon Sanderson): 1230
  • Master of the Senate (Robert A. Caro): 1200
  • Memories of Ice (Steven Erikson): 1180
  • Edge of Eternity (Ken Follett): 1158

6

InsuranceToTheRescue t1_itmwplv wrote

What's the break between History & Current Events? Like, what's the cutoff year for something passing from Current Events to History?

1

N3XT191 OP t1_itmxdsy wrote

No hard cutoff, but CA covers topics like climate change, COVID and the Theranos scandal, while the most recent books included in History are one on Watergate and one on The Troubles.

3

stone_chestnut t1_itlc1x6 wrote

That's good! I think you could go even further with some statistical tests, like ANOVA. It could indicate whether page counts really differ between genres.
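For anyone curious what that would involve, here is a minimal sketch of a one-way ANOVA F statistic in plain Python. The function name `one_way_anova` and the page counts in the usage example are made up for illustration and are not taken from the chart. SciPy's `scipy.stats.f_oneway` computes the same statistic and also returns a p-value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples.

    `groups` is a list of lists, one list of page counts per genre.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of the genre means around the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual books around their own genre mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical page counts, NOT the data behind the chart.
    page_counts = {
        "Fantasy": [420, 510, 680, 1230],
        "Sci-Fi": [310, 400, 450, 736],
        "History": [350, 520, 610, 1246],
    }
    f_stat, df_b, df_w = one_way_anova(list(page_counts.values()))
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the genre means differ by more than the spread within genres would suggest. It says nothing about whether the shelf itself is a representative sample.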

2

N3XT191 OP t1_itlf3jp wrote

That would be interesting, but the data is inherently biased by my selection and only 3 genres have enough numbers for conclusions anyway.

So actual conclusions would be very hard to make!

3

RestlessAmbivert t1_itnqolk wrote

You can get those Fantasy numbers way up if you get into The Wheel of Time, lot of chonkers in there. Sanderson did a great job of helping to finish the series off.

2

mimprocesstech t1_ito573b wrote

You need more books. I hear 30 is the minimum for a statistically relevant sample size lol.

2

dongorras t1_itl560a wrote

Is the page count normalized somehow, given the different page and font sizes? Although maybe I'm just ruining the fun of a harmless hobby measurement.

1

N3XT191 OP t1_itl5ktz wrote

Nope, not normalized. There’s definitely quite a range in words per page: I know that The Power Broker is about 15% longer (by word count) than Oathbringer, but it has about 85 fewer pages (1246 vs. 1331).

The very short novellas in particular sometimes have only half as many words per page, or even fewer.

Sadly word count data is not nicely available…
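To put a rough number on the Power Broker vs. Oathbringer comparison, here is a small sketch using only the figures from this thread: the two page counts from the list of long books and the "about 15% longer" estimate. The helper name `words_per_page_ratio` is hypothetical, and the result is only as good as that 15% estimate.

```python
# Page counts as listed in this thread.
OATHBRINGER_PAGES = 1331
POWER_BROKER_PAGES = 1246

# Estimate from the comment: The Power Broker has about 15% more words.
WORD_COUNT_RATIO = 1.15


def words_per_page_ratio(word_ratio, pages_a, pages_b):
    """Words per page of book A relative to book B.

    `word_ratio` is A's word count divided by B's word count.
    """
    return word_ratio * pages_b / pages_a


density_ratio = words_per_page_ratio(
    WORD_COUNT_RATIO, POWER_BROKER_PAGES, OATHBRINGER_PAGES
)
print(f"Words per page, Power Broker vs. Oathbringer: {density_ratio:.2f}x")
```

Under those assumptions The Power Broker packs roughly 1.23 times as many words onto each page, which is the kind of gap a normalized chart would have to correct for.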

4

doesnothingtohirt t1_itlcv11 wrote

I like the way you show median and mean.

1

N3XT191 OP t1_itlddjr wrote

But the mean isn’t even shown? 😅

Box plots show the 25th, 50th (median) and 75th percentiles, plus the full range (minus outliers).
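As a rough illustration of those numbers, here is a sketch that computes them for one genre in plain Python. It assumes the common 1.5 × IQR rule for deciding what counts as an outlier, which is Matplotlib's default (`whis=1.5`); the chart may use a different setting. `box_plot_stats` is a hypothetical helper name and the page counts are invented.

```python
from statistics import quantiles


def box_plot_stats(values, whisker=1.5):
    """Quartiles, whisker ends and outliers for one box in a box plot."""
    data = sorted(values)
    # "inclusive" uses linear interpolation between the data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr

    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Invented page counts for a single genre.
    stats = box_plot_stats([100, 200, 300, 400, 500, 600, 700, 2000])
    for name, value in stats.items():
        print(f"{name}: {value}")
```

Note that nothing in this summary is a mean, which is the point being made above.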

2

PFhelpmePlan t1_itm9006 wrote

Any chance you could share your code for doing the boxplots with the individual data points included like that?

1

N3XT191 OP t1_itm9t4u wrote

Sure: https://pastebin.com/raw/kd1WgRza

The data file is just a CSV with pagecount,genre_id.

I start by creating filtered_pagecounts, which is just a list of genres, each genre being a list of y-values.

Then I add some random x-offsets (line 36) and plot 1 scatter plot per genre, with the box plot on top.
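For readers who cannot open the paste, the steps described here can be sketched roughly as follows. This is a reconstruction from the description, not the linked code: the helper names, the jitter width, the marker styling and the assumption that the CSV has no header row are all guesses. Only `filtered_pagecounts` and the `pagecount,genre_id` column order come from the comment.

```python
import csv
import io
import random


def group_by_genre(csv_text):
    """Parse 'pagecount,genre_id' rows into {genre_id: [pagecounts]}.

    Assumes no header row; genre ids are kept as strings.
    """
    groups = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        pagecount, genre_id = int(row[0]), row[1].strip()
        groups.setdefault(genre_id, []).append(pagecount)
    return groups


def jitter_x(position, count, width=0.5, rng=random):
    """Return `count` x-values scattered uniformly around `position`."""
    return [position + rng.uniform(-width / 2, width / 2) for _ in range(count)]


def plot_genres(groups, labels):
    """Draw one jittered scatter per genre, then the box plots over them."""
    # Imported here so the helpers above work without Matplotlib installed.
    import matplotlib.pyplot as plt

    genre_ids = sorted(groups)
    filtered_pagecounts = [groups[g] for g in genre_ids]
    positions = list(range(1, len(genre_ids) + 1))

    fig, ax = plt.subplots()
    for pos, ys in zip(positions, filtered_pagecounts):
        ax.scatter(jitter_x(pos, len(ys)), ys, s=10, alpha=0.5)
    # Outlier markers are hidden because every point is already drawn.
    ax.boxplot(filtered_pagecounts, positions=positions, showfliers=False)
    ax.set_xticks(positions)
    ax.set_xticklabels([labels[g] for g in genre_ids])
    ax.set_ylabel("Pages")
    return fig


if __name__ == "__main__":
    # Invented rows and labels, just to exercise the functions.
    sample = "320,1\n736,2\n410,1\n687,2\n1331,1\n"
    fig = plot_genres(group_by_genre(sample), {"1": "Fantasy", "2": "Sci-Fi"})
    fig.savefig("pagecounts.png")
```

The jitter only moves points sideways, so the y-values (page counts) stay exact.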

2

PFhelpmePlan t1_itmxzxs wrote

Awesome, thank you for the explanation! I really like how the offset points look as well.

2

N3XT191 OP t1_itmy5xb wrote

Ideally they’d be evenly distributed so the width of the point cloud represents the density (like in a violin plot), but that was too annoying to implement. Maybe next time!
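One cheap way to get part of the way there is to scale each point's jitter band by how crowded its neighbourhood is. The sketch below is an assumption-laden approximation, not a real violin or beeswarm layout: density is just a neighbour count within a hypothetical `bandwidth`, and points are still placed randomly inside their band rather than evenly spaced. All names are made up.

```python
import random


def density_half_widths(values, bandwidth, max_half_width=0.3):
    """Half-width of the jitter band for each value, scaled by local density.

    Density is the number of values within `bandwidth` of the point,
    a crude stand-in for a kernel density estimate.
    """
    if not values:
        return []
    counts = [
        sum(1 for other in values if abs(other - v) <= bandwidth)
        for v in values
    ]
    peak = max(counts)
    return [max_half_width * (c / peak) for c in counts]


def density_jitter(position, values, bandwidth, max_half_width=0.3, rng=random):
    """X-values around `position`; crowded y-regions spread out wider."""
    widths = density_half_widths(values, bandwidth, max_half_width)
    return [position + rng.uniform(-w, w) for w in widths]


if __name__ == "__main__":
    # Invented page counts: three clustered books and one lone outlier.
    pages = [400, 410, 425, 1300]
    for y, x in zip(pages, density_jitter(1, pages, bandwidth=50)):
        print(f"pages={y} x={x:.3f}")
```

With this, a lone long book stays close to the centre line while the crowded 400 to 700 range fans out, so the cloud's outline loosely follows the density.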

1

Important_Ice_1080 t1_itmiwpy wrote

What’s the 750 page Sci-fi? I like long hard reads daddy.

1

N3XT191 OP t1_itmj8sz wrote

Death's End: 736

The Relentless Moon: 704

Dune: 687

1

Important_Ice_1080 t1_itmjdn5 wrote

Read Dune, classic. I’ll check out the other two. Thanks OP 👍🏻

2

N3XT191 OP t1_itmjn7k wrote

Both are the 3rd book in a trilogy, so you gotta earn it! ;)

1

holdenontoyoubooks t1_itpftyx wrote

A few comments:

I really like this idea, especially for books owned rather than read, because it removes any timeline or expectation that "reading more books is good". This is a really cool idea.

I wish I had done this before I purged most of my books (except the ones that I like to display).

The outliers are fun, because it makes sense that longer books end up getting collected.

What is the low outlier in Science?

1