PartisanPlayground OP t1_j6eo5hz wrote on January 29, 2023 at 8:38 PM

Reply to comment by ianhillmedia in [OC] How news stories evolve in the news cycle by PartisanPlayground

You're hitting on the most subjective part of this whole process. I've run into all of the issues you describe, and the question is ultimately: how do you define a story?

Your GOP primaries example is a good one. Let's say we have articles on Trump's legal issues, other articles on Pence's classified documents, and other articles on DeSantis and books. Now let's say all of these articles describe these things in the context of the 2024 GOP primaries. Is this one story called "GOP primaries"? Or three separate stories? You could make a case either way.

I've tuned the algorithm to split stories in a way that "looks about right" to me. That's subjective, but there's no way around it. This is an issue whether you're using an algorithm or doing this manually.

A related challenge is that story definitions may change over time. The classified documents story is a good example of this. Right now there are articles on Trump, Biden, and Pence all mishandling classified documents. The algorithm is categorizing all of them as the same story (fair enough).

But let's say that next week (just making this up), Trump gets indicted for it. Is that a separate story now? If so, how do you treat that? Do you retroactively split out the "Trump" portion of the "classified documents" story as though they were not the same story before? Do you show the classified documents story splitting into two? Do you just create a new story on the day the indictment happens? Currently, the algorithm is set up to do the first of these, but again, you could make a case for any of them.
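As a rough illustration of the first option (the retroactive split), here is a minimal sketch. It assumes each article is a dict with a `story` label already assigned; the function name `split_retroactively`, the field names, the sample titles, and the keyword rule are all hypothetical and are not the actual pipeline.

```python
def split_retroactively(articles, old_story, new_story, belongs_to_new):
    """Relabel matching articles on every day, past days included."""
    return [
        {**a, "story": new_story}
        if a["story"] == old_story and belongs_to_new(a)
        else a
        for a in articles
    ]

# Hypothetical articles already assigned to one combined story.
articles = [
    {"day": "Mon", "title": "Trump documents probe widens", "story": "classified documents"},
    {"day": "Tue", "title": "Pence documents found at home", "story": "classified documents"},
    {"day": "Wed", "title": "Trump indicted over documents", "story": "classified documents"},
]

# Assumed rule for the split: any title mentioning Trump moves to the new story.
relabeled = split_retroactively(
    articles,
    old_story="classified documents",
    new_story="Trump classified documents",
    belongs_to_new=lambda a: "trump" in a["title"].lower(),
)
for a in relabeled:
    print(a["day"], a["story"])
```

Because the relabeling covers earlier days too, the chart would show the two stories as separate from the start, which is exactly the judgment call described above.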

All of this is to say that there is subjectivity involved in this process.

PartisanPlayground OP t1_j6d6ebz wrote on January 29, 2023 at 2:44 PM

Reply to comment by ianhillmedia in [OC] How news stories evolve in the news cycle by PartisanPlayground

I'm getting the data from the Google News API. I've used RSS feeds in the past with similar results.

And actually I'm using a clustering algorithm to identify the specific stories. I have an automated process that pulls all articles from the past five days, clusters them into stories, then produces a bunch of analysis. This saves me a lot of time and brings some objectivity to the process.
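The clustering algorithm isn't named here, so the following is only a stand-in to show the shape of the process: keep articles from the past five days, then group them by headline similarity. The greedy word-overlap (Jaccard) approach, the `threshold` value, the function names, and the sample headlines are all assumptions for illustration.

```python
import re
from datetime import date, timedelta

def recent(articles, today, days=5):
    """Keep only articles published within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [a for a in articles if a["published"] >= cutoff]

def tokens(title):
    """Lowercased words longer than three characters, punctuation stripped."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) > 3}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_titles(titles, threshold=0.3):
    """Greedy one-pass clustering: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"tokens": set of words, "titles": list of titles}
    for title in titles:
        t = tokens(title)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(t, c["tokens"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["titles"].append(title)
            best["tokens"] |= t
        else:
            clusters.append({"tokens": set(t), "titles": [title]})
    return [c["titles"] for c in clusters]

# Hypothetical headlines; the last one is older than five days.
articles = [
    {"title": "Pence classified documents found at Indiana home", "published": date(2023, 1, 25)},
    {"title": "Classified documents found at Pence home", "published": date(2023, 1, 26)},
    {"title": "Oscar nominations announced for 2023", "published": date(2023, 1, 24)},
    {"title": "Oscar nominations: full list announced", "published": date(2023, 1, 24)},
    {"title": "Old story from earlier this month", "published": date(2023, 1, 10)},
]

window = recent(articles, today=date(2023, 1, 28))
stories = cluster_titles([a["title"] for a in window])
for story in stories:
    print(story)
```

Raising or lowering `threshold` is one place where the "looks about right" tuning would show up: a lower value merges more articles into fewer, broader stories.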

PartisanPlayground OP t1_j6d2gwe wrote on January 29, 2023 at 2:13 PM

Reply to comment by ianhillmedia in [OC] How news stories evolve in the news cycle by PartisanPlayground

This is an excellent comment, thank you for this!

I think I need a clearer way of describing "prevalence". This chart is showing the top ten stories by the share of articles written about them, not by the amount that they are consumed. Every day, I take articles from 64 sources, cluster them together into "stories", then calculate each story's share based on the number of articles written about it. For example, if there are 1000 articles for a day, and one story has 100 articles written about it, then its share is 10%. Does that make sense?
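The share calculation can be sketched in a few lines. This assumes each article already carries a story label from the clustering step; the function name `story_shares` and the sample labels are made up for illustration.

```python
from collections import Counter

def story_shares(story_labels):
    """Return each story's share of the day's articles, as a fraction of 1."""
    counts = Counter(story_labels)
    total = sum(counts.values())
    return {story: n / total for story, n in counts.items()}

# Hypothetical day: 1000 articles, 100 of them on one story.
labels = ["classified documents"] * 100 + ["other"] * 900
shares = story_shares(labels)
print(f"{shares['classified documents']:.0%}")  # prints "10%"
```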

I've explored measuring consumption of news in the past, and found it to be very difficult! (Facebook's Graph API used to be wide open, so I was able to get likes/engagement on news stories there, but it has since been locked down.) Your comment does a great job of explaining the complexity in measuring consumption. You would need to combine:

- Google Analytics (GA) data from news outlets (which they don't publish)

- Cable news data (sources exist for this, but you would need to make a lot of assumptions to combine this with articles)

- Social media data

And you would need to make a lot of assumptions about what weights to use on each of those. As a result, I'm keeping this simple and focusing on article shares.

I do publish a daily automated Twitter thread on which news outlet gets the most engagement on Twitter. It includes the most liked and ratioed tweets from each "side" of the media. This is limited to Twitter, so it does not cover all the channels you described. See an example here: https://twitter.com/PartisanPlayG/status/1619300675094970369

The other thing I've been doing is cutting articles by which "side" of the media they're on, using media bias ratings from AllSides. Again, this involves some simplifying assumptions, so it's not perfect, but it gives a good high-level view. You can see examples here: https://partisanplayground.substack.com

Thanks again for your comment. This is exactly the sort of thing I was looking for when I posted.

PartisanPlayground OP t1_j6cktep wrote on January 29, 2023 at 10:58 AM

Reply to comment by PD216ohio in [OC] How news stories evolve in the news cycle by PartisanPlayground

That's right, the labels can change over time as the discussion shifts. GPT-3 does the labeling and I manually adjust it, if necessary. Occasionally, the stories themselves can split or combine.

PartisanPlayground OP t1_j6a13rx wrote on January 28, 2023 at 8:54 PM

Reply to comment by Immarhinocerous in [OC] How news stories evolve in the news cycle by PartisanPlayground

Thank you!

PartisanPlayground OP t1_j6a0zlr wrote on January 28, 2023 at 8:53 PM

Reply to comment by i_build_minds in [OC] How news stories evolve in the news cycle by PartisanPlayground

I've had the same idea. It'd be pretty cool to have this as a big landing page, where hovering over each story on the plot gave you details about the story and links to articles.

PartisanPlayground OP t1_j69zsb9 wrote on January 28, 2023 at 8:45 PM

Reply to comment by byfourness in [OC] How news stories evolve in the news cycle by PartisanPlayground

Common US news outlets, including most in this chart from AllSides: https://www.allsides.com/media-bias/media-bias-chart

PartisanPlayground OP t1_j69w3w8 wrote on January 28, 2023 at 8:20 PM

Reply to comment by Timemaster_2000 in [OC] How news stories evolve in the news cycle by PartisanPlayground

https://www.reddit.com/r/dataisbeautiful/comments/10nfwir/comment/j69go80/

PartisanPlayground OP t1_j69raiq wrote on January 28, 2023 at 7:46 PM

Reply to comment by Astronaut-Frost in [OC] How news stories evolve in the news cycle by PartisanPlayground

Thank you!

PartisanPlayground OP t1_j69r7j3 wrote on January 28, 2023 at 7:45 PM

Reply to comment by Spacegrass1978 in [OC] How news stories evolve in the news cycle by PartisanPlayground

Daily! https://partisanplayground.substack.com

And yes I can post it here once a week.

PartisanPlayground OP t1_j69r2qg wrote on January 28, 2023 at 7:45 PM

Reply to comment by i_build_minds in [OC] How news stories evolve in the news cycle by PartisanPlayground

I'm producing this daily in a Substack newsletter: https://partisanplayground.substack.com

PartisanPlayground OP t1_j69kyys wrote on January 28, 2023 at 7:03 PM

Reply to comment by nervousengrish in [OC] How news stories evolve in the news cycle by PartisanPlayground

What do you mean by grouping the items that emerge later? Thanks for the feedback.

PartisanPlayground OP t1_j69js3w wrote on January 28, 2023 at 6:54 PM

Reply to comment by Segamaike in [OC] How news stories evolve in the news cycle by PartisanPlayground

Glad it looks good. I figured the title of the chart took care of the y-axis, but I'll label it explicitly in the future. I think the x-axis is pretty clear.

PartisanPlayground OP t1_j69jf1h wrote on January 28, 2023 at 6:52 PM

Reply to comment by sohosurf in [OC] How news stories evolve in the news cycle by PartisanPlayground

Of course, thanks for your interest!

PartisanPlayground OP t1_j69ikqe wrote on January 28, 2023 at 6:46 PM

Reply to comment by sohosurf in [OC] How news stories evolve in the news cycle by PartisanPlayground

No worries, here you go:

- 3rd place on Tuesday: Oscar nominations
- 10th place on Monday: Ron Klain resigning as Chief of Staff
- 8th place on Wednesday and Thursday: Biden and the US-Mexico border
- 10th place on Friday: Abortion

PartisanPlayground OP t1_j69i6ji wrote on January 28, 2023 at 6:44 PM

Reply to comment by ajt9000 in [OC] How news stories evolve in the news cycle by PartisanPlayground

Articles from 64 news outlets. I included some explanation in another comment but was downvoted, I think for mentioning my Substack. Lesson learned.

https://www.reddit.com/r/dataisbeautiful/comments/10nfwir/comment/j68fge6/

PartisanPlayground OP t1_j69go80 wrote on January 28, 2023 at 6:34 PM

Reply to comment by sohosurf in [OC] How news stories evolve in the news cycle by PartisanPlayground

Good question! Here they are:

- Oscar nominations

- Ron Klain resigning as Chief of Staff

- Biden and the US-Mexico border

- Abortion

I'm thinking of adding labels to the stories that fell out of the news cycle along the bottom of the chart.

PartisanPlayground OP t1_j69ai2i wrote on January 28, 2023 at 5:52 PM

Reply to comment by janellthegreat in [OC] How news stories evolve in the news cycle by PartisanPlayground

The thickness indicates the share of articles that cover each story. It sounds like I need to make this explicit on the chart.

PartisanPlayground OP t1_j699ivt wrote on January 28, 2023 at 5:45 PM

Reply to comment by R1ppedWarrior in [OC] How news stories evolve in the news cycle by PartisanPlayground

That's fair. "Prevalence" means share of articles that cover each story. This chart looks at the top ten stories on any given day. I can add a clearer explanation to the chart in the future. Thanks for the feedback.
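To make "top ten stories on any given day" concrete, here is a minimal sketch of ranking stories per day by article share. The input format (a dict mapping each day to a list of story labels), the function name `top_stories`, and the sample data are assumptions; the example uses `n=2` only to keep the output short.

```python
from collections import Counter

def top_stories(labels_by_day, n=10):
    """For each day, return the n stories with the largest article share."""
    ranked = {}
    for day, labels in labels_by_day.items():
        counts = Counter(labels)
        total = sum(counts.values())
        ranked[day] = [(story, count / total) for story, count in counts.most_common(n)]
    return ranked

# Hypothetical labels: story "A" is covered on Monday but not on Tuesday.
labels_by_day = {
    "Mon": ["A"] * 5 + ["B"] * 3 + ["C"] * 2,
    "Tue": ["B"] * 6 + ["C"] * 4,
}

for day, stories in top_stories(labels_by_day, n=2).items():
    print(day, [(story, f"{share:.0%}") for story, share in stories])
```

A story that appears in one day's list but not the next is one that has fallen out of the top ranks, which is why a band can disappear from the chart partway through the week.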

PartisanPlayground OP t1_j68fge6 wrote on January 28, 2023 at 2:08 PM

Reply to [OC] How news stories evolve in the news cycle by PartisanPlayground

Hi Reddit! I've been producing analysis on news stories for a while now, so I figured I should post to Reddit to get some feedback.

You can find this visualization daily along with others at my Substack: https://partisanplayground.substack.com/

My intention is to add more GPT-driven content in the future.

You can also find these visualizations and analysis of tweets at my Twitter account: https://twitter.com/PartisanPlayG

The source for the data in this visualization is news articles taken from 64 news outlets. Articles are clustered into stories on a daily basis and plotted with ggplot2. This particular visualization uses the ggsankey package.