Comments

PartisanPlayground OP t1_j68fge6 wrote

Hi Reddit! I've been producing analysis on news stories for a while now, so I figured I should post to Reddit to get some feedback.

You can find this visualization daily along with others at my Substack: https://partisanplayground.substack.com/

My intention is to add more GPT-driven content in the future.

You can also find these visualizations and analysis of tweets at my Twitter account: https://twitter.com/PartisanPlayG

The source for the data in this visualization is news articles taken from 64 news outlets. Articles are clustered into stories on a daily basis and plotted with ggplot2. This particular visualization uses the ggsankey package.

−13

Jeepcomplex t1_j68lrgf wrote

Interesting how a gang of cops publicly executing a citizen via torture can dominate the news

174

hearmenowboi t1_j68q0ke wrote

They only have the bandwidth for 10 stories at a time. 10 stories on rotation over 24 hours.

153

rramosbaez t1_j68wn5s wrote

Nothing about the guy killed protesting Cop City? That's all over my feeds, but I guess it makes sense big media won't run it.

8

R1ppedWarrior t1_j68xt9j wrote

It seems weird to have data on a chart that is never labeled.

843

MEMENARDO_DANK_VINCI t1_j68yhf5 wrote

You’re looking at it from your brain and not the population’s brain. If the news were more likely to make money with a million different stories, it would run them. All of humanity only has so much room in their heads for the news, and the news knows that if it wants any two people to be discussing and thinking about the same things at the same time (which is what makes it valuable as “news”), then it has to focus their attention.

I’d also like to point out that these are only the top 10 stories, not the thousands of other stories made every single day and fighting for these slots.

18

ar243 t1_j6922r3 wrote

It's crazy that Ukraine isn't at least #2.

I hope we don't run out of steam supporting them.

−7

Cash907 t1_j6953m1 wrote

Wow this style of chart sucks. Thanks for making my eyes hurt first thing in the morning.

80

I_eat_dookies t1_j695epj wrote

This is the most garbage format to present data. It isn't easy to read or digest, just use a different chart cause this shit is bunk af.

37

sakezx t1_j695h8d wrote

Which news? Where? Another prime example of r/usdefaultism

9

OhGodNotAnotherOne t1_j696sax wrote

Sure but stuff like this usually ends up plastered all over. Memes, serious discussions, etc.

I was surprised to just see one post.

Apparently I'm alone in thinking it's an interesting development, perhaps it's better to forget the whole thing and just move on to something important, like changing how M&Ms dress?

5

janellthegreat t1_j697355 wrote

What does the thickness of the color band indicate?

71

cnorw00d t1_j69b16g wrote

I'd get that for a news show like ABC or CBS, but Fox and CNN are 24-hour news networks. They have the opportunity to introduce new conversations instead of focusing on the same few (and dumb most of the time) stories, and they mostly do it to keep people watching for advertising. They don't even come at it with differing perspectives beyond what is considered the mainstream left and right.

In the age of the internet we should know that people are capable of focusing on multiple things. TikTok should show you that there are worldwide superstars you may have never even heard of.

1

PartisanPlayground OP t1_j69go80 wrote

Good question! Here they are:

- Oscar nominations

- Ron Klain resigning as Chief of Staff

- Biden and the US-Mexico border

- Abortion

I'm thinking of adding labels to the stories that fell out of the news cycle along the bottom of the chart.

109

Jeb_Kerman1 t1_j69i7s9 wrote

It’s fucked up that a shooting isn’t in the top three for a week

4

PartisanPlayground OP t1_j69ikqe wrote

No worries, here you go:

- 3rd place on Tuesday: Oscar nominations
- 10th place on Monday: Ron Klain resigning as Chief of Staff
- 8th place on Wednesday and Thursday: Biden and the US-Mexico border
- 10th place on Friday: Abortion

52

Segamaike t1_j69jg10 wrote

I mean. It looks lovely, but how is it surprising to you that a chart needs both axes to be labeled? Data is pointless if it’s not measurable and your most important metric isn’t even in the presentation

59

Powerzap t1_j69jv38 wrote

No, it’s the American defaultists who are stupid. They are completely ignorant of the fact that there are people on the internet who are not in America. This mentality extends way beyond Reddit.

Re Reddit being an ‘American website’. Sure, it’s hosted in the US - but a considerable 46% of users are not there.

10

nervousengrish t1_j69kqk0 wrote

Label on the left and group items that emerge later and don’t grab as much share (Israel, Paul Pelosi, etc). It’s almost unreadable otherwise.

1

victory-or-death t1_j69ljyx wrote

This is so so difficult to process, I barely know what I’m looking at let alone how to understand this

6

Denziloe t1_j69m92p wrote

Whatever insight this is supposed to convey, I'm not getting it.

2

Astronaut-Frost t1_j69mno7 wrote

I subscribed. I really enjoy this. I'd love for it to become even more in depth with filters for specific topics.

1

myasterism t1_j69msvs wrote

5 black cops, in the same city where civil rights activist Martin Luther King Jr. was killed, beat to death a 140lb black man who was unarmed and crying out for his mother, after they detained him for no good reason. The five black cops were unceremoniously denounced by the police department (the only “good” thing to happen in this whole mess), while the white officer whose body-cam footage has been all over the place was not disciplined, despite the lack of any attempt on his part to intervene in the cold-blooded killing his coworkers were committing.

It’s heinous.

ETA: I grew up in Memphis (where this happened) and wish I could say that I’m even slightly surprised by it, but it’s heartbreakingly and infuriatingly predictable.

19

Valhallapeenyo t1_j69myte wrote

Yeah, there’s no denying it at this point. These cops absolutely brutalized this kid, 110% murdered him. The lack of coverage this story has gotten compared to…. Similar incidents… is absolutely fuckin mind blowing.

5

KrozJr_UK t1_j69nbag wrote

I don’t mind a preponderance of American-centric views. What I ask for is two simple letters. Compare: “How news stories evolve in the news cycle” to “How news stories evolve in the US news cycle”. Real big difference, right?

11

thepancakehouse t1_j69p58c wrote

And not a fucking drop of information about the Pfizer employee

2

wagonmaker85 t1_j69pjkf wrote

*in the United States

I’ve seen only one of those news stories in the past week. A fleeting mention of three others. And I’ve never even heard of the rest.

45

Spacegrass1978 t1_j69pml0 wrote

Holy F can I get this updated every week plz???!!!

−1

wagonmaker85 t1_j69pum7 wrote

This sub is about data. Well-presented data should be clear and unambiguous. The reader should have no doubt about what they are reading. Leaving out the fact that this is American, anywhere on the post, is simply sub-par data analysis.

15

vberl t1_j69r2dy wrote

Talk about an America-centric post…

7

tudorcat t1_j69rdvq wrote

Can you clarify whether you're just looking at the US, worldwide, or what?

2

TightEntry t1_j69t5xs wrote

It’s also because the cops were fired and charged with murder. What is the impulse to protest/riot?

Yeah what happened is bad, and it casts yet another harsh light on the realities of police in America, but at least it is being recognized for the heinous act that it is.

There is no “following my training/feared for my life” defense. Police unions aren’t lining up to defend the murderers and the police chief called the act out for being particularly bad.

I guess this is progress since the Rodney King beating.

13

Siom_one t1_j69tb1q wrote

And not one word about the VA girl who got trafficked in Baltimore because the judge "couldn't return the child to her parents" due to them knowing what a boy and a girl is.

−1

Timemaster_2000 t1_j69vseo wrote

What are those 4 news stories that just fizzle out before the end of the graph?

5

Kazko25 t1_j69w8xn wrote

Pretty unreadable, the colors are too similar to each other

14

epomzo t1_j69wra2 wrote

Exactly. There are 5 subtly different greens, 3 different mauve-purples, 2 of orange-red, and one light blue. These are different pastel tints in the same washed-out shade. Guaranteed this is not ADA compliant.

9

grubas t1_j69yvpx wrote

It's the video.

It was always a story, fairly high up, but it wasn't THE story until the video dropped. The fact that they fired and went after the cops FAST got a lot of attention. Then the police, feds, and family asking people to remain calm and not riot 24 hours before release meant it was going to be bad.

Now it's going to be a dragged out legal process, which is much less easy to push compared to a 10 minute execution.

3

nervousengrish t1_j6a09hf wrote

The Paul Pelosi and Israel stories only appear for one day and don’t make much of an impact on the story you’re trying to tell about longevity in the news cycle (there’s just not enough data).

I would group them into a single bucket labeled ‘Emerging Stories’ or something.

1

GenerikDavis t1_j6a3bvj wrote

As the other people said, a 29 year old was beat to death.

Skip to ~38 minutes for a camera on a street light (I assume) that shows the whole thing. The first 38 minutes have the initial encounter before Tyre runs from them since he's scared for his life, and then 2 perspectives of them catching up to him and beating him. To warn you ahead of time though, it's literally them beating someone to the point that they died 3 days later. Like a soccer kick to the head, a cop breaking his baton from hitting him so hard, and 2 cops holding Tyre up while another punches him in the face. It's a top contender for the worst police behavior I've seen on camera.

https://www.localmemphis.com/article/news/local/video-released-tyre-nichols/522-bafcab5c-cc2d-4d87-a0b9-7f9bf02ca5bb

4

Danny_ODevin t1_j6a47zi wrote

Not really. "Prevalence" seems like common sense, but it is easy to interpret differently from person to person, and easy to misrepresent on a chart that is built on quantification yet doesn't actually quantify anything. But sure, it's pretty.

1

despejado t1_j6a72jr wrote

Monterey Park shooting… wow, tells me all I need to know about this society.

2

DirtyGoo t1_j6a77e1 wrote

Really hope the lies from George Santos keep surfacing. It just gets more bonkers every time and is immensely entertaining.

1

iiioiia t1_j6a7iwl wrote

> I'm thinking of adding labels to the stories that fell out of the news cycle along the bottom of the chart.

I think it would be interesting to manually tag various events and then see if there is any temporal correlation between tags of certain types over long periods of time - for example: political scandals may coincidentally be commonly followed shortly by "social" scandals.

LOTS more could be done in this space, especially if one isn't too concerned for their health if you know what I mean.

3

iiioiia t1_j6a8wx2 wrote

What's interesting is that "made a point" does not have one single implementation, but this tends to be not how it appears to a person performing an implementation.

I mean come on, this is a data science subreddit, not /r/politics.

0

levoniust t1_j6aa85t wrote

Is there a website for this? I would love to go back in history even a few months and take a look at what was super popular in a chart like this. Or just see a snapshot every day of what is popular. Having something that is arguably unbiased and simply shows the topics that are most popular is something I find quite intriguing and useful.

1

Plokmijn27 t1_j6aak6e wrote

Ukraine tanks and debt ceiling look interesting

1

merc08 t1_j6ab510 wrote

Discovering something like that seems like the kind of thing that would drive a person to commit suicide by shooting themselves three times in the back of the head.

2

MEMENARDO_DANK_VINCI t1_j6abvab wrote

No, you come on and use your words to describe the point YOU want to make. Arguing by allusion is not arguing in good faith. Any interpreter would have to parse their thoughts through you not being forthcoming with yours.

0

iiioiia t1_j6acsoo wrote

> No, you come on and use your words to describe the point YOU want to make.

I've made it above, you are welcome to do with it as you please.

> Arguing by allusion is not arguing in good faith.

lol, memes are not effective on me, though I suspect they'll be rather influential on 3rd party observers (which is the point perhaps?).

> Any interpreter would have to parse their thoughts through you not being forthcoming with yours.

Oh God....how do you know how people you've never met would experience the situation?

Sir: are you putting me on?

0

MEMENARDO_DANK_VINCI t1_j6ad9ck wrote

What? How do I know that the human psyche changes when met with unknowns vs knowns?

Making a point to yourself is what you did, troll me is also what you did. I didn’t state a meme I informed you that your method of argument is lackluster in a polite discussion, though I didn’t use so many words.

It’s okay if you can’t articulate your point well just try your best! And I’ll help you refine what you actually wanted to say :)

2

iiioiia t1_j6aedge wrote

> What? How do I know that the human psyche changes when met with unknowns vs knowns?

You very well may not know it....it's not exactly common knowledge!

> Making a point to yourself is what you did, troll me is also what you did.

Here you are describing your experience. The experiences of others (more commonly known as "reality", or what "is") are not necessarily the same.

> I didn’t state a meme I informed you that your method of argument is lackluster in a polite discussion, though I didn’t use so many words.

"Not arguing in good faith" is a meme.

meme:

  • an image, video, piece of text, etc., typically humorous in nature, that is copied and spread rapidly by internet users, often with slight variations.

  • an element of a culture or system of behavior passed from one individual to another by imitation or other nongenetic means.

> It’s okay if you can’t articulate your point well just try your best! And I’ll help you refine what you actually wanted to say :)

Haha, I love it!! 🙏 You just earned yourself an updoot, partner!

1

OhGodNotAnotherOne t1_j6ajsu3 wrote

I'm speaking about my experience, not yours.

I don't even know who you are or your reddit habits to even begin to tell why you saw hundreds of posts and I only saw one at the time (3 now, since I originally posted, including the mugshot post that's at the top now).

−5

thepancakehouse t1_j6b1jsc wrote

Oh God, it's gotten worse in just 24hrs. Of course the only source is some wacked-out conservative outlet (Fox). I know Newsweek (not crazy reputable) had an article. If nothing else, the 10min video where the Pfizer guy, Walker, absolutely spills all of the beans made its rounds on Twitter. I'll see what I can do. This is my point though. They are burying it; making it out to be nothing. This is cahoots on a legitimately high level. Why can't it be discussed?!?! Why can't we ask questions and get answers from the source and not simply a lack of existence of information?! There's no reason the Pelosi tape should have more coverage than this. That attack is OLD NEWS and there is a metric ton of stuff news media will talk at length about even though they know literally nothing about it.

0

Dry_Inflation_861 t1_j6bhyvc wrote

TIL this sub is a bunch of prima donnas and cynical assholes. This chart is nice and interesting but definitely has its flaws. I think the color palette makes it a little difficult to follow backwards, and the Trump / DeSantis / GOP / future one I don't quite understand; the other topics don't seem to be combined like that. But man, you don't deserve the hate you're getting.

2

PD216ohio t1_j6by3tg wrote

I noticed that from Jan 24-25 the title of the top story changed from Biden Documents to Classified documents. Was this because the subject of the discussion shifted in the media, or was the title changed for another reason? It is still the same "line" of the graph, just renamed.

1

hellopomelo t1_j6byuis wrote

Am I the only one who thinks the colors are, for once, pretty good?

1

justennn t1_j6c3iib wrote

This is meaningless without a y axis label and scale

1

FixSwords t1_j6cd16z wrote

Is it possible there’s not really much of a story there? Or at least not enough verified info/identities for journalists to report on?

I’m not trying to discount the story, it’s just this reads a little like some conspiracy claims can sound where the lack of evidence is cited as proof of the conspiracy.

I don’t know how ‘they’ would bury it from the likes of journalists if it was all over Twitter?

2

ianhillmedia t1_j6cydkh wrote

Hey there, journalist here with 20+ years experience in the news industry, including 10+ years in digital news. To fairly consider the “prevalence” of news stories and how they progress in the news cycle you’d need a much bigger dataset. Right now, as you’ve described it in other comments, a more accurate title for this chart is: “How the top 10 articles were curated and ranked on homepages of 64 U.S. general interest national(?) news sites when the researcher looked at them.” Correct?

That can still be interesting data, but it doesn’t truly reflect which stories are most “prevalent” in the news.

Know that news website homepages are one of a few (many?) places people consume news online. Google Analytics, which many news organizations use to track success online, reports acquisition channels as Search, Social, Referral and Direct. The percentage of users that see or are delivered (“prevalence”) and consume news via each channel varies by news organization, but a news org with a legacy brand will get a healthy percentage of traffic from each. People that come to news homepages are a subset of Direct traffic, which also includes users who just type in Mylocalsite.com/weather, for example.

Direct traffic also can include visitors to news organization mobile apps, which can be curated differently from website homepages. Direct traffic does not include people who read push alerts from mobile apps but don’t click through, and what those folks see also should be considered when determining the “prevalence” of news stories online. Referrals, meanwhile, can include visitors who click through from news organization email newsletters, which are often written and curated differently from homepages.

So the stories “prevalent” to U.S. news consumers can be different based on the platform on which they’re delivered news.

That brings us to Social and Search, both of which send healthy traffic to U.S. news sites and play a noteworthy role in determining the “prevalence” of news for Americans. Pew research reported in September indicated that 50% of U.S. adults get news from social media sometimes or often. 82% of American adults use YouTube; 25% of those users say they regularly get news on the site. 70% of American adults use Facebook, and 31% of those users regularly get news on the site. 30% of American adults use TikTok, and 10% of those users regularly get news on the site.

So to really track and report the “prevalence” of news stories, you’d also need to track and report which stories are delivered and consumed on social, and that delivery is determined in large part by social network algorithms powered by user behavior. Which is in part how we get vertical communities on social (“BookTok,” “Black Twitter.”) The prevalence of news stories in those communities can be community-dependent.

For Search, the good news is that data on what news stories people seek out and are delivered is available from Google Trends. That said, I’d suggest reading the Google Trends help docs before digging into and reporting that data. You need to know what relevance means to Google when looking at those numbers.

Those are just the differences in digital formats that need to be considered when researching the “prevalence” of news stories. We haven’t even discussed that to really measure “prevalence” you’d need to consider what’s in print editions and broadcast newscasts, both of which still help determine the news agenda for the country. We haven’t discussed the role that consumers play on setting the agenda - the number of people clicking on a story on other acquisition channels helps determine if that story is ranked on a homepage, and for how long. And we didn’t discuss the fact that some news organizations are testing personalization of homepages powered by machine learning. What you see on a news homepage can be unique to you and based on how cookies tracked your habits across the web. It might be different from what other visitors see.

It’s also worth noting that 64 news sites may not constitute a useful sample. At a minimum, in the top 100 DMAs in the U.S., there are typically at least four broadcast news websites and one newspaper of record website. That’s 500 local sites that determine the prevalence of news in the U.S. just in the top 100 DMAs. There are 210 Nielsen DMAs in the U.S. Many of those DMAs also are home to hyperlocal startups and alts which also should be considered when tracking and reporting the “prevalence” of news stories. What’s prevalent to people in Cleveland will be different from what’s prevalent to people in Memphis, which will be different from what’s prevalent to people in L.A., etc.

And that’s just the U.S.

That’s not to say that there aren’t worthwhile data-based stories to tell about news consumption and delivery in the U.S. It’s always interesting to learn more about how specific stories are presented by different news organizations in a specific market. You also could subscribe to a bunch of different newsletters and report on what they present, given that newsletters are static. Google Trends data from the previous day also is static.

Here’s the source of the Pew data about news consumption on social: https://www.pewresearch.org/journalism/fact-sheet/social-media-and-news-fact-sheet/

Hope that’s helpful!

2

PartisanPlayground OP t1_j6d2gwe wrote

This is an excellent comment, thank you for this!

I think I need a clearer way of describing "prevalence". This chart is showing the top ten stories by the share of articles written about them, not by the amount that they are consumed. I take articles from 64 sources every day, cluster them together into "stories", then calculate each story's share based on the number of articles written about it. For example, if there are 1000 articles for a day, and one story has 100 articles written about it, then its share is 10%. Does that make sense?
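To make the share arithmetic concrete, here is a minimal Python sketch. It is not the actual pipeline (which is in R and not shown in this thread); the function names and the example labels are assumptions made up for illustration, and it assumes each article has already been assigned to exactly one story.

```python
from collections import Counter


def story_shares(story_labels):
    """Return each story's share of the day's articles as a fraction of 1.

    story_labels: one (hypothetical) story label per article for a single day.
    """
    counts = Counter(story_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {story: n / total for story, n in counts.items()}


def top_stories(story_labels, k=10):
    """Return the k stories with the largest article share, largest first."""
    shares = story_shares(story_labels)
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:k]


# The example from the comment: 1000 articles in a day, 100 of them on one story.
labels = ["story_a"] * 100 + ["everything_else"] * 900
shares = story_shares(labels)
ranking = top_stories(labels, k=10)
```

With these inputs `shares["story_a"]` comes out as 0.1, i.e. the 10% from the example. Note that this measures how much was written, not how much was read.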

I've explored measuring consumption of news in the past, and found it to be very difficult! (Facebook's Graph API used to be wide open, so I was able to get likes/engagement on news stories there, but it has since been locked down) Your comment does a great job of explaining the complexity in measuring consumption. You would need to combine:

- GA data from news outlets (which they don't publish)

- Cable news data (sources exist for this, but you would need to make a lot of assumptions to combine this with articles)

- Social media data

And you would need to make a lot of assumptions about what weights to use on each of those. As a result, I'm keeping this simple and focusing on article shares.

I do publish a daily automated Twitter thread on which news outlet gets the most engagement on Twitter. It includes the most liked and ratioed tweets from each "side" of the media. This is limited to Twitter, so does not cover all the channels you described. See an example here: https://twitter.com/PartisanPlayG/status/1619300675094970369

The other thing I've been doing is cutting articles by which "side" of the media they're on using media bias ratings from AllSides. Again, this involves some simplifying assumptions so it's not perfect but gives a good high-level view. You can see examples here: https://partisanplayground.substack.com

Thanks again for your comment. This is exactly the sort of thing I was looking for when I posted.

2

ianhillmedia t1_j6d3dqb wrote

Happy to help! And I think you’re spot on when you say you need to clarify the definition of prevalence. Just because a news org puts resources into a topic doesn’t mean it’s prevalent to the user. That said, the number of stories a news org efforts on a subject is an interesting data point.

As someone on the other side of this, I hear you on the challenges associated with getting useful data. How are you currently tracking all articles published by those news orgs? And how are you parsing that data to identify specific stories - what search terms are you using to filter the data?

2

PartisanPlayground OP t1_j6d6ebz wrote

I'm getting the data from the Google News API. I've used RSS feeds in the past with similar results.

And actually I'm using a clustering algorithm to identify the specific stories. I have an automated process that pulls all articles from the past five days, clusters them into stories, then produces a bunch of analysis. This saves me a lot of time and brings some objectivity to the process.
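The thread doesn't say which clustering algorithm is used, so the following is only a stand-in to show the general idea: a greedy grouping of headlines by word overlap (Jaccard similarity), in plain Python. The function names, the `threshold` value, and the sample headlines are all assumptions for illustration.

```python
import re


def tokens(headline):
    """Lowercase a headline and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", headline.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_headlines(headlines, threshold=0.3):
    """Greedily assign each headline to the most similar existing cluster.

    A headline starts a new cluster when no existing cluster reaches the
    threshold. The threshold is the knob that decides how finely stories split.
    """
    clusters = []  # each cluster: {"tokens": set of words, "members": list of headlines}
    for headline in headlines:
        t = tokens(headline)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(t, cluster["tokens"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(headline)
            best["tokens"] |= t
        else:
            clusters.append({"tokens": set(t), "members": [headline]})
    return [cluster["members"] for cluster in clusters]


# Made-up headlines, not taken from the real dataset.
sample = [
    "Classified documents found at Biden home",
    "More classified documents found at Biden office",
    "Oscar nominations announced",
    "Oscar nominations snub surprises",
]
stories = cluster_headlines(sample)
```

On the sample, the two documents headlines land in one story and the two Oscar headlines in another. Raising `threshold` splits stories more aggressively and lowering it merges them, which is where the judgment call comes in.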

1

ianhillmedia t1_j6db30j wrote

Got it, thanks for the reply! I know not everyone supports RSS, and it’s a challenge when folks format RSS in different ways, but as they’re a primary source from the publisher I’d encourage you to use RSS over APIs from Google.

I was curious about the signals in your algorithm as well. One of the challenges with automating taxonomies for news stories is the inexactitude of language and differences in style. A story might mention DeSantis and books in the headline and description but might actually be about GOP primaries; a story might emphasize DeSantis in the primaries in the headline and title but it might actually be about book banning.

Or a better example: a story that mentions Tyre Nichols may be about the actual incident, police violence or defunding the police.

Digging in even further, a local news organization might use colloquialisms for place names that can make it difficult for folks from outside that market to categorize those stories.

2

PartisanPlayground OP t1_j6eo5hz wrote

You're hitting on the most subjective part of this whole process. I've run into all of the issues you describe, and the question is ultimately: how do you define a story?

Your GOP primaries example is a good one. Let's say we have articles on Trump's legal issues, other articles on Pence's classified documents, and other articles on DeSantis and books. Now let's say all of these articles describe these things in the context of the 2024 GOP primaries. Is this one story called "GOP primaries"? Or three separate stories? You could make a case either way.

I've tuned the algorithm to split stories in a way that "looks about right" to me. That's subjective, but there's no way around it. This is an issue whether you're using an algorithm or doing this manually.

A related challenge is that story definitions may change over time. The classified documents story is a good example for this. Right now there are articles on Trump, Biden, and Pence all mishandling classified documents. The algorithm is categorizing all of them as the same story (fair enough).

But let's say that next week (just making this up), Trump gets indicted for it. Is that a separate story now? If so, how do you treat that? Do you retroactively split out the "Trump" portion of the "classified documents" story as though they were not the same story before? Do you show the classified documents story splitting into two? Do you just create a new story on the day the indictment happens? Currently, the algorithm is set up to do the first of these, but again, you could make a case for any of them.
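The difference between the first option (retroactive split) and the third (new story from the day of the event) can be sketched in a few lines of Python. This is only an illustration of the two choices, not the actual implementation; the `Article` record, the function names, the dates, and the headlines are all made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Article:
    day: str       # ISO date string, so string comparison orders days correctly
    story: str     # story label assigned by the clustering step
    headline: str


def split_retroactively(articles, story, new_story, belongs_to_new):
    """Option 1: relabel matching articles on every day, including past days."""
    return [
        replace(a, story=new_story) if a.story == story and belongs_to_new(a) else a
        for a in articles
    ]


def split_from_day(articles, story, new_story, belongs_to_new, start_day):
    """Option 3: relabel matching articles only on or after start_day."""
    return [
        replace(a, story=new_story)
        if a.story == story and a.day >= start_day and belongs_to_new(a)
        else a
        for a in articles
    ]


def mentions_trump(article):
    """Hypothetical rule for deciding which articles move to the new story."""
    return "trump" in article.headline.lower()


# Made-up records for the hypothetical scenario described above.
articles = [
    Article("2023-01-24", "Classified documents", "Trump documents probe continues"),
    Article("2023-01-24", "Classified documents", "Biden documents search"),
    Article("2023-01-31", "Classified documents", "Trump indicted over documents"),
]
retro = split_retroactively(articles, "Classified documents", "Trump indictment", mentions_trump)
forward = split_from_day(articles, "Classified documents", "Trump indictment", mentions_trump, "2023-01-31")
```

The retroactive version rewrites history (the Jan 24 Trump article moves to the new story), while the forward version leaves earlier days untouched. The same chart would draw different lines depending on which one is chosen.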

All of this is to say that there is subjectivity involved in this process.

1