Submitted by Tough_Gadfly t3_116wnco in technology
Comments
Pletter64 t1_j9aabdt wrote
Let's not forget you can make ChatGPT tell the truth and then gaslight it into saying it is providing fake news. It is extremely good at generating fluff, which is also its weakness.
Malkiot t1_j9adba7 wrote
Ask it to explain anything and cite sources. All of the sources are hallucinated: the BBC exists, but the cited article doesn't.
AntonOlsen t1_j9bopqm wrote
>Let's not forget you can make chatGPT tell the truth and then gaslight it into saying it is providing fake news.
Or have it tell a lie, then gaslight you about it.
KSRandom195 t1_j9bkjwf wrote
I asked ChatGPT for some info. Then I asked it for links to find more about the topic. Each link it provided was bogus and did not resolve to a page.
Cakeking7878 t1_j9d07xc wrote
I actually did get it to generate a random YouTube link that wasn't dead. It had 4 views and was some family's vacation video from 2015.
However, I should stress that it was random chance after several tries of asking. Trying to pull factual or useful information from this is a dumb idea at best and a harmful one at worst
GhostofDownvotes t1_j9e52yr wrote
Allegedly, I co-authored papers on organic chemistry with my cat. All the links it provided were real, but there was obviously no mention of organic chemistry or my cat co-authoring anything with me.
Plot twist: maybe I did co-author them with my cat and the competition just wiped my brain. 🧐
whtevn t1_j9bmvng wrote
"you have to understand the limitations of the model" should be the singular response to every one of these stupid articles
[deleted] t1_j9atsku wrote
Can we all just agree to stop posting 743.6448 articles a day about ChatGPT?
Alternatively a filter to filter all of them out would be lovely too
gurenkagurenda t1_j9aw1wh wrote
I think the current volume of ChatGPT articles would actually be tolerable if the media would actually focus on interesting aspects of the subject. But they just keep playing the same four notes over and over again. At least this one isn't "<recognizable name in tech> thinks <opinion> about ChatGPT, but also says <slightly different opinion>"
Digital_Simian t1_j9dgfqe wrote
It's because the articles are written by ChatGPT.
Twombls t1_j9axe91 wrote
Soon enough we will get to ignore it. This reminds me of self-driving cars: in 2015, Reddit got flooded with weird hypebeasts, and then the tech progress slowed way down.
The hype is reaching unrealistic levels. Subreddits dedicated to ChatGPT and Bing are essentially cults who believe it's sentient at this point. Soon we will get to the trough of disillusionment as this tech gets deployed to the general public and people start finding its faults.
AMirrorForReddit t1_j9brf47 wrote
...why would you not want to follow arguably the most important technological development of our time? Get used to it honey, and stop crying. It's only going to get worse. You don't get to pick and choose what is relevant, ok?
I just wish people could grow a brain about what the technology is, and stop pretending it is sentient and shit. That's all I desire. But it's never gonna happen.
DrabDonut t1_j9d14nb wrote
> the most important technological development of our time
It’s an early 1900s theory with a 1940s application using a 2020 level of data. It’s not much of a technological development beyond how much we could feed it. AI researchers in my department are kind of pissed that LLMs are getting this much attention because an astonishing number of humans are so dumb they can’t pass the mirror test.
AMirrorForReddit t1_j9d2ynu wrote
Almost all humans can pass the mirror test.
DrabDonut t1_j9d8xqb wrote
Interacting with ChatGPT and other LLMs is just a different kind of mirror test.
AMirrorForReddit t1_j9deh5p wrote
How can it simply be a mirror test when a large part of it presents information the average individual does not have?
Just to be clear, I think I see where you are going with that statement, but I disagree that it is nuanced enough to make much sense.
Yeah, people seem hopeless at understanding what they are working with when it comes to ChatGPT.
[deleted] t1_j9clp9t wrote
[removed]
egypturnash t1_j99o9kf wrote
Oh good, maybe this will result in “fair use” being defined to explicitly not include some asshole scraping the Internet and dumping everything they find into their copyright-washing “AI”.
gurenkagurenda t1_j9adicl wrote
I cannot see any possible way to define fair use the way you’re saying which wouldn’t have massive unintended effects. If you want to propose that, you’re going to need to be a hell of a lot more specific than “dumping into an AI” when describing what you think should actually be prohibited.
bairbs t1_j9ag5if wrote
Why not? Just say scraping is fine for research and private models. As soon as you release it to the public or try to monetize it, then it's outside of fair use. Just like Nintendo, when they go after passion project games that are similar in theme, style, and mechanics. You can't just take other people's work and make money off of it
gurenkagurenda t1_j9allgk wrote
How do you define a model? What statistics are you and are you not allowed to scrape and publish? Comments like yours speak to a misunderstanding of what training is with respect to a work, which is simply nudging some numbers according to the statistical relationships within the text. That’s an incredibly broad category of operations.
For example, if I scrape a large number of pages, and analyze the number of incoming and outgoing links, and how those links relate to other links, in order to build a model that lets me match a phrase to a particular webpage and assess its relevance, is that fair use?
If not, you just outlawed search engines. If so, what principle are you using to distinguish that from model training?
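To make the hypothetical concrete, here is a toy sketch of the kind of link-statistics model described above. Everything in it (the page list, the scoring formula, the names) is illustrative only, not how any real search engine ranks results:

```python
# Toy "search engine": aggregate scraped link statistics into a
# relevance model. Scoring is hypothetical and deliberately simple.
from collections import Counter

def build_link_counts(edges):
    """Count incoming links per page from scraped (source, target) pairs."""
    return Counter(dst for _, dst in edges)

def rank(query, pages, link_counts):
    """Rank pages by keyword overlap, weighted by incoming-link count."""
    terms = set(query.lower().split())

    def score(page):
        url, text = page
        overlap = len(terms & set(text.lower().split()))
        return overlap * (1 + link_counts[url])

    return [url for url, _ in sorted(pages, key=score, reverse=True)]

# Hypothetical scraped data: who links to whom, and each page's text.
edges = [("a.com", "b.com"), ("c.com", "b.com"), ("b.com", "a.com")]
pages = [("a.com", "cats and dogs"), ("b.com", "all about cats")]
counts = build_link_counts(edges)
print(rank("cats", pages, counts))  # b.com has 2 inlinks, so it ranks first
```

Both steps, the training of an LLM and the building of this index, are "scrape pages, reduce them to numbers, answer queries from the numbers"; the legal line between them is exactly what's being asked about here.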
Edit: Gotta love when someone downvotes you in less time than it would take to actually read the comment. Genuine discourse right there.
ImSuperHelpful t1_j9apjid wrote
Your argument neglects the business side of the situation which explains the motivations to allow and disallow use in the two scenarios… if I run a content website, a search engine crawling the site so it can generate search results which send traffic to my site is beneficial to both parties, it’s symbiotic.
Alternatively, if I run a content site that an AI company crawls and then uses to train a model which then negates the need for my site to would-be visitors, it’s parasitic.
gurenkagurenda t1_j9avebb wrote
I'm not neglecting anything. I'm asking for some semblance of precision in defining model training out of fair use. The purpose and character of use, and the effect on the market are already factors in fair use decisions, but that's a lot more complicated of an issue than "AI models can't scrape content." It's specific to the application, and even for ChatGPT specifically, it would be pretty murky.
ImSuperHelpful t1_j9awd5k wrote
Except that’s what was missing from your original point, but either way I gave you a starting point… if it’s beneficial for both parties and both parties consent (which content site operators do via robots.txt instructions), no one has a problem. In the AI case it’s beneficial to the AI creator/owner but harmful to the content owner, since the AI is competing with them by using their content, so it shouldn’t be considered fair use.
gurenkagurenda t1_j9b9muc wrote
>Except that’s what was missing from your original point
Again, it's not missing from my original point, because my original point was to ask how the commenter above was distinguishing these cases. You've given a possible answer. That's an answer to my question, not a rebuttal.
I don't think that answer is very compelling, though. Arguing that an explicitly unreliable chat bot that hallucinates as often as it tells the truth is somehow a competitor to news media etc. is a tall order.
ImSuperHelpful t1_j9biixs wrote
I didn’t present it as a rebuttal, I added important context that was missing from your question that makes the answer much more clear.
And these things are unreliable now, but Microsoft and others are dumping billions of dollars into making them better, and they’re doing it for profit. Waiting around until they’re perfected before fighting the ongoing unfair use of copyrighted content is a surefire strategy for losing that fight.
gurenkagurenda t1_j9bl2kk wrote
What they're dumping money into now on this front are AI-enhanced search engines, which are complementary to the content they're training on.
ImSuperHelpful t1_j9brj6f wrote
That’s all they’ve launched on this front, that doesn’t mean it’s all they’re working on
zutnoq t1_j9bhoq8 wrote
Search providers like Google don't just show you links, though. They also show potentially relevant excerpts, so you often don't even need to visit the linked site to get what you were after, and they show previews of images in image search, etc.
Determining exactly where to draw the line of what to consider fair-use for things like this is a highly complex and dynamic issue. Web search engines are (by necessity) parasitic as well but that alone neither makes them bad nor illegal.
Parasitic is also not the "bad" counterpart of symbiotic. A symbiotic relationship is simply a parasitic relationship that benefits both parties. Just saying parasitic says nothing about which side(s) would benefit. I think exploitative would be a more appropriate word to use for such relationships.
ImSuperHelpful t1_j9bqacf wrote
Those relevant excerpts and similar features have been pretty detrimental to search click-through rates in certain areas (they’re known as “zero-click” searches in the industry)… but the alternative is to block Google’s bots entirely, which isn’t viable if you’re operating a content site, since Google has an effective monopoly on search. Also, those features do still link out to the content they’re showing on the SERP, whereas the chat AI doesn’t, and gives the appearance that it’s the source of the information.
Your point about vocabulary is fair
bairbs t1_j9an2db wrote
I'm speaking about using copyrighted art, music, etc. I understand what training is. I also understand the steps companies take to prevent even the perception that they're training on copyrighted material. They either generate pseudo data or purchase entire libraries from stock photo sites. OpenAI and by extension, Microsoft are hoping they can get enough people on their side by saying, "Nothing is copyright if you think about it," so they can do whatever they like.
gurenkagurenda t1_j9ang4f wrote
None of what you said addresses anything I said in my comment.
bairbs t1_j9aog2w wrote
Because I'm not talking about defining a model, I'm talking about scraping copyrighted material. Why would I change the subject to your strawman argument?
gurenkagurenda t1_j9avgd7 wrote
So you think that search engines should be considered illegal copyright infringement? You say that you're just referring to scraping content, which is a necessary part of how a search engine works. So I'm forced to assume that the answer is yes.
bairbs t1_j9axo6n wrote
Lol, you're the one bringing search engines into this for some reason. It's a disingenuous argument and way off base from my point, which is why I'm not responding to it. You've also found all my comments and responded to them aggressively like a good shill
gurenkagurenda t1_j9b8p1r wrote
>You've also found all my comments and responded to them agressuvely like a good shill
Are you talking about this? You replied to me.
I mean, Jesus Christ. Anyway, I'm done trying to explain the concept of unintended consequences to you.
yUQHdn7DNWr9 t1_j9bn2ue wrote
You don’t need permission to read, memorise, analyse, synthesise, learn from, paraphrase, praise or criticise copyrighted text. You need permission to reproduce it. It isn’t obvious to me that a statistical model would need to reproduce the data it is studying.
UmdieEcke2 t1_j9a5bby wrote
Yeah, reading things and then using the information is the most deplorable action any actor can do. Thank god humans are above such disgusting behaviour. Imagine the dystopia we would be living in otherwise.
stealytheblackguy t1_j99cby0 wrote
They also just straight up used dialogue created by the developers. ChatGPT is heavily biased.
hikeonpast t1_j99d0km wrote
“Maybe we can charge AI to read our articles, since humans won’t pay for our content anymore”, said Sensationalist “Biased” McMedia.
professorDissociate t1_j99carm wrote
The media reporting on the media's response to OpenAI including the media in training material. I feel like I've heard this one before, but with Putin assassinating himself.
Special_Rice9539 t1_j9at12x wrote
It turns out that chatGPT isn't actually an AI but just has well-trained staff in the background answering your prompts.
Thebadmamajama t1_j9dwk7a wrote
Monkeys with typewriters
littleMAS t1_j9bbixo wrote
Imagine that you were a true genius with an amazing 'photographic' memory that could recount almost everything you ever read. Imagine winning awards, getting a premium 'Ivy League' education, publishing award-winning original essays, and becoming a revered scholar. Now, imagine every publication such as the WSJ coming after you for 'using' their published content to make yourself so smart.
Slippedhal0 t1_j99zasl wrote
It's the same argument artists are making when complaining about the use of copyrighted artwork as training data.
At some point there will be a major ruling about how companies training AI need to approach copyright for their training data sources, and if courts rule in favour of copyright holders it will probably severely slow AI progress as systems to request permission are built.
Although I could maybe see a fine-tuned AI like Bing being less affected, because it cites sources rather than opaquely using previously acquired knowledge
gurenkagurenda t1_j9ae1ic wrote
I don’t think it will slow AI at this point, so much as it will concentrate control over AI even more into the hands of well funded, established players. OpenAI has already hired an army of software developer contractors to produce training data for Codex. The same could be done even more cheaply for writers. The technology is proven now, so there’s no risk anymore. We know that you just need the training data.
So the upshot would just be a higher barrier to entry. Training a new model means not only funding the compute, but also paying to create the training set.
bairbs t1_j9agwyq wrote
Exactly. This is what big tech has been doing already to create legal and ethical data.
The training data is the bottleneck. OpenAI is trying to see if they can pull a fast one by releasing models using copyrighted material
gurenkagurenda t1_j9amk4h wrote
They’re not “pulling a fast one”. There’s no precedent here, and there’s a boatload of lawyers who agree that this is fair use. There are also a number who believe that it won’t be. The courts will have to figure it out, but until then, nobody knows how it will play out.
bairbs t1_j9ao00n wrote
They actually are. The precedent has been to use public domain material (which is why there are so many fine art style GANs), create your own data, pay for data to be created, pay for existing data, or keep the models private. There are plenty more artists and other jobs than lawyers who know this isn't fair use and will be negatively impacted if these companies are allowed to continue this practice.
gurenkagurenda t1_j9av16n wrote
That's not what I mean by precedent. I mean that there is no legal precedent.
bairbs t1_j9aygil wrote
Lol, if you think these huge companies don't have teams of lawyers advising them on how to legally create models, you're nuts. OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
But keep doing your own research. Maybe they'll hire you (or maybe they already do)
gurenkagurenda t1_j9b8gzz wrote
> OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
Please cite the case that you're talking about which you claim sets this precedent. Thanks.
bairbs t1_j9agew9 wrote
People can do whatever they want with copyright privately. It's when you release the work or try to commercialize it that causes the problems. Nothing is stopping AI companies from scraping and training all day. In order to release it, they should compensate the copyright holders
Slippedhal0 t1_j9asf0r wrote
Technically that's not correct; it's just very hard to enforce private use. For example, if you copy a movie, even for private use (except in very specific circumstances), that's illegal, and people have been charged.
That said, the public-release point is what I was thinking of anyway.
bairbs t1_j9awnxc wrote
Technically, if you bought the movie, you could copy it for your own use. You just can't share it, which to your point is very hard to enforce for private use outside of the internet.
I'm thinking of fair use when I say "do whatever they want with copyright privately"
[deleted] t1_j9a2xoy wrote
[removed]
Lick-a-Leper t1_j9b4eo4 wrote
A decent portion of internet articles and opinion pieces are already written by AI. It's been happening for a few years. It's interesting that AI is teaching AI to be flawed
[deleted] t1_j9dhrwx wrote
[deleted]
Sigma_Atheist t1_j9avhlh wrote
God forbid anyone actually read our articles. After all, what are headlines for?
gurenkagurenda t1_j9a0m4m wrote
> Marconi said he asked the chatbot for a list of news sources it was trained on and received a response naming 20 outlets.
I see absolutely no reason to think that ChatGPT can answer this question accurately, and expect that it is hallucinating this answer. Its training process isn’t something it “remembers” like someone would remember their time in high school. Instead, its thought process is more like “what would a conversational response from a language model look like?”
That’s not to say that it wasn’t trained on those sources, but you have to understand the limitations of the model. Asking it about its training process is like asking a human about their evolutionary history. Unless they’ve been explicitly taught about that, they just don’t know.