Submitted by Tough_Gadfly t3_116wnco in technology
Comments
Pletter64 t1_j9aabdt wrote
Let's not forget you can make ChatGPT tell the truth and then gaslight it into saying it is providing fake news. It is extremely good at generating fluff, which is also its weakness.
Malkiot t1_j9adba7 wrote
Ask it to explain anything and cite sources. All of the sources are hallucinated: the BBC exists, but the cited article doesn't.
AntonOlsen t1_j9bopqm wrote
>Let's not forget you can make chatGPT tell the truth and then gaslight it into saying it is providing fake news.
Or have it tell a lie, then gaslight you about it.
KSRandom195 t1_j9bkjwf wrote
I asked ChatGPT for some info. Then I asked it for links to find more about the topic. Each link it provided was bogus and did not resolve to a page.
Cakeking7878 t1_j9d07xc wrote
I actually did get it to generate a random YouTube link that wasn't dead. It had 4 views and was some family's vacation video from 2015.
However, I should stress that it was random chance after several tries of asking. Trying to pull factual or useful information from this is a dumb idea at best and a harmful one at worst
GhostofDownvotes t1_j9e52yr wrote
Allegedly, I co-authored papers on organic chemistry with my cat. All the links it provided were real, but there was obviously no mention of organic chemistry or my cat co-authoring anything with me.
Plot twist: maybe I did co-author them with my cat and the competition just wiped my brain. 🧐
whtevn t1_j9bmvng wrote
"you have to understand the limitations of the model" should be the singular response to every one of these stupid articles
[deleted] t1_j9atsku wrote
Can we all just agree to stop posting 743.6448 articles a day about ChatGPT?
Alternatively a filter to filter all of them out would be lovely too
gurenkagurenda t1_j9aw1wh wrote
I think the current volume of ChatGPT articles would actually be tolerable if the media would actually focus on interesting aspects of the subject. But they just keep playing the same four notes over and over again. At least this one isn't "<recognizable name in tech> thinks <opinion> about ChatGPT, but also says <slightly different opinion>"
Digital_Simian t1_j9dgfqe wrote
It's because the articles are written by ChatGPT.
Twombls t1_j9axe91 wrote
Soon enough we will get to ignore it. This reminds me of self-driving cars: in 2015, Reddit got flooded with weird hypebeasts, and then the tech progress slowed way down.
The hype is reaching unrealistic levels. Subreddits dedicated to ChatGPT and Bing are essentially cults who believe it's sentient at this point. Soon we will get to the trough of disillusionment as this tech gets deployed to the general public and people start finding its faults.
AMirrorForReddit t1_j9brf47 wrote
...why would you not want to follow arguably the most important technological development of our time? Get used to it honey, and stop crying. It's only going to get worse. You don't get to pick and choose what is relevant, ok?
I just wish people could grow a brain about what the technology is, and stop pretending it is sentient and shit. That's all I desire. But it's never gonna happen.
DrabDonut t1_j9d14nb wrote
> the most important technological development of our time
It’s an early 1900s theory with a 1940s application using a 2020 level of data. It’s not much of a technological development beyond how much we could feed it. AI researchers in my department are kind of pissed that LLMs are getting this much attention because an astonishing number of humans are so dumb they can’t pass the mirror test.
AMirrorForReddit t1_j9d2ynu wrote
Almost all humans can pass the mirror test.
DrabDonut t1_j9d8xqb wrote
Interacting with ChatGPT and other LLMs is just a different kind of mirror test.
AMirrorForReddit t1_j9deh5p wrote
How can it simply be a mirror test when a large part of it presents information the average individual does not have?
Just to be clear, I think I see where you are going with that statement, but I disagree that it is nuanced enough to make much sense.
Yeah, people seem hopeless at understanding what they are working with when it comes to ChatGPT.
[deleted] t1_j9clp9t wrote
[removed]
egypturnash t1_j99o9kf wrote
Oh good, maybe this will result in “fair use” being defined to explicitly not include some asshole scraping the Internet and dumping everything they find into their copyright-washing “AI”.
gurenkagurenda t1_j9adicl wrote
I cannot see any possible way to define fair use the way you’re saying which wouldn’t have massive unintended effects. If you want to propose that, you’re going to need to be a hell of a lot more specific than “dumping into an AI” when describing what you think should actually be prohibited.
bairbs t1_j9ag5if wrote
Why not? Just say scraping is fine for research and private models. As soon as you release it to the public or try to monetize it, then it's outside of fair use. Just like Nintendo, when they go after passion project games that are similar in theme, style, and mechanics. You can't just take other people's work and make money off of it
gurenkagurenda t1_j9allgk wrote
How do you define a model? What statistics are you and are you not allowed to scrape and publish? Comments like yours speak to a misunderstanding of what training is with respect to a work, which is simply nudging some numbers according to the statistical relationships within the text. That’s an incredibly broad category of operations.
For example, if I scrape a large number of pages, and analyze the number of incoming and outgoing links, and how those links relate to other links, in order to build a model that lets me match a phrase to a particular webpage and assess its relevance, is that fair use?
If not, you just outlawed search engines. If so, what principle are you using to distinguish that from model training?
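To make the hypothetical concrete, here is a toy sketch of the kind of link-statistics model described above. Everything in it (the page list, the scoring formula, the names) is illustrative only, not how any real search engine ranks results:

```python
# Toy "search engine": aggregate scraped link statistics into a
# relevance model. Scoring is hypothetical and deliberately simple.
from collections import Counter

def build_link_counts(edges):
    """Count incoming links per page from scraped (source, target) pairs."""
    return Counter(dst for _, dst in edges)

def rank(query, pages, link_counts):
    """Rank pages by keyword overlap, weighted by incoming-link count."""
    terms = set(query.lower().split())

    def score(page):
        url, text = page
        overlap = len(terms & set(text.lower().split()))
        return overlap * (1 + link_counts[url])

    return [url for url, _ in sorted(pages, key=score, reverse=True)]

# Hypothetical scraped data: who links to whom, and each page's text.
edges = [("a.com", "b.com"), ("c.com", "b.com"), ("b.com", "a.com")]
pages = [("a.com", "cats and dogs"), ("b.com", "all about cats")]
counts = build_link_counts(edges)
print(rank("cats", pages, counts))  # b.com has 2 inlinks, so it ranks first
```

Both steps, the training of an LLM and the building of this index, are "scrape pages, reduce them to numbers, answer queries from the numbers"; the legal line between them is exactly what's being asked about here.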
Edit: Gotta love when someone downvotes you in less time than it would take to actually read the comment. Genuine discourse right there.
ImSuperHelpful t1_j9apjid wrote
Your argument neglects the business side of the situation which explains the motivations to allow and disallow use in the two scenarios… if I run a content website, a search engine crawling the site so it can generate search results which send traffic to my site is beneficial to both parties, it’s symbiotic.
Alternatively, if I run a content site that an AI company crawls and then uses to train a model which then negates the need for my site to would-be visitors, it’s parasitic.
gurenkagurenda t1_j9avebb wrote
I'm not neglecting anything. I'm asking for some semblance of precision in defining model training out of fair use. The purpose and character of use, and the effect on the market are already factors in fair use decisions, but that's a lot more complicated of an issue than "AI models can't scrape content." It's specific to the application, and even for ChatGPT specifically, it would be pretty murky.
ImSuperHelpful t1_j9awd5k wrote
Except that’s what was missing from your original point, but either way I gave you a starting point… if it’s beneficial for both parties and both parties consent (which content site operators do via robots.txt instructions), no one has a problem. In the AI case it’s beneficial to the AI creator/owner but harmful to the content owner, since the AI is competing with them by using their content, so it shouldn’t be considered fair use.
gurenkagurenda t1_j9b9muc wrote
>Except that’s what was missing from your original point
Again, it's not missing from my original point, because my original point was to ask how the commenter above was distinguishing these cases. You've given a possible answer. That's an answer to my question, not a rebuttal.
I don't think that answer is very compelling, though. Arguing that an explicitly unreliable chat bot that hallucinates as often as it tells the truth is somehow a competitor to news media etc. is a tall order.
ImSuperHelpful t1_j9biixs wrote
I didn’t present it as a rebuttal, I added important context that was missing from your question that makes the answer much more clear.
And these things are unreliable now, but Microsoft and others are dumping billions of dollars into making them better, and they’re doing it for profit. Waiting around until they’re perfected before fighting the ongoing unfair use of copyrighted content is a surefire strategy for losing that fight.
gurenkagurenda t1_j9bl2kk wrote
What they're dumping money into now on this front are AI-enhanced search engines, which are complementary to the content they're training on.
ImSuperHelpful t1_j9brj6f wrote
That’s all they’ve launched on this front, that doesn’t mean it’s all they’re working on
zutnoq t1_j9bhoq8 wrote
Search providers like Google don't just show you links, though. They also show potentially relevant excerpts, so you often don't even need to visit the linked site to get what you were after, and they show previews of images in image search, etc.
Determining exactly where to draw the line of what to consider fair-use for things like this is a highly complex and dynamic issue. Web search engines are (by necessity) parasitic as well but that alone neither makes them bad nor illegal.
Parasitic is also not the "bad" counterpart of symbiotic. A symbiotic relationship is simply a parasitic relationship that benefits both parties. Just saying parasitic says nothing about which side(s) would benefit. I think exploitative would be a more appropriate word to use for such relationships.
ImSuperHelpful t1_j9bqacf wrote
Those relevant excerpts and similar features have been pretty detrimental to search click-through rates in certain areas (they’re known as “zero-click” searches in the industry)… but the alternative is to block Google’s bots entirely, which isn’t viable if you’re operating a content site, since Google has an effective monopoly on search. Also, those features do still link out to the content they’re showing on the SERP, whereas the chat AI doesn’t, and gives the appearance that it’s the source of the information.
Your point about vocabulary is fair
bairbs t1_j9an2db wrote
I'm speaking about using copyrighted art, music, etc. I understand what training is. I also understand the steps companies take to prevent even the perception that they're training on copyrighted material. They either generate pseudo data or purchase entire libraries from stock photo sites. OpenAI and by extension, Microsoft are hoping they can get enough people on their side by saying, "Nothing is copyright if you think about it," so they can do whatever they like.
gurenkagurenda t1_j9ang4f wrote
None of what you said addresses anything I said in my comment.
bairbs t1_j9aog2w wrote
Because I'm not talking about defining a model, I'm talking about scraping copyrighted material. Why would I change the subject to your strawman argument?
gurenkagurenda t1_j9avgd7 wrote
So you think that search engines should be considered illegal copyright infringement? You say that you're just referring to scraping content, which is a necessary part of how a search engine works. So I'm forced to assume that the answer is yes.
bairbs t1_j9axo6n wrote
Lol, you're the one bringing search engines into this for some reason. It's a disingenuous argument and way off base from my point, which is why I'm not responding to it. You've also found all my comments and responded to them aggressively like a good shill
gurenkagurenda t1_j9b8p1r wrote
>You've also found all my comments and responded to them agressuvely like a good shill
Are you talking about this? You replied to me.
I mean, Jesus Christ. Anyway, I'm done trying to explain the concept of unintended consequences to you.
yUQHdn7DNWr9 t1_j9bn2ue wrote
You don’t need permission to read, memorise, analyse, synthesise, learn from, paraphrase, praise or criticise copyrighted text. You need permission to reproduce it. It isn’t obvious to me that a statistical model would need to reproduce the data it is studying.
UmdieEcke2 t1_j9a5bby wrote
Yeah, reading things and then using the information is the most deplorable action any actor can do. Thank god humans are above such disgusting behaviour. Imagine the dystopia we would be living in otherwise.
stealytheblackguy t1_j99cby0 wrote
They also just straight up used dialogue created by the developers. ChatGPT is heavily biased.
hikeonpast t1_j99d0km wrote
“Maybe we can charge AI to read our articles, since humans won’t pay for our content anymore”, said Sensationalist “Biased” McMedia.
professorDissociate t1_j99carm wrote
The media reporting on the media's response to OpenAI including the media in training material. I feel like I've heard this one before, but with Putin assassinating himself.
Special_Rice9539 t1_j9at12x wrote
It turns out that chatGPT isn't actually an AI but just has well-trained staff in the background answering your prompts.
Thebadmamajama t1_j9dwk7a wrote
Monkeys with typewriters
littleMAS t1_j9bbixo wrote
Imagine that you were a true genius with an amazing 'photographic' memory that could recount almost everything you ever read. Imagine winning awards, getting a premium 'Ivy League' education, publishing award-winning original essays, and becoming a revered scholar. Now, imagine every publication such as the WSJ coming after you for 'using' their published content to make yourself so smart.
Slippedhal0 t1_j99zasl wrote
It's the same argument artists are making when complaining about the use of copyrighted artwork as training data.
At some point there will be a major ruling about how companies training AI need to approach copyright for their training data sources, and if courts rule in favour of copyright holders it will probably severely slow AI progress as systems to request permission are built.
Although I could maybe see a fine-tuned AI like Bing being less affected, because it cites sources rather than opaquely using previously acquired knowledge
gurenkagurenda t1_j9ae1ic wrote
I don’t think it will slow AI at this point, so much as it will concentrate control over AI even more into the hands of well funded, established players. OpenAI has already hired an army of software developer contractors to produce training data for Codex. The same could be done even more cheaply for writers. The technology is proven now, so there’s no risk anymore. We know that you just need the training data.
So the upshot would just be a higher barrier to entry. Training a new model means not only funding the compute, but also paying to create the training set.
bairbs t1_j9agwyq wrote
Exactly. This is what big tech has been doing already to create legal and ethical data.
The training data is the bottleneck. OpenAI is trying to see if they can pull a fast one by releasing models using copyrighted material
gurenkagurenda t1_j9amk4h wrote
They’re not “pulling a fast one”. There’s no precedent here, and there’s a boatload of lawyers who agree that this is fair use. There are also a number who believe that it won’t be. The courts will have to figure it out, but until then, nobody knows how it will play out.
bairbs t1_j9ao00n wrote
They actually are. The precedent has been to use public domain material (which is why there are so many fine art style GANs), create your own data, pay for data to be created, pay for existing data, or keep the models private. There are plenty more artists and other jobs than lawyers who know this isn't fair use and will be negatively impacted if these companies are allowed to continue this practice.
gurenkagurenda t1_j9av16n wrote
That's not what I mean by precedent. I mean that there is no legal precedent.
bairbs t1_j9aygil wrote
Lol, if you think these huge companies don't have teams of lawyers advising them on how to legally create models, you're nuts. OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
But keep doing your own research. Maybe they'll hire you (or maybe they already do)
gurenkagurenda t1_j9b8gzz wrote
> OpenAI has everything to gain and nothing to lose by trying to challenge the precedents that are already set.
Please cite the case that you're talking about which you claim sets this precedent. Thanks.
bairbs t1_j9agew9 wrote
People can do whatever they want with copyright privately. It's when you release the work or try to commercialize it that causes the problems. Nothing is stopping AI companies from scraping and training all day. In order to release it, they should compensate the copyright holders
Slippedhal0 t1_j9asf0r wrote
Technically that's not correct; it's just very hard to enforce private use. For example, if you copy a movie, even for private use (except in very specific circumstances), that's illegal, and people have been charged.
That said, the public-release point is what I was thinking of anyway.
bairbs t1_j9awnxc wrote
Technically, if you bought the movie, you could copy it for your own use. You just can't share it, which to your point is very hard to enforce for private use outside of the internet.
I'm thinking of fair use when I say "do whatever they want with copyright privately"
[deleted] t1_j9a2xoy wrote
[removed]
Lick-a-Leper t1_j9b4eo4 wrote
A decent portion of internet articles and opinion pieces are already written by AI. It's been happening for a few years. It's interesting that AI is teaching AI to be flawed
[deleted] t1_j9dhrwx wrote
[deleted]
Sigma_Atheist t1_j9avhlh wrote
God forbid anyone actually read our articles. After all, what are headlines for?
gurenkagurenda t1_j9a0m4m wrote
> Marconi said he asked the chatbot for a list of news sources it was trained on and received a response naming 20 outlets.
I see absolutely no reason to think that ChatGPT can answer this question accurately, and expect that it is hallucinating this answer. Its training process isn’t something it “remembers” like someone would remember their time in high school. Instead, its thought process is more like “what would a conversational response from a language model look like?”
That’s not to say that it wasn’t trained on those sources, but you have to understand the limitations of the model. Asking it about its training process is like asking a human about their evolutionary history. Unless they’ve been explicitly taught about that, they just don’t know.