Submitted by Soupjoe5 t3_z225bd in technology
Comments
Soupjoe5 OP t1_ixe36go wrote
2
Programmers have, of course, always studied, learned from, and copied each other's code. But not everyone is sure it is fair for AI to do the same, especially if AI can then churn out tons of valuable code itself, without respecting the source material’s license requirements. “As a technologist, I'm a huge fan of AI,” Butterick says. “I'm looking forward to all the possibilities of these tools. But they have to be fair to everybody.”
Thomas Dohmke, the CEO of GitHub, says that Copilot now comes with a feature designed to prevent copying from existing code. “When you enable this, and the suggestion that Copilot would make matches code published on GitHub—not even looking at the license—it will not make that suggestion,” he says.
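GitHub has not published the internals of that matching step, so the following is only a minimal sketch of the idea, with every name hypothetical: normalize each candidate suggestion, then suppress it if it appears in an index of code already published in public repositories.

```python
# Hypothetical sketch of a duplicate-suppression filter. GitHub has not
# disclosed how Copilot's actual filter works; this only illustrates the
# concept Dohmke describes: if a suggestion matches published code,
# regardless of its license, no suggestion is made.

def normalize(code: str) -> str:
    """Collapse whitespace so trivial reformatting doesn't evade the check."""
    return " ".join(code.split())

def build_index(public_snippets: list[str]) -> set[str]:
    """Index of normalized snippets, standing in for GitHub's public corpus."""
    return {normalize(s) for s in public_snippets}

def filter_suggestion(suggestion: str, index: set[str]) -> str | None:
    """Return the suggestion, or None if it matches already-published code."""
    if normalize(suggestion) in index:
        return None  # match found: make no suggestion at all
    return suggestion

# A verbatim (or trivially reformatted) match is dropped; novel code passes.
index = build_index(["for i in range(n): total += values[i]"])
assert filter_suggestion("for i in range(n):  total += values[i]", index) is None
assert filter_suggestion("total = sum(values)", index) == "total = sum(values)"
```

A production system would have to match at scale and catch near-duplicates (renamed variables, reordered lines), which is far harder than this exact-match toy suggests.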
Whether this provides enough legal protection remains to be seen, and the coming legal case may have broader implications. “Assuming it doesn’t settle, it’s definitely going to be a landmark case,” says Luis Villa, a coder turned lawyer who specializes in cases related to open source.
Villa, who knows GitHub cofounder Nat Friedman personally, does not believe it is clear that tools like Copilot go against the ethos of open source and free software. “The free software movement in the ’80s and ’90s talked a lot about reducing the power of copyrights in order to increase people’s ability to code,” he says. “I find it a little bit frustrating that we're now in a position where some people are running around saying we need maximum copyright in order to protect these communities.”
Whatever the outcome of the Copilot case, Villa says it could shape the destiny of other areas of generative AI. If the outcome of the Copilot case hinges on how similar AI-generated code is to its training material, there could be implications for systems that reproduce images or music that matches the style of material in their training data.
Anil Dash, the CEO of Glitch and a board member of the Electronic Frontier Foundation, says that the legal debate is just one part of a bigger adjustment set in train by generative AI. “When people see AI creating art, creating writing, and creating code, they think ‘What is all this, what does it mean to my business, and what does it mean to society?’” he says. “I don't think every organization has thought deeply about it, and I think that's sort of the next frontier.” As more people begin to ponder and experiment with generative AI, there will probably be more lawsuits too.
slashinvestor t1_ixegkm8 wrote
I actually think this case is legitimate and it highlights an interesting difference between humans and AI.
Humans learn a programming language by learning its basics and then applying them. This means humans learn from ground zero. They will copy code, but they will do so knowing what kind of code they want.
An AI, in this context, does not learn the basics. It learns by looking at other people's code. This means that if there is some esoteric functionality, like a special identifier in a loop, that it has not seen, the AI will not know about it.
This is a HUGE difference, because it means that when an AI does come up with code, that code is based on somebody else's code. It is not based on the thought process: need a loop -> need a loop with this basic structure -> oh, here is one similar to what I need.
Thus I agree with the lawsuit entirely, because the AI is a grand copying machine, not a thinking machine.
youre_a_pretty_panda t1_ixfharn wrote
I've said the same in similar articles but it bears repeating: This case will boil down to a few simple factors.
What is the output of the AI? Does it create something new or does it merely regurgitate copy-pasted output?
If it merely spits out pre-existing code then it is clearly a copyright infringement.
However, it should be very clearly noted that simply training an AI model on a dataset does not violate copyright law. The output is key. If the AI creates new versions of, say, paintings, then those are new and unique works, provided they are sufficiently distinct from the originals in the training dataset (there is a long history of precedent for testing whether works of art are sufficiently distinct).
This is a fundamental point on which courts will inevitably have to settle. Anything else would not only stifle innovation (because small AI teams could never afford the exorbitant licensing fees for datasets that big corporations could easily pay) but would also be bad law that flies in the face of centuries of precedent regarding the creation of new and derivative works.
People need to use their brains and see that what Microsoft is doing can be illegal and bad (if code is regurgitated), but other projects that are training their AI on publicly available datasets are not breaking the law. It all depends on the output.
You cannot copyright a style, and you can't police every AI in the world to ensure that no copyrighted work was ever used in its training. That would be a fool's errand.
Output is key.
RudeRepair5616 t1_ixfvh03 wrote
It is important to understand that copyright does not protect ideas but only particular expressions of ideas ('works'). As such, it is not copyright infringement when subsequent authors independently 'create' existing works.
slashinvestor t1_ixgzi82 wrote
This lawsuit is going to define that. And as such you are simply making my point.
If an AI learns specific code, and uses that specific code to create another piece of code, then that is copyright infringement. That is the entire point of the argument.
A human, on the other hand, uses judgment and abstraction; the AI does not.
The AI used is a grand neural network, and as such it is incapable of original thought. That type of network is not physically capable of expressing a unique, out-of-the-box thought that it did not learn. We humans call those brain farts.
Yurithewomble t1_ixh18v1 wrote
That protection seems almost guaranteed to produce bad code. It will only produce code if nobody else has written it before, solving old problems in new (and probably worse) ways.
RudeRepair5616 t1_ixi2qwh wrote
Right, I didn't mean to disagree with your remarks.
Soupjoe5 OP t1_ixe31q0 wrote
Article:
1
Algorithms that create art, text, and code are spreading fast—but legal challenges could throw a wrench in the works.
THE TECH INDUSTRY might be reeling from a wave of layoffs, a dramatic crypto crash, and ongoing turmoil at Twitter, but despite those clouds, some investors and entrepreneurs are already eyeing a new boom—built on artificial intelligence that can generate coherent text, captivating images, and functional computer code. But that new frontier has a looming cloud of its own.
A class-action lawsuit filed in a federal court in California this month takes aim at GitHub Copilot, a powerful tool that automatically writes working code when a programmer starts typing. The coders behind the suit argue that GitHub is infringing copyright because it does not provide attribution when Copilot reproduces open-source code covered by a license requiring it.
The lawsuit is at an early stage, and its prospects are unclear because the underlying technology is novel and has not faced much legal scrutiny. But legal experts say it may have a bearing on the broader trend of generative AI tools. AI programs that generate paintings, photographs, and illustrations from a prompt, as well as text for marketing copy, are all built with algorithms trained on previous work produced by humans.
Visual artists have been the first to question the legality and ethics of AI that incorporates existing work. Some people who make a living from their visual creativity are upset that AI art tools trained on their work can then produce new images in the same style. The Recording Industry Association of America, a music industry group, has signaled that AI-powered music generation and remixing could be a new area of copyright concern.
“This whole arc that we're seeing right now—this generative AI space—what does it mean for these new products to be sucking up the work of these creators?” says Matthew Butterick, a designer, programmer, and lawyer who brought the lawsuit against GitHub.
Copilot is a powerful example of the creative and commercial potential of generative AI technology. The tool was created by GitHub, a subsidiary of Microsoft that hosts the code for hundreds of millions of software projects. GitHub made it by training a code-generating algorithm from the AI startup OpenAI on the vast collection of code it stores, producing a system that can preemptively complete large pieces of code after a programmer makes a few keystrokes. A recent study by GitHub suggests that coders can complete some tasks in less than half the time normally required when using Copilot as an aid.
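To make that workflow concrete, here is an illustrative example of the kind of completion such a tool produces: the programmer types only a signature and a docstring, and the assistant proposes the body. (This is a sketch of the interaction, not an actual Copilot transcript.)

```python
# What the programmer types: just the signature and docstring below.
# The function body is the kind of completion a Copilot-style tool
# might then propose (illustrative, not a real Copilot output).

def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards,
    ignoring case and non-alphanumeric characters."""
    cleaned = "".join(ch.lower() for ch in s if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
```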
But as some coders quickly noticed, Copilot will occasionally reproduce recognizable snippets of code cribbed from the millions of lines in public code repositories. The lawsuit filed by Butterick and others accuses Microsoft, GitHub, and OpenAI of infringing on copyright because this code does not include the attribution required by the open-source licenses covering that code.