51社区

AI content: Ethics, identification and regulation

According to a report from the Europol Innovation Lab, as much as 90 percent of online content may be synthetically generated by 2026. With such a staggering projected increase, it is more important than ever to be aware of what content was created by AI.

Alex Semancik | December 11, 2024


Chances are that in the past year you've read or seen something that was not created by a human being. Artificial intelligence (AI) has become increasingly prevalent in all walks of life, including the creation of text, images and videos. The use of AI has drastically increased since the launch of OpenAI's ChatGPT in November 2022 and shows no signs of slowing down. According to a report from Europol, a European Union law enforcement agency, as much as 90 percent of online content may be synthetically generated by 2026.

With such a staggering projected increase in AI-generated content, it is more important than ever to be aware of what content is synthetically created, what that means and what's next.

How to identify text created by AI

51社区 Assistant Professor of English and AI expert Paul Shovlin says detecting text generated by AI can be tricky, especially across different kinds of writing. Faculty might identify a student's writing assignment as being AI-generated because it doesn't exhibit the kinds of specificity and word choice they are accustomed to from a particular student. This can become difficult when the writing isn't as personalized and doesn't have as much voice.

"The issue is that the characteristics [a professor] may be using to intuit aren't necessarily stable in different kinds of writing," said Shovlin. "A scientific report isn't going to have an identifiable, eccentric personal voice in it, for example."

At the same time, there are instances of work being flagged even though its author did not use AI to write it.

"There have been reports of the writing of some neurodivergent writers as being flagged as likely AI-generated when these individuals did not use any AI-assistance, at all," emphasized Shovlin.

Paul Shovlin is an assistant professor in English in 51社区's College of Arts and Sciences specializing in AI and digital rhetorics.

Large language models (LLMs), the AI systems that specialize in analyzing, generating and understanding text, can have a "tell" at times. LLMs often function by predicting the best next word to use. This can result in certain "tell" words that are overrepresented in the training data but not used in colloquial speech, says Chad Mourning, a 51社区 assistant professor of computer science and expert in AI and machine learning.

"One that shows up a lot, particularly in the academic setting, is 'delve,'" explained Mourning. "I see many student papers using that word, but they don't say that out loud. Makes one suspicious."
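As a rough illustration of the "tell" word idea, the sketch below counts how often a short watch list of words appears per 1,000 words of text. The watch list (`TELL_WORDS`) and the function name are assumptions made for this example, not something Mourning or Shovlin prescribes, and a high rate is at most a prompt for a closer look. As noted above, human writers can be flagged wrongly.

```python
import re
from collections import Counter

# Hypothetical watch list for illustration only; "delve" is the one
# example given in the article, the inflected forms are an assumption.
TELL_WORDS = {"delve", "delves", "delving"}


def tell_word_rate(text, tell_words=TELL_WORDS):
    """Return (hits per 1,000 words, Counter of matched words)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, Counter()
    hits = Counter(w for w in words if w in tell_words)
    rate = 1000 * sum(hits.values()) / len(words)
    return rate, hits


sample = ("This essay will delve into the causes of the war "
          "and delve into its effects.")
rate, hits = tell_word_rate(sample)
print(f"{rate:.1f} tell words per 1,000 words: {dict(hits)}")
```

A count like this says nothing about who wrote the text. It only measures word frequency, which is why it cannot replace a reader's judgment.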

Mourning added that earlier LLMs tended to ramble and didn't seem to know when they were done. Newer models, however, can add to the confusion as they tend to do a better job replicating organically created text.

"Advanced prompt engineering and bot programming can lead to AI-generated writing that looks more like 'organically created text,' than the general model many people use as a go-to solution for AI-generated text," said Shovlin.

How to identify images created by AI

When it comes to images, AI often struggles to generate uniquely human features like faces and fingers. A quick method for identifying images that may have been synthetically created is counting the fingers of the people or seeing if their faces appear to be distorted.

Chad Mourning is an assistant professor of computer science in 51社区's Russ College of Engineering and Technology. Mourning is an expert in aviation safety, artificial intelligence and machine learning, cybersecurity and advanced air mobility.

Even if an image does include people, additional steps may need to be taken to distinguish an image as AI-generated. Any sort of distortion or proportions that look extremely out of place can be red flags. For a more objective approach, applications and even AI itself can be used to detect images created by AI.

"In theory, any image generated with an AI can be detected by an AI, but there's a lot more effort going into generation than detection," said Mourning. "In fact, this task is, itself, a type of technique we call [adversarial training]. You train a generator, then tell it which ones are fake to make a discriminator, then exclude the ones the discriminator detects to train a better generator, which can be used to train a better detector."
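The loop Mourning describes can be sketched with a deliberately tiny toy. Here the "images" are single numbers, the generator is just an average (`mu`), and the discriminator is a cutoff halfway between the real and fake averages. All of these choices, along with the function name `adversarial_rounds`, are assumptions for illustration. Real systems use neural networks and gradient updates rather than this filter-and-refit shortcut.

```python
import random
from statistics import mean


def adversarial_rounds(real, start_mu=0.0, sigma=1.0, rounds=10,
                       n_fake=500, seed=0):
    """Toy generator-vs-discriminator loop on plain numbers."""
    rng = random.Random(seed)
    real_mean = mean(real)
    mu = start_mu
    history = [mu]
    for _ in range(rounds):
        # Generator: produce a batch of fakes around its current average.
        fakes = [rng.gauss(mu, sigma) for _ in range(n_fake)]
        fake_mean = mean(fakes)
        # Discriminator: a cutoff midway between real and fake averages.
        threshold = (real_mean + fake_mean) / 2
        # Exclude the fakes the discriminator detects (the fake side).
        if fake_mean < real_mean:
            survivors = [x for x in fakes if x >= threshold]
        else:
            survivors = [x for x in fakes if x <= threshold]
        # Better generator: refit on the fakes that slipped through.
        if survivors:
            mu = mean(survivors)
        history.append(mu)
    return history


data_rng = random.Random(1)
real_samples = [data_rng.gauss(4.0, 1.0) for _ in range(500)]
history = adversarial_rounds(real_samples)
print([round(m, 2) for m in history])
```

In this toy the generator's average moves from 0 toward the real data's average of about 4 and then hovers near it, because each round the discriminator's cutoff is recomputed against the improved fakes.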

How data and the internet influence what AI generates

Artificial intelligence and LLMs are strongly influenced by the content they are trained with. Mourning says much of the growth we have seen in AI is based on training data.

"Most of these generational algorithms are basically weighted combinations of things from the training data, a millionth of this, a millionth of that," explained Mourning. "If every picture labelled butterfly had a certain kind of symmetry, it will ensure that the generated image of a butterfly does too."

Since an LLM like ChatGPT is reliant on the data it is trained on, if the training content is biased or problematic, the resulting content will likely be the same. User-generated content that contains accidental misinformation or intentional disinformation can also pose an issue.

"If there is enough deliberate disinformation that makes its way into the training models, it will show up in the output," emphasized Mourning. "There have been AI-generated search result suggestions telling people they should chew rocks to cure some ailment, based on a humorous response in a Reddit thread. I don't think it was a real danger, but there might be some cases that weren't so obvious."
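A minimal sketch of that effect, assuming a toy model that simply picks the word it has most often seen following another word. This is far simpler than a real LLM, and the tiny `corpus`, the joke sentences and the function names are all invented for the example. The point is only that the prediction changes once enough bad examples enter the training text.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequently seen next word, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented training text for illustration.
corpus = [
    "doctors recommend rest",
    "doctors recommend water",
    "doctors recommend rest",
]
# The same text after a joke answer has been repeated often enough.
polluted = corpus + ["doctors recommend rocks"] * 3

print(predict_next(train_bigrams(corpus), "recommend"))
print(predict_next(train_bigrams(polluted), "recommend"))
```

With the original text the toy model predicts "rest" after "recommend"; once the joke sentence outnumbers the others it predicts "rocks" instead.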

Shovlin says there are ways to avoid some of this disinformation and misinformation when utilizing AI to generate content.

"You can prompt ChatGPT and other AI tools to focus on specific texts you feed into them and only those texts," he said. "In the case of a programmed bot with rules to not access the greater web, you may be reasonably assured that the responses it generates are from the specific sources you loaded into it."


Is it ethical to use AI?

The short answer is that it depends entirely on the context. Mourning and Shovlin agree that there is nothing inherently unethical about using generative AI, but aspects of deception and privacy can present a more complex grey area. Shovlin encourages users of generative AI to use rhetorical awareness: critical thinking related to the text they are composing and the audience they are composing it for.

"One question to ask oneself is: 'What would my audience think if they knew I was generating this text with AI?'" said Shovlin. "Another question is 'What are the expectations of my organization regarding privacy, copyright, and artificially generated vs. human-generated text?'"

How is AI regulated?

Mourning believes that the big ethical questions are related to deception and the unauthorized use of training data. The deception aspect could be easily remedied by adding disclosures; the data portion is a bit more complex. Some LLMs have been trained using YouTube transcripts, something that creators didn't necessarily sign off on.

If companies are made to disclose all of their data, their methods would be public knowledge, but disclosing where they gathered data could be a good compromise.

"If you make people disclose the actual training data, that's like forcing disclosure of trade secrets," Mourning explained. "But in aggregate, if you have to list where you got the data from, people can at least see if their rights were violated, whether it's an artist's copyright or YouTube's terms of service."


Shovlin is more pessimistic about AI regulation and doesn't think there will be meaningful regulation of generative tools.

"The companies are very powerful, the technology prolific and profuse, and politicians seem to be generally technologically ignorant, based on their responses, for example, to social media controversies," Shovlin emphasized. "There is a powerful point of view that AI regulation gets in the way of innovation and that given the extreme potential of AI, politicians may be hesitant to develop guidelines for it."

Will AI replace writers, other creative industries?

AI is already replacing some writers to an extent, says Shovlin. AI is being used to replace what would have been reported on by human journalists for some "underserved" sports.

"While times change and jobs change, it's important that we carefully consider how AI is affecting the workforce and remember that we have a voice and can use it when it's merited," he said.

Creatives are already being replaced and AI is only going to get better, but some creatives may be able to leverage the new technology, according to Mourning.

"There will always be room for some creatives, but it's going to be fewer of them," he said. "Existing writers may make the best of the inaugural class of 'prompt engineers' though. It's a transition, not an extinction."