now publishers - Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

APSIPA Transactions on Signal and Information Processing > Vol 13 > Issue 2

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Lingyi Yang, Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China

Feng Jiang, Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), and School of Information Science and Technology, University of Science and Technology of China, Hefei, China, jeffreyjiang@cuhk.edu.cn

Haizhou Li, Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China

Suggested Citation

Lingyi Yang, Feng Jiang and Haizhou Li (2024), "Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text", APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 2, e103. http://dx.doi.org/10.1561/116.00000250

Publication Date: 12 Feb 2024

Keywords

ChatGPT Detection, Polish Ratio, Large-Scale Language Models

Journal details

Open Access

This is published under the terms of CC BY-NC.

In this article:

Abstract

The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have impressed readers and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Most previous studies, however, have focused on detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. Such detectors fail on texts produced through human-machine collaboration, such as ChatGPT-polished texts. To address this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), which facilitates the construction of more robust detectors. It differs from existing corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the “Polish Ratio”, a measure of the degree of modification made by ChatGPT relative to the original human-written text, and hence of the degree of ChatGPT influence in the resulting text. Our experimental results show that our proposed model is more robust on the HPPT dataset and two existing datasets (HC3 and CDB). Furthermore, the “Polish Ratio” offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement.
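
As a rough illustration of what a "degree of modification" measure over a human-written/polished pair could look like, the sketch below scores two texts by the Jaccard distance between their word sets. The abstract does not state how the Polish Ratio is defined, so the choice of distance, the tokenizer, and the function names here are assumptions made for illustration and should not be read as the authors' method.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def illustrative_polish_ratio(original: str, polished: str) -> float:
    """Hypothetical stand-in for a polish score: Jaccard distance between
    the word sets of the original and the polished text.

    0.0 means the two texts use exactly the same words (no visible change);
    1.0 means they share no words at all.
    """
    a, b = tokenize(original), tokenize(polished)
    union = a | b
    if not union:
        # Two empty texts are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    human = "We propose a new method for text detection."
    polished = "We introduce a novel approach for detecting text."
    print(round(illustrative_polish_ratio(human, polished), 3))
```

A score like this only reflects vocabulary overlap; it ignores word order and paraphrase, which is one reason a real detector would likely need a learned model on top of any such target value.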

DOI:10.1561/116.00000250

Related publications

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Pre-trained Large Language Models for Information Processing

Introduction
Related Work
Method
Experiment and Analysis
Conclusion
References
