J Clin Gynecol Obstet

Journal of Clinical Gynecology and Obstetrics, ISSN 1927-1271 print, 1927-128X online, Open Access

Article copyright, the authors; Journal compilation copyright, J Clin Gynecol Obstet and Elmer Press Inc

Journal website https://jcgo.elmerpub.com

Original Article

Volume 15, Number 2, June 2026, pages 49-54

Detection of AI Influence Within Women’s Health Literature

Hanna R. Perone^{a, c}, Anna Greenwood^a, Kaitlyn Zablock^a, Bernard Gonik^b

^aDepartment of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA
^bDepartment of Obstetrics and Gynecology, Division of Maternal Fetal Medicine, Wayne State University, Detroit, MI, USA
^cCorresponding Author: Hanna R. Perone, Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI 48201, USA

Manuscript submitted February 5, 2026, accepted April 1, 2026, published online June 6, 2026
Short title: AI Influence in Women’s Health Literature
doi: https://doi.org/10.14740/jcgo1638

Abstract

Background: The extent to which large language models (LLMs) are used to generate or edit medical literature without disclosure remains unknown. This study hypothesizes that LLMs influence recent women’s health publications and aims to quantify possible usage and disclosure rates.

Methods: Eighty women’s health journals were selected at random, and the first 20 publicly available abstracts from January through December 2025 were included from each journal. After selection for inclusion, a total of 1,600 abstracts were downloaded along with subspecialty category, journal impact factor, and continent of the corresponding author. Each abstract was then individually entered into GPTZero, an artificial intelligence (AI) detection software, to quantify the percentages of text that were human-generated, AI-generated, or AI-edited. Manuscripts with AI detected in their abstracts were then evaluated for disclosure of AI use. Source percentages were compared across subspecialties, impact factors, and continents, with significance defined as P < 0.05.

Results: Of the 1,600 abstracts, 413 (25.8%) were flagged as containing AI influence. Of these 413 abstracts, 79 (19.1%) were identified as entirely AI-generated, and 21 (5.1%) had associated manuscripts that disclosed AI use. Abstracts in journals related to reproductive endocrinology and infertility were most often flagged for AI influence (35%, 77/220), whereas family planning-related abstracts showed the lowest proportion (6.7%, 4/60) (P < 0.05). A weak inverse association between impact factor and AI use was observed, but this relationship was not significant and explained only 4% of data variability for AI use (R² = 0.04). No difference in AI detection across continents was found (P = 0.45).

Conclusion: Findings suggest that AI may have contributed to approximately one-quarter of women’s health abstracts published in 2025. This early signal underscores the potential prevalence of LLM usage and low disclosure rates, highlighting the need for clear guidelines and transparency regarding AI use in scholarly communication. Ongoing discussions are needed to outline responsible implementation that harnesses AI’s potential while maintaining the integrity of medical research.

Keywords: Women’s health research; Artificial intelligence; Large language models; Literature authorship; Obstetrics; Gynecology

Introduction

Artificial intelligence (AI) use in medicine has expanded rapidly over the past decade [1–3]. As AI capabilities grow, its potential roles are being explored in many facets of medical practice, including participation in authorship of medical literature [4, 5]. This emerging function has generated debate among researchers, with some raising ethical concerns about AI authorship [6]. In one publication, ChatGPT was dubbed the “weapon of mass deception” in literature development because of its inaccuracies and false reference claims [2, 7]. The Committee on Publication Ethics (COPE) has stated that AI tools cannot meet authorship requirements because they cannot take responsibility for submitted work. COPE further noted that non-legal entities such as large language models (LLMs) cannot declare conflicts of interest, which is required to publish [8]. Other scientists have speculated that the unchecked use of AI could lead to mass production of lower quality or plagiarized manuscripts and erosion of the skills needed to produce peer-reviewed work [9–11]. Another consideration is the selection of data that AI platforms may promote as references or cite in generated text. Some AI platforms may have agreements with certain databases, creating selection bias in literature inclusion [12]. Concerns that medical machine learning platforms may perpetuate biases, such as sampling bias and prejudice, have also been raised. When AI involvement in generating research is undisclosed, biases may go unrecognized and lead to adverse outcomes [13]. In contrast, arguments in favor of AI-assisted literature creation emphasize potential gains such as clarity, readability, and time efficiency [14]. Some praise LLMs for reducing grammatical errors and improving writing style for non-native English speakers [15]. Additionally, some argue that AI allows researchers to redirect time towards research itself rather than writing and formatting for publication [11].

Despite ethical concerns, AI use in medical research continues to expand. LLMs have been reported to assist with article summarization, reference list generation, statistical analyses, and drafting of manuscripts [4, 10, 16, 17]. Despite this rapid growth, the prevalence of AI-generated text in medical literature remains poorly characterized because machine writing is often indistinguishable from human-written text and its use is rarely disclosed [18]. Human readers demonstrate only modest ability to recognize AI-generated material. In one study, participants achieved 68% accuracy in correctly classifying AI-written material [19]. Another study found human misclassification rates of up to 38% when evaluating blinded AI and human-created abstracts, underscoring the difficulty of reliably detecting AI’s influence in scientific manuscripts [20].

To address human limitations, several AI detection tools have been developed, including ZeroGPT, Originality.ai, and GPTZero [21–23]. One of these tools, GPTZero, publicly reports 99% accuracy for AI use detection with a false positive rate of less than 1% [23–25]. Several studies have validated GPTZero’s performance, highlighting its ability to correctly identify AI influence within sample text [26–28]. In comparisons with other AI detection software, GPTZero was found to be more accurate. One study evaluating GPTZero versus ZeroGPT found AI detection accuracy of 97.2% with GPTZero versus 64.4% with ZeroGPT and reported false positive and false negative rates of 0% and 2.78%, respectively, for GPTZero [29]. In an ophthalmology-based study comparing GPTZero to three other AI detection programs, GPTZero outperformed all three with a sensitivity of 100% and specificity of 96% [30]. While AI detection software is imperfect, studies support that tools such as GPTZero may provide insight into AI usage within medical literature that would not otherwise be correctly recognized by human reviewers [20].

Given the widespread adoption of AI-based writing tools, we hypothesize that a proportion of medical literature includes AI-generated content. The aim of this observational study is to estimate the prevalence of AI-generated or AI-edited text and the rate of AI use disclosure among abstracts related to women’s health.

Materials and Methods

First, a list of 150 journals was generated through a Perplexity-executed search query and then manually reviewed to confirm the presence of women’s health-related manuscripts [31]. From this sample set (n = 150), 80 journals were selected for inclusion using a randomization sequence generated by Microsoft Excel Version 16.54. Peer-reviewed publications were ordered chronologically by publication date on PubMed from January 1, 2025 to December 31, 2025 [32]. The first 20 manuscripts written in English and with an available abstract were selected from each journal. Sample size was determined a priori using conservative effect size assumptions, as no prior literature was available to inform expected AI detection rates. For each of the resulting 1,600 publications, data extracted included journal name, journal specialty or subspecialty, impact factor, continent of corresponding author, abstract title, and abstract text [32, 33]. Journals were categorized further by specialization, including general obstetrics and gynecology (ObGyn), maternal fetal medicine (MFM), reproductive endocrinology and infertility (REI), gynecologic oncology (GynOnc), family planning (FP), and benign gynecologic subspecialties (Gyn) such as minimally invasive gynecologic surgery, pediatric and adolescent gynecology, and urogynecology.
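The selection step described above can be sketched in code. This is a minimal illustration and not the authors' procedure: the study used a randomization sequence generated in Microsoft Excel, whereas the sketch substitutes Python's `random.sample`. The 1-based journal indices and the seed value are hypothetical placeholders.

```python
import random

CANDIDATE_JOURNALS = 150    # journals in the manually reviewed candidate list
SELECTED_JOURNALS = 80      # journals randomly chosen for inclusion
ABSTRACTS_PER_JOURNAL = 20  # first eligible abstracts taken from each journal


def select_journals(n_candidates, n_selected, seed):
    """Draw n_selected distinct 1-based journal indices without replacement."""
    rng = random.Random(seed)  # the seed is an arbitrary, hypothetical choice
    return sorted(rng.sample(range(1, n_candidates + 1), n_selected))


selected = select_journals(CANDIDATE_JOURNALS, SELECTED_JOURNALS, seed=2025)
total_abstracts = len(selected) * ABSTRACTS_PER_JOURNAL
print(len(selected), total_abstracts)
```

Sampling without replacement guarantees that no journal is chosen twice, so the fixed quota of 20 abstracts per journal yields the 1,600 abstracts analyzed.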

Three AI detection tools, ZeroGPT, Originality.ai, and GPTZero, were considered for inclusion [21–23]. At the time of data collection, only GPTZero and Originality.ai publicly provided accuracy rates for their software [23–25]. The three detection platforms were tested using five included abstracts that were regenerated to create known AI versions [31]. The five 100% AI-written abstracts were then entered into each AI detection platform. ZeroGPT reported 87.5% AI written, Originality.ai reported 90.5% AI written, and GPTZero reported 100% AI written, as expected. Based on this sample assessment as well as literature supporting its accuracy, GPTZero was selected for inclusion in this study [29, 30].

Each abstract was then individually entered into GPTZero [23]. The software’s output included an algorithm-based estimate of the percentage of text believed to be human-generated, AI-generated, or AI-edited, with the three percentages summing to 100%. These percentages represented the probability of text written entirely by a human, entirely by AI, or by a combination of both. For each abstract where AI influence was detected, a secondary analysis was performed to assess whether AI use was disclosed in the manuscript.
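The flagging logic implied by this output can be sketched as follows. This is an illustrative assumption rather than GPTZero's interface or the authors' code: the `DetectorOutput` class, the `is_flagged` helper, and the example percentages are all hypothetical, and the sketch assumes that the AI share of an abstract is the sum of its AI-generated and AI-edited percentages.

```python
from dataclasses import dataclass


@dataclass
class DetectorOutput:
    """Hypothetical container for the three percentages of one abstract."""
    human: float
    ai_generated: float
    ai_edited: float

    def ai_share(self) -> float:
        # Assumption: AI influence is AI-generated plus AI-edited text.
        return self.ai_generated + self.ai_edited


def is_flagged(result: DetectorOutput, min_ai_share: float = 0.0) -> bool:
    """Flag an abstract with a nonzero AI share that meets the given minimum."""
    total = result.human + result.ai_generated + result.ai_edited
    if abs(total - 100.0) > 1e-6:
        raise ValueError("percentages must sum to 100")
    share = result.ai_share()
    return share > 0.0 and share >= min_ai_share


# Hypothetical example values, not taken from the study data.
example = DetectorOutput(human=92.0, ai_generated=3.0, ai_edited=5.0)
print(is_flagged(example), is_flagged(example, 5.0), is_flagged(example, 10.0))
```

Passing a minimum share of 5 or 10 mirrors the stricter thresholds that can be applied to the same output, while the default captures any detected AI contribution.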

Descriptive statistics were first collected for all abstracts. Abstracts were then grouped by specialty categories and evaluated using Pearson’s chi-squared tests. Percentages of AI detection were compared across impact factors using logistic regression modeling. Percentages of AI detection were also compared across continents of corresponding authors. These data were determined to be non-normally distributed by Shapiro-Wilk analysis and thus were analyzed using Kruskal-Wallis tests with post-hoc Dunn’s analysis. A P-value less than 0.05 was considered statistically significant. All statistical tests were conducted using Microsoft Excel Version 16.54. The study was deemed exempt from institutional review board approval as it involved no human subjects and was limited to publicly available data.

Results

A total of 1,600 abstracts were included from 80 women’s health journals. In this analysis, 413 abstracts, or 25.8% (95% confidence interval (CI) 23.7–27.9), were flagged as having a component of AI contribution, meaning the presence of AI-generated or AI-edited text (Fig. 1). At thresholds of at least 5% and at least 10% AI detection, 307 abstracts (19.2%, 95% CI 17.3–21.1) and 282 abstracts (17.6%, 95% CI 15.8–19.5) were flagged, respectively. The 413 abstracts were then grouped by percentage of human-derived material, which showed that 202 out of 413 abstracts contained 0–25% human-derived text and 173 out of 413 abstracts contained 76–99% human-derived text (Fig. 1). In other words, AI usage was predominantly detected at either extreme, with abstracts having either a very small amount or a large amount of text influenced by AI. A total of 79 out of 1,600 abstracts, or 4.9% (95% CI 3.9–6.0), were flagged as 100% generated by AI. Out of the 413 abstracts, 21 of the associated manuscripts disclosed the use of AI, with 10 required to disclose AI usage at time of publication based on journal guidelines. None of the 413 abstracts contained a disclosure statement denying the use of AI. The corresponding authors of the 413 AI-influenced abstracts were distributed across continents as follows: five from Africa, 205 from Asia, 14 from Australia, 94 from Europe, 90 from North America, and five from South America. There were no significant differences when percentages of AI influence were compared across continents (P = 0.45).
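The proportions and confidence intervals above can be approximated with a short calculation. The sketch below is an assumption about method, not a reproduction of the authors' analysis: the article does not state how its intervals were computed, so the normal (Wald) approximation is used here, and results may differ from the published intervals in the last digit. The `proportion_ci` helper is a hypothetical name; the counts are taken from the text.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the normal (Wald) approximation."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts reported in the Results: (flagged abstracts, denominator).
counts = {
    "any AI influence": (413, 1600),
    "at least 5% AI": (307, 1600),
    "at least 10% AI": (282, 1600),
    "100% AI-generated": (79, 1600),
    "disclosed AI use (of flagged)": (21, 413),
}

for label, (k, n) in counts.items():
    p, lower, upper = proportion_ci(k, n)
    print(f"{label}: {100 * p:.1f}% (95% CI {100 * lower:.1f}-{100 * upper:.1f})")
```

For proportions near 0 or 1, or for small denominators such as a single subspecialty, an exact or Wilson interval would usually be preferable to the Wald approximation.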

Figure 1. Distribution of percentages of human-written text across 1,600 abstracts.

Differences in AI detection were compared across specialty journals (Table 1). When assessing rates of abstracts that were 100% human-made, FP had the highest percentage at 93.3% and REI had the lowest percentage at 65% (P < 0.05). Conversely, REI-related abstracts had the highest percentage of abstracts with any amount of AI detection (35%, 95% CI 24–46), followed by Gyn, ObGyn, GynOnc, MFM, and finally FP with the lowest percentage (6.7%, 95% CI 1.8–16.5, P < 0.05). The ObGyn category had the highest rate of abstracts with 100% AI detection at 5.8% (95% CI 4.1–7.8), compared with 1.7% (95% CI 0.1–9.1) in the Gyn category (P < 0.05). The final row in Table 1 reports the average proportion of text in each abstract influenced by AI, excluding abstracts with 0% AI influence. FP had the highest proportion of influenced text at 70.8% (95% CI 58.8–81.0) and Gyn had the lowest at 43.5% (95% CI 31.0–56.7); however, the differences among subject categories were not statistically significant (P = 0.20).
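To illustrate how a Pearson chi-squared comparison of this kind works, the sketch below contrasts only the two extreme categories, using the counts given in the Abstract (REI 77/220 flagged, FP 4/60 flagged). This is a simplified two-group illustration under stated assumptions, not the authors' analysis, which compared all six subspecialty categories. The `chi_squared_2x2` helper is a hypothetical name.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-squared upper tail probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from the Abstract: flagged abstracts and totals per category.
rei_flagged, rei_total = 77, 220
fp_flagged, fp_total = 4, 60

stat, p_value = chi_squared_2x2(
    rei_flagged, rei_total - rei_flagged,
    fp_flagged, fp_total - fp_flagged,
)
print(f"chi-squared = {stat:.2f}, p = {p_value:.2g}")
```

A pairwise test like this does not account for multiple comparisons, so it should be read only as a demonstration of the calculation.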

Table 1. AI Detection by Journal Subspecialty Category

Finally, the relationship between impact factor and percentage of AI influence was evaluated. Included journals had an average impact factor of 2.97, with a range of 0.5 to 16.1. Abstracts with AI detection (n = 413) are shown in Figure 2, where the logistic regression model demonstrates the percentage of AI-influenced text compared across impact factors. Higher impact factor journals showed slightly lower mean AI detection scores, with a change of −3.21% in AI score per impact factor unit, though this weak linear relationship explained just 4% of variance (R² = 0.04).

Figure 2. Relationship between impact factor and percentage of AI created or edited text. AI: artificial intelligence.

Conclusions

This study hypothesized that AI contributes to the authorship of medical literature and may not be fully disclosed in final publications. Using a single AI detection tool, this study detected AI-assisted writing in women’s health journals, with over one-quarter of recent abstracts demonstrating some degree of AI-influenced content and a smaller subset classified as fully AI-generated. The distribution of AI involvement varied by subspecialty, suggesting that norms around AI use may vary across women’s health domains. Findings of this study suggest that AI contributes to the scientific writing process for women’s health research, even though such use is often not apparent to readers and reviewers.

Despite AI’s growing use, existing journal guidelines on reporting AI assistance are fragmented and potentially outdated. Some policies prohibit AI from being listed as an author because AI tools cannot meet standard authorship criteria or assume responsibility. However, these policies permit undisclosed AI-assisted drafting or editing under human accountability. Publication trends described in this study show a potentially substantial gap between detected AI involvement and explicit disclosure. Among the 413 abstracts in which AI influence was identified, only 21 acknowledged AI use in manuscript development, and 10 of these disclosures came from journals that specifically required an AI-use statement as a condition of publication. The margin of error and potential misclassification inherent to AI detection software must be acknowledged and may partly account for these findings, but they are unlikely to fully explain the low rate of disclosure. One alternative explanation is the ongoing debate regarding AI involvement: some authors may be hesitant to disclose AI assistance if they perceive a negative impact on publication. Another explanation may be the lack of universal disclosure policies and their enforcement. Future work should include revising current guidelines to increase transparency of AI use, as readers and reviewers need a complete understanding of all sources of content to critically evaluate publications.

Several methodological limitations of this study should be considered. This analysis relied on a single AI detection platform whose reported performance may not fully reflect its behavior across different writing styles and contexts, raising the potential for both false positives and false negatives. While studies suggest that GPTZero has a lower false positive rate for non-native English writers compared to other software, false positives among non-native English writers may still affect the detection rates reported in this international study [29]. Future research should consider pairing author reports of AI use with multiple AI detection tools to further evaluate accuracy and quantify underreporting. Moreover, despite the large sample of included text, only abstracts were evaluated, as they were previously shown to have the highest rates of AI usage [16]. AI use in other manuscript sections such as the introduction and methods may follow different patterns, and the overall extent and nature of AI involvement in women’s health publications may not be fully captured by this abstract-only assessment. Additionally, differences in AI use across subspecialties within women’s health may limit external validity for other medical specialties not included in this study. The cross-sectional design also limits insight into temporal trends of AI use. Rapid evolution of both AI tools and editorial processes further limits generalizability of these cross-sectional findings.

This study underscores the possible presence, yet inconsistent reporting, of AI in manuscript development and the need for clearer and standardized reporting practices. Journals, professional societies, and institutions should consider implementing universal policies that require concise descriptions of how AI tools contribute to each manuscript and explicitly reaffirm that human authors retain full responsibility for all submitted content. Such policies would promote transparency, support accurate interpretation of the literature, and help safeguard the integrity of scientific publication.

Acknowledgments

The authors would like to thank Wayne State University School of Medicine for providing resources necessary to complete this project. The authors would like to acknowledge the use of Perplexity.ai for editing our written draft of introduction and discussion sections for clarity and readability. AI was not utilized in other areas of project completion including study design, literature review, concept creation, statistical analysis or to generate references, figures, or tables.

Financial Disclosure

After the initial analysis that deemed GPTZero best fit for this study, the authors were granted unlimited access to the software for research purposes. The authors report no additional financial disclosures or funding for this project.

Conflict of Interest

The authors report no conflict of interest.

Informed Consent

Not applicable.

Author Contributions

HP contributed to study design, data collection, analysis, and manuscript drafting. KZ, AG, and BG contributed to study design and manuscript drafting.

Data Availability

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Abbreviations

AI: artificial intelligence; COPE: Committee on Publication Ethics; FP: family planning; Gyn: benign gynecologic subspecialties; GynOnc: gynecologic oncology; LLM: large language models; MFM: maternal fetal medicine; ObGyn: obstetrics and gynecology; REI: reproductive endocrinology and infertility

References

Potocnik J, Foley S, Thomas E. Current and potential applications of artificial intelligence in medical imaging practice: a narrative review. J Med Imaging Radiat Sci. 2023;54(2):376-385.

Temsah MH, Altamimi I, Jamal A, Alhasan K, Al-Eyadhy A. ChatGPT surpasses 1000 publications on PubMed: envisioning the road ahead. Cureus. 2023;15(9):e44769.

Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. 2018;2(10):719-731.

Ahn S. The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions. Korean J Physiol Pharmacol. 2024;28(5):393-401.

Cotton DRE, Cotton PA, Shipway JR. Chatting and cheating: ensuring academic integrity in the era of ChatGPT. Innov Educ Teach Int. 2023;61(2):228-239.

Ramoni D, Sgura C, Liberale L, Montecucco F, Ioannidis JPA, Carbone F. Artificial intelligence in scientific medical writing: Legitimate and deceptive uses and ethical concerns. Eur J Intern Med. 2024;127:31-35.

Azamfirei R, Kudchadkar SR, Fackler J. Large language models and the perils of their hallucinations. Crit Care. 2023;27(1):120.

COPE Council. COPE Position - Authorship and AI [Internet]. Committee on Publication Ethics; 2024.

Dehouche N. Plagiarism in the age of massive Generative Pre-trained Transformers (GPT-3). Ethics Sci Environ Polit. 2021;21:17-23.

Dergaa I, Chamari K, Zmijewski P, Ben Saad H. From human writing to artificial intelligence generated text: examining the prospects and potential threats of ChatGPT in academic writing. Biol Sport. 2023;40(2):615-622.

Curtis N, ChatGpt. To ChatGPT or not to ChatGPT? The Impact of Artificial Intelligence on Academic Publishing. Pediatr Infect Dis J. 2023;42(4):275.

OpenEvidence and the JAMA Network sign strategic content agreement [Internet]. OpenEvidence. 2025. Available from: https://www.openevidence.com/announcements/openevidence-and-the-jama-network-sign-strategic-content-agreement.

Hanna MG, Pantanowitz L, Jackson B, Palmer O, Visweswaran S, Pantanowitz J, Deebajah M, et al. Ethical and bias considerations in artificial intelligence/machine learning. Mod Pathol. 2025;38(3):100686.

Kapoor MC. Navigating the impact of artificial intelligence on medical writing. Ann Card Anaesth. 2025;28(2):105-106.

Giglio AD, Costa M. The use of artificial intelligence to improve the scientific writing of non-native english speakers. Rev Assoc Med Bras (1992). 2023;69(9):e20230560.

Carnino JM, Chong NYK, Bayly H, Salvati LR, Tiwana HS, Levi JR. AI-generated text in otolaryngology publications: a comparative analysis before and after the release of ChatGPT. Eur Arch Otorhinolaryngol. 2024;281(11):6141-6146.

Wang ZP, Bhandary P, Wang Y, Moore JH. Using GPT-4 to write a scientific review article: a pilot evaluation study. BioData Min. 2024;17(1):16.

Chen J, Tao BK, Park S, Bovill E. Can ChatGPT Fool the Match? Artificial intelligence personal statements for plastic surgery residency applications: a comparative study. Plast Surg (Oakv). 2025;33(2):348-353.

Gao CA, Howard FM, Markov NS, Dyer EC, Ramesh S, Luo Y, et al. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. Digit Med. 2022.

Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE, Klifto CS, et al. Identification of ChatGPT-generated abstracts within shoulder and elbow surgery poses a challenge for reviewers. Arthroscopy. 2025;41(4):916-924.e912.

ZeroGPT. ZeroGPT: AI content detection and writing tools. 2025.

Originality.ai. Originality.ai: AI & Plagiarism detector for serious content publisher. 2025.

Tian E, Cui A. GPTZero: Towards detection of AI-generated text using zero-shot and supervised methods. 2026. Available from: https://gptzero.me.

Perkins M, Roe J, Postma D, McGaughran J, Hickerson D. Detection of GPT-4 generated text in higher education: combining academic judgement and software to identify generative AI tool misuse. J Acad Ethics. 2024;22:89-113.

Tian E, Cui A, Adam A. How AI Detection Benchmarking Works at GPTZero [Internet]. 2025. Available from: https://gptzero.me/news/ai-accuracy-benchmarking/.

Burke H, Kazinka R, Gandhi R, Murray A. Artificial intelligence-generated writing in the ERAS personal statement: an emerging quandary for post-graduate medical education. Acad Psychiatry. 2025;49(1):13-17.

Pan ET, Florian-Rodriguez M. Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology. Am J Obstet Gynecol. 2024;231(2):276.e1-e10.

Paullet K, Pinchot J, Kinney E, Stewart T. Detecting AI-generated writing using GPTZero. Proc ISCAP Conf [Internet]. 2024;10. Available from: https://iscap.us/proceedings/2024/pdf/6184.pdf.

Pratama AR. The accuracy-bias trade-offs in AI text detection tools and their impact on fairness in scholarly publication. PeerJ Comput Sci [Internet]. 2025;11. Available from: https://pmc.ncbi.nlm.nih.gov/articles/PMC12453642/.

Ergen SK. Comparative detector analysis for the identification of academic articles synthesized by artificial intelligence in the field of ophthalmology. Beyoglu Eye J. 2025;10(3):175-180.

Perplexity AI. Perplexity [Large language model] [Internet]. 2025. Available from: https://www.perplexity.ai/.

PubMed [Internet]. National Library of Medicine; Available from: https://pubmed.ncbi.nlm.nih.gov/.

Gurav R. Journal Citation Reports (JCR): Impact Factor 2024. Web Sci [Internet]. 2024 June; Available from: https://www.researchgate.net/publication/381580823_Journal_Citation_Reports_JCR_Impact_Factor_2024_PDF_Web_of_Science.

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, including commercial use, provided the original work is properly cited.

Journal of Clinical Gynecology and Obstetrics is published by Elmer Press Inc.