Location
Suwanee, GA
Start Date
6-5-2025 1:00 PM
End Date
6-5-2025 4:00 PM
Description
INTRODUCTION: The readability of patient education materials (PEMs) greatly affects patients’ understanding of and adherence to medical treatments. To ensure effective healthcare communication with a broad audience, the American Medical Association (AMA) recommends that PEMs be written at a sixth-grade reading level or lower. This study compares the readability of responses to Mohs surgery-related frequently asked questions (FAQs) on the American College of Mohs Surgery (ACMS) website with PEMs generated by two popular AI platforms, ChatGPT and Google Gemini.
METHODS: Seven FAQs about Mohs surgery were obtained from the ACMS website. ChatGPT and Google Gemini were each prompted to (1) generate original responses to the ACMS website FAQs and (2) generate responses at a sixth-grade reading level or lower. A new Google account and an incognito browser were used when collecting AI-generated responses. All responses were copied into Microsoft Word to calculate Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores; a lower FKGL and a higher FRE score indicate easier readability. The scores for each prompt were averaged and analyzed with post hoc multiple-comparisons tests for both readability criteria.
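The two readability metrics used here are standard published formulas. A minimal sketch of how they are computed is below; note that the syllable counter is a rough vowel-group heuristic of my own (Microsoft Word's internal syllable counting may differ slightly, so scores will not match Word exactly).

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels, then drop a
    # trailing silent "e". Not the counter Word uses internally.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRE) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fre
```

Both formulas depend only on sentence length and syllable density, which is why the sixth-grade prompts (shorter sentences, simpler words) lower FKGL and raise FRE simultaneously.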
RESULTS: FKGL mean scores were 11.26 (the ACMS), 10.06 (Google Gemini), 7.20 (sixth-grade Google Gemini), 14.28 (ChatGPT), and 8.51 (sixth-grade ChatGPT), indicating high school, high school, middle school, college, and middle school reading levels, respectively. FRE mean scores were 47.26 (the ACMS), 51.11 (Google Gemini), 70.79 (sixth-grade Google Gemini), 31.57 (ChatGPT), and 64.98 (sixth-grade ChatGPT), indicating college, tenth- to twelfth-grade, seventh-grade, college, and eighth- to ninth-grade reading levels, respectively. Post hoc multiple-comparisons tests showed that the FKGL means of sixth-grade Google Gemini (p < 0.001) and sixth-grade ChatGPT (p = 0.016) were significantly lower than that of the ACMS FAQ responses, while their FRE means were significantly higher (sixth-grade Google Gemini: p < 0.001; sixth-grade ChatGPT: p = 0.001).
DISCUSSION:
By both readability criteria, the sixth-grade reading level responses from Google Gemini and ChatGPT were significantly more readable than the ACMS responses. No statistical differences were observed between Google Gemini's original responses and the ACMS in FKGL means (p = 0.274) or FRE means (p = 0.449), suggesting similar readability between the two sources. However, ChatGPT's original responses differed significantly from the ACMS (FKGL: p = 0.009; FRE: p = 0.004), suggesting longer, more complex sentences than the ACMS. Optimizing readability in dermatology supports patient adherence and better outcomes following Mohs micrographic surgery. This study's results serve to inform healthcare providers and organizations about the role AI-generated PEMs can play in improving patient understanding.
Embargo Period
5-29-2025
Readability Analysis of FAQ Responses from the American College of Mohs Surgery, ChatGPT, and Google Gemini