├── ai_words.txt
├── code.py
└── README.md


/ai_words.txt:
--------------------------------------------------------------------------------
1. Delve
2. Harnessing
3. At the heart of
4. In essence
5. Facilitating
6. Intrinsic
7. Integral
8. Core
9. Facet
10. Nuance
11. Culmination
12. Manifestation
13. Inherent
14. Confluence
15. Underlying
16. Intricacies
17. Epitomize
18. Embodiment
19. Iteration
20. Synthesize
21. Amplify
22. Impetus
23. Catalyst
24. Synergy
25. Cohesive
26. Paradigm
27. Dynamics
28. Implications
29. Prerequisite
30. Fusion
31. Holistic
32. Quintessential
33. Cohesion
34. Symbiosis
35. Integration
36. Encompass
37. Unveil
38. Unravel
39. Emanate
40. Illuminate
41. Reverberate
42. Augment
43. Infuse
44. Extrapolate
45. Embody
46. Unify
47. Inflection
48. Instigate
49. Embark
50. Envisage
51. Elucidate
52. Substantiate
53. Resonate
54. Catalyze
55. Resilience
56. Evoke
57. Pinnacle
58. Evolve
59. Digital Bazaar
60. Tapestry
61. Leverage
62. Centerpiece
63. Subtlety
64. Immanent
65. Exemplify
66. Blend
67. Comprehensive
68. Archetypal
69. Unity
70. Harmony
71. Conceptualize
72. Reinforce
73. Mosaic
74. Catering

--------------------------------------------------------------------------------
/code.py:
--------------------------------------------------------------------------------
import streamlit as st
import docx2txt
from PyPDF2 import PdfReader


def extract_text_from_file(uploaded_file):
    """Return the lowercased text of an uploaded .txt, .docx, or .pdf file, or None."""
    # Get the file extension; lowercase it so names like "REPORT.PDF" are accepted
    file_extension = uploaded_file.name.split(".")[-1].lower()

    if file_extension == "txt":
        # Read text file directly
        text = uploaded_file.read().decode("utf-8")
    elif file_extension == "docx":
        # Extract text from docx
        text = docx2txt.process(uploaded_file)
    elif file_extension == "pdf":
        # Extract text from pdf, page by page
        pdf_reader = PdfReader(uploaded_file)
        text = ""
        for page in pdf_reader.pages:
            # Fall back to an empty string if a page yields no text
            text += page.extract_text() or ""
    else:
        st.error("Invalid file format. Please upload a .txt, .docx, or .pdf file.")
        return None

    return text.lower()


# List of keywords to search for (mirrors ai_words.txt)
keywords = [
    "Delve", "Harnessing", "At the heart of", "In essence", "Facilitating",
    "Intrinsic", "Integral", "Core", "Facet", "Nuance", "Culmination",
    "Manifestation", "Inherent", "Confluence", "Underlying", "Intricacies",
    "Epitomize", "Embodiment", "Iteration", "Synthesize", "Amplify",
    "Impetus", "Catalyst", "Synergy", "Cohesive", "Paradigm", "Dynamics",
    "Implications", "Prerequisite", "Fusion", "Holistic", "Quintessential",
    "Cohesion", "Symbiosis", "Integration", "Encompass", "Unveil", "Unravel",
    "Emanate", "Illuminate", "Reverberate", "Augment", "Infuse", "Extrapolate",
    "Embody", "Unify", "Inflection", "Instigate", "Embark", "Envisage",
    "Elucidate", "Substantiate", "Resonate", "Catalyze", "Resilience",
    "Evoke", "Pinnacle", "Evolve", "Digital Bazaar", "Tapestry", "Leverage",
    "Centerpiece", "Subtlety", "Immanent", "Exemplify", "Blend",
    "Comprehensive", "Archetypal", "Unity", "Harmony", "Conceptualize",
    "Reinforce", "Mosaic", "Catering"
]

# Lowercase the keyword list so it matches the lowercased input text
keywords = [keyword.lower() for keyword in keywords]


st.title("AI Text Detection App")
st.write("Upload a text file, Word document, PDF, or directly enter text to check for specific keywords.")

# Option 1: File Uploader
uploaded_file = st.file_uploader("Choose a file (optional)", type=["txt", "docx", "pdf"])

# Option 2: Text Input
text_input = st.text_area("Or enter text here:")

if uploaded_file is not None:
    text = extract_text_from_file(uploaded_file)
elif text_input:
    text = text_input.lower()
else:
    st.info("Please upload a file or enter text.")
    text = None  # Set text to None if no input is provided

if text:
    # Plain substring check: a keyword counts as found if it appears anywhere in the text
    found_keywords = [keyword for keyword in keywords if keyword in text]

    if found_keywords:
        st.write("**Found Keywords:**")
        for keyword in found_keywords:
            st.success(f"- {keyword}")
    else:
        st.write("No keywords found in the text.")

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[![Streamlit](https://img.shields.io/badge/Streamlit-Webapp-green)](https://ai-text-detect-easy.streamlit.app/) [![Blog](https://img.shields.io/badge/Blog-Link-orange)](https://levelup.gitconnected.com/detect-ai-text-by-just-looking-at-it-24604008027c) [![Code](https://img.shields.io/badge/Code-Link-blue)](https://github.com/FareedKhan-dev/Detect-AI-text-Easily/blob/main/code.py) [![AI Words File](https://img.shields.io/badge/AI_Words-Link-yellow)](https://github.com/FareedKhan-dev/Detect-AI-text-Easily/blob/main/ai_words.txt)


## Detect AI Text by Just Looking at it

![Abstract of a Research Paper written using ChatGPT](https://cdn-images-1.medium.com/max/2448/1*fTV_vFjFWyhPSnWDauYOWw.png)

[ChatGPT](https://chat.openai.com/) often generates words that may send you to a dictionary, or words that just sound magical. This isn’t only true for ChatGPT; open-source language models like [Mistral](https://mistral.ai/news/announcing-mistral-7b/) do the same. There’s no harm in seeking assistance from AI to create content, as long as it’s done ethically. But in a [science-writing competition for 14–16 year-olds](https://www.bbc.com/future/article/20230720-how-to-spot-an-ai-cheater-artificial-intelligence-large-language-models#:~:text=%22Labyrinthian%20mazes%22.%20I%20don%27t%20know%20what%20exactly%20struck%20me%20about%20these%20two%20words%2C%20but%20they%20caused%20me%20to%20pause%20for%20a%20moment), a judge got suspicious when he saw the phrase **“Labyrinthian mazes”** in an essay, which seemed too advanced for a teenage writer. So he used AI tools to check it. Unfortunately, all four tools gave the same result: almost the entire essay, around 90–96%, seemed to be written by AI, not a human. However, not all of us are professionals. If we had seen that phrase, we might have skipped over it due to our limited awareness.

> # There is a need for critical thinking skills to identify if AI is the author

The easiest way to spot AI-generated text is by checking for words that you don’t usually use but that are common for ChatGPT. Consider a massive corpus of over [19 billion English words](https://www.english-corpora.org/now/) from blogs, articles, news, and more, updated daily from 2010 to now. I looked for the word **“delve”** using a string search algorithm, and it showed up **52,388 times**. I plotted its yearly pattern and identified unusual behavior: a **~200%** growth in its appearance on the internet from 2022, the same year ChatGPT was released, on November 30th.
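
A minimal sketch of how a yearly count like this could be computed. It does not query the NOW corpus: the `sample` documents are made up for illustration, and the function names `yearly_counts` and `growth_percent` are hypothetical. It assumes each document arrives as a `(year, text)` pair and matches whole words case-insensitively, which may differ from the string search behind the numbers above.

```python
import re
from collections import Counter


def yearly_counts(documents, word):
    """Count whole-word, case-insensitive occurrences of `word` per year.

    `documents` is an iterable of (year, text) pairs (an assumed input shape).
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in documents:
        counts[year] += len(pattern.findall(text))
    # Return the years in ascending order so the result is easy to plot
    return dict(sorted(counts.items()))


def growth_percent(counts, start_year, end_year):
    """Percentage change in occurrences between two years."""
    before, after = counts[start_year], counts[end_year]
    if before == 0:
        raise ValueError("growth is undefined when the starting count is zero")
    return (after - before) / before * 100


# Made-up sample data, for illustration only
sample = [
    (2021, "We delve into the data."),
    (2022, "Researchers delve deeper. They delve again."),
    (2023, "Delve, delve, delve: authors delve into everything."),
]

counts = yearly_counts(sample, "delve")
print(counts)
print(growth_percent(counts, 2021, 2023))
```

Because the pattern uses word boundaries, inflected forms such as "delved" or "delves" are not counted; whether to include them is a design choice that changes the totals.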

![Trend of Delve word occurrence in [NOW](https://www.english-corpora.org/now/) Corpus (by Fareed Khan)](https://cdn-images-1.medium.com/max/3856/1*Tv76vgfG7kOF5IRudR5EIg.png)

Other words, like **“intricacies”** or **“unwavering”**, also show a similar increase, just like **“delve”**. They’re being used more often lately.

![Trend of intricacies and unwavering in [NOW](https://www.english-corpora.org/now/) Corpus (by Fareed Khan)](https://cdn-images-1.medium.com/max/6512/1*EgrevS32vUy4eKx3F__oog.png)

This choice of vocabulary is not exclusive to AI, as humans also use a diverse range of words. In academic writing, though, we often use phrases like **“explore”** or **“discuss in more detail”** instead of **“delve”**. When I ask ChatGPT to rephrase **“discuss in more detail …”**, the first five suggestions it provides typically include **“delve”**.

![Rephrasing using ChatGPT](https://cdn-images-1.medium.com/max/7248/1*ypnIW51cEn7y5RqzJ0YrBw.png)

Moreover, I analyzed the [arXiv database](https://www.kaggle.com/datasets/Cornell-University/arxiv), a well-known platform for publishing papers that contains more than 2 million papers up to 2023. I searched for the word **“delve”** in the paper abstracts and plotted its yearly pattern. I was amazed to see how widely this word was used in paper abstracts in **2023**, the same word that ChatGPT offered in its top five suggestions.

![Trend of Delve word occurrence in [arXiv Database](https://www.kaggle.com/datasets/Cornell-University/arxiv) (by Fareed Khan)](https://cdn-images-1.medium.com/max/3856/1*Ri6_R6bLQJ6TVSmj6JXvVg.png)

This indicates that academic writers may be using ChatGPT, either for rephrasing or generating content.
The presence of the word **“delve”** serves as a hint, or at least raises a doubt, that a document submitted by a student or published on a blog, or at least that paragraph or portion of it, has been rephrased or enhanced using ChatGPT.

Drawing upon my research expertise and two years of experience working with LLMs, I’ve put together [a pretty comprehensive list of 74 words and phrases](https://github.com/FareedKhan-dev/Detect-AI-text-Easily/blob/main/ai_words.txt) you can keep an eye out for in a piece of text to help you figure out if it’s been generated or paraphrased using AI.

But checking for that many words by hand is not easy, so I made a [web app](https://ai-text-detect-easy.streamlit.app/) that quickly checks your text. Just upload your file or paste your text, and it’ll do the rest. Easy peasy!

![](https://cdn-images-1.medium.com/max/4612/1*9R0i2dDcwrKXlZAqPNlvNw.png)

## Hope you enjoy the read!

--------------------------------------------------------------------------------