January 13, 2025
Tech

Step-by-Step Guide to Crafting a Reliable AI Detector

Are you an educator troubled by how easily students can now skip the learning process by using AI tools to write essays or solve homework? You are certainly not alone in this concern. With the rise of advanced Large Language Models (LLMs) and the tools built on them, most notably ChatGPT, a multitude of AI-powered applications can now produce coherent content from minimal input.

There have been many reported cases of students using AI models to complete their assignments, sparking widespread debate about the ethics of the practice. While it is not traditionally considered plagiarism, it clearly treads the line of academic dishonesty. OpenAI itself has addressed these concerns in a document that examines these emerging challenges in more depth.

In this guide, we walk through building an AI text detector in Python that examines a piece of text and estimates the likelihood that it was machine-generated. Such a tool can help you judge whether the content you are reviewing was written by a person or generated by a model.

Python Script for AI Text Detection

The objective here is to develop a streamlined Python script capable of:

  • Accepting textual input.
  • Returning a verdict, including a percentage indicating how likely it is that the text was generated by AI.

Below is a preview of what the final script might resemble:
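As a rough preview, here is a minimal sketch of the overall shape such a script could take. It assumes the configuration maps backend names to scorer functions that each return an AI probability between 0 and 1; `DetectionResult`, the scorer-based config layout, the 50% cut-off and the `dummy` backend are hypothetical illustrations, not the guide's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A scorer takes the text and returns the estimated probability (0.0-1.0)
# that it was AI-generated. Real scorers would wrap an API call or a local model.
Scorer = Callable[[str], float]


@dataclass
class DetectionResult:
    backend: str
    ai_probability: float
    verdict: str


class AITextDetector:
    """Hypothetical outline: the config maps backend names to scorer functions."""

    def __init__(self, config: Dict[str, Scorer]):
        if not config:
            raise ValueError("config must define at least one detection backend")
        self.config = config

    def detect(self, text: str, backend: str) -> DetectionResult:
        probability = self.config[backend](text)
        # Assumed cut-off of 50%; a real tool would calibrate this per backend.
        verdict = "likely AI-generated" if probability >= 0.5 else "unlikely AI-generated"
        return DetectionResult(backend, probability, verdict)


if __name__ == "__main__":
    # "dummy" is a placeholder backend that always returns 0.8, used only to show the flow.
    detector = AITextDetector({"dummy": lambda text: 0.8})
    result = detector.detect("Some essay text to check.", backend="dummy")
    print(f"{result.verdict} ({result.ai_probability:.0%})")
```

Running the file prints `likely AI-generated (80%)`. Keeping each backend behind a plain function makes it straightforward to swap the placeholder for an OpenAI request or a local GLTR scorer.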

To meet our objective, we will:

  1. Create an instance of the AITextDetector class, passing it a configuration object.
  2. Fill in this configuration with the parameters for each AI-detection backend we want to use (such as ai-text-classifier and GLTR).
  3. Call the detect method of the AITextDetector class to analyze the content; this method performs the analysis and returns the result.
  4. Read the returned confidence score, the key metric for judging whether the text under scrutiny is AI-generated.

Project Setup for the AI Detector

To follow the examples in this guide, you’ll want to download and set up our pre-built environment, AI-Generated-Text-Detector, which includes:

  • A stable version of Python (3.10).
  • Pre-packaged dependencies for Windows, Mac, and Linux users:
    • Requests – essential for API calls.
    • python-dotenv – to handle environment variables.

For access, create a free ActiveState Platform account using either your email or GitHub credentials. Once registered, you’ll be able to download the complete environment and unlock additional dependency management benefits.

For Windows users:
Run the following at your command prompt after downloading the installer:

For Mac and Linux users:

Run the following script to download and install the environment:

Once your environment is ready, let’s proceed to create the Python script:

Building an AI Text Detection Tool with OpenAI’s Classifier

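To construct an AI text detector, we’ll use OpenAI’s Completions API and compute, from the API’s output, the probability that the input text is AI-generated. Note that OpenAI withdrew its AI Text Classifier in July 2023, citing its low rate of accuracy, so this approach may no longer work against the live API.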

Configuring OpenAI API Access

Begin by generating an API key from OpenAI’s API page. Store it in a .env file:

Now, load this API key in Python using the following snippet:
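A minimal sketch of loading the key with python-dotenv, assuming the .env file defines a variable named `OPENAI_API_KEY`; the variable name and the `load_api_key` helper are our own choices. It falls back to the plain environment if python-dotenv is not installed.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    load_dotenv = None  # python-dotenv missing: rely on the real environment only


def load_api_key(var_name: str = "OPENAI_API_KEY", dotenv_path: str = ".env") -> str:
    """Return the API key, loading a .env file into the environment first if possible.

    The .env file is assumed to contain a line such as:
        OPENAI_API_KEY=sk-your-key-here
    """
    if load_dotenv is not None:
        # A missing file is skipped silently; variables already set are not overridden.
        load_dotenv(dotenv_path)
    api_key = os.getenv(var_name)
    if not api_key:
        raise RuntimeError(f"{var_name} is not set; add it to {dotenv_path}")
    return api_key
```

Raising an error early, rather than sending a request without a key, makes a misconfigured environment easier to diagnose.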

The AITextDetector Class Mechanics

At the heart of our project is the AITextDetector class. It is tasked with sending requests to OpenAI’s Completions API and assessing the likelihood of the text being AI-generated.

The class constructor sets up the request headers:
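A sketch of the headers such a constructor might build, written as a hypothetical standalone helper (`build_headers`) so it can be tested on its own; the constructor would store its return value, for example as `self.headers`. OpenAI’s HTTP API authenticates with a bearer token in the `Authorization` header.

```python
from typing import Dict


def build_headers(api_key: str) -> Dict[str, str]:
    """Headers for OpenAI's HTTP API: JSON request body plus bearer-token auth."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```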

Now, let’s dissect the detect method. This method prepares the request payload, sends the request, and processes the response:
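The payload and request step might look like the sketch below. The model name `model-detect-v2` and the `<|disc_score|>` prompt suffix are assumptions about how the withdrawn AI Text Classifier was called, not documented API parameters, and the helper names are hypothetical; `max_tokens`, `logprobs` and the other fields are standard parameters of the legacy Completions endpoint.

```python
from typing import Any, Dict

COMPLETIONS_URL = "https://api.openai.com/v1/completions"
# Assumptions: believed to match what the withdrawn AI Text Classifier sent;
# neither value is part of OpenAI's documented API.
DETECTOR_MODEL = "model-detect-v2"
SCORE_SUFFIX = "<|disc_score|>"


def build_payload(text: str) -> Dict[str, Any]:
    """Request body for a single-token completion whose log-probabilities we inspect."""
    return {
        "model": DETECTOR_MODEL,
        "prompt": text + SCORE_SUFFIX,
        "max_tokens": 1,  # only one scoring token is generated
        "temperature": 1,
        "top_p": 1,
        "n": 1,
        "logprobs": 5,  # return the top-5 candidate tokens with their log-probabilities
        "stop": "\n",
        "stream": False,
    }


def send_detection_request(text: str, headers: Dict[str, str], timeout: float = 30.0) -> Dict[str, Any]:
    """POST the payload and return the decoded JSON response."""
    import requests  # imported lazily so build_payload works without the dependency

    response = requests.post(COMPLETIONS_URL, headers=headers, json=build_payload(text), timeout=timeout)
    response.raise_for_status()  # surface HTTP errors (e.g. 401, 404) as exceptions
    return response.json()
```

Splitting payload construction from the HTTP call keeps the request body easy to inspect and test without network access.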

Once the response is received, we extract and interpret the probability of the text being AI-generated:
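In a legacy Completions response, `choices[0]["logprobs"]["top_logprobs"]` holds one dictionary of candidate tokens and log-probabilities per generated token. The sketch below assumes, as an unverified interpretation of the classifier’s output, that the probability of the `"!"` token represents the likelihood of human authorship, so the AI probability is one minus its exponentiated log-probability. `extract_ai_probability` is a hypothetical name.

```python
import math
from typing import Any, Dict


def extract_ai_probability(response: Dict[str, Any]) -> float:
    """Estimate the probability of AI authorship from a legacy Completions response.

    Assumption: the "!" token's probability is read as the human-written score.
    """
    # One {token: logprob} dict per generated token; only one token was generated.
    top = response["choices"][0]["logprobs"]["top_logprobs"][0]
    if "!" in top:
        human_logprob = top["!"]
    else:
        # "!" fell outside the top 5, so its log-probability is at most the smallest
        # one listed; use that bound as an approximation.
        human_logprob = min(top.values())
    return 1.0 - math.exp(human_logprob)
```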

The get_assessment function turns the calculated probability into a human-readable verdict, such as “likely AI-generated” or “unlikely AI-generated.”
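A sketch of such a function, assuming the five-band scale commonly reported for OpenAI’s classifier (below 10%: very unlikely; 10–45%: unlikely; 45–90%: unclear; 90–98%: possibly; above 98%: likely). The thresholds are assumptions and would need recalibrating for any other backend.

```python
from bisect import bisect_right

# Assumed upper bounds (as fractions) for every label except the last.
THRESHOLDS = [0.10, 0.45, 0.90, 0.98]
LABELS = ["very unlikely", "unlikely", "unclear if it is", "possibly", "likely"]


def get_assessment(ai_probability: float) -> str:
    """Map a probability in [0, 1] to a human-readable verdict."""
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("ai_probability must lie between 0 and 1")
    # bisect_right counts how many thresholds the probability has reached or passed.
    label = LABELS[bisect_right(THRESHOLDS, ai_probability)]
    return f"The classifier considers the text to be {label} AI-generated ({ai_probability:.1%})."
```

For example, `get_assessment(0.05)` returns `"The classifier considers the text to be very unlikely AI-generated (5.0%)."`.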

Detecting AI Text with GLTR

If you prefer to run detection locally, the Giant Language Model Test Room (GLTR) offers an alternative. It uses a language model to rank how predictable each word is given the words before it, then color-codes the text by that rank, so you can inspect visually whether it looks AI-generated.
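GLTR’s core idea can be sketched in plain Python: for each token, find its rank in the model’s predicted distribution given the preceding context, then bucket the ranks into color bands (top 10 green, top 100 yellow, top 1,000 red, everything else purple). This sketch assumes the per-position probability lists are already available, for example from a GPT-2 model; the function names are hypothetical, not GLTR’s own code.

```python
from collections import Counter
from typing import Dict, List, Sequence


def token_rank(probabilities: Sequence[float], observed_id: int) -> int:
    """1-based rank of the observed token within the model's predicted distribution."""
    observed_p = probabilities[observed_id]
    # Rank = 1 + number of tokens the model rated strictly more likely.
    return 1 + sum(1 for p in probabilities if p > observed_p)


def gltr_color(rank: int) -> str:
    """GLTR's color bands: top 10, top 100, top 1,000, and everything else."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"


def color_fractions(ranks: List[int]) -> Dict[str, float]:
    """Share of tokens falling into each color band."""
    if not ranks:
        raise ValueError("need at least one token rank")
    counts = Counter(gltr_color(r) for r in ranks)
    return {color: counts[color] / len(ranks) for color in ("green", "yellow", "red", "purple")}
```

Text in which a large share of tokens is green closely follows the model’s top predictions, which GLTR’s authors found to be more typical of generated text than of human writing; treat it as a signal rather than proof.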

To get started with GLTR, install the necessary dependencies (torch, transformers, and numpy) and preload models:

Conclusion

In an age where AI-generated content is everywhere, reliable tools for detecting machine-generated text are becoming important across many sectors, from academia to marketing. The methods detailed here offer useful signals, but they represent only a fraction of the available approaches. As the world continues to grapple with the ethical implications of AI-generated content, tools like these can help preserve the integrity of human expression.