🐢 Open-Source Evaluation & Testing for AI & LLM systems
Deliver safe & effective language models
MIT-licensed framework for testing LLMs, RAG pipelines, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
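As a sketch of what such a YAML-driven suite could look like, the config below is purely illustrative; the keys, dataset paths, and model name are assumptions, not this framework's actual schema.

```yaml
# Hypothetical test-suite config; all keys and values are illustrative only.
suite: chatbot-regression
model:
  provider: openai
  name: gpt-4o-mini            # assumed model identifier
tests:
  - type: factuality
    dataset: data/known_answers.jsonl
    threshold: 0.90            # fail the CI job below this pass rate
  - type: toxicity
    dataset: data/adversarial_prompts.jsonl
    threshold: 0.99
ci:
  fail_fast: true
  report: reports/results.xml  # machine-readable output for the pipeline
```

A CI job would then run the suite on every commit and fail the build whenever a threshold is missed.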
GPT4Go: AI-Powered Test Case Generation for Golang 🧪
A Python library for verifying code properties using natural language assertions.
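Libraries in this space typically wrap an LLM call behind an assert-style helper. The sketch below shows one plausible shape; the `nl_assert` function, prompt format, and model name are hypothetical, not this library's actual API.

```python
# Hypothetical natural-language assertion helper; not this library's real API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def nl_assert(code: str, prop: str, model: str = "gpt-4o-mini") -> bool:
    """Ask an LLM whether `code` satisfies the natural-language property `prop`."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer strictly YES or NO: does the code satisfy the property?"},
            {"role": "user", "content": f"Code:\n{code}\n\nProperty: {prop}"},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

# Usage: fails the test run if the model judges the property violated.
snippet = "def add(a, b):\n    return a + b"
assert nl_assert(snippet, "the function returns the sum of its two arguments")
```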
Practice exercises for the book "Basiswissen KI-Testen"
Agent testing library that uses an agent to test your agent, in Go.
Burro is a command-line interface (CLI) tool built with Deno for evaluating Large Language Model (LLM) outputs. It provides a straightforward way to run different types of evaluations with secure API key management.
AI-generated BDD tests for Java and JUnit using ChatGPT-4o
An academic project that facilitates development and testing.
A plug-and-play framework for testing and automating generative AI projects
Agentic Workflow Evaluation: Text Summarization Agent. This project implements an AI agent evaluation workflow using a text summarization model with the OpenAI API and the Transformers library. It follows an iterative approach: generate summaries, analyze metrics, adjust parameters, and retest to refine the agent for accuracy, readability, and performance.
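A minimal sketch of that generate, analyze, adjust, retest loop, assuming the OpenAI chat API and a crude word-overlap metric in place of the project's actual metrics and tuning logic:

```python
# Minimal sketch of the iterative evaluate-and-retune loop described above.
# The coverage metric and the tuned parameter (temperature) are assumptions,
# not the project's actual implementation.
from openai import OpenAI

client = OpenAI()

def summarize(text: str, temperature: float) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user",
                   "content": f"Summarize in two sentences:\n{text}"}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def coverage(summary: str, reference: str) -> float:
    """Crude stand-in metric: fraction of reference words kept in the summary."""
    ref = set(reference.lower().split())
    return len(ref & set(summary.lower().split())) / len(ref)

def refine(text: str, reference: str, target: float = 0.5) -> str:
    temperature = 1.0
    summary = ""
    for _ in range(4):                 # generate -> analyze -> adjust -> retest
        summary = summarize(text, temperature)
        if coverage(summary, reference) >= target:
            break
        temperature = max(0.0, temperature - 0.3)  # reduce randomness, retry
    return summary
```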
This repository contains a study comparing the web search capabilities of four AI assistants: Gemini 2.0 Flash, ChatGPT-4 Turbo, DeepSeek-R1, and Grok 3.
🤖 The perfect playground for testing AI-generated React components. Built for ChatGPT/Claude users to instantly test and iterate on AI-created components.