🐢 Open-Source Evaluation & Testing for AI & LLM systems
Deliver safe & effective language models
MIT-licensed framework for testing LLMs, RAG pipelines, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
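As a sketch of what such a YAML-driven suite could look like, the config below is purely illustrative; the keys, dataset paths, and model name are assumptions, not this framework's actual schema.

```yaml
# Hypothetical test-suite config; all keys and values are illustrative only.
suite: chatbot-regression
model:
  provider: openai
  name: gpt-4o-mini            # assumed model identifier
tests:
  - type: factuality
    dataset: data/known_answers.jsonl
    threshold: 0.90            # fail the CI job below this pass rate
  - type: toxicity
    dataset: data/adversarial_prompts.jsonl
    threshold: 0.99
ci:
  fail_fast: true
  report: reports/results.xml  # machine-readable output for the pipeline
```

A CI job would then run the suite on every commit and fail the build whenever a threshold is missed.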
GPT4Go: AI-Powered Test Case Generation for Golang 🧪
A Python library for verifying code properties using natural language assertions.
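Libraries in this space typically wrap an LLM call behind an assert-style helper. The sketch below shows one plausible shape; the `nl_assert` function, prompt format, and model name are hypothetical, not this library's actual API.

```python
# Hypothetical natural-language assertion helper; not this library's real API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def nl_assert(code: str, prop: str, model: str = "gpt-4o-mini") -> bool:
    """Ask an LLM whether `code` satisfies the natural-language property `prop`."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer strictly YES or NO: does the code satisfy the property?"},
            {"role": "user", "content": f"Code:\n{code}\n\nProperty: {prop}"},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

# Usage: fails the test run if the model judges the property violated.
snippet = "def add(a, b):\n    return a + b"
assert nl_assert(snippet, "the function returns the sum of its two arguments")
```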
Practice exercises for the book "Basiswissen KI-Testen"
Agent testing library that uses an agent to test your agent, in Go.
Burro is a command-line interface (CLI) tool built with Deno for evaluating Large Language Model (LLM) outputs. It provides a straightforward way to run different types of evaluations with secure API key management.
AI-generated BDD tests for Java and JUnit using ChatGPT-4o
An academic project that facilitates development and testing.
A plug-and-play framework for testing and automating generative AI projects
Agentic Workflow Evaluation: Text Summarization Agent. This project implements an AI agent evaluation workflow using a text summarization model with the OpenAI API and the Transformers library. It follows an iterative approach: generate summaries, analyze metrics, adjust parameters, and retest to refine the agent for accuracy, readability, and performance.
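A minimal sketch of that generate, analyze, adjust, retest loop, assuming the OpenAI chat API and a crude word-overlap metric in place of the project's actual metrics and tuning logic:

```python
# Minimal sketch of the iterative evaluate-and-retune loop described above.
# The coverage metric and the tuned parameter (temperature) are assumptions,
# not the project's actual implementation.
from openai import OpenAI

client = OpenAI()

def summarize(text: str, temperature: float) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user",
                   "content": f"Summarize in two sentences:\n{text}"}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def coverage(summary: str, reference: str) -> float:
    """Crude stand-in metric: fraction of reference words kept in the summary."""
    ref = set(reference.lower().split())
    return len(ref & set(summary.lower().split())) / len(ref)

def refine(text: str, reference: str, target: float = 0.5) -> str:
    temperature = 1.0
    summary = ""
    for _ in range(4):                 # generate -> analyze -> adjust -> retest
        summary = summarize(text, temperature)
        if coverage(summary, reference) >= target:
            break
        temperature = max(0.0, temperature - 0.3)  # reduce randomness, retry
    return summary
```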
This repository contains a study comparing the web search capabilities of four AI assistants: Gemini 2.0 Flash, ChatGPT-4 Turbo, DeepSeek-R1, and Grok 3.
🤖 The perfect playground for testing AI-generated React components. Built for ChatGPT/Claude users to instantly test and iterate on AI-created components.