Visit our website: optexity.com
Optexity enables training foundation models using human demonstrations of computer tasks. This framework allows for recording, processing, and using demonstrations to train AI agents to complete web-based tasks. We will be adding training using self exploration using reinforement learning, training from software documentations and training using youtube videos in future.
Explore our step-by-step video guides to get started with Optexity:
- Optexity Tutorial Part 1 | Introduction and State of Browser Agents for Software Use
- Optexity Tutorial Part 2 | Training AI with Human Demonstrations
- Optexity Tutorial Part 3 | AI Agent in Action!
-
Repository Setup Clone the necessary repositories:
mkdir optexity cd optexity git clone https://github.com/Optexity/ComputerGYM.git git clone https://github.com/Optexity/AgentAI.git git clone https://github.com/Optexity/playwright.git
-
Environment Setup Create and activate a Conda environment with the required Python and Node.js versions:
conda create -n optexity python=3.10 nodejs conda activate optexity
-
Installing Dependencies Install the required packages and build the Playwright framework:
pip install -e ComputerGym pip install -e AgentAI cd playwright git checkout playwright_optexity npm install npm run build playwright install cd ..
To evaluate vanilla gemini 2.0 flash for a specific web task, execute:
EXPORT GEMINI_API_KEY=<YOUR_GEMINI_API_KEY>
python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model gemini
Next section shows you how to improve the performance of these agents on specific tasks.
Pro Tip: You can visit https://aistudio.google.com/apikey to create a free gemini api key to test out any task on any website.
-
Recording Demonstrations Record human demonstrations by creating a configuration file and running the demonstration script:
./ComputerGYM/computergym/demonstrations/demonstrate.sh ComputerGYM/computergym/demonstrations/demonstration_config.yaml
Note: Create your own
demonstration_config.yaml
configuration file before running this script. -
Processing Demonstrations Process the recorded demonstrations to prepare them for training:
python ComputerGYM/computergym/demonstrations/process_demonstration.py --yaml ComputerGYM/computergym/demonstrations/demonstration_config.yaml --seed 5
-
Generating Training Data Convert processed demonstrations into a format suitable for model training:
python AgentAI/agentai/sft/prepare_training_data.py --agent_config AgentAI/agentai/train_configs/hubspot_agent.yaml
-
Training the Model Our data preparation scripts generate JSON data in a format compatible with LLaMA-Factory. The generated training and inference configurations are stored in the
train_data
directory. Please refer to the LLaMA-Factory documentation for detailed instructions on model training. -
Evaluating the Trained Agent After training your model, deploy it as an inference service on
http://localhost:8000
. By default, our framework is configured to work with the vLLM serving capability provided by LLaMA-Factory. If you're using an alternative serving method, you'll need to modify the appropriate scripts.To evaluate your trained agent on a specific web task, execute:
python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model vllm
For comprehensive information on configuration options and advanced usage patterns, please refer to the detailed documentation available in each repository:
- ComputerGYM: Environment setup, demonstration recording, and processing
- AgentAI: Model training configurations, inference settings, and evaluation metrics
- Playwright Integration: Custom extensions and modifications for web automation
- Demonstration configuration: See
ComputerGYM/computergym/demonstrations/demonstration_config_example.yaml
- Training parameters: See
AgentAI/agentai/train_configs/README.md
This project builds upon and extends the work of:
- BrowserGym - For the browser automation environment foundation
- Playwright - For reliable web testing and automation capabilities
- LLaMA-Factory - For efficient foundation model fine-tuning