vitals


vitals is a framework for large language model evaluation in R. It’s specifically aimed at ellmer users who want to measure the effectiveness of their LLM-based apps.

The package is an R port of the widely adopted Python framework Inspect. While the package doesn’t integrate with Inspect directly, it allows users to interface with the Inspect log viewer and provides an on-ramp to transition to Inspect if need be by writing evaluation logs to the same file format.

Important

🚧 Under construction! 🚧

vitals is highly experimental and much of its documentation is aspirational.

Installation

You can install the development version of vitals using:

pak::pak("tidyverse/vitals")
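If you don't already have the pak package installed, you can install it from CRAN first (a general setup step, not specific to vitals):

# install pak from CRAN if it isn't already available
install.packages("pak")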

Example

LLM evaluation with vitals is composed of two main steps.

library(vitals)
library(ellmer)
library(tibble)
  1. First, create an evaluation task with the Task$new() method.
simple_addition <- tibble(
  input = c("What's 2+2?", "What's 2+3?", "What's 2+4?"),
  target = c("4", "5", "6")
)

tsk <- Task$new(
  dataset = simple_addition, 
  solver = generate(chat_anthropic(model = "claude-3-7-sonnet-latest")), 
  scorer = model_graded_qa()
)

Tasks are composed of three main components:

  • Datasets are data frames with, minimally, columns input and target. input represents some question or problem, and target gives the target response.
  • Solvers are functions that take input and return some value approximating target, likely wrapping ellmer chats. generate() is the simplest solver in vitals; it just passes the input to the chat’s $chat() method and returns its result as-is (a conceptual sketch follows the walkthrough below).
  • Scorers juxtapose the solvers’ output with target, evaluating how well the solver solved the input.
  2. Evaluate the task.
tsk$eval()

$eval() will run the solver, run the scorer, and then situate the results in a persistent log file that can be explored interactively with the Inspect log viewer.

[Screenshot: the Inspect log viewer, an interactive app displaying information on the 3 samples evaluated in this eval.]
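
To make the solver component concrete: as a rough mental model (a simplified sketch, not the package’s actual implementation), generate() amounts to passing each input to an ellmer chat and collecting the responses.

library(ellmer)

# Simplified sketch of what generate() does conceptually; the real solver also
# records richer information (such as the chat objects) for scoring and logging.
chat <- chat_anthropic(model = "claude-3-7-sonnet-latest")
inputs <- c("What's 2+2?", "What's 2+3?", "What's 2+4?")
outputs <- vapply(
  inputs,
  # clone the chat so each input starts a fresh conversation
  function(input) chat$clone()$chat(input),
  character(1)
)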

Any arguments to the solver or scorer can be passed to $eval(), allowing for straightforward parameterization of tasks. For example, if I wanted to evaluate chat_openai() on this task rather than chat_anthropic(), I could write:

tsk_openai <- tsk$clone()
tsk_openai$eval(solver_chat = chat_openai(model = "gpt-4o"))

For an applied example, see the “Getting started with vitals” vignette at vignette("vitals", package = "vitals").
