
@ollmer (Collaborator) commented Feb 7, 2025

TODO:

  1. Reuse tapeagents Tool Collection Environment
  2. Make agents and environments interact through the tape
  3. Introduce a simple way to wrap new benchmarks with their own action space into a new environment

Description by Korbit AI

What change is being made?

Add a new tapeagent that integrates with the Gaia benchmark environment and the multitool environment, including new configurations, agent classes, a new Makefile for setup tasks, and support for LLM configurations.

Why are these changes being made?

This integration extends Gaia benchmarking to tape agents and provides a flexible environment configuration that lets a tape agent use multiple tools. It consolidates this functionality so that AI models can be tested and evaluated on complex, multi-tool tasks, while keeping the codebase extensible and organized.


@ollmer requested a review from @recursix February 7, 2025 12:55
@ollmer changed the base branch from tau-bench to main March 13, 2025 11:39
@ollmer changed the title from "[WIP] Multitool envs" to "[WIP] Gaia bench with tape agent and multitool env" Mar 13, 2025
@@ -0,0 +1,25 @@
import logging
Collaborator

To keep agents self-contained, let's move these scripts into the agent's directory.

Collaborator Author

But this script is not specific to a single agent; it is for the "agent + benchmark" pair. Do you want to put it into the tapeagent dir?

Collaborator Author

move to agents/tapeagent/experiments/

@@ -0,0 +1,41 @@
#!/bin/bash

Collaborator

To keep agents self-contained, let's move these scripts into the agent's directory.

@recursix merged commit cad0629 into main Apr 21, 2025
3 checks passed
@recursix deleted the multitool_envs branch April 21, 2025 20:46


3 participants