This project implements a Mini Virtual Lab that transforms vague, qualitative R&D requirements into structured, explainable, and comparable decision problems. Rather than designing materials or molecules directly, the focus is on making early-stage R&D decisions tractable, transparent, and human-centered. The framework is intentionally domain-agnostic and can be applied across materials science, pharmaceutical R&D, and other research-driven domains. At its core, the project treats problem formulation itself as a first-class, machine-readable research object.
This case study demonstrates how the framework transforms a vague R&D requirement into a structured, explainable decision process.
> "We need a promising R&D candidate that achieves good performance, but minimizing failure risk is critical in early-stage development."
```json
{
  "objectives": [
    {
      "name": "Overall technical performance",
      "metric_key": "performance",
      "direction": "maximize",
      "weight": 0.6
    },
    {
      "name": "Risk of experimental failure",
      "metric_key": "failure_risk",
      "direction": "minimize",
      "weight": 0.4
    }
  ],
  "constraints": [],
  "recommended_stage": "early_exploration"
}
```
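A definition in this shape can be loaded into typed objects before any evaluation runs. Below is a minimal sketch using Python's stdlib dataclasses; the `Objective` and `ProblemDefinition` class names and the weight-sum check are illustrative assumptions, not necessarily the project's actual implementation:

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Objective:
    name: str        # human-readable intent
    metric_key: str  # machine-evaluable signal
    direction: str   # "maximize" or "minimize"
    weight: float

@dataclass(frozen=True)
class ProblemDefinition:
    objectives: list
    constraints: list
    recommended_stage: str

def parse_problem(raw: str) -> ProblemDefinition:
    """Parse an LLM-generated problem definition and fail fast on bad structure."""
    data = json.loads(raw)
    objectives = [Objective(**entry) for entry in data["objectives"]]
    for obj in objectives:
        if obj.direction not in ("maximize", "minimize"):
            raise ValueError(f"invalid direction: {obj.direction!r}")
    if abs(sum(obj.weight for obj in objectives) - 1.0) > 1e-9:
        raise ValueError("objective weights must sum to 1")
    return ProblemDefinition(objectives, data.get("constraints", []),
                             data["recommended_stage"])
```

Failing fast at parse time is what keeps malformed LLM output from silently propagating into evaluation or visualization.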
- Minimizing failure risk carries substantial weight (0.4), reflecting the uncertainty and cost sensitivity of early-stage R&D.
- The problem formulation is optimization-ready and directly evaluable.
All candidates share the same measurable metrics, satisfying the Validated Decision boundary.
```json
[
  {"name": "Candidate A", "metrics": {"performance": 0.82, "failure_risk": 0.32}},
  {"name": "Candidate B", "metrics": {"performance": 0.78, "failure_risk": 0.28}},
  {"name": "Candidate C", "metrics": {"performance": 0.90, "failure_risk": 0.45}},
  {"name": "Candidate D", "metrics": {"performance": 0.74, "failure_risk": 0.30}},
  {"name": "Candidate E", "metrics": {"performance": 0.80, "failure_risk": 0.18}},
  {"name": "Candidate F", "metrics": {"performance": 0.86, "failure_risk": 0.25}}
]
```
```
Candidate E: score=0.140
Candidate F: score=0.037
Candidate B: score=-0.034
Candidate D: score=-0.065
Candidate A: score=-0.103
Candidate C: score=-0.151
```
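The exact normalization behind these scores is not stated. As an illustration only, a direction-aware weighted sum over mean-centered metrics can produce a ranking of this kind; since the normalization here is an assumption, absolute values and close orderings may differ from the output above:

```python
from statistics import mean

def rank_candidates(candidates, objectives):
    """Rank candidates by a direction-aware weighted sum of mean-centered metrics.

    `objectives` entries carry "metric_key", "direction", and "weight", as in
    the problem definition. Mean-centering is an assumed normalization, not
    necessarily the scheme used to produce the published scores.
    """
    centers = {o["metric_key"]: mean(c["metrics"][o["metric_key"]] for c in candidates)
               for o in objectives}
    def score(candidate):
        total = 0.0
        for o in objectives:
            sign = 1.0 if o["direction"] == "maximize" else -1.0  # minimize flips the sign
            deviation = candidate["metrics"][o["metric_key"]] - centers[o["metric_key"]]
            total += sign * o["weight"] * deviation
        return total
    return sorted(((c["name"], round(score(c), 3)) for c in candidates),
                  key=lambda pair: pair[1], reverse=True)
```

Whatever the normalization, the key property is that a candidate dominating another on every objective always ranks higher.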
Candidate E is preferred because it best minimizes failure risk, the most safety-critical objective in this decision, while still offering reasonable performance. The ranking places strong emphasis on risk reduction, so candidates with higher performance but higher risk fall behind. Candidate F, though ranked lower overall, may still be reasonable if slightly higher risk is acceptable in exchange for improved performance or other practical considerations in early development. The key trade-off is balancing reliability against performance, with risk reduction driving the choice.
```mermaid
flowchart LR
    A[Vague R&D Requirement]
    B[LLM-based Problem Formulation]
    C[Structured Decision Schema]
    D[Candidate Data]
    E[Evaluation & Ranking]
    F[Visualization]
    G[LLM-based Explanation]
    A --> B --> C --> E
    D --> E
    E --> F
    C --> G
    E --> G
```
This framework introduces an explicit ValidatedDecision boundary that enforces consistency between problem formulation and candidate data.
```mermaid
flowchart LR
    A[Problem Definition]
    B[Candidate Metrics]
    V[Validated Decision]
    R[Ranking]
    Z[Visualization]
    E[Explanation]
    A --> V
    B --> V
    V --> R
    V --> Z
    V --> E
```
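One way to realize this boundary is a fail-fast check that runs before any ranking, visualization, or explanation. A minimal sketch (the `validate_decision` name is illustrative; a schema library could play the same role):

```python
def validate_decision(problem, candidates):
    """Fail fast if the problem formulation and candidate data are inconsistent.

    Every objective's metric_key must be present in every candidate's metrics;
    otherwise downstream scoring would silently skip or crash mid-pipeline.
    """
    required = {o["metric_key"] for o in problem["objectives"]}
    for candidate in candidates:
        missing = required - set(candidate["metrics"])
        if missing:
            raise ValueError(
                f"candidate {candidate['name']!r} lacks metrics: {sorted(missing)}")
    return problem, candidates  # only the validated pair crosses the boundary
```

Downstream modules then accept only the returned pair, which is what makes the boundary structural rather than a convention.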
- **Problem Formulation as a First-Class Object**
  Vague R&D intent is translated into explicit, optimization-ready structures consisting of objectives, priorities, and constraints. This formulation becomes a reusable, machine-readable research artifact rather than an implicit human process.
- **Strict Intermediate Schema**
  A shared, validated schema acts as a data contract between LLM outputs, evaluation logic, and visualization modules. This prevents silent failures and enforces consistency across the pipeline.
- **Decision-Ready Signals**
  Candidate data is treated as decision-ready signals, not raw measurements. Model predictions and experimental results are abstracted into comparable metrics that can be directly evaluated and ranked.
- **Human-in-the-Loop Design**
  Visualization and natural-language explanations are first-class outputs. The system is designed to support expert judgment by making trade-offs explicit and interpretable, rather than automating decisions blindly.
- **Clear Separation of Responsibilities**
  Each module has a single, well-defined role:
  - LLMs translate human intent and explain outcomes
  - Validation enforces consistency and fail-fast behavior
  - Optimization logic evaluates candidates
  - Visualization supports reasoning and comparison
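With these boundaries, the pipeline can be composed as a thin function into which the LLM-backed steps are injected as callables. A sketch with hypothetical names (`run_pipeline`, `formulate`, `explain`); injecting the LLM steps keeps the deterministic core testable without a model:

```python
def run_pipeline(requirement, candidates, formulate, explain):
    """Compose the modules: formulation -> validation -> ranking -> explanation.

    `formulate` and `explain` stand in for the LLM-backed steps; they are
    injected as callables so the deterministic core runs without an LLM.
    """
    problem = formulate(requirement)        # LLM: vague intent -> structured problem
    required = {o["metric_key"] for o in problem["objectives"]}
    for c in candidates:                    # validation boundary (fail fast)
        if required - set(c["metrics"]):
            raise ValueError(f"inconsistent candidate: {c['name']}")
    def score(c):                           # deterministic weighted-sum evaluation
        return sum((1 if o["direction"] == "maximize" else -1) * o["weight"]
                   * c["metrics"][o["metric_key"]]
                   for o in problem["objectives"])
    ranking = sorted(candidates, key=score, reverse=True)
    return ranking, explain(problem, ranking)  # LLM: outcomes -> narrative
```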
This project makes the following key contributions:
- **Problem Formulation as a Machine-Readable Research Object**
  Elevates decision formulation (objectives, trade-offs, and priorities) to a first-class, inspectable artifact that can be shared, validated, and reused across decision-making workflows.
- **An Explicit Validation Boundary for AI-Assisted Decisions**
  Introduces the ValidatedDecision boundary as a structural safeguard that prevents silent inconsistencies between LLM-generated problem definitions and quantitative candidate data.
- **Separation of Semantic Meaning and Evaluation Mechanics**
  Cleanly decouples human-readable intent (`name`) from machine-evaluable signals (`metric_key`), enabling robust evaluation, visualization, and explanation without relying on fragile string matching.
- **A Reusable Human-in-the-Loop Decision Pipeline**
  Combines LLM-based problem formulation and explanation with deterministic, auditable evaluation logic, supporting transparent and reproducible R&D decisions across domains.
This project demonstrates how Generative AI, structured schemas, and optimization logic can be combined to make early-stage R&D decisions explainable, comparable, and robust.
By treating problem formulation and validation as first-class components, the framework moves beyond ad-hoc AI assistance toward reproducible, human-centered decision support systems.
A concise slide deck summarizing the motivation, architecture, and contributions of this project is available in /slides.
