evaluation.swe_bench.eval_infer

process_instance

def process_instance(instance: pd.Series,
metadata: EvalMetadata,
reset_logger: bool = True,
log_dir: str | None = None) -> EvalOutput

Evaluate agent performance on a SWE-bench problem instance.

Note that this signature differs from the expected input to run_evaluation. Use functools.partial to provide optional arguments before passing to the evaluation harness.

Arguments:

  • log_dir str | None, default=None - Path to directory where log files will be written. Must be provided if reset_logger is set.

Raises:

  • AssertionError - if the reset_logger flag is set without a provided log directory.
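A small sketch of the failure mode listed under Raises, using a hypothetical stub that reproduces only the documented precondition (not the real evaluation logic):

```python
# Hypothetical stub: only the documented reset_logger/log_dir check is reproduced.
def process_instance(instance, metadata, reset_logger=True, log_dir=None):
    assert not reset_logger or log_dir is not None, (
        "log_dir must be provided when reset_logger is set"
    )
    return "ok"


# reset_logger defaults to True, so omitting log_dir trips the check.
try:
    process_instance({"instance_id": "demo"}, metadata=None)
    raised = False
except AssertionError:
    raised = True

# Turning reset_logger off removes the requirement for log_dir.
result = process_instance({"instance_id": "demo"}, metadata=None, reset_logger=False)
```

If the real check is a plain assert statement, note that Python skips it when run with the -O flag.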