# evaluation.benchmarks.swe\_bench.eval\_infer

#### process\_instance
```python
def process_instance(
    instance: pd.Series,
    metadata: EvalMetadata,
    reset_logger: bool = True,
    log_dir: str | None = None,
) -> EvalOutput
```
Evaluate agent performance on a SWE-bench problem instance.

Note that this signature differs from the input expected by `run_evaluation`. Use `functools.partial` to bind the optional arguments before passing the function to the evaluation harness.
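The `functools.partial` pattern described above can be sketched as follows. This is a minimal illustration, not the actual module: the `process_instance` body here is a stand-in (the real one lives in `evaluation.benchmarks.swe_bench.eval_infer`), and the log directory path is hypothetical.

```python
import functools

# Stand-in for the real process_instance; in actual use, import it from
# evaluation.benchmarks.swe_bench.eval_infer instead of defining it here.
def process_instance(instance, metadata, reset_logger=True, log_dir=None):
    # Mirrors the documented precondition: a log directory is required
    # whenever reset_logger is set.
    assert not reset_logger or log_dir is not None, (
        'log_dir must be provided if reset_logger is set'
    )
    return {'instance': instance, 'log_dir': log_dir}

# Bind the optional arguments up front so the resulting callable takes only
# (instance, metadata), the shape the evaluation harness expects.
bound = functools.partial(
    process_instance,
    reset_logger=True,
    log_dir='/tmp/eval_logs',  # hypothetical path for illustration
)

result = bound('instance-001', metadata=None)
```

The harness can then call `bound(instance, metadata)` without knowing about the logging arguments.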
**Arguments**:

- `log_dir` (`str | None`, default `None`): Path to the directory where log files will be written. Must be provided if `reset_logger` is set.
**Raises**:

- `AssertionError`: if the `reset_logger` flag is set without a provided log directory.