evaluation.benchmarks.swe_bench.scripts.eval.verify_costs

verify_instance_costs

def verify_instance_costs(row: pd.Series) -> float

Verifies that accumulated_cost matches the sum of the individual costs recorded in metrics. Also checks for identical consecutive cost entries, which can indicate that costs were counted twice. If consecutive costs are identical, the instance's data is affected by this bug: https://github.com/All-Hands-AI/OpenHands/issues/5383

Arguments:

row - A DataFrame row containing the instance's data, including its metrics.

Returns:

float - The verified total cost for this instance (corrected if needed)
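The check described above can be sketched in plain Python. This is a minimal illustration, not the OpenHands implementation. The field names (metrics, accumulated_cost, costs, cost), the 1e-6 tolerance, the assumption that the bug shows up as identical pairs at positions (0, 1), (2, 3), ..., and the helper name sketch_verify_instance_costs are all hypothetical. The sketch only calls row.get, so a pd.Series row and a plain dict both work.

```python
import logging
import math
from typing import Any, Mapping

logger = logging.getLogger(__name__)

TOLERANCE = 1e-6  # assumed tolerance for float comparisons


def sketch_verify_instance_costs(row: Mapping[str, Any]) -> float:
    """Hypothetical sketch: the field names below are assumptions."""
    metrics = row.get("metrics") or {}
    accumulated = metrics.get("accumulated_cost")
    costs = [entry["cost"] for entry in metrics.get("costs", [])]

    # Assumed bug signature: every step was recorded twice, so the
    # entries at positions (0, 1), (2, 3), ... form identical pairs.
    pairs = list(zip(costs[0::2], costs[1::2]))
    duplicated = bool(pairs) and all(
        math.isclose(a, b, abs_tol=TOLERANCE) for a, b in pairs
    )

    if duplicated:
        # Count each duplicated pair once, plus any trailing unpaired entry.
        total = sum(a for a, _ in pairs)
        if len(costs) % 2:
            total += costs[-1]
    else:
        total = sum(costs)

    # Report a mismatch between the stored and the recomputed cost.
    if accumulated is not None and not math.isclose(
        total, accumulated, abs_tol=TOLERANCE
    ):
        logger.warning(
            "accumulated_cost %s != verified total %s", accumulated, total
        )
    return float(total)
```

Requiring every pair to match, rather than any single pair, keeps a run that legitimately has two equal step costs from being treated as double counted. For example, costs of 0.5, 0.5, 0.25 would be read as one duplicated step plus a trailing step, giving 0.75.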
