evaluation.benchmarks.swe_bench.scripts.eval.verify_costs
verify_instance_costs
def verify_instance_costs(row: pd.Series) -> float
Verifies that accumulated_cost matches the sum of the individual costs in metrics. Also checks for identical consecutive costs, which may indicate buggy cost counting. If consecutive costs are identical, the file is affected by the bug reported at https://github.com/All-Hands-AI/OpenHands/issues/5383
Arguments:
row: DataFrame row (pd.Series) containing instance data with metrics
Returns:
float: The verified total cost for this instance, corrected if needed
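The two checks described above can be illustrated with a minimal sketch. It is not the actual implementation. It assumes row["metrics"] has the shape {"accumulated_cost": float, "costs": [{"cost": float}, ...]}, and it assumes the bug's signature is that every step's cost is recorded twice in a row (costs[0] == costs[1], costs[2] == costs[3], ...). The function name verify_instance_costs_sketch and the tol parameter are hypothetical. The sketch accepts any mapping; a pd.Series also supports .get, so a real DataFrame row works the same way.

```python
import logging
import math
from typing import Any, Mapping

logger = logging.getLogger(__name__)


def verify_instance_costs_sketch(row: Mapping[str, Any], tol: float = 1e-6) -> float:
    """Hypothetical sketch, not the OpenHands implementation.

    Assumes row["metrics"] looks like
    {"accumulated_cost": float, "costs": [{"cost": float}, ...]}.
    """
    metrics = row.get("metrics") or {}
    costs = [float(entry["cost"]) for entry in metrics.get("costs", [])]
    accumulated = metrics.get("accumulated_cost")

    raw_total = sum(costs)

    # Check 1: accumulated_cost should equal the sum of the individual costs.
    if accumulated is not None and not math.isclose(raw_total, accumulated, abs_tol=tol):
        logger.warning(
            "accumulated_cost %.6f does not match sum of costs %.6f",
            accumulated,
            raw_total,
        )

    # Check 2 (assumed bug signature): every cost entry is duplicated in a row,
    # so the list pairs up as (costs[0], costs[1]), (costs[2], costs[3]), ...
    pairs = list(zip(costs[0::2], costs[1::2]))
    duplicated = bool(pairs) and all(math.isclose(a, b, abs_tol=tol) for a, b in pairs)

    if duplicated:
        # Count each pair once, plus a trailing unpaired entry if the length is odd.
        unpaired = costs[-1] if len(costs) % 2 else 0.0
        return sum(a for a, _ in pairs) + unpaired

    return raw_total
```

For example, a row whose costs are [0.1, 0.1, 0.2, 0.2] is treated as double-counted and yields roughly 0.3, while [0.1, 0.2, 0.3] is summed as-is. Requiring every pair to match, rather than any single pair, keeps a run in which two steps legitimately cost the same from being "corrected" by mistake.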