evaluation.benchmarks.bird.run_infer

execute_sql

def execute_sql(db_path, gen_sql, gold_sql)

Execute the generated SQL and the ground-truth SQL against the database at `db_path` and compare the results.
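To illustrate the idea of execution-based comparison, here is a minimal sketch, not the benchmark's actual implementation. It assumes the database is a SQLite file, that a query which raises an error counts as a mismatch, and that results are compared as sets (row order and duplicates ignored). The name `compare_sql_results` is hypothetical.

```python
import sqlite3


def compare_sql_results(db_path: str, gen_sql: str, gold_sql: str) -> bool:
    """Hypothetical helper: run both queries and compare their result sets."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        gen_rows = cur.execute(gen_sql).fetchall()
        gold_rows = cur.execute(gold_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a query that fails to execute counts as a mismatch.
        return False
    finally:
        conn.close()
    # Assumption: row order and duplicate rows are ignored (set comparison).
    return set(gen_rows) == set(gold_rows)
```

A set comparison is lenient about ordering, so a gold query with a meaningful `ORDER BY` would need a stricter list comparison instead.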

load_bird

def load_bird()

Entry point that downloads, processes, and loads the BIRD dataset.

initialize_runtime

def initialize_runtime(runtime: Runtime, instance: pd.Series)

Initialize the runtime for the agent.

This function is called before the runtime is used to run the agent.

complete_runtime

def complete_runtime(runtime: Runtime, instance: pd.Series) -> dict[str, Any]

Complete the runtime for the agent.

This function is called after the agent has finished running in the runtime. If you need to do something in the sandbox to get the correctness metric after the agent has run, modify this function.
