A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-bench dataset.