KODAH

Silas Liu - April 29, 2026

Automated Code Fixing, Large Language Models

Kodah achieves 81% of the resolve rate of Claude Opus 4.6, the current SWE-bench Lite leader, while operating at 1/38th of the cost.

Kodah is an autonomous code repair system that challenges the dominant assumption in software engineering automation: that performance scales with compute. Using GPT-5-mini, a model that costs fractions of a cent per call, Kodah resolves 51% of SWE-bench Lite issues at an average cost of $0.045, delivering near-frontier results at a fraction of what today's leading agents spend.

Beating the Compute Curve: 51% on SWE-bench Lite at $0.045 per Issue

The dominant assumption in automated software engineering is that resolve rate scales with compute: the more you spend at runtime, the higher the performance. Today's top-tier autonomous agents and frontier models routinely cost between $0.85 and $1.70 per code repair, relying on massive token windows and brute-force reasoning to navigate complexity.

Kodah challenges this paradigm. I built a system that achieves elite performance using GPT-5-mini, a model that costs fractions of a cent per call.

Kodah resolves 51% of SWE-bench Lite issues at an average cost of $0.045 per issue.

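| System                     | Resolve Rate | Cost per Issue |
|----------------------------|--------------|----------------|
| Claude Opus 4.6 (Thinking) | 62.7%        | ~$1.70         |
| GPT-5                      | 54.3%        | ~$1.25         |
| Kodah                      | 51.0%        | $0.045         |
| Devin                      | 13.86%       | $2.25/ACU      |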

Kodah reaches 81% of the resolve rate of the current frontier leader, Claude Opus 4.6, while operating at 1/38th of the cost. It is the only system in this comparison that pairs near-frontier results with a sub-five-cent cost per issue.
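The two headline ratios follow directly from the comparison figures. A minimal sketch of that arithmetic, assuming the resolve rates and per-issue costs quoted above (the `systems` dictionary and the `compare` helper are illustrative names, not part of Kodah):

```python
# Figures copied from the comparison above: (resolve rate in %, cost in USD per issue).
# Devin is left out because its cost is quoted per ACU, not per issue.
systems = {
    "Claude Opus 4.6 (Thinking)": (62.7, 1.70),
    "GPT-5": (54.3, 1.25),
    "Kodah": (51.0, 0.045),
}


def compare(baseline: str, candidate: str = "Kodah") -> tuple[float, float]:
    """Return (relative resolve rate, cost ratio) of candidate versus baseline."""
    base_rate, base_cost = systems[baseline]
    cand_rate, cand_cost = systems[candidate]
    return cand_rate / base_rate, base_cost / cand_cost


relative_rate, cost_ratio = compare("Claude Opus 4.6 (Thinking)")
print(f"{relative_rate:.0%} of the resolve rate at 1/{cost_ratio:.0f} of the cost")
# prints: 81% of the resolve rate at 1/38 of the cost
```

The same helper can be pointed at the GPT-5 row to get the corresponding ratios for that baseline.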

The total API cost to evaluate all 300 issues in the benchmark was just $13.59.

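| Metric                 | Value                                        |
|------------------------|----------------------------------------------|
| Benchmark              | SWE-bench Lite (300 issues, 12 Python repos) |
| Resolve Rate           | 51.0% (153/300)                              |
| Average cost per issue | $0.0453                                      |
| Total cost (all 300)   | $13.59                                       |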
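The summary metrics are mutually consistent, which can be checked in a few lines. This sketch uses only the reported totals; the variable names are illustrative:

```python
# Reported totals for the full SWE-bench Lite run.
resolved, total_issues = 153, 300
total_cost_usd = 13.59

resolve_rate = resolved / total_issues
avg_cost_usd = total_cost_usd / total_issues

print(f"resolve rate: {resolve_rate:.1%}")                # resolve rate: 51.0%
print(f"average cost per issue: ${avg_cost_usd:.4f}")     # average cost per issue: $0.0453
```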

Robustness Across Architectures

Average cost per issue stays within a narrow band, from $0.028 to $0.084, across all 12 repositories, indicating that the architecture's cost efficiency holds across diverse codebases even where resolve rates differ.

| Repository                | Resolved | Rate   | Avg Cost |
|---------------------------|----------|--------|----------|
| psf/requests              | 6/6      | 100.0% | $0.049   |
| django/django             | 67/114   | 58.8%  | $0.038   |
| matplotlib/matplotlib     | 13/23    | 56.5%  | $0.038   |
| scikit-learn/scikit-learn | 12/23    | 52.2%  | $0.029   |
| astropy/astropy           | 3/6      | 50.0%  | $0.064   |
| mwaskom/seaborn           | 2/4      | 50.0%  | $0.084   |
| sympy/sympy               | 36/77    | 46.8%  | $0.061   |
| pydata/xarray             | 2/5      | 40.0%  | $0.075   |
| pytest-dev/pytest         | 6/17     | 35.3%  | $0.040   |
| pylint-dev/pylint         | 2/6      | 33.3%  | $0.048   |
| sphinx-doc/sphinx         | 4/16     | 25.0%  | $0.039   |
| pallets/flask             | 0/3      | 0.0%   | $0.028   |

Underlying model: GPT-5-mini, declared for benchmark transparency. Benchmark evaluation was conducted against the official SWE-bench Lite test harness. Results are verifiable against the public leaderboard.

Performance varies across repositories. requests (6/6) and django (67/114) show strong results on well-structured, widely used codebases. At the other end, flask (0/3) and sphinx (4/16) highlight challenges in codebases with strong runtime coupling, while sympy (36/77) combines a below-average resolve rate with an above-average cost per issue ($0.061) on problems requiring domain-specific mathematical reasoning.
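The per-repository rows can be rolled up to cross-check the headline numbers. The sketch below copies the rows from the breakdown above and weights each repository's average cost by its issue count. Because the per-repository averages are rounded to three decimals, the cost estimate is expected to land near the reported $13.59 rather than match it exactly.

```python
# (repository, resolved, total issues, average cost in USD) from the breakdown above.
rows = [
    ("psf/requests", 6, 6, 0.049),
    ("django/django", 67, 114, 0.038),
    ("matplotlib/matplotlib", 13, 23, 0.038),
    ("scikit-learn/scikit-learn", 12, 23, 0.029),
    ("astropy/astropy", 3, 6, 0.064),
    ("mwaskom/seaborn", 2, 4, 0.084),
    ("sympy/sympy", 36, 77, 0.061),
    ("pydata/xarray", 2, 5, 0.075),
    ("pytest-dev/pytest", 6, 17, 0.040),
    ("pylint-dev/pylint", 2, 6, 0.048),
    ("sphinx-doc/sphinx", 4, 16, 0.039),
    ("pallets/flask", 0, 3, 0.028),
]

total_resolved = sum(resolved for _, resolved, _, _ in rows)
total_issues = sum(total for _, _, total, _ in rows)

# Issue-weighted estimate of total spend, built from the rounded per-repo averages.
estimated_cost_usd = sum(total * avg_cost for _, _, total, avg_cost in rows)

print(f"resolved: {total_resolved}/{total_issues}")
print(f"overall resolve rate: {total_resolved / total_issues:.1%}")
print(f"estimated total cost: ${estimated_cost_usd:.2f}")
```

The counts reproduce 153/300 exactly, and the weighted cost estimate comes out within a few cents of the reported total.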

Cost Distribution: Scaling Without Complexity

 

The cost distribution for Kodah is heavily right-skewed. While the industry standard assumes that complex issues require significantly more compute, the data shows that strong results do not require consistently high operational costs.

  • 75% of issues cost less than $0.05.

  • 90% of issues cost less than $0.10.

  • Fewer than 2% of issues exceeded $0.20.
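Threshold shares like the ones above can be computed from a list of per-issue costs. The sketch below shows one way to do it. The `share_below` helper is a hypothetical name, and `sample_costs` holds placeholder values for illustration only, not Kodah's measured per-issue costs, which are not published here.

```python
def share_below(costs: list[float], threshold: float) -> float:
    """Return the fraction of issues whose cost is strictly below the threshold."""
    if not costs:
        raise ValueError("costs must not be empty")
    return sum(cost < threshold for cost in costs) / len(costs)


# Placeholder per-issue costs in USD (illustrative only, NOT measured data).
sample_costs = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.06, 0.09, 0.25]

for threshold in (0.05, 0.10, 0.20):
    share = share_below(sample_costs, threshold)
    print(f"cost < ${threshold:.2f}: {share:.0%} of issues")
```

In a right-skewed distribution most of the mass sits under the lowest threshold, with a thin tail of expensive issues pulling the mean above the median.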

Redefining the Economics of Engineering

The market today operates on two assumptions: that autonomous results require flagship-tier compute, and that lower costs require a human in the loop. Kodah operates outside both constraints, delivering autonomous, high-tier results at a sub-five-cent operational cost.

51% is an early milestone, not an endpoint. Significant headroom remains for raising the resolve rate under the same cost discipline.

kodah.io
