v0.2 · 248 challenges live

Prove your AI
ships the fix.

Not leetcode. Not toy problems. Real bugs in cpython, ffmpeg, redis, postgres. You hand us a patch, we run it against the test suite in a container, you see the trace. That’s the whole product.

browse challenges → · see who’s winning

— a real fix from challenge #217 —

Python/ceval.c

@@ -2847,6 +2847,5 @@ BINARY_OP(builtin_mode) {
     PyObject *rhs = POP();
     PyObject *lhs = TOP();
-    PyObject *res = PyObject_CallFunctionObjArgs(
+    PyObject *res = _PyObject_CallBinop(
         func, lhs, rhs, NULL);
-    STACK_SHRINK(1);
     SET_TOP(res);
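Modeling the value stack shows why the patch drops `STACK_SHRINK(1)`. The sketch below is a toy in Python, not CPython code. It assumes the usual ceval.c meanings of the macros in the hunk: `POP()` removes and returns the top item, `TOP()` peeks, `SET_TOP(v)` overwrites the top slot, and `STACK_SHRINK(n)` discards n slots. A plain Python callable stands in for the C-level call. `ToyValueStack` and `binary_op` are hypothetical names.

```python
import operator
from typing import Any, Callable


class ToyValueStack:
    """Hypothetical stand-in for a ceval-style value stack."""

    def __init__(self, items: list[Any]) -> None:
        self.items = list(items)

    def pop(self) -> Any:
        # POP(): remove and return the top item.
        return self.items.pop()

    def top(self) -> Any:
        # TOP(): read the top item without removing it.
        return self.items[-1]

    def set_top(self, value: Any) -> None:
        # SET_TOP(v): overwrite the top slot in place; depth is unchanged.
        self.items[-1] = value

    def shrink(self, n: int) -> None:
        # STACK_SHRINK(n): discard n slots without reading them.
        if n > len(self.items):
            raise IndexError("stack underflow")
        del self.items[len(self.items) - n:]


def binary_op(stack: ToyValueStack, func: Callable[[Any, Any], Any],
              patched: bool = True) -> ToyValueStack:
    rhs = stack.pop()        # depth - 1
    lhs = stack.top()        # lhs stays on the stack
    res = func(lhs, rhs)     # stand-in for the C-level call
    if not patched:
        stack.shrink(1)      # the line the patch deletes: an extra - 1
    stack.set_top(res)       # replaces the top slot with the result
    return stack


fixed = binary_op(ToyValueStack(["below", 2, 3]), operator.add)
broken = binary_op(ToyValueStack(["below", 2, 3]), operator.add, patched=False)
print(fixed.items)   # ['below', 5]
print(broken.items)  # [5]
```

Two operands go in and one result comes out, so the patched sequence has a net stack effect of -1. With the extra shrink the effect is -2, and the result overwrites the item that sat below the operands.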

The loop.

No mock interviews. No vibes-based scoring. The loop is small, deterministic, and the same one you’d run in a real engineering job.

01 You get a failing repo, a test suite, and a bug report. Use any AI you want.
02 You submit a patch: a unified diff, like the output of git diff or git format-patch.
03 We apply it, run the tests inside a sandboxed Docker container, and stream back the output.
04 All passing? You’re scored. Something failing? You see the trace and try again.
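Steps 03 and 04 can be sketched as a single function. This is a hypothetical outline, not the service's actual runner. The image name is a placeholder, the repo is bind-mounted into the container, and output is captured rather than streamed. The `run` parameter exists only so the sketch can be exercised without Docker. `git apply --check`, `git apply -`, and the `docker run` flags shown are real; `Verdict` and `judge_submission` are assumed names.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    applied: bool   # did the patch apply cleanly?
    passed: bool    # did every test pass?
    output: str     # stderr on apply failure, test output otherwise


def judge_submission(patch: str, repo_dir: str, image: str,
                     run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Verdict:
    # Dry run first: --check reports whether the diff applies, without touching the tree.
    check = run(["git", "apply", "--check", "-"], input=patch, text=True,
                cwd=repo_dir, capture_output=True)
    if check.returncode != 0:
        return Verdict(applied=False, passed=False, output=check.stderr)

    # Apply for real; "-" tells git apply to read the patch from stdin.
    run(["git", "apply", "-"], input=patch, text=True, cwd=repo_dir, check=True)

    # Run the suite in a throwaway container with networking disabled.
    tests = run(["docker", "run", "--rm", "--network", "none",
                 "-v", f"{repo_dir}:/work", "-w", "/work",
                 image, "pytest", "-q"],
                text=True, capture_output=True)

    # pytest exits 0 only when tests were collected and all of them passed.
    return Verdict(applied=True, passed=tests.returncode == 0, output=tests.stdout)
```

A failing suite returns `passed=False` with the trace in `output`. That is the "see the trace and try again" branch of step 04.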

bash — pytest -q challenges/cpython/217

PASS  tests/test_binop.py::test_add_int
PASS  tests/test_binop.py::test_add_float
PASS  tests/test_binop.py::test_mixed_operands
FAIL  tests/test_binop.py::test_stack_effect
      AssertionError: stack depth 2, expected 1
PASS  tests/test_binop.py::test_error_path
 
Tests: 4 passed, 1 failed, 5 total

— what the submission panel actually streams back —
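The panel output above is regular enough to parse. This is a minimal sketch that assumes each result line starts with `PASS` or `FAIL`, followed by whitespace and the test id, as in the sample. The format is inferred from that sample, not from a published spec, and `summarize` and `summary_line` are hypothetical names.

```python
import re
from collections import Counter

# A result line: status, whitespace, then a test id such as tests/test_binop.py::test_add_int.
RESULT_LINE = re.compile(r"^(PASS|FAIL)\s+(\S+)")


def summarize(stream: str) -> tuple[Counter, list[str]]:
    """Count PASS/FAIL lines and collect the ids of failing tests."""
    counts: Counter = Counter()
    failures: list[str] = []
    for line in stream.splitlines():
        match = RESULT_LINE.match(line)
        if match is None:
            continue  # indented assertion details, blank lines, the summary line
        status, test_id = match.groups()
        counts[status] += 1
        if status == "FAIL":
            failures.append(test_id)
    return counts, failures


def summary_line(counts: Counter) -> str:
    total = counts["PASS"] + counts["FAIL"]
    return f"Tests: {counts['PASS']} passed, {counts['FAIL']} failed, {total} total"


sample = """\
PASS  tests/test_binop.py::test_add_int
FAIL  tests/test_binop.py::test_stack_effect
      AssertionError: stack depth 2, expected 1
"""
counts, failures = summarize(sample)
print(summary_line(counts))  # Tests: 1 passed, 1 failed, 2 total
print(failures)              # ['tests/test_binop.py::test_stack_effect']
```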

Codebases engineers actually ship.

browse all →

cpython @ 3.12 · 47 bugs
ffmpeg @ n6.1 · 19 bugs
git @ 2.42 · 12 bugs
node.js @ 20.x · 28 bugs · JS/C++
vim @ 9.1 · 14 bugs
redis @ 7.2 · 22 bugs
postgres @ 16.x · 16 bugs
nginx @ 1.25 · 9 bugs

Not another leetcode.

allow AI tools · us: + encouraged · them: - penalized or banned
real codebase, real bug · us: + every challenge · them: - toy problems, invented data
scores how you use AI · us: + yes · them: - not measured
sandboxed test runs · us: + every submission · them: - often self-reported
instant streaming · us: + yes · them: ~ sometimes
free to use · us: + yes · them: ~ freemium

Pick a bug. Ship the fix.

start → · pricing
