v0.2 · 248 challenges live

Prove your AI
ships the fix.

Not leetcode. Not toy problems. Real bugs in cpython, ffmpeg, redis, postgres. You hand us a patch, we run it against the test suite in a container, you see the trace. That’s the whole product.

— a real fix from challenge #217 —
Python/ceval.c
@@ -2847,7 +2847,7 @@ BINARY_OP(builtin_mode) {
PyObject *rhs = POP();
PyObject *lhs = TOP();
- PyObject *res = PyObject_CallFunctionObjArgs(
+ PyObject *res = _PyObject_CallBinop(
func, lhs, rhs, NULL);
- STACK_SHRINK(1);
SET_TOP(res);
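
For readers who have not built a patch by hand, here is a minimal sketch of how a change like the one above becomes a unified diff. It uses Python's standard difflib module. The `before` and `after` snippets are a trimmed, illustrative copy of the lines shown above, so the hunk header starts at line 1 rather than at 2847. `make_patch` is a hypothetical helper name, not part of the platform.

```python
import difflib

# Trimmed, illustrative copies of the lines from the diff above.
before = [
    "PyObject *res = PyObject_CallFunctionObjArgs(",
    "    func, lhs, rhs, NULL);",
    "STACK_SHRINK(1);",
    "SET_TOP(res);",
]
after = [
    "PyObject *res = _PyObject_CallBinop(",
    "    func, lhs, rhs, NULL);",
    "SET_TOP(res);",
]


def make_patch(old_lines, new_lines, path):
    """Return the unified diff between two versions of a file as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            lineterm="",  # inputs carry no trailing newlines
        )
    )


patch = make_patch(before, after, "Python/ceval.c")
print("\n".join(patch))
```

In day-to-day work you would more likely let git produce the diff; the sketch only shows what the format contains: two file headers, a hunk header with line ranges, then context, removed (-) and added (+) lines.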

The loop.

No mock interviews. No vibes-based scoring. The loop is small, deterministic, and the same one you’d run in a real engineering job.

  1. You get a failing repo, a test suite, and a bug report. Use any AI you want.
  2. You submit a patch (a unified diff, as produced by git diff or git format-patch).
  3. We apply it, run the tests inside a sandboxed Docker container, and stream back the output.
  4. All passing? You’re scored. Something failing? You see the trace and try again.
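
If you want to rehearse the loop locally before submitting, the steps above can be approximated as follows. This is a sketch under stated assumptions, not the platform's actual runner: the image name, the /work mount point, and the helper names (`apply_check_cmd`, `docker_test_cmd`, `stream`, `evaluate`) are all hypothetical. The git and docker flags used are standard ones.

```python
import subprocess
from pathlib import Path


def apply_check_cmd(patch_file):
    """Command that verifies a patch applies cleanly without touching the tree."""
    return ["git", "apply", "--check", str(Path(patch_file).resolve())]


def docker_test_cmd(repo_dir, image, test_cmd):
    """Command that runs the test suite in a throwaway container with no network."""
    return [
        "docker", "run", "--rm", "--network", "none",
        "-v", f"{Path(repo_dir).resolve()}:/work",  # /work is an assumed mount point
        "-w", "/work",
        image, *test_cmd,
    ]


def stream(cmd, cwd=None):
    """Run a command, echo its output line by line, and return the exit code."""
    proc = subprocess.Popen(
        cmd, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        print(line, end="")
    return proc.wait()


def evaluate(repo_dir, patch_file, image, test_cmd):
    """Apply the patch, run the tests, and return True only if everything passed."""
    patch_path = str(Path(patch_file).resolve())
    if stream(apply_check_cmd(patch_path), cwd=repo_dir) != 0:
        return False  # patch does not apply cleanly
    if stream(["git", "apply", patch_path], cwd=repo_dir) != 0:
        return False
    return stream(docker_test_cmd(repo_dir, image, test_cmd)) == 0
```

A call might look like `evaluate("cpython", "fix.patch", "example/cpython-tests:latest", ["pytest", "-q"])`, where the image name is a placeholder for whatever container holds the build toolchain.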
bash — pytest -q challenges/cpython/217
PASS tests/test_binop.py::test_add_int
PASS tests/test_binop.py::test_add_float
PASS tests/test_binop.py::test_mixed_operands
FAIL tests/test_binop.py::test_stack_effect
AssertionError: stack depth 2, expected 1
PASS tests/test_binop.py::test_error_path
 
Tests: 4 passed, 1 failed, 5 total

— what the submission panel actually streams back —
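
To make the pass/fail rule concrete, here is a small sketch that tallies streamed lines like the ones above and rebuilds the summary line. It assumes the "PASS <test id>" / "FAIL <test id>" line format shown in the panel, which is not stock pytest output. `tally` and `summary` are hypothetical helper names.

```python
import re

# Assumed line format, taken from the panel above: "PASS <test id>" or "FAIL <test id>".
RESULT_LINE = re.compile(r"^(PASS|FAIL)\s+(\S+)$")

SAMPLE_OUTPUT = """\
PASS tests/test_binop.py::test_add_int
PASS tests/test_binop.py::test_add_float
PASS tests/test_binop.py::test_mixed_operands
FAIL tests/test_binop.py::test_stack_effect
AssertionError: stack depth 2, expected 1
PASS tests/test_binop.py::test_error_path
"""


def tally(output):
    """Split streamed output into lists of passed and failed test ids."""
    passed, failed = [], []
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match is None:
            continue  # trace lines such as the AssertionError are skipped
        status, test_id = match.groups()
        (passed if status == "PASS" else failed).append(test_id)
    return passed, failed


def summary(passed, failed):
    """Rebuild the one-line summary in the format shown in the panel."""
    total = len(passed) + len(failed)
    return f"Tests: {len(passed)} passed, {len(failed)} failed, {total} total"


passed, failed = tally(SAMPLE_OUTPUT)
print(summary(passed, failed))
print("scored" if not failed else f"retry: {', '.join(failed)}")
```

On the sample above, the one failing test means the submission is not scored yet and the trace points at what to fix.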

Codebases engineers actually ship.

cpython @ 3.12 · 47 bugs · C
ffmpeg @ n6.1 · 19 bugs · C
git @ 2.42 · 12 bugs · C
node.js @ 20.x · 28 bugs · JS/C++
vim @ 9.1 · 14 bugs · C
redis @ 7.2 · 22 bugs · C
postgres @ 16.x · 16 bugs · C
nginx @ 1.25 · 9 bugs · C

Not another leetcode.

allow AI tools · us: + encouraged · them: - penalized or banned
real codebase, real bug · us: + every challenge · them: - toy problems, invented data
scores how you use AI · us: + yes · them: - not measured
sandboxed test runs · us: + every submission · them: - often self-report
instant streaming · us: + yes · them: ~ sometimes
free to use · us: + yes · them: ~ freemium

Pick a bug. Ship the fix.