v0.2 · 248 challenges live
Prove your AI ships the fix.
Not leetcode. Not toy problems. Real bugs in cpython, ffmpeg, redis, postgres. You hand us a patch, we run it against the test suite in a container, you see the trace. That’s the whole product.
— a real fix from challenge #217 —
Python/ceval.c
@@ -2847,7 +2847,7 @@ BINARY_OP(builtin_mode) {
     PyObject *rhs = POP();
     PyObject *lhs = TOP();
-    PyObject *res = PyObject_CallFunctionObjArgs(
+    PyObject *res = _PyObject_CallBinop(func, lhs, rhs, NULL);
-    STACK_SHRINK(1);
     SET_TOP(res);
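Why does deleting one line fix the bug? A binary operator consumes two operands and produces one result, so its net stack effect should be -1. Assuming the macros behave like CPython's classic ceval stack macros (POP() removes the top slot, TOP() reads it, SET_TOP() overwrites it, STACK_SHRINK(n) drops n slots), POP() plus SET_TOP() already gets there, and the extra STACK_SHRINK(1) makes it -2. The toy stack below is a hypothetical C model with made-up helper names (push, pop, top, set_top, shrink). It is not CPython's implementation, just enough to show what happens to the depth.

```c
#include <assert.h>

/* Hypothetical toy value stack. The helpers loosely mirror the macros in
 * the diff, but they are NOT CPython's definitions: the real macros work
 * on PyObject pointers inside an interpreter frame. */
#define STACK_CAP 16

typedef struct {
    long items[STACK_CAP];
    int depth;
} Stack;

static void push(Stack *s, long v) {
    assert(s->depth < STACK_CAP);
    s->items[s->depth++] = v;
}

static long pop(Stack *s) {          /* like POP(): removes the top slot */
    assert(s->depth > 0);
    return s->items[--s->depth];
}

static long top(const Stack *s) {    /* like TOP(): reads without removing */
    assert(s->depth > 0);
    return s->items[s->depth - 1];
}

static void set_top(Stack *s, long v) { /* like SET_TOP(): overwrites the top */
    assert(s->depth > 0);
    s->items[s->depth - 1] = v;
}

static void shrink(Stack *s, int n) {   /* like STACK_SHRINK(n) */
    assert(s->depth >= n);
    s->depth -= n;
}

/* Before the patch: pop (-1), shrink (-1 again), then set_top overwrites
 * whatever now sits on top. Net stack effect: -2. */
static void binary_add_buggy(Stack *s) {
    long rhs = pop(s);
    long lhs = top(s);
    long res = lhs + rhs;
    shrink(s, 1);
    set_top(s, res);
}

/* After the patch: pop (-1) and reuse lhs's slot for the result.
 * Net stack effect: -1. */
static void binary_add_fixed(Stack *s) {
    long rhs = pop(s);
    long lhs = top(s);
    set_top(s, lhs + rhs);
}
```

With three values on the stack, binary_add_fixed leaves a depth of 2 and keeps the value underneath intact; binary_add_buggy leaves a depth of 1 and overwrites that value with the result.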
The loop.
No mock interviews. No vibes-based scoring. The loop is small, deterministic, and the same one you’d run in a real engineering job.
- 01 You get a failing repo, a test suite, and a bug report. Use any AI you want.
- 02 You submit a patch (a unified diff, as produced by git diff or git format-patch).
- 03 We apply it, run the tests inside a sandboxed Docker container, and stream back the output.
- 04 All passing? You’re scored. Something failing? You see the trace and try again.
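Step 02 expects a unified diff. Each hunk in one opens with a header like @@ -2847,7 +2847,7 @@: the starting line and line count in the old file, then the same pair for the new file, with the count defaulting to 1 when it is omitted. Below is a minimal sketch of parsing that header, the kind of sanity check a grader could run before applying a patch. HunkHeader, parse_range, and parse_hunk_header are hypothetical names, not part of git, patch, or this platform.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Line ranges described by one unified-diff hunk header. */
typedef struct {
    long old_start, old_count;
    long new_start, new_count;
} HunkHeader;

/* Parses "<start>[,<count>]" at *p. A missing count means 1.
 * On success, advances *p past the consumed characters. */
static bool parse_range(const char **p, long *start, long *count) {
    char *end;
    if (!isdigit((unsigned char)**p)) return false;
    *start = strtol(*p, &end, 10);
    *count = 1;
    if (*end == ',') {
        if (!isdigit((unsigned char)end[1])) return false;
        *count = strtol(end + 1, &end, 10);
    }
    *p = end;
    return true;
}

/* Returns true and fills *h if line starts with "@@ -a[,b] +c[,d] @@".
 * Anything after the closing "@@" (e.g. a function name) is ignored. */
static bool parse_hunk_header(const char *line, HunkHeader *h) {
    assert(line != NULL && h != NULL);
    const char *p = line;
    if (strncmp(p, "@@ -", 4) != 0) return false;
    p += 4;
    if (!parse_range(&p, &h->old_start, &h->old_count)) return false;
    if (strncmp(p, " +", 2) != 0) return false;
    p += 2;
    if (!parse_range(&p, &h->new_start, &h->new_count)) return false;
    return strncmp(p, " @@", 3) == 0;
}
```

Rejecting anything that is not a digit before calling strtol keeps the parser strict: strtol on its own would also accept leading whitespace or a sign.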
bash — pytest -q challenges/cpython/217
PASS tests/test_binop.py::test_add_int
PASS tests/test_binop.py::test_add_float
PASS tests/test_binop.py::test_mixed_operands
FAIL tests/test_binop.py::test_stack_effect
  AssertionError: stack depth 2, expected 1
PASS tests/test_binop.py::test_error_path
Tests: 4 passed, 1 failed, 5 total
— what the submission panel actually streams back —
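Step 04's rule fits in a few lines. This sketch assumes the summary line always has exactly the shape the panel shows (Tests: N passed, N failed, N total). pytest's own summary is formatted differently, so a real grader would more plausibly rely on the test runner's exit status or a machine-readable report. Verdict and verdict_from_summary are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { VERDICT_SCORED, VERDICT_RETRY, VERDICT_UNKNOWN } Verdict;

/* Assumes the panel's summary shape:
 * "Tests: <passed> passed, <failed> failed, <total> total". */
static Verdict verdict_from_summary(const char *summary) {
    assert(summary != NULL);
    int passed, failed, total;
    if (sscanf(summary, "Tests: %d passed, %d failed, %d total",
               &passed, &failed, &total) != 3)
        return VERDICT_UNKNOWN;              /* not a summary line */
    if (passed + failed != total)
        return VERDICT_UNKNOWN;              /* counts don't add up */
    /* All passing (and at least one test ran): scored. Otherwise: retry. */
    return (failed == 0 && total > 0) ? VERDICT_SCORED : VERDICT_RETRY;
}
```

For the run above, the summary "Tests: 4 passed, 1 failed, 5 total" yields VERDICT_RETRY. Requiring total > 0 keeps an empty run from counting as a pass.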
Codebases engineers actually ship.
browse all →
cpython @ 3.12 · 47 bugs · C
ffmpeg @ n6.1 · 19 bugs · C
git @ 2.42 · 12 bugs · C
node.js @ 20.x · 28 bugs · JS/C++
vim @ 9.1 · 14 bugs · C
redis @ 7.2 · 22 bugs · C
postgres @ 16.x · 16 bugs · C
nginx @ 1.25 · 9 bugs · C
Not another leetcode.
                          us                    them
allow AI tools            + encouraged          - penalized or banned
real codebase, real bug   + every challenge     - toy problems, invented data
scores how you use AI     + yes                 - not measured
sandboxed test runs       + every submission    - often self-reported
instant streaming         + yes                 ~ sometimes
free to use               + yes                 ~ freemium