The "Judge": An educational system for checking algorithmic abilities

Authentic Assessment

Algorithmic theory ranges from algorithm design through complexity theory to algorithm analysis. The actual implementation of an algorithm, however, is usually not considered part of algorithmic theory. Nevertheless, to demonstrate full command of algorithmic abilities and the underlying theory, it is essential that students learn how to convert a given problem setting into a suitable algorithmic setup, which they can then successfully implement and test on given sample instances.

Our Judge helps students develop exactly these abilities: solving algorithmic problems given as textual descriptions. This includes appropriate problem modeling, the choice of suitable algorithms, and their implementation, using libraries such as the STL, CGAL, or BGL where appropriate. More precisely, the Judge is an automated and scalable system for running programming exercises and exams. Students submit their solutions to programming problems through a simple web interface. Their programs are automatically compiled and tested against multiple test cases provided by the instructor. Within seconds the students receive feedback on the correctness and performance of their submissions, so that they can, if necessary, adapt or correct their current solution.
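The compile-and-test loop described above can be sketched as follows. This is a hypothetical minimal harness, not the Judge's actual code: compilation of the submission is assumed to have happened already, and all names (`run_tests`, the verdict strings) are illustrative.

```python
import subprocess

def run_tests(command, test_cases, time_limit=2.0):
    """Run a compiled submission against instructor-provided test cases.

    command    -- argv list for the already-compiled submission binary
    test_cases -- list of (input_text, expected_output) pairs
    Returns one verdict string per test case.
    """
    verdicts = []
    for input_text, expected_output in test_cases:
        try:
            run = subprocess.run(command, input=input_text,
                                 capture_output=True, text=True,
                                 timeout=time_limit)
        except subprocess.TimeoutExpired:
            verdicts.append("TIME LIMIT EXCEEDED")
            continue
        if run.returncode != 0:
            verdicts.append("RUNTIME ERROR")
        elif run.stdout.strip() == expected_output.strip():
            verdicts.append("OK")
        else:
            verdicts.append("WRONG ANSWER")
    return verdicts
```

A submission binary would be checked with a call such as `run_tests(["./solution"], cases)`; the per-test verdicts are exactly the kind of feedback the students see within seconds.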


The Judge is designed as a distributed system. Student submissions are assigned to multiple backend machines that perform the actual work of compiling and testing the submissions. In this way the system can easily scale to large exams with hundreds of students by simply adding more worker machines.
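The fan-out to backend machines can be illustrated with a minimal in-process sketch, assuming a shared work queue. Here threads stand in for worker machines, and all names (`run_workers`, `judge_fn`) are hypothetical, not part of the actual system:

```python
import queue
import threading

def run_workers(submissions, num_workers, judge_fn):
    """Distribute submissions over workers via a shared queue.

    submissions -- list of (submission_id, source) pairs
    judge_fn    -- callable that judges one submission's source
    Each worker pulls a submission, judges it, and records the verdict.
    Scaling to a larger exam just means raising num_workers
    (in the real system: adding backend machines).
    """
    work = queue.Queue()
    for sub in submissions:
        work.put(sub)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                sub_id, source = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            verdict = judge_fn(source)
            with lock:
                results[sub_id] = verdict

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The queue decouples submission intake from judging capacity, which is what makes "add more workers" the whole scaling story.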
Over the last years we spent a lot of effort to ensure that the student programs run in full isolation. Every submission is executed in its own chroot environment and simulated by Valgrind, which filters system calls and limits the amount of memory available to each program to the size stated in the description of the problem at hand. This ensures that student submissions do not interfere with each other. Furthermore, instead of measuring the wall-clock time it takes to execute a program, we let Valgrind count the number of instructions processed. This makes the grading deterministic and reproducible.
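The memory-limiting part of this setup can be illustrated, in a much simplified form, with POSIX resource limits; the real system enforces its limits through the chroot/Valgrind sandbox described above, so this sketch (with a hypothetical `run_limited` helper) only mimics the capping step:

```python
import resource
import subprocess

def run_limited(command, input_text, memory_limit_bytes):
    """Run a command with a hard cap on its address space.

    A process that tries to allocate beyond the cap fails, loosely
    mimicking the per-problem memory limit the Judge enforces.
    (POSIX-only: uses setrlimit in the child before exec.)
    """
    def set_limits():
        resource.setrlimit(resource.RLIMIT_AS,
                           (memory_limit_bytes, memory_limit_bytes))
    return subprocess.run(command, input=input_text,
                          capture_output=True, text=True,
                          preexec_fn=set_limits)
```

A submission allocating within the cap exits normally; one exceeding it is killed by a failed allocation, so the harness can report a memory-limit verdict from the exit status alone.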
During the development phase we worked hard on detecting possible attacks on the system. We invested a lot of effort ourselves, but also used the creativity of ETH students: during test exams we encouraged students to explore ways of cheating, offering a free dinner for every successful attempt reported to us. We were relieved that successes were rare (and ingenious) right from the start. Nevertheless, we happily paid for a couple of dinners that allowed us to make the system even more secure.
