The Computer Language Benchmarks Game

How programs are measured

  1. Each program is first run and measured at the smallest input value, with program output redirected to a file and compared to the expected output. As long as the output matches the expected output, the program is then run and measured at the next larger input value, until measurements have been made at every input value.

  2. If the program gives the expected output within an arbitrary cutoff time (120 seconds) the program is measured again (5 more times) with output redirected to /dev/null.

  3. If the program doesn't give the expected output within an arbitrary timeout (usually one hour) the program is forced to quit. If measurements at a smaller input value have been successful within an arbitrary cutoff time (120 seconds), the program is measured again (5 more times) at that smaller input value, with output redirected to /dev/null.

  4. The measurements shown on the website are either:
    • within the arbitrary cutoff - the lowest time and highest memory use from 6 measurements

    • outside the arbitrary cutoff - the sole time and memory use measurement

  5. Programs that took 4 or 5 hours were certainly measured only once!
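The repeat-and-report rule in steps 2 to 4 can be sketched as follows. This is only an illustration, not the actual measurement script: `measure_once` is a hypothetical callable that runs the program once and returns a `(seconds, memory)` pair, and treating "within the cutoff" as less than or equal to 120 seconds is an assumption.

```python
CUTOFF_SECS = 120.0   # the arbitrary cutoff described above
EXTRA_RUNS = 5        # "measured again (5 more times)"


def reported_measurement(measure_once, cutoff_secs=CUTOFF_SECS, extra_runs=EXTRA_RUNS):
    """Return the (seconds, memory) pair that would be reported.

    measure_once is a hypothetical callable: it runs the program once
    and returns (elapsed_seconds, memory_used).
    """
    first = measure_once()
    if first[0] > cutoff_secs:
        # Outside the cutoff: the sole measurement is reported.
        return first
    # Within the cutoff: 6 measurements in total,
    # report the lowest time and the highest memory use.
    runs = [first] + [measure_once() for _ in range(extra_runs)]
    lowest_time = min(secs for secs, _ in runs)
    highest_memory = max(mem for _, mem in runs)
    return (lowest_time, highest_memory)
```

Note that the reported time and the reported memory use may come from different runs.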

How programs are timed

Each program is run as a child-process of a Python script using Popen:

On win32:

(Note: Those measurements include startup time).
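As a rough sketch of timing a child process from a Python script on a Unix-like system (an assumption; this is not the actual measurement script), the elapsed time can be taken around the child's lifetime and the CPU time read from the resource usage returned by `os.wait4`. The function name `time_child` is hypothetical.

```python
import os
import subprocess
import sys
import time


def time_child(argv):
    """Run argv as a child process; return (elapsed_secs, cpu_secs, exit_code).

    Unix-only sketch: os.wait4 is not available on Windows.
    """
    start = time.perf_counter()
    child = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
    # wait4 blocks until the child exits and also returns its resource usage.
    _, status, rusage = os.wait4(child.pid, 0)
    elapsed = time.perf_counter() - start
    # CPU time is user time plus system time spent by the child.
    cpu = rusage.ru_utime + rusage.ru_stime
    # The child was reaped here, so record the exit code on the Popen object.
    child.returncode = os.waitstatus_to_exitcode(status)
    return elapsed, cpu, child.returncode


if __name__ == "__main__":
    print(time_child([sys.executable, "-c", "pass"]))
```

Because the clock starts before the child is created and stops after it exits, interpreter or runtime startup is part of the measured time, which is what the note above points out.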

How program memory use is measured

By sampling GLIBTOP_PROC_MEM_RESIDENT for the program and its child processes every 0.2 seconds. Those measurements are unlikely to be reliable for programs that run for less than 0.2 seconds.

On win32: QueryInformationJobObject(hJob,JobObjectExtendedLimitInformation) PeakJobMemoryUsed
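A minimal sketch of the sampling approach, not the actual measurement script. `read_resident_bytes` and `is_running` are hypothetical callables standing in for a resident-memory query (summed over the program and its child processes) and a liveness check; the peak is simply the largest sample seen.

```python
import time

SAMPLE_INTERVAL_SECS = 0.2  # sampling period described above


def sample_peak_resident(read_resident_bytes, is_running,
                         interval=SAMPLE_INTERVAL_SECS, sleep=time.sleep):
    """Poll resident memory until the program exits; return the largest sample.

    read_resident_bytes and is_running are hypothetical callables
    supplied by the caller.
    """
    peak = 0
    while is_running():
        peak = max(peak, read_resident_bytes())
        sleep(interval)
    return peak
```

A program that exits before the first sample is taken yields a peak of 0, and any short-lived spike between two samples is missed, which is why very short runs give unreliable numbers.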

How source code size is measured

We start with the source-code markup you can see, remove comments, remove duplicate whitespace characters, and then apply minimum GZip compression. The measurement is the size in bytes of that GZip compressed source-code file.
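The steps above can be sketched as follows. This is an illustration under several assumptions: comments are taken to be `#` to end of line (real comment syntax depends on the language, and this naive rule would also cut a `#` inside a string literal), "duplicate whitespace" is read as runs of the same whitespace character, and "minimum GZip compression" is read as compression level 1.

```python
import gzip
import re


def compressed_source_size(source_text):
    """Return the size in bytes of the gzip-compressed, cleaned-up source."""
    # Naive comment removal (assumption: '#' starts a line comment).
    lines = [line.split("#", 1)[0] for line in source_text.split("\n")]
    text = "\n".join(lines)
    # Collapse runs of the same whitespace character to a single one.
    text = re.sub(r"(\s)\1+", r"\1", text)
    # Assumption: "minimum" compression means gzip level 1.
    return len(gzip.compress(text.encode("utf-8"), compresslevel=1))
```

With this kind of measure, extra blank lines, indentation width, and comments do not change a program's reported size.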

Thanks to Brian Hurt for the idea of using size of compressed source code instead of lines of code.

(Note: There is some evidence that complexity metrics don't provide any more information than SLoC or LoC.)

How CPU load is measured

The GTop cpu idle and GTop cpu total are taken before forking the child-process and after the child-process exits. The percentages represent the proportion of cpu not-idle to cpu total for each core.

On win32: GetSystemTimes UserTime and IdleTime are taken before forking the child-process and after the child-process exits. The percentage represents the proportion of TotalUserTime to UserTime + IdleTime (because that resembles the percentage shown in Task Manager).
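The per-core calculation described for the GTop counters can be sketched as below. The function name and the data layout (one `(idle, total)` pair per core, read before and after the run) are assumptions made for illustration.

```python
def cpu_load_percent(before, after):
    """Return the not-idle percentage for each core.

    before and after are lists with one (idle, total) counter pair per
    core, read before the child process starts and after it exits.
    """
    loads = []
    for (idle0, total0), (idle1, total1) in zip(before, after):
        total_delta = total1 - total0
        idle_delta = idle1 - idle0
        if total_delta <= 0:
            # No time passed on this core's counters; report 0 rather than divide by zero.
            loads.append(0.0)
            continue
        loads.append(100.0 * (total_delta - idle_delta) / total_delta)
    return loads
```

For example, a core whose total counter advanced by 200 while its idle counter advanced by 50 was 75% busy. Since the counters are system-wide, other activity on the machine during the run is included in the percentage.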

Data files

The summary data shown is restricted in three ways: it includes only programs that successfully completed every workload, only the fastest of those programs, and only the fastest measurement for each of them. Additional measurements (not just the fastest programs, not just the fastest measurements) are included in a separate compressed data file.