The ICFP Programing Contest judging is still under way and the results won't be known for nearly another month. The judges recently published comprehensive results of the "correctness" elimination round, which knocked out more than 100 of the 171 entries received, leaving 59 "correct" entries in the contest itself, plus another 28 programs which are correct but are not eligible for prizes for various reasons such as being submitted after the deadline.
The information published about the elimination round by the judges included the running time for each program on each test file together with the number of bytes by which the file was made smaller.
While these test files were used solely to eliminate incorrect programs and will not actually be used to determine the eventual winners, I thought it might be interesting to see what would happen if I pretended that these were the final test files.
On with the show!
I've taken a subset of the correct programs, a subset of the elimination round tests, and invented a scoring formula which I think reasonably accurately models the apparent intent of the judges.
I eliminated the "null" and "almost-empty" tests. These were good for tripping up buggy programs, but show very little useful about the good programs. A surprising number of good programs (including two of our three) failed to optimize the "almost-empty" test. Perhaps this test should be included and given weight in the result, but I haven't largely because that would likely have a pretty major bad effect on those programs and I don't think the judges are going to include such small files in the scoring rounds as they would make discerning the timing far too error-prone. Well, that's my story and I'm sticking to it :-)
I eliminated 29 of the "correct" programs, leaving 30. Why? Largely because I wanted to incorporate the ratio of each program's speed relative to the fastest program in the scoring formula, and there were a number of programs which were obviously either rewrites of "cat" (and therefore very very bad but very very fast) or did very little better than that and were pretty clearly not in the running. My actual cutoff criterion was that each program should have acheived at least 500,000 bytes reduction in the "exhaustive" test. In fact, no program I eliminated did better than 370,000 bytes reduction, and only two programs I kept did worse than 900,000 bytes reduction (625,692 & 712,422), so there was a nearly 2:1 break at around that point between the serious contenders and programs which Just Didn't Get It. No offense intended to those on the wrong side of the cutoff.
This is where the guesswork comes in. The judges have pretty clearly indicated that they will favour good compression heavily over good speed, both in the judging criteria in section four of the task description and in the FAQ. They also commented adversely in the judging notes about how annoying the programs which always use the full time allowed are.
I decided to allocate 10 points to the program acheiving the largest size reduction in each test, and to deduct up to 1 point from the slowest programs.
I divide the number of bytes reduction for each program by the number of bytes reduction
by the best program, thus normalizing each test to the range 0.0 .. 1.0. I then take
10 to the power of this number as the score (i.e. the base ten antilog), thus scaling
the result to between 1.0 and 10.0 and increasing the separation between programs
which acheive nearly perfect results (and heavily penalizing those which produce poor
results).
10^(reduction/bestReduction)
I also tried squaring or cubing the ratio instead of taking the antilog. This is even more severe on the poor programs and seperates the good programs from each other more. This did not change the major placings at all.
There is a very wide range of speeds of these programs. On the five different tests I'm using the speed range is between 2000:1 and 6000:1, with an average of around 3.5 orders of magnitude.
The judges appear to want differences in speed to be not terribly significant so I decided to take the logarithm of the ratio of each program's execution time to the shortest execution time for that particular test file. Thus a program that is ten times slower gets one additional penalty point. This seems too harsh, so I divide it by an arbitrarily choosen number, 5, scaling the penalty points roughly into the range 0 - 0.7.
log10(time/bestTime)/5
I also tried other divisors and it turns out the that 1st and 2nd place programs remain the same with divisors anywhere in the range from 3 to 12 and the 3rd and the 3rd and 4th places are also taken by the same programs but with a change in their relative order at a certain point.
The final and complete scoring formula is:
10^(reduction/bestReduction) - log10(time/bestTime)/5
There's no way of knowing. There are an awfully large number of assumptions in here. Not only have the judges not given their exact scoring criteria but we don't even know what the test files will be and whether the results on them will correlate well to the results on the elimination files.
At best this is a rough indication. Unless I've grossly missed the mark I expect most programs will end up within a few places of where they appear in the list below. How's that for a weasel-worded letout clause?
Having said that, I'm going to be extremely surprised if I haven't picked the winning program!!
BigCatWithTheBlackTail's "Beamer" program, by a country mile. It produces the smallest output file on four of the five test files (1st equal in one case), losing out only on the unrealistic "random" test file. Speed-wise it's in the middle ground. It takes a while to think about the more challenging problems, but still never used more than 10% of the available time and was usually ten to thirty times faster than the 2nd placed program.
Much to the annoyance of the contest organisers, I'm sure, "Beamer" is written in C++ ...
Uhhh ... that would be my program, the Dylan Hackers entry with the optimizer written by me. :-)
So did I fiddle the scoring formula to make my program look good? Well not consciously. I'm certainly gratified to see my program so highly placed, and I'll be even more pleased if it finishes in the money come September, but I really won't be surprised if the judges' scoring ideas differ a little from mine and it finishes a few places lower down.
Perhaps the most I can truthfully claim is that I tried to tune my optimizer to the tradeoffs that I thought the judges would make (the same ones I used to create the scoring formula used here), and appear to have been fairly sucessful in doing exactly that. See my comments about this topic on our team web page, in the section about why we have so many entries.
BigCatWithTheBlackTail appears to have had much the same idea, only executed much better :-)
| 000-example.txt | 003-third-step.txt | 004-random.txt | 005-icfp2000.txt | 006-exhaustive.txt | ||||||||||||||||
| Rank | Entry | TOTAL | Size | Time | Score | Size | Time | Score | Size | Time | Score | Size | Time | Score | Size | Time | Score | Language | Team | Program Name |
| 1 | 126 | 9.464 | 884 | 0.32 | 9.880 | 1458 | 0.51 | 9.814 | 2563 | 8.58 | 8.462 | 7715 | 3.64 | 9.688 | 1019912 | 158.07 | 9.478 | c++ | BigCatWithTheBlackTail | Beamer |
| 2 | 249 | 8.530 | 884 | 0.82 | 9.798 | 1429 | 16.22 | 9.066 | 2136 | 88.26 | 5.568 | 7711 | 85.28 | 9.402 | 999525 | 1770.45 | 8.819 | dylan | Dylan Hackers | Dylan Hackers |
| 3 | 121 | 8.422 | 877 | 170.07 | 9.154 | 1401 | 170.08 | 8.449 | 2493 | 170.04 | 7.691 | 7121 | 590.06 | 7.621 | 1016633 | 1790.78 | 9.194 | haskell | Haskell Carrots | LALR(5000000) |
| 4 | 210 | 8.286 | 877 | 0.20 | 9.740 | 1265 | 0.17 | 7.282 | 2557 | 0.29 | 8.711 | 6897 | 2.03 | 7.572 | 965950 | 1711.66 | 8.125 | eiffel | Ben Lynn | slow |
| 5 | 167 | 8.165 | 877 | 0.25 | 9.720 | 1190 | 0.16 | 6.464 | 2708 | 0.25 | 9.929 | 6863 | 25.36 | 7.274 | 929716 | 1546.09 | 7.438 | sml | Tycon Mismatch | (\x.x) |
| 6 | 140 | 8.146 | 856 | 0.22 | 9.209 | 1228 | 0.25 | 6.830 | 2485 | 0.80 | 8.100 | 6835 | 1.81 | 7.439 | 1007040 | 257.16 | 9.150 | ocaml | [b8] clan | runme |
| 7 | 238 | 7.654 | 849 | 0.18 | 9.058 | 1132 | 0.21 | 5.867 | 2350 | 0.26 | 7.301 | 6828 | 6.01 | 7.318 | 995140 | 1793.07 | 8.724 | ocaml | Funktion im Kopf der Mensch | trender |
| 8 | 208 | 7.583 | 849 | 5.23 | 8.766 | 1090 | 5.22 | 5.205 | 2600 | 5.25 | 8.787 | 6077 | 5.27 | 5.789 | 1001935 | 5.66 | 9.370 | sml | Exo-Plugarite Zulanga | mltong |
| 9 | 243 | 7.355 | 849 | 0.32 | 9.008 | 1004 | 0.39 | 4.720 | 2527 | 0.66 | 8.418 | 6457 | 0.45 | 6.739 | 930300 | 9.46 | 7.891 | ocaml | Coq in Stock | Euh-c'est-quelle-heure? |
| 10 | 194 | 7.350 | 831 | 0.11 | 8.683 | 909 | 0.12 | 4.142 | 2512 | 0.20 | 8.413 | 6059 | 0.29 | 6.008 | 1006714 | 3.95 | 9.505 | ocaml | Les Lampions | lighter |
| 11 | 186 | 7.317 | 849 | 0.09 | 9.118 | 1332 | 0.06 | 8.196 | 1427 | 0.12 | 3.357 | 6401 | 0.10 | 6.756 | 981050 | 0.39 | 9.160 | c++ | Eeniac | reMarkToo |
| 12 | 177 | 7.230 | 877 | 0.11 | 9.792 | 1249 | 0.12 | 7.129 | 730 | 0.13 | 1.846 | 6924 | 0.14 | 7.868 | 1007111 | 3.85 | 9.516 | c | Blobmakers | ElegantBlob |
| 13 | 145 | 7.181 | 884 | 0.25 | 9.901 | 935 | 0.19 | 4.278 | 1295 | 0.59 | 2.862 | 7671 | 3.16 | 9.570 | 1005477 | 33.46 | 9.293 | ocaml | unuclear | unuclear |
| 14 | 149 | 6.385 | 849 | 0.08 | 9.129 | 1086 | 0.06 | 5.557 | 1646 | 0.11 | 4.053 | 5774 | 0.17 | 5.557 | 908615 | 2.21 | 7.627 | ocaml | Danjer | chartreause |
| 15 | 204 | 6.356 | 884 | 173.13 | 9.333 | 1429 | 173.83 | 8.860 | 0 | 173.16 | 0.361 | 7619 | 593.24 | 8.963 | 712422 | 1788.84 | 4.262 | c++ | Jeremy Sawicki | Big Hack |
| 16 | 222 | 6.230 | 849 | 0.14 | 9.080 | 1117 | 0.13 | 5.769 | 1098 | 0.12 | 2.536 | 5873 | 0.24 | 5.695 | 934567 | 3.08 | 8.068 | c++ | Blue emacs | lextest.cc |
| 17 | 237 | 6.212 | 849 | 12.48 | 8.690 | 1009 | 17.80 | 4.426 | 1286 | 116.82 | 2.379 | 6691 | 126.31 | 6.746 | 999525 | 1770.91 | 8.819 | dylan | Dylan Hackers | Beam Search |
| 18 | 184 | 6.105 | 849 | 4.89 | 8.771 | 890 | 14.49 | 3.601 | 1719 | 171.47 | 3.675 | 6240 | 64.43 | 5.877 | 989241 | 1793.79 | 8.598 | python | Team #python | All Your Whitespace Are Belong To Us |
| 19 | 248 | 6.005 | 849 | 0.37 | 8.996 | 881 | 0.31 | 3.878 | 0 | 0.70 | 0.839 | 6555 | 0.59 | 6.919 | 999525 | 2.39 | 9.393 | dylan | Dylan Hackers | Dylan Lightning |
| 20 | 117 | 5.830 | 849 | 135.11 | 8.483 | 1022 | 135.13 | 4.352 | 732 | 135.14 | 1.246 | 6485 | 555.15 | 6.179 | 1002845 | 1755.54 | 8.891 | ocaml | 152 Survivors | Wheeler's Wacky Search |
| 21 | 109 | 5.669 | 849 | 11.19 | 8.700 | 1004 | 11.91 | 4.423 | 692 | 13.33 | 1.384 | 6191 | 9.92 | 5.946 | 934202 | 22.11 | 7.890 | c | dqd | press |
| 22 | 241 | 5.637 | 849 | 4.78 | 8.773 | 752 | 4.45 | 2.905 | 1785 | 4.02 | 4.249 | 4786 | 7.57 | 3.796 | 966812 | 44.69 | 8.458 | java | UNSW Express | unsw891 |
| 23 | 24 | 5.614 | 842 | 0.25 | 8.865 | 823 | 0.33 | 3.520 | 0 | 1.38 | 0.780 | 5825 | 1.46 | 5.456 | 1002046 | 2.38 | 9.448 | erlang | Blue Avengers | Erlightning |
| 24 | 110 | 5.542 | 849 | 0.29 | 9.017 | 842 | 0.70 | 3.567 | 0 | 0.94 | 0.814 | 5582 | 1.20 | 5.075 | 994153 | 3.83 | 9.237 | haskell | Functional Beer | SML/NG Optimizer 3000, Iron Chef Style |
| 25 | 161 | 5.449 | 849 | 0.53 | 8.964 | 812 | 0.68 | 3.394 | 0 | 3.82 | 0.692 | 5958 | 3.11 | 5.621 | 976399 | 111.06 | 8.573 | python | MEMS Exchange Software Group | Smiley |
| 26 | 96 | 5.089 | 849 | 140.14 | 8.480 | 823 | 140.15 | 2.995 | 192 | 140.19 | 0.556 | 5811 | 560.18 | 4.915 | 984323 | 1760.62 | 8.497 | ocaml | 152 Survivors | Heuristic search with space merging |
| 27 | 131 | 5.019 | 0 | 2.10 | 0.716 | 0 | 2.13 | 0.690 | 2593 | 2.08 | 8.813 | 6023 | 2.12 | 5.770 | 988324 | 4.28 | 9.104 | cyclone | Snowstorm | Stackblaster |
| 28 | 44 | 4.708 | 849 | 177.03 | 8.460 | 658 | 177.11 | 2.133 | 192 | 177.14 | 0.536 | 5167 | 597.22 | 3.919 | 984307 | 1801.12 | 8.495 | ocaml | 152 Survivors | Wheeler's Wacky Search |
| 29 | 80 | 4.318 | 772 | 0.13 | 7.428 | 789 | 0.09 | 3.441 | 80 | 0.11 | 1.070 | 2192 | 0.37 | 1.810 | 920566 | 2.21 | 7.840 | ocaml | jcf | jcf |
| 30 | 93 | 3.381 | 791 | 145.75 | 7.197 | 798 | 146.01 | 2.849 | 855 | 146.02 | 1.444 | 3437 | 566.24 | 2.039 | 625692 | 1766.96 | 3.375 | scheme | Brownian Code Motion | Brown |