Updated: September 28 2005, 3:50 am NZST

Our entry has won both the Judges' Prize and 2nd Place!

This year saw the 8th annual ICFP programming competition. The contest traditionally runs over 72 hours and is open to all comers. Entries may be made using any programming language. http://icfpc.plt-scheme.org/

The Task

"In the far-flung future of the year 2000, functional programming has taken over the world and so humans live in an almost unimaginable luxury. Since it's so easy, humans have used robots to automate everything, even law enforcement and bank robbery --- the only job left to humans is to write their robots' control programs."

This year's ICFP competition differed from previous years in having a twofold task. The two halves of the task were roughly two weeks apart. The first half was announced on Friday June 24th, and its task was to write two programs, a Cop-Bot and a Robber-Bot. The Robber-Bot was to rob banks and avoid being caught by the Cop-Bots. The Cop-Bot needed to be able to work together with other Cop-Bots to catch the Robber-Bot. The initial entries had to be submitted on Monday June 27th. The full task description is here.

The second half of the task was announced about 2 weeks later on Saturday July 9th, and required the programs to be modified. Robber-Bots could now bribe Cop-Bots, and Cop-Bots would be able to accuse other Cop-Bots of being bribed. Reuse of code would be very important here. The full description of this second task is here. The final entries were due on Sunday, July 10th.

About Dylan

Dylan is an advanced object-oriented programming language. It combines the advantages of expressive "scripting" languages such as Python or Ruby with performance comparable to C.

Gwydion Dylan refers to a specific compiler, d2c, that was originally developed by the Gwydion group at Carnegie Mellon University. It is now maintained as an open source project by the Gwydion Dylan hackers. As the name indicates, the compiler translates Dylan code into C code and then compiles it. It's stable, and produces very efficient code. It's also extremely portable.

The Dylan hackers have entered ICFP since 2000, recieving the Judges' Prize in 2003 (http://www.hoult.org/~bruce/icfp2003/) and a Second placing in 2001 http://www.gwydiondylan.org/icfp/icfp2001.phtml).

The team

Some contributions were also made by Jes Hall, setting up the team's subversion repository and web-based subversion front end to use for the competition.

The first task

Keith Bauer

I began by writing a simple cop and robber in Ruby, to get a head start on thinking about strategies whilst the Dylan libraries came up to speed. Once the Dylan libraries were usable, I converted the Ruby robber over to Dylan, and worked on improving its cop-avoidance techniques and bank-robbing abilities. I don't think much of what I did ended up in the final robber, but it provided a benchmark to test our cop against during the competition. I didn't participate in the second part of the competition.

Bruce Hoult

At the start of the contest I downloaded the task specification, discussed it briefly with other team members and then went to bed to sleep on it. By the time I woke up Andreas and Hannes had a working framework in Dylan that could correctly parse and respond to all messages from the server, for both the cop and the (much simpler) robber.

Unfortunately, they forgot to check in one file that contained the Dylan equivilent of all the #include directives. This meant that Alex and Keith and I could not compile the program until I'd taken all the "foo not declared" error messages and tracked down what library they came from -- not especially difficult, but tedious and time consuming. While I'm working on this Keith completes the framework necessary for a robber in Ruby and starts working on algorithms for a robber. Alex starts working on the strategy code for a cop even though he can't compile it yet.

The rest of the first day, and all of the second day, I work on improving the usability and features of the framework and build system, helping others with problems, profiling cops- and robbers-in-progress and optimizing hot spots. And thinking about what the ideal robber would do.

The third day I spent entirely on implementing my ideas for a robber. By then Andreas and Hannes had a cop well in hand. It seemed to be better than Alex's cop and was performing well against Keith's robber.

Search Strategy

Ideally, you'd like to choose your move using a minimax algorithm so that you make the best move you have available, given the assumption that your opponent makes the best response that he has available, given the assumption that you make the best reply to his response, and so on to preferably a large number of layers of second-guessing. This approach works fairly well (with alpha-beta pruning) for Chess on modern computers. In the midddle part of a Chess game each player has typically 30 to 40 possible moves each turn. Minimax made computers unbeatable on games with fewer choices each move -- such as Checkers/Draughts or Othello/Reversi -- even on the computers available in the 1960's.

Unfortunately, in our cops and robbers game the five cops have, in combination, more than 1.4 million possibilities for their first move. It's not so bad later in the game, when cops mostly don't have the choice of travelling either by car or on foot at each move, but there are still around a thousand possible cop moves each time. This suggests that even looking ahead two robber moves and two cop moves will be very very hard to acheive within the required five second response time. My sincere congratulations to any team that acheived this, but I don't think such a small amount of look-ahead is useful for the robber anyway in this game where if the robber is touched by a cop even once then he is dead and gets zero points. It's like a game of Chess where you lose if you lose even one pawn.

Because touching a cop is so deadly, I have the robber, each time it is his move, simply start from the current position of each cop and figure out the set of all possible positions that one or more cops could get to on the next move, with no attempt to figure out what a smart cop would actually do. From that set of positions I can then calculate all the places that cops could possibly be after two moves, then after three moves, and so on.

I then have the robber look for the shortest path to a bank such that there is the minimum (preferably zero) possibility of meeting a cop on each move. I do that using a new A* search each time it is the robber's turn to move, with path costs freshly updated from the new positions of the cops, and with the cost of traversing each intersection dependent on whether it is possible to find a cop there at the time the robber will actually get there.

Refinements

The limitation of this technique is that the world is small and connected enough that cops can usually get to anywhere within about ten moves. This is especially so if some cops are in cars, so there is no point in trying to look further ahead than that. In fact, trial and error showed that it seemed to be best to project out only the next six cop moves and then use that same set for any planning out beyond six moves.

Sometimes, when you start to get surrounded by cops there isn't any way out of the trap that has zero probability of meeting a cop. But if they're surrounding you it's probably because they think they know where you are so it's much better to make a break for it than to stay put, even if your present position looks like the safest place to be for the next five or six moves. Therefore when I calculate where the cops might be I assign probabilities to each location rather than just possible or impossible. After you've made your first move towards a gap between cops and the cops have moved, the chances are good -- if the cops aren't close enough to actually smell you -- that the gap has stayed the same size, or even grown larger, and you might now have a clear run through it to safety.

I didn't worry too much about smell. Smell tells a cop approximately where you are: he knows that you're within one or two moves of his current position. This may give the cops a much better idea of where you are than they had before, and so of course they will start to converge on that position. But if you already know that you have a way to get to the next bank you want to rob with no possibility of a cop touching you on the way then it's not at all a bad thing if the cops all converge somewhere that you aren't any more. In fact, in my testing it seemed that an ideal situation that my robber was often able to exploit was to have all the cops just one or two moves behind as my robber dashed in a circuit around and around the four centrally-located banks!

Conclusion

I was pretty happy with my robber. Even against five of our cops cooperating with each other the robber was able to get 85% - 90% of the money about half the time. The other half of the time the robber was caught by our cops. I think they're both pretty good, with the robber being perhaps just a little too much on the risk-taking side. Certainly I expect we'll soundly beat the cop and robber written by the contest organisers and therefore get through to the main round. In the main round I expect to see very little meaningful cooperation between cops written by different teams and daring robbers may well have a field day.

Andreas Bogk & Hannes Mehnert

After start of the competition, we started working on implementing the protocol. We mostly made use of the regular expression matching facility to to the parsing, and wrote one class for each message type used in the protocol. After this was satisfactorily working, we went to bed.

After getting up on the next morning, we discovered that we had forgotten to add the library imports file to our source repository, meaning that the New Zealand team wasted quite some time reproducing that. Oops.

We kept working on the code building classes from protocol messages, replacing string references to objects like locations and players by actual references to the representing object itself. We also wrote a driver loop to drive different kinds of agents while running the protocol. The interface consisted of a number of generic functions called on some subclass of agent, which the agent implementor had to provide. We also provided simple default methods for all messages. If the called generic function threw an exception, we still sent valid messages (we didn't want to get disqualified by an illegal instruction). So the agent implementors code wasn't part of the trusted code base.

We fixed a bug in our regular expression library, it has a cache of regular expressions, and used \== (identity) for comparison whether the expression was already in the cache. We generated the regular expressions dynamically, so they were equal (\=), but not the same object. Our demo cop used >512MB memory, after the change from \== to \= in the regular expression library, it used <10MB memory.

We proceeded by writing code that generated a set of possible moves for an agent, taking care to properly represent the differnt properties of cops moving by either car or foot. Next was code to advance the state of the world, giving a set of moves for all players in the world. At this point, we were able to do random walks, and telling the McGruffs to do random walks too.

We then started to work on something a little more intelligent for the cops. The basic idea we followed was to keep a map of probabilities for each node, representing the likelihood to find the robber at a given node. For this, we collected all the information we could get, such as setting the probability to zero for all fields where the robber should have been smelled, but wasn't, for banks which have been robbed or not robbed, and integrating information coming in from other cops. Probabilities were kept normalized. For every turn, the cop would predict probabilities for the robber's location next turn, based on the known probabilities from all sorts of sources, and assuming that each of the possible robber moves was equally likely.

It was very interesting to see that this simple approach, based solely on accumulation of probabilities and predicting the next turn only, was powerful enough to win big against a lot of simple robbers we threw against it. We could observe behaviour emerging from those simple rules that looked like serious strategic planning, although we never explicitly coded any such behaviour. If the location of the robber was known, such as the start of the game, or after robbing a bank, all cops would head towards the suspected location. After arriving, they started to spread out and scan the area for traces of the robber. If one of the cops came within smelling distance, the cops started chasing him, spreading out when losing the track, coming together again once he was found again, and eventually cornering the robber on the edge of the map.

We decided not to sleep on the second night at this point, and kept debugging and improving the cop, watching lots of trial runs. Bruce eventually started working on a serious robber, and the last few hours were spent on gradually improving both sides.

Somewhere on the way we implemented voting on plans, which by many participants in the contests was regarded as a problem too hard to solve. In fact, we had a pretty east metric to deal for that. We have probabilities for the robber locations, so we can give scores to all cop moves which bring that cop closer to one of the suspected robber locations. We would order plans by this score.

Now if our plan wasn't the winning plan, we decided to just follow the winning plan, and not attempt any tricks. The reasoning was that catching the robber was hard enough when you had cooperating cops, and becomes impossible when you don't. So if we wouldn't cooperate, chances are that the team contolling the robber would score points, and we wouldn't. Of course, we would check that the moves suggested by other players were actually valid.

The second task

Andreas Bogk & Hannes Mehnert

We spent the first eight hours implementing the new game protocol. We thought about strategies for deciding whether and when to bribe and take bribes. After a lengthy discussion with Bruce we decided to play it straight. We considered the robber to be good enough so splitting the loot with other teams wouldn't be worth the trouble. We also decided to use a straight cop, mainly because we couldn't come up with a good strategy to tell when being bribed would increase chances of winning.

What we did work on was detection of rogue cops, though. We used scoring points for behaviour we considered bad, and started to blame a cop once a certain score was reached. Not following the winning plan was considered a minor offense, but repeatedly doing so made a cop suspicious. We figured that somebody not following orders is either stupid (not cooperating, and chances of winning are slim anyway) or a rogue cop. Also, we blamed cops not passing knowledge about evidence, or submitting informs about the robber location that were impossible based on our knowledge of the world.

Of course, we also implemented bad cops and a bribing robber, for the purpose of testing our code. Especially taking over other cops and controlling them had to be tested.

So we spent the rest of the 24 hours testing, fine-tuning and watching battles.

Overall, it was a fun contest. It was a lot like real-world projects: the deadlines are tight, the rules are unforgiving, and the competition is unknown. Some people complained it were too much work for such a short period of time, but we think we've shown that using a sufficiently powerful programming language allows to be done with the mechanics part in time, leaving plenty of opportunity to care about the challenge itself.

Bruce Hoult

Once again I looked at the task specification when it came out, discussed it briefly with other team members, and then went to bed. By the time I woke up Andreas and Hannes had updated the framework to correctly parse the new messages and create default null responses so our old cop and robber code was automatically a valid entry for the second round. So I was able to spend the entire day improving the robber code itself.

Strategy

Nothing in the changed rules made life any easier for cops, so I decided that if our basic robber was good enough in the first phase of the contest then it was still good enough. In particular I didn't see any reason that we'd ever want to try to bribe the cops. It just means giving up a lot of money if you survive, and there is no way to be sure that a bribed cop will do anything useful to you anyway becuase if it does then it is very vulnerable to being accused by other cops. I also didn't implement "shoving".

Refinements

What I spent the day doing was making my robber a better bank robber. Everything I did would have been equally useful in the first phase if I'd thought of it then, or had time to implement it.

There was a certain amount of tweaking of various "fudge factors" in the evaluation function for the A* algorithm, but the main change was in how far ahead the robber planned.

In the first submission the robber planned only how to get to a bank safely but gave no thought to how to safely get away from the bank afterwards! This is bad. When the robber got the cops into the pattern of all the cops following it around this was fine as all the cops would be on the same side of the bank and the robber could just carry on out the other side of the bank and away. But ocassionally there would be cops close to the bank on several sides and once they saw it was robbed they could get there quickly and trap the robber.

So the change was that instead of only trying to find a safe path to a bank the robber looked for a safe path of length at least eight moves that included a bank. Once the robber got within less than eight moves from a bank it would automatically start planning the escape, and the closer it got the more certain the escape had to be.

This made the robber quite a bit more cautious, and lowered the average score a bit on the times that the robber survived. It would often approach to within two or three moves of a target bank and then loiter for several moves until the cops happened to randomly move away far enough (or just into the right configuration) for a safe getaway to be certain.

Conclusion

At the end I was much happier with the robber than I had been after the first submission. It seemed significantly more likely to survive even against well-organized cops. I just wish that I'd been able to get it as good in time for the first submission.

The real wild-card in this round of the contest is I think in how often robbers offer bribes and how often cops accept them and how often other cops are able to detect bribed cops, accuse them, and take them over. That gives a real opportunity for not only much better cooperation between cops if several are controlled by one team, but also for that team to double or triple the score their cop earns from a game. Our robber never offers bribes so it will never come up against such better-cooperating cops. Our cop on the other hand does try to detect bribed cops and may sometimes find itself with an opportunity for a windfall.

I'm very glad that we were using Dylan for this contest. Because Dylan is nearly as fast to program in as the "scripting" languages, we had valid entries ready within eight hours of each task announcement, leaving the rest of the time for working on strategy. But because Dylan is a fast compiled language we were never under any performance pressure. The cop, in particular, had a very large amount of communications to parse and respond to and I suspect that many entries will have used a large fraction of the available five second response time just dealing with that. Both our cop and our robber always respond to the server within a small fraction of a second so we didn't even bother to write checks for getting near the end of the allowed time.

Aftermath

In summary, it seems this approach worked fairly well. The Dylan Hackers find ICFP a fun, challenging opportunity to gain more exposure for the Dylan programming language and submit some interesting sample code into the repository for the curious to examine. http://www.gwydiondylan.org/cgi-bin/viewcvs.cgi/trunk/examples/ICFP2005/trunk/src/

The Dylan hackers believe that using Dylan gave them an edge. Dylan being very expressive and lending itself to fast development allowed them to have a valid entry very early on that they could then refine. Its speed as a compiled language let them concentrate on improving their program without worrying about whether it would be fast enough to avoid being disqualified.

Special thanks go to Jes Hall for setting up the subversion server and repository the team used, setting up WebSVN so we could browse the changes, and for putting together this web page.