To keep the competition fair, I made sure that the competing engines ran with their default settings. I did this by clicking the "Reset to defaults" button in the Arena configuration window. These were the defaults of the important parameters:
|Description||Quad Core||Dual Core|
|Houdini 3 Pro:|
|Min Split Depth||4||4|
|Max Threads per Split Point||5||5|
Other tournament parameters:
Ponder = off
Book = Perfect 2012, up to 6 ply
Move limit = 120
Resign = 900 centipawns
Early draw = 40 moves
The tournament began in earnest, and after a day of continuous matches, the partial results showed that Stockfish and Houdini were even on the AMD Dual Core, but Stockfish was leading on the AMD Quad Core. I took note of the displayed statistics, such as search depth and search speed. Halfway through the matches, Stockfish's lead on the Quad Core was widening, so I scrutinized the behavior of the chess engines closely. Stockfish's search depth and search speed were almost always higher than Houdini's. This seemed abnormal, because Houdini is one of the fastest engines in terms of search speed (kilonodes per second). Most of the time, Houdini's search speed was at about the same level as that of Rybka 4.1. Then I realized that Houdini works best on my AMD Quad Core when the threads parameter is manually set to 3, and that this value had been changed when I reset to the defaults to have a "fair" match. The other two chess engines that require special attention on my Quad Core are Critter 1.6a, for which I set threads to 2, and Bouquet 1.7b, which performed dismally with any number of threads (the default was 16 threads), so its tournaments were played on my other AMD Dual Cores.
The tournament was already past the halfway mark, so I decided to let the matches finish and see whatever results they would produce. When the multiple tournaments were through, I wanted to observe what would happen when Stockfish and Houdini played one on one on a single computer with no other tournaments running, using the same fair defaults set by their authors, so I set up a rematch with a 30-round limit. In the rematch, the very low search speed seen during the 3 simultaneous tournaments was gone, with Houdini searching 1.7 times faster than Stockfish. The battle for supremacy should be fair in this changed match environment.
Here are the final results:
AMD Dual Core:
2 Tournaments at 20 rounds each
Result: Draw = Houdini 20, Stockfish 20
|Rank||Engine||Rating||Games||Score %||Points||Wins||Losses||Draws||TF%||Ply|
|1||Houdini 3 Pro x64||3171||40||50||20||12||12||16||15||95|
|2||Stockfish 130801 x64||3151||40||50||20||12||12||16||5||97|
AMD Quad Core:
3 Tournaments at 30 rounds each
Result: Stockfish 76.5, Houdini 13.5
|Rank||Engine||Rating||Games||Score %||Points||Wins||Losses||Draws||TF%||Ply|
|1||Stockfish 130801 x64||3151||90||85||76.5||70||7||13||0||0|
|2||Houdini 3 Pro x64||3171||90||15||13.5||7||70||13||47.78||73|
AMD Quad Core:
One on One at 30 rounds
Result: Stockfish 16.5, Houdini 13.5
|Rank||Engine||Rating||Games||Score %||Points||Wins||Losses||Draws||TF%||Ply|
|1||Stockfish 130801 x64||3151||30||55||16.5||10||7||13||0||0|
|2||Houdini 3 Pro x64||3171||30||45||13.5||7||10||13||26.67||97|
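The points and percentages reported above can be cross-checked from the win, loss, and draw counts. The Python sketch below is only an illustration: it assumes the usual chess scoring (1 point per win, half a point per draw) and assumes that the three columns after the points column are wins, losses, and draws. The function name `summarize` is hypothetical.

```python
def summarize(wins, losses, draws):
    """Return (games, points, score_percent) under standard chess scoring."""
    games = wins + losses + draws
    points = wins + 0.5 * draws  # win = 1 point, draw = 0.5 point, loss = 0
    return games, points, 100.0 * points / games


# Win/loss/draw counts as read from the result tables (column meaning assumed).
results = {
    "Dual Core, Houdini": (12, 12, 16),
    "Dual Core, Stockfish": (12, 12, 16),
    "Quad Core x3, Stockfish": (70, 7, 13),
    "Quad Core x3, Houdini": (7, 70, 13),
    "Quad Core one on one, Stockfish": (10, 7, 13),
    "Quad Core one on one, Houdini": (7, 10, 13),
}

for label, (wins, losses, draws) in results.items():
    games, points, percent = summarize(wins, losses, draws)
    print(f"{label}: {games} games, {points} points, {percent:.0f}%")
```

Under these assumptions the recomputed figures agree with the tables, for example 70 wins and 13 draws give 76.5 points out of 90 games, or 85%.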
Who said that engines should never forfeit on time in longer games? I was surprised myself, especially since the contestants were of very high caliber, but it happened. I added two statistics, highlighted in red at the edge of the results table: TF%, the time forfeit percentage, and Ply, the average ply (half-moves) of the forfeited games. They show that both Stockfish and Houdini lost games by time forfeit on the AMD Dual Core, and that only Houdini suffered heavy losses by time forfeit on the AMD Quad Core. I was able to watch in full some of the games that Houdini lost on time: Houdini got into time trouble when Stockfish had a great advantage, and then eventually just let the game go when the position was hopeless. This behavior correlates with Robert Houdart's statement that Houdini thinks deeper when in trouble, so it is no surprise when it loses, especially considering the nasty bug that made Houdini crash at the TCEC Live Tournament and the reports posted on the chess forums. Houdini crashes also happened occasionally in my tournaments, whereas the other strongest chess engines do not exhibit this behavior. In these particular Stockfish-Houdini tournaments, not a single crash occurred.
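To make the TF% figures more concrete, they can be converted back into numbers of games. The sketch below assumes that TF% is measured against all games played by the engine (not only its losses), which is the reading under which every percentage in the tables maps to a whole number of games. The helper name `forfeited_games` is hypothetical.

```python
def forfeited_games(games, tf_percent):
    """Convert a time forfeit percentage into a whole number of games."""
    return round(games * tf_percent / 100)


# (label, games played, TF%) as reported in the result tables.
reported = [
    ("Dual Core, Houdini", 40, 15),
    ("Dual Core, Stockfish", 40, 5),
    ("Quad Core x3, Houdini", 90, 47.78),
    ("Quad Core one on one, Houdini", 30, 26.67),
]

for label, games, tf_percent in reported:
    count = forfeited_games(games, tf_percent)
    print(f"{label}: about {count} of {games} games lost on time")
```

Under that assumption, Houdini forfeited about 43 of its 90 games in the three simultaneous Quad Core tournaments and about 8 of 30 in the one-on-one rematch.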
I think the most dangerous parameter, and the one that chess engine authors most need to set properly, is threads. Improper handling of it can lead to abnormal behavior of the most annoying kind. Letting chess engine testers set it arbitrarily is not a good idea, because it is easy to sabotage a chess engine with this single parameter. The recent change of threads behavior in the latest Stockfish development releases, where users must adjust the value according to the number of cores they have, is a bad idea. Why should users bear this burden? Don't blame the engine testers when an engine does not perform properly, because there is no guarantee that their change will be optimal. It takes time and observation to really notice the effects, and the setting is not portable to computers with a different number of cores. Let the engine authors be responsible for whatever default settings an engine should have, and then it is up to the users to change whatever they want to experiment with. In a chess testing environment, it would be easy to pass the blame to the tester if a certain parameter is not set properly, when it is not the obligation of the tester to set it.
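For readers who do want to override the thread count per machine, here is a minimal sketch of how that might be scripted. It assumes the engine speaks UCI and exposes an option named `Threads`, which a GUI or script sets with the standard `setoption` command. The override table simply repeats the Quad Core values mentioned earlier; `QUAD_CORE_THREADS` and `threads_command` are hypothetical names, not part of Arena or of any engine.

```python
# Hypothetical per-machine overrides, using the Quad Core values from the text.
# Engines not listed here keep whatever default their author chose.
QUAD_CORE_THREADS = {
    "Houdini 3 Pro x64": 3,
    "Critter 1.6a": 2,
}


def threads_command(engine, overrides):
    """Return the UCI command that sets Threads, or None to keep the default."""
    threads = overrides.get(engine)
    if threads is None:
        return None  # no override: leave the author's default untouched
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    return f"setoption name Threads value {threads}"


for name in ("Houdini 3 Pro x64", "Critter 1.6a", "Stockfish 130801 x64"):
    print(name, "->", threads_command(name, QUAD_CORE_THREADS))
```

Keeping the overrides in one table per machine makes it obvious which engines deviate from their authors' defaults, which is exactly the information a reader needs to judge whether a match was fair.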
Who won this contest? By the numbers, of course, it is Stockfish. The shortcomings of Houdini were magnified in this tournament. Houdini's Rybka-like search speed and its many losses by time forfeit on a heavily loaded computer running many processes/threads had never been documented before. Perhaps this information will be useful to the author in investigating why such things happened and in trying to reproduce them in a similar testing environment.
Download the chess engines tournament games here: