Sunday, August 11, 2013

Houdini 3 Pro Strangled by Stockfish 130801 with Threads

Right after the results of the Stockfish 130801 gauntlet matches were published on August 6, 2013, showing a gain of 20 ELO points, I was curious whether it could take on Houdini 3 Pro at a longer time control. I chose a time control of 30 minutes base plus a 30-second increment, with the games played on 2 computers. The first is the primary computer I use for tournaments, an AMD Quad Core with 4 GB RAM; the second is an AMD Dual Core with 4 GB RAM. To complete as many games as possible as quickly as possible, I set up 3 simultaneous tournaments exclusively for Houdini-Stockfish matches on the AMD Quad Core and 2 on the AMD Dual Core. The purpose was to compare the results of the same match played in different environments.

To have a fair competition, I made sure that the competing engines were set to their defaults. This was done by clicking the "Reset to defaults" button in the Arena configuration window. These were the defaults of the important parameters:

Parameter                        Quad Core   Dual Core

Houdini 3 Pro:
   Threads                       4           2
   Split Depth                   10          10
   Hash (MB)                     128         128
   Strength                      100         100

Stockfish 130801:
   Min Split Depth               4           4
   Max Threads per Split Point   5           5
   Threads                       4           2
   Hash (MB)                     32          32
   Skill Level                   20          20

Other tournament parameters:
   Ponder = off
   Book = Perfect 2012, up to 6 ply
   Move limit = 120
   Resign = 900 centipawns
   Early draw = 40 moves

The tournament began in earnest, and after a day of continuous play the partial results showed Stockfish and Houdini even on the AMD Dual Core, while Stockfish was leading on the AMD Quad Core. I took note of the displayed statistics, such as search depth and search speed. Halfway through the matches, Stockfish's lead on the Quad Core was widening, so I scrutinized the behavior of the engines closely. Stockfish's search depth and search speed were almost always higher than Houdini's. This seemed abnormal, because Houdini is one of the fastest engines in terms of search speed (kilonodes per second), yet most of the time its speed here was only at about the level of Rybka 4.1. Then I realized that Houdini works best on my AMD Quad Core when the Threads parameter is manually set to 3, and this value had been changed when I reset to the defaults to have a "fair" match. The other 2 chess engines that require special attention on my Quad Core are Critter 1.6a, where Threads is set to 2, and Bouquet 1.7b, which performed dismally with any number of threads (the default was 16 threads) and so had its tournaments played on my other AMD Dual Core computers.
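To get a feel for how heavily the simultaneous setup loaded each machine, here is a rough back-of-the-envelope sketch in Python. It rests on my own assumptions, not on measurements: with Ponder off, only the engine on move is searching in each tournament, that engine uses all of its configured Threads, and Arena and operating system overhead are ignored. The function name `threads_per_core` is hypothetical.

```python
def threads_per_core(tournaments: int, engine_threads: int, cores: int) -> float:
    """Rough load estimate: busy search threads competing for each CPU core.

    Assumption: Ponder = off, so in each running tournament only the engine
    whose turn it is searches, using all of its configured threads.
    """
    busy_threads = tournaments * engine_threads
    return busy_threads / cores


# (tournaments, default Threads, cores) for the setups described above
setups = {
    "Quad Core, 3 simultaneous tournaments": (3, 4, 4),
    "Dual Core, 2 simultaneous tournaments": (2, 2, 2),
    "Quad Core, one-on-one rematch": (1, 4, 4),
}

for name, (tournaments, threads, cores) in setups.items():
    load = threads_per_core(tournaments, threads, cores)
    print(f"{name}: {load:.1f} threads per core")
```

Under these assumptions the Quad Core ran about 3 search threads per core during the simultaneous tournaments, the Dual Core about 2, and the one-on-one rematch only 1. That is at least consistent with the low search speeds seen in the simultaneous tournaments, although the arithmetic alone does not explain why Houdini suffered more than Stockfish.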

The tournament was already past the halfway mark, so I decided to let the matches finish and see whatever results they produced. When the multiple tournaments were over, I wanted to observe what would happen when Stockfish and Houdini played one on one on a single computer with no other tournaments running, using the same "fair" defaults set by their authors, so I set up a rematch limited to 30 rounds. In the rematch, the very low search speed seen during the 3 simultaneous tournaments was gone, with Houdini searching about 1.7 times faster than Stockfish. With this change in the match environment, the battle for supremacy should be fair.

Here are the final results:

AMD Dual Core:

2 Tournaments at 20 rounds each
Result: Draw = Houdini 20, Stockfish 20

Rank  Engine                 ELO   Games  Score%  Points  Win  Loss  Draw  TF%    Ply
1     Houdini 3 Pro x64      3171  40     50      20      12   12    16    15     95
2     Stockfish 130801 x64   3151  40     50      20      12   12    16    5      97

AMD Quad Core:

3 Tournaments at 30 rounds each
Result: Stockfish 76.5, Houdini 13.5

Rank  Engine                 ELO   Games  Score%  Points  Win  Loss  Draw  TF%    Ply
1     Stockfish 130801 x64   3151  90     85      76.5    70   7     13    0      0
2     Houdini 3 Pro x64      3171  90     15      13.5    7    70    13    47.78  73

AMD Quad Core:

One on One at 30 rounds
Result: Stockfish 16.5, Houdini 13.5

Rank  Engine                 ELO   Games  Score%  Points  Win  Loss  Draw  TF%    Ply
1     Stockfish 130801 x64   3151  30     55      16.5    10   7     13    0      0
2     Houdini 3 Pro x64      3171  30     45      13.5    7    10    13    26.67  97

Who said that engines should never lose on time in longer games? I was surprised myself, especially considering that the contestants were of very high caliber, but it happened. I added two statistics, highlighted in red, at the right edge of the results tables: TF%, the time forfeit percentage (games lost on time as a percentage of all games played), and Ply, the average number of plies (half-moves) in the games lost on time. They show that both Stockfish and Houdini lost games by time forfeit on the AMD Dual Core, while only Houdini suffered heavy losses by time forfeit on the AMD Quad Core. I was able to watch in full some of the games Houdini lost on time: Houdini got into time trouble when Stockfish had a great advantage and then eventually just let the clock run out when the position was hopeless. This behavior is consistent with what Robert Houdart has stated, that Houdini thinks deeper when it is in trouble, so it is no surprise when it loses on time. Also worth considering is a nasty bug that made Houdini crash, which happened at the TCEC live tournament and was reported on the chess forums. Houdini crashes also happened occasionally in my tournaments, while the other strongest chess engines do not exhibit this behavior. In these particular Stockfish-Houdini tournaments, however, not a single crash occurred.
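The Points, Score% and TF% columns follow from the win/loss/draw counts with simple arithmetic. Below is a minimal Python sketch of that arithmetic, assuming a win is worth 1 point and a draw half a point, and that TF% is time forfeits divided by all games played (this matches every TF% value in the tables). The `EngineResult` class and its field names are hypothetical, and the forfeit counts used here (43 and 8) are not listed in the tables; they are back-calculated from TF% = 47.78 over 90 games and TF% = 26.67 over 30 games.

```python
from dataclasses import dataclass


@dataclass
class EngineResult:
    games: int
    wins: int
    losses: int
    draws: int
    time_forfeits: int  # losses on time (hypothetical field name)

    def __post_init__(self) -> None:
        # Sanity checks: every game is a win, loss or draw,
        # and a time forfeit is counted as a loss.
        if self.wins + self.losses + self.draws != self.games:
            raise ValueError("wins + losses + draws must equal games")
        if self.time_forfeits > self.losses:
            raise ValueError("time forfeits cannot exceed losses")

    @property
    def points(self) -> float:
        # A win scores 1 point, a draw half a point.
        return self.wins + 0.5 * self.draws

    @property
    def score_pct(self) -> float:
        return 100 * self.points / self.games

    @property
    def tf_pct(self) -> float:
        # Time forfeits as a percentage of all games played.
        return 100 * self.time_forfeits / self.games


# Houdini on the AMD Quad Core: 3 simultaneous tournaments, then the rematch.
houdini_quad = EngineResult(games=90, wins=7, losses=70, draws=13, time_forfeits=43)
houdini_rematch = EngineResult(games=30, wins=7, losses=10, draws=13, time_forfeits=8)

for label, r in [("3 tournaments", houdini_quad), ("rematch", houdini_rematch)]:
    print(label, r.points, round(r.score_pct), round(r.tf_pct, 2))
```

Running it reproduces the Houdini rows of the Quad Core tables: 13.5 points, 15% and 47.78% TF for the simultaneous tournaments, and 13.5 points, 45% and 26.67% TF for the rematch.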

I think the most critical parameter that chess engine authors should set properly is Threads. Improper handling of it can lead to abnormal behavior, which is the most annoying kind. Letting chess engine testers set it arbitrarily is not a good idea, because it is easy to sabotage a chess engine through this single parameter. The recent change in threads behavior in the latest Stockfish development releases, where users must change it according to the number of cores they have, is a bad idea. Why should users carry this burden? Don't blame the engine testers when an engine does not perform properly, because there is no guarantee that their change will be optimal. It takes time and observation to really notice the effects, and the setting is not portable to computers with a different number of cores. Let the engine authors be responsible for whatever default settings the engine should have, and then it is up to users to change whatever they want to experiment with. In a chess testing environment, it would be easy to pass the blame to the tester when a certain parameter is not set properly, even though it is not the tester's obligation to do so.

Who won this contest? By the numbers, of course, it is Stockfish. The shortcomings of Houdini were magnified in this tournament. Houdini's Rybka-like search speed and its many losses by time forfeit on a heavily loaded computer running many processes and threads had never been documented before. Perhaps this information will be useful for the author to investigate why these things happened and to try to reproduce them in a similar testing environment.

Download the chess engine tournament games here:
