
Tuesday, May 28, 2013

Stockfish 130525 x64 - Gauntlet Matches 100 Rounds

Stockfish 130525 x64 by Marco Costalba, Joona Kiiski and Tord Romstad scored 65.19% with 1158 wins, 429 losses and 813 draws in the 100-round gauntlet against the strongest computer chess engines. Stockfish 130525 massacred them all, including the mightiest of them, Houdini 3 Pro.

This performance earned Stockfish 130525 a rating of 3142 Elo, but it is still number 2 in the overall rating list, behind Houdini 3 Pro at 3186. Stockfish 130525 is a development version released just over 3 weeks after the last official release, and its Elo increase of 14 points makes it strong enough to be included in the rating list.
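For readers curious where such numbers come from, here is a minimal sketch of the usual logistic conversion from a score fraction p to an Elo difference, D = -400 * log10(1/p - 1). It is an illustration only: the function names are made up for this post, and the published 3142 comes from the rating calculation used for this list, which need not be this single formula.

```python
import math


def gauntlet_score(wins: int, losses: int, draws: int) -> float:
    """Score fraction: 1 point per win, half a point per draw."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def expected_score_to_elo(p: float) -> float:
    """Elo difference implied by a score fraction 0 < p < 1 (logistic model)."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


# Stockfish 130525's totals from the gauntlet table.
p = gauntlet_score(1158, 429, 813)
print(f"score {p:.2%}, about {expected_score_to_elo(p):+.0f} Elo above the average opponent")
```

With the table's figures this gives 65.19% and roughly +109 Elo. The average listed rating of the 24 opponents is about 3028, so this rough performance estimate lands near 3137: in the same neighbourhood as the published 3142, but not identical to it.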

The inclusion of the Stockfish development version, starting with this latest rating list (#47), is driven by the need to track the current strength of Stockfish, which would otherwise not be publicly known until the next official release, possibly 6 months away. This may pollute the rating list because updates are released so fast, and it may also be an administrative nightmare. I decided not to bind the Owl Rating List site to rigid rating list rules, since it was created for curiosity, fun and entertainment. So Stockfish is an exception to the rules: only the single strongest development version will be included, and older ones will be cleared immediately after a new release. This policy is experimental and will be refined as time goes by.
 
Rank  Engine                       Elo   Raw  Games  Score%  Points   Win  Loss  Draw   Chg
1     Stockfish 130525 x64        3142   100   2400   65.19  1564.5  1158   429   813  3142
2     Houdini 3 Pro x64           3186    65    100   44.50    44.5    26    37    37    -3
3     Robodini 1.1 x64            3105    58    100   43.50    43.5    27    40    33     0
4     LEOpard 0.7c x64            3049    49    100   42.00    42.0    21    37    42     1
5     ComStock 3 x64              3093    47    100   40.50    40.5    11    30    59     0
6     Tactico Power 2011 x64      3048    39    100   40.00    40.0    17    37    46     1
7     RobboLito 0.10 x64          3061    35    100   40.00    40.0    21    41    38     0
8     Firenzina 2.2.2 xTreme x64  3042    32    100   39.00    39.0    17    39    44     1
9     Vitruvius 1.11C x64         3032    28    100   39.00    39.0    20    42    38     1
10    Stockfish 3 x64             3130    25    100   38.50    38.5    21    44    35    -2
11    Ivanhoe 46h x64             3056    22    100   37.50    37.5    16    41    43     0
12    Komodo 5 x64                3037    20    100   37.00    37.0    16    42    42     1
13    Strelka 5.5 x64             3050    13    100   38.00    38.0    28    52    20    -1
14    Fire 2.2 xTreme x64         3059    10    100   36.00    36.0    17    45    38     0
15    Bouquet 1.6 x64             3035     8    100   35.00    35.0    14    44    42     0
16    Critter 1.6a x64            3100     7    100   35.50    35.5    17    46    37    -3
17    Rybka 4.1 x64               3041     6    100   35.00    35.0    13    43    44     0
18    Ippolit 0.080b3 x64         2978   -48    100   30.00    30.0    19    59    22     0
19    Shredder 12 x64             2930   -56    100   29.00    29.0    19    61    20     2
20    Gull R375 x64               2952   -57    100   27.00    27.0    11    57    32     1
21    Saros 3.3b x64              2997   -58    100   29.50    29.5    22    63    15    -1
22    Akkad 0.52b x64             2988   -63    100   28.50    28.5    20    63    17     0
23    Black Mamba 1.4 x64         2946   -78    100   28.50    28.5    24    67     9     1
24    Hannibal 1.3 x64            2890   -90    100   22.50    22.5     7    62    31     2
25    Naum 4.2 x64                2873  -115    100   19.50    19.5     5    66    29     2
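The columns of the table are tied together by simple arithmetic: a win counts 1 point and a draw half a point, Score% is Points divided by Games, and Stockfish's wins, losses and draws are the mirror image of its 24 opponents' losses, wins and draws. A short Python sketch that re-derives the Stockfish row from the other 24 rows (the helper names are made up for this check):

```python
# Each opponent's (win, loss, draw) against Stockfish 130525, copied from the table in rank order.
OPPONENTS = [
    (26, 37, 37), (27, 40, 33), (21, 37, 42), (11, 30, 59), (17, 37, 46), (21, 41, 38),
    (17, 39, 44), (20, 42, 38), (21, 44, 35), (16, 41, 43), (16, 42, 42), (28, 52, 20),
    (17, 45, 38), (14, 44, 42), (17, 46, 37), (13, 43, 44), (19, 59, 22), (19, 61, 20),
    (11, 57, 32), (22, 63, 15), (20, 63, 17), (24, 67, 9), (7, 62, 31), (5, 66, 29),
]


def points(win: int, draw: int) -> float:
    """Chess scoring: 1 point per win, half a point per draw."""
    return win + 0.5 * draw


def mirror_totals(rows):
    """Stockfish's (wins, losses, draws): the opponents' losses, wins and draws summed."""
    opp_wins = sum(w for w, _, _ in rows)
    opp_losses = sum(l for _, l, _ in rows)
    draws = sum(d for _, _, d in rows)
    return opp_losses, opp_wins, draws


sf_win, sf_loss, sf_draw = mirror_totals(OPPONENTS)
games = sf_win + sf_loss + sf_draw
sf_points = points(sf_win, sf_draw)
print(sf_win, sf_loss, sf_draw, games, sf_points, round(100 * sf_points / games, 2))
```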
Download the computer chess engines tournament games here.

7 comments:

  1. Hi
    Very impressive performance by Stockfish. But unfortunately, I wasn't able to duplicate your results. Don't get me wrong, I would really love it if Stockfish trashed Houdini!
    I decided to do a little testing myself. :)
    I read a tutorial on Adam's Computer Chess Page and decided to try setting up an engine-engine tournament on my own.
    (It's my first time, so don't be too critical. :) )
    I used the Arena GUI and just 2 engines, Houdini 3 and Stockfish 130525 x64 (your compile), both at default settings. I used the Perfect 2012a book, which is recommended for engine-engine matches.
    I set the level at 80 seconds per move, as you say that Stockfish is better in long games.
    (I ran only 1 round because of time constraints: I have family responsibilities and a day job.)
    I report with regret that Houdini 3 easily beat your Stockfish compile.
    I know 1 game is insignificant and I will run more later, when time permits, but the ease with which Houdini 3 thrashed Stockfish was a tad disappointing. Houdini 3 had White, which may have made a difference.
    I had Ponder On. Do you think I should have turned it off? It seemed that Houdini 3 was hogging more of my computer resources than poor ol' Stockfish!
    Your comments please.

    Replies
    1. Experienced testers will laugh when novice testers conclude that one engine is better than another after just 1 game.

      Conducting a test tournament at a long time control requires resources and patience. The best illustration is the TCEC live tournament, held from January to May 2013, which took 4 months to finish. In the end Houdini beat Stockfish by just 2 points over 48 games, and the winner was known only in the last game!

      The Stockfish version used there was weaker than today's latest version. If a rematch were held now with the latest Stockfish development version, I think Houdini would meet its nemesis.

      Chess is a random thing. All the games it produces are random, including the results. According to Ed Schroeder, author of the ProDeo/Rebel chess engines, the only way to conquer randomness is volume. The Stockfish team tests by the thousands of games. Robert Houdart of Houdini tests by the millions. I test only by the hundreds. You tested with just 1.

      As you take your baby steps, you will discover what chess testing is all about.
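Ed Schroeder's point about volume can be made concrete. One common rough approach treats each game as an independent result worth 1, 0.5 or 0 points and puts a normal-approximation confidence interval on the average score. The sketch below assumes the per-game win/draw/loss rates of Stockfish 130525 from the table above and shows how the 95% Elo interval narrows as the number of games grows. It is a simplification: a gauntlet mixes opponents of different strength, and the normal approximation is meaningless for a single game.

```python
import math


def elo_from_score(p: float) -> float:
    """Logistic Elo difference for a score fraction 0 < p < 1."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_interval(win: float, draw: float, loss: float, games: int, z: float = 1.96):
    """Approximate 95% Elo interval after `games` games with the given per-game rates."""
    score = win + 0.5 * draw
    # Per-game variance of the points scored (1, 0.5 or 0).
    var = win * (1.0 - score) ** 2 + draw * (0.5 - score) ** 2 + loss * score ** 2
    half_width = z * math.sqrt(var / games)
    # Clamp so the logarithm stays finite for very small samples.
    lo = max(score - half_width, 1e-6)
    hi = min(score + half_width, 1.0 - 1e-6)
    return elo_from_score(lo), elo_from_score(hi)


# Assumed per-game rates: Stockfish 130525's 1158 wins, 813 draws, 429 losses in 2400 games.
win_rate, draw_rate, loss_rate = 1158 / 2400, 813 / 2400, 429 / 2400
for n in (10, 100, 2400, 100_000):
    lo, hi = elo_interval(win_rate, draw_rate, loss_rate, n)
    print(f"{n:>7} games: {lo:+7.1f} to {hi:+7.1f} Elo")
```

The interval width shrinks roughly with the square root of the number of games, which is why testing by the thousands or millions pays off.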

  2. Thank you. Yes, I am a novice :) and will probably remain one in this field because of time constraints.
    You didn't answer my question as to whether you recommend Ponder On or Off?
    If I wish to test Houdini 2 with the default Contempt 1 against itself using Contempt 0 or 2, how do I go about it?

  3. Ponder On is best when the opposing engine is not on the same computer, as when playing online. It is bad for engine-to-engine matches on the same computer, because some chess engines are CPU hogs that choke the other engine, leaving it little CPU time when it is its turn to move.

    I use Ponder Off in my computer chess engine tournaments, which run up to 6 simultaneous pairs. With Ponder On, the computer becomes painfully slow and unusable.
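For UCI engines, pondering is controlled by the standard `Ponder` option, which the GUI sets with a `setoption` command; with Ponder Off the GUI should not send `go ponder`, so the engine does not think on the opponent's time. The sketch below only builds the command strings a UCI GUI would typically send before a game; it does not start an engine. The `Contempt` entry in the usage example is an assumption: it only works if the engine exposes an option with exactly that name.

```python
def uci_setup_commands(options: dict[str, str]) -> list[str]:
    """Commands a UCI GUI typically sends before a game (sketch; no engine is started)."""
    commands = ["uci"]  # the engine answers with its id and option list, then "uciok"
    for name, value in options.items():
        commands.append(f"setoption name {name} value {value}")
    commands.append("ucinewgame")
    commands.append("isready")  # wait for "readyok" before sending "position" and "go"
    return commands


# Ponder Off for a same-machine engine match: "Ponder" is a standard UCI check option.
for line in uci_setup_commands({"Ponder": "false"}):
    print(line)

# Hypothetical self-play variant: only valid if the engine exposes an option named "Contempt".
print(uci_setup_commands({"Ponder": "false", "Contempt": "0"}))
```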

  4. I ran a quick 100-round test of Houdini 3 with Ponder Off against itself with Ponder On on my 4 computers. The results were split.

    I also ran Stockfish 130525 against Houdini 3 with Ponder Off. The results were also split.

    There is no conclusive evidence that Ponder On is stronger; otherwise it would be the default setting chosen by Robert Houdart.

    By the way, I do not change the default UCI settings of any chess engine unless it is the only way to make it usable for engine vs. engine matches. I will never do it in my tournaments, because it is like tampering with the results. Chess engine authors should see to it that the default settings are optimized for engine matches without alteration.

    If you found that Houdini 3 with Ponder Off is winning against Stockfish, then well and good, use it. You have a computer that is best for Houdini. My computers are choosy with different engines.
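Whether a "split" result means anything can be checked with the likelihood of superiority (LOS), a formula commonly used in engine testing: LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))), where W and L are wins and losses and draws are ignored. The counts below are hypothetical, since the actual numbers of the 100-round tests were not posted.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the side with `wins` is genuinely stronger (draws ignored)."""
    if wins + losses == 0:
        return 0.5  # only draws: no information either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


# Hypothetical 100-game self-play split: 30 wins, 28 losses, 42 draws.
print(f"LOS = {likelihood_of_superiority(30, 28):.1%}")
```

A LOS near 50% means the match cannot tell the two settings apart; testers usually want 95% or more before calling one setting stronger.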

    Replies
    1. Oops, sorry, it should not be Ponder Off but Contempt 0. My mind got mixed up.

    2. Heh heh... I was a bit puzzled by your earlier post; now it makes sense.
      I would say that it IS conclusive evidence that Houdini 3 C=0 is the strongest setting, because in your earlier tests you were crowing about how Stockfish had swallowed Houdini 3 (with default settings) whole :). Certainly, Houdini 3 C=0 seems to have stuck in its craw! :)
      Yes, you are right when you say that I have a computer that is best for Houdini. I read somewhere that Houdini 3 and Intel processors seem to suit each other... they get along like a house on fire!
      Actually, I haven't got around to matching Stockfish against Houdini 3 C=0 yet.
      But then, I don't really need to, as even Houdini 3 with the default C=1 is beating Stockfish on my 6-core i7 3930K overclocked to 4.7 GHz with 32 GB of Corsair Vengeance RAM.
      Not so easily now that I have set Ponder Off on your suggestion, but still winning consistently. This Houdini 3 is certainly a resource hog, but I have to admit that the results justify it.
      Let's hope that the Stockfish team makes more changes to better utilize the brute power of modern Intel processors.
      Till then, I'm afraid it is still Houdini 3 RULEZZ ! :)

