The Week in Chess

Wednesday, May 22, 2013

Escape Artist Houdini 3 Swallowed by Stockfish 130520

It's another random testing result, and this time the great escape artist Houdini 3, which evaded the big-mouthed Stockfish in the recent TCEC Live Chess Engines Tournament, was swallowed whole.

I tested the Stockfish development version dated May 20, 2013, to assess whether it had made progress in the 3 weeks since the official release of Stockfish 3. The outcome was amazing: it demolished the top 4 engines in the Owl Rating List, namely Houdini 3 Pro, Stockfish 3 (official version), Robodini 1.1 and Critter 1.6a, by a comfortable margin.

Here are the gauntlet statistics:

Rank  Engine                Elo  Games  Score%  Points  Win  Loss  Draw
1     Stockfish 130520 x64   24    400   55.00   220.0  127    87   186
2     Stockfish 3 x64         2    100   45.50    45.5    9    18    73
3     Robodini 1.1 x64       -7    100   45.50    45.5   29    38    33
4     Houdini 3 Pro x64      -7    100   45.00    45.0   29    39    32
5     Critter 1.6a x64      -11    100   44.00    44.0   20    32    48

Please note that the table above is a gauntlet in which Stockfish 130520 is the star engine and plays every opponent. It is not a round-robin tournament, so the other engines did not play each other.
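The Points and Score% columns follow directly from the Win/Loss/Draw columns with the usual scoring (win = 1, draw = 0.5, loss = 0). The minimal Python sketch below reproduces them and converts a score fraction into an approximate Elo gap using the standard logistic Elo formula. The function names are illustrative only. The Elo column above comes from whatever rating tool was used to compile the table, which models the results differently, so this simple formula (roughly +35 for a 55% score) will not match the table's figures exactly.

```python
import math


def points_and_score(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Return (points, score %) using the usual 1 / 0.5 / 0 scoring."""
    games = wins + losses + draws
    points = wins + 0.5 * draws
    return points, 100.0 * points / games


def elo_difference(score_fraction: float) -> float:
    """Approximate Elo gap implied by a score fraction, from the logistic model
    E = 1 / (1 + 10 ** (-D / 400)) solved for D."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


# Win/Loss/Draw for Stockfish 130520, taken from the gauntlet table.
points, pct = points_and_score(127, 87, 186)
print(f"{points:.1f} points, {pct:.2f}%")                       # 220.0 points, 55.00%
print(f"~{elo_difference(pct / 100):+.0f} Elo vs. the field")   # ~+35 Elo vs. the field
```

The same function applied to an opponent row (for example Houdini 3 Pro: 29 wins, 39 losses, 32 draws) gives its 45.0 points and 45.00%.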

In the several Houdini vs. Stockfish matches I've run so far, the latest development versions of Stockfish easily matched Houdini's strength. In terms of scoring power against all the other chess engines, however, Houdini is brutal, so this version may not be number 1 in the rating lists yet. But... watch out, it may be number 1 soon!

This is not an official result and is published only out of curiosity. The result is good enough for my testing environment, but it may not be the same under another tester's conditions.
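One way to put a number on how much a 100-game match can vary between testers and machines is an approximate confidence interval for the match score. The sketch below assumes the games are independent and that a normal approximation is adequate; both are simplifying assumptions, not part of the original test setup, and the function name is hypothetical. The example uses Stockfish 130520's result against Houdini 3 Pro from the table (39 wins, 29 losses, 32 draws from Stockfish's side).

```python
import math


def score_interval(wins: int, losses: int, draws: int,
                   z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean score, low, high) for per-game scores of 1 / 0.5 / 0,
    using a normal approximation with the sample variance of the results.
    z = 1.96 corresponds to a roughly 95% two-sided interval."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Variance = E[x^2] - mean^2, where x^2 is 1 for a win, 0.25 for a draw, 0 for a loss.
    variance = max((wins + 0.25 * draws) / n - mean ** 2, 0.0)
    margin = z * math.sqrt(variance / n)
    return mean, mean - margin, mean + margin


# Stockfish 130520 vs. Houdini 3 Pro: 39 wins, 29 losses, 32 draws.
mean, low, high = score_interval(39, 29, 32)
print(f"score {mean:.3f}, approx. 95% interval {low:.3f} to {high:.3f}")
```

For the 55-45 result this gives roughly 0.47 to 0.63, a spread of about 8 percentage points either way over 100 games.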

Download the computer chess engines tournament games here.

11 comments:

  1. Houdini 3 beaten by 3 different engines... I find that a bit hard to swallow!
    There's probably something wrong with your test environment, or maybe your Houdini 3 got corrupted.
    I don't do your kind of testing, but I play online chess against other strong engine players, and I wasn't too impressed with Stockfish 3's performance... I was forced to switch back to H3, as I reported earlier.
    I don't know if the May 20 Stockfish is drastically stronger, but I seriously doubt the accuracy of your tests (no offence).

  2. BTW, the latest Stockfish release shows a May 19 date. Where did you get the May 20 release??

    1. I already made a disclaimer that my test environment is not the same as other testers' environments. You are free to ignore the result.

      This is my own compilation, made for fun; it is not the same as the development release. Stockfish 130520 was compiled personally from the latest source at github.com/mcostalba/stockfish, dated May 20, 2013. The development build at abrok.eu/stockfish was labeled Stockfish 13051922 and dated May 19, 2013.

      I used 4 multi-core computers, each different from the others, and each produced different match results. All of them are used simultaneously to finish a tournament quickly. When a random abnormal result shows up, for example Houdini being defeated by Stockfish by 19 points in this particular test, it is hard to believe. So I ran the same Houdini vs. Stockfish match on the other computers to see if something was wrong. Houdini was indeed beaten there too, but I chose the lower score of 55-45 in Stockfish's favor (the one published here), because surely nobody would believe Houdini being defeated by another engine by a 19-point deficit in 100 games! I can post this particular match if anybody is interested.

      If you don't get the same result as mine, that is expected, because everybody gets different results, especially since I use my own Stockfish compilation. Just look around at the engine tests on the forums and rating list sites; you will see that no two are the same.

  3. Hi,
    Could you post a download link for your compilation? I'd like to check it out.
    Thanks.

    1. Just click on "Stockfish 130520 x64" in the Engine column of the crosstable and it will take you to the download site.

      My compilation works best against Moron 1.0, satisfaction guaranteed.

  4. This version, like all Stockfish versions, seems to give over-optimistic evaluations for the side to move, even though its move selection is quite good. Why is this so?
    For example, in the Fritz 13 GUI, if Stockfish gives a score of +1.20 in favor of White for a certain position and I switch to Houdini 3 Pro, it gives a score of around +0.60 for the same position! Why??
    Also, you mention that Houdini is brutally efficient against other engines, so it is number one.
    What I don't understand is this: if Stockfish beats the pants off Houdini 3, why should it have difficulties against even weaker engines? It doesn't seem logical.

  5. Evaluation scoring is a design choice made by chess engine authors. It is usually derived from long tests of different values of the evaluation parameters, which are then fixed once the authors are happy with the results.

    My comment on Houdini being a brutal scorer is based on the score results in my rating list. That is expected of the strongest chess engine on the planet. It may not last long, as Stockfish gains incremental strength every week, in contrast with Houdini's yearly releases.

    Stockfish routinely beat Houdini in my tests because it happens to fit nicely in my testing environment. Over the thousands of tournament games I have conducted, I observed on which computers and under what circumstances Stockfish performs best. You may call it insider information that gives Stockfish the edge in my tests. That is why I warned that my test results will not be the same as other testers'.

    Stockfish beat all chess engines in my tests; it did not struggle against weaker engines, but its scoring percentage was just slightly lower than Houdini's. That may be because Houdini was tuned to perform at very fast blitz, which is the time control I set for my tournaments. Robert Houdart commented on his website: "Fun fact: Over 10 million chess games were played for the development and tuning of Houdini 3!" Don Dailey, author of Komodo 5, said that Houdini is best at short time controls while Stockfish scales better at longer time controls.

  6. Hmm... interesting. Actually, I find that your compile is excellent, and I beat most online players today... I only drew with 1 guy who was using 12-core dual Xeons!
    The site I play on allows 2 minutes/move, so Stockfish gets plenty of time to think and reach full strength, as you suggest.
    Could you tell me under what circumstances Stockfish outperforms Houdini 3?

  7. Let me keep silent on that. It's a trade secret. :)

  8. OK, I can respect that.
    But you should realize that I'm not your competitor, as my interest is only in playing online engine-engine chess, not in running the kind of tournaments that you do.
    Think of yourself as an Army General and me as a Field Commander who puts your ideas and theories to the test in the real world of online engine-engine chess! :)
    Could you at least let me know whether you are using default settings for Stockfish or some other settings?
    If you don't wish to disclose anything on a public blog, we can always correspond by email (anildharan@gmail.com).
