The Week in Chess

Wednesday, October 29, 2014

Stockfish 14102815 Tests - Possible Regression


Observations and sample tests point to a possible regression in the latest Stockfish build, 14102815.

Yesterday I finished running the gauntlet matches for Stockfish 14102712 for rating-list publication, but withheld the results because the resulting Elo rating was lower than the previously published one. When a new build, 14102721, was released on abrok.eu, I immediately tested it as a candidate for a blog post. About 70% of that test run was complete when yet another build, 14102815, appeared on abrok.eu, so I abandoned the pending test and set up new gauntlet matches for the latest release, hoping it would be suitable for rating-list publication.

As I watched the first 10 rounds of Stockfish 14102815 against Komodo 8 and Houdini 4 Pro on the 3 computers running the matches, I noticed that it had difficulty gaining an advantage over its two great rivals. I went home, arranged another set of gauntlet matches, and saw the same negative pattern. As I was about to sleep, I had a feeling that it would finish the 100-round matches with a lower score than the previous versions. Sure enough, when I woke up and scrutinized the results, its score was lower.

What went wrong? Each new release is expected, more often than not, to perform better. I suspected that the last patch had caused the drop, so I immediately arranged a quick match at 30 seconds base + 500 milliseconds increment to check. My suspicion was confirmed: Stockfish 14102815 lost by 8 points to the older 14102712 version.

At the office, I hurriedly set up gauntlet matches on the 3 computers to confirm what I had observed. This time the match was between the two latest consecutive builds, 14102815 vs. 14102721, over a longer 200 games per batch at the same quick time control of 30 seconds + 500 ms. The aggregate result was 314.5-285.5, a 29-point margin for the older 14102721 over the latest 14102815. The rating tool puts the two versions at +8.48 and -8.48 Elo around the pool average, i.e. a gap of roughly 17 Elo. The latest Stockfish 14102815 lost all 3 batches.
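As a rough cross-check of that conversion, the standard logistic Elo model turns a score of S points out of N games into an expected rating difference. This is a common approximation, not necessarily the exact model used by the rating tool behind the tables, so small differences are expected. Plugging in the quoted aggregate of 314.5 points out of 600:

$$
\Delta_{\mathrm{Elo}} = -400\log_{10}\!\left(\frac{N}{S} - 1\right) = 400\log_{10}\frac{S}{N - S} = 400\log_{10}\frac{314.5}{285.5} \approx 16.8
$$

This is close to the 16.96-point gap between the +8.48 and -8.48 ratings in the aggregate table.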

These results should be verified by other testers and by the Stockfish team.

This seemingly harmless latest patch may be the cause of the regression. Its commit message reads:

Author: mstembera
Date: Tue Oct 28 22:23:01 2014 +0800
Timestamp: 1414506181

max_piece_type cleanup, and slight speed increase.

No functional change.

Resolves #81 


Specifically, the change is as follows:

src/evaluate.cpp

     assert(target & (pos.pieces(C) ^ pos.pieces(C, KING)));
-    PieceType pt;
-    for (pt = QUEEN; pt > PAWN; --pt)
+    for (PieceType pt = QUEEN; pt > PAWN; --pt)
         if (target & pos.pieces(C, pt))
             return pt;
-    return pt;
+    return PAWN;
 }
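The commit message claims "No functional change". A minimal standalone sketch can check that claim for this hunk in isolation. It replaces Stockfish's Position with a hypothetical one-bitboard-per-piece-type struct (PiecesOfColor, max_piece_type_old, max_piece_type_new and versions_agree are illustrative names, not Stockfish code). It then compares the old and new loop shapes on every combination of piece types. It only checks the logic of the hunk; it says nothing about speed or about the rest of the build.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins, NOT Stockfish's real headers: piece types follow the
// usual PAWN < KNIGHT < BISHOP < ROOK < QUEEN < KING ordering, and the position
// is reduced to one bitboard per piece type for a single colour.
enum PieceType { NO_PIECE_TYPE, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

inline PieceType& operator--(PieceType& pt) { return pt = PieceType(int(pt) - 1); }

using Bitboard = std::uint64_t;

struct PiecesOfColor {
    Bitboard byType[KING + 1];  // byType[pt] = squares holding pieces of type pt
};

// Pre-patch shape: pt lives outside the loop and is returned after it.
PieceType max_piece_type_old(const PiecesOfColor& p, Bitboard target) {
    PieceType pt;
    for (pt = QUEEN; pt > PAWN; --pt)
        if (target & p.byType[pt])
            return pt;
    return pt;  // the loop only exits normally once pt == PAWN
}

// Post-patch shape: pt is scoped to the loop and PAWN is returned explicitly.
PieceType max_piece_type_new(const PiecesOfColor& p, Bitboard target) {
    for (PieceType pt = QUEEN; pt > PAWN; --pt)
        if (target & p.byType[pt])
            return pt;
    return PAWN;
}

// Exhaustively compares both versions on a toy board where piece type pt
// occupies square bit pt, so every subset of piece types is one 7-bit target.
bool versions_agree() {
    PiecesOfColor p{};
    for (int pt = PAWN; pt <= KING; ++pt)
        p.byType[pt] = Bitboard(1) << pt;
    for (Bitboard target = 0; target < (Bitboard(1) << 7); ++target)
        if (max_piece_type_old(p, target) != max_piece_type_new(p, target))
            return false;
    return true;
}
```

Both loops stop at the first hit from QUEEN downwards. When nothing above PAWN is found, the old loop simply exits with pt already equal to PAWN, which is what the new code returns explicitly.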
Here are the statistics of the regression tests;

AGGREGATE RESULTS:
   # PLAYER                     :  RATING   POINTS  PLAYED    (%)
   1 Stockfish 14102721 x64     :    8.48    314.5     600  52.4%
   2 Stockfish 14102815 x64     :   -8.48    285.5     600  47.6%

Batch 1

Stockfish 14102815 x64 vs. Stockfish 14102721 x64 - Match 100R 30S+500ms
Rank  Engine                   Score       vs. 14102721 (W-L-D)  vs. 14102815 (W-L-D)  S-B
1     Stockfish 14102721 x64   101.5/200   ·                     26-23-151             9997.75
2     Stockfish 14102815 x64    98.5/200   23-26-151             ·                     9997.75


200 games played / Tournament finished

Tournament start: 2014.10.29, 14:10:59
Latest update: 2014.10.29, 15:44:19
Level: Blitz 0:30/0.5
Hardware: AMD Phenom(tm) II X4 945 Processor with 6.0 GB Memory
Operating system: Windows 7 Ultimate Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5

Batch 2

Stockfish 14102815 x64 vs. Stockfish 14102721 x64 - Match 100R 30S+500ms2
Rank  Engine                   Score       vs. 14102721 (W-L-D)  vs. 14102815 (W-L-D)  S-B
1     Stockfish 14102721 x64   106.0/200   ·                     33-21-146             9964.00
2     Stockfish 14102815 x64    94.0/200   21-33-146             ·                     9964.00


200 games played / Tournament finished

Tournament start: 2014.10.29, 01:03:54
Latest update: 2014.10.29, 03:12:59
Level: Blitz 0:30/0.5
Hardware: AMD A8-5600K APU with Radeon(tm) HD Graphics, 8GB RAM
Operating system: Linux 3.16.4 with WINE
Table created with: Arena 3.5

Batch 3

Stockfish 14102815 x64 vs. Stockfish 14102721 x64 - Match 100R 30S+500ms3
Rank  Engine                   Score       vs. 14102721 (W-L-D)  vs. 14102815 (W-L-D)  S-B
1     Stockfish 14102721 x64   106.5/200   ·                     35-22-143             9957.75
2     Stockfish 14102815 x64    93.5/200   22-35-143             ·                     9957.75


200 games played / Tournament finished

Tournament start: 2014.10.29, 14:06:17
Latest update: 2014.10.29, 16:15:00
Level: Blitz 0:30/0.5
Hardware: AMD A8-5600K APU with Radeon(tm) HD Graphics, 8GB RAM
Operating system: Linux 3.16.4 with WINE
Table created with: Arena 3.5
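To gauge whether a gap of this size over 600 games is more than noise, here is a minimal sketch that pools the three batches and attaches an approximate 95% interval. The helper names (Batch, EloEstimate, elo_from_score, estimate) are hypothetical. The method is an assumption: it uses the logistic Elo model plus a normal approximation on the per-game score, treats games as independent, and is not necessarily what the tool behind the tables does. Summing the batch tables from 14102721's side (26-23-151, 33-21-146 and 35-22-143) gives 94 wins, 66 losses and 440 draws. That is 314-286, half a point different from the 314.5-285.5 shown in the aggregate table.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Wins, losses and draws for one engine in one batch (from its point of view).
struct Batch { int wins, losses, draws; };

// Logistic Elo model: rating difference implied by an expected score p in (0, 1).
double elo_from_score(double p) { return -400.0 * std::log10(1.0 / p - 1.0); }

struct EloEstimate { double score, elo, lower, upper; };

// Pools the batches and returns total points, the Elo point estimate and an
// approximate 95% interval, using a normal approximation on the per-game
// score (1 for a win, 0.5 for a draw, 0 for a loss).
EloEstimate estimate(const std::vector<Batch>& batches) {
    double w = 0, l = 0, d = 0;
    for (const Batch& b : batches) { w += b.wins; l += b.losses; d += b.draws; }
    const double n = w + l + d;
    const double mu = (w + 0.5 * d) / n;             // mean score per game
    const double var = (w * (1 - mu) * (1 - mu) +    // per-game score variance
                        d * (0.5 - mu) * (0.5 - mu) +
                        l * mu * mu) / n;
    const double margin = 1.96 * std::sqrt(var / n); // 95% half-width on mu
    return { w + 0.5 * d, elo_from_score(mu),
             elo_from_score(mu - margin), elo_from_score(mu + margin) };
}
```

Fed with the three batches above (estimate({{26, 23, 151}, {33, 21, 146}, {35, 22, 143}})), this gives 314 points, a point estimate of about +16 Elo for 14102721, and an interval of roughly +2 to +31 Elo. Under these assumptions the gap stays above zero, but only just.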
Download the PGN games of the test matches here.
