377 lines
9.8 KiB
ReStructuredText
377 lines
9.8 KiB
ReStructuredText
|
.. |ce| replace:: :ref:`[CE] <ce_paper>`
|
|||
|
|
|||
|
The cross-entropy tuner
|
|||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
:setting:`competition_type` string: ``"ce_tuner"``.
|
|||
|
|
|||
|
The cross-entropy tuner uses the :dfn:`cross-entropy method` described in
|
|||
|
|ce|:
|
|||
|
|
|||
|
.. _ce_paper:
|
|||
|
|
|||
|
| [CE] G.M.J-B. Chaslot, M.H.M Winands, I. Szita, and H.J. van den Herik.
|
|||
|
| Cross-entropy for Monte-Carlo Tree Search. ICGA Journal, 31(3):145-156.
|
|||
|
| http://www.personeel.unimaas.nl/g-chaslot/papers/crossmcICGA.pdf
|
|||
|
|
|||
|
.. caution:: The cross-entropy tuner is experimental. It can take a very large
|
|||
|
number of games to converge.
|
|||
|
|
|||
|
|
|||
|
.. contents:: Page contents
|
|||
|
:local:
|
|||
|
:backlinks: none
|
|||
|
|
|||
|
|
|||
|
The tuning algorithm
|
|||
|
""""""""""""""""""""
|
|||
|
|
|||
|
The algorithm is not described in detail in this documentation. See |ce|
|
|||
|
section 3 for the description. The tuner always uses a Gaussian distribution.
|
|||
|
The improvement suggested in section 5 is not implemented.
|
|||
|
|
|||
|
|
|||
|
.. _ce parameter model:
|
|||
|
|
|||
|
The parameter model
|
|||
|
"""""""""""""""""""
|
|||
|
|
|||
|
The parameter values taken from the Gaussian distribution are floating-point
|
|||
|
numbers known as :dfn:`optimiser parameters`.
|
|||
|
|
|||
|
These parameters can be transformed before being used to configure the
|
|||
|
candidate (see 3.3 *Normalising Parameters* in |ce|). The transformed values
|
|||
|
are known as :dfn:`engine parameters`. The transformation is implemented using
|
|||
|
a Python :ce-setting:`transform` function defined in the control file.
|
|||
|
|
|||
|
Reports show engine parameters (see the :ce-setting:`format` parameter
|
|||
|
setting), together with the mean and variance of the corresponding optimiser
|
|||
|
parameter distribution in the form :samp:`{mean}~{variance}`.
|
|||
|
|
|||
|
|
|||
|
.. _the cem tuning algorithm:
|
|||
|
|
|||
|
.. _sample_cem_control_file:
|
|||
|
|
|||
|
Sample control file
|
|||
|
"""""""""""""""""""
|
|||
|
|
|||
|
Here is a sample control file, illustrating most of the available settings for
|
|||
|
a cross-entropy tuning event::
|
|||
|
|
|||
|
competition_type = "ce_tuner"
|
|||
|
|
|||
|
description = """\
|
|||
|
This is a sample control file.
|
|||
|
|
|||
|
It illustrates the available settings for the cross entropy tuner.
|
|||
|
"""
|
|||
|
|
|||
|
players = {
|
|||
|
'gnugo-l10' : Player("gnugo --mode=gtp --chinese-rules "
|
|||
|
"--capture-all-dead --level=10"),
|
|||
|
}
|
|||
|
|
|||
|
def fuego(max_games, additional_commands=[]):
|
|||
|
commands = [
|
|||
|
"go_param timelimit 999999",
|
|||
|
"uct_max_memory 350000000",
|
|||
|
"uct_param_search number_threads 1",
|
|||
|
"uct_param_player reuse_subtree 0",
|
|||
|
"uct_param_player ponder 0",
|
|||
|
"uct_param_player max_games %d" % max_games,
|
|||
|
]
|
|||
|
return Player(
|
|||
|
"fuego --quiet",
|
|||
|
startup_gtp_commands=commands+additional_commands)
|
|||
|
|
|||
|
FUEGO_MAX_GAMES = 1000
|
|||
|
|
|||
|
def exp_10(f):
|
|||
|
return 10.0**f
|
|||
|
|
|||
|
parameters = [
|
|||
|
Parameter('rave_weight_initial',
|
|||
|
# Mean and variance are in terms of log_10 (rave_weight_initial)
|
|||
|
initial_mean = -1.0,
|
|||
|
initial_variance = 1.5,
|
|||
|
transform = exp_10,
|
|||
|
format = "I: %4.2f"),
|
|||
|
|
|||
|
Parameter('rave_weight_final',
|
|||
|
# Mean and variance are in terms of log_10 (rave_weight_final)
|
|||
|
initial_mean = 3.5,
|
|||
|
initial_variance = 1.5,
|
|||
|
transform = exp_10,
|
|||
|
format = "F: %4.2f"),
|
|||
|
]
|
|||
|
|
|||
|
def make_candidate(rwi, rwf):
|
|||
|
return fuego(
|
|||
|
FUEGO_MAX_GAMES,
|
|||
|
["uct_param_search rave_weight_initial %f" % rwi,
|
|||
|
"uct_param_search rave_weight_final %f" % rwf])
|
|||
|
|
|||
|
board_size = 9
|
|||
|
komi = 7.5
|
|||
|
opponent = 'gnugo-l10'
|
|||
|
candidate_colour = 'w'
|
|||
|
|
|||
|
number_of_generations = 5
|
|||
|
samples_per_generation = 100
|
|||
|
batch_size = 10
|
|||
|
elite_proportion = 0.1
|
|||
|
step_size = 0.8
|
|||
|
|
|||
|
|
|||
|
|
|||
|
.. _cem_control_file_settings:
|
|||
|
|
|||
|
Control file settings
|
|||
|
"""""""""""""""""""""
|
|||
|
|
|||
|
The following settings can be set at the top level of the control file:
|
|||
|
|
|||
|
All :ref:`common settings <common settings>` (the :setting:`players`
|
|||
|
dictionary is required, though it is used only to define the opponent).
|
|||
|
|
|||
|
The following game settings (only :setting:`!board_size` and :setting:`!komi`
|
|||
|
are required):
|
|||
|
|
|||
|
- :setting:`board_size`
|
|||
|
- :setting:`komi`
|
|||
|
- :setting:`handicap`
|
|||
|
- :setting:`handicap_style`
|
|||
|
- :setting:`move_limit`
|
|||
|
- :setting:`scorer`
|
|||
|
|
|||
|
|
|||
|
The following additional settings (they are all required):
|
|||
|
|
|||
|
.. ce-setting:: candidate_colour
|
|||
|
|
|||
|
String: ``"b"`` or ``"w"``
|
|||
|
|
|||
|
The colour for the candidates to take in every game.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: opponent
|
|||
|
|
|||
|
Identifier
|
|||
|
|
|||
|
The :ref:`player code <player codes>` of the player to use as the
|
|||
|
candidates' opponent.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: parameters
|
|||
|
|
|||
|
List of :ce-setting-cls:`Parameter` definitions (see :ref:`ce parameter
|
|||
|
configuration`).
|
|||
|
|
|||
|
Describes the parameters that the tuner will work with. See :ref:`ce
|
|||
|
parameter model` for more details.
|
|||
|
|
|||
|
The order of the :ce-setting-cls:`Parameter` definitions is used for the
|
|||
|
arguments to :ce-setting:`make_candidate`, and whenever parameters are
|
|||
|
described in reports or game records.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: make_candidate
|
|||
|
|
|||
|
Python function
|
|||
|
|
|||
|
Function to create a :setting-cls:`Player` from its engine parameters.
|
|||
|
|
|||
|
This function is passed one argument for each candidate parameter, and must
|
|||
|
return a :setting-cls:`Player` definition. Each argument is the output of
|
|||
|
the corresponding Parameter's :ce-setting:`transform`.
|
|||
|
|
|||
|
The function will typically use its arguments to construct command line
|
|||
|
options or |gtp| commands for the player. For example::
|
|||
|
|
|||
|
def make_candidate(param1, param2):
|
|||
|
return Player(["goplayer", "--param1", str(param1),
|
|||
|
"--param2", str(param2)])
|
|||
|
|
|||
|
def make_candidate(param1, param2):
|
|||
|
return Player("goplayer", startup_gtp_commands=[
|
|||
|
["param1", str(param1)],
|
|||
|
["param2", str(param2)],
|
|||
|
])
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: number_of_generations
|
|||
|
|
|||
|
Positive integer
|
|||
|
|
|||
|
The number of times to repeat the tuning algorithm (*number of iterations*
|
|||
|
or *T* in the terminology of |ce|).
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: samples_per_generation
|
|||
|
|
|||
|
Positive integer
|
|||
|
|
|||
|
The number of candidates to make in each generation (*population_size* or
|
|||
|
*N* in the terminology of |ce|).
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: batch_size
|
|||
|
|
|||
|
Positive integer
|
|||
|
|
|||
|
The number of games played by each candidate.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: elite_proportion
|
|||
|
|
|||
|
Float between 0.0 and 1.0
|
|||
|
|
|||
|
The proportion of candidates to select from each generation as 'elite' (the
|
|||
|
*selection ratio* or *ρ* in the terminology of |ce|). A value between 0.01
|
|||
|
and 0.1 is recommended.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: step_size
|
|||
|
|
|||
|
Float between 0.0 and 1.0
|
|||
|
|
|||
|
The rate at which to update the distribution parameters between generations
|
|||
|
(*α* in the terminology of |ce|).
|
|||
|
|
|||
|
.. caution:: I can't find anywhere in the paper the value they used for
|
|||
|
this, so I don't know what to recommend.
|
|||
|
|
|||
|
|
|||
|
.. _ce parameter configuration:
|
|||
|
|
|||
|
Parameter configuration
|
|||
|
"""""""""""""""""""""""
|
|||
|
|
|||
|
.. ce-setting-cls:: Parameter
|
|||
|
|
|||
|
A :ce-setting-cls:`!Parameter` definition has the same syntax as a Python
|
|||
|
function call: :samp:`Parameter({arguments})`. Apart from :ce-setting:`!code`,
|
|||
|
the arguments should be specified using keyword form (see
|
|||
|
:ref:`sample_cem_control_file`).
|
|||
|
|
|||
|
The :ce-setting:`code`, :ce-setting:`initial_mean`, and
|
|||
|
:ce-setting:`initial_variance` arguments are required.
|
|||
|
|
|||
|
The arguments are:
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: code
|
|||
|
|
|||
|
Identifier
|
|||
|
|
|||
|
A short string used to identify the parameter. This is used in error
|
|||
|
messages, and in the default for :ce-setting:`format`.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: initial_mean
|
|||
|
|
|||
|
Float
|
|||
|
|
|||
|
The mean value for the parameter in the first generation's distribution.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: initial_variance
|
|||
|
|
|||
|
Float >= 0
|
|||
|
|
|||
|
The variance for the parameter in the first generation's distribution.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: transform
|
|||
|
|
|||
|
Python function (default identity)
|
|||
|
|
|||
|
Function mapping an optimiser parameter to an engine parameter; see :ref:`ce
|
|||
|
parameter model`.
|
|||
|
|
|||
|
Examples::
|
|||
|
|
|||
|
def exp_10(f):
|
|||
|
return 10.0**f
|
|||
|
|
|||
|
Parameter('p1', initial_mean = …, initial_variance = …,
|
|||
|
transform = exp_10)
|
|||
|
|
|||
|
If the :ce-setting:`!transform` is not specified, the optimiser parameter is
|
|||
|
used directly as the engine parameter.
|
|||
|
|
|||
|
|
|||
|
.. ce-setting:: format
|
|||
|
|
|||
|
String (default :samp:`"{parameter_code}: %s"`)
|
|||
|
|
|||
|
Format string used to display the parameter value. This should include a
|
|||
|
short abbreviation to indicate which parameter is being displayed, and also
|
|||
|
contain ``%s``, which will be replaced with the engine parameter value.
|
|||
|
|
|||
|
You can use any Python conversion specifier instead of ``%s``. For example,
|
|||
|
``%.2f`` will format a floating point number to two decimal places. ``%s``
|
|||
|
should be safe to use for all types of value. See `string formatting
|
|||
|
operations`__ for details.
|
|||
|
|
|||
|
.. __: http://docs.python.org/release/2.7/library/stdtypes.html#string-formatting-operations
|
|||
|
|
|||
|
Format strings should be kept short, as screen space is limited.
|
|||
|
|
|||
|
Examples::
|
|||
|
|
|||
|
Parameter('parameter_1',
|
|||
|
initial_mean = 0.0, initial_variance = 1.0,
|
|||
|
format = "p1: %.2f")
|
|||
|
|
|||
|
Parameter('parameter_2',
|
|||
|
initial_mean = 5000, initial_variance = 250000,
|
|||
|
format = "p2: %d")
|
|||
|
|
|||
|
|
|||
|
Reporting
|
|||
|
"""""""""
|
|||
|
|
|||
|
Currently, there aren't any sophisticated reports.
|
|||
|
|
|||
|
The standard report shows the parameters of the current Gaussian distribution,
|
|||
|
and the number of wins for each candidate in the current generation.
|
|||
|
|
|||
|
After each generation, the details of the candidates are written to the
|
|||
|
:ref:`history file <logging>`. The candidates selected as elite are marked
|
|||
|
with a ``*``.
|
|||
|
|
|||
|
|
|||
|
Changing the control file between runs
|
|||
|
""""""""""""""""""""""""""""""""""""""
|
|||
|
|
|||
|
Some settings can safely be changed between runs of the same cross-entropy
|
|||
|
tuning event:
|
|||
|
|
|||
|
:ce-setting:`batch_size`
|
|||
|
safe to increase
|
|||
|
|
|||
|
:ce-setting:`samples_per_generation`
|
|||
|
not safe to change
|
|||
|
|
|||
|
:ce-setting:`number_of_generations`
|
|||
|
safe to change
|
|||
|
|
|||
|
:ce-setting:`elite_proportion`
|
|||
|
safe to change
|
|||
|
|
|||
|
:ce-setting:`step_size`
|
|||
|
safe to change
|
|||
|
|
|||
|
:ce-setting:`make_candidate`
|
|||
|
safe to change, but don't alter play-affecting options
|
|||
|
|
|||
|
:ce-setting:`transform`
|
|||
|
not safe to change
|
|||
|
|
|||
|
:ce-setting:`format`
|
|||
|
safe to change
|
|||
|
|