
.. |ce| replace:: :ref:`[CE] <ce_paper>`

The cross-entropy tuner
^^^^^^^^^^^^^^^^^^^^^^^

:setting:`competition_type` string: ``"ce_tuner"``.

The cross-entropy tuner uses the :dfn:`cross-entropy method` described in
|ce|:

.. _ce_paper:

| [CE] G.M.J-B. Chaslot, M.H.M. Winands, I. Szita, and H.J. van den Herik.
| Cross-entropy for Monte-Carlo Tree Search. ICGA Journal, 31(3):145-156.
| http://www.personeel.unimaas.nl/g-chaslot/papers/crossmcICGA.pdf

.. caution:: The cross-entropy tuner is experimental. It can take a very large
   number of games to converge.


.. contents:: Page contents
   :local:
   :backlinks: none


The tuning algorithm
""""""""""""""""""""

The algorithm is not described in detail in this documentation. See |ce|
section 3 for the description. The tuner always uses a Gaussian distribution.
The improvement suggested in section 5 is not implemented.
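
As a rough illustration of the kind of update the cross-entropy method
performs, here is a minimal sketch of one generation with a Gaussian
distribution. It is not the tuner's own code: the function name
``run_generation``, the toy fitness function, and the way the elite count is
rounded are all assumptions made for this example. In a real tuning event the
fitness would come from game results rather than a formula.

```python
import random

def run_generation(means, variances, fitness, samples_per_generation,
                   elite_proportion, step_size, rng):
    """One generation of a Gaussian cross-entropy update (illustrative only)."""
    # Sample candidates: one independent Gaussian per parameter.
    samples = [
        [rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
        for _ in range(samples_per_generation)
    ]
    # Keep the best-scoring fraction as the elite (assumed rounding rule).
    elite_count = max(1, int(round(samples_per_generation * elite_proportion)))
    elite = sorted(samples, key=fitness, reverse=True)[:elite_count]
    new_means, new_variances = [], []
    for i, (m, v) in enumerate(zip(means, variances)):
        values = [s[i] for s in elite]
        elite_mean = sum(values) / len(values)
        elite_var = sum((x - elite_mean) ** 2 for x in values) / len(values)
        # Smoothed update: move a fraction step_size towards the elite statistics.
        new_means.append(step_size * elite_mean + (1 - step_size) * m)
        new_variances.append(step_size * elite_var + (1 - step_size) * v)
    return new_means, new_variances

# Toy problem: the (hypothetical) best parameter value is 3.0.
rng = random.Random(0)
means, variances = [0.0], [4.0]
for _ in range(20):
    means, variances = run_generation(
        means, variances, lambda s: -(s[0] - 3.0) ** 2,
        samples_per_generation=100, elite_proportion=0.1,
        step_size=0.8, rng=rng)
```

With a ``step_size`` of 1.0 the distribution would jump straight to the elite
statistics; smaller values retain more of the previous generation's
distribution.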


.. _ce parameter model:

The parameter model
"""""""""""""""""""

The parameter values taken from the Gaussian distribution are floating-point
numbers known as :dfn:`optimiser parameters`.

These parameters can be transformed before being used to configure the
candidate (see 3.3 *Normalising Parameters* in |ce|). The transformed values
are known as :dfn:`engine parameters`. The transformation is implemented using
a Python :ce-setting:`transform` function defined in the control file.

Reports show engine parameters (see the :ce-setting:`format` parameter
setting), together with the mean and variance of the corresponding optimiser
parameter distribution in the form :samp:`{mean}~{variance}`.
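
The relationship between the two kinds of parameter can be sketched as
follows. This is only an illustration: the variable names are hypothetical,
and the two-decimal precision used for the :samp:`{mean}~{variance}` string is
an assumption, not necessarily what the tuner prints.

```python
def exp_10(f):
    # Transform: the optimiser works with log_10 of the engine value.
    return 10.0 ** f

# An optimiser parameter value, as drawn from the Gaussian distribution.
optimiser_value = -1.0
# The engine parameter is what the candidate is actually configured with.
engine_value = exp_10(optimiser_value)

# The format string is applied to the engine parameter.
shown = "I: %4.2f" % engine_value

# The distribution is described in optimiser terms (precision assumed here).
distribution = "%.2f~%.2f" % (-1.0, 1.5)

print(shown)
print(distribution)
```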


.. _the cem tuning algorithm:

.. _sample_cem_control_file:

Sample control file
"""""""""""""""""""

Here is a sample control file, illustrating most of the available settings for
a cross-entropy tuning event::

  competition_type = "ce_tuner"

  description = """\
  This is a sample control file.

  It illustrates the available settings for the cross-entropy tuner.
  """

  players = {
      'gnugo-l10' : Player("gnugo --mode=gtp --chinese-rules "
                           "--capture-all-dead --level=10"),
      }

  def fuego(max_games, additional_commands=[]):
      commands = [
          "go_param timelimit 999999",
          "uct_max_memory 350000000",
          "uct_param_search number_threads 1",
          "uct_param_player reuse_subtree 0",
          "uct_param_player ponder 0",
          "uct_param_player max_games %d" % max_games,
          ]
      return Player(
          "fuego --quiet",
          startup_gtp_commands=commands+additional_commands)

  FUEGO_MAX_GAMES = 1000

  def exp_10(f):
      return 10.0**f

  parameters = [
      Parameter('rave_weight_initial',
                # Mean and variance are in terms of log_10 (rave_weight_initial)
                initial_mean = -1.0,
                initial_variance = 1.5,
                transform = exp_10,
                format = "I: %4.2f"),

      Parameter('rave_weight_final',
                # Mean and variance are in terms of log_10 (rave_weight_final)
                initial_mean = 3.5,
                initial_variance = 1.5,
                transform = exp_10,
                format = "F: %4.2f"),
      ]

  def make_candidate(rwi, rwf):
      return fuego(
          FUEGO_MAX_GAMES,
          ["uct_param_search rave_weight_initial %f" % rwi,
           "uct_param_search rave_weight_final %f" % rwf])

  board_size = 9
  komi = 7.5
  opponent = 'gnugo-l10'
  candidate_colour = 'w'

  number_of_generations = 5
  samples_per_generation = 100
  batch_size = 10
  elite_proportion = 0.1
  step_size = 0.8


.. _cem_control_file_settings:

Control file settings
"""""""""""""""""""""

The following settings can be set at the top level of the control file:

All :ref:`common settings <common settings>` (the :setting:`players`
dictionary is required, though it is used only to define the opponent).

The following game settings (only :setting:`!board_size` and :setting:`!komi`
are required):

- :setting:`board_size`
- :setting:`komi`
- :setting:`handicap`
- :setting:`handicap_style`
- :setting:`move_limit`
- :setting:`scorer`


The following additional settings (they are all required):

.. ce-setting:: candidate_colour

  String: ``"b"`` or ``"w"``

  The colour for the candidates to take in every game.


.. ce-setting:: opponent

  Identifier

  The :ref:`player code <player codes>` of the player to use as the
  candidates' opponent.


.. ce-setting:: parameters

  List of :ce-setting-cls:`Parameter` definitions (see :ref:`ce parameter
  configuration`).

  Describes the parameters that the tuner will work with. See :ref:`ce
  parameter model` for more details.

  The order of the :ce-setting-cls:`Parameter` definitions is used for the
  arguments to :ce-setting:`make_candidate`, and whenever parameters are
  described in reports or game records.


.. ce-setting:: make_candidate

  Python function

  Function to create a :setting-cls:`Player` from its engine parameters.

  This function is passed one argument for each candidate parameter, and must
  return a :setting-cls:`Player` definition. Each argument is the output of
  the corresponding Parameter's :ce-setting:`transform`.

  The function will typically use its arguments to construct command line
  options or |gtp| commands for the player. For example::

    # Option 1: pass the parameters as command line options.
    def make_candidate(param1, param2):
        return Player(["goplayer", "--param1", str(param1),
                       "--param2", str(param2)])

    # Option 2: pass the parameters as GTP commands sent at startup.
    def make_candidate(param1, param2):
        return Player("goplayer", startup_gtp_commands=[
                       ["param1", str(param1)],
                       ["param2", str(param2)],
                      ])
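
The way the parameter order ties the pieces together can be sketched outside
a control file. In this illustration the ``Player`` class is a hypothetical
stand-in for the one available in control files, and ``transforms`` and
``candidate_from_sample`` are invented names; the tuner's internals may well
differ.

```python
class Player(object):
    """Hypothetical stand-in for the Player available in control files."""
    def __init__(self, command, startup_gtp_commands=()):
        self.command = command
        self.startup_gtp_commands = list(startup_gtp_commands)

def exp_10(f):
    return 10.0 ** f

# (code, transform) pairs, in the same order as the parameters list.
transforms = [("rave_weight_initial", exp_10),
              ("rave_weight_final", exp_10)]

def make_candidate(rwi, rwf):
    # Arguments arrive in parameter order, already transformed.
    return Player("fuego --quiet", startup_gtp_commands=[
        "uct_param_search rave_weight_initial %f" % rwi,
        "uct_param_search rave_weight_final %f" % rwf])

def candidate_from_sample(optimiser_values):
    # Apply each parameter's transform, then pass the results positionally.
    engine_values = [transform(value) for (_, transform), value
                     in zip(transforms, optimiser_values)]
    return make_candidate(*engine_values)

player = candidate_from_sample([-1.0, 3.0])
for command in player.startup_gtp_commands:
    print(command)
```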


.. ce-setting:: number_of_generations

  Positive integer

  The number of times to repeat the tuning algorithm (*number of iterations*
  or *T* in the terminology of |ce|).


.. ce-setting:: samples_per_generation

  Positive integer

  The number of candidates to make in each generation (*population_size* or
  *N* in the terminology of |ce|).


.. ce-setting:: batch_size

  Positive integer

  The number of games played by each candidate.


.. ce-setting:: elite_proportion

  Float between 0.0 and 1.0

  The proportion of candidates to select from each generation as 'elite' (the
  *selection ratio* or *ρ* in the terminology of |ce|). A value between 0.01
  and 0.1 is recommended.


.. ce-setting:: step_size

  Float between 0.0 and 1.0

  The rate at which to update the distribution parameters between generations
  (*α* in the terminology of |ce|).

  .. caution:: The paper does not appear to state the value its authors used
     for *α*, so no recommended value is given here.
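
Since each candidate plays :ce-setting:`batch_size` games, the size of a
tuning event follows directly from these settings. The short calculation
below uses the values from the sample control file; the variable names
``games_per_generation`` and ``total_games`` are just for illustration.

```python
number_of_generations = 5
samples_per_generation = 100
batch_size = 10

# Every candidate in a generation plays batch_size games.
games_per_generation = samples_per_generation * batch_size
total_games = number_of_generations * games_per_generation

print("%d games per generation, %d in total"
      % (games_per_generation, total_games))
```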


.. _ce parameter configuration:

Parameter configuration
"""""""""""""""""""""""

.. ce-setting-cls:: Parameter

A :ce-setting-cls:`!Parameter` definition has the same syntax as a Python
function call: :samp:`Parameter({arguments})`. Apart from :ce-setting:`!code`,
the arguments should be specified using keyword form (see
:ref:`sample_cem_control_file`).

The :ce-setting:`code`, :ce-setting:`initial_mean`, and
:ce-setting:`initial_variance` arguments are required.

The arguments are:


.. ce-setting:: code

  Identifier

  A short string used to identify the parameter. This is used in error
  messages, and in the default for :ce-setting:`format`.


.. ce-setting:: initial_mean

  Float

  The mean value for the parameter in the first generation's distribution.


.. ce-setting:: initial_variance

  Float >= 0

  The variance for the parameter in the first generation's distribution.


.. ce-setting:: transform

  Python function (default identity)

  Function mapping an optimiser parameter to an engine parameter; see :ref:`ce
  parameter model`.

  Example::

    def exp_10(f):
        return 10.0**f

    Parameter('p1', initial_mean = …, initial_variance = …,
              transform = exp_10)

  If the :ce-setting:`!transform` is not specified, the optimiser parameter is
  used directly as the engine parameter.
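
Other transforms can be written in the same way. The two functions below are
hypothetical examples, not part of the tuner: ``to_int`` suits an engine
option that needs a whole number, and ``logistic`` maps any optimiser value
into the range between 0 and 1.

```python
import math

def to_int(f):
    """Round to the nearest integer, for options that need whole numbers."""
    return int(round(f))

def logistic(f):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-f))

print("p2: %d" % to_int(4999.6))
print("p3: %.2f" % logistic(0.0))
```

Whether a squashing transform like ``logistic`` helps depends on the engine
option being tuned; it is shown only to illustrate the mechanism.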


.. ce-setting:: format

  String (default :samp:`"{parameter_code}: %s"`)

  Format string used to display the parameter value. This should include a
  short abbreviation to indicate which parameter is being displayed, and also
  contain ``%s``, which will be replaced with the engine parameter value.

  You can use any Python conversion specifier instead of ``%s``. For example,
  ``%.2f`` will format a floating point number to two decimal places. ``%s``
  should be safe to use for all types of value. See `string formatting
  operations`__ for details.

  .. __: http://docs.python.org/release/2.7/library/stdtypes.html#string-formatting-operations

  Format strings should be kept short, as screen space is limited.

  Examples::

    Parameter('parameter_1',
              initial_mean = 0.0, initial_variance = 1.0,
              format = "p1: %.2f")

    Parameter('parameter_2',
              initial_mean = 5000, initial_variance = 250000,
              format = "p2: %d")


Reporting
"""""""""

Currently, there aren't any sophisticated reports.

The standard report shows the parameters of the current Gaussian distribution,
and the number of wins for each candidate in the current generation.

After each generation, the details of the candidates are written to the
:ref:`history file <logging>`. The candidates selected as elite are marked
with a ``*``.


Changing the control file between runs
""""""""""""""""""""""""""""""""""""""

The following list shows which settings can safely be changed between runs of
the same cross-entropy tuning event, and which cannot:

:ce-setting:`batch_size`
  safe to increase

:ce-setting:`samples_per_generation`
  not safe to change

:ce-setting:`number_of_generations`
  safe to change

:ce-setting:`elite_proportion`
  safe to change

:ce-setting:`step_size`
  safe to change

:ce-setting:`make_candidate`
  safe to change, but don't alter play-affecting options

:ce-setting:`transform`
  not safe to change

:ce-setting:`format`
  safe to change