machetli.environments

Environments determine how Machetli executes its search. In a local environment, everything is executed sequentially on your local machine. In a grid environment, the search is parallelized: the main search runs on the login node, where it generates successors, dispatches jobs, and waits for them, while multiple successors of a state are evaluated in parallel on the compute nodes of the grid.

class machetli.environments.BaselSlurmEnvironment(**kwargs)[source]

Environment for Basel’s AI group. This will only be useful if you are running Machetli on the grid in Basel. If you want to specialize SlurmEnvironment for your grid, use this class as a template.

See SlurmEnvironment for inherited options.

DEFAULT_MEMORY_PER_CPU = '3872M'

Unless otherwise specified, we reserve 3872 MB (about 3.8 GB) of memory per core, which is available on both partitions. To change this, use memory_per_cpu in the constructor and either run on “infai_2” (up to 6354 MB) or reserve more cores per task.

DEFAULT_NICE = 5000

We schedule all jobs with a nice value of 5000 so autonice has the option to adjust it.

DEFAULT_PARTITION = 'infai_1'

Unless otherwise specified, we execute jobs on partition “infai_1”. To change this, use partition="infai_2" in the constructor.

DEFAULT_QOS = 'normal'

All jobs run in QOS group “normal”.

MAX_MEM_INFAI_BASEL = {'infai_1': '3872M', 'infai_2': '6354M'}

Maximally available memory per CPU on the infai partitions.
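The class description suggests using BaselSlurmEnvironment as a template for other grids. The following is a minimal sketch of that pattern, not Machetli's actual code. It assumes that overriding the documented class-level defaults is sufficient; the SlurmEnvironment defined here is only a stand-in so the example runs without Machetli installed, and MyGridEnvironment, its partition names, its memory limits, and the helpers megabytes and fits_partition are all hypothetical.

```python
import re


class SlurmEnvironment:
    """Stand-in for machetli.environments.SlurmEnvironment (assumption).

    It only mirrors the class-level defaults documented on this page.
    In real use, import the class from machetli.environments instead.
    """
    DEFAULT_PARTITION = None
    DEFAULT_QOS = None
    DEFAULT_MEMORY_PER_CPU = None
    DEFAULT_NICE = 0


class MyGridEnvironment(SlurmEnvironment):
    """Hypothetical environment for some other grid."""
    DEFAULT_PARTITION = "short"       # hypothetical partition name
    DEFAULT_QOS = "normal"
    DEFAULT_MEMORY_PER_CPU = "4000M"  # hypothetical limit
    DEFAULT_NICE = 1000

    # Hypothetical table, analogous to MAX_MEM_INFAI_BASEL.
    MAX_MEM = {"short": "4000M", "long": "8000M"}


def megabytes(limit):
    """Parse strings such as '3872M' into an integer number of megabytes."""
    match = re.fullmatch(r"(\d+)M", limit)
    if match is None:
        raise ValueError(f"unsupported memory format: {limit!r}")
    return int(match.group(1))


def fits_partition(env_class, partition, memory_per_cpu):
    """Return True if the requested per-CPU memory fits the partition."""
    return megabytes(memory_per_cpu) <= megabytes(env_class.MAX_MEM[partition])
```

A check like fits_partition(MyGridEnvironment, "short", "6000M") would return False under these made-up limits, which mirrors the advice above to either switch partitions or reserve more cores per task.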

class machetli.environments.Environment(batch_size=1, loglevel=20)[source]

Abstract base class of all environments. Concrete environments should inherit from this class and override its methods.

Parameters
  • batch_size – Number of successors evaluated in one batch. Environments always complete the evaluation of one batch of successors before considering successors from the next batch. Each batch is written to disk in one directory, with one subdirectory for each successor.

  • loglevel

    Amount of logging output to generate. Use constants from the module logging to control the level of detail in the logs.

    • DEBUG: detailed information usually only useful during development

    • INFO (default): provides feedback on the execution of the program

    • WARNING: silent unless something unexpected happens

    • ERROR: silent unless an error occurred that causes the search to terminate

    • CRITICAL: silent unless the program crashes
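The level names above are the integer constants of Python's standard logging module. The short sketch below lists them and shows that the default loglevel=20 in the signature corresponds to logging.INFO. The LEVELS dictionary is only an illustration.

```python
import logging

# loglevel takes the integer constants of the standard logging module.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}

for name, value in LEVELS.items():
    print(f"{name:8} = {value}")
```

Going by the constructor signatures on this page, a quieter local run would be requested with something like LocalEnvironment(loglevel=logging.WARNING).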

STATE_FILENAME = 'state.pickle'

Filename for stored states. States are written to disk and loaded for evaluation. In grid environments, this assumes a shared file system for login and compute nodes.
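To illustrate what writing a state to disk and loading it for evaluation can look like, here is a minimal sketch using the standard pickle module. It is not Machetli's implementation: the helpers store_state and load_state, the run directory name, and the example state are hypothetical, and the sketch assumes that states are picklable Python objects.

```python
import pickle
import tempfile
from pathlib import Path

STATE_FILENAME = "state.pickle"  # value documented above


def store_state(state, run_dir):
    """Pickle the state into run_dir/STATE_FILENAME and return the path."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / STATE_FILENAME
    with path.open("wb") as f:
        pickle.dump(state, f)
    return path


def load_state(run_dir):
    """Load the state that was stored in run_dir."""
    with (Path(run_dir) / STATE_FILENAME).open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    example_state = {"problem": "example.pddl", "size": 3}  # hypothetical state
    store_state(example_state, Path(tmp) / "run-0")
    restored = load_state(Path(tmp) / "run-0")
```

On a grid, the writer and the reader would be different machines, which is why the shared file system mentioned above matters.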

evaluate_initial_state(evaluator_path, on_task_completed=None) → machetli.environments.EvaluationTask[source]

Evaluate the initial state that was stored with remember_initial_state() earlier. If the state wasn’t stored earlier, a SubmissionError is raised.

remember_initial_state(initial_state)[source]

Store the initial state in a run directory. The search uses it to evaluate the initial state when none of its successors was an improvement.

run(evaluator_path, batch, on_task_completed) → list[source]

Evaluate the given successors with the given evaluator. The evaluator is run on all successors (possibly in parallel, depending on the environment). Every time an evaluation of a successor is completed, the callback on_task_completed is called.

Parameters
  • evaluator_path – path to a script that is used to evaluate a successor. The user documentation contains more information on how to write an evaluator.

  • batch – list of Successors to be evaluated.

  • on_task_completed – callback function that will be called once for each successor after its evaluation is completed. The callback receives an EvaluationTask as its only parameter that describes the result of the evaluation. As evaluations could be performed in parallel, the order in which the evaluations complete is not necessarily deterministic. The callback may return a list of indices into batch to indicate that those successors need not be evaluated any more.
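The callback protocol described above can be sketched as follows. This is a simplified, sequential stand-in, not Machetli's implementation: the Task class and its status field, the function run_sequentially, the evaluate argument (which replaces the evaluator script), and the choice to return the list of tasks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Stand-in for EvaluationTask (assumption); field names are hypothetical."""
    successor_id: int
    status: str = "pending"


def run_sequentially(batch, evaluate, on_task_completed):
    """Evaluate successors one by one, honoring cancellation requests."""
    tasks = [Task(successor_id=i) for i in range(len(batch))]
    canceled = set()
    for i, successor in enumerate(batch):
        if i in canceled:
            tasks[i].status = "canceled"
            continue
        present = evaluate(successor)
        tasks[i].status = "behavior present" if present else "behavior not present"
        # The callback may return indices into the batch that can be skipped.
        canceled.update(on_task_completed(tasks[i]) or [])
    return tasks


batch = [7, 12, 3, 20, 15]  # hypothetical successors


def stop_after_first_hit(task):
    """Example callback: after the first hit, cancel all later successors."""
    if task.status == "behavior present":
        return list(range(task.successor_id + 1, len(batch)))
    return []


tasks = run_sequentially(batch, lambda s: s > 10, stop_after_first_hit)
statuses = [t.status for t in tasks]
```

In a parallel environment the completion order is not fixed, so a real callback should not rely on tasks finishing in index order as this sequential sketch does.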

start_new_iteration()[source]

Notifies the environment that a new iteration is starting. This is relevant for grouping the tasks of one iteration together on the disk.

class machetli.environments.EvaluationJob(name, evaluator_path, batch_dir, tasks)[source]

An EvaluationJob consists of several EvaluationTasks. It represents the evaluation of a batch of successors and carries information about the current status of that evaluation.

class machetli.environments.EvaluationTask(successor, successor_id, run_dir)[source]

An EvaluationTask represents the evaluation of one successor and carries information about the current status of that evaluation.

CANCELED = 'canceled'

Status of tasks that were canceled by Machetli. This happens if the search determines that the evaluation of their successor is not needed.

CRITICAL = 'critical'

Status of tasks that failed to evaluate their successor because the evaluation stopped for an unknown reason, such as a crash of the evaluation script.

DONE_AND_BEHAVIOR_NOT_PRESENT = 'behavior not present'

Status of tasks that successfully evaluated their successor but showed that this successor does not exhibit the behavior that the evaluator is checking for.

DONE_AND_BEHAVIOR_PRESENT = 'behavior present'

Status of tasks that successfully evaluated their successor and showed that this successor exhibits the behavior that the evaluator is checking for.

OUT_OF_RESOURCES = 'ran out of resources'

Status of tasks that failed to evaluate their successor because the evaluation ran out of time or memory.

PENDING = 'pending'

Status of tasks from the time they are started until they stop for any reason.
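As a small illustration of how these status values can be told apart, the sketch below counts a list of status strings by category. The constants copy the values documented above; the function summarize, its category names, and the observed list are hypothetical and not part of Machetli.

```python
from collections import Counter

# Status strings as documented above.
PENDING = "pending"
CANCELED = "canceled"
CRITICAL = "critical"
OUT_OF_RESOURCES = "ran out of resources"
DONE_AND_BEHAVIOR_PRESENT = "behavior present"
DONE_AND_BEHAVIOR_NOT_PRESENT = "behavior not present"


def summarize(statuses):
    """Count statuses by category (the category names are hypothetical)."""
    counts = Counter(statuses)
    return {
        "present": counts[DONE_AND_BEHAVIOR_PRESENT],
        "not present": counts[DONE_AND_BEHAVIOR_NOT_PRESENT],
        "failed": counts[CRITICAL] + counts[OUT_OF_RESOURCES],
        "canceled": counts[CANCELED],
        "unfinished": counts[PENDING],
    }


observed = [
    DONE_AND_BEHAVIOR_NOT_PRESENT,
    OUT_OF_RESOURCES,
    DONE_AND_BEHAVIOR_PRESENT,
    CANCELED,
    CANCELED,
]
summary = summarize(observed)
```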

class machetli.environments.LocalEnvironment(batch_size=1, loglevel=20)[source]

This environment evaluates all successors sequentially on the local machine.

See Environment for inherited options.

class machetli.environments.SlurmEnvironment(email=None, extra_options=None, partition=None, qos=None, memory_per_cpu=None, cpus_per_task=1, nice=None, export=None, setup=None, batch_size=200, **kwargs)[source]

This environment evaluates multiple successors in parallel on the compute nodes of a cluster accessed through the Slurm workload manager.

Parameters
  • email – Email address for notification once the search has finished

  • extra_options – Additional options passed to the Slurm script

  • partition – Slurm partition to use for job submission

  • qos – Slurm QOS to use for job submission

  • memory_per_cpu – Memory limit per CPU to use for Slurm jobs

  • cpus_per_task – Number of CPUs to reserve for evaluating a single successor

  • nice – Nice value to use for Slurm jobs (higher nice value = lower priority).

  • export – Environment variables to export from the login node to the compute nodes.

  • setup – Additional bash script to set up the compute nodes (loading modules, etc.).

  • batch_size – (default 200) Number of successors evaluated in parallel.

See Environment for inherited options.
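Several of the parameters above have direct counterparts among the standard sbatch options. The sketch below shows one plausible mapping to #SBATCH directives. It is an illustration only and makes no claim about the job script that Machetli actually generates; the function sbatch_directives is hypothetical, while the option names (--partition, --qos, --mem-per-cpu, --cpus-per-task, --nice, --export, --mail-user) are regular sbatch options.

```python
def sbatch_directives(partition, qos, memory_per_cpu, cpus_per_task=1,
                      nice=0, export=("PATH",), email=None):
    """Translate environment parameters into #SBATCH lines (hypothetical)."""
    lines = [
        f"#SBATCH --partition={partition}",
        f"#SBATCH --qos={qos}",
        f"#SBATCH --mem-per-cpu={memory_per_cpu}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --nice={nice}",
        f"#SBATCH --export={','.join(export)}",
    ]
    if email:
        # Only request mail if an address was given.
        lines.append(f"#SBATCH --mail-user={email}")
    return lines


# Example using the Basel defaults documented on this page.
directives = sbatch_directives("infai_1", "normal", "3872M", nice=5000)
print("\n".join(directives))
```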

BUSY_STATES = {'PENDING', 'REQUEUED', 'RUNNING', 'SUSPENDED'}

Slurm status codes that indicate that a job has not yet terminated.

DEFAULT_EXPORT = ['PATH']

Environment variables to export from the login node to the compute nodes. May be overridden in derived classes or with a constructor argument.

DEFAULT_MEMORY_PER_CPU = None

Memory limit per CPU to use for Slurm jobs if no limit is passed to the constructor. Must be overridden in derived classes.

DEFAULT_NICE = 0

Nice value to use for Slurm jobs (higher nice value = lower priority). May be overridden in derived classes or with a constructor argument.

DEFAULT_PARTITION = None

Slurm partition to use for job submission if no other partition is passed to the constructor. Must be overridden in derived classes.

DEFAULT_QOS = None

Slurm QOS to use for job submission if no other QOS is passed to the constructor. Must be overridden in derived classes.

DEFAULT_SETUP = ''

Additional bash script to set up the compute nodes (loading modules, etc.). May be overridden in derived classes or with a constructor argument.

DONE_STATES = {'COMPLETED'}

Slurm status codes that indicate that a job successfully terminated.

FILESYSTEM_TIME_INTERVAL = 3

Files that one node writes are not necessarily immediately available on all other nodes. If an expected file is not found, we check again after waiting this many seconds.

FILESYSTEM_TIME_LIMIT = 60

When a file is not found after repeated checks, we eventually give up and treat this as an error. This constant controls after how many seconds to give up.
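The interplay of FILESYSTEM_TIME_INTERVAL and FILESYSTEM_TIME_LIMIT can be sketched as a simple retry loop. This is a minimal illustration under the assumption that the check is a plain existence test; the function wait_for_file and the file names are hypothetical, and the demo uses much shorter times than the documented defaults of 3 and 60 seconds.

```python
import tempfile
import time
from pathlib import Path


def wait_for_file(path, interval=3, time_limit=60):
    """Return True once path exists, or False after time_limit seconds."""
    deadline = time.monotonic() + time_limit
    while not Path(path).exists():
        if time.monotonic() >= deadline:
            return False  # give up; the caller treats this as an error
        time.sleep(interval)
    return True


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "exit_code"  # hypothetical file name
    present.write_text("0")
    found = wait_for_file(present, interval=0.01, time_limit=0.1)
    missing = wait_for_file(Path(tmp) / "missing", interval=0.01, time_limit=0.05)
```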

POLLING_TIME_INTERVAL = 15

While jobs are running, the login node periodically checks the status of all pending tasks. This constant controls how many seconds to wait before polling again.
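A polling loop built on BUSY_STATES, DONE_STATES, and POLLING_TIME_INTERVAL might look like the sketch below. It is not Machetli's implementation: the function poll_until_finished and the get_states callable are hypothetical, and the job states are replayed from a fixed list instead of being queried from Slurm.

```python
import time

BUSY_STATES = {"PENDING", "REQUEUED", "RUNNING", "SUSPENDED"}
DONE_STATES = {"COMPLETED"}


def poll_until_finished(get_states, interval=15):
    """Poll get_states() until no job is busy; return the final states."""
    while True:
        states = get_states()
        if not any(state in BUSY_STATES for state in states.values()):
            return states
        time.sleep(interval)


# Hypothetical status source that replays three snapshots of two jobs.
snapshots = iter([
    {"job-1": "PENDING", "job-2": "RUNNING"},
    {"job-1": "RUNNING", "job-2": "COMPLETED"},
    {"job-1": "COMPLETED", "job-2": "FAILED"},
])
final = poll_until_finished(lambda: next(snapshots), interval=0)
succeeded = sorted(job for job, state in final.items() if state in DONE_STATES)
```

A state that is in neither set, such as FAILED in the last snapshot, marks a job that terminated without success.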