machetli.environments
Environments determine how Machetli executes its search. In a local environment, everything is executed sequentially on your local machine. However, the search can also be parallelized in a grid environment. In that case, multiple successors of a state are evaluated in parallel on the compute nodes of the grid, while the main search runs on the login node, where it generates successors, dispatches jobs, and waits for them to finish.
-
class
machetli.environments.BaselSlurmEnvironment(**kwargs)[source]¶ Environment for Basel’s AI group. This will only be useful if you are running Machetli on the grid in Basel. If you want to specialize SlurmEnvironment for your grid, use this class as a template.
See SlurmEnvironment for inherited options.
-
DEFAULT_MEMORY_PER_CPU= '3872M'¶ Unless otherwise specified, we reserve 3.8 GB of memory per core, which is available on both partitions. To change this, use memory_per_cpu in the constructor and either run on “infai_2” (up to 6354 MB) or reserve more cores per task.
-
DEFAULT_NICE= 5000¶ We schedule all jobs with a nice value of 5000 so autonice has the option to adjust it.
-
DEFAULT_PARTITION= 'infai_1'¶ Unless otherwise specified, we execute jobs on partition “infai_1”. To change this, use partition="infai_2" in the constructor.
-
DEFAULT_QOS= 'normal'¶ All jobs run in QOS group “normal”.
-
MAX_MEM_INFAI_BASEL= {'infai_1': '3872M', 'infai_2': '6354M'}¶ Maximally available memory per CPU on the infai partitions.
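The partition limits above can be checked before a job is submitted. The following is a minimal sketch and not part of Machetli: the helper names parse_megabytes and fits_partition are hypothetical, and the sketch assumes that memory strings always use the "M" suffix shown above.

```python
# Values copied from the documentation above.
MAX_MEM_INFAI_BASEL = {"infai_1": "3872M", "infai_2": "6354M"}


def parse_megabytes(value):
    """Convert a string such as '3872M' to an integer number of megabytes.

    Assumption: only the 'M' suffix is supported in this sketch.
    """
    if not value.endswith("M"):
        raise ValueError(f"expected a value like '3872M', got {value!r}")
    return int(value[:-1])


def fits_partition(memory_per_cpu, partition):
    """Return True if the requested memory per CPU fits on the partition."""
    limit = parse_megabytes(MAX_MEM_INFAI_BASEL[partition])
    return parse_megabytes(memory_per_cpu) <= limit
```

Under these assumptions, a request of "6000M" would not fit on "infai_1" but would fit on "infai_2".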
-
-
class
machetli.environments.Environment(batch_size=1, loglevel=20)[source]¶ Abstract base class of all environments. Concrete environments should inherit from this class and override its methods.
- Parameters
batch_size – Number of successors evaluated in one batch. Environments always complete the evaluation of one batch of successors before considering successors from the next batch. Each batch is written to disk in one directory, with one subdirectory for each successor.
loglevel – Amount of logging output to generate. Use constants from the module logging to control the level of detail in the logs.
DEBUG: detailed information usually only useful during development
INFO (default): provides feedback on the execution of the program
WARNING: silent unless something unexpected happens
ERROR: silent unless an error occurred that causes the search to terminate
CRITICAL: silent unless the program crashes
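The default loglevel=20 in the constructor signature is the numeric value of logging.INFO. A short sketch using only the standard library lists the numeric value behind each of the levels above:

```python
import logging

# Map the level names from the list above to their numeric values.
LEVELS = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
}

for name, value in LEVELS.items():
    print(f"{name:8} = {value}")
```

Passing loglevel=logging.DEBUG instead of the default would request the most detailed output.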
-
STATE_FILENAME= 'state.pickle'¶ Filename for stored states. States are written to disk and loaded for evaluation. In grid environments, this assumes a shared file system for login and compute nodes.
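As a rough illustration of one directory per batch with one subdirectory per successor, the sketch below writes states to disk and loads them again. It is not Machetli's implementation: the function names and the zero-padded subdirectory names are hypothetical, and it assumes that states are stored with pickle, as the filename suggests.

```python
import pickle
from pathlib import Path

STATE_FILENAME = "state.pickle"  # value documented above


def write_batch(batch_dir, states):
    """Write each state to its own subdirectory of batch_dir.

    Assumption: subdirectories are named by zero-padded index.
    """
    run_dirs = []
    for index, state in enumerate(states):
        run_dir = Path(batch_dir) / f"{index:03d}"
        run_dir.mkdir(parents=True)
        with open(run_dir / STATE_FILENAME, "wb") as f:
            pickle.dump(state, f)
        run_dirs.append(run_dir)
    return run_dirs


def load_state(run_dir):
    """Load a state that was stored by write_batch."""
    with open(Path(run_dir) / STATE_FILENAME, "rb") as f:
        return pickle.load(f)
```

In a grid environment, a layout like this only works if the login node and the compute nodes see the same file system.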
-
evaluate_initial_state(evaluator_path, on_task_completed=None) → machetli.environments.EvaluationTask[source]¶ Evaluate the initial state that was stored earlier with remember_initial_state(). If the state wasn’t stored earlier, a SubmissionError is raised.
-
remember_initial_state(initial_state)[source]¶ Store the initial state in a run directory. The search uses the stored state to evaluate the initial state if none of its successors is an improvement.
-
run(evaluator_path, batch, on_task_completed) → list[source]¶ Evaluate the given successors with the given evaluator. The evaluator is run on all successors (possibly in parallel, depending on the environment). Every time an evaluation of a successor is completed, the callback on_task_completed is called.
- Parameters
evaluator_path – path to a script that is used to evaluate a successor. The user documentation contains more information on how to write an evaluator.
successors – list of Successors to be evaluated.
on_task_completed – callback function that will be called once for each successor after its evaluation is completed. The callback receives an EvaluationTask describing the result of the evaluation as its only parameter. As evaluations could be performed in parallel, the order in which the evaluations complete is not necessarily deterministic. The callback may return a list of indices into successors to indicate that those successors need not be evaluated any more.
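The callback contract described above can be sketched with a simplified sequential loop. This is an illustration only and not Machetli's code: run_sequentially and evaluate are hypothetical stand-ins, and the callback here receives a plain (index, status) tuple rather than an EvaluationTask.

```python
def run_sequentially(batch, evaluate, on_task_completed):
    """Evaluate items one by one and honor cancellation requests.

    `evaluate` is a hypothetical stand-in for running the evaluator script.
    """
    canceled = set()
    results = []
    for index, successor in enumerate(batch):
        if index in canceled:
            results.append((index, "canceled"))
            continue
        result = (index, evaluate(successor))
        results.append(result)
        # The callback may return indices that need not be evaluated any more.
        canceled.update(on_task_completed(result) or [])
    return results


batch = [3, 8, 5, 12]


def is_even(number):
    """Toy evaluator: the 'behavior' is being an even number."""
    return "behavior present" if number % 2 == 0 else "behavior not present"


def cancel_rest_on_hit(result):
    """Cancel all later successors once one shows the behavior."""
    index, status = result
    if status == "behavior present":
        return list(range(index + 1, len(batch)))
    return []


results = run_sequentially(batch, is_even, cancel_rest_on_hit)
```

In this toy run, the second item shows the behavior, so the remaining two are marked as canceled without being evaluated. In a parallel environment, completion order is not fixed, so a callback should not rely on it.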
-
class
machetli.environments.EvaluationJob(name, evaluator_path, batch_dir, tasks)[source]¶ An EvaluationJob consists of several
EvaluationTasks. It represents the evaluation of a batch of successors and carries information about the current status of that evaluation.
-
class
machetli.environments.EvaluationTask(successor, successor_id, run_dir)[source]¶ An EvaluationTask represents the evaluation of one successor and carries information about the current status of that evaluation.
-
CANCELED= 'canceled'¶ Status of tasks that were canceled by Machetli. This happens if the search determines that the evaluation of their successor is not needed.
-
CRITICAL= 'critical'¶ Status of tasks that failed to evaluate their successor because the evaluation stopped for an unknown reason, such as crashing the evaluation script.
-
DONE_AND_BEHAVIOR_NOT_PRESENT= 'behavior not present'¶ Status of tasks that successfully evaluated their successor but showed that this successor does not exhibit the behavior that the evaluator is checking for.
-
DONE_AND_BEHAVIOR_PRESENT= 'behavior present'¶ Status of tasks that successfully evaluated their successor and showed that this successor exhibits the behavior that the evaluator is checking for.
-
OUT_OF_RESOURCES= 'ran out of resources'¶ Status of tasks that failed to evaluate their successor because the evaluation ran out of time or memory.
-
PENDING= 'pending'¶ Status of tasks from the time they are started until they stop for any reason.
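The status values above fall into groups: successful evaluations, failed evaluations, and tasks that are pending or canceled. The sketch below counts tasks per group. The Status class is a stand-in that mirrors the documented strings, and summarize is a hypothetical helper, not part of Machetli.

```python
from collections import Counter


class Status:
    """Stand-in mirroring the status strings documented above."""
    CANCELED = "canceled"
    CRITICAL = "critical"
    DONE_AND_BEHAVIOR_NOT_PRESENT = "behavior not present"
    DONE_AND_BEHAVIOR_PRESENT = "behavior present"
    OUT_OF_RESOURCES = "ran out of resources"
    PENDING = "pending"


# Successful evaluations, regardless of whether the behavior was found.
DONE = {Status.DONE_AND_BEHAVIOR_PRESENT, Status.DONE_AND_BEHAVIOR_NOT_PRESENT}
# Evaluations that failed to produce a verdict.
FAILED = {Status.CRITICAL, Status.OUT_OF_RESOURCES}


def summarize(statuses):
    """Count statuses, merging the done and failed groups."""
    counts = Counter()
    for status in statuses:
        if status in DONE:
            counts["done"] += 1
        elif status in FAILED:
            counts["failed"] += 1
        else:
            counts[status] += 1
    return dict(counts)
```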
-
-
class
machetli.environments.LocalEnvironment(batch_size=1, loglevel=20)[source]¶ This environment evaluates all successors sequentially on the local machine.
See Environment for inherited options.
-
class
machetli.environments.SlurmEnvironment(email=None, extra_options=None, partition=None, qos=None, memory_per_cpu=None, cpus_per_task=1, nice=None, export=None, setup=None, batch_size=200, **kwargs)[source]¶ This environment evaluates multiple successors in parallel on the compute nodes of a cluster accessed through the Slurm grid engine.
- Parameters
email – Email address to notify once the search has finished
extra_options – Additional options passed to the Slurm script
partition – Slurm partition to use for job submission
qos – Slurm QOS to use for job submission
memory_per_cpu – Memory limit per CPU to use for Slurm jobs
cpus_per_task – Number of CPUs to reserve for evaluating a single successor
nice – Nice value to use for Slurm jobs (higher nice value = lower priority).
export – Environment variables to export from the login node to the compute nodes.
setup – Additional bash script to set up the compute nodes (loading modules, etc.).
batch_size – (default 200) Number of successors evaluated in parallel.
See Environment for inherited options.
-
BUSY_STATES= {'PENDING', 'REQUEUED', 'RUNNING', 'SUSPENDED'}¶ Slurm status codes that indicate that a job has not yet terminated.
-
DEFAULT_EXPORT= ['PATH']¶ Environment variables to export from the login node to the compute nodes. May be overridden in derived classes or with a constructor argument.
-
DEFAULT_MEMORY_PER_CPU= None¶ Memory limit per CPU to use for Slurm jobs if no limit is passed to the constructor. Must be overridden in derived classes.
-
DEFAULT_NICE= 0¶ Nice value to use for Slurm jobs (higher nice value = lower priority). May be overridden in derived classes or with a constructor argument.
-
DEFAULT_PARTITION= None¶ Slurm partition to use for job submission if no other partition is passed to the constructor. Must be overridden in derived classes.
-
DEFAULT_QOS= None¶ Slurm QOS to use for job submission if no other QOS is passed to the constructor. Must be overridden in derived classes.
-
DEFAULT_SETUP= ''¶ Additional bash script to set up the compute nodes (loading modules, etc.). May be overridden in derived classes or with a constructor argument.
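The pattern of class-level defaults that can be overridden in derived classes or by constructor arguments can be sketched as follows. SketchSlurmEnvironment is a simplified stand-in and not Machetli's SlurmEnvironment; the partition name and memory value in MyGridEnvironment are hypothetical, and raising ValueError for missing values is an assumption of this sketch.

```python
class SketchSlurmEnvironment:
    """Simplified stand-in, not Machetli's SlurmEnvironment."""
    DEFAULT_PARTITION = None
    DEFAULT_QOS = None
    DEFAULT_MEMORY_PER_CPU = None
    DEFAULT_NICE = 0

    def __init__(self, partition=None, qos=None, memory_per_cpu=None, nice=None):
        # Constructor arguments take precedence over class-level defaults.
        self.partition = self._pick(partition, self.DEFAULT_PARTITION)
        self.qos = self._pick(qos, self.DEFAULT_QOS)
        self.memory_per_cpu = self._pick(memory_per_cpu, self.DEFAULT_MEMORY_PER_CPU)
        self.nice = self._pick(nice, self.DEFAULT_NICE)
        missing = [
            name for name in ("partition", "qos", "memory_per_cpu")
            if getattr(self, name) is None
        ]
        if missing:
            # Assumption: this sketch rejects values that were never set.
            raise ValueError(f"no value or default for: {', '.join(missing)}")

    @staticmethod
    def _pick(argument, default):
        return argument if argument is not None else default


class MyGridEnvironment(SketchSlurmEnvironment):
    """Hypothetical specialization for some other grid."""
    DEFAULT_PARTITION = "my_partition"  # hypothetical partition name
    DEFAULT_QOS = "normal"
    DEFAULT_MEMORY_PER_CPU = "2000M"    # hypothetical limit
    DEFAULT_NICE = 5000
```

With this layout, MyGridEnvironment() uses the class defaults, while MyGridEnvironment(partition="other") overrides only the partition.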
-
DONE_STATES= {'COMPLETED'}¶ Slurm status codes that indicate that a job successfully terminated.
-
FILESYSTEM_TIME_INTERVAL= 3¶ Files that one node writes are not necessarily immediately available on all other nodes. If a file we expect to be there is not found, we check again after waiting for some seconds.
-
FILESYSTEM_TIME_LIMIT= 60¶ When a file is not found after repeated checks, we eventually give up and treat this as an error. This constant controls after how many seconds to give up.
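The interplay of FILESYSTEM_TIME_INTERVAL and FILESYSTEM_TIME_LIMIT can be sketched as a retry loop. The helper wait_for_file is hypothetical and only illustrates the idea; Machetli's own handling may differ.

```python
import os
import time

FILESYSTEM_TIME_INTERVAL = 3   # seconds between checks, as documented above
FILESYSTEM_TIME_LIMIT = 60     # seconds before giving up, as documented above


def wait_for_file(path, interval=FILESYSTEM_TIME_INTERVAL,
                  limit=FILESYSTEM_TIME_LIMIT):
    """Return True once path exists, or False after about `limit` seconds."""
    deadline = time.monotonic() + limit
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would treat a False result as an error, since the file did not appear within the time limit.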
-
POLLING_TIME_INTERVAL= 15¶ While jobs are running, the login node periodically checks the status of all pending tasks. This constant controls how many seconds to wait before polling again.
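A polling loop built on BUSY_STATES, DONE_STATES and POLLING_TIME_INTERVAL might look like the sketch below. The function names are hypothetical, get_states stands in for whatever query returns the current Slurm state of each task, and treating every state that is neither busy nor done as a failure is an assumption of this sketch.

```python
import time

# Values copied from the documentation above.
BUSY_STATES = {"PENDING", "REQUEUED", "RUNNING", "SUSPENDED"}
DONE_STATES = {"COMPLETED"}
POLLING_TIME_INTERVAL = 15


def wait_until_finished(get_states, interval=POLLING_TIME_INTERVAL):
    """Poll until no task is in a busy state and return the final states.

    `get_states` is a hypothetical callable returning {task_id: slurm_state}.
    """
    while True:
        states = get_states()
        if not any(state in BUSY_STATES for state in states.values()):
            return states
        time.sleep(interval)


def split_by_outcome(states):
    """Split task ids into successfully completed ones and all others."""
    done = sorted(t for t, s in states.items() if s in DONE_STATES)
    other = sorted(t for t, s in states.items() if s not in DONE_STATES)
    return done, other
```

A longer interval puts less load on the scheduler but delays the point at which the search notices finished tasks.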