Hyperctl

Hyperctl is a general tool for multi-job management, including but not limited to training, testing, and comparison jobs. It is packaged with Hypernets and intended to provide convenience at every development stage.

Concepts

Job

A command-line job that accepts parameters. Hyperctl provides a Python API to read a job's parameters, so the command line can execute a Python script that uses the API to obtain the parameters it needs.
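For example, a minimal job script only needs to read its params through the API (a sketch using hyperctl.get_job_params, which is introduced in the Quick start below):

from hypernets import hyperctl

params = hyperctl.get_job_params()  # dict of the params configured for this job
print(params)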

Batch

A batch of jobs. All the status files and output files of jobs in the same batch are in the working directory of the batch.

Scheduler

The job scheduler, which schedules the jobs in a batch onto appropriate machine resources and manages computing resources.

Backend

The platform on which jobs run. It can be the local machine (stand-alone mode) or multiple remote nodes accessed through the SSH protocol.

Quick start

Run batch using command line tool

After installing Hypernets, typing hyperctl prints the following help text, which lists four subcommands: run, generate, batch, and job:

$ hyperctl
usage: hyperctl [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug] {run,generate,batch,job} ...

hyperctl command is used to manage jobs

positional arguments:
  {run,generate,batch,job}
    run                 run jobs
    generate            generate specific jobs json file
    batch               batch operations
    job                 job operations

optional arguments:
  -h, --help            show this help message and exit

Console outputs:
  --log-level LOG_LEVEL
                        logging level, default is INFO
  -error                alias of "--log-level=ERROR"
  -warn                 alias of "--log-level=WARN"
  -info                 alias of "--log-level=INFO"
  -debug                alias of "--log-level=DEBUG"


As an example, let's use Hyperctl to tune the tol parameter of sklearn.linear_model.LogisticRegression. Create a job script ~/sklearn_iris_example.py with the following content:

import os
import pickle as pkl
from hypernets import hyperctl
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

job_params = hyperctl.get_job_params()  # read job params as a dict from hyperctl

tol = job_params['tol']

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8086)

lr = LogisticRegression(tol=tol)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

print(f"tol: {tol}, accuracy_score: {accuracy_score(y_pred, y_test)}")

# persist assets
report_dir = os.path.expanduser("~/report/")  # expand "~", which os.makedirs does not do
os.makedirs(report_dir, exist_ok=True)

with open(os.path.join(report_dir, f"model_tol_{tol}.pkl"), 'wb') as f:
    pkl.dump(lr, f)

Hyperctl defines jobs in a JSON file. For example, create a file named batch.json that configures two jobs, setting the parameter tol to 1 and 100 respectively:

{
    "name": "sklearn_iris_example",
    "jobs": [{
            "name": "tol_1",
            "params": {
                "tol": 1
            },
            "command": "python ~/sklearn_iris_example.py"
        },
        {
            "name": "tol_100",
            "params": {
                "tol": 100
            },
            "command": "python ~/sklearn_iris_example.py"
        }
    ]
}

Note

Make sure that the Python interpreter used by the job command has scikit-learn installed.

Run the batch with the command:

$ hyperctl run --config ./batch.json

After the batch finishes, view the output log files:

~/hyperctl-batches-working-dir/sklearn_iris_example/tol_1/stdout
----------------------------------------------------------------
tol: 1, accuracy_score: 0.9333333333333333
~/hyperctl-batches-working-dir/sklearn_iris_example/tol_100/stdout
------------------------------------------------------------------
tol: 100, accuracy_score: 0.36666666666666664
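The models persisted by each job can be loaded back for comparison; a minimal sketch, assuming both jobs have written their pickles to ~/report/ as in the script above:

import os
import pickle as pkl

report_dir = os.path.expanduser("~/report/")
for tol in (1, 100):
    with open(os.path.join(report_dir, f"model_tol_{tol}.pkl"), "rb") as f:
        model = pkl.load(f)  # the LogisticRegression fitted by the corresponding job
    print(tol, model)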

Run batch using API

The API is more flexible than the command line tool for managing batches:

from hypernets.hyperctl.appliation import BatchApplication
from hypernets.hyperctl.callbacks import ConsoleCallback  # NOTE: import path assumed; adjust if it differs in your version
from hypernets.hyperctl.batch import Batch

batch = Batch(name="remote-batch-example",
              data_dir="~/hyperctl/remote-batch-example",
              job_command="python ~/sklearn_iris_example.py")

batch.add_job(name='job1', params={"tol": 1})
batch.add_job(name='job2', params={"tol": 2})
batch.add_job(name='job3', params={"tol": 3})


backend_conf = {
    "type": "remote",
    "machines": [  # modify to your host configuration
        {"connection": {"hostname": "172.20.30.105", "username": "hyperctl", "password": "hyperctl"}},
        {"connection": {"hostname": "172.20.30.106", "username": "hyperctl", "password": "hyperctl"}},
        {"connection": {"hostname": "172.20.30.107", "username": "hyperctl", "password": "hyperctl"}}
    ]
}


app = BatchApplication(batch, server_host="172.20.30.105",  # modify to your host configuration
                       server_port=8061,
                       scheduler_exit_on_finish=True,
                       scheduler_interval=1000,
                       backend_conf=backend_conf,
                       independent_tmp=True,
                       scheduler_callbacks=[ConsoleCallback()])

app.start()
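For development on a single machine, the same API can be used with a local backend; a sketch, assuming BatchApplication accepts the same backend_conf structure as the JSON configuration files:

from hypernets.hyperctl.appliation import BatchApplication
from hypernets.hyperctl.batch import Batch

batch = Batch(name="local-batch-example",
              data_dir="~/hyperctl/local-batch-example",
              job_command="python ~/sklearn_iris_example.py")
batch.add_job(name='job1', params={"tol": 1})

app = BatchApplication(batch,
                       backend_conf={"type": "local"},  # run jobs on the local machine, no SSH required
                       scheduler_exit_on_finish=True,
                       scheduler_interval=1000)
app.start()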

Debug job in development stage

Job scripts normally have to be scheduled through BatchApplication, so the script runs in a separate process, which is inconvenient for developing and debugging it. In that case, we can inject test parameters into the job and run the job script directly:

import os
import pickle as pkl
from hypernets import hyperctl
from hypernets.hyperctl import api  # NOTE: import path assumed for api.inject; adjust if it differs in your version
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

test_params = {  # define your test params
    "tol": 1
}

api.inject(params=test_params)  # inject the test params; NOTE: remove this line when you finish debugging, otherwise the script cannot read params from BatchApplication

job_params = hyperctl.get_job_params()

tol = job_params['tol']

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8086)

lr = LogisticRegression(tol=tol)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

print(f"tol: {tol}, accuracy_score: {accuracy_score(y_pred, y_test)}")

# persist assets
report_dir = os.path.expanduser("~/report/")  # expand "~", which os.makedirs does not do
os.makedirs(report_dir, exist_ok=True)

with open(os.path.join(report_dir, f"model_tol_{tol}.pkl"), 'wb') as f:
    pkl.dump(lr, f)

Now we can run or debug the job script directly. Note that api.inject(params=test_params) must be removed if the script needs to receive its params from BatchApplication again.

Generate jobs from template

Hyperctl generates a batch of job configs by arranging and combining the parameters in a configuration template; the generated file can then be used to run the batch. Here is an example of using a template file to generate a batch config file. First, create a template file job-template.yml with the following content:

params:
    learning_rate: [0.1,0.2]
    max_depth: [3, 5]
command: python3 cli.py

Then execute command to generate batch config file:

$ hyperctl generate --template ./job-template.yml --output ./batch.json

Here is the generated batch.json file:

{
    "name": "eVqNV5Ut1",
    "job": [{
        "name": "eaqNV5Ut1",
        "params": {
            "learning_rate": 0.1,
            "max_depth": 3
        },
        "command": "python3 cli.py"
    }, {
        "name": "ebqNV5Ut1",
        "params": {
            "learning_rate": 0.1,
            "max_depth": 5
        },
        "command": "python3 cli.py"
    }, {
        "name": "ecqNV5Ut1",
        "params": {
            "learning_rate": 0.2,
            "max_depth": 3
        },
        "command": "python3 cli.py"
    }, {
        "name": "edqNV5Ut1",
        "params": {
            "learning_rate": 0.2,
            "max_depth": 5
        },
        "command": "python3 cli.py"
    }]
}
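The four generated jobs are the Cartesian product of the params lists in the template. Conceptually, the expansion works like this sketch (not Hyperctl's actual implementation):

from itertools import product

params = {"learning_rate": [0.1, 0.2], "max_depth": [3, 5]}
keys = list(params)
combinations = [dict(zip(keys, values)) for values in product(*params.values())]
print(combinations)  # 4 param dicts: (0.1, 3), (0.1, 5), (0.2, 3), (0.2, 5)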

Batch configuration file references

Examples

LocalBackend

{
    "name": "local_backend_example",
    "jobs": [
        {
            "name": "job1",
            "params": {
                "param1": 1
            },
            "command": "sleep 3"
        }
    ],
    "backend": {
        "type": "local"
    }
}

RemoteSSHBackend

{
    "name": "local_backend_example",
    "jobs": [
        {
            "name": "job1",
            "params": {
                "param1": 1
            },
            "command": "sleep 3"
        }
    ],
    "backend": {
        "type": "remote",
        "machines": [
            {
                "connection": {
                    "hostname": "host1",
                    "username": "hyperctl",
                    "password": "hyperctl"
                }
            }
        ]
    },
    "server": {
      "host": "192.168.10.206"
    }
}

Configuration references

BatchApplicationConfig

| Field Name | Type | Description |
| --- | --- | --- |
| name | str, required | Batch name; should be unique among batches. |
| jobs | list[JobConfig], required | Jobs to run. |
| backend | BackendConfig, optional | Platform the jobs run on; default is LocalBackendConfig. |
| server | ServerConfig, optional | Server settings. |
| scheduler | SchedulerConfig, optional | Scheduler settings. |
| batches_data_dir | str, optional | Batches working directory, where the output files of batches are stored; Hyperctl creates a sub-directory named after each batch in this directory. Read from the environment variable HYPERCTL_BATCHES_DATA_DIR by default; if that is not set, ~/hyperctl-batches-data-dir is used. |
| version | str, optional | If None, the currently running version is used; default is None. |

JobConfig

| Field Name | Type | Description |
| --- | --- | --- |
| name | str, optional | Unique within the batch; if null, a UUID is generated as the job name. Specifying a name is recommended: together with the batch name, it lets already-executed jobs be skipped when the batch is re-executed. |
| params | dict, required | Job params; can be obtained through the API hypernets.hyperctl.get_job_params. |
| command | str, required | Command to run the job. If it executes a file, an absolute path or a path relative to {execution.working_dir} is recommended. |
| working_dir | str, optional | Working directory in which to run the command; default is {batches_data_dir}/{batch_name}/{job_name}. |

Note

A job writes its output files to {batches_data_dir}/{batch_name}/{job_name}; this directory usually contains:

  • stdout: standard output
  • stderr: standard error
  • run.sh: shell script to run the job

BackendConfig

One of:

LocalBackendConfig

Runs the batch in stand-alone mode; please refer to the example LocalBackend.

| Field Name | Type | Description |
| --- | --- | --- |
| type | "local" | |
| environments | dict, optional | Environment variables exported to the job process. |
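For example, the backend section of a batch.json that exports an environment variable to every job process (a sketch; the variable name is illustrative):

"backend": {
    "type": "local",
    "environments": {
        "CUDA_VISIBLE_DEVICES": "0"
    }
}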

RemoteBackendConfig

Hyperctl supports running jobs in parallel on remote machines; this mode uses multiple machines to speed up the batch. Jobs are distributed to the remote nodes through the SSH protocol, so every node that runs jobs must run an SSH service and provide a connection account. Please refer to the example RemoteSSHBackend.

| Field Name | Type | Description |
| --- | --- | --- |
| type | "remote" | |
| machines | list[RemoteMachineConfig], required | Connection and configuration information of the remote machines. |

RemoteMachineConfig

| Field Name | Type | Description |
| --- | --- | --- |
| connection | SSHConnectionConfig, required | Connection information for the remote machine. |
| environments | dict, optional | Environment variables exported to the job process. |

SSHConnectionConfig

| Field Name | Type | Description |
| --- | --- | --- |
| hostname | str, required | IP address or hostname of the remote machine. |
| username | str, required | Username for the remote machine. |
| password | str, required | Password for the remote machine. |

ServerConfig

| Field Name | Type | Description |
| --- | --- | --- |
| host | str, optional | Address the HTTP server binds to. With a remote backend, it should be an IP address reachable from the remote machines; otherwise the jobs will fail because they cannot access the API server. Default is localhost. |
| port | int, optional | HTTP server port; default is 8060. |

SchedulerConfig

| Field Name | Type | Description |
| --- | --- | --- |
| interval | int, optional | Scheduling interval in milliseconds; default is 5000. |
| exit_on_finish | boolean, optional | Whether to exit the process when all jobs are finished; default is false. |
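For example, the scheduler section of a batch.json that polls every second and exits when all jobs are finished (a sketch based on the fields above):

"scheduler": {
    "interval": 1000,
    "exit_on_finish": true
}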

Job template configuration file references

Examples

Basic example

Refer to Generate jobs from template.

Configuration references

JobTemplateConfig

| Field Name | Type | Description |
| --- | --- | --- |
| name | str, optional | Refer to BatchApplicationConfig.name; if omitted, a name is generated (as in the example above). |
| params | dict[str, list], required | Lists of job params, arranged and combined to generate the job configs. |
| command | str, required | Refer to JobConfig.command. |
| working_dir | str, optional | Refer to JobConfig.working_dir. |
| backend | BackendConfig, optional | Refer to BatchApplicationConfig.backend. |
| batches_data_dir | str, optional | Refer to BatchApplicationConfig.batches_data_dir. |
| server | ServerConfig, optional | Refer to BatchApplicationConfig.server. |
| scheduler | SchedulerConfig, optional | Refer to BatchApplicationConfig.scheduler. |
| version | str, optional | Refer to BatchApplicationConfig.version. |
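A fuller template can also set the backend, data directory, and scheduler; a sketch with illustrative values, assuming these fields follow the same YAML structure as their batch.json counterparts:

params:
    learning_rate: [0.1, 0.2]
    max_depth: [3, 5]
command: python3 cli.py
backend:
    type: local
batches_data_dir: ~/hyperctl-batches-data-dir
scheduler:
    exit_on_finish: true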