Training a Model & Configuration Explanation
This tutorial shows how one can use EasyTPP
to train the implemented models.
In principle, we first need to create a config YAML file containing all the input configurations that guide the training and evaluation process. The overall structure of a config file is shown below:
pipeline_config_id: ... # name of the config for guiding the pipeline

data:
  [DATASET ID]: # name of the dataset, e.g., taxi
    ...

[EXPERIMENT ID]: # name of the experiment to run
  base_config:
    ...
  model_config:
    ...
After the config file is set up, we can start the pipeline by running the script with the config directory and experiment id specified. We currently provide a preset script at examples/train_nhp.py.
Step 1: Set up the config file containing data and model configs
To be specific, one needs to define the following entries in the config file:
pipeline_config_id: the registered name of an EasyTPP.Config object, such as runner_config or hpo_runner_config. Based on this entry, the corresponding configuration class is loaded to construct the pipeline.
pipeline_config_id: runner_config
data: dataset specifications. One can put multiple dataset specifications in the config file, but only one is used per experiment.
[DATASET ID]: name of the dataset, e.g., taxi.
train_dir, valid_dir, test_dir: directories of the data files. For the moment we only accept pkl files (please see Dataset for details).
data_spec: define the event type information.
data:
  taxi:
    data_format: pkl
    train_dir: ../data/taxi/train.pkl
    valid_dir: ../data/taxi/dev.pkl
    test_dir: ../data/taxi/test.pkl
    data_spec:
      num_event_types: 7 # num of event types, excluding pad events
      pad_token_id: 6 # event type index for pad events
      padding_side: right # pad at the right end of the sequence
      truncation_side: right # truncate at the right end of the sequence
      max_len: 100 # max sequence length used as model input
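Before training, it can be useful to sanity-check the pkl files. The snippet below is a minimal sketch that assumes the commonly used Neural-Hawkes-style pickle layout (a dict holding a list of event sequences, where each event carries keys such as type_event, time_since_start and time_since_last_event); the 'train' key and the event keys here are assumptions, so adapt them to your own files.

import pickle

# Hypothetical path, taken from the config above; adjust as needed.
with open('../data/taxi/train.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())    # top-level keys of the pickle
seqs = data['train']  # assumed key holding the list of event sequences
print(len(seqs))      # number of sequences
print(seqs[0][:3])    # first few events of the first sequence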
[EXPERIMENT ID]: name of the experiment to run in the pipeline. It contains three blocks of configs:
base_config contains the pipeline-framework-related specifications.
base_config:
  stage: train # train, eval or generate
  backend: tensorflow # tensorflow or torch
  dataset_id: conttime # name of the dataset
  runner_id: std_tpp # registered name of the pipeline runner
  model_id: RMTPP # registered name of the implemented model
  base_dir: './checkpoints/' # base dir to save the logs and models
model_config contains the model-related specifications.
model_config:
  hidden_size: 32
  time_emb_size: 16
  num_layers: 2
  num_heads: 2
  mc_num_sample_per_step: 20
  sharing_param_layer: False
  loss_integral_num_sample_per_step: 20
  dropout: 0.0
  use_ln: False
  thinning_params: # thinning algorithm for event sampling
    num_seq: 10
    num_sample: 1
    num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in the thinning algorithm
    look_ahead_time: 10
    patience_counter: 5 # the maximum number of iterations used in adaptive thinning
    over_sample_rate: 5
    num_samples_boundary: 5
    dtime_max: 5
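For intuition about the thinning_params above: thinning draws candidate next-event times from a homogeneous proposal process with an intensity upper bound, and accepts each candidate with probability intensity / bound. The following is a toy, self-contained sketch of this idea that mirrors the parameter names above; it is not EasyTPP's implementation, and intensity_fn, the grid-based bound and the fallback behaviour are illustrative assumptions.

import numpy as np

def thinning_next_time(intensity_fn, t_start, over_sample_rate=5,
                       num_exp=500, look_ahead_time=10, dtime_max=5):
    """Draw one candidate next-event time after t_start via Ogata-style thinning."""
    # Upper bound for the intensity on the look-ahead window, inflated by over_sample_rate.
    grid = np.linspace(t_start, t_start + look_ahead_time, 50)
    bound = over_sample_rate * max(intensity_fn(s) for s in grid)

    # Propose arrivals of a homogeneous Poisson(bound) process: i.i.d. Exp(bound) gaps,
    # num_exp of them drawn at one time, then accept each with prob intensity / bound.
    t = t_start
    for gap in np.random.exponential(1.0 / bound, size=num_exp):
        t += gap
        if np.random.uniform() <= intensity_fn(t) / bound:
            return t  # accepted sample
    return t_start + dtime_max  # fallback if no proposal is accepted

# Example: a stationary process with constant intensity 1.5.
print(thinning_next_time(lambda t: 1.5, t_start=0.0))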
trainer_config contains the training-related specifications.
trainer_config: # trainer arguments
  seed: 2019
  gpu: 0
  batch_size: 256
  max_epoch: 10
  shuffle: False
  optimizer: adam
  learning_rate: 1.e-3
  valid_freq: 1
  use_tfb: False
  metrics: ['acc', 'rmse']
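For reference, the two metrics measure next-event prediction quality: 'acc' is the accuracy of the predicted event types and 'rmse' is the root-mean-squared error of the predicted event times. Below is a toy numpy illustration of how such metrics are computed; the arrays are made-up values, not EasyTPP output.

import numpy as np

# Made-up predictions and ground truth for three next-event predictions.
pred_types, true_types = np.array([1, 0, 2]), np.array([1, 1, 2])
pred_dtimes, true_dtimes = np.array([0.5, 1.2, 0.8]), np.array([0.4, 1.0, 1.1])

acc = (pred_types == true_types).mean()                    # next-event-type accuracy
rmse = np.sqrt(((pred_dtimes - true_dtimes) ** 2).mean())  # event-time RMSE
print(acc, rmse)  # 0.666..., ~0.216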
A complete example of these files can be seen at examples/example_config.
Step 2: Run the training script
To run the training process, we simply need two classes:

Config: it reads the config file specified in Step 1 and does some processing to form a complete configuration.
Runner: it reads the configuration and sets up the whole pipeline for training, evaluation and generation.
The following code is an example, copied from examples/train_nhp.py.
import argparse

from easy_tpp.config_factory import Config
from easy_tpp.runner import Runner


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--config_dir', type=str, required=False,
                        default='configs/experiment_config.yaml',
                        help='Dir of configuration yaml to train and evaluate the model.')
    parser.add_argument('--experiment_id', type=str, required=False, default='RMTPP_train',
                        help='Experiment id in the config file.')
    args = parser.parse_args()

    config = Config.build_from_yaml_file(args.config_dir, experiment_id=args.experiment_id)
    model_runner = Runner.build_from_config(config)
    model_runner.run()


if __name__ == '__main__':
    main()
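With the defaults above, the script can be launched from the command line as, for example: python examples/train_nhp.py --config_dir configs/experiment_config.yaml --experiment_id RMTPP_train.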
Check out the output
During training, the log, the best model (selected by validation-set performance) and the complete configuration file are all saved. The directory of the saved files is specified by base_dir in base_config, i.e., in the ./checkpoints/ folder; one can find the correct subfolder by concatenating the experiment_id and the running timestamp. Inside that subfolder there is a complete configuration file, e.g., RMTPP_train_output.yaml, that records all the information used in the pipeline:
data_config:
  train_dir: ../data/conttime/train.pkl
  valid_dir: ../data/conttime/dev.pkl
  test_dir: ../data/conttime/test.pkl
  specs:
    num_event_types_pad: 6
    num_event_types: 5
    event_pad_index: 5
  data_format: pkl
base_config:
  stage: train
  backend: tensorflow
  dataset_id: conttime
  runner_id: std_tpp
  model_id: RMTPP
  base_dir: ./checkpoints/
  exp_id: RMTPP_train
  log_folder: ./checkpoints/98888_4299965824_221205-153425
  saved_model_dir: ./checkpoints/98888_4299965824_221205-153425/models/saved_model
  saved_log_dir: ./checkpoints/98888_4299965824_221205-153425/log
  output_config_dir: ./checkpoints/98888_4299965824_221205-153425/RMTPP_train_output.yaml
model_config:
  hidden_size: 32
  time_emb_size: 16
  num_layers: 2
  num_heads: 2
  mc_num_sample_per_step: 20
  sharing_param_layer: false
  loss_integral_num_sample_per_step: 20
  dropout: 0.0
  use_ln: false
  seed: 2019
  gpu: 0
  thinning_params:
    num_seq: 10
    num_sample: 1
    num_exp: 500
    look_ahead_time: 10
    patience_counter: 5
    over_sample_rate: 5
    num_samples_boundary: 5
    dtime_max: 5
    num_step_gen: 1
  trainer:
    batch_size: 256
    max_epoch: 10
    shuffle: false
    optimizer: adam
    learning_rate: 0.001
    valid_freq: 1
    use_tfb: false
    metrics:
    - acc
    - rmse
  seq_pad_end: true
  is_training: true
  num_event_types_pad: 6
  num_event_types: 5
  event_pad_index: 5
  model_id: RMTPP
If we set use_tfb to true, we can launch TensorBoard to track the training process; see Running Tensorboard for details.
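As a hint, TensorBoard itself is started separately by pointing its standard --logdir flag at the run's output folder, e.g., tensorboard --logdir ./checkpoints/98888_4299965824_221205-153425; the exact log location used by EasyTPP is described in Running Tensorboard.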