Training a Model & Configuration Explanation

This tutorial shows how one can use EasyTPP to train the implemented models.

In principle, we first need to write a config YAML file containing all the input configurations that guide the training and evaluation process. The overall structure of a config file is shown below:

pipeline_config_id: ...  # name of the config for guiding the pipeline

data:
    [DATASET ID]:  # name of the dataset, e.g., taxi
        ...

[EXPERIMENT ID]:   # name of the experiment to run
    base_config:
        ...
    model_config:
        ...
    trainer_config:
        ...

After the config file is set up, we can start the pipeline by running a script that specifies the config directory and the experiment id. We currently provide a preset script at examples/train_nhp.py.

Step 1: Set up the config file containing data and model configs

To be specific, one needs to define the following entries in the config file:

  • pipeline_config_id: registered name of the EasyTPP.Config object to use, such as runner_config or hpo_runner_config. Based on this value, the corresponding configuration class is loaded to construct the pipeline.

pipeline_config_id: runner_config
  • data: dataset specifics. Multiple dataset specifications can be put in the config file, but only one is used in each experiment.

    • [DATASET ID]: name of the dataset, e.g., taxi.

    • train_dir, valid_dir, test_dir: paths to the data files. For the moment we only accept pkl files (please see Dataset for details). A quick way to take a look at such a file is sketched right after the data block below.

    • data_spec: defines the event type information.

data:
  taxi:
    data_format: pkl
    train_dir: ../data/taxi/train.pkl
    valid_dir: ../data/taxi/dev.pkl
    test_dir: ../data/taxi/test.pkl
    data_spec:
        num_event_types: 7  # num of types excluding pad events.
        pad_token_id: 6    # event type index for pad events
        padding_side: right   # pad at the right end of the sequence
        truncation_side: right   # truncate at the right end of the sequence
        max_len: 100            # max sequence length used as model input
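
Because the data files are plain pickle files, it is easy to take a quick look at one before training. The snippet below is only a generic inspection sketch: the file path is taken from the taxi example above, and the printed structure depends on how the dataset was packed (see the Dataset page for the exact format).

import pickle

# Generic peek at a pkl data file (path taken from the example above).
# The exact layout of the loaded object depends on the dataset packaging.
with open('../data/taxi/train.pkl', 'rb') as f:
    data = pickle.load(f)

if isinstance(data, dict):
    print('top-level keys:', list(data.keys()))
else:
    print('type:', type(data), 'length:', len(data))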
  • [EXPERIMENT ID]: name of the experiment to run in the pipeline. It contains the following blocks of configs:

base_config contains the specifications related to the pipeline framework.

base_config:
    stage: train          # train, eval or generate
    backend: tensorflow   # tensorflow or torch
    dataset_id: conttime  # name of the dataset; must match a [DATASET ID] defined in the data block
    runner_id: std_tpp    # registered name of the pipeline runner
    model_id: RMTPP       # registered name of the implemented model
    base_dir: './checkpoints/'   # base dir to save the logs and models

model_config contains the specifications related to the model.

model_config:
    hidden_size: 32
    time_emb_size: 16
    num_layers: 2
    num_heads: 2
    mc_num_sample_per_step: 20
    sharing_param_layer: False
    loss_integral_num_sample_per_step: 20
    dropout: 0.0
    use_ln: False
    thinning_params:   # parameters of the thinning algorithm for event sampling
        num_seq: 10
        num_sample: 1
        num_exp: 500            # number of i.i.d. Exp(intensity_bound) draws at one time in the thinning algorithm
        look_ahead_time: 10
        patience_counter: 5     # the maximum number of iterations used in adaptive thinning
        over_sample_rate: 5
        num_samples_boundary: 5
        dtime_max: 5
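
The thinning_params govern the thinning (rejection-sampling) procedure used to draw event times when sampling from the model. The following is only a simplified illustration of the idea behind parameters such as num_exp, over_sample_rate and dtime_max; it is not EasyTPP's actual implementation, and the intensity function and bound estimation here are made-up stand-ins.

import numpy as np

def draw_next_dtime(intensity_fn, dtime_max=5.0, over_sample_rate=5.0, num_exp=500, seed=0):
    # Simplified thinning: sample the elapsed time until the next event for a
    # given (total) intensity function of elapsed time. Illustrative only.
    rng = np.random.default_rng(seed)
    # Crude upper bound of the intensity on [0, dtime_max], inflated by over_sample_rate.
    grid = np.linspace(0.0, dtime_max, 20)
    bound = over_sample_rate * max(intensity_fn(t) for t in grid)
    # Draw i.i.d. Exp(bound) gaps (this is what num_exp controls) and accept/reject.
    t = 0.0
    for gap in rng.exponential(1.0 / bound, size=num_exp):
        t += gap
        if t > dtime_max:
            break
        if rng.uniform() * bound <= intensity_fn(t):
            return t          # accepted: elapsed time until the next event
    return dtime_max          # fall back to the horizon if nothing was accepted

# Toy usage with an exponentially decaying intensity.
print(draw_next_dtime(lambda t: 1.0 + 0.5 * np.exp(-t)))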

trainer_config contains the specifications related to training.

trainer_config:   # trainer arguments
    seed: 2019
    gpu: 0
    batch_size: 256
    max_epoch: 10
    shuffle: False
    optimizer: adam
    learning_rate: 1.e-3
    valid_freq: 1
    use_tfb: False
    metrics: ['acc', 'rmse']

A complete example of these files can be seen at examples/example_config.
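
Before launching a run, it can help to sanity-check that the YAML parses and that the chosen experiment's dataset_id points at a dataset defined in the data block. Below is a minimal sketch using PyYAML; the file path and experiment id are just placeholders following the examples in this tutorial.

import yaml

config_path = 'configs/experiment_config.yaml'   # placeholder path, adjust to your file
experiment_id = 'RMTPP_train'                    # placeholder experiment id

with open(config_path) as f:
    cfg = yaml.safe_load(f)

exp = cfg[experiment_id]
dataset_id = exp['base_config']['dataset_id']
assert dataset_id in cfg['data'], f'{dataset_id} is not defined under the data block'
print('backend:', exp['base_config']['backend'])
print('model:  ', exp['base_config']['model_id'])
print('train:  ', cfg['data'][dataset_id]['train_dir'])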

Step 2: Run the training script

To run the training process, we simply need to use two classes:

  1. Config: it reads the config file specified in Step 1 and does some processing to form a complete configuration.

  2. Runner: it reads the configuration and sets up the whole pipeline for training, evaluation and generation.

The following code is an example, copied from examples/train_nhp.py.

import argparse
from easy_tpp.config_factory import Config
from easy_tpp.runner import Runner


def main():
    parser = argparse.ArgumentParser()

    parser.add_argument('--config_dir', type=str, required=False, default='configs/experiment_config.yaml',
                        help='Dir of configuration yaml to train and evaluate the model.')

    parser.add_argument('--experiment_id', type=str, required=False, default='RMTPP_train',
                        help='Experiment id in the config file.')

    args = parser.parse_args()

    config = Config.build_from_yaml_file(args.config_dir, experiment_id=args.experiment_id)

    model_runner = Runner.build_from_config(config)

    model_runner.run()


if __name__ == '__main__':
    main()
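
With the default arguments set in the script above, it can be launched as, for example:

python examples/train_nhp.py --config_dir configs/experiment_config.yaml --experiment_id RMTPP_train

Both flags are optional here because the script already provides them as defaults.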

Check out the output

During training, the log, the best model (selected by performance on the validation set), and the complete configuration file are all saved. The directory of the saved files is determined by base_dir in the base_config block of the config file.

In the ./checkpoints/ folder, one can find the subfolder of each run, whose name is formed by concatenating the experiment_id and the running timestamps. Inside that subfolder there is a complete configuration file, e.g., RMTPP_train_output.yaml, that records all the information used in the pipeline. Its content looks like the following:

data_config:
  train_dir: ../data/conttime/train.pkl
  valid_dir: ../data/conttime/dev.pkl
  test_dir: ../data/conttime/test.pkl
  specs:
    num_event_types_pad: 6
    num_event_types: 5
    event_pad_index: 5
  data_format: pkl
base_config:
  stage: train
  backend: tensorflow
  dataset_id: conttime
  runner_id: std_tpp
  model_id: RMTPP
  base_dir: ./checkpoints/
  exp_id: RMTPP_train
  log_folder: ./checkpoints/98888_4299965824_221205-153425
  saved_model_dir: ./checkpoints/98888_4299965824_221205-153425/models/saved_model
  saved_log_dir: ./checkpoints/98888_4299965824_221205-153425/log
  output_config_dir: ./checkpoints/98888_4299965824_221205-153425/RMTPP_train_output.yaml
model_config:
  hidden_size: 32
  time_emb_size: 16
  num_layers: 2
  num_heads: 2
  mc_num_sample_per_step: 20
  sharing_param_layer: false
  loss_integral_num_sample_per_step: 20
  dropout: 0.0
  use_ln: false
  seed: 2019
  gpu: 0
  thinning_params:
    num_seq: 10
    num_sample: 1
    num_exp: 500
    look_ahead_time: 10
    patience_counter: 5
    over_sample_rate: 5
    num_samples_boundary: 5
    dtime_max: 5
    num_step_gen: 1
  trainer:
    batch_size: 256
    max_epoch: 10
    shuffle: false
    optimizer: adam
    learning_rate: 0.001
    valid_freq: 1
    use_tfb: false
    metrics:
    - acc
    - rmse
    seq_pad_end: true
  is_training: true
  num_event_types_pad: 6
  num_event_types: 5
  event_pad_index: 5
  model_id: RMTPP
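
Since the output file is ordinary YAML, its contents, e.g. the saved model and log directories, can be read back programmatically. A small sketch (the path is simply the run folder shown above; substitute your own run folder):

import yaml

out_path = './checkpoints/98888_4299965824_221205-153425/RMTPP_train_output.yaml'  # example run folder from above
with open(out_path) as f:
    out_cfg = yaml.safe_load(f)

print('saved model dir:', out_cfg['base_config']['saved_model_dir'])
print('log dir:        ', out_cfg['base_config']['saved_log_dir'])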

If we set use_tfb to true, we can launch TensorBoard to track the training process; see Running Tensorboard for details.
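
If TensorBoard is installed, it can then be pointed at the run's output folder; the exact subdirectory that holds the event files is described in Running Tensorboard. For example (with <run folder> replaced by the timestamped folder under ./checkpoints/ mentioned above):

tensorboard --logdir ./checkpoints/<run folder>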