Model Implementation Details

Basic structure

In the model folder, torch_basemodel (/model/torch_model/torch_basemodel.py) / tf_basemodel (/model/tf_model/tf_basemodel.py) implement the loglikelihood computation and the sampling procedures that are common to all the TPP models. The inherited classes define the models with specific structures, which are explained in the sections below.
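
In code, the layout looks roughly like the sketch below (class and method names are illustrative; the authoritative definitions are in the files referenced above):

import torch

class TorchBaseModel(torch.nn.Module):        # see torch_basemodel.py for the real base class
    def compute_loglikelihood(self, lambda_at_event, lambdas_loss_samples,
                              time_delta_seq, seq_mask, lambda_type_mask):
        # shared event / non-event loglikelihood computation (next section)
        ...

class NHP(TorchBaseModel):                    # an inherited model with its specific structure
    def forward(self, time_seqs, time_delta_seqs, type_seqs):
        # model-specific encoder producing the intensities that are fed into
        # compute_loglikelihood
        ...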

Computing the loglikelihood of non-pad event sequence

The loglikelihood computation, following the definition in Equation 8 of The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process, is shared by all the TPP models.

It takes time_delta_seqs, lambda_at_event, lambdas_loss_samples, seq_mask and lambda_type_mask as input and outputs the loglikelihood terms; please see torch_basemodel (/model/torch_model/torch_basemodel.py) / tf_basemodel (/model/tf_model/tf_basemodel.py) for details.
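
As a rough illustration, the computation has the following shape (a minimal sketch assuming lambda_at_event of shape [batch, seq_len, num_types] and lambdas_loss_samples of shape [batch, seq_len, num_samples, num_types]; the files above are the authoritative reference):

import torch

def compute_loglikelihood(lambda_at_event, lambdas_loss_samples,
                          time_delta_seq, seq_mask, lambda_type_mask):
    # Event term: log-intensity of the observed event type at each non-pad event.
    event_lambdas = (lambda_at_event * lambda_type_mask).sum(dim=-1).clamp(min=1e-12)
    event_ll = (event_lambdas.log() * seq_mask).sum(dim=-1)

    # Integral term: Monte Carlo estimate of the total intensity integrated over
    # each inter-event interval; it enters the loglikelihood with a negative sign.
    total_lambda_samples = lambdas_loss_samples.sum(dim=-1)   # sum over event types
    non_event_ll = (total_lambda_samples.mean(dim=-1) * time_delta_seq * seq_mask).sum(dim=-1)

    num_events = seq_mask.sum(dim=-1)
    return event_ll, non_event_ll, num_events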

Note that:

1. Sequential prediction: because we perform sequential prediction, i.e., predict the next event given the previous ones, we do not consider the last event as it has no label. In the forward function, we therefore take time_seqs[:, :-1] and type_seqs[:, :-1] as input. For time_delta_seqs it is different; please see the next point.

2. Continuous-time evolution: recall the definitions in [dataset](./dataset.rst) and assume we have a sequence of 4 events with 1 pad event at the end, i.e.,

index:           0,        1,        2,        3,      4
dtimes:          0,  t_1-t_0,  t_2-t_1,  t_3-t_2,    pad
types:         e_0,      e_1,      e_2,      e_3,    pad
non_pad_mask: True,     True,     True,     True,  False

For the i-th event, the i-th dtime denotes the time evolution (e.g., decay in NHP) up to the current event and the (i+1)-th dtime denotes the time evolution to the next event. To compute the non-event loglikelihood, we should consider the time evolution after the event happens. Therefore we should use time_delta_seqs[:, 1:] with the masks specified in the next point.

3. Masking: suppose we have predictions for the 0,1,2,3-th events and their labels are the 1,2,3,4-th events, where the 4-th event needs to be masked. So we should set the sequence mask to True, True, True, False, i.e., seq_mask=batch_non_pad_mask[:, 1:]. The same logic applies to the attention mask and the event type mask. A concrete illustration with a toy sequence follows this list.
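
As a concrete illustration of the indexing above, consider the toy sequence from point 2 with illustrative values (pad id 0 for event types):

import torch

time_seqs          = torch.tensor([[0.0, 1.0, 2.5, 4.0, 0.0]])   # 4 events + 1 pad
time_delta_seqs    = torch.tensor([[0.0, 1.0, 1.5, 1.5, 0.0]])
type_seqs          = torch.tensor([[1, 2, 3, 1, 0]])
batch_non_pad_mask = torch.tensor([[True, True, True, True, False]])

# Inputs to the forward function: drop the last position, which has no label.
model_time_input = time_seqs[:, :-1]
model_type_input = type_seqs[:, :-1]

# Shifted quantities used in the loglikelihood: the dtime from event i to event i+1,
# and a mask that removes the pad label at the last position.
target_dtimes = time_delta_seqs[:, 1:]       # [[1.0, 1.5, 1.5, 0.0]]
seq_mask      = batch_non_pad_mask[:, 1:]    # [[True, True, True, False]]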

Therefore the following code is a typical example of calling the loglikelihood computation:

event_ll, non_event_ll, num_events = self.compute_loglikelihood(lambda_at_event=lambda_at_event, # seq_len = max_len - 1
                                                                lambdas_loss_samples=lambda_t_sample, # seq_len = max_len - 1
                                                                time_delta_seq=time_delta_seq[:, 1:],
                                                                seq_mask=batch_non_pad_mask[:, 1:],
                                                                lambda_type_mask=type_mask[:, 1:])

Computing the integral inside the loglikelihood

The loglikelihood of the parameters is the sum of the log-intensities of the events that happened, at the times they happened, minus an integral of the total intensity over the observation interval [0, T]:

\[\sum_{t_i}\log \lambda_{k_i}(t_i) - \int_0^T \lambda(t) dt\]

The first term is the event loglikelihood and the second term (including the negative sign) is the non-event loglikelihood.
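
For neural TPPs this integral usually has no closed form. Consistent with the call above, it is estimated by Monte Carlo sampling within each inter-event interval, which is what lambdas_loss_samples carries:

\[\int_0^T \lambda(t)\,dt \;\approx\; \sum_{i}\,(t_i - t_{i-1})\,\frac{1}{J}\sum_{j=1}^{J}\sum_{k}\lambda_k(s_{i,j}),\]

where s_{i,j} are J times sampled uniformly in the interval (t_{i-1}, t_i] and k runs over the event types.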

Neural Hawkes Process (NHP)

We implement NHP based on the author's official PyTorch code Github:nce-mpp.

  1. A continuous-time LSTM is introduced, with the code mainly coming from Github:nce-mpp.

  2. A forward function in the NHP class recursively updates the states: we compute the event embedding, pass it to the LSTM cell and then apply the decay. Note that for the i-th event, we should use the (i+1)-th dt for the decay, so we do not consider the last event as it has no decay time. A sketch of the decay step follows this list.
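
The decay follows the continuous-time LSTM of the NHP paper: c(t) = c_bar + (c - c_bar) * exp(-delta * dt) and h(t) = o * tanh(c(t)). A minimal sketch of this step (tensor names and shapes are illustrative):

import torch

def ctlstm_decay(cell, cell_bar, decay_rate, output_gate, dt):
    # Decay the continuous-time LSTM state over an interval of length dt.
    # cell, cell_bar, decay_rate, output_gate: [batch, hidden_dim]; dt: [batch]
    cell_t = cell_bar + (cell - cell_bar) * torch.exp(-decay_rate * dt.unsqueeze(-1))
    hidden_t = output_gate * torch.tanh(cell_t)
    return cell_t, hidden_t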

Attentive Neural Hawkes Process (AttNHP)

We implement AttNHP based on the author's official PyTorch code Github:anhp-andtt and, similarly to NHP, we factorize it into a base model and an inherited model.

The forward function faithfully follows that of the author's repo.

Transformer Hawkes Process (THP)

We implement THP based on a fixed version of the PyTorch code Github:anhp-andtt/thp, and we factorize it into a base model and an inherited model.

Self-Attentive Hawkes Process (SAHP)

We implement SAHP based on a fixed version of the PyTorch code Github:anhp-andtt/sahp, and we factorize it into a base model and an inherited model.

SAHP shares a very similar structure with THP.

Recurrent Marked Temporal Point Processes (RMTPP)

We implement RMTPP faithfully following the author's paper.

Intensity Free Learning of Temporal Point Process (IntensityFree)

We implement the model based on the author’s torch code Github:ifl-tpp.

A small difference between our implementation and the author's is that we ignore the context_init (the initial state of the RNN), because in our data setup we do not need a learnable initial RNN state. This modification generally has little impact on the learning process.

It is worth noting that the thinning algorithm cannot be applied to this model because it is intensity-free. When comparing the performance of the model, we only look at its log-likelihood learning curve.
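
For intuition, a minimal sketch of the distribution-based loglikelihood, assuming the log-normal mixture over inter-event times used in the paper (tensor names and shapes are illustrative):

import torch
import torch.distributions as D

def lognormal_mixture_logprob(dt, mix_logits, means, log_scales):
    # dt: [batch, seq_len]; the other tensors: [batch, seq_len, num_components].
    # The loglikelihood is the log-density of the observed inter-event times,
    # so no intensity function (and hence no thinning) is involved.
    mixture    = D.Categorical(logits=mix_logits)
    components = D.LogNormal(means, log_scales.exp())
    return D.MixtureSameFamily(mixture, components).log_prob(dt.clamp(min=1e-10))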

Fully Neural Network based Model for General Temporal Point Processes (FullyNN)

We implement the model based on the author's Keras code Github:NeuralNetworkPointProcess.

ODE-based Temporal Point Process (ODETPP)

We implement a TPP with Neural ODE state evolution, which is a simplified version of Neural Spatio-Temporal Point Processes. The ODE implementation uses the code from the blog.
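
A minimal sketch of the state evolution between two events, using a fixed-step Euler solver for illustration (names are illustrative; the actual solver follows the referenced code):

import torch

def ode_evolve(state, dt, ode_func, num_steps=10):
    # Integrate dh/dt = f_theta(h) over an interval of length dt with Euler steps.
    # state: [batch, hidden_dim], dt: [batch, 1]
    step = dt / num_steps
    for _ in range(num_steps):
        state = state + step * ode_func(state)
    return state

# e.g. a small MLP as the ODE dynamics function f_theta
ode_func = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.Tanh(), torch.nn.Linear(32, 32))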

Attentive Neural Hawkes Network (ANHN)

We implement the model based on the author’s paper: the attentive model without the graph regularizer is named ANHN.