KLOOS/Implementation_details.txt at master · ehalit/KLOOS · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Using in-scope labeled data, a one directional LSTM of hidden size 128 followed by a single linear layer of shape (hidden size, number of intents) with log softmax activation function is trained.
The input of LSTM is FastText word embeddings.
The loss function is negative log-likelihood loss and SGD optimizer of pytorch with learning rate 0.1 is used.
Batch size is 1 due to varying size of input sentences.

The model is trained for 5 epochs but the hyperparameters in this in-scope intent classification task are not optimized.
There may be room for improvement in the in-scope classification performance (around 0.85 accuracy) because it is easily beaten by Tfidf based machine learning classifiers.
Since the aim of the project is to detect out-of-scope instances, the in-scope classification performance is seen satisfactory.

After the model is fully trained, both the in-scope and out-of-scope data is given to the model in the evaluation mode with no gradient update.
The output of the model is a tensor of shape (number of intents, length of the sentence), call it (n, k).
The KLOOS feature vector is obtained by calculating the KL Divergence between (n, 0) and (n, 1), (n, 1) and (n, 2) ... (n, k-1) and (n, k).
These values are appended to construct the KLOOS vector.

The following algorithm is used for KL Divergence calculation where p and q are outputs of the Linear layer (before log softmax activation):

def KL(p, q):
  p = scipy.special.softmax(p)
  q = scipy.special.softmax(q)
  res = numpy.sum(p * numpy.log(p/q)) # Formula for KL Divergence
  if numpy.isnan(res):
    return 0.0
  else:
    return res

The resultant KLOOS feature vectors are padded with zeros to match the longest feature vector length.
KLOOS vectors used for binary classification with Machine Learning classifiers as discussed in the paper.