The project you are referencing uses sequence_to_sequence_loss_by_example, which returns the cross-entropy loss. So, to calculate perplexity during training, you just need to exponentiate the loss, as described here:
train_perplexity = tf.exp(train_loss)
We should use e rather than 2 as the base, because TensorFlow measures the cross-entropy loss with the natural logarithm (TF Documentation). Thanks to @Matthias Arro and @Colin Skow for the tip.
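As a concrete illustration, here is a minimal sketch of that pattern in plain TensorFlow (not the referenced project's code; the tensor names, shapes, and the use of sparse_softmax_cross_entropy_with_logits are assumptions for the example): the per-token cross-entropy is averaged and then exponentiated with base e.

import tensorflow as tf

# Hypothetical per-token logits and integer targets; shapes chosen just for the example.
logits = tf.random.normal([32, 20, 1000])                           # [batch, time, vocab]
targets = tf.random.uniform([32, 20], maxval=1000, dtype=tf.int32)  # token ids

# Per-token cross-entropy, measured in nats (natural logarithm), as TensorFlow does.
per_token_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits)

# Average the loss over all tokens, then exponentiate to get perplexity.
train_loss = tf.reduce_mean(per_token_loss)
train_perplexity = tf.exp(train_loss)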
Detailed explanation
The cross-entropy of two probability distributions P and Q tells us the minimum average number of bits we need to encode events of P when we design a coding scheme based on Q. So P is the true distribution, which we usually don't know. We want to find a Q as close to P as possible, so that we can develop a nice coding scheme with as few bits per event as possible.
I shouldn't really say bits, because we can only use bits as a measure if we use base 2 when calculating the cross-entropy. But TensorFlow uses the natural logarithm, so let's measure the cross-entropy in nats instead.
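To make the bits-versus-nats point concrete, here is a small sketch (the two three-symbol distributions are made up purely for illustration): the same cross-entropy H(P, Q) = -sum over x of P(x) * log Q(x) is computed once with the natural logarithm (nats) and once with base 2 (bits); the two results differ only by a factor of log(2).

import numpy as np

# Hypothetical true distribution P and model distribution Q over 3 symbols.
P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.5, 0.3, 0.2])

# Cross-entropy H(P, Q) = -sum_x P(x) * log Q(x)
h_nats = -np.sum(P * np.log(Q))    # natural log -> nats (what TensorFlow uses)
h_bits = -np.sum(P * np.log2(Q))   # base-2 log  -> bits

print(h_nats, h_bits, h_bits * np.log(2))  # the last value equals h_nats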
So, let's say we have a bad language model that says every token (character / word) in the vocabulary is equally likely to be the next one. For a vocabulary of 1000 tokens, this model will have a cross-entropy of log(1000) = 6.9 nats. When predicting the next token, it has to choose uniformly between 1000 tokens at each step.
A better language model will determine a probability distribution Q that is closer to P. Thus, its cross-entropy is lower; say we get a cross-entropy of 3.9 nats. If we now want to measure the perplexity, we simply exponentiate the cross-entropy:
exp(3.9) = 49.4
So, on the samples for which we calculated the loss, the good model was as perplexed as if it had to choose uniformly and independently among about 50 tokens.
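A quick numeric check of the figures above (the 3.9 nats value is just the illustrative number from the example, not a measured result):

import math

vocab_size = 1000

# Uniform model: cross-entropy is log(vocab_size) nats, and its perplexity is the vocabulary size itself.
uniform_ce = math.log(vocab_size)
print(uniform_ce, math.exp(uniform_ce))    # 6.907..., 1000.0

# Better model from the example: 3.9 nats of cross-entropy.
print(math.exp(3.9))                       # 49.40... -> as perplexed as choosing among about 50 tokens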