Retraining the captioning model