DeepLearning Caption (Mac)

Trying out the Qiita article
"Chainerで画像のキャプション生成" (Generating image captions with Chainer).

Notes from when I did this on Ubuntu:
http://kubotti.hatenablog.com/entry/2016/06/07/142850

Get the source code.
https://github.com/dsanno/chainer-image-caption

git clone https://github.com/dsanno/chainer-image-caption.git

Download Flickr8K (50MB) and Flickr30K (200MB) from
http://cs.stanford.edu/people/karpathy/deepimagesent/

hdf5

$ pip install h5py
Downloading/unpacking h5py
  Downloading h5py-2.6.0.tar.gz (245kB): 245kB downloaded
  Running setup.py (path:/Users/kubotad/.virtualenvs/ml/build/h5py/setup.py) egg_info for package h5py
    zip_safe flag not set; analyzing archive contents...
    
    Installed /Users/kubotad/.virtualenvs/ml/build/h5py/.eggs/pkgconfig-1.1.0-py2.7.egg
    Searching for Cython>=0.19

(output omitted)

.virtualenvs/ml/build/h5py/h5py/api_compat.h:27:10: fatal error: 'hdf5.h' file not found
#include "hdf5.h"
         ^
1 warning and 1 error generated.
error: command 'cc' failed with exit status 1

----------------------------------------
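Cleaning up...

The h5py build fails because the HDF5 header (hdf5.h) is not installed. Install HDF5 with Homebrew, then retry.

brew tap homebrew/science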
brew install hdf5

pip install h5py
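
After the reinstall, a quick check from Python can confirm that h5py is importable inside the virtualenv. This is a minimal sketch and not part of the original article; the helper name `h5py_status` is hypothetical, and it assumes only that the installed package exposes `h5py.__version__`.

```python
def h5py_status():
    """Return a short message saying whether h5py can be imported."""
    try:
        import h5py
    except ImportError as exc:
        # Raised when the package is missing from the active environment.
        return "h5py is not available: {}".format(exc)
    return "h5py {} is available".format(h5py.__version__)


print(h5py_status())
```

If the message says h5py is not available, the active virtualenv is probably not the one that `pip install h5py` ran in.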

train

This machine has no GeForce GPU, so I changed the options:
5 iterations (the default is 100).

python src/train.py -s dataset.pkl -i vgg_feats.mat -o model/caption_gen --iter 5
word count:  2540
/Users/kubotad/.virtualenvs/ml/lib/python2.7/site-packages/chainer/functions/activation/lstm.py:15: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + numpy.exp(-x))
epoch: 1 done
train loss: 0.0477752295387 accuracy: 0.240047393365
test loss: 0.0503208395787 accuracy: 0.314788910424
epoch: 2 done
train loss: 0.0407720221888 accuracy: 0.304062288422
test loss: 0.0463195297886 accuracy: 0.33422094885
epoch: 3 done
train loss: 0.0389543428261 accuracy: 0.319383322049
test loss: 0.0449947251971 accuracy: 0.345344231904
epoch: 4 done
train loss: 0.037905781938 accuracy: 0.330012976755
test loss: 0.0435648040157 accuracy: 0.356770877223
epoch: 5 done
train loss: 0.0370137848389 accuracy: 0.339999435793
test loss: 0.0429580021762 accuracy: 0.36130445774
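
The log above is easier to compare across epochs once it is parsed into numbers. The following is a rough sketch that assumes the log keeps exactly the "epoch: N done" / "train loss: ... accuracy: ..." layout shown above; the names `parse_log`, `EPOCH_RE` and `METRIC_RE` are made up for illustration and are not part of chainer-image-caption.

```python
import re

# Log text copied from the training run above.
LOG = """\
epoch: 1 done
train loss: 0.0477752295387 accuracy: 0.240047393365
test loss: 0.0503208395787 accuracy: 0.314788910424
epoch: 2 done
train loss: 0.0407720221888 accuracy: 0.304062288422
test loss: 0.0463195297886 accuracy: 0.33422094885
epoch: 3 done
train loss: 0.0389543428261 accuracy: 0.319383322049
test loss: 0.0449947251971 accuracy: 0.345344231904
epoch: 4 done
train loss: 0.037905781938 accuracy: 0.330012976755
test loss: 0.0435648040157 accuracy: 0.356770877223
epoch: 5 done
train loss: 0.0370137848389 accuracy: 0.339999435793
test loss: 0.0429580021762 accuracy: 0.36130445774
"""

EPOCH_RE = re.compile(r"^epoch: (\d+) done$")
METRIC_RE = re.compile(r"^(train|test) loss: (\S+) accuracy: (\S+)$")


def parse_log(text):
    """Return {epoch: {"train": (loss, acc), "test": (loss, acc)}}."""
    results = {}
    epoch = None
    for line in text.splitlines():
        line = line.strip()
        match = EPOCH_RE.match(line)
        if match:
            epoch = int(match.group(1))
            results[epoch] = {}
            continue
        match = METRIC_RE.match(line)
        if match and epoch is not None:
            loss = float(match.group(2))
            accuracy = float(match.group(3))
            results[epoch][match.group(1)] = (loss, accuracy)
    return results


metrics = parse_log(LOG)
for epoch in sorted(metrics):
    train_acc = metrics[epoch]["train"][1]
    test_acc = metrics[epoch]["test"][1]
    print("epoch {}: train acc {:.3f}, test acc {:.3f}".format(
        epoch, train_acc, test_acc))
```

Over these 5 epochs both losses fall and both accuracies rise at every step, so the run had not plateaued when it stopped.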

caffe model

Download.
http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel

Convert.

python src/convert_caffemodel_to_pkl.py VGG_ILSVRC_19_layers.caffemodel vgg19.pkl

generate caption

python src/generate_caption.py -s dataset.pkl -i vgg19.pkl -m model/caption_gen_0004.model -l image/list.txt
#  image/asakusa.jpg
a man is sitting on a bench
a group of people sit on a bench
a group of people are sitting on a bench
a group of people are standing in the snow
a group of people are walking through the snow
#  image/tree.jpg
a man is sitting on a bench
a group of people sit on a bench
a group of people are sitting on a bench
a group of people sitting on a bench
a group of people sit on a sidewalk
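
The two images received almost the same captions. A small sketch like the one below can count the overlap; it assumes the output keeps the layout shown above (a "#" line with the image path, followed by that image's captions), and `parse_captions` is a hypothetical helper, not something provided by the repository.

```python
# Output text copied from the generate_caption.py run above.
OUTPUT = """\
#  image/asakusa.jpg
a man is sitting on a bench
a group of people sit on a bench
a group of people are sitting on a bench
a group of people are standing in the snow
a group of people are walking through the snow
#  image/tree.jpg
a man is sitting on a bench
a group of people sit on a bench
a group of people are sitting on a bench
a group of people sitting on a bench
a group of people sit on a sidewalk
"""


def parse_captions(text):
    """Group caption lines under the image path from the preceding '#' line."""
    captions = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            captions[current] = []
        elif current is not None:
            captions[current].append(line)
    return captions


captions = parse_captions(OUTPUT)
shared = set(captions["image/asakusa.jpg"]) & set(captions["image/tree.jpg"])
print("{} of 5 captions are identical for both images".format(len(shared)))
```

Three of the five captions match exactly, so the model trained for 5 epochs barely distinguishes the two images.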