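Image descriptions (captions) with Deep Learning
I found an article on using deep learning to attach a text description to an image, so I am trying it out.
I don't understand the processing yet, so for now I am just running it.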
http://t-satoshi.blogspot.jp/2016/01/blog-post_1.html
Environment setup
I wrote this up in a separate post.
Installing chainer and cuda on ubuntu - kubotti’s memo
Checking out the source code (clone)
git clone https://github.com/dsanno/chainer-image-caption.git
Running
I followed the instructions at
https://github.com/dsanno/chainer-image-caption
Conversion
python src/convert_dataset.py dataset.json dataset.pkl
Error notes
$ python src/train.py -g 0 -s dataset.pkl -i vgg_feats.mat -o model/caption_gen
/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/chainer/cuda.py:85: UserWarning: cuDNN is not enabled. Please reinstall chainer after you install cudnn (see https://github.com/pfnet/chainer#installation).
  'cuDNN is not enabled.\n'
→ Setting up NVIDIA's cuDNN fixes this.
$ python src/train.py -g 0 -s dataset.pkl -i vgg_feats.mat -o model/caption_gen
word count: 2540
epoch: 1 done train loss: 0.0459010656298 accuracy: 0.256903082132 test loss: 0.0497476285998 accuracy: 0.317653983831
Traceback (most recent call last):
  File "src/train.py", line 168, in <module>
    train(args.iter)
  File "src/train.py", line 165, in train
    serializers.save_hdf5(args.output + '_{0:04d}.model'.format(epoch), caption_net)
  File "/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/chainer/serializers/hdf5.py", line 70, in save_hdf5
    with h5py.File(filename, 'w') as f:
  File "/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/h5py/_hl/files.py", line 272, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/h5py/_hl/files.py", line 98, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/kubotad/.virtualenvs/ml1/build/h5py/h5py/_objects.c:2682)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/kubotad/.virtualenvs/ml1/build/h5py/h5py/_objects.c:2640)
  File "h5py/h5f.pyx", line 96, in h5py.h5f.create (/home/kubotad/.virtualenvs/ml1/build/h5py/h5py/h5f.c:2095)
IOError: Unable to create file (Unable to open file: name = 'model/caption_gen_0000.model', errno = 2, error message = 'no such file or directory', flags = 13, o_flags = 242)
→ It looks like the error occurred because there is no directory named model.
mkdir model
chainer-image-caption$ mkdir model
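The same fix can be made from Python, so that a long training run does not fail at the first checkpoint. The following is a minimal sketch, not part of the chainer-image-caption repository: `ensure_parent_dir` is a hypothetical helper name, and it only assumes that the `-o` argument is a path prefix such as `model/caption_gen`.

```python
import errno
import os


def ensure_parent_dir(output_prefix):
    """Create the directory part of output_prefix if it does not exist yet.

    Hypothetical helper, not taken from train.py.
    """
    parent = os.path.dirname(output_prefix)
    if not parent:
        return  # the prefix has no directory part, nothing to create
    try:
        os.makedirs(parent)
    except OSError as e:
        # An already existing directory is fine; anything else is re-raised.
        if e.errno != errno.EEXIST or not os.path.isdir(parent):
            raise
```

Calling `ensure_parent_dir('model/caption_gen')` before training would have the same effect as the `mkdir model` above, and calling it a second time does nothing.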
train (training)
chainer-image-caption$ python src/train.py -g 0 -s dataset.pkl -i vgg_feats.mat -o model/caption_gen
word count: 2540
epoch: 1 done train loss: 0.045725645361 accuracy: 0.257317751646 test loss: 0.0493454146218 accuracy: 0.31885060668
epoch: 2 done train loss: 0.0400679612972 accuracy: 0.31300213933 test loss: 0.0464303263971 accuracy: 0.335451245308
epoch: 3 done train loss: 0.0386252489649 accuracy: 0.324912548065 test loss: 0.0442742546894 accuracy: 0.352658629417
epoch: 4 done train loss: 0.0374745318931 accuracy: 0.335945606232 test loss: 0.0441596489941 accuracy: 0.352911442518
epoch: 5 done train loss: 0.0368611614726 accuracy: 0.340930372477 test loss: 0.0432623657668 accuracy: 0.36116963625
epoch: 6 done train loss: 0.0361832392497 accuracy: 0.348138123751 test loss: 0.0423972052143 accuracy: 0.365753769875
epoch: 7 done train loss: 0.0356026551704 accuracy: 0.355444580317 test loss: 0.042021223891 accuracy: 0.371568202972
epoch: 8 done train loss: 0.0355471082923 accuracy: 0.353055179119 test loss: 0.0414815161085 accuracy: 0.377298384905
epoch: 9 done train loss: 0.0350207043004 accuracy: 0.360088020563 test loss: 0.0408468031924 accuracy: 0.378764629364
epoch: 10 done train loss: 0.0346867476396 accuracy: 0.363693296909 test loss: 0.0408041208308 accuracy: 0.381831973791
epoch: 11 done train loss: 0.0344453609535 accuracy: 0.366818994284 test loss: 0.0404426625487 accuracy: 0.385101556778
epoch: 12 done train loss: 0.0340982141649 accuracy: 0.371871471405 test loss: 0.0405440570554 accuracy: 0.384966701269
epoch: 13 done train loss: 0.0339502898191 accuracy: 0.373081684113 test loss: 0.0399705957483 accuracy: 0.389753103256
epoch: 14 done train loss: 0.0337517845904 accuracy: 0.376165091991 test loss: 0.0401440467461 accuracy: 0.389466583729
epoch: 15 done train loss: 0.0335793790123 accuracy: 0.378232896328 test loss: 0.040198393832 accuracy: 0.390528351068
epoch: 16 done train loss: 0.0333063088851 accuracy: 0.382083624601 test loss: 0.040168921178 accuracy: 0.391050815582
epoch: 17 done train loss: 0.0332140159146 accuracy: 0.382659107447 test loss: 0.0400675989685 accuracy: 0.391640692949
epoch: 18 done train loss: 0.0330915914119 accuracy: 0.3841175735 test loss: 0.0401205218199 accuracy: 0.394792288542
epoch: 19 done train loss: 0.0328632636593 accuracy: 0.386089473963 test loss: 0.0398161334222 accuracy: 0.395634949207
epoch: 20 done train loss: 0.0326901086549 accuracy: 0.389043092728 test loss: 0.0398805887529 accuracy: 0.395550698042
epoch: 21 done train loss: 0.0326634119886 accuracy: 0.389251857996 test loss: 0.0397074450841 accuracy: 0.396241664886
epoch: 22 done train loss: 0.0325366721848 accuracy: 0.391771048307 test loss: 0.0394001827826 accuracy: 0.398196667433
epoch: 23 done train loss: 0.0324190159283 accuracy: 0.392806351185 test loss: 0.0398247712362 accuracy: 0.39888766408
epoch: 24 done train loss: 0.0322716244829 accuracy: 0.39495036006 test loss: 0.0394409952726 accuracy: 0.399477541447
epoch: 25 done train loss: 0.0321579991855 accuracy: 0.396456778049 test loss: 0.0399258874583 accuracy: 0.399898886681
epoch: 26 done train loss: 0.0321149320884 accuracy: 0.398030906916 test loss: 0.0393185117686 accuracy: 0.40096065402
epoch: 27 done train loss: 0.032024884975 accuracy: 0.398823618889 test loss: 0.0389585335362 accuracy: 0.402140378952
epoch: 28 done train loss: 0.0319135850017 accuracy: 0.400141060352 test loss: 0.0390808230285 accuracy: 0.404179662466
epoch: 29 done train loss: 0.0317952594949 accuracy: 0.4023668468 test loss: 0.0394417998737 accuracy: 0.402443736792
epoch: 30 done train loss: 0.0318133246293 accuracy: 0.401306152344 test loss: 0.0392306046288 accuracy: 0.403488665819
epoch: 31 done train loss: 0.031604909176 accuracy: 0.404256939888 test loss: 0.0393333604735 accuracy: 0.403353840113
epoch: 32 done train loss: 0.0315624695159 accuracy: 0.404347211123 test loss: 0.0394831810098 accuracy: 0.404095381498
epoch: 33 done train loss: 0.0314850438466 accuracy: 0.40556588769 test loss: 0.0395926828151 accuracy: 0.403775185347
epoch: 34 done train loss: 0.0314135441664 accuracy: 0.407235950232 test loss: 0.0392156924876 accuracy: 0.405797600746
epoch: 35 done train loss: 0.0313472052038 accuracy: 0.408299475908 test loss: 0.0395177092657 accuracy: 0.405072897673
epoch: 36 done train loss: 0.0312555465321 accuracy: 0.409642279148 test loss: 0.0392123978115 accuracy: 0.403370678425
epoch: 37 done train loss: 0.0312898968348 accuracy: 0.408646464348 test loss: 0.0394170638405 accuracy: 0.406539142132
epoch: 38 done train loss: 0.0311821257477 accuracy: 0.410767883062 test loss: 0.0389353885986 accuracy: 0.405949264765
epoch: 39 done train loss: 0.0311064603533 accuracy: 0.41182294488 test loss: 0.0391744424661 accuracy: 0.407196432352
epoch: 40 done train loss: 0.0310275608092 accuracy: 0.411952733994 test loss: 0.0392464721004 accuracy: 0.407331258059
epoch: 41 done train loss: 0.0309951041262 accuracy: 0.412976741791 test loss: 0.0394795492837 accuracy: 0.406960487366
epoch: 42 done train loss: 0.0309132562296 accuracy: 0.414596021175 test loss: 0.0392007111711 accuracy: 0.406522274017
epoch: 43 done train loss: 0.030854928098 accuracy: 0.415072768927 test loss: 0.0398743693777 accuracy: 0.401752769947
epoch: 44 done train loss: 0.0308340306912 accuracy: 0.414954304695 test loss: 0.0394592419536 accuracy: 0.407263845205
epoch: 45 done train loss: 0.0308552646899 accuracy: 0.415834456682 test loss: 0.0391359392147 accuracy: 0.408982902765
epoch: 46 done train loss: 0.0307133651365 accuracy: 0.416725903749 test loss: 0.0393263368754 accuracy: 0.408763796091
epoch: 47 done train loss: 0.0307274839519 accuracy: 0.417357832193 test loss: 0.0391286147831 accuracy: 0.408291906118
epoch: 48 done train loss: 0.0306134443411 accuracy: 0.418878346682 test loss: 0.0392422489899 accuracy: 0.408780664206
epoch: 49 done train loss: 0.0305940605279 accuracy: 0.419558227062 test loss: 0.0391119772276 accuracy: 0.408612132072
epoch: 50 done train loss: 0.0305521406738 accuracy: 0.420782566071 test loss: 0.039111297826 accuracy: 0.409218847752
epoch: 51 done train loss: 0.0304824923714 accuracy: 0.420734584332 test loss: 0.0393835133839 accuracy: 0.409758150578
epoch: 52 done train loss: 0.0304757518019 accuracy: 0.421180307865 test loss: 0.0391355557646 accuracy: 0.408612132072
epoch: 53 done train loss: 0.0304182588031 accuracy: 0.42181506753 test loss: 0.0393587586418 accuracy: 0.410583972931
epoch: 54 done train loss: 0.0303629793872 accuracy: 0.422833442688 test loss: 0.0392465169726 accuracy: 0.408730089664
epoch: 55 done train loss: 0.0303913491318 accuracy: 0.422077417374 test loss: 0.0394179263682 accuracy: 0.409437924623
epoch: 56 done train loss: 0.0302250756743 accuracy: 0.423346877098 test loss: 0.0397991454344 accuracy: 0.410061508417
epoch: 57 done train loss: 0.0302548638034 accuracy: 0.425191819668 test loss: 0.0391612867019 accuracy: 0.409454792738
epoch: 58 done train loss: 0.0302209331451 accuracy: 0.424306035042 test loss: 0.0396091171309 accuracy: 0.409623324871
epoch: 59 done train loss: 0.0302147069428 accuracy: 0.424506306648 test loss: 0.0390710623017 accuracy: 0.40957275033
epoch: 60 done train loss: 0.0301481750811 accuracy: 0.426540285349 test loss: 0.0392560725803 accuracy: 0.411106437445
epoch: 61 done train loss: 0.030122064582 accuracy: 0.426215857267 test loss: 0.0395058371298 accuracy: 0.410567104816
epoch: 62 done train loss: 0.0301009105946 accuracy: 0.426227152348 test loss: 0.0393597140856 accuracy: 0.410078376532
epoch: 63 done train loss: 0.030041947949 accuracy: 0.427474051714 test loss: 0.039556781847 accuracy: 0.409926682711
epoch: 64 done train loss: 0.0300641104696 accuracy: 0.427626371384 test loss: 0.0393399471659 accuracy: 0.41098845005
epoch: 65 done train loss: 0.0300008373717 accuracy: 0.428799927235 test loss: 0.0390807943841 accuracy: 0.41184797883
epoch: 66 done train loss: 0.029940692683 accuracy: 0.430464327335 test loss: 0.0398363739474 accuracy: 0.410819917917
epoch: 67 done train loss: 0.0299442960929 accuracy: 0.429524928331 test loss: 0.0392615676315 accuracy: 0.41098845005
epoch: 68 done train loss: 0.0299426687636 accuracy: 0.429866284132 test loss: 0.039283643197 accuracy: 0.411730021238
epoch: 69 done train loss: 0.0298568634205 accuracy: 0.430385351181 test loss: 0.0393123223015 accuracy: 0.41053339839
epoch: 70 done train loss: 0.0298064194255 accuracy: 0.431293725967 test loss: 0.0391522611266 accuracy: 0.410769373178
epoch: 71 done train loss: 0.0297600804316 accuracy: 0.432757854462 test loss: 0.0396972762443 accuracy: 0.411123275757
epoch: 72 done train loss: 0.0298023518677 accuracy: 0.431488364935 test loss: 0.0392568858392 accuracy: 0.411797434092
epoch: 73 done train loss: 0.0297010307977 accuracy: 0.433166891336 test loss: 0.0395133065146 accuracy: 0.410499691963
epoch: 74 done train loss: 0.0297632095724 accuracy: 0.43278041482 test loss: 0.0394347384682 accuracy: 0.412555813789
epoch: 75 done train loss: 0.0296932368664 accuracy: 0.434357374907 test loss: 0.0392466530493 accuracy: 0.412134498358
epoch: 76 done train loss: 0.0296490048835 accuracy: 0.43413451314 test loss: 0.0395070840623 accuracy: 0.41149404645
epoch: 77 done train loss: 0.0296292832112 accuracy: 0.435356020927 test loss: 0.0400614062899 accuracy: 0.410314321518
epoch: 78 done train loss: 0.0296808992019 accuracy: 0.434557676315 test loss: 0.0394106372067 accuracy: 0.412774920464
epoch: 79 done train loss: 0.0295214926944 accuracy: 0.435076743364 test loss: 0.0393325641544 accuracy: 0.412589520216
epoch: 80 done train loss: 0.0295438448904 accuracy: 0.435209333897 test loss: 0.0393370113468 accuracy: 0.411258101463
epoch: 81 done train loss: 0.0295522417766 accuracy: 0.436490058899 test loss: 0.0393007405932 accuracy: 0.410634547472
epoch: 82 done train loss: 0.0294843228681 accuracy: 0.436797559261 test loss: 0.0396772198445 accuracy: 0.411039024591
epoch: 83 done train loss: 0.029514518406 accuracy: 0.434645116329 test loss: 0.0397642040583 accuracy: 0.412926614285
epoch: 84 done train loss: 0.0294512088967 accuracy: 0.437463313341 test loss: 0.039932036003 accuracy: 0.410617679358
epoch: 85 done train loss: 0.0294844052118 accuracy: 0.436597257853 test loss: 0.0401390195943 accuracy: 0.411443501711
epoch: 86 done train loss: 0.0293843876742 accuracy: 0.437040179968 test loss: 0.039622603174 accuracy: 0.410836786032
epoch: 87 done train loss: 0.0294554910136 accuracy: 0.436653703451 test loss: 0.039845085678 accuracy: 0.411342382431
epoch: 88 done train loss: 0.0294100674418 accuracy: 0.437813133001 test loss: 0.039865503411 accuracy: 0.412555813789
epoch: 89 done train loss: 0.0293330790189 accuracy: 0.439384460449 test loss: 0.0401588284775 accuracy: 0.408376157284
epoch: 90 done train loss: 0.0293456023455 accuracy: 0.439178526402 test loss: 0.0396922601628 accuracy: 0.410904198885
epoch: 91 done train loss: 0.029273456991 accuracy: 0.439601659775 test loss: 0.0395943685775 accuracy: 0.411763727665
epoch: 92 done train loss: 0.0293210331401 accuracy: 0.440913438797 test loss: 0.0395775310945 accuracy: 0.411982804537
epoch: 93 done train loss: 0.0292821050549 accuracy: 0.440741360188 test loss: 0.0398033293399 accuracy: 0.413701862097
epoch: 94 done train loss: 0.0293138611642 accuracy: 0.439153134823 test loss: 0.0398919043791 accuracy: 0.411359220743
epoch: 95 done train loss: 0.029254008358 accuracy: 0.439917623997 test loss: 0.0396663537059 accuracy: 0.411241263151
epoch: 96 done train loss: 0.0292041851444 accuracy: 0.440755486488 test loss: 0.0398077456467 accuracy: 0.409673899412
epoch: 97 done train loss: 0.0292420347517 accuracy: 0.441015005112 test loss: 0.0397308980631 accuracy: 0.410786211491
epoch: 98 done train loss: 0.0292156278825 accuracy: 0.441201210022 test loss: 0.0397790745478 accuracy: 0.412201911211
epoch: 99 done train loss: 0.0291669691484 accuracy: 0.442070066929 test loss: 0.0398969514088 accuracy: 0.410819917917
epoch: 100 done train loss: 0.0291818802014 accuracy: 0.441469192505 test loss: 0.04006588599 accuracy: 0.411308676004
This took about 2 to 2.5 hours on a laptop with a Core i5 and a GeForce 840M.
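In the log above the train loss keeps falling through all 100 epochs, while the test loss levels off at around 0.039 to 0.040. To see this without reading 100 lines by eye, the log can be parsed with a few lines of Python. This is a rough sketch that only assumes the `epoch: N done / train loss: ... / test loss: ...` text format shown in this post; `parse_log` and `best_epoch` are hypothetical helper names, not functions from the repository.

```python
import re

# One record per epoch. \s+ accepts both spaces and newlines between fields,
# so it works whether or not the log is wrapped.
EPOCH_RE = re.compile(
    r'epoch: (\d+) done\s+'
    r'train loss: (\S+) accuracy: (\S+)\s+'
    r'test loss: (\S+) accuracy: (\S+)'
)


def parse_log(text):
    """Return a list of (epoch, train_loss, train_acc, test_loss, test_acc)."""
    return [
        (int(m.group(1)),) + tuple(float(v) for v in m.groups()[1:])
        for m in EPOCH_RE.finditer(text)
    ]


def best_epoch(records):
    """Return the record with the lowest test loss."""
    return min(records, key=lambda r: r[3])


# The first two epochs from the log in this post.
sample = """
epoch: 1 done train loss: 0.045725645361 accuracy: 0.257317751646 test loss: 0.0493454146218 accuracy: 0.31885060668
epoch: 2 done train loss: 0.0400679612972 accuracy: 0.31300213933 test loss: 0.0464303263971 accuracy: 0.335451245308
"""
records = parse_log(sample)
print(best_epoch(records)[0])  # 2
```

Judging from the `model/caption_gen_0000.model` name in the earlier error message, which appeared right after epoch 1, the checkpoint files seem to be numbered from 0, so the epoch printed here would be one higher than the number in the file name.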
VGG_ILSVRC_19_layers
chainer-image-caption$ python src/convert_caffemodel_to_pkl.py VGG_ILSVRC_19_layers.caffemodel vgg19.pkl
Traceback (most recent call last):
  File "src/convert_caffemodel_to_pkl.py", line 2, in <module>
    from chainer.functions import caffe
ImportError: No module named chainer.functions
kubotad@kubotad-Diginnos:~/deep_caption/chainer-image-caption$ source ~/.virtualenvs/ml1/bin/activate
(ml1)kubotad@kubotad-Diginnos:~/deep_caption/chainer-image-caption$
(ml1)kubotad@kubotad-Diginnos:~/deep_caption/chainer-image-caption$ python src/convert_caffemodel_to_pkl.py VGG_ILSVRC_19_layers.caffemodel vgg19.pkl
Traceback (most recent call last):
  File "src/convert_caffemodel_to_pkl.py", line 8, in <module>
    model = caffe.CaffeFunction(model_path)
  File "/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/chainer/links/caffe/caffe_function.py", line 127, in __init__
    with open(model_path, 'rb') as model_file:
IOError: [Errno 2] No such file or directory: 'VGG_ILSVRC_19_layers.caffemodel'
It seems a model data file called VGG_ILSVRC_19_layers is required.
I downloaded it from
ILSVRC-2014 model (VGG team) with 19 weight layers · GitHub
The file is 549MB.
Displaying captions
/chainer-image-caption$ python src/generate_caption.py -s dataset.pkl -i vgg19.pkl -m model/caption_gen_0010.model -l image/label.txt
Traceback (most recent call last):
  File "src/generate_caption.py", line 51, in <module>
    with open(args.list) as f:
IOError: [Errno 2] No such file or directory: 'image/label.txt'
The error says that image/label.txt does not exist.
Changing the argument from image/label.txt to image/list.txt fixed it.
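Most of the errors in this post are a missing file or directory that only shows up after the script has started. A small check before running can list them all at once. This is a minimal sketch: `missing_files` is a hypothetical helper, and the file names are simply the arguments of the generate_caption.py command used in this post.

```python
import os


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Arguments of the generate_caption.py command in this post.
required = [
    'dataset.pkl',
    'vgg19.pkl',
    'model/caption_gen_0010.model',
    'image/list.txt',
]
for path in missing_files(required):
    print('missing: ' + path)
```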
Results
python src/generate_caption.py -s dataset.pkl -i vgg19.pkl -m model/caption_gen_0010.model -l image/list.txt
# image/asakusa.jpg
a man sits on a bench
a man is sitting on a bench
a man in a red shirt is riding a bike
a man in a red shirt is riding a bike through the woods
a man in a blue shirt is riding a bike through the woods
# image/tree.jpg
a man sits on a bench
a man is sitting on a bench
a man on a bike stands in front of a brick wall
a man in a red shirt is sitting on a bench
a man on a bike stands in front of a building
/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/chainer/functions/activation/lstm.py:15: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + numpy.exp(-x))
# image/racket1.jpeg
a young child in a t shirt on a skateboard
a young child in a t shirt on a subway
a young child in a t shirt on a bike stands in the snow
a young child in a t shirt on a bike stands in the woods
a young child in a t shirt on a bike stands in front of a brick wall
It seems to run for now, but the captions are surprisingly far off.
I want to investigate the cause.
image/asakusa.jpg
image/tree.jpg
image/racket1.jpeg
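Investigating improvements
When I changed the argument
-m model/caption_gen_0010.model
to
-m model/caption_gen_0099.model
the word tennis appeared for the third image.
So the approach is right and the model just had not been trained enough...?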
chainer-image-caption$ python src/generate_caption.py -s dataset.pkl -i vgg19.pkl -m model/caption_gen_0099.model -l image/list.txt
/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/chainer/functions/activation/lstm.py:15: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + numpy.exp(-x))
# image/asakusa.jpg
two people sit on a bench
a group of people stand on a balcony
a group of people sit on benches
a group of people watch a drag race
a group of people sit on a bench
# image/tree.jpg
a group of people stand on a balcony
two people sit on a bench
a group of people sit on benches
a group of people sit on a bench
two people are sitting on a bench
# image/racket1.jpeg
a boy plays tennis
a young child in a t shirt on a bike stands in on a road in front of a beach while others are on the sand near the water in the background
a young boy playing tennis
a boy hits a tennis ball
a boy hits a tennis ball with a racket
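About the RuntimeWarning in the output: the line it quotes is the plain sigmoid, 1 / (1 + exp(-x)). For a very negative x, exp(-x) overflows to inf in NumPy, and 1 / (1 + inf) still gives 0.0, which is the correct limit, so the warning by itself is probably not why the captions are off. For reference, a sigmoid that avoids the overflow can be written by branching on the sign of x. This is a scalar sketch using only the standard `math` module, not the code chainer uses.

```python
import math


def stable_sigmoid(x):
    """Sigmoid that never calls exp() on a large positive argument."""
    if x >= 0:
        # exp(-x) <= 1 here, so the usual formula cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)); exp(x) < 1.
    ex = math.exp(x)
    return ex / (1.0 + ex)


print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0
print(stable_sigmoid(1000.0))   # 1.0
```

With the plain formula, `math.exp(1000.0)` would raise OverflowError for x = -1000; the branch keeps the argument of exp at or below zero in both cases.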
Trying flickr30k.zip
When I tried flickr30k.zip, the data conversion failed with an error (KeyError).
In
https://github.com/dsanno/chainer-image-caption/blob/master/src/convert_dataset.py#L27
raising the value 51 to 79 let the conversion finish.
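I have not checked what the constant on that line controls. If it is an upper bound on the number of words in a caption (this is only a guess, the source does not confirm it), then the value needed for a new dataset could be measured instead of found by trial and error. The sketch below assumes that dataset.json has the layout `{"images": [{"sentences": [{"tokens": [...]}]}]}`, which is also an assumption here; `max_caption_length` is a hypothetical helper.

```python
import json


def max_caption_length(dataset):
    """Return the largest number of tokens in any caption.

    Assumes the layout images -> sentences -> tokens (not verified
    against the real dataset.json).
    """
    return max(
        len(sentence['tokens'])
        for image in dataset['images']
        for sentence in image['sentences']
    )


# Tiny stand-in for dataset.json, in the assumed layout.
sample = json.loads("""
{"images": [
  {"sentences": [{"tokens": ["a", "boy", "plays", "tennis"]}]},
  {"sentences": [{"tokens": ["two", "people", "sit", "on", "a", "bench"]}]}
]}
""")
print(max_caption_length(sample))  # 6
```

For the real file, the dictionary would come from `json.load(open('dataset.json'))` instead of the inline sample.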
train failed with an error saying the GPU does not have enough memory.
chainer-image-caption$ python src/train.py -g 0 -s dataset.pkl -i vgg_feats.mat -o model/caption_gen
word count: 7416
Traceback (most recent call last):
  File "src/train.py", line 168, in <module>
    train(args.iter)
  File "src/train.py", line 137, in train
    loss.backward()
  File "/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/chainer/variable.py", line 349, in backward
    gxs = func.backward(in_data, out_grad)
  File "/home/kubotad/.virtualenvs/ml1/local/lib/python2.7/site-packages/chainer/functions/connection/linear.py", line 48, in backward
    gW = gy.T.dot(x).astype(W.dtype)
  File "cupy/core/core.pyx", line 257, in cupy.core.core.ndarray.astype (cupy/core/core.cpp:6859)
  File "cupy/core/core.pyx", line 281, in cupy.core.core.ndarray.astype (cupy/core/core.cpp:6649)
  File "cupy/core/core.pyx", line 309, in cupy.core.core.ndarray.copy (cupy/core/core.cpp:7066)
  File "cupy/core/core.pyx", line 87, in cupy.core.core.ndarray.__init__ (cupy/core/core.cpp:4935)
  File "cupy/cuda/memory.pyx", line 275, in cupy.cuda.memory.alloc (cupy/cuda/memory.cpp:5497)
  File "cupy/cuda/memory.pyx", line 414, in cupy.cuda.memory.MemoryPool.malloc (cupy/cuda/memory.cpp:8058)
  File "cupy/cuda/memory.pyx", line 430, in cupy.cuda.memory.MemoryPool.malloc (cupy/cuda/memory.cpp:7984)
  File "cupy/cuda/memory.pyx", line 337, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc (cupy/cuda/memory.cpp:6952)
  File "cupy/cuda/memory.pyx", line 357, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc (cupy/cuda/memory.cpp:6779)
  File "cupy/cuda/memory.pyx", line 255, in cupy.cuda.memory._malloc (cupy/cuda/memory.cpp:5439)
  File "cupy/cuda/memory.pyx", line 256, in cupy.cuda.memory._malloc (cupy/cuda/memory.cpp:5360)
  File "cupy/cuda/memory.pyx", line 31, in cupy.cuda.memory.Memory.__init__ (cupy/cuda/memory.cpp:1534)
  File "cupy/cuda/runtime.pyx", line 180, in cupy.cuda.runtime.malloc (cupy/cuda/runtime.cpp:2950)
  File "cupy/cuda/runtime.pyx", line 110, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:1865)
cupy.cuda.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory
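The traceback ends in the backward pass of a linear layer, and the word count went from 2540 to 7416 between the two datasets. Any array with one dimension equal to the vocabulary size grows by the same factor, roughly 2.9 times. The arithmetic can be sketched as follows. The hidden size and batch size are made-up values for illustration, not numbers read from train.py, and real GPU usage includes many more arrays than the two shown here.

```python
def float32_bytes(*shape):
    """Bytes needed for a float32 array of the given shape."""
    n = 4  # bytes per float32 element
    for dim in shape:
        n *= dim
    return n


HIDDEN = 512  # hypothetical hidden size, not taken from train.py
BATCH = 128   # hypothetical batch size, not taken from train.py

for vocab in (2540, 7416):  # word counts printed in the two runs
    weight = float32_bytes(vocab, HIDDEN)  # output layer weight
    scores = float32_bytes(BATCH, vocab)   # word scores for one time step
    print('vocab %d: weight %.1f MB, scores %.1f MB'
          % (vocab, weight / 1e6, scores / 1e6))
```

The gradient of the weight (the `gW` in the traceback) has the same shape as the weight itself, so it costs the same amount again.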