Essay
Action Recognition Overview
Legacy presentation notes on action recognition, optical flow, and video CNNs.
Action Recognition Overview
灏忕粍鎴愬憳锛氭櫙鏅?璋㈣嫳鏉?鍚存€濊繙
Definition
Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents’ actions and the environmental conditions.
- Spatial information (image)
- Temporal information (motion)
Dataset
- UCF101 - Action Recognition Data Set.
- HMDB51: A Large Video Database for Human Motion Recognition.
- Sports-1M Dataset (not available)
- Human Activity Video Datasets
- THUMOS Challenge 2015 Action Recognition in Temporally Untrimmed Videos
- …
Intuition
Optical flow
“
Traditional method
Dense trajectories and motion boundary descriptors for action recognition (Wang et al., 2013)

Spatio-Temporal ConvNets
- Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014

Note: The motion information didn鈥檛 add all that much…
- Two-Stream Convolutional Networks for Action Recognition in Videos, Simonyan and Zisserman 2014

“Slow fusion” VS “Two-Stream”
- Spatio-Temporal ConvNets

- Two-Stream Convolutional Networks


- 鍒╃敤鍏夋祦鏄捐憲鎻愬崌绮惧害銆?
- 鍒╃敤pretrained model鐨勭壒寰佹彁鍙栫殑浼樺娍銆?
- Two-Stream 鏄剧劧姣斾箣鍓嶇殑鏁堟灉瑕佸ソ寰堝銆?
- Learning Spatiotemporal Features with 3D Convolutional Networks, Tran 2015

- 瀹為獙缁忛獙寰楀埌 3x3x3 kernel 鏁堟灉鏈€濂姐€?
- 鍗风Н鏍哥殑鏃堕棿缁村害鐨勯暱搴︿篃鍙湁3锛屾槸鍚﹁幏寰楄繍鍔ㄤ俊鎭憿锛?
- 缂虹偣: 鏃犳硶浣跨敤棰勮缁冨ソ鐨?缁村嵎绉綉缁溿€?
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al., 2015

- CNN -> feature -> RNN(LSTM)
- 鑾峰緱闀挎湡搴忓垪淇℃伅銆?
- 妯″瀷鏇村鏉傦紝闇€瑕佸厛鍚庤缁冧袱涓ā鍨嬨€?
- 濡傛灉鎶奀NN鍜孯NN缁撳悎锛屼細濡備綍鍛紵
Problems
- 鏃犳硶瀹屽叏鍒╃敤鍗风Н绁炵粡缃戠粶瀛︿範闀挎湡鐨勮棰戜俊鎭€?
- 瑙嗛鏁版嵁闆嗘牱鏈噺涓嶅澶э紝瀹规槗杩囨嫙鍚堬紝浣嗘槸鏁版嵁闆嗘€讳綋浣撶Н澶с€?
- 澶ч儴鍒嗙殑妯″瀷杩橀渶瑕佷緷璧栧厜娴佺殑甯姪銆?
Expectation
- 瑙嗛鐩搁偦甯у瓨鍦ㄥぇ閲忕殑鍐椾綑锛屽笇鏈涜兘闄嶄綆璁$畻澶嶆潅搴︼紝鎻愰珮鏁堟灉銆?
- 闅忕潃纭欢锛堢壒鍒槸GPU锛夌殑鍙戝睍锛屽湪涓嶄箙鐨勫皢鏉ワ紝鍙互鏈夊儚鐜板湪杩欐牱鐨勫浘鐗囨暟鎹泦锛圛magenet锛夛紝璁粌鏃堕棿涔熷彲浠ユ帴鍙椼€?
- 瑙嗛钑村惈鐫€姣斿浘鐗囧寰堝鐨勪俊鎭€?
- 鐩告瘮浜庡浘鐗囷紝瑙嗛鍏锋湁鏌愮搴忓垪锛屽彲浠ョ敤鍦ㄦ棤鐩戠潱寮忓涔犱腑鐨勭洃鐫e洜绱犮€?