This article is automatically translated.
In this research, we built a system that recommends attractive and relevant thumbnail images for videos. On video sharing services, each video comes with more than its frames: signals such as comments show how much attention each frame receives. Time periods with high comment density are the highlights of a video. We assume that frames with high comment density best explain the content of the video, and we define such frames as attractive. This definition, however, cannot be applied to videos that have no comment density assigned to their frames. To address this, we built a comment density estimation model that predicts comment density from a frame, using 600,000 frames and their comment densities. We also built a multi-tag prediction model to identify video tags. With these two models, we can predict the comment density and tag scores for every frame of a video. Each frame's score for a specified tag is the product of its tag score for that tag and its comment density, and the frames with the highest scores are recommended as attractive frames.
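The scoring step described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the research: the function name `recommend_frames`, its parameters, and the sample values are hypothetical, and the two models' outputs are stood in for by plain lists of numbers.

```python
from typing import List, Sequence


def recommend_frames(
    comment_density: Sequence[float],
    tag_scores: Sequence[Sequence[float]],
    tag_index: int,
    top_k: int = 3,
) -> List[int]:
    """Return indices of the top_k frames ranked by density * tag score.

    comment_density[i] is the predicted comment density of frame i and
    tag_scores[i][t] is the predicted score of tag t for frame i.
    Both are assumed to come from the two models (hypothetical inputs).
    """
    if len(comment_density) != len(tag_scores):
        raise ValueError("need one density and one tag-score row per frame")
    # Combined score per frame: predicted comment density multiplied by
    # the predicted score of the requested tag.
    combined = [
        density * scores[tag_index]
        for density, scores in zip(comment_density, tag_scores)
    ]
    # Sort frame indices by combined score, highest first.
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return ranked[:top_k]


# Made-up model outputs for four frames and two tags (illustrative only).
density = [0.2, 0.9, 0.5, 0.7]
tags = [
    [0.9, 0.1],
    [0.3, 0.8],
    [0.8, 0.2],
    [0.6, 0.5],
]
best = recommend_frames(density, tags, tag_index=0, top_k=2)
print(best)  # [3, 2]
```

Because the two factors are multiplied, a frame ranks highly only when it is both likely to attract comments and relevant to the requested tag. In the sample data, frame 1 has the highest density but a low score for tag 0, so it is not recommended for that tag.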
This research was conducted by Yamaguchi (@wktk0) during an internship at Dwango Media Village. Over the course of about six months, I received help from many people and was able to complete the project successfully. Every day of the internship I learned something new, from basic concepts to applied ones. I was particularly happy to work in the video domain, which is of great interest to me. Thank you very much.