I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content Video System

Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, Sue Moon
Proc. ACM Internet Measurement Conference (IMC), San Diego, CA, October 2007

User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have analyzed YouTube, the world's largest UGC VoD system. Based on a large amount of data collected, we provide an in-depth study of YouTube and other similar UGC systems. In particular, we study the popularity life-cycle of videos, the intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content in the system. We also provide insights on the potential for more efficient UGC VoD systems (e.g., utilizing P2P techniques or making better use of caching). Finally, we discuss the opportunities to leverage the latent demand for niche videos that are not reached today due to information filtering effects or other system scarcity distortions. Overall, we believe that the results presented in this paper are crucial in understanding UGC systems and can provide valuable information to ISPs, site administrators, and content owners with major commercial and technical implications.

[PDF (1,343KB)]

@inproceedings{imc2007cha,
   author =       "Meeyoung Cha and Haewoon Kwak and Pablo Rodriguez and Yong-Yeol Ahn and Sue Moon",
   title =        "{I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content Video System}",
   booktitle =    {ACM Internet Measurement Conference},
   year =         {2007},
   month =        {October}
}



Data

We share our traces of user-generated videos for use by the wider community. The traces contain meta-information about videos from the YouTube and Daum services. We provide a snapshot of all videos in selected categories of each service. For more information on the traces, please refer to our paper.
If you publish work that uses our traces, please let us know by email at haewoon ATT an.kaist.ac.kr.

YouTube Entertainment Category

  • Format: url | length | views | ratings | stars
  • Example: /watch?v=abc|01:30|100|5|4.0
  • Description:
    This trace provides meta-information for all the videos in the Entertainment category. Each line represents a single video. The example above indicates that the YouTube video http://www.youtube.com/watch?v=abc is 1:30 (90 seconds) long and was viewed 100 times. Five users rated this video, and the average rating was 4.0 stars. Please note that there may be empty fields in our traces.
  • Download YouTube Ent Trace (collected on December 21, 2006, number of videos = 1,687,506)
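
    As an illustration of the format above, the following is a minimal Python sketch for reading one line of the Entertainment trace. The function names (length_to_seconds, parse_ent_line) are hypothetical and not part of the trace distribution. The sketch assumes that fields are separated by a bare "|" and that an empty field should be read as a missing value.

```python
from typing import Optional


def length_to_seconds(length: str) -> Optional[int]:
    """Convert a 'MM:SS' length such as '01:30' to seconds (assumed layout)."""
    length = length.strip()
    if not length:
        return None  # empty field: the length is missing for this video
    seconds = 0
    for part in length.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_ent_line(line: str) -> dict:
    """Parse one line in the format url|length|views|ratings|stars."""
    fields = [f.strip() for f in line.rstrip("\n").split("|")]
    fields += [""] * (5 - len(fields))  # pad lines with missing trailing fields
    url, length, views, ratings, stars = fields[:5]
    return {
        "url": url,
        "length_s": length_to_seconds(length),
        "views": int(views) if views else None,
        "ratings": int(ratings) if ratings else None,
        "stars": float(stars) if stars else None,
    }


record = parse_ent_line("/watch?v=abc|01:30|100|5|4.0")
print(record["length_s"], record["views"], record["stars"])
```

    Missing values are kept as None rather than 0 so that videos with empty fields are not mistaken for videos with zero views or ratings.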


YouTube Science & Technology Category

  • Format: url | length | views1 | ratings1 | user_id | upload_date | views2 | comments2 | favorited2 | ratings2 | stars2 | honors2 | links2 | related2
  • Example: watch?v=abcd1234567|01:30|100|5|mia|January 16, 2007|200|10|10|10|4.0|5|10 https://www.myspace.com/::13 https://www.blogspot.com|/watch?v=a /watch?v=b
  • Description:
    This trace provides meta-information for all the videos in the Science & Technology category, which is now called "Howto & DIY." The example above indicates that the video http://www.youtube.com/watch?v=abcd1234567, uploaded by user ID mia, is 1:30 (90 seconds) long. The views1 and ratings1 fields give the number of views and ratings collected on January 15, 2007, which in this example are 100 and 5, respectively.
    We collected information for the same set of videos again a month later. The views2, comments2, ..., related2 fields give the number of views, comments, favorites, ratings, stars, and honors, the linking pages and their clicks, and the related videos, all collected on February 14, 2007. Please note that deleted videos appear with empty fields in our trace.
    Linking pages are shown as tuples of the form clicks page_url, joined by the :: separator. The example above indicates that 10 clicks came from the myspace.com web site and 13 clicks from the blogspot.com web site. Finally, related2 lists the related videos selected by YouTube. Note that both the linking pages and the related videos in our traces are limited to what is shown on the front page of the corresponding video (i.e., there may be other linking pages and related videos).
  • Download YouTube Sci Trace (collected on January 15 / February 14, 2007, number of videos = 252,255)
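
    The nested links2 and related2 fields described above can be unpacked as in the following Python sketch. The names SCI_FIELDS, parse_links, and parse_sci_line are hypothetical. The sketch assumes that no page URL contains the "|" or "::" separators and that related videos are separated by whitespace, as in the example line. All other fields are kept as raw strings.

```python
# Field order taken from the Format line of the Science & Technology trace.
SCI_FIELDS = [
    "url", "length", "views1", "ratings1", "user_id", "upload_date",
    "views2", "comments2", "favorited2", "ratings2", "stars2",
    "honors2", "links2", "related2",
]


def parse_links(links: str) -> list:
    """Split 'clicks page_url::clicks page_url' into (clicks, page_url) pairs."""
    pairs = []
    for item in links.split("::"):
        item = item.strip()
        if not item:
            continue  # empty field, e.g. for a deleted video
        clicks, _, page_url = item.partition(" ")
        pairs.append((int(clicks), page_url.strip()))
    return pairs


def parse_sci_line(line: str) -> dict:
    """Parse one trace line into a dict keyed by the names in SCI_FIELDS."""
    values = [v.strip() for v in line.rstrip("\n").split("|")]
    values += [""] * (len(SCI_FIELDS) - len(values))  # pad missing fields
    record = dict(zip(SCI_FIELDS, values))
    record["links2"] = parse_links(record["links2"])
    record["related2"] = record["related2"].split()
    return record


example = ("watch?v=abcd1234567|01:30|100|5|mia|January 16, 2007|200|10|10|10|"
           "4.0|5|10 https://www.myspace.com/::13 https://www.blogspot.com|"
           "/watch?v=a /watch?v=b")
record = parse_sci_line(example)
print(record["links2"])
print(int(record["views2"]) - int(record["views1"]))  # views gained between snapshots
```

    Because deleted videos appear with empty fields, numeric fields such as views2 should be checked for an empty string before conversion when processing the full trace.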


Daum Food and Travel Categories

  • Format: video_id | upload_date | length | user_id | recommended | views
  • Example: /ClipView.do?clipid=994690&type=chal|06.11.03|322|80757|3|267
  • Description:
    Each line contains the meta-information of a single video. The example above indicates that the Daum video with URL /ClipView.do?clipid=994690&type=chal, uploaded by user 80757, is 322 seconds (5:22) long. The views and recommended fields give the number of views and recommendations for the corresponding video, collected on April 12, 2007, which in this example are 267 and 3, respectively.
  • Download Daum Food Trace (collected on April 3, 2007, number of videos = 1,393)
  • Download Daum Travel Trace (collected on April 12, 2007, number of videos = 9,295)
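
    A Daum trace can be read line by line in a similar way. The Python sketch below is only an illustration: DaumVideo, parse_daum_lines, and format_length are hypothetical names, and upload_date is kept as raw text because its layout is not specified here.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class DaumVideo(NamedTuple):
    video_id: str
    upload_date: str            # raw text; the date layout is not interpreted
    length_s: Optional[int]     # video length in seconds
    user_id: str
    recommended: Optional[int]
    views: Optional[int]


def _int_or_none(text: str) -> Optional[int]:
    """Convert a numeric field to int, treating an empty field as missing."""
    text = text.strip()
    return int(text) if text else None


def parse_daum_lines(lines: Iterable[str]) -> Iterator[DaumVideo]:
    """Yield one DaumVideo per non-empty line of a Daum trace."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("|")
        parts += [""] * (6 - len(parts))  # pad lines with missing fields
        video_id, upload_date, length, user_id, recommended, views = parts[:6]
        yield DaumVideo(video_id.strip(), upload_date.strip(),
                        _int_or_none(length), user_id.strip(),
                        _int_or_none(recommended), _int_or_none(views))


def format_length(seconds: int) -> str:
    """Render a length in seconds as M:SS, e.g. 322 -> '5:22'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


sample = ["/ClipView.do?clipid=994690&type=chal|06.11.03|322|80757|3|267"]
for video in parse_daum_lines(sample):
    print(video.user_id, format_length(video.length_s), video.views)
```

    Since parse_daum_lines accepts any iterable of strings, an open file object for a downloaded trace can be passed to it directly.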


Contact

    Meeyoung Cha (meeyoung.cha ATT gmail.com)
    Haewoon Kwak (haewoon ATT an.kaist.ac.kr)