Tuning squid and videocache for youtube

 

How youtube cache (at googlevideo) works

If Firefox caches the video many times, does squid too?

Don't cache yourself if you use videocache

Don't cache the video twice with videocache

What to do with video.google.com and its localised sites

Josep Pujadas i Jubany

1-March-2009


How youtube cache (at googlevideo) works

Add the following extensions to your Firefox: Live HTTP Headers and CacheViewer.

Ensure that your Firefox doesn't use your (squid) proxy server.

Open Live HTTP Headers in order to capture the headers.

Testing is easier if you clear your Firefox cache first.

Watch any video at www.youtube.com. It is better to choose a small video, so you spend less time testing.

With Live HTTP Headers you will see that when you fetch a video at youtube you receive a 303 HTTP response redirecting you to a new URL.

HTTP/1.x 303 See Other
Date: Sat, 28 Feb 2009 18:29:20 GMT
Server: Apache
X-Content-Type-Options: nosniff
Expires: Tue, 27 Apr 1971 19:44:06 EST
X-YouTube-MID: ZARg7-aAGvgIhj98y1Y9zFBt-qQ8BYvKA_FElHA1RfN3uICq-TYJCQ
Cache-Control: no-cache
Location: http://v2.cache.googlevideo.com/videoplayback?id=986fef5cdbeb7e25&itag=34&ip=00.00.000.000&signature=1BAD9AFC21CC55D09A6DE0F3526E78B7889D945F.4A46A515C1A1647A1D48F44220C0866F88B30E13&sver=2&expire=1235867360&key=yt1&ipbits=0
Keep-Alive: timeout=300
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
----------------------------------------------------------
http://v2.cache.googlevideo.com/videoplayback?id=986fef5cdbeb7e25&itag=34&ip=00.00.000.000&signature=1BAD9AFC21CC55D09A6DE0F3526E78B7889D945F.4A46A515C1A1647A1D48F44220C0866F88B30E13&sver=2&expire=1235867360&key=yt1&ipbits=0

GET /videoplayback?id=986fef5cdbeb7e25&itag=34&ip=00.00.000.000&signature=1BAD9AFC21CC55D09A6DE0F3526E78B7889D945F.4A46A515C1A1647A1D48F44220C0866F88B30E13&sver=2&expire=1235867360&key=yt1&ipbits=0 HTTP/1.1
Host: v2.cache.googlevideo.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ca; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: es-us
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
content-disposition: attachment; filename=video.flv
Last-Modified: Sat, 27 Dec 2008 15:27:28 GMT
Content-Type: video/x-flv
Content-Length: 2179290
Expires: Sat, 28 Feb 2009 19:29:20 GMT
Cache-Control: public,max-age=3600
Connection: close
Date: Sat, 28 Feb 2009 18:29:20 GMT
Server: gvs 1.0

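This new URL points at a googlevideo cache server, which returns the video itself (video/x-flv) with a lifetime of one hour (Cache-Control: public,max-age=3600).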

You can see the content cached by Firefox using CacheViewer. If you sort your cache by MIME type and look for video/x-flv you will find your video. You can even save the object as a flv file!

If you refresh the video page with Firefox you will see (with Live HTTP Headers) that you are redirected to a new URL each time. And with CacheViewer you will see one cached video for each refresh!
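
To compare the URLs of two fetches it helps to split the redirect URL into its parameters. The following is only a minimal Python sketch, not part of the test procedure itself. It uses the Location value from the capture, and it assumes that expire is a Unix timestamp in seconds (the meaning of the parameters is not documented here).

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# Location header copied from the Live HTTP Headers capture.
LOCATION = (
    "http://v2.cache.googlevideo.com/videoplayback?id=986fef5cdbeb7e25&itag=34"
    "&ip=00.00.000.000"
    "&signature=1BAD9AFC21CC55D09A6DE0F3526E78B7889D945F."
    "4A46A515C1A1647A1D48F44220C0866F88B30E13"
    "&sver=2&expire=1235867360&key=yt1&ipbits=0"
)


def describe_redirect(url):
    """Return the host and the query parameters of a videoplayback URL."""
    parts = urlsplit(url)
    # parse_qs returns a list per name; each parameter appears once here.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return parts.hostname, params


host, params = describe_redirect(LOCATION)

# Assumption: "expire" is a Unix timestamp in seconds.
expires_at = datetime.fromtimestamp(int(params["expire"]), tz=timezone.utc)

print("host:     ", host)
print("video id: ", params["id"])
print("expire:   ", expires_at.isoformat())
```

For this capture the expire value decodes to 2009-03-01 00:29:20 UTC, six hours after the Date header of the response, while the Cache-Control header of the video only allows one hour.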

Yes, www.youtube.com eats your bandwidth and your hard disk! Each time you ask for a video you get a new download! Of course you can also see this with the FlashPlayer downloading/playing bar, but I wanted to be sure by looking at the HTTP dialog. The only way to avoid downloading the video again is to use the Replay feature of FlashPlayer.


If Firefox caches the video many times, does squid too?

If the youtube cache at googlevideo uses a unique URL for each fetch of a video, and the content expires after one hour, it doesn't make any sense to cache it with squid.

There are a lot of web pages about squid explaining how to cache this content.

But I recommend that you just ensure you aren't caching this content with squid. Choose one of these two solutions for your squid.conf (please see the NOTE at wiki.squid-cache.org/ConfigExamples/DynamicContent):

# Solution 1: don't cache any URL containing cgi-bin or a query string
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

# Solution 2: don't cache the video URLs only
acl GOOGLEVIDEO urlpath_regex /videoplayback\?id= /get_video\?origin=
cache deny GOOGLEVIDEO
acl YOUTUBE urlpath_regex /get_video\?video_id=
cache deny YOUTUBE
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
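
Before restarting squid you can check which URLs the GOOGLEVIDEO and YOUTUBE patterns would catch. This is a rough Python sketch under two assumptions: that urlpath_regex is tested against the URL path including the query string, and that Python's re module behaves like squid's regex library for such simple patterns. The get_video and watch URLs in the usage lines are made-up examples.

```python
import re
from urllib.parse import urlsplit

# Patterns copied from the GOOGLEVIDEO and YOUTUBE acl lines.
PATTERNS = [
    r"/videoplayback\?id=",
    r"/get_video\?origin=",
    r"/get_video\?video_id=",
]


def is_denied(url):
    """Return True if the URL path (with query string) matches a pattern."""
    parts = urlsplit(url)
    # Assumption: squid matches urlpath_regex against path + "?" + query.
    urlpath = parts.path + ("?" + parts.query if parts.query else "")
    return any(re.search(pattern, urlpath) for pattern in PATTERNS)


# URL shape taken from the capture (parameters shortened).
print(is_denied("http://v2.cache.googlevideo.com/videoplayback?id=986fef5cdbeb7e25&itag=34"))
# Hypothetical URLs, for illustration only.
print(is_denied("http://tc.cache.googlevideo.com/get_video?origin=example"))
print(is_denied("http://www.youtube.com/watch?v=abcdefghijk"))
```

The first two calls should report a match and the last one should not, because the watch page itself is not covered by these patterns.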

Change your settings (if needed), restart squid and configure your FireFox to use your (squid) proxy server.

If you look at your store.log with:

perl -pe 's/^\d+\.\d+/localtime $&/e;' /usr/local/squid/logs/store.log 

you should see a RELEASE -1 FFFFFFFF line for your video:

Sat Feb 28 20:33:59 2009 RELEASE -1 FFFFFFFF 82E0348C925D182F5022C41DF5046C27 200 1235849627 1186629713 1235853227 video/flv 440391/440391 GET http://tc.cache.googlevideo.com/get_video?
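
If you want to check many lines at once, the same line can be split into fields. This is a small Python sketch, not something squid provides. It assumes the line has already gone through the perl one-liner (so the timestamp takes five words), and the field names follow my reading of the store.log layout: action, dir number, file number, hash, status, date, last-modified, expires, type, sizes, method, URL.

```python
# Sample line copied from the store.log output (URL is truncated there too).
LINE = (
    "Sat Feb 28 20:33:59 2009 RELEASE -1 FFFFFFFF "
    "82E0348C925D182F5022C41DF5046C27 200 1235849627 1186629713 1235853227 "
    "video/flv 440391/440391 GET http://tc.cache.googlevideo.com/get_video?"
)


def parse_store_line(line):
    """Split one localtime-converted store.log line into named fields."""
    fields = line.split()
    return {
        "when": " ".join(fields[0:5]),  # timestamp expanded by the perl filter
        "action": fields[5],
        "dir": int(fields[6]),
        "file": fields[7],
        "status": int(fields[9]),
        "date": int(fields[10]),
        "lastmod": int(fields[11]),
        "expires": int(fields[12]),
        "type": fields[13],
        "url": fields[16],
    }


entry = parse_store_line(LINE)
lifetime = entry["expires"] - entry["date"]  # seconds the reply stays fresh
not_on_disk = entry["dir"] == -1 and entry["file"] == "FFFFFFFF"

print(entry["action"], entry["type"], lifetime, not_on_disk)
```

For this line the lifetime is 3600 seconds, the same one hour seen in the headers, and the -1 / FFFFFFFF pair is what you want to see: the object was released without being written to the disk cache.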

I used squidpurge to clean my squid cache and I freed 30 GByte!


cleaning.sh
#!/bin/sh
purge -p localhost:3128 -P 1 -sf cleaning.txt

cleaning.txt
/videoplayback\?id=
/get_video\?origin=
/get_video\?video_id=

Don't run squidpurge during production hours. It is slow and time consuming (see www.wa.apana.org.au/~dean/squidpurge/README).

squid must be running, but preferably without the videocache redirector. If videocache is running, the redirectors see the PURGE requests and videocache may try to download many videos from expired URLs.


Don't cache yourself if you use videocache

If the (squid) proxy server runs with the videocache redirector and the local web server on the same machine, it doesn't make sense to cache with squid the content already cached by videocache.

Depending on your cache policy (discussed above) you may need something like this in your squid.conf:

acl CACHE_HOST dstdomain proxy.domain
cache deny CACHE_HOST
acl CACHE_HOST_IP dst 192.168.0.3
cache deny CACHE_HOST_IP

assuming you use proxy.domain or 192.168.0.3 as your cache_host in videocache.conf:

cache_host = proxy.domain
cache_host = 192.168.0.3


Don't cache the video twice with videocache

Don't use google caching with videocache. Use only youtube caching. Put this in your videocache.conf:

enable_youtube_cache = 1
enable_google_cache = 0

This way you will have only two downloads for each new youtube video: one for the workstation and a second one for videocache caching.

Otherwise you will have a third download, because videocache will also cache the 'temporary' video at googlevideo.com.

If you look at /var/spool/videocache/youtube you should have only files with names 11 characters long. These are the 'static' video ids at www.youtube.com.

If you have files with a 'temporary' id (16 characters long) you are doing unnecessary double caching for www.youtube.com.
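
The length check is easy to automate. This is a minimal Python sketch that sorts file names by the length of the id. It assumes that any file extension should be ignored when counting characters, and the 11-character name in the example is made up (the 16-character one is the id from the capture).

```python
import os


def classify(names):
    """Group cache file names into static (11 chars), temporary (16) and other."""
    groups = {"static": [], "temporary": [], "other": []}
    for name in names:
        # Assumption: an extension such as .flv is not part of the video id.
        video_id = os.path.splitext(name)[0]
        if len(video_id) == 11:
            groups["static"].append(name)
        elif len(video_id) == 16:
            groups["temporary"].append(name)
        else:
            groups["other"].append(name)
    return groups


# "abcdefghijk" is a hypothetical static id; the second id comes from the capture.
result = classify(["abcdefghijk", "986fef5cdbeb7e25"])
print(result)
```

On the proxy itself you would pass os.listdir("/var/spool/videocache/youtube") instead of the example list. An empty "temporary" group means the double caching is not happening.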

I tested this with videocache 1.8 and 1.9.


What to do with video.google.com and its localised sites

A user who goes to googlevideo.com is permanently redirected to video.google.com (301 HTTP response).

In my case video.google.com and its localised sites (video.google.cat, video.google.es, video.google.fr, ...) are not used much. Moreover, most of the videos at video.google.com today are www.youtube.com videos.

And, in my opinion, there is a bug in videocache 1.8 & 1.9 that prevents caching for video.google.com. Please see cachevideos.com/forum/post/httpvideogoogleesvideoplaydocid-933124738927337625

For more information please go to the videocache website, cachevideos.com.