Monday, November 26, 2012 - 05:12

Using the Cache in Drupal. How does it work and when to use it.

The caching system in Drupal can be a very powerful ally if you need to get some performance out of your module. It does, however, have some pitfalls that can ruin your day.

Introduction

There are two ways to cache data in Drupal. The first is in-memory storage for the duration of a page load. The other is more general caching in Drupal's actual cache. They can, and where applicable should, be combined. Consider this function:
function fu()
{
	$data = do_expensive_task(); // DO VERY BAD THINGS HERE THAT TAKE TIME

	return $data;
}

Every time you call fu(), whatever horse crap you are doing in there is going to be repeated. This may be necessary. It may just as well be unnecessary, or even undesired, even if the data has changed in the time between the two calls. Say you display page hits in a block and in a node. If your page takes long enough to build, more hits may already have occurred between the two calls, and the block may claim a different number than the node. While this is correct, you may want to withhold this fact from the user as it simply looks broken.

Static it

To prevent the problem above you can declare the variable holding the data as static.
function fu()
{
	static $data;
 
	// Note: empty() is also TRUE for 0, '' and array(), so such results get recomputed.
	if( empty( $data ) ) $data = do_expensive_task(); // DO VERY BAD THINGS HERE THAT TAKE TIME

	return $data;
}
Static variables keep their data until the script terminates. If you want the same data twice, this is a very convenient way to bypass potential performance issues. The call from the block will not only deliver the same data as the node, it will also require no more processing as the variable is already populated. One thing to remember here: if you do need fresh data, you need to include an option to clear the variable and enforce reprocessing. A simple boolean parameter will do.
function fu( $refresh = FALSE )
{
	static $data;
 
	if( empty( $data ) || $refresh ) $data = do_expensive_task(); // DO VERY BAD THINGS HERE THAT TAKE TIME

	return $data;
}

Calling fu() will populate $data, or use it as it is if it is not empty. Calling fu( TRUE ) will always populate or repopulate $data.
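To see the effect, here is a minimal sketch that can be run outside of Drupal. The do_expensive_task() below is a made-up stand-in that only counts how often it runs, and the 'hits' value is invented for illustration.

```php
<?php
// Hypothetical stand-in for the expensive task; it counts how often it runs.
function do_expensive_task()
{
	static $runs = 0;
	$runs++;

	return array( 'hits' => 100, 'runs' => $runs );
}

function fu( $refresh = FALSE )
{
	static $data;

	if( empty( $data ) || $refresh ) $data = do_expensive_task();

	return $data;
}

$first  = fu();       // runs the task
$second = fu();       // served from the static variable
$third  = fu( TRUE ); // forces a second run

print $first['runs'] . ' ' . $second['runs'] . ' ' . $third['runs'] . "\n"; // prints "1 1 2"
```

The second call never touches do_expensive_task(). Only the explicit refresh does.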

This is very efficient if you need fresh data, but only once during a page load or under certain circumstances. It does not, however, cache any data across page loads. If you include remote data or have data that doesn't change rapidly, you'll fetch it over and over again on every page load. This can be required. Usually it is not. A good caching strategy can make the difference between sluggish and speedy.

Drupal's Cache

Let's start with something important that should be obvious: don't cache twice. If you get data from another module, that data may already be cached, and if the module is done right it will be. So be careful not to cache the same data a second time.

The two important functions to utilize Drupal's cache are cache_set() and cache_get(). As the names suggest the first feeds data into the cache while the second retrieves it again.

The setter function has the following signature: cache_set( $cid, $data, $bin = 'cache', $expire = CACHE_PERMANENT )

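  • $cid - the ID of the cache entry.
  • $data - the data to be cached. Serialized on demand.
  • $bin - the cache bin to use. For general purpose this is 'cache'.
  • $expire - CACHE_PERMANENT, CACHE_TEMPORARY or a Unix timestamp after which the entry counts as expired. This is an absolute point in time, not a duration in seconds. Note that the default is permanent.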

cache_set( 'cache_object', $insane_data, 'cache', time() + 3600 ) will cache whatever $insane_data holds under the ID cache_object with an expiration time 1hr from now ( 60 seconds x 60 minutes ).

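Its counterpart is cache_get( $cid, $bin = 'cache' ). A simple $obj = cache_get( 'cache_object' ) will retrieve the cache object again, or FALSE if there is none. The data you stored is in $obj->data, the expiration timestamp in $obj->expire.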
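The following sketch mimics that round trip with a plain PHP array so it can be run without Drupal. demo_cache_set() and demo_cache_get() are made-up stand-ins and not Drupal's implementation. They only imitate the shape of what goes in and what comes out: an object carrying data and expire on a hit, FALSE on a miss.

```php
<?php
// Hypothetical in-memory stand-ins, NOT Drupal's cache_set() / cache_get().
function demo_cache_set( $cid, $data, $expire )
{
	global $demo_cache;

	$demo_cache[$cid] = (object) array(
		'cid'     => $cid,
		'data'    => $data,
		'created' => time(),
		'expire'  => $expire, // absolute Unix timestamp
	);
}

function demo_cache_get( $cid )
{
	global $demo_cache;

	return isset( $demo_cache[$cid] ) ? $demo_cache[$cid] : FALSE;
}

demo_cache_set( 'cache_object', array( 'hits' => 42 ), time() + 3600 );

$hit  = demo_cache_get( 'cache_object' );
$miss = demo_cache_get( 'does_not_exist' );

print $hit->data['hits'] . "\n";                   // prints "42"
print ( $miss === FALSE ? 'miss' : 'hit' ) . "\n"; // prints "miss"
```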

There is a pitfall here, however. First, we obviously need to check whether the object was actually retrieved, that is, whether it is set and not empty. The other part is less obvious: we need to check for expiration. Drupal clears expired entries from the cache when, for example, a cron job runs. After that, expired entries will be gone with the wind. If we retrieve an object in between, however, it can be expired but still cached. So we always need to check that the object is still valid before actually using it. In my opinion it would be a lot slicker if this process were more transparent, i.e. if cache_get only returned the object when it exists and has not expired. Since this is not the case, we have to watch out for expired objects ourselves. A more complex example below:

function fu( $refresh = FALSE )
{
	static $data;
 
	// We have no insanely expensive data yet or we want to refresh it.
	if( empty( $data ) || $refresh )
	{
		// Cached data ok? Only look it up if no refresh was requested.
		$cache = $refresh ? FALSE : cache_get( 'expensive_data' );
 
		// Is that a valid cache object? We assume an expiration of 1hr here.
		if( ! empty( $cache ) && ( $cache->expire > time() ) )
		{
			// We do have it cached and it still is valid.
			$data = $cache->data;
		}
		else
		{
			// Not cached, expired or refresh requested, so do our mumbo jumbo.
			$data = do_expensive_task();
 
			// Cache it now for 1hr.
			cache_set( 'expensive_data', $data, 'cache', time() + 3600 );
		}
	}
 
	return $data;
}
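One caveat about comparing expire with time(): it only works for entries stored with a real timestamp. In Drupal 7, CACHE_PERMANENT is 0 and CACHE_TEMPORARY is -1, so such entries would always look expired. A possible way around this is a small helper like the one sketched below. fu_cache_is_fresh() is a made-up name, and treating temporary entries as usable until they are flushed is an assumption you may want to decide differently.

```php
<?php
// Same values as Drupal 7's constants; defined here only so the sketch runs
// outside of Drupal.
if( ! defined( 'CACHE_PERMANENT' ) ) define( 'CACHE_PERMANENT', 0 );
if( ! defined( 'CACHE_TEMPORARY' ) ) define( 'CACHE_TEMPORARY', -1 );

// Hypothetical helper: TRUE if $cache can still be used at time $now.
function fu_cache_is_fresh( $cache, $now )
{
	// cache_get() hands back FALSE on a miss.
	if( empty( $cache ) || ! isset( $cache->expire ) ) return FALSE;

	// Assumption: entries without a timestamp are good until they get flushed.
	if( $cache->expire == CACHE_PERMANENT || $cache->expire == CACHE_TEMPORARY ) return TRUE;

	// Everything else carries an absolute Unix timestamp.
	return $cache->expire > $now;
}

$now = 1000; // fixed "current time" to keep the example predictable

$results = array(
	fu_cache_is_fresh( FALSE, $now ),                                          // miss
	fu_cache_is_fresh( (object) array( 'expire' => 999 ), $now ),              // expired
	fu_cache_is_fresh( (object) array( 'expire' => 4600 ), $now ),             // still valid
	fu_cache_is_fresh( (object) array( 'expire' => CACHE_PERMANENT ), $now ),  // permanent
);

print implode( ' ', array_map( 'intval', $results ) ) . "\n"; // prints "0 0 1 1"
```

Inside Drupal you would pass the result of cache_get() and time() to the helper and leave out the two define() lines.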

The function above runs do_expensive_task() only if it is required. In our example that means we either asked for a refresh or 1hr has passed since we last did the mumbo jumbo. Whatever is being stressed by it will thank us. Remote queries that do not require real-time information are predestined for caching. Your Piwik stats, for example. There's no need to store them twice. But it makes sense to cache them in Drupal if you don't want to query them on every page load.

Final word

Reading from the cache is a database lookup. It's quite a simple one, so it's not very likely that you will kill your performance by using the cache instead of getting live data. Unless you do some very simple lookups, virtually any lookup will be more complex than a cache lookup. Excessive caching, however, won't give you much performance. In rare instances it can have a negative impact. Always keep in mind that something might already be cached! Cache the big boys: tasks that are computationally intensive, complex database queries and most certainly remote queries.

Also keep in mind that the cache is not a storage replacement. It's a cache and can be flushed any time of the day by any module or any admin. Don't keep data in the cache that is supposed to be somewhere safe. Your Piwik stats go into the cache. Your Twitter feed, however, should be somewhere else.
