EhCache – Using Multiple CacheManagers

So far in my posts about EhCache I have assumed a single CacheManager, obtained by specifying SingletonEhCacheProvider as the cache provider for Hibernate. But there are situations where you cannot (or do not want to) use SingletonEhCacheProvider. Say, for example, you have two Hibernate configurations for two different databases and you want to keep things separate, with two cache managers, each with its own ehcache.xml config file.
I'm going to show you how this can be done. You can skip right to the short list at the end, or read the detailed version.

        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.MySQL5Dialect</prop>
                <prop key="hibernate.cache.use_query_cache">true</prop>
                <prop key="hibernate.cache.use_second_level_cache">true</prop>

                <prop key="hibernate.cache.provider_class">net.sf.ehcache.hibernate.EhCacheProvider</prop>    <!-- this instead of SingletonEhCacheProvider -->
                <prop key="net.sf.ehcache.configurationResourceName">ehcache.xml</prop> <!-- this way you can specify the location of the config file for every CacheManager -->
            </props>
        </property>

Now, if we keep our old SingletonEhCacheProvider habit of getting the CacheManager instance:

        CacheManager cacheManager = CacheManager.getInstance();

we see the following warning in the log:

2010-02-22 WARN net.sf.ehcache.CacheManager - Creating a new instance of CacheManager using the diskStorePath "C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\" which is already used by an existing CacheManager.

So a new CacheManager is being created by our call to CacheManager.getInstance(). Since we do not actually want to create a new CacheManager instance, but to access one of the caches of an existing one, we need a way of first obtaining the right CacheManager from which to get the desired cache. CacheManager has a static ALL_CACHE_MANAGERS member, which is the list of... you guessed it: ALL the CacheManagers you started (or Hibernate started for you).

public class CacheManager {
    public static final List<CacheManager> ALL_CACHE_MANAGERS = new CopyOnWriteArrayList<CacheManager>();
....
}

The problem remains how to pick your desired CacheManager out of this list. There must be a way to distinguish between them, and the simplest one is to give your CacheManager a name and then look through the list for the CacheManager with that name. What is not evident is how to set this name. If we browse through the source code, we see that the name of the CacheManager is taken from the ehcache.xml config file, so that is where we should specify it:

<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="ehcache.xsd"
             updateCheck="true" name="myCacheManager">
....

We can also keep the looked-up manager in a static variable to save the lookup time:

public class CacheUtil {

    private static CacheManager myCacheManager;

    public static CacheManager getMyCacheManager() {
        if (myCacheManager == null) {
            for (CacheManager cacheManager : CacheManager.ALL_CACHE_MANAGERS) {
                if ("myCacheManager".equals(cacheManager.getName())) {
                    myCacheManager = cacheManager;
                }
            }
        }
        return myCacheManager;
    }
}

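Once the right manager is found, getting one of its caches works as before; "playersCache" below is just a placeholder for a cache declared in that manager's ehcache.xml:

Cache playersCache = CacheUtil.getMyCacheManager().getCache("playersCache");
Element cachedPlayers = playersCache.get("players");
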
Short list
  1. We want to use more than one CacheManager, so SingletonEhCacheProvider is not an option.
  2. Solution: use EhCacheProvider as the Hibernate cache provider and point each Hibernate configuration to its own ehcache.xml via net.sf.ehcache.configurationResourceName.
  3. Problem: a call to CacheManager.getInstance() creates a new CacheManager instead of returning one of the existing instances, as it did in the singleton case. We need another way of getting hold of the right CacheManager.
  4. Solution: use CacheManager.ALL_CACHE_MANAGERS, which is the list of all CacheManagers.
  5. CacheManager has a name property which we can use to pick the right CacheManager from the ALL_CACHE_MANAGERS list.
  6. The CacheManager's name is set in its ehcache.xml config file.
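
For completeness, here is a rough sketch of how the two-manager setup might look in the Spring config; the bean names, the ehcache-first.xml / ehcache-second.xml resource names and the manager names are made up for illustration:

    <bean id="firstSessionFactory" class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
        <property name="dataSource" ref="firstDataSource"/>
        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.cache.use_second_level_cache">true</prop>
                <prop key="hibernate.cache.provider_class">net.sf.ehcache.hibernate.EhCacheProvider</prop>
                <!-- ehcache-first.xml declares name="firstCacheManager" on its root element -->
                <prop key="net.sf.ehcache.configurationResourceName">ehcache-first.xml</prop>
            </props>
        </property>
    </bean>

    <bean id="secondSessionFactory" class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
        <property name="dataSource" ref="secondDataSource"/>
        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.cache.use_second_level_cache">true</prop>
                <prop key="hibernate.cache.provider_class">net.sf.ehcache.hibernate.EhCacheProvider</prop>
                <!-- ehcache-second.xml declares name="secondCacheManager" -->
                <prop key="net.sf.ehcache.configurationResourceName">ehcache-second.xml</prop>
            </props>
        </property>
    </bean>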

Caching – Part III – Distributed caches


In this part we'll be talking about the general notions of a distributed cache and an actual implementation in EhCache using JGroups. A distributed cache is useful if you have a group of processes (possibly running on different machines) and you want those processes to share the data in the cache, meaning you want the other processes to know when elements in the cache have been changed by one of them. In EhCache or JBoss Cache, for example, each process holds a local copy of the data, and replication messages (adding elements, removing, etc.) are sent from one process to the others when a modification in the cache occurs.

Another type of distributed cache is a partitioned distributed cache, one where part of the data lives in one process (computer) while other parts are distributed among other processes (computers), sometimes with a backup copy of the data in another process. This is useful if the amount of data is very large, or if it is more efficient to partition the data and have the machines work on a query in parallel, each on its own subset of the data.

A nice, simple to use open source solution that offers data structures like maps and lists with the data distributed in a cluster is Hazelcast. Another product, actually a full data grid solution, is Oracle Coherence (formerly Tangosol Coherence, the inventors of partitioned caching). It offers data distribution with backup copies, locality of data (sending the processing commands to the processes that hold the needed data, instead of moving that data over the network to a process that does not have it), write-through caching to a persistent store (a database for example); in a word, very powerful, but with a matching price tag.

Synchronous or Asynchronous replication?

The top issue when talking about data replication from one process to another is whether the replication is to be synchronous or asynchronous.

  1. With synchronous replication, a put in one process blocks until the change has been successfully replicated to all other processes that use the cache. You can view it in terms of a database transaction: the process updates its own cache and propagates the modification to the other processes in the same unit of work. This would be the ideal mode of operation, because all the processes see the same data in the cache and nobody ever gets stale data from it. However, since in a distributed cache the processes usually live on different machines connected through a network, the fact that a write in one process blocks reads in all the others makes this method hard to call efficient. All involved processes must also acknowledge the update before the lock is released. Caches are supposed to be fast and network I/O is not, not to mention prone to failure, so it may not be wise to be very confident that all the participants are in sync unless you have some mechanism of failure notification.
    • Advantages : data kept in sync
    • Disadvantages : network I/O is not fast and is prone to failure
  2. By contrast, asynchronous replication does not propagate an update to the other processes in the same transaction. Instead, the replication messages are sent to the other processes some time after the update of the local cache, for example from a background thread that periodically wakes up and sends the queued replication messages to the other processes (a conceptual sketch follows after this list). This means that an update to a process's local cache finishes very fast, since it does not have to block until it receives an acknowledgment from the other processes. If a peer process does not respond to a replication message, we can retry later, but in no way hinder or block the other processes.
    • Advantages : Updates do not cause long blocks across processes. Failures are simpler to deal with; in case of a network failure the modifications can simply be resent later
    • Disadvantages : Data may not be in sync across processes

Considering that when using caches it is usually acceptable to occasionally receive stale data, asynchronous replication is preferred in most scenarios.
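
To make the asynchronous idea from point 2 more tangible, here is a purely conceptual sketch of a replicator that queues change events and sends them from a background thread; this is an illustration only, not how EhCache's replicators are actually implemented:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Conceptual sketch of asynchronous replication: cache changes are queued
// locally and a background thread pushes them to the peers later.
public class AsyncReplicator {

    private final BlockingQueue<String> pendingChanges = new LinkedBlockingQueue<String>();

    public AsyncReplicator() {
        Thread sender = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    try {
                        String change = pendingChanges.take(); // blocks until a change event is available
                        sendToPeers(change); // network I/O happens here, off the caller's thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        });
        sender.setDaemon(true);
        sender.start();
    }

    // called by the cache listener; returns immediately and never blocks on the network
    public void replicate(String changeEvent) {
        pendingChanges.offer(changeEvent);
    }

    private void sendToPeers(String changeEvent) {
        // placeholder for the actual network send (JGroups, JMS, RMI...)
    }
}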

Protocol or method of replication – Multicast

The next important thing to decide is what protocol or method of replication to use. The most widely used method, or at least one you should be aware exists, is multicasting. In IP multicast there is a specific range of multicast addresses, and a packet sent to one such address is propagated to all the computers that have processes listening on that same address. It's the same idea as a JMS topic: one message sent, multiple receivers who subscribed to the topic.
The nice thing about multicast is that you do not have to know your receivers beforehand; another computer can join the group by using the same multicast address and take part in the conversation without any modification to the configuration files of the other computers. This helps horizontal scalability: just add another computer to the network and it can join your cluster without touching the configuration of the already running processes.
The problem with IP multicast is that it is implemented on top of UDP, not TCP (UDP is not reliable by itself: packets might not reach the destination without the sender being aware of it, and the order in which packets arrive at the destination is not guaranteed), so the sender does not know whether any of its listeners missed a message. This is acceptable for what multicast was intended for, streaming audio and video, where a discarded frame is not catastrophic for the overall viewing experience and having the player wait for the retransmission of a frame would be the bigger degradation. But multicast has to be made reliable to be usable for distributed computing, where a subscriber cannot afford to miss, say, the replication message for a put on the cache. This can be achieved by having the listeners send acknowledgment messages back to the sender. Enter the open source Java library JGroups: its specialty is forming "groups" of processes whose members can send messages to the other participants.

JGroups

JGroups exposes an API for group-level communication, like sending messages between the participants in a group, and internally it uses a stack of protocols, one atop the other, which can be combined to implement the desired behavior. JGroups can be configured through an XML file; a typical protocol stack looks like this:

 <config>
  <UDP mcast_addr="228.10.10.10" mcast_port="45588"/> <!-- sets the underlying transport to UDP multicast -->
  <PING timeout="2000"/> <!-- discovery of members through multicast ping -->
  <MERGE2/>  <!-- handles regrouping of the members. Kicks in after a healed network partition when subgroups might have been created. -->
  <FD_SOCK/>   <!-- Failure detection of group members -->
  <VERIFY_SUSPECT timeout="1500"  />  <!-- double check that a suspected failed member had really failed -->
  <pbcast.NAKACK retransmit_timeout="2400,4800"/> <!--makes multicasting reliable and keeps the order of messages-->
  <UNICAST/>  <!-- makes UDP unicast messages reliable and ordered, UDP unicast is still used for sending messages to a specific member of a group for example-->
  <pbcast.STABLE/> <!--messages that have not been acknowledged as received are kept in memory for retransmission, and this protocol deals with removing those that have been acknowledged by all the subscribers-->
  <pbcast.GMS/> <!-- Group Membership, handles joining or leaving of members from the group -->
</config>

The comments are self explanatory. We have added some layers on top of standard UDP multicast to make it reliable and to be notified when another process joins or leaves our group.
Although it seems a bit complicated, these are mostly standard setups that can be reused with little change; but if you know what you are doing you can play with the parameters of every layer, and some layers have lots of parameters to play with.
The good thing is that changing the protocol stack, and thus the method of replication, only means modifying the XML config file, without any change to the source code. For example, we could make the underlying transport use TCP instead of UDP multicast.
Using TCP, however, means that we need to configure the IP of every participant beforehand so the processes can open unicast connections to them. This is not a problem in setups where you do not have to dynamically add computers to the cluster, and you get the reliability inherent in TCP.
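
To get a feel for the JGroups API itself, independent of EhCache, a minimal group member could look like the sketch below; the stack file name and the cluster name are placeholders, and the code assumes the classic JChannel/ReceiverAdapter style of the 2.x releases:

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class SimpleGroupMember {

    public static void main(String[] args) throws Exception {
        // the protocol stack is read from an XML file like the one above
        JChannel channel = new JChannel("my-jgroups-stack.xml");

        channel.setReceiver(new ReceiverAdapter() {
            public void receive(Message msg) {
                System.out.println("Received: " + msg.getObject());
            }

            public void viewAccepted(View newView) {
                // called whenever a member joins or leaves the group
                System.out.println("Members: " + newView.getMembers());
            }
        });

        channel.connect("demo-cluster"); // join (or create) the group
        channel.send(new Message(null, null, "hello group")); // null destination = send to all members
        // ... do work, then channel.close() on shutdown
    }
}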

JGroups is a proven library, used in many open source and commercial projects, and I encourage you to have a look at it, especially if you are interested in distributed computing. But JGroups is not the only replication method available for EhCache. Other options include using a JMS server and having the processes listen for JMS messages, plain RMI between the processes, and finally using a Terracotta server. Since EhCache was acquired by Terracotta, I suppose they will try to make the integration with the Terracotta server very easy, but I hope they will not neglect the other replication methods just to push people towards Terracotta replication (currently the EhCache JGroups replication does not work with the latest JGroups 2.8.0 library, only with the previous 2.4.7 version).

Cache listener

The starting point for replication in EhCache is the ability to add a cache listener, so that we can intercept modifications to the cache. Since the cache listener is informed of every cache modification, it can send an update message to the other processes, which then apply the same change to their copy of the cache.

To add your own cache listener you implement the CacheEventListener interface in a custom class and write a factory class (one that extends the abstract class CacheEventListenerFactory) that returns your custom cache listener. Then you register this cache listener factory on the cache, which can be done in the ehcache.xml configuration file. An example:

public class MyCacheEventListenerFactory extends CacheEventListenerFactory {
    
    public CacheEventListener createCacheEventListener(Properties properties) {
        String configValue = (String) properties.get("someProperty1"); // the values from the "properties" attribute in ehcache.xml arrive here
        return new MyCacheEventListener();
    }
}

public class MyCacheEventListener implements CacheEventListener {

    public void notifyElementRemoved(Ehcache cache, Element element) throws CacheException {
        System.out.println("Element was removed");
    }

    public void notifyElementPut(Ehcache cache, Element element) throws CacheException {
        System.out.println("Element was put");
    }

    public void notifyElementUpdated(Ehcache cache, Element element) throws CacheException {
        System.out.println("Element was updated");
    }

    public void notifyElementExpired(Ehcache cache, Element element) {
        System.out.println("Element expired ");
    }

    public void notifyElementEvicted(Ehcache cache, Element element) {
        System.out.println("Element evicted");
    }

    public void notifyRemoveAll(Ehcache cache) {
        System.out.println("Elements removed");
    }

    public void dispose() {
        // release any resources held by the listener
    }

    public Object clone() throws CloneNotSupportedException {
        return super.clone();
    }
}

and ehcache.xml configuration:

    <cache name="customCache"
           maxElementsInMemory="30"
           eternal="false"
           timeToLiveSeconds="300"
           overflowToDisk="false">

        <cacheEventListenerFactory class="com.balamaci.MyCacheEventListenerFactory" properties="someProperty1=true"/>

    </cache>
Implementing JGroups replication

To set up replication, EhCache provides cache listener classes for every replication method (JMS, RMI and JGroups). Let's see how we can configure JGroups replication. First download the jgroups-all.jar library and add it to the project classpath; ehcache-jgroupsreplication.jar must be on the classpath as well:

<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../config/ehcache.xsd"
             updateCheck="false" monitoring="autodetect">

<cacheManagerPeerProviderFactory class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
     properties="connect=UDP(mcast_addr=231.12.21.132;mcast_port=45566;ip_ttl=32;
     mcast_send_buf_size=150000;mcast_recv_buf_size=80000):
     PING(timeout=2000;num_initial_members=6):
     MERGE2(min_interval=5000;max_interval=10000):
     FD_SOCK:VERIFY_SUSPECT(timeout=1500):
     pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):
     UNICAST(timeout=5000):
     pbcast.STABLE(desired_avg_gossip=20000):
     FRAG:
     pbcast.GMS(join_timeout=5000;join_retry_timeout=2000)"
 propertySeparator="::"
     />


<!-- Note: only one cacheManagerPeerProviderFactory should be configured per CacheManager.
     The element can also be declared without the properties attribute, in which case
     the default JGroups protocol stack is used:
     <cacheManagerPeerProviderFactory class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"/>
-->
    <cache name="customCache"
           maxElementsInMemory="30"
           eternal="false"
           timeToLiveSeconds="300"
           overflowToDisk="false">

        <cacheEventListenerFactory
           class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
           properties="replicateAsynchronously=true, replicatePuts=true,
           replicateUpdates=true, replicateUpdatesViaCopy=true, replicateRemovals=true,
           asynchronousReplicationIntervalMillis=1000" />

        <bootstrapCacheLoaderFactory class="net.sf.ehcache.distribution.jgroups.JGroupsBootstrapCacheLoaderFactory"/>
    </cache>

First, the JGroups protocol stack that we already talked about is configured in the JGroupsCacheManagerPeerProviderFactory.
We can also see that the JGroupsCacheReplicatorFactory class receives a set of properties. Most are self explanatory:

  • replicateAsynchronously = true or false. This is the synchronous versus asynchronous replication choice we discussed above.
  • replicatePuts, replicateRemovals, ... = true or false, controlling whether each of these types of modification is sent to the peer processes or not
  • replicateUpdatesViaCopy = true or false, controlling whether the updated object is passed along with the update message; otherwise a remove message is sent to the peer caches, so that a cache miss occurs there and the element is refreshed from the persistent storage.

You may notice that we also introduced a bootstrapCacheLoaderFactory element. Having the cache listeners is not enough for a good replication implementation: consider that not all the processes start at the same time. The processes that start late must first receive from the others the current state of the cache (its elements) to be in sync, and only afterwards rely on the update messages. This initial state transfer is handled by the JGroupsBootstrapCacheLoaderFactory. Nothing else changes in the way the cache is handled programmatically; you still put and remove elements in the same way.

Conclusion

We saw how to implement a distributed cache in EhCache, and that a distributed cache is not hard to obtain. On the other hand, the realm of distributed caches brings additional problems of its own: you must be aware, for example, that other processes can be cut off by a network failure, hold stale data, or fail to respond for some reason.

Caching with EhCache – Part II


In this part of the article about caching we'll discuss using EhCache for purposes other than as a Hibernate second level cache.

Before working with EhCache, I used to keep a cache in a static variable. For example, I would have:

public class MemoryStore {
    public static List<Player> players;
}

//A main class
public class Main {
    public static void main(String[] args) {
        ClassPathXmlApplicationContext appContext =
                new ClassPathXmlApplicationContext(new String[] {
                        "applicationContext.xml"
                });

        List<Player> lstPlayers = remoteService.getAllPlayers(); //Just a costly call that would justify using a cache

        MemoryStore.players = lstPlayers; //init the static as early as possible, so no other code reads it while it is still null

        ....
        //Continue further to the business code implementation and the part that would use the MemoryStore.players
    }
}

What are the problems we may encounter when using this setup:

  • We must make sure we initialize the static variable before we use it, and not start other threads that may use it until it has been set.
  • What about refreshing the value of this “cache” after some time? A solution could be another thread that “expires” and refreshes the static cache variables periodically.
  • Do we need to worry about synchronization if a get occurs while we update the cache? We may ignore it and hope the cache is updated rarely enough that a read and a write almost never overlap. Otherwise, since we do not want reads to block other reads, we could use a ReentrantReadWriteLock, so that reads do not block other reads but a write blocks the reads (see the sketch below).

As we can see, although not hard to implement, this classic approach has some disadvantages that come with it.
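
As a point of comparison before we switch to EhCache, here is a minimal sketch of the read-write lock variant mentioned in the last bullet; reads can run in parallel, while a refresh takes the exclusive write lock:

import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MemoryStore {

    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private static List<Player> players;

    public static List<Player> getPlayers() {
        lock.readLock().lock();  // many readers may hold the read lock at once
        try {
            return players;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void refresh(List<Player> freshPlayers) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers while updating
        try {
            players = freshPlayers;
        } finally {
            lock.writeLock().unlock();
        }
    }
}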

Let's modify this example to use EhCache instead of the static variable. First we get hold of the cache manager; from it we can get a reference to an already existing cache, or add a new one.

Remember the singleton CacheManager from Part I? We can keep the cache configuration for Hibernate and, in the same ehcache.xml file, declare our own cache. Let's call it the "customCache" cache.

<cache name="customCache" maxElementsInMemory="30" eternal="false" timeToLiveSeconds="60" overflowToDisk="false"/>

It would be nice if the cache knew how to populate itself, which gets around the first problem of having to populate it at application startup. Since it is an EhCache cache, it has the properties configured in the ehcache.xml file, and we are especially interested in the expiration properties if we want this cache refreshed at some interval. A cache that knows how to populate itself will, after expiration, run the retrieval method again to fetch fresh values, so the second problem, expiration, is solved as well.

EhCache offers the SelfPopulatingCache class, which extends the BlockingCache class. BlockingCache is a cache decorator that allows concurrent read access to elements already in the cache; if a get misses, other reads for that key block until an element with that key is put into the cache. This means our last issue with the static-variable approach, having to handle the synchronization ourselves, is solved too.

Let’s see it in action:

CacheManager cacheManager = CacheManager.getInstance();

Cache customCache = (Cache) cacheManager.getCache("customCache");

SelfPopulatingCache selfPopulatingCache = new SelfPopulatingCache(customCache, new CacheEntryFactory() {
    public Object createEntry(Object key) throws Exception {
        if ("players".equals((String) key)) {
            List<Player> lstPlayers = remoteService.getAllPlayers();
            return lstPlayers;
        }
        return null;
    }
});

cacheManager.replaceCacheWithDecoratedCache(customCache, selfPopulatingCache);
// From now on any call to cacheManager.getEhcache("customCache") returns the SelfPopulatingCache.
// Be careful not to call cacheManager.getCache("customCache") afterwards: it will return null,
// not the decorated cache.

List players = (List) selfPopulatingCache.get("players").getObjectValue(); //the first call invokes createEntry and populates the cache

selfPopulatingCache.get("players"); //this call is served from the cache; createEntry is not called

//After 60 seconds - the value of timeToLiveSeconds - passes
selfPopulatingCache.get("players"); //the element has expired, so this call blocks until createEntry finishes repopulating the cache

A fuller example:

        CacheManager cacheManager = CacheManager.getInstance();

        Cache customCache = (Cache) cacheManager.getCache("customCache");

        SelfPopulatingCache selfPopulatingCache = new SelfPopulatingCache(customCache, new CacheEntryFactory() {
           public Object createEntry(Object key) throws Exception {
               log.info("*** Create entry is being called ***");

               if("players".equals((String)key)) {
                   List<Player> playersList = new ArrayList<Player>();

                   playersList.add(new Player(1, "John"));
                   playersList.add(new Player(2, "Serban"));
                   playersList.add(new Player(3, "Weasley"));

                   return playersList;
               }

               return null;
           }
        });
        
        cacheManager.replaceCacheWithDecoratedCache(customCache, selfPopulatingCache);

        log.info("Before first call");
        List players = (List) selfPopulatingCache.get("players").getObjectValue();
        log.info("Players " + players.size());

        log.info("Before second call");
        players = (List) selfPopulatingCache.get("players").getObjectValue();
        log.info("Players " + players.size());

        try {
            Thread.sleep(70 * 1000); //sleep so we expire the elements
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        log.info("Before third call - after expired");
        players = (List) selfPopulatingCache.get("players").getObjectValue();
        log.info("Players " + players.size());

        //We put another element in the cache just to show that we can
        selfPopulatingCache.put(new Element("coach", "Robin Hood"));

And the logs:


2009-12-15 15:20:51,609 INFO com.balamaci.Main - Before first call
2009-12-15 15:20:51,609 DEBUG net.sf.ehcache.Cache - customCacheCache: customCacheMemoryStore miss for players
2009-12-15 15:20:51,609 DEBUG net.sf.ehcache.Cache - customCache cache - Miss
2009-12-15 15:20:51,609 INFO com.balamaci.Main - *** Create entry is being called ***
2009-12-15 15:20:51,625 INFO com.balamaci.Main - Players 3
2009-12-15 15:20:51,625 INFO com.balamaci.Main - Before second call
2009-12-15 15:20:51,625 DEBUG net.sf.ehcache.Cache - customCacheCache: customCacheMemoryStore hit for players
2009-12-15 15:20:51,625 INFO com.balamaci.Main - Players 3
2009-12-15 15:22:01,625 INFO com.balamaci.Main - Before third call - after expired
2009-12-15 15:22:01,625 DEBUG net.sf.ehcache.Cache - customCache Memory cache hit, but element expired
2009-12-15 15:22:01,625 DEBUG net.sf.ehcache.Cache - customCache cache - Miss
2009-12-15 15:22:01,625 INFO com.balamaci.Main - *** Create entry is being called ***
2009-12-15 15:22:01,625 INFO com.balamaci.Main - Players 3

We can see in the log the cache being populated as a result of the get call. The cache is not being populated until this first call. For the second call we have a cache hit and the list of players from the cache is returned. After waiting longer than the timeToLiveSeconds parameter, the third call finds the elements expired, so the cache is repopulated by calling again the createEntry method.

As you browse through the sources you may see another class that extends SelfPopulatingCache (itself a BlockingCache): the UpdatingSelfPopulatingCache. This class adds an updateEntryValue(Object key, Object value) method which, for some reason, seems to be called every time a get(key) is invoked on the cache. I do not see much value in a cache class that calls an update method for every cache read, however I'll still put up an example of its usage:

UpdatingSelfPopulatingCache updatingCache = new UpdatingSelfPopulatingCache(customCache,
      new UpdatingCacheEntryFactory() {
            public void updateEntryValue(Object key, Object value) throws Exception {
                log.info("Updating entry for key " + key);
                if ((Integer) key == 1) {
                    Player player = (Player) value;
                    player.setName("Jersey");
                }
            }

            public Object createEntry(Object key) throws Exception {
                log.info("Creating entry for key " + key);
                if ((Integer) key == 1) {
                    return new Player(1, "John");
                }

                return null;
            }
   });
   cacheManager.replaceCacheWithDecoratedCache(customCache, updatingCache);
   
   log.info("Before first call");
   Player player = (Player) updatingCache.get(1).getObjectValue();
   log.info("Got Player " + player);

   log.info("Before second call");
   player = (Player) updatingCache.get(1).getObjectValue();
   log.info("Got Player " + player);

We see that the element is refreshed:

2009-12-15 18:37:14,765 DEBUG net.sf.ehcache.Cache - customCacheCache: customCacheMemoryStore miss for 1
2009-12-15 18:37:14,765 DEBUG net.sf.ehcache.Cache - customCache cache - Miss
2009-12-15 18:37:14,765 INFO com.balamaci.Main - Creating entry for key 1
2009-12-15 18:37:14,765 INFO com.balamaci.Main - Got Player Id=1 Name=John
2009-12-15 18:37:14,765 INFO com.balamaci.Main - Before second call
2009-12-15 18:37:14,765 DEBUG net.sf.ehcache.Cache - customCacheCache: customCacheMemoryStore hit for 1
2009-12-15 18:37:14,765 DEBUG net.sf.ehcache.constructs.blocking.SelfPopulatingCache - customCache: refreshing element with key 1
2009-12-15 18:37:14,765 INFO com.balamaci.Main - Updating entry for key 1
2009-12-15 18:37:14,765 INFO com.balamaci.Main - Got Player Id=1 Name=Jersey

As you can see, the second get call triggers an update, and I do not really see when this could be useful. Perhaps some comments from readers would help.

In conclusion, you can use EhCache to store your own data, not just as a Hibernate cache, and you get "out of the box" expiration and repopulation of elements, synchronization, access statistics and, if you need it, overflow to disk.

Another interesting feature of EhCache is the ability to add a cache event listener. By adding a cache event listener to a cache, you can be notified when an element has been added, updated, removed or expired. To add a custom cache event listener you create a class that implements the CacheEventListener interface and a factory class that extends the CacheEventListenerFactory abstract class and returns your custom listener. You assign this factory to the cache in the ehcache.xml file like this:

<cache name="customCache" maxElementsInMemory="30" eternal="false" timeToLiveSeconds="60" overflowToDisk="false">
        <cacheEventListenerFactory class="com.balamaci.MyCacheEventListenerFactory"/>
</cache>

This is the base on which cache statistics and distributed caching are built: every put, update or remove is caught and, through different means and protocols, replicated to another EhCache instance.

In Part III of these EhCache tutorials, I plan to talk about how we can build a distributed cache, so that the cached data can be shared with another application, even one running on a remote computer.

Caching with EhCache – Part I

The need for caching is quite obvious and I'll not insist on it in this post: usually a much lower response time than querying the database, plus the saved resources from not hitting the database, which can then handle other requests. The exact benefit of using a cache, however, depends entirely on the particular case. It is not wise to cache objects that are changing all the time in the backing database; if staying in sync costs you more than the performance you gain, a badly used cache can end up slower than just using the database.

We'll start by using EhCache as a second level cache for Hibernate, to help with data retrieval from a MySQL database. We'll also be using Spring. Any database will do, since we're actually interested in the caching part and we're working with entities after all.

The entity that we'll be using in our tests is Player, which looks like this:

package com.balamaci.domain.entity;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
import org.hibernate.annotations.GenericGenerator;

import javax.persistence.*;
import java.io.Serializable;

@Entity
@Table(name = "PLAYERS")
@Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
public class Player implements Serializable {

	@Id
	@GeneratedValue(generator = "INCREMENT")
	@GenericGenerator(name = "INCREMENT", strategy = "INCREMENT")
	@Column(name = "ID")
	private Long id;

	@Column(name = "NAME")
	private String name;

	@Column(name = "AGE")
	private Integer age;

	@Column(name = "NICKNAME")
	private String nickName;

.....
/* setters and getters for properties */
....
}

File applicationContext.xml :

<?xml version="1.0" encoding="UTF-8"?>
<beans ...>

	<!-- Resolves ${...} placeholders from app.properties-->
	<bean id="propertyConfigurer"
		class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
		<property name="location">
			<value>classpath:/app.properties</value>
		</property>
	</bean>

	<!-- Import contexts -->
	<import resource="classpath:serviceContext.xml" />
	<import resource="classpath:persistenceContext.xml" />
</beans>

File persistenceContext.xml :

<beans ....>

    <!-- Database -->
    <bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
        <property name="driverClassName">
            <value>com.mysql.jdbc.Driver</value>
        </property>

        <property name="url">
            <value>${jdbc.url}</value>
        </property>

        <property name="username">
            <value>${jdbc.username}</value>
        </property>

        <property name="password">
            <value>${jdbc.password}</value>
        </property>
    </bean>

    <bean id="sessionFactory" class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
        <property name="annotatedClasses">
            <list>
                <value>com.balamaci.domain.entity.Player</value>
            </list>
        </property>

        <property name="dataSource">
            <ref local="dataSource"/>
        </property>

        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.MySQL5Dialect</prop>
                <prop key="hibernate.show_sql">true</prop>
                <prop key="hibernate.cache.use_second_level_cache">true</prop> <!-- This is very important -->
                <prop key="hibernate.cache.use_query_cache">true</prop> <!-- This is very important also for using query caches -->
                <prop key="hibernate.cache.provider_class">net.sf.ehcache.hibernate.SingletonEhCacheProvider</prop>
                <prop key="hibernate.generate_statistics">true</prop>
            </props>
        </property>

    </bean>

    <bean id="hibernatePersistenceDao" class="com.balamaci.domain.dao.HibernatePersistenceDao">
        <property name="sessionFactory">
            <ref bean="sessionFactory"/>
        </property>
    </bean>
</beans>

The important property to look at is hibernate.cache.provider_class, which has been set to SingletonEhCacheProvider. This means we can get a reference to this "global" cache manager by calling the static method CacheManager.getInstance(), and if more than one Hibernate configuration is used, the same single cache manager serves all of them.
Also, use_second_level_cache is the property that actually determines whether the second level cache is used or not. If you need to disable caching, just set this property to false.

File serviceContext.xml :

<beans ...>

    <bean id="transactionManager"
        class="org.springframework.orm.hibernate3.HibernateTransactionManager">
        <property name="sessionFactory" ref="sessionFactory" />
    </bean>

    <bean id="abstractService"
		class="org.springframework.transaction.interceptor.TransactionProxyFactoryBean"
		abstract="true">
		<property name="transactionManager" ref="transactionManager" />
		<property name="transactionAttributes">
			<props>
				<prop key="get*">PROPAGATION_SUPPORTS,readOnly</prop> <!-- Specifying readOnly here helps by not creating a transaction, we'll discuss later why this is an improvement -->
				<prop key="add*">PROPAGATION_REQUIRED</prop>
            </props>
		</property>
	</bean>

	<bean id="persistenceService" parent="abstractService">
		<property name="target" ref="persistenceServiceTarget" />
	</bean>

	<bean id="persistenceServiceTarget" class="com.balamaci.service.DefaultPersistenceService">
		<property name="hibernatePersistenceDao" ref="hibernatePersistenceDao" />
	</bean>
</beans>

Now let’s have the code to retrieve all players from database:

// --- The service interface PersistenceService --
public interface PersistenceService {
    public List<Player> getAllPlayers();
}

// --- The implementation of the service class DefaultPersistenceService ---
public class DefaultPersistenceService implements PersistenceService {

    public HibernatePersistenceDao hibernatePersistenceDao;

    public List<Player> getAllPlayers() {
        return hibernatePersistenceDao.getAllPlayers();
    }

    public void setHibernatePersistenceDao(HibernatePersistenceDao hibernatePersistenceDao) {
        this.hibernatePersistenceDao = hibernatePersistenceDao;
    }
}

// --- The DAO class ---
public class HibernatePersistenceDao extends HibernateDaoSupport {
    public List<Player> getAllPlayers() {
        Criteria crit = getSession().createCriteria(Player.class);
        return crit.list();
    }
}

// --- The main function ---
public static void main(String[] args) {
        ClassPathXmlApplicationContext appContext =
                new ClassPathXmlApplicationContext(new String[] {
                        "applicationContext.xml"
                });

        PersistenceService persistenceService = (PersistenceService) appContext.getBean("persistenceService");
        List<Player> lstPlayers = persistenceService.getAllPlayers();
}

EhCache can be configured by creating an ehcache.xml file. Caches are characterized by a set of different properties:
eternal, set to true or false, says whether the cache contents ever expire. A non-eternal cache expires its elements after some interval, and they have to be refreshed from the database. For example, there are caches for which you know the underlying data never changes in the database (or you do not care); those you declare eternal. Others are likely to change, and for those you would like to re-query the database from time to time, so you set timeToLiveSeconds to a value after which the cached elements expire. Caches with eternal=true ignore the timeToLiveSeconds property.
maxElementsInMemory should be self explanatory. Caches can grow into huge beasts that take up a whole lot of memory, so you can limit the number of elements to a maximum. When that maximum is reached, older entries are evicted; that is the default LRU (least recently used) eviction strategy, but it can be changed to LFU (least frequently used) or FIFO (first in, first out). Or you can choose to "spill" the overflowing elements to a disk store by setting the overflowToDisk=true property. Take care, as the disk can be quite slow and you might introduce a big performance bottleneck that makes the cache perform even worse than a call to the database. If cache performance is bad, check the statistics for caches that have overflowed to the disk store, since retrieving elements persisted to disk is many times slower than retrieving them from memory.
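
Putting those attributes together, a cache entry in ehcache.xml could look roughly like this; the cache name and the values are only illustrative:

<cache name="playersCache"
       maxElementsInMemory="500"
       eternal="false"
       timeToLiveSeconds="300"
       memoryStoreEvictionPolicy="LFU"
       overflowToDisk="true"/>
<!-- LFU instead of the default LRU eviction; elements overflowing the 500-element
     memory limit are written to the disk store -->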


A model you can use to think about a cache: it is like a map, a collection of <key, value> pairs with generic put(key, value) and get(key) methods to retrieve the value associated with a key. I should point out that the value referenced by a key can be a list of objects, not only a single object.
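
In code the analogy is direct; a small sketch, assuming a cache named "playersCache" is declared in ehcache.xml:

Cache playersCache = CacheManager.getInstance().getCache("playersCache");

// the value can be a single object or a whole list
playersCache.put(new Element("players", lstPlayers)); // lstPlayers loaded earlier

Element element = playersCache.get("players");
if (element != null) {
    List<Player> players = (List<Player>) element.getObjectValue();
}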

The Entity Cache and Query Cache

When using Hibernate with EhCache one can distinguish between two types of caches.

  • The entity cache, where a particular entity is referenced by its primary key, and that key is used to retrieve that particular instance of the entity. For example, when issuing getSession().get(Player.class, id), Hibernate will look in its second level cache for the entry with that key before going to the database to retrieve the entity (a small sketch follows after this list).
  • The query cache. In many cases, querying for data (issuing a select statement against the database) can be very costly (time and CPU intensive); a select may take a long time to complete even though it returns only one item. We can improve the situation by taking advantage of the fact that the same slow query we are about to execute was already run a second ago with the same parameters, so we can reuse its result and skip the query. This is the query cache: it keeps the results of queries, and if the same query with the same parameters is issued again, the results come from the cache instead of executing the query. More specifically, the key of the cache is the select statement together with its parameter values, and the value is the query result.
    The thing to be aware of is that the values stored in the query cache are not lists of entity instances, but lists of the entities' ids. When Hibernate gets the list of ids, it has to do an intermediate step to build the list of entities. If the entities with those ids are not in the entity cache (or have expired) and Hibernate must retrieve them one by one from the database, this can take more time than not using the cache at all, so keep this in mind if you experience slow queries.

    Query caches are not enabled by default. You must explicitly call query.setCacheable(true), and also remember to set the hibernate.cache.use_query_cache property to true in the Spring config file. Let's look at an example for the query cache: we are going to retrieve players younger than 10 years.

    public List<Player> getLittleLeaguePlayers() {
            Query query = getSession().createQuery("Select p from Player p where p.age < ?");
            query.setLong(0, 10);
            query.setCacheable(true);
            return query.list();
    }
    

    This creates a cache entry with the select string (and parameter values) as the key and a list of ids, say [2, 4], as the value. When this method is executed again, the query does not go to the database; instead, the ids are retrieved from the query cache, Hibernate looks up each entity by id in the entity cache, and builds the list to return to the caller.

    The query cache looks like:
    Key -> Value
    { { query }, [parameter values] } -> [ids of cached entities]
    { {"Select p from Player p where p.age < ?"}, [10] } -> [2, 4]

    In the event that the select is supposed to return scalar values, and not mapped entities, those values are kept as they are in the query cache.

    Query caches apply also to query results obtained from using the Criteria api:

            Criteria crit = getSession().createCriteria(Player.class);
            crit.add(Restrictions.lt("age", 10));
            crit.setCacheRegion("query.LittleLeague");
            crit.setCacheable(true);
            List<Player> players = crit.list();
    

    Since the query cache is just a cache, the same properties as a cache can be set, for example the maxElementsInMemory would limit the number of queries for which the results are cached.
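
To make the entity cache from the first bullet concrete, here is a small sketch of a DAO method; the behaviour described in the comments assumes the @Cache annotation shown earlier on the Player entity:

public Player getPlayer(Long id) {
    // The first call loads the player from the database and stores it in the
    // second level cache region "com.balamaci.domain.entity.Player".
    // Later calls with the same id, even from other sessions, are served
    // from that cache instead of hitting the database.
    return (Player) getSession().get(Player.class, id);
}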

Hibernate creates entity caches named after the fully qualified entity class name; in our example there will be a cache named com.balamaci.domain.entity.Player. You can use this name to configure the properties of this cache in ehcache.xml, for example:

<cache name="com.balamaci.domain.entity.Player"
           maxElementsInMemory="30"
           eternal="false"
           timeToLiveSeconds="200"
           overflowToDisk="false"/>

You can also retrieve the cache from the cache manager by this name, for example to show statistics about it, or to clear the cache from an admin interface and thus force Hibernate to go to the database for fresh values.

Cache playerCache = (Cache) CacheManager.getInstance().getCache("com.balamaci.domain.entity.Player");

/** -- Obtaining different statistics of the cache -- **/
Statistics stats = playerCache.getStatistics();
stats.getDiskStoreObjectCount();
stats.getCacheHits();
stats.getCacheMisses();

/** -- Clearing the cache for forcing a database reload -- **/
playerCache.removeAll();
playerCache.clearStatistics();

If no entry is configured specifically for a cache in ehcache.xml, the settings for <defaultCache> are used.

Hibernate also creates a cache named org.hibernate.cache.StandardQueryCache; this is the query cache, and we can configure its properties through the ehcache.xml file. You can use a different cache region for a particular query by calling query.setCacheRegion(), and you can set different properties for that particular region as well.

Query q = getSession().createQuery("Select p from Player p where p.age < ?");
q.setLong(0, 12);
q.setCacheRegion("query.LittleLeague");
q.setCacheable(true);

By enabling the query cache, Hibernate creates another cache besides StandardQueryCache, named org.hibernate.cache.UpdateTimestampsCache. This cache is used to determine whether the results in the query cache are still valid. This is the nice thing about Hibernate: if you update the database entries through Hibernate, the query cache entries related to that entity are invalidated, the queries go to the database again and the cache is refreshed. Of course, if the database entries change in some other way, say through a stored procedure, Hibernate cannot know that and the cache is not invalidated.

Finally, let's discuss the cache concurrency setting (the CacheConcurrencyStrategy.NONSTRICT_READ_WRITE annotation in the Player entity definition). This answers the question: what happens when one thread wants to update a player's name while another thread is reading that player's name from the cache at the same time? Should the reader be blocked until the data is updated, or is it ok for the reader to receive the old version from the cache and not be blocked by the writing thread?

  • READ_ONLY – an error is thrown if an entity cached with this strategy is updated.
  • NONSTRICT_READ_WRITE – a reader may receive the old, stale version from the cache while an update is in progress.
  • READ_WRITE – an effort is made to block the reader until the writer finishes.

When working with caches you pretty much expect to sometimes receive stale data, so in my opinion you'll mostly be using the first two.

Wow, this was a long article and I guess I better leave something for part II, where we’ll try to use EhCache as a generic cache, not only for hibernate entities.