PHP and Memcached Returning Random Data
This took a long time to track down and only applies when all of the following are true:
- Your process is forking
- You are using persistent memcache connections
- The data you receive does not match what you requested; for instance, you expect an array and an integer is returned.
The problem is caused by the persistent connection being shared after the process has forked. This is a known issue with database connections, which is why you should always close and reopen your database connections after a process forks; however, it is not a documented ‘feature’ of memcache connections. Trying to close and reopen the memcache connections doesn’t help either. What happens is that two processes use the same connection (socket) to talk to memcache. They both request data at roughly the same time, but the responses go back to the wrong processes, so process A gets the data meant for B, and process B gets process A’s data.
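The mechanism can be illustrated with a deliberately simplified model. SharedConnection below is a hypothetical class, not memcache client code: it treats the shared socket as a queue of replies sent back in request order, and shows that whichever process reads first takes the oldest reply, regardless of who asked for it. It assumes PHP 8.0 or later.

```php
<?php
// Simplified model of one connection shared by two processes after fork().
// SharedConnection is hypothetical: it models only the ordering of replies,
// not the memcache protocol or real sockets.
class SharedConnection
{
    /** @var array<int, mixed> replies not yet read, in the order the server sent them */
    private array $pending = [];

    public function __construct(private array $serverData) {}

    // The server answers requests strictly in the order they arrive.
    public function send(string $key): void
    {
        $this->pending[] = $this->serverData[$key] ?? false;
    }

    // Whoever reads next gets the oldest unread reply, whoever asked for it.
    public function read(): mixed
    {
        return array_shift($this->pending);
    }
}

$conn = new SharedConnection(['user:42' => ['name' => 'Alice'], 'hits' => 17]);

$conn->send('user:42');  // process A asks for an array
$conn->send('hits');     // process B asks for an integer
$gotByB = $conn->read(); // B happens to read first...
$gotByA = $conn->read(); // ...so A is left with B's reply

var_dump($gotByA); // int(17): A expected an array
```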
The best approach I’ve found to this issue so far is:
- Mark the data stored with the key it was stored under.
- When the data is returned, check that the key stored with it is the same as the key requested.
- If there is a mismatch, treat it as a cache miss and return false.
This increases your cache misses, but it ensures your application works as you coded it. It’s a simple change: instead of storing just the data, you store it in an array together with the key it was stored under:
Instead of:
function CacheAdd($Key, $Data, $TTL)
{
    // add code to create your memcache connection
    $memcache->add($Key, $Data, false, $TTL);
}
You use:
function CacheAdd($Key, $Data, $TTL)
{
    // add code to create your memcache connection
    // Tag the data with its key so CacheGet can verify what comes back
    $DataToStore = array('Key' => $Key, 'Data' => $Data);
    $memcache->add($Key, $DataToStore, false, $TTL);
}
Then, when you retrieve the data:
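function CacheGet($Key)
{
    // add code to create your memcache connection
    $DataFromStore = $memcache->get($Key);
    // A crossed response may not be an array at all, or may carry another key
    if (!is_array($DataFromStore) || !isset($DataFromStore['Key']) || $DataFromStore['Key'] !== $Key) {
        return false; // treat the mismatch as a cache miss
    }
    return $DataFromStore['Data'];
}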
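The key check from CacheGet can also be split into two small helpers so it can be exercised without a running memcached server. CacheWrap and CacheUnwrap are hypothetical names, not part of any memcache extension; in the functions above, CacheAdd would store CacheWrap($Key, $Data) and CacheGet would return CacheUnwrap($Key, $memcache->get($Key)). This sketch assumes PHP 8.0 or later for the mixed type.

```php
<?php
// Hypothetical helpers: tag a value with its key before storing it,
// and reject anything that does not carry the key that was asked for.
function CacheWrap(string $Key, mixed $Data): array
{
    return ['Key' => $Key, 'Data' => $Data];
}

function CacheUnwrap(string $Key, mixed $FromStore): mixed
{
    // A crossed response may be an int, a string, or another key's array.
    if (!is_array($FromStore)
        || !array_key_exists('Key', $FromStore)
        || $FromStore['Key'] !== $Key) {
        return false; // treat as a cache miss
    }
    return $FromStore['Data'];
}

$forA = CacheWrap('user:42', ['name' => 'Alice']);
$forB = CacheWrap('hits', 17);

var_dump(CacheUnwrap('user:42', $forA)); // the tagged array is accepted
var_dump(CacheUnwrap('user:42', $forB)); // bool(false): B's data is rejected
var_dump(CacheUnwrap('user:42', 17));    // bool(false): untagged scalar
```

One side effect of signalling a miss with false: a legitimately cached false value cannot be told apart from a miss, so bare false values are best not cached with this scheme.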
This is exactly the issue we wasted 2 days on. We figured it out on our own a couple hours before we discovered your article 🙂
There is an alternative solution for this. When you create a new Memcached instance, you can specify a persistent connection ID. Somehow the same connection ID is shared between your parent process and all the forked ones if you do not specify it. The solution here would be to simply use new Memcached(getmypid()), or something similar, instead. It will force each forked process to create a new connection to memcached instead of trying to share the existing one.
Hmm, but it seems I was too quick to reply, my apologies. The argument is used to create a persistent connection instead, leading to a whole new world of other issues. Sigh, I guess we have to keep looking then.
I spent days trying to find the cause and either fix it or find a workaround, but couldn’t. This solution stops your application from receiving incorrect data; it doesn’t address the underlying issue. Our application handles cache misses by recreating the data, so this solution makes the system work a little harder than it needs to, but it does stop your application from getting incorrect data and the problems that causes.
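Because a rejected response is just a miss, the calling code needs a fallback that rebuilds the value, which is how the author’s application absorbs the extra misses. A minimal read-through sketch follows; CacheGetOrCreate is a hypothetical helper, and the $get/$add closures stand in for CacheGet and CacheAdd, backed here by a plain PHP array rather than memcache (so the TTL is ignored). It assumes PHP 8.0 or later.

```php
<?php
// Hypothetical read-through helper: on a miss (including a response
// rejected by the key check) rebuild the value and store it again.
function CacheGetOrCreate(string $Key, int $TTL, callable $get, callable $add, callable $create): mixed
{
    $Data = $get($Key);
    if ($Data !== false) {
        return $Data;          // verified hit
    }
    $Data = $create($Key);     // the extra work each miss costs
    $add($Key, $Data, $TTL);
    return $Data;
}

// In-memory stand-in for memcache, for demonstration only.
$store = [];
$get = function (string $Key) use (&$store) {
    return $store[$Key] ?? false;
};
$add = function (string $Key, $Data, int $TTL) use (&$store): void {
    $store[$Key] = $Data;      // TTL ignored by this stand-in
};

$builds = 0;
$create = function (string $Key) use (&$builds) {
    $builds++;
    return strtoupper($Key);
};

$first  = CacheGetOrCreate('greeting', 60, $get, $add, $create);
$second = CacheGetOrCreate('greeting', 60, $get, $add, $create);
echo "$first $second $builds\n"; // GREETING GREETING 1
```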
Let me know if you find something better.
Thanks
We’ve also spent quite a bit of time on this, and in the end we decided to stick with MemcacheD as the connection library (as it is more actively developed) and to use the persistent connection ID.
We only have a couple of daemons that use threading, and we devised an algorithm that limits the number of distinct persistent connections to the maximum number of active threads.
Since our regular web processes don’t use threading, they don’t suffer from this issue. Our setup can easily handle the couple of dozen extra connections for the work we do in threads. So for us this is definitely the way to go.
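The comment does not describe the algorithm itself. One hypothetical way to keep the number of distinct persistent IDs bounded in a forking daemon is to derive the ID from a worker slot number (0 to N-1) instead of from the PID, and to create the Memcached instance only in the child after the fork. persistentIdForSlot and connectForSlot are illustrative names; new Memcached($persistentId), getServerList() and addServer() belong to the memcached extension, and the host and port are placeholders.

```php
<?php
// Hypothetical sketch: the persistent ID comes from a worker slot, so the
// set of distinct IDs never exceeds $maxWorkers.
function persistentIdForSlot(int $slot, int $maxWorkers): string
{
    if ($slot < 0 || $slot >= $maxWorkers) {
        throw new InvalidArgumentException("slot $slot outside 0.." . ($maxWorkers - 1));
    }
    return 'worker-' . $slot;
}

// Call this in the child after pcntl_fork(). As long as the parent never
// opened a connection under the same ID, the child gets its own socket.
// Requires the memcached extension; host and port are placeholders.
function connectForSlot(int $slot, int $maxWorkers): Memcached
{
    $m = new Memcached(persistentIdForSlot($slot, $maxWorkers));
    // A persistent instance keeps its server list; only add servers once.
    if (count($m->getServerList()) === 0) {
        $m->addServer('127.0.0.1', 11211);
    }
    return $m;
}

echo persistentIdForSlot(3, 8), "\n"; // worker-3
```

The getServerList() check matters because a persistent Memcached instance retains its servers; calling addServer() unconditionally would add duplicates every time the instance is re-created in the same process.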
Did you experience this issue when you were forking your own process or in an Apache environment?
This was when PHP was running from the command line. We never forked any Apache processes, so we didn’t see this issue there.