Potential for read of free'd memory on refresh

Jun 26, 2008 at 7:31 PM
I've been evaluating this filter for use and as part of a code review came across a thread safety issue.  I haven't created a firm repro but here's the potential problem:
The variable config is a global variable for the filter.  During handling of the request thread, there are multiple accesses to the config variable (and the linked list of rules it contains) that assume the same instance of config is still valid.  The config variable is swapped out in a critical section, but the rest of the uses aren't also guaranteed by that critical section.  There are two scenarios I see that are problematic. 
1) if the MaxMatchCount may change, which I haven't determined if it is problematic
2) A request thread may still be evaluating the linked list of rules when FreeRuleList is invoked by the refreshing thread in FileChangeWatcher.  The list is not nulled, so the request thread will continue processing rules until either it completes or the memory holding those rules changes, at which point you now have a wild read of memory.

I assume this should be made into an issue, but wanted to get confirmation first from someone more familiar with the code.
My suggested fix would be to make config auto ref-counted and have the DoRewrites or somewhere higher increment the count when it enters / exits.  Then have all the accesses use a config parameter instead of the global config.  This then guarantees that the request thread will see the same instance of config throughout.

P.S.  Depending on the project schedule, I may be able to submit a patch for this later.
Jun 26, 2008 at 9:46 PM

Sven, it's an issue.

The code has been re-written for the v2.0 release so that there is not a single config variable.

Have you looked at the source for the 2.0 release?  (it is not "checked in", as I do not believe there is a possibility to have separate trees).

Jun 27, 2008 at 12:30 AM
Thanks for the quick reply.  I was looking at the source associated with the stable 1.2.14 release.  I'm now reviewing the 1.2.15a (changeSetId=33939)  and 2.0a source (uploaded jun 14).  The problem still exists, although with 2.0a it is probably more readily reproducable in a multithreaded environment now since a request thread won't switch its config when a refresh happens. 
In 2.0a:
EvaluateRules - line 2650 - A reference to the cfg comes from the IirfRequestContext now so the reference will persist for the life of IIS request, if I read these changes right.  This reference is the same reference that could be freed by GetSiteConfig - line 693 when the next request thread starts processing.  If you have a request thread active in the filter when a refresh happens, this is a problem.

Jul 1, 2008 at 8:00 AM
Edited Jul 2, 2008 at 6:40 AM

Yes you are correct.

I updated the v2.0a source code...to do reference counting of the site config data (something close to an auto ptr in C++), combined with expiration logic, tied to the last-mod-time of the ini file.

Each worker thread that services a request, gets a pointer to the site config data, and increments the reference count on it.  When grabbing the config data, the thread checks the filesystem.  If the config file has been updated, then new config is read in, the old config is marked "Expired".  The new config is spliced into the global config list (proected by a lock).

At the end of the request, the thread decrements the ref count. 

If the ref count goes to zero AND the site data has been marked expired, then the config data is freed.

have a look at the v2.0a code.  Remember, it is not "checked in" yet. you can only get it in the zip file on the releases tab.

Aug 20, 2008 at 10:40 PM
I think that there is still a gotcha with the lock code (2.0a 7-29 build). 

The locks in use for increment vs decrement are different locks. 
In ReleaseOrExpireSiteConfig, its holding the IirfConfig->pCS lock to decrement the counter.
In GetSiteConfig, its holding the gcsSiteConfig when it increments the counter.
This means that the RefCount is going to be invalid in a multithreaded environment since ++ and -- aren't atomic.
Instead of trying to lock, perhaps InterlockIncrement and InterlockedDecrement may be better alternatives.  While its in the gSiteConfigList, it has a refcount of 1 or more (1 + the requests holding it.  GetSiteConfig InterlockIncrements the refcount.  If it replaces a file, it does an additional increment on the new cfg, and call ReleaseOrExpire on the old cfg, after it is removed from the list.  ReleaseOrExpire would InterlockDecrement and when it reaches 0 is free to the object.

I actually think there is a way to optimize the critical section block in GetSiteConfig, but I need to think a bit more on it before making a recommendation
Aug 21, 2008 at 1:24 AM
Here's my recommendation on the critical section block in GetSiteConfig.  The problem is that the current setup is going to be a hotspot by combining the file IO and a global critical section.  Each request ends up doing at least one File IO in checking the timestamp, which means spending a measurable amount of time in a serialized section of code.  Your comment about a R/W lock is probably also feasible.

Proposed Changes:
Add a "IirfConfig* NewConfig;" element to IirfConfig to represent the config that replaces this one (a replacement for Expired, except that it is a pointer)

    current = gSiteConfigList;
    // see if we can find a match in the stack:
    while (current != NULL) {
        if (strcmp(current->ApplMdPath, ApplMdPath)==0) {
    if(current == NULL)
    { // this IO needs to be done inside CS because otherwise we may add the same config multiple times.
        current= ReadNewSiteConfig(ApplMdPath, ApplPhysicalPath);
        current->RefCount = 0;
        current->Era = 0;
        current->NewConfig = NULL:

        // adjust the list
        current->Next = gSiteConfigList;
        gSiteConfigList = current;
    if(IsIniFileUpdated(current)) {
        if(current->NewConfig != NULL) {
            current = current->NewConfig;
        else {
            LogMessage( current, 4, "GetSiteConfig: Obtain  site '%s' , Ini file has been updated.",
            IirfSiteConfig * newCfg = ReadSiteConfig(current->IniFileName, current->ApplMdPath, current);
            newCfg->RefCount = 1; 
            current->NewConfig = newCfg;

            // for tracing purposes only. Era shows how many times the ini has been updated.
            newCfg->Era = current->Era+1;

            // adjust the list
            // add newCfg to the head of the list.  Saves having to worry about updates happening simultaneously to multiple nodes in the list.
            previous->Next = current->Next; // remove current
            newCfg->Next = gSiteConfigList; // prepare newCfg to be head
            gSiteConfigList = newCfg; // newCfg is head

            current = newCfg;
    else {

    LogMessage( current, 4, "GetSiteConfig: Obtain  site '%s' (era=%d) (rc=%d) (Expired=%d) (ptr=0x%08X)...",

    return current;
The primary change is that the gcsSiteConfig is only held for iterating the list and reading on first rule load.  Refreshs hold the current config instances lock to guarantee uniqueness, and use the correct lock on incrementing the refcount (note that the refcount could still benefit from my suggestions in the previous comment.  It also removes the need for the Expired flag.  Note that you also have some uninitialized values on IirfConfig (Era and RefCount), which can lead to incorrect values, when doing a refresh.  The initialization of IirfConfig should probably be put entirely inside ReadSiteConfig to make sure they start at good values.

Aug 21, 2008 at 6:27 AM
Edited Aug 21, 2008 at 7:06 AM
I really appreciate your time and effort on this.

I see what you are saying.  I have to look again at the use of different locks for increment  and decrement (gcsSiteConfig versus the config-specific critical section).

The logic you suggested - I see how it removes some of the IO from inside the gcsSiteConfig critical section.  BUT, I think that calling ExpireSiteConfig() outsite a critical section leads to a race condition.  One thread can be waiting on, or even inside the current->pCS critical section, while the other thread calls ExpireSiteConfig() on the same config, which frees that structure. 

Obviously that's no good.
This is what I was trying to prevent by calling ExpireSiteConfig() only within the gcsSiteConfig critical section. 
I have to look at it again.

I still think the R/W lock is an optimization for later.

I like your idea of using the NewConfig ptr instead of an Expired flag and I also like the idea of putting the new config's at the head of the list.  I have to see if I can use these.

On the uninitialized values  - I must have caught that previously because they are initialized in the code I have, in a routine NewSiteConfig() which allocates and initializes an IirfConfig. 
Aug 21, 2008 at 6:14 PM
Oops, I missed NewSiteConfig.  Yeah that takes care of the initialization.  My bad.

I think you are right.  There is a race condition in what I posted with the ExpireSiteConfig.