Skip to content

Mac: Prevent double hydration and directory expansion#334

Merged
wilbaker merged 3 commits intomicrosoft:masterfrom
wilbaker:mac_lock_on_expansion
Oct 4, 2018
Merged

Mac: Prevent double hydration and directory expansion#334
wilbaker merged 3 commits intomicrosoft:masterfrom
wilbaker:mac_lock_on_expansion

Conversation

@wilbaker
Copy link
Member

@wilbaker wilbaker commented Oct 3, 2018

Fixes #315

There were a few issues in the ProjFS for Mac user-mode library:

  • All messages from the kernel would be coalesced into a single callback to the provider, meaning it was possible that notifications could get lost.

  • The path comparisons when deciding if messages should be collapsed were case sensitive.

  • When messages were collapsed, the library was leaking the memory allocated for any of the collapsed messages.

  • The hydrate\enumerate code was not checking if the file\directory had already been expanded. As a result, if two requests came in sequentially, it's possible that with the right timing they would both run and problems would occur when trying to expand the file\directory a second time.

This PR addresses these issues with the following changes:

  • Messages are no longer collapsed

  • There is a new per-path collection of mutexes to ensure that two threads don't expand the same path at the same time.

  • The hydration\enumeration handles check if the path is expanded prior to asking the provider to expand it.

@wilbaker wilbaker added type: bug domain: user-mode type: test-reliability Issues that contribute to test failures labels Oct 3, 2018
@wilbaker wilbaker self-assigned this Oct 3, 2018
@wilbaker wilbaker removed the type: bug label Oct 3, 2018
@wilbaker wilbaker requested review from nickgra, pmj and sanoursa October 3, 2018 23:07
};

// Map of relative path -> MutexAndUseCount for that path, plus mutex to protect the map itself.
typedef map<string, MutexAndUseCount, CaseInsensitiveStringCompare> PathToMutexMap;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmj do you know if we need to worry about being case insensitive? Or will the kext always use the same\actual case when sending paths to user-mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @sanoursa as well

Copy link
Contributor

@pmj pmj Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure; the path is ultimately coming from vn_getpath() and I believe it gets its information from the name cache. The question is therefore if the name cache is populated with canonical names or the component names that came in from user space. I'll need to check that.

One correct way to check whether 2 paths refer to the same file would be to stat() them both and check if fsid & inodes match. The kext should already have this information, so perhaps we should be sending that tuple along with each kernel -> user message and using that as our map key?

Given that macOS does support case-sensitive file systems, case-insensitive string comparison is not really ideal here. I mean, we can of course declare that we won't support VFS for Git on case-sensitive file systems, but I'd still be worried about edge cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, the inode approach also tackles any hardlink issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One correct way to check whether 2 paths refer to the same file would be to stat() them both and check if fsid & inodes match. The kext should already have this information, so perhaps we should be sending that tuple along with each kernel -> user message and using that as our map key?

That's a good idea, I'll see about plumbing that through.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if fsid & inodes match

@pmj Totally unrelated to this PR, your comment reminded me of something I'd been thinking about related to how the kext could handle a failures of vn_getpath (or perhaps it wouldn't even call it at all).

Do you have any thoughts an approach where the kext passes fsid & inode and then the user-mode library calls fsgetpath? I wasn't sure if that would potentially be more reliable than vn_getpath.

};

// Map of relative path -> MutexAndUseCount for that path, plus mutex to protect the map itself.
typedef map<string, MutexAndUseCount, CaseInsensitiveStringCompare> PathToMutexMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, the inode approach also tackles any hardlink issues.

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change. Most of the suggestions seem like they could go in as future PRs, since this already fixes a very serious bug.

}
else
{
iter->second.useCount++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, this could happen outside of the global lock (and use interlocked operations instead). How much do you think that would reduce contention during a parallel build that is doing lots of hydrations? Is it worth the trouble?

Copy link
Member Author

@wilbaker wilbaker Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, this could happen outside of the global lock (and use interlocked operations instead).

I think there is a scenario where that approach would not be thread safe:

Thread 1: Calls CheckoutPathMutex and a new mutex is added to the global collection
Thread 2: Calls CheckoutPathMutex and releases the global lock before incrementing an atomic useCount
Thread 1: Calls ReturnPathMutex, useCount drops to 0 and so the mutex is removed from the global collection
Thread 3: Calls CheckoutPathMutex, a new mutex is created for the same path

Thread 2 and 3 would now be holding two different mutexes for the same path.

I think we can leave this as-is for now, and if we identify that it's causing perf problems we can look into improvements\alternatives.

Copy link
Member

@nickgra nickgra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I agree with Saeed that we should get this in before considering moving to fsid & inode matching rather than blocking on it.

@wilbaker
Copy link
Member Author

wilbaker commented Oct 4, 2018

LGTM, I agree with Saeed that we should get this in before considering moving to fsid & inode matching rather than blocking on it.

@sanoursa @nickgra, I just pushed a new commit that updates the map key to be fsid+inode, could you take another look?

Copy link
Member

@nickgra nickgra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, still looks good to me :)

@wilbaker wilbaker merged commit 8b27e06 into microsoft:master Oct 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: kext domain: user-mode type: test-reliability Issues that contribute to test failures

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants