Skip to content

Mac: Support projection changes that add new files and delete folder#239

Merged
wilbaker merged 18 commits intomicrosoft:masterfrom
wilbaker:mac_project_new_files
Sep 11, 2018
Merged

Mac: Support projection changes that add new files and delete folder#239
wilbaker merged 18 commits intomicrosoft:masterfrom
wilbaker:mac_project_new_files

Conversation

@wilbaker
Copy link
Member

@wilbaker wilbaker commented Sep 4, 2018

Changes for #231 and #219

Summary of Changes

  • Updated PrjFS_DeleteFile so that it can delete files and folders on Mac (Mac: PrjFSLib should provide a way to delete directory placeholders #231)
  • Updated VFSForGit to track expanded folders
  • Updated the projection change code so that after updating file placeholders VFSForGit will re-expand folders that were previously expanded. VFSForGit does this by checking the contents of each expanded folder in the the projection and ensuring that all of the that folder's children (files and folders) are on disk.

Folder re-expansion logic

  1. GVFS records what folders have been expanded
  2. When a projection change occurs VFSForGit looks up the expanded folders in the projection
  3. If there are any children in the projection for the folder it means that their skip-worktree bit is on, and if the skip-worktree bit is on and the file\folder is not in the placeholder list VFSForGit needs to write it out.

TODOs

  • Address new TODOs that have been added in the code
  • Enable additional functional tests
 Test Run Summary
   Overall result: Warning
   Test Count: 330, Passed: 326, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 4
     Skipped Tests - Ignored: 4, Explicit: 0, Other: 0
   Start time: 2018-09-06 19:21:38Z
     End time: 2018-09-06 19:37:06Z
     Duration: 928.064 seconds

@wilbaker wilbaker removed the WIP label Sep 6, 2018
@wilbaker wilbaker force-pushed the mac_project_new_files branch from 22fefa5 to 4674617 Compare September 6, 2018 19:58
Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't totally digested the GitIndexProjection changes yet. Going to test this out for a bit, then come back to that code.

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this a bunch and it seems to work well! And the code is complex but makes sense. Nice work on this change.


public static IEnumerable<T> ShouldMatchInOrder<T>(this IEnumerable<T> group, params T[] expectedValues)
{
return group.ShouldMatchInOrder(expectedValues, (t1, t2) => t1.Equals(t2));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be calling the method that takes IEnumerable<T>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I had trouble doing that before, but I think I forgot to cast expectedValues to an IEnumerable

{
ConcurrentHashSet<string> folderPlaceholdersToKeep = new ConcurrentHashSet<string>();
ConcurrentBag<PlaceholderListDatabase.PlaceholderData> updatedPlaceholderList = new ConcurrentBag<PlaceholderListDatabase.PlaceholderData>();
ConcurrentDictionary<string, PlaceholderListDatabase.PlaceholderData> updatedPlaceholderList = new ConcurrentDictionary<string, PlaceholderListDatabase.PlaceholderData>(StringComparer.Ordinal);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a list anymore. Also how much overhead does adding the string key add? Should this be a ConcurrentDictionary<string, string> and we build the PlaceholderListDatabase.PlaceholderData when calling WriteAllEntriesAndFlush?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how much overhead does adding the string key add?

Let me measure that. If it's significant we might need to further split this code out between Mac and Windows (because we do need the dictionary on Mac).

Should this be a ConcurrentDictionary<string, string> and we build the PlaceholderListDatabase.PlaceholderData when calling WriteAllEntriesAndFlush?

That's a good idea, let me give that a try before making any perf measurements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a ConcurrentDictionary<string, string> and we build the PlaceholderListDatabase.PlaceholderData when calling WriteAllEntriesAndFlush

I'm actually going to start with the perf overhead (starting that testing this morning) and see how things look before and after this change. We're using the same string instances in both the ConcurrentDictionary and the PlaceholderData and so the memory overhead is just an extra reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a ConcurrentDictionary<string, string> and we build the PlaceholderListDatabase.PlaceholderData when calling WriteAllEntriesAndFlush?

Also, to confirm, what was the main concern here? Perf or memory usage?

// so that we don't try to delete a folder placeholder that has file placeholders and just fails
IEnumerable<PlaceholderListDatabase.PlaceholderData> folderPlaceholderData = placeholderListCopy.Where(x => x.IsFolder);

HashSet<string> folderPlaceholders = new HashSet<string>(folderPlaceholderData.Select(x => x.Path), StringComparer.OrdinalIgnoreCase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

placeholderListCopy is now being enumerated twice. Here and the line below. What is the performance impact when there are say a million placeholders?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me measure that as well.

One question I had looking at this code, do you remember why the folder placeholder list is ordered by descending before calling TryRemoveFolderPlaceholder?

I assumed it was being done so that child folders would be removed before their parents, does that sound right to you?

is now being enumerated twice. Here and the line below

Another option I'll look into is removing this call (higher up in the method):

placeholderListCopy.Where(x => !x.IsFolder).ToList(),

And replacing it with code that will walk placeholderListCopy to build both a file list and the HashSet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to make sure all the children are removed first so that we can remove the folder when it is empty since we are only removing individual folders and are depending on getting the directory not empty error if we can't delete because of existing files or folders.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests that fail if we drop that ordering? This is a very subtle detail that is easy to lose track of over time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests that fail if we drop that ordering? This is a very subtle detail that is easy to lose track of over time.

I'll try taking that out and see if any tests fail. The tricky part is even if the ordering were to be removed, I'm not sure that we can come up with a test that would be guaranteed to catch it (as whatever random ordering VFSForGit would use in the future might just happen to work with the test).

@wilbaker
Copy link
Member Author

@sanoursa @kewillford

Here are the initial numbers comparing the total time of UpdatePlaceholders:

Results Baseline With Changes
Average 4056ms 4735ms
Min 3436ms 2985ms
Max 4619ms 5767ms
StdDev 354ms 866ms
Count 11 11

We are noticabley worse with the changes, and so I'll keep digging into where the time is going.

Details

  • The time reported is the time of the full UpdatePlaceholders procedure
  • File placeholders: 404,130
  • Folder placeholders: 60,612
  • Ran git checkout to move between two branches with the same commit (and so no placeholder would need to be updated on disk)

@wilbaker
Copy link
Member Author

wilbaker commented Sep 10, 2018

@sanoursa @kewillford I just pushed a commit that goes back to using ConcurrentBag on Windows:

Results Commit cdf91d6
Average 4019ms
Min 2944ms
Max 4935ms
StdDev 638ms
Count 11

@@ -1374,7 +1554,7 @@ private void ProcessGvUpdateDeletePlaceholderResult(
PlaceholderListDatabase.PlaceholderData placeholder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but the method name ProcessGvUpdateDeletePlaceholderResult needs to change at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants