Skip to content

Remove temp paths from ModifiedPathsDatabase#207

Merged
kewillford merged 14 commits intomicrosoft:masterfrom
kewillford:cleanup_modifiedpaths
Sep 26, 2018
Merged

Remove temp paths from ModifiedPathsDatabase#207
kewillford merged 14 commits intomicrosoft:masterfrom
kewillford:cleanup_modifiedpaths

Conversation

@kewillford
Copy link
Member

@kewillford kewillford commented Aug 23, 2018

Builds and other processes will write temp files or create temp directories in the source directory which then adds the file or folder path to the ModifiedPathsDatabase. This causes performance issues because the ModifiedPathsDatabase is currently append only.

This PR is an initial change to address the issue of files or folder being created and deleted. This change only removes from the ModifiedPathsDatabase when the file or folder was created and then delete and the index has not changed in between to solve the issue with builds and other processes creating and deleting a bunch of files and folders.

See #117

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass through the code. Haven't had a chance to test it yet, hoping to do so by EOD.

@sanoursa
Copy link
Contributor

Any idea why the Mac functional tests failed?

case FileSystemTask.OperationType.OnFileDeleted:
metadata.Add("virtualPath", gitUpdate.VirtualPath);
result = this.AddModifiedPathAndRemoveFromPlaceholderList(gitUpdate.VirtualPath);
if (this.newPaths.Contains(gitUpdate.VirtualPath))
Copy link
Member

@wilbaker wilbaker Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two related comments:

  1. Currently OnFileDeleted is scheduled from the PreDelete callback on Mac. However, on Mac we can't remove files from ModifiedPaths in PreDelete because we don't actually know that the file has been deleted.

  2. Related to 1 (and something @jrbriggs and I were just discussing), should we try to design this to work on Mac and Windows from the start? Or at this point should we just ensure what whatever we do for Windows does not break Mac?

For this specific feature it might only take a few tweaks to get things working on both platforms (e.g. on Mac we could take an extra step of checking if the file actually was deleted before calling TryRemoveModifiedPath). Thoughts?

CC: @sanoursa for his thoughts as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, on Mac we can't remove files from ModifiedPaths in PreDelete because we don't actually know that the file has been deleted.

The bare minimum change for this PR could be to wrap the new code here in an IsUnderConstruction check.

There's still the open question though of whether we design\implement for both platforms from the start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to start thinking about how to make all new features cross-platform from the start. Unless we're up against a part of the Mac design that is still unknown or in heavy flux, I would take the time now to make this work for both.

@wilbaker
Copy link
Member

@sanoursa I filed #211 for the Mac functional test failures related to mapping.dat in the .gvfsCache folder

@kewillford kewillford force-pushed the cleanup_modifiedpaths branch 2 times, most recently from 241eea9 to 259e056 Compare August 30, 2018 18:46
Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments so far, I still need to finish reviewing FileSystemCallbacks.cs

{
string tempFolder = this.CreateDirectory(fileSystem, "Temp");
string tempFile1 = this.CreateFile(fileSystem, "Temp\\temp1.txt");
string tempFile2 = this.CreateFile(fileSystem, "Temp\\temp2.txt");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Path.Combine rather than hardcoding \\ so that this will work on Mac as well

@kewillford kewillford force-pushed the cleanup_modifiedpaths branch 2 times, most recently from b1b3d0e to bc005ee Compare September 5, 2018 20:37
metadata.Add("Area", "ModifiedPathsDatabase");
metadata.Add(nameof(entry), entry);
metadata.Add(nameof(isFolder), isFolder);
metadata.Add("Exception", e.ToString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think of adding a null check before calling e.ToString()? Most of our other CreateEventMetadata helper methods handle null.

Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good, a few outstanding questions on the folder rename scenario.

if (!this.newlyCreatedFileAndFolderPaths.Contains(itemVirtualPath))
{
relativeFolderPaths.Enqueue(itemVirtualPath);
this.newlyCreatedFileAndFolderPaths.Add(itemVirtualPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like before we were adding itemVirtualPath to ModifiedPaths.dat as well (old line 688):

result = this.TryAddModifiedPath(itemVirtualPath, isFolder: false);

Do we still need to do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are adding the top level folder result = this.TryAddModifiedPath(gitUpdate.VirtualPath, isFolder: true); so we don't need to add all the subfolder and files. This is kind of left over from when we would add the top level folder to that sparse-checkout and then need to add files to the always_exclude. Since this is just one list now we only need the top level folder and git will handle it correctly for both the sparse and exclude cases.

string itemVirtualPath = Path.Combine(folderPath, itemInfo.Name);
if (itemInfo.IsDirectory)
string oldItemVirtualPath = gitUpdate.OldVirtualPath + itemVirtualPath.Substring(gitUpdate.VirtualPath.Length);
if (!this.newlyCreatedFileAndFolderPaths.Contains(itemVirtualPath))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to call Contains first, we can just call this.newlyCreatedFileAndFolderPaths.Add(itemVirtualPath); because ConcurrentHashSet.Add is really TryAdd:

        public bool Add(T entry)
        {
            return this.dictionary.TryAdd(entry, true);
        }

@kewillford kewillford force-pushed the cleanup_modifiedpaths branch 2 times, most recently from 59d5580 to 39f9448 Compare September 12, 2018 15:27
if (result == FileSystemTaskResult.Success)
{
this.newlyCreatedFileAndFolderPaths.Add(gitUpdate.VirtualPath);
this.TryRemoveModifiedPath(gitUpdate.OldVirtualPath, isFolder: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check the return value of TryRemoveModifiedPath here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was up in the air about that because on the one hand there won't be any issues if we fail to remove the path except the modified paths doesn't get as small. On the other we could retry when needed and keep the modified paths small but I'm not sure we should kill the mount process if we get a FatalError. What are you thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there's minimal harm to this failing, but I'm also thinking that if there are failures it could indicate some larger issue with the state of the file system (that I'm not sure we should ignore).

I lean toward checking the value (and killing the process if we see a FatalError). @sanoursa, what are your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I don't think we should ever ignore a FatalError. That said, why is TryRemoveModifiedPath ever returning FatalErrors if we don't intend to shut down as a result? It feels like that method shouldn't even be a Try method - if we focus it down to tmp file scenarios, it can just be a best effort and not even return a result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still trying to write to the modified paths database file and the pattern is to retry if there was and IOException and return Fatal if there is some other exception. To keep with the pattern I think we should shut down because we don't know what the state of the modified paths database file is in if we get a fatal error writing to it.

{
if (!this.newlyCreatedFileAndFolderPaths.Contains(virtualPath))
{
return FileSystemTaskResult.Invalid;
Copy link
Member

@wilbaker wilbaker Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this now, but I believe that FileSystemTaskResult.Invalid is never supposed to be used in code (i.e. it's an invalid value for FileSystemTaskResult).

Given the choice I think it would be cleaner to move this check back out of TryRemoveModifiedPath and that way the return value of TryRemoveModifiedPath reflects only whether we succeeded\failed when trying to update ModifiedPaths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that William commented on the Invalid value too, and I agree, from reading this code it seems like this check maybe just doesn't belong in here, at least as currently written.

I would probably change the usage of this method to something like:

    if (this.IsTempFile(virtualPath))
    {
        result = this.TryRemoveModifiedPath(virtualPath, isFolder);
    }
    else
    {
        result = this.TryAddModifiedPath(virtualPath, isFolder);
    }

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good overall, just needs some cleanup around the pattern for actually removing the entries.

{
if (!this.newlyCreatedFileAndFolderPaths.Contains(virtualPath))
{
return FileSystemTaskResult.Invalid;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that William commented on the Invalid value too, and I agree, from reading this code it seems like this check maybe just doesn't belong in here, at least as currently written.

I would probably change the usage of this method to something like:

    if (this.IsTempFile(virtualPath))
    {
        result = this.TryRemoveModifiedPath(virtualPath, isFolder);
    }
    else
    {
        result = this.TryAddModifiedPath(virtualPath, isFolder);
    }

if (result == FileSystemTaskResult.Success)
{
this.newlyCreatedFileAndFolderPaths.Add(gitUpdate.VirtualPath);
this.TryRemoveModifiedPath(gitUpdate.OldVirtualPath, isFolder: true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I don't think we should ever ignore a FatalError. That said, why is TryRemoveModifiedPath ever returning FatalErrors if we don't intend to shut down as a result? It feels like that method shouldn't even be a Try method - if we focus it down to tmp file scenarios, it can just be a best effort and not even return a result.

@kewillford kewillford force-pushed the cleanup_modifiedpaths branch 2 times, most recently from 7618ea2 to a6ba326 Compare September 20, 2018 14:46
Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. Just some perf questions, but feel free to ignore if that makes sense.

return false;
}

return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line since the very next line is also the same :-)

return false;
}

return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, remove redundant line


public virtual void OnFileRenamed(string oldRelativePath, string newRelativePath)
{
this.newlyCreatedFileAndFolderPaths.Add(newRelativePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this? How often does someone rename into a temp path? If we don't expect benefit from this, I would remove it so that we're not adding unnecessary contention on the concurrent data structure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be so that if something deleted the renamed file we would remove it from the modified paths. Not sure how often something would rename, then delete though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I get that that's the scenario it's supporting. What I don't know is how often that actually happens, but my guess is it's pretty unlikely. We know that we're adding lock contention by including this line, so we should only do that if we can also measure an actual benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this for now because we aren't measuring how often this is happening so we can't measure any benefit if we don't know what or if there are current difficulties that are from this.

@kewillford kewillford force-pushed the cleanup_modifiedpaths branch from 72472d8 to 8b16562 Compare September 21, 2018 19:42
Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest commit looks good, just a minor suggestion on adding some comment(s)

@kewillford kewillford force-pushed the cleanup_modifiedpaths branch from 19f3acd to 243f621 Compare September 25, 2018 23:32
@kewillford kewillford merged commit 08ee71a into microsoft:master Sep 26, 2018
@kewillford kewillford deleted the cleanup_modifiedpaths branch September 26, 2018 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants