Added a multithreaded hydration functional test#212
Added a multithreaded hydration functional test#212sanoursa merged 3 commits intomicrosoft:masterfrom
Conversation
…attempting to concurrently read a previously unhydrated file from several threads
|
On my first run I hit the failure: I'll run some more iterations tomorrow. |
|
On my tiny Mac Mini, I have reproed the failure 5 times in a row. I'll try on my Mac Pro and the iMac Pro tomorrow. Code LGTM though. |
| } | ||
| catch (Exception e) | ||
| { | ||
| readException = e; |
There was a problem hiding this comment.
Does the number of failed reads provide any interesting signal?
There was a problem hiding this comment.
I wondered about that too, but I'm not sure that it would. The number needs to be 0, always. And if it's ever non-zero, it signals that there's at least one race, somewhere, but gives no more signal than that. So I think this is a binary test, unless you can think of more we can get out of it?
There was a problem hiding this comment.
So I think this is a binary test, unless you can think of more we can get out of it?
@sanoursa, have you checked if the thread(s) that are failing are all failing with the same exception?
There was a problem hiding this comment.
Every time I've seen it fail, it has been one of the two failures that you and Nick reported. One of the directories or the file appears missing.
There was a problem hiding this comment.
There's no expectation that they will all fail with the same exception. What's happening is that they are all racing on creating the placeholder directories, expanding those directories, then creating the placeholder file, and finally hydrating it. Since we currently have races in our enumeration of directories, one of the directories or the leaf file will sometimes appear to be missing.
That said, I can definitely update it to print out every exception we get, if you think it'll add more diagnostic value.
There was a problem hiding this comment.
That said, I can definitely update it to print out every exception we get, if you think it'll add more diagnostic value.
I think we can start with just printing one. I agree the most important\interesting part of this test is simply whether or not it succeeds.
| threads[i].Join(); | ||
| } | ||
|
|
||
| readException.ShouldBeNull("At least one of the reads failed"); |
There was a problem hiding this comment.
Can you print the Exception callstack here as well?
There was a problem hiding this comment.
It already is - see your test output up above :-)
There was a problem hiding this comment.
Should have checked the output first :) You're right I can see the exception, thanks!
| } | ||
| catch (Exception e) | ||
| { | ||
| readException = e; |
There was a problem hiding this comment.
So I think this is a binary test, unless you can think of more we can get out of it?
@sanoursa, have you checked if the thread(s) that are failing are all failing with the same exception?
|
Confirmed that the Mac build agent also fails on this test: |
| Thread[] threads = new Thread[32]; | ||
| for (int i = 0; i < threads.Length; ++i) | ||
| { | ||
| int myIndex = i; |
There was a problem hiding this comment.
No need for this index, it's not being used anywhere
| { | ||
| try | ||
| { | ||
| FileSystemRunner.DefaultRunner.ReadAllText(virtualPath).ShouldBeNonEmpty(); |
There was a problem hiding this comment.
I think there would be value in using all of our runners here, since they may have slightly different timing or system calls that trigger hydration in different ways.
There was a problem hiding this comment.
And that implies a need to make this class create an enlistment per test case, since the test is guaranteed to pass the second time around, regardless of races
There was a problem hiding this comment.
Interestingly enough, I can't repro the failures with the BashRunner, so I'm not going to add the overhead of a new enlistment per test case for now. It must be that having to spawn a new bash process per request slows things down enough to avoid hitting these races, because the SystemIORunner fails completely reliably and the BashRunner passes completely reliably, on my MacBook Pro.
| [TestCase] | ||
| public void CanReadUnhydratedFileInParallelWithoutTearing() | ||
| [TestCase, Order(1)] | ||
| [Category(Categories.Windows)] |
There was a problem hiding this comment.
Marked as Windows-only for now. I'm rerunning the tests on both platforms now, and will merge in once those pass. I'll then remove this category in my other PR for reproing and prototyping fixes for the races.
There was a previously existing functional test, but it would hydrate a file on a single thread, and then attempt to repeatedly read from it from 4 parallel threads. That test was ensuring that it's possible to read from a hydrated placeholder, but it did not cover the races that we are currently investigating on Mac.
I added a new test that uses 32 threads to concurrently read from a path that was previously completely virtual, all the way to the root of the repo. On my MacBook Pro, this test has failed every single time with 32 threads (and about 80% with 16 threads, 50% with 4 threads). I'm posting the test now to ensure that it is failing on our build servers, and anyone else's machines if you have time to test it out.
Before completing this PR, I will need to either mark the test as Windows-only, or else fix the races on Mac, though I suspect that will take longer.