Skip to content

Tweak Named pipe protocol#259

Merged
jamill merged 3 commits intomicrosoft:masterfrom
jamill:named_pipe_stream_reader
Sep 21, 2018
Merged

Tweak Named pipe protocol#259
jamill merged 3 commits intomicrosoft:masterfrom
jamill:named_pipe_stream_reader

Conversation

@jamill
Copy link
Member

@jamill jamill commented Sep 11, 2018

Tweak the protocol used for communicating over Named Pipes. Named Pipes is the mechanism used for interprocess communication between the git hooks, mount process, service, etc... This is to address problems of transmitting messages that include newline characters (i.e. #142).

Data is currently encoded / transmitted using a TextWriter derived class, which uses newline characters to delineate the end of a line of text. This approach will have trouble transmitting messages that include newline characters as part of the message. Any encoding scheme we choose will need to replicated for both a managed and native implementation.

To address this issue, this PR switches to writing bytes into the named pipe (instead of text into a StreamWriter), and switching the byte value used to delineate the end of a message to 0x3 (ASCII code for End of text). Text will be encoded via UTF-8, as that is the format currently used implicitly via the StreamWriter. This will result in the minimal change, and not require further reactions on the native pipe clients to handle a different encoding.

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is really a new protocol per se, but I do like the idea of picking a terminator character that is less likely to appear in the body of a message. Changing the encoding could also be a reasonable thing to do, as long as it's meaningfully improving reliability and not overly hurting perf.

Debug.Assert(bytesRead != 0, "Not expecting 0 bytes at this time");
}

return Encoding.UTF8.GetString(bytes.ToArray());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the real value add of this overall change is using a different terminator character. What additional protection do we get from using UTF8 encoding? At first glance, it appears that this will just add perf overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points - you are correct - the main value of this change is using a different terminator character. We don't need to use UTF8 (it doesn't add any additional protection) - we just need the sender / receiver to agree on the encoding. There are some reasons I chose UTF8 for this iteration:

  1. GVFS currently sends bytes as UTF8 (implicitly via the StreamReader / StreamWriter classes). Using UTF8 again would be keeping with current behavior, so there shouldn't be any additional overhead.

  2. The native code that reads / writes data from the named pipe treats the data as ASCII (which I understand is a subset of UTF8). If we switched away from UTF8, we would need to do convert the encoding in the native code.

  3. The ModifiedPathsList hook also reads data from the named pipe and directly sends those bytes to git, which works with UTF8 (at least for our current scenarios), but would not work with UTF16 (which is the default representation of strings in C#).

I propose we continue using UTF8 (explicitly now) for this change? We can consider if a different encoding would make more sense...

@jamill
Copy link
Member Author

jamill commented Sep 12, 2018

I don't think this is really a new protocol per se,

Right - this is just a tweak / refinement of the existing behavior. I can tweak the PR description to make the intention more clear.

Changing the encoding could also be a reasonable thing to do, as long as it's meaningfully improving reliability and not overly hurting perf.

I wasn't trying to change the encoding (as you point out - UTF8 encoding doesn't afford extra protection) - but am just being explicit about it now. GVFS currently uses UTF8 (implicitly via the StreamWriter), and the native readers currently expect ASCII / UTF8. I can also include this in the PR description to make this point more clear.

Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions, overall the approach looks good to me

}

[TestCase]
[Category(Categories.MacTODO.M4)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fail on Mac? If so, what's the error (out of curiosity)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tried running this on the Mac - I will test and see what happens (or remove this, if possible)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails on the "cleanup" code (not related to the protocol changes). After "resetting" the repository, there are unexpected changes in the test repository. Because of this (and because the failure is not related to the protocol changes), I will leave this as a to for Mac

#define DLO_RESPONSE_LENGTH 2
// "S\x3" -> Success
// "F\x3" -> Failure
#define DLO_RESPONSE_LENGTH 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed from 2 to 3?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a 2 - I will undo that

@jamill jamill force-pushed the named_pipe_stream_reader branch 2 times, most recently from 77681ee to d226563 Compare September 14, 2018 02:21
@jamill jamill changed the title RFC: Named pipe protocol Tweak Named pipe protocol Sep 14, 2018
@jamill
Copy link
Member Author

jamill commented Sep 17, 2018

This is ready for review now. Thanks for the initial feedback on the approach!

/cc @sanoursa @benpeart @wilbaker

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to hear your thoughts on a length field rather than a terminator, and some other minor cleanup

@jamill jamill force-pushed the named_pipe_stream_reader branch from d226563 to 6278f87 Compare September 18, 2018 21:08
Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimized read here avoids the reallocations, so that looks good to me. Thanks for making those changes.

// this was the end of the message), but the stream has been closed. Throw an exception
// and let upper layer deal with this condition.

throw new IOException("Incomplete message read from stream. Stream closed before message terminating byte.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would throw a BrokenPipeException here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that - I was trying to decide between those two exceptions. Currently, NamedPipeClient and NamedPipeServer both handle IOException , which is why I chose this exception. If I switch to BrokenPipeException, I will also update the handling code in NamedPipeServer to handle BrokenPipeExceptions in the same manner as it currently does for BrokenPipeExceptions.

I can go with either approach - do you have a preference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we were handling IOExceptions before because that's what the built in reader/writer were throwing. I would update this to throw/catch our specific exception type, and remove the handler for IOException unless there's still a valid reason for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - the underlying Stream class can still throw IOException. We could have the NamedPipeStreamReader and NamedPipeStreamWriter class catch and wrap all underlying IOExceptions as BrokenPipeExceptions.

Would you be OK with keeping this as an IOException for now, which would allow us to keep all existing error handling in place? I think it would be easier if we extracted that change into its isolated change, so we can see the update to the exception type (and all handlers) by itself?

/// </summary>
public class NamedPipeStreamReader
{
public const int BufferLength = 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only external code that uses this is unit tests, so please make it private. In my opinion it's not a good idea to share constants between production and test code, because they will just end up agreeing with each other no matter what.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to enforce wat that we test sending messages that are larger than the buffer size. If we completely hide this, then we could update the production code to use a buffer larger than what the test is using.

What if we instead expose an option for callers to control the buffer size. Then the tests could force a smaller buffer size and ensure it is testing the expected functionality...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do either of those things. A unit test should be testing a requirement, not knowledge of an implementation. So the unit test should decide how big of a message is reasonable to send and test that limit, and the implementation is then responsible for handling that.

Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with suggestions

catch (IOException)
catch (IOException ex)
{
this.tracer.RelatedError($"Error reading message from NamedPipe: {ex.Message}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions:

  • Include the full stack trace
  • Create an EventMetadata and add a value for the stack trace


if (bytesRead == 0)
{
// We have read a partial message (the last byte received does not indicate that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log the partial message?

@jamill jamill force-pushed the named_pipe_stream_reader branch 2 times, most recently from 79e1fd2 to 524c434 Compare September 20, 2018 16:50
GVFS currently sends a text based data across a named pipe with lines /
messages separated by newline characters. This prohibits sending
messages that include a newline as part of the message content. The main
current manifestation of this problem is that GVFS does not handle git
command lines that contain a newline character. There is another problem
that GVFS needs to send paths across the named pipe (for
ModifiedPathsList queries), and these paths can include newline
characters on certain filesystems (macOS, Linux).

Additionally, the protocol that GVFS uses to communicate across the
named pipe is currently not consistent. It usually uses a text based
stream, but for the ModifiedPathsList response, it uses null bytes to
seperate entries in a list.

To address these issues, this change tweaks the protocol used to
communicate via the named pipe to use the 0x3 byte to indicate the end
of a line / message (this is the End of text ASCII code).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants