Description
Describe the feature
Currently, @swc/wasm-typescript takes a JS string as input and returns a JS string as output. This can add significant, unnecessary string-transcoding overhead when the user needs the data as UTF-8. It would be nice if swc accepted UTF-8-encoded input and returned UTF-8-encoded output, at least in the form of Uint8Arrays.
In particular, this would be useful for Node.js, which typically first reads source code from disk as UTF-8-encoded buffers. When integrating TypeScript into the compile cache, it also needs to write the transpiled code to disk as UTF-8-encoded data.
Babel plugin or link to the feature description
No response
Additional context
As far as I can tell, swc needs to convert these strings into UTF-8 internally before performing transpilation, so something like the following is very likely to happen:
1. Users read the TypeScript code from disk, where it is typically stored as UTF-8, so the UTF-8 input is first read into a Uint8Array (or a Node.js Buffer, which is a subclass of Uint8Array).
2. Since swc needs a string input, users have to convert that UTF-8 content into a JS string. In V8, this means transcoding it into either Latin-1 (if it fits) or UTF-16 for the underlying storage.
3. AFAICT, swc then converts that JS string back into UTF-8 data in a Uint8Array and passes it into the Rust layer, where it becomes a UTF-8 Rust string. This code is generated by wasm-bindgen and uses a TextEncoder.
4. After transpilation, the result is converted from a UTF-8 Rust string into a Uint8Array and then into a JS string, again by wasm-bindgen-generated code, this time using a TextDecoder.
5. Users need to convert the JS string returned by swc into UTF-8 data in a Uint8Array once more before writing the result to disk as UTF-8.
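A minimal TypeScript sketch of the five steps above, using the standard `TextEncoder`/`TextDecoder` globals. `transformString` and `rustTranspile` are hypothetical stand-ins: the first only roughly approximates what wasm-bindgen glue does for a string-in/string-out export (the real glue writes into wasm linear memory), and the second is an identity placeholder for the transpiler. The counter tallies the full-buffer transcoding passes that happen on the JS side.

```typescript
// Counts full-buffer transcoding passes (steps 2-5).
let transcodes = 0;

const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Hypothetical placeholder for the Rust transpiler: UTF-8 bytes in,
// UTF-8 bytes out. It simply returns a copy of its input.
function rustTranspile(utf8: Uint8Array): Uint8Array {
  return utf8.slice();
}

// Hypothetical approximation of wasm-bindgen glue for a
// `fn(String) -> String` export.
function transformString(source: string): string {
  const input = encoder.encode(source); // step 3: JS string -> UTF-8 (TextEncoder)
  transcodes++;
  const output = rustTranspile(input);
  const result = decoder.decode(output); // step 4: UTF-8 -> JS string (TextDecoder)
  transcodes++;
  return result;
}

// Step 1: UTF-8 bytes as read from disk (stand-in for fs.readFileSync(path)).
const fileBytes = encoder.encode("const x: number = 1;\n");

// Step 2: UTF-8 bytes -> JS string, only so the string API can be called.
const sourceText = decoder.decode(fileBytes);
transcodes++;

const transpiled = transformString(sourceText);

// Step 5: JS string -> UTF-8 bytes again before writing to disk.
const outBytes = encoder.encode(transpiled);
transcodes++;

console.log(transcodes); // 4
```

Every one of those four passes walks the entire source text, even though the bytes start and end as UTF-8.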
If swc supported UTF-8 input/output in Uint8Arrays, steps 2-5 could be skipped whenever users don't need the input/output as JS strings for additional manipulation. Even if they do, they could still skip steps 3-4 by keeping the Uint8Arrays with the UTF-8 data on the side.
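For comparison, a sketch of how a bytes-in/bytes-out API might be used. `transformUtf8` is a hypothetical name chosen for illustration, not an existing @swc/wasm-typescript export, and its body is only an identity placeholder for the transpiler. The point is where the transcoding passes disappear: the bytes read from disk go straight in, and the output bytes could be written out directly. A caller that also wants strings decodes only for its own use.

```typescript
// Hypothetical bytes-in/bytes-out entry point (illustrative name only).
function transformUtf8(input: Uint8Array): Uint8Array {
  // Placeholder for the Rust transpiler, which could read the UTF-8 bytes
  // directly from wasm memory and write UTF-8 output back, with no
  // TextEncoder/TextDecoder pass in the JS glue. Here it just copies.
  return input.slice();
}

// Step 1: UTF-8 bytes as read from disk (stand-in for fs.readFileSync(path)).
const sourceBytes: Uint8Array = new TextEncoder().encode("const y: number = 2;\n");

// Steps 2-5 skipped: bytes in, bytes out. The result could be handed
// directly to fs.writeFileSync(cachePath, resultBytes).
const resultBytes: Uint8Array = transformUtf8(sourceBytes);

// A caller that needs strings decodes for its own manipulation, but still
// passes the original bytes to the transpiler, so steps 3-4 are avoided.
const sourceTextForInspection: string = new TextDecoder().decode(sourceBytes);
const resultText: string = new TextDecoder().decode(resultBytes);

console.log(sourceTextForInspection.length, resultBytes.length);
```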