Motivation - Usability and Reliability:
- The current writer is a ref struct, which requires passing by ref.
- The use of array pool could result in reliability concerns if misused.
- Async writes and state passing could be problematic.
- No IBufferWriter implementation built in.
Goals:
- Continue to support IBufferWriter (i.e. PipeWriter) directly
- Keep the ability for the user to control buffering, with the ability to avoid data copies.
Current API shape: https://github.com/dotnet/corefx/blob/dc63fa0a60e68174f423fe5316e323c6635c08fe/src/System.Text.Json/ref/System.Text.Json.cs#L218-L310
public ref partial struct Utf8JsonWriter
{
    public Utf8JsonWriter(IBufferWriter<byte> bufferWriter, JsonWriterState state = default)
    public void Flush(bool isFinalBlock = true)
    public JsonWriterState GetCurrentState()
    // All the write APIs
}
public partial struct JsonWriterState
{
    public JsonWriterState(JsonWriterOptions options = default)
    public long BytesCommitted { get; }
    public long BytesWritten { get; }
    public JsonWriterOptions Options { get; }
}
Original Proposal and issue: https://github.com/dotnet/corefx/issues/33552
New API shape proposal:
public partial class Utf8JsonWriter
{
    public Utf8JsonWriter(IBufferWriter<byte> bufferWriter, JsonWriterOptions options = default)
    public void Flush(bool isFinalBlock = true)
    public void Clear() { }
    public JsonWriterOptions Options { get; }
    // All the write APIs
    // Future/stream considerations:
    public Utf8JsonWriter(Stream utf8Json, JsonWriterOptions options = default)
    public Task FlushAsync(CancellationToken cancellationToken = default, bool isFinalBlock = true)
}
Changes:
- No longer a ref struct, but rather a class
- No more JsonWriterState/state shuffling required
- Add support for Stream and FlushAsync
- Add the options property
- Remove the optional escape bool on all the APIs and re-evaluate adding "JsonEncodedString" overloads that are pre-escaped by the caller
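To illustrate the first two changes, here is a minimal sketch of what a class-based writer allows: passing the writer to a helper without `ref` and without handing a JsonWriterState back and forth. It assumes the Utf8JsonWriter and ArrayBufferWriter<byte> shapes that later shipped in System.Text.Json and System.Buffers, which may differ from the proposal above (for example, the parameterless Flush() and the `using` on the writer). `ClassWriterSketch` and `WritePoint` are hypothetical names used only for this example.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class ClassWriterSketch
{
    // Hypothetical helper: the writer is a reference type, so no 'ref' is needed
    // and the callee's writes are visible to the caller.
    private static void WritePoint(Utf8JsonWriter writer, int x, int y)
    {
        writer.WriteStartObject();
        writer.WriteNumber("x", x);
        writer.WriteNumber("y", y);
        writer.WriteEndObject();
    }

    public static string WritePoints()
    {
        var output = new ArrayBufferWriter<byte>();
        using var writer = new Utf8JsonWriter(output, new JsonWriterOptions { Indented = false });

        writer.WriteStartArray();
        WritePoint(writer, 1, 2);
        WritePoint(writer, 3, 4);
        writer.WriteEndArray();
        writer.Flush(); // commits the buffered bytes to 'output'

        return Encoding.UTF8.GetString(output.WrittenSpan);
    }
}
```

With the ref struct design, the same helper would have to take the writer by `ref`, and it could not be called across an `await`.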
Also, provide a heap array-based IBufferWriter implementation in-box:
https://github.com/dotnet/corefx/issues/34894
https://github.com/dotnet/corefx/blob/dc63fa0a60e68174f423fe5316e323c6635c08fe/src/System.Text.Json/src/System/Text/Json/Serialization/ArrayBufferWriter.cs#L11
Changes:
- No longer IDisposable
- No more ArrayPool rent/return; regular GC-tracked arrays instead.
- Can be cleared and re-used.
- Implemented within the System.Memory assembly instead of System.Buffers.
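A small sketch of how such a buffer writer might be used and re-used, assuming the ArrayBufferWriter<byte> shape that later shipped in the System.Buffers namespace (GetSpan, Advance, WrittenSpan, WrittenCount, Clear). `ArrayBufferWriterSketch` and `Append` are hypothetical names for this example only.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class ArrayBufferWriterSketch
{
    // Hypothetical helper: copies the UTF-8 bytes of 'text' into the writer
    // through the IBufferWriter<byte> contract (GetSpan, then Advance).
    private static void Append(IBufferWriter<byte> writer, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        Span<byte> destination = writer.GetSpan(bytes.Length); // at least bytes.Length long
        bytes.AsSpan().CopyTo(destination);
        writer.Advance(bytes.Length);
    }

    public static (string First, int CountAfterClear, string Second) ReuseBuffer()
    {
        var output = new ArrayBufferWriter<byte>(); // no 'using': there is nothing to dispose
        Append(output, "hello");
        string first = Encoding.UTF8.GetString(output.WrittenSpan);

        output.Clear(); // resets WrittenCount to 0 so the instance can be re-used
        int countAfterClear = output.WrittenCount;

        Append(output, "world");
        string second = Encoding.UTF8.GetString(output.WrittenSpan);
        return (first, countAfterClear, second);
    }
}
```

Because the backing storage is a regular GC-tracked array, forgetting to clear or release the writer cannot corrupt a shared pool; the cost is that the array is not returned for re-use elsewhere.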
Various prototypes: https://github.com/ahsonkhan/corefx/tree/Utf8JsonWriter_experiments
Sample Usage:
public void WriteLargeSyncIBW()
{
    using var output = new ArrayBufferWriter<byte>();
    var jsonUtf8 = new Utf8JsonWriter_Final(output, new JsonWriterOptions { Indented = false, SkipValidation = true });
    byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");
    //jsonUtf8.WriteStartArray();
    //for (int i = 0; i < 200_000_000; i++)
    //{
    //    jsonUtf8.WriteStringValue(utf8String); // integer overflow/OOM
    //}
    //jsonUtf8.WriteEndArray();
    //jsonUtf8.Flush(isFinalBlock: true);
    const int SyncWriteThreshold = 1_000_000;
    jsonUtf8.WriteStartArray();
    for (int i = 0; i < 200_000_000; i++)
    {
        jsonUtf8.WriteStringValue(utf8String);
        if (jsonUtf8.BytesWritten > SyncWriteThreshold)
        {
            jsonUtf8.Flush(isFinalBlock: false);
            // Write to some output stream, or advance pipe forward, etc.
            output.Clear();
            jsonUtf8.Clear(); // or reset, refresh, etc.
        }
    }
    jsonUtf8.WriteEndArray();
    jsonUtf8.Flush(isFinalBlock: true);
}
public void WriteLargeSyncStreamOverIBW() // Could be a Task-returning async method
{
    const int SyncWriteThreshold = 1_000_000;
    string filePath = @"some path to json.json";
    using var stream = new FileStream(filePath, FileMode.OpenOrCreate); // or MemoryStream/etc.
    using var output = new ArrayBufferWriter<byte>();
    var jsonUtf8 = new Utf8JsonWriter_Final(output, new JsonWriterOptions { Indented = false, SkipValidation = true });
    byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");
    jsonUtf8.WriteStartArray();
    for (int i = 0; i < 200_000_000; i++)
    {
        jsonUtf8.WriteStringValue(utf8String);
        if (jsonUtf8.BytesWritten > SyncWriteThreshold)
        {
            jsonUtf8.Flush(isFinalBlock: false);
            stream.Write(output.WrittenMemory.Span); // Could be async
            output.Clear();
            jsonUtf8.Clear(); // or reset, refresh, etc.
        }
    }
    jsonUtf8.WriteEndArray();
    jsonUtf8.Flush(isFinalBlock: true);
    stream.Write(output.WrittenMemory.Span); // Could be async
}
API Review Feedback:
- We should aim to simplify the stream usage even more:
public async Task WriteLargeStreamAsync(string path)
{
    const int SyncWriteThreshold = 1_000_000;
    string filePath = @"some path to json.json";
    using var stream = new FileStream(filePath, FileMode.OpenOrCreate);
    var options = new JsonWriterOptions { Indented = false, SkipValidation = true };
    await using var jsonUtf8 = new Utf8JsonWriter(stream, options);
    byte[] utf8String = Encoding.UTF8.GetBytes("some string 1234");
    jsonUtf8.WriteStartArray();
    for (int i = 0; i < 200_000_000; i++)
    {
        jsonUtf8.WriteStringValue(utf8String);
        if (jsonUtf8.BytesPending > SyncWriteThreshold)
        {
            await jsonUtf8.FlushAsync();
        }
    }
    jsonUtf8.WriteEndArray();
}
- The Utf8JsonWriter should implement IDisposable and IAsyncDisposable.
- BytesWritten should be renamed to BytesPending.
- Auto-flush on dispose and remove the isFinalBlock bool on flush (the user would need to validate correctness themselves).
- Add Reset() and Reset(...) overloads that allow re-using the Utf8JsonWriter instance.
- Remove Clear() and clear the memory field on flush to avoid forcing the caller to reset after flushing.
- Consider adding leaveOpen/dispose bools on the ctor to close/dispose the stream on Utf8JsonWriter.Dispose.
- Keep JsonWriterOptions a struct. It cannot be modified once passed in to the constructor.
- Internally create an ArrayBufferWriter to have the same code path for both Stream and IBufferWriter modes.
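A rough sketch of how several of these feedback items fit together (BytesPending, flush without isFinalBlock, Reset(...) for re-use, auto-flush on Dispose). It assumes the Utf8JsonWriter members that later shipped in System.Text.Json (BytesPending, Flush(), Reset(Stream), Dispose()), which may not match the exact shape discussed in this review. `WriterFeedbackSketch` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

public static class WriterFeedbackSketch
{
    public static (int PendingBeforeFlush, int PendingAfterFlush, string FirstDoc, string SecondDoc) Run()
    {
        using var first = new MemoryStream();
        using var second = new MemoryStream();

        var writer = new Utf8JsonWriter(first);
        writer.WriteStartArray();
        writer.WriteNumberValue(1);
        writer.WriteEndArray();

        // Bytes buffered by the writer but not yet written to the stream: "[1]".
        int pendingBefore = writer.BytesPending;
        writer.Flush(); // no isFinalBlock argument
        int pendingAfter = writer.BytesPending;

        // Re-use the same writer instance against a different stream.
        writer.Reset(second);
        writer.WriteStringValue("reused");
        writer.Dispose(); // flushes whatever is still pending into 'second'

        return (pendingBefore,
                pendingAfter,
                Encoding.UTF8.GetString(first.ToArray()),
                Encoding.UTF8.GetString(second.ToArray()));
    }
}
```

Note that no Clear() call is needed between the flush and the next write; the writer resets its own pending buffer on flush.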
I guess you'll be changing the Reader to class too?