Conversation
|
Hi @gulugulubing how much time do you have to finish this up? I want to get the serialization stuff in the books for release. If you have time, I can do some more review. If you don't have time, I can take what you have so far and get it to a mergeable state. No worries if you don't have time, I would completely understand. |
|
Sorry @schveiguy, I just checked the conservations but I could't recall what still need to do in this PR. |
|
Yes, and that's my fault. I was going to go through this again, but wanted to first make sure you still are able to work on it. I'll review this again and get more feedback for you. Thanks! |
source/iopipe/json/serialize.d
Outdated
| private bool isValidJSONNumber(string str) { | ||
| import std.regex; | ||
| static auto numberRegex = regex(r"^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?$"); | ||
| return !str.matchFirst(numberRegex).empty; |
There was a problem hiding this comment.
I don't love using regex here. We do have a number validator in the parser, but it's awkward to use. Probably should factor out the validation.
This is OK for now.
|
I had a whole thing written up for how I want this to look, and I'm still mulling over it in my head, so I erased it all. Let me get this all written down, and I will get back to you. I think this is the right path, but maybe tweak some things. The formatter needs either options for everything, or we need specific formatter types (this is what I'm leaning towards). Will work on a review this weekend. |
schveiguy
left a comment
There was a problem hiding this comment.
In addition to all the comments below, we need a state system.
Many of these functions are only valid if you are in the right state. For example, it only makes sense to accept string data inside a string. It only makes sense to accept an object key if you are in an object, etc.
If we add the comma function, there's another state that needs handling.
source/iopipe/json/serialize.d
Outdated
|
|
||
| void addMember(T)(T key, MemberOptions options) { | ||
|
|
||
| if (key.length == 0 ) { |
There was a problem hiding this comment.
I don't like this requirement to call addMember for arrays. I had expected the different value starting functions (beginObject, beginArray, beginString, addNumericData, addKeywordValue) to handle the comma.
Thinking about comments, I think actually we probably want to add a specific function to handle adding the comma. Because comments can go anywhere.
|
Needs a rebase and resolve the conflict. |
It makes sense, but I haven't thought previously. Is this something like below in struct JSONFormatter: enum FormatterState {
TopLevel,
InObjectKey, // expecting a key (addMember)
InObjectValue, // expecting a value after colon
InArray, // inside an array, expecting a value
InString, // inside string data
}
FormatterState state = FormatterState.TopLevel;
// track nested containers: Obj/Array
enum ContainerKind { Obj, Arr }
ContainerKind[] containerStack; |
|
That's close I think. You can determine the current aggregate type (object or array) by the containerStack. You can actually probably just reuse the state enum from the parser, though there is no "colon" state as that's already handled. At least you can reuse the state names that do apply. Note the Oh, one more thing, the formatter needs to go in its own file. This is not part of the serializer, just like the parser isn't. I think an |
In the latest commit, I find comma state is not needed neither. |
schveiguy
left a comment
There was a problem hiding this comment.
Stopped looking, the state machine seems wrong. Each thing that adds data to the stream should change the state.
Maybe diagram it out? I'm not sure how it's supposed to work.
| // Parse array elements | ||
| size_t elementCount = 0; | ||
| while(true) { | ||
| while(tokenizer.peekSkipComma() != JSONToken.ArrayEnd) { |
There was a problem hiding this comment.
Unsure why this is here. Maybe you need a rebase?
There was a problem hiding this comment.
Since here was a conflict between this branch and master, I guess I need to accept the master version before rebase or merge.
source/iopipe/json/formatter.d
Outdated
| // subsequent member: comma + indent | ||
| putStr(","); | ||
| putIndent(); | ||
| // stay in Member |
There was a problem hiding this comment.
Seems incorrect:
startObjectMember();
startObjectMember();will output 2 consecutive commas.
Yeah, the process is a little bit subtle. Some functions of adding data seems no need to change state. Let's check a simple example: {
"name": "John",
"age": 30,
"hobbies": ["reading", "swimming"]
}
|
There was a problem hiding this comment.
So there are a few states which allow adding a "value":
BeginFirstfor arraysMemberfor arraysValuefor objects
And a "value" can either be the start of a string, a number, a keyword, an object start, an array start.
I suggest to write a helper method called canAddValue which returns true if the next thing can be a value. Then use this in all your code where you are validating state. This will help make the code cleaner also.
In fact, helper methods to validate the state in all cases, named appropriately, would make things a lot clearer. Methods like:
canAddValue-> adding a value is allowedcanAddStringData-> adding string data (or ending string) is allowedcanAddComment-> adding a comment is allowedcanCloseAggregate-> closing the aggregate is allowed
These can be private methods. They will just contain state validation and return a boolean value. The code that uses it will read a lot cleaner. It also allows the same validation code to be used for both throwing an exception and asserting.
source/iopipe/json/formatter.d
Outdated
| // // add a comment (JSON5 only), must be a complete comment (validated) | ||
| void addComment(T)(T commentData) { | ||
| static if (is(T == string)) { | ||
| if (!commentData.startsWith("//") && !commentData.startsWith("/*") && !commentData.endsWith("*/")) { |
There was a problem hiding this comment.
// comments need to end with a newline
This also requires // comments to end with */
The logic is hard to parse as well. I suggest:
bool isValidComment =
(commentData.startsWith("//") && commentData.endsWith("\n")) ||
(commentData.startsWith("/*") && commentData.endsWith("*/"));
if(!isValidComment)
throw new ...|
One thing here is that there are no unittests. The serialize code has formatting unittests, but we haven't hooked up serialization yet. So maybe a few simple ones, and maybe a few that validate the state handling is correct (i.e. you get an exception if you try doing things in the wrong order). |
schveiguy
left a comment
There was a problem hiding this comment.
Sorry for the delay, I've been super busy. More changes. I haven't reviewed the unittests yet.
source/iopipe/json/formatter.d
Outdated
| static if (validate) { | ||
| // Validate that the string doesn't contain invalid characters | ||
| foreach(char c; value) { | ||
| if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') { |
There was a problem hiding this comment.
This isn't exactly correct, it needs to check for quote characters and lone backslashes.
In addition, I would expect both add escapes and validate as true should work with characters like newline, which should replace with a literal \n
I know I made this API in the discussion, I think it's wrong now. You can validate, add escapes, or do neither. You can't do both (adding escapes would ensure a valid string).
Maybe change to an enum?
passThru -> don't do anything, assume the string data is correct
addEscapes -> any invalid character is escaped into a valid sequence (this should be the default)
validate -> verify escapes are correctly used in the data.
Reimplemented the formatjson example program with new Formatter and tested some json files locally.