Performance woes. I'm appalled.
-
Don't I feel stupid.
Approx stack size of local JSON stuff is 160 bytes
Read 1290495 nodes and 20383269 characters in 416.631000 ms at 45.603904MB/s
Skipped 1290495 nodes and 20383269 characters in 184.131000 ms at 103.187405MB/s
utf8 scanned 20383269 characters in 146.422000 ms at 129.761921MB/s
raw ascii i/o 20383269 characters in 58.902000 ms at 322.569692MB/s
raw ascii block i/o 19 blocks in 3.183000 ms at 5969.211436MB/s

Much better. I was using the wrong gcc options. I'm used to msvc.
Real programmers use butterflies
Are you familiar with the Compiler Explorer[^]? It's a very useful tool for looking at the assembly generated by gcc and other compilers.
-
Are you familiar with the Compiler Explorer[^]? It's a very useful tool for looking at the assembly generated by gcc and other compilers.
I like to do broad, algorithmic optimizations before I try to outsmart the compiler. I've gotten at least a 3 times speed improvement by changing my parsing to use strpbrk() over a memory mapped file. :-D
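For anyone curious, the mmap + strpbrk idea boils down to something like this. This is only a sketch of the technique, not my actual LexSource or parser code:

```c
// Simplified sketch of the mmap + strpbrk idea: let libc's optimized scanner
// hop between the structural characters of the document.
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) return 1;
    struct stat st;
    if (fstat(fd, &st) < 0) return 1;

    // Map the whole file read-only; the kernel pages it in as it's touched.
    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) return 1;

    // Caveat: strpbrk() wants a NUL terminator. This sketch leans on the
    // zero-filled tail of the last mapped page; a real reader would bound
    // the scan explicitly instead.
    size_t structural = 0;
    const char *p = data;
    const char *end = data + st.st_size;
    while ((p = strpbrk(p, "{}[]\"")) != NULL && p < end) {
        ++structural;
        ++p;
    }
    printf("%zu structural characters\n", structural);

    munmap(data, (size_t)st.st_size);
    close(fd);
    return 0;
}
```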
Approx stack size of local JSON stuff is 176 bytes
Read 1231370 nodes and 20383269 characters in 268.944000 ms at 70.646677MB/s
Skipped 1231370 nodes and 20383269 characters in 35.784000 ms at 530.963559MB/s
utf8 scanned 20383269 characters in 78.679000 ms at 241.487563MB/s
raw ascii i/o 20383269 characters in 58.141000 ms at 326.791765MB/s
raw ascii block i/o 19 blocks in 3.369000 ms at 5639.655684MB/s

The "Skipped" line is the relevant one here. That's doing a parse of the bones of the document (looking for {}[]") in order to skip over it in a structured way. That style of parsing is used for searching, for example, when you're trying to find all ids in a document. It's using the mmap technique I mentioned. Here's snagging all "id" fields out of a 20MB file and reading their values:
Approx stack size of local JSON stuff is 152 bytes
Found 40008 fields and scanned 20383269 characters in 34.664000 ms at 548.119086MB/s

The "approx stack size" figure is roughly how much memory the query takes - including the sizes of the JsonReader and LexSource member variables.
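To give a rough idea of what that kind of query does, here's a brute-force illustration over an already-mapped buffer. It's not the JsonReader code - it ignores escapes and strings that merely contain "id" - just the shape of the thing:

```c
// Naive illustration only: scan a buffer for "id" keys and print the value
// text that follows. Ignores escape sequences, nesting, and other edge cases.
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

static void dump_id_values(const char *buf, size_t len) {
    const char *p = buf, *end = buf + len;
    const char key[] = "\"id\"";
    while ((p = memmem(p, (size_t)(end - p), key, sizeof key - 1)) != NULL) {
        p += sizeof key - 1;
        // Skip whitespace and require a ':' so we only match actual keys.
        while (p < end && (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')) ++p;
        if (p >= end || *p != ':') continue;
        ++p;
        while (p < end && (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')) ++p;
        // Print the raw value text up to the next delimiter.
        const char *v = p;
        while (p < end && *p != ',' && *p != '}' && *p != ']' && *p != '\n') ++p;
        printf("%.*s\n", (int)(p - v), v);
    }
}
```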
Real programmers use butterflies
-
Read 1231388 nodes and 20383269 characters in 1069.479000 ms at 17.765660MB/s
Skipped 1231388 nodes and 20383269 characters in 534.699000 ms at 35.534011MB/s
utf8 scanned 20383269 characters in 377.561000 ms at 50.322994MB/s
raw ascii i/o 20383269 characters in 62.034000 ms at 306.283651MB/s
raw ascii block i/o 19 blocks in 49.023000 ms at 387.573180MB/s

The first line is full JSON parsing. The second line is JSON "skipping" - a minimal read where it doesn't normalize anything; it just moves as fast as possible through the document. The third line is utf8 reading through my input source class, but without doing anything JSON related. The fourth line is calling fgetc() in a loop. The fifth line is calling fread() in a loop and then scanning over the characters in each block (so I'm not totally cheating by not examining characters).

The issue here is the difference between my third line and the fourth line (utf8 scan vs fgetc). The trouble is even when I removed the encoding it made no measurable difference in speed. Underneath everything both are using fgetc. Even when I changed mine to block read using fread() it didn't speed things up. I'm at a loss. I'm not asking a question here, mostly just expressing frustration because I have not a clue how to optimize this.
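For reference, the two raw baselines (the fourth and fifth lines) are nothing fancier than this - a simplified sketch, with the block size just a guess:

```c
#include <stdio.h>

// Baseline "raw ascii i/o": fgetc() in a loop, touching every character.
static long count_fgetc(FILE *f) {
    long n = 0;
    int ch;
    while ((ch = fgetc(f)) != EOF)
        ++n;
    return n;
}

// Baseline "raw ascii block i/o": fread() into a big block, then walk the
// block so every character still gets examined (no cheating).
static long count_blocks(FILE *f) {
    static char block[1024 * 1024];   // block size is a guess here
    long n = 0;
    size_t got;
    while ((got = fread(block, 1, sizeof block, f)) > 0)
        for (size_t i = 0; i < got; ++i)
            if (block[i] != '\0')     // look at each character
                ++n;
    return n;
}
```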
Real programmers use butterflies
-
I suspect that the utf8 scanning is using fgetc underneath to return one character at a time. This would greatly simplify the implementation of the utf8 scanner.
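Something like this is what I mean - pulling one byte at a time with fgetc() keeps the decoder close to trivial (no validation here, just the shape of it):

```c
#include <stdio.h>

// Minimal illustration: decode one UTF-8 code point using fgetc().
// Returns the code point, or -1 on EOF. No validation of continuation
// bytes or overlong forms - just enough to show why fgetc() keeps it simple.
static long read_utf8(FILE *f) {
    int c = fgetc(f);
    if (c == EOF) return -1;
    int extra;
    long cp;
    if ((c & 0x80) == 0x00)      { cp = c;        extra = 0; }
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else return 0xFFFD;          // invalid lead byte
    while (extra--) {
        c = fgetc(f);
        if (c == EOF) return -1;
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}
```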
What I use under the covers depends on what kind of LexSource you use. Mainly I use memory mapped files now, for speed, but I'm implementing one using fread and buffered access and we'll see how that stacks up. I'm very nearly breaking 600MB/s of JSON searching on my machine. :)
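The buffered one will be something along these lines - a rough sketch of the idea, not the actual LexSource interface:

```c
#include <stdio.h>

// Rough sketch of a buffered source: fill a block with fread(), hand out
// characters from the block, refill when it runs dry.
typedef struct {
    FILE *file;
    unsigned char buf[65536];   // buffer size picked arbitrarily here
    size_t pos;
    size_t len;
} buffered_source;

static void bs_init(buffered_source *s, FILE *f) {
    s->file = f;
    s->pos = 0;
    s->len = 0;
}

static int bs_getc(buffered_source *s) {
    if (s->pos >= s->len) {
        s->len = fread(s->buf, 1, sizeof s->buf, s->file);
        s->pos = 0;
        if (s->len == 0) return EOF;   // end of file or read error
    }
    return s->buf[s->pos++];
}
```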
Real programmers use butterflies