C++ - Stream while serializing with Cap'n'Proto


Consider a Cap'n'Proto schema like this:

    struct Document {
      header @0 :Header;
      records @1 :List(Record);  # Usually a large number of records.
      footer @2 :Footer;
    }
    struct Header {
      numberOfRecords @0 :UInt32;
      # (other fields)
    }
    struct Footer {
      # (fields)
    }
    struct Record {
      type @0 :UInt32;
      desc @1 :Text;
      # (more fields, relatively large in total)
    }

Now I want to serialize (i.e. build) a Document instance and stream it to a remote destination.

Since the document is usually very large, I don't want to build it completely in memory before sending it. Instead I am looking for a builder that sends struct by struct directly over the wire, so that the additional memory buffer needed is constant, i.e. O(max(sizeof(Header), sizeof(Record), sizeof(Footer))).

Looking at the tutorial material I don't find such a builder. MallocMessageBuilder seems to create everything in memory first (then you call writeMessageToFd on it).

Does the Cap'n'Proto API support such a use case?

Or is Cap'n'Proto meant to be used for messages that fit into memory before sending?

In this example, the Document struct could be omitted, and one could send a sequence of one Header message, n Record messages and one Footer message. Since a Cap'n'Proto message is self-delimiting, this should work. But you lose the document root, which is perhaps not always an option.

The solution you outlined, sending the parts of the document as separate messages, is probably best for your use case. Fundamentally, Cap'n Proto is not designed for streaming chunks of a single message, since that would not fit well with its random-access properties (e.g. what happens when you try to follow a pointer that points to a chunk you haven't received yet?). Instead, when you want streaming, you should split a large message into a series of smaller messages.
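As a rough illustration of the "series of smaller messages" idea, here is a minimal sketch that streams one header, N records and one footer as independent, self-delimiting frames, keeping only one frame in memory at a time. It deliberately does not use the Cap'n Proto library or its wire format: the 4-byte length prefix and the names `writeFrame`, `readFrame` and `streamDocument` are assumptions made for this example. In real code each frame would be a separate Cap'n Proto message written with something like `capnp::writeMessageToFd()`.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write one frame: a 4-byte little-endian length followed by the payload.
// The length prefix is what makes each frame self-delimiting here.
inline void writeFrame(std::ostream& out, const std::string& payload) {
  assert(payload.size() <= UINT32_MAX);
  const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
  char prefix[4];
  for (int i = 0; i < 4; ++i) {
    prefix[i] = static_cast<char>((n >> (8 * i)) & 0xff);
  }
  out.write(prefix, 4);
  out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Read one frame into `payload`. Returns false at end of stream
// (or if the stream ends in the middle of a frame).
inline bool readFrame(std::istream& in, std::string& payload) {
  unsigned char prefix[4];
  if (!in.read(reinterpret_cast<char*>(prefix), 4)) return false;
  std::uint32_t n = 0;
  for (int i = 0; i < 4; ++i) {
    n |= static_cast<std::uint32_t>(prefix[i]) << (8 * i);
  }
  payload.resize(n);
  if (n == 0) return true;
  return static_cast<bool>(in.read(&payload[0], static_cast<std::streamsize>(n)));
}

// Stream a "document" as header, records, footer. Each placeholder payload
// is built, sent and discarded before the next one is created, so peak
// memory does not grow with recordCount.
inline void streamDocument(std::ostream& out, std::uint32_t recordCount) {
  writeFrame(out, "header:" + std::to_string(recordCount));
  for (std::uint32_t i = 0; i < recordCount; ++i) {
    writeFrame(out, "record:" + std::to_string(i));
  }
  writeFrame(out, "footer");
}
```

The receiver calls `readFrame` in a loop and knows from the header how many records to expect before the footer, which is one way to compensate for the missing document root.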

That said, unlike other similar systems (e.g. Protobuf), Cap'n Proto does not strictly require messages to fit into memory. Specifically, you can do some tricks using mmap(2). If your document data is coming from a file on disk, you can mmap() the file into memory and then incorporate it into your message. With mmap(), the operating system does not actually read the data from disk until you attempt to access the memory, and the OS can also purge the pages from memory after they are accessed, since it knows it still has a copy on disk. This often lets you write much simpler code, since you no longer need to think about memory management.

In order to incorporate the mmap()ed chunk into a Cap'n Proto message, you'll want to use capnp::Orphanage::referenceExternalData(). For example, given:

    struct MyDocument {
      body @0 :Data;
      # (other fields)
    }

You might write:

    // Map the file into memory.
    void* ptr = mmap(
        nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (ptr == MAP_FAILED) {
      KJ_FAIL_SYSCALL("mmap", errno);
    }
    auto data = capnp::Data::Reader((const kj::byte*)ptr, size);

    // Incorporate it into a message.
    capnp::MallocMessageBuilder message;
    auto root = message.getRoot<MyDocument>();
    root.adoptBody(
        message.getOrphanage().referenceExternalData(data));

Because Cap'n Proto is zero-copy, it will end up writing the mmap()ed memory directly out to the socket without ever accessing it. It is then up to the OS to read the content from disk and send it out to the socket as appropriate.

Of course, you still have a problem on the receiving end. You'll find it a lot more difficult to design the receiving end to read into mmap()ed memory. One strategy might be to dump the entire stream directly to a file first (without involving the Cap'n Proto library), then mmap() that file and use capnp::FlatArrayMessageReader to read the mmap()ed data in place.

I describe all this because it's a neat thing that is possible with Cap'n Proto but not with most other serialization frameworks (e.g. you couldn't do this with Protobuf). Playing tricks with mmap() is sometimes really useful; I've used this successfully in several places in Sandstorm, Cap'n Proto's parent project. However, I suspect that for your use case, splitting the document into a series of messages probably makes more sense.

