I did something similar in a messaging application... We had a data structure consisting of several tables describing various message queues in 4GB of persistent storage. The initial contents of those tables were quite regular, since they were just pointers into blocks of the persistent memory. When the application was ported to AWS, people decided to try running it on instances with the crappiest 20MB/s storage. I had specified during implementation that the journalling layer would not work well on crappy storage, because the persistent memory the system was originally designed for had gobs of bandwidth (supercap-backed DRAM on a PCIe card, built with an FPGA and a pair of 10Gbps SFP+ ports mirroring to another host -- a couple of GB/s of write throughput). Customers ignored the requirement and opened a bug saying the system couldn't reinitialize the tables within the 30-second limit for CLI commands. To fix the "bug", I delta-encoded the persistent memory tables to make the data more regular, ran the result through zlib at the lowest compression level, and wrote the compressed data out to disk (roughly the trick sketched below). 4GB of data ended up taking less than 100MB in the journal. gzip on its own could not get under the 600MB budget implied by writing for 30 seconds at 20MB/s. It was a total hack, but it worked and took only a few hours to implement. Domain-specific knowledge really helps compression ratios!
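A minimal sketch of that trick in Python, assuming the tables are flat arrays of 64-bit pointers laid out at a regular stride (the table contents, names, and sizes here are made up for illustration):

    import struct
    import zlib

    def delta_encode(values):
        # Store each entry as its difference from the previous one, which
        # turns a regularly-strided pointer table into runs of one small value.
        prev, deltas = 0, []
        for v in values:
            deltas.append(v - prev)
            prev = v
        return deltas

    # Hypothetical queue table: one pointer per 4 KiB block of a persistent region.
    table = [0x1000 + i * 4096 for i in range(1_000_000)]

    raw    = struct.pack("<%dq" % len(table), *table)
    deltas = struct.pack("<%dq" % len(table), *delta_encode(table))

    # zlib level 1 is the lowest/fastest compression setting.
    print("raw    :", len(zlib.compress(raw, 1)))
    print("deltas :", len(zlib.compress(deltas, 1)))

The delta-encoded version compresses dramatically better because the compressor sees the same few bytes repeated over and over instead of millions of distinct pointer values.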


If you zig-zag encode the deltas you might be able to get it down to 10MB.

δzd is the best. https://justine.lol/sizetricks/#dzd


For anyone wondering, zig-zag encoding just stores an integer's sign in the LSB instead of using two's complement, hence the name: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ... so small negative values stay small. See also https://news.ycombinator.com/item?id=31382907
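A minimal sketch of the mapping (the 64-bit width is an assumption; this is the same scheme protobuf uses for sint64):

    def zigzag_encode(n):
        # Interleave signed values onto unsigned ones:
        #   0, -1, 1, -2, 2, ...  ->  0, 1, 2, 3, 4, ...
        # Assumes n fits in a signed 64-bit integer.
        return (n << 1) ^ (n >> 63)

    def zigzag_decode(z):
        # Inverse mapping: recover the sign from the LSB.
        return (z >> 1) ^ -(z & 1)

    assert [zigzag_encode(n) for n in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]
    assert all(zigzag_decode(zigzag_encode(n)) == n for n in range(-1000, 1000))

This matters after delta encoding because deltas can be negative, and zig-zag keeps small-magnitude deltas as small unsigned values, which compressors and varint encoders handle well.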



