Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

We built something very similar back in 2016, in the jvm with unsafe memory and garbage-free data structures to avoid GC pauses. The dynamic clustering is not too hard, are you able to dynamically undo a cluster when new information shows up?

Are you running separate instances per customer to separate the information they have access to?



Assuming by undoing you mean splitting the cluster:

A linked list can be split in two in O(1). When it comes to updating the roots for all the removed nodes, there is no easy way out, but luckily:

- This process can be parallelized.

- It could be done just once for multiple clustering changes.

- This is a multi-level disjoint set, not all the levels or sub-clusters are usually affected. Upper level clustering, which is based on lower confidence level, can be rebuilt more easily.

If by undoing you mean reverting the changes, we don’t use a persistent data structure. When we need historical clustering, we use a patched forest with concurrent hash maps to track the changes, and then apply or throw them away.

We use a single instance for all clients, but when one CFD server processes new block data, it becomes fully blocked for read access. To solve this, we built a smart load balancer that redirects user requests to a secondary CFD server. This ensures there's always at least two servers running, and more if we need additional throughput.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: