I wonder if this segment is ready for disruption. Splunk is very expensive; Elasticsearch still lacks many of Splunk's features and is also pricey when hosted on AWS; SumoLogic was acquired by private equity, which means it won't get cheaper; and Datadog is expensive too.
A solution like Snowflake for logs/telemetry, where compute and storage are separated, might be the future.
We're[1] building the OSS equivalent of the observability side of Splunk/DD, on ClickHouse naturally, and we believe in the same end goal of lowering cost via separation of compute and storage.
We’re also giving this a shot. The annual Splunk bill at our last startup exploded from $10k to $1M when we reached 1TB of logs generated per day, which is actually an easy threshold to hit when you have decent traction and aren’t proactively reducing logs. So we built Scanner.dev to drop these costs by 10x.
Decoupling compute and storage is definitely the way to go. We’re using Lambda functions and ECS Fargate containers for compute that scales up and down rapidly, and S3 for storage. We're getting ~1TB/sec log scan speeds, which feels fairly good. We keep sparse indices in S3 to narrow down the regions of logs to scan. E.g. if you’re searching for an IP address that appears 10 times in a 25TB log set, the indices reduce the search space to around 300MB. That query completes in a few seconds, whereas Athena and CloudWatch take something like 20 minutes.
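To make the sparse-index idea concrete, here's a toy sketch in Rust under invented assumptions (the chunk size, tokenizer, and data are all made up, it indexes an in-memory buffer rather than S3 objects, and it ignores tokens that span chunk boundaries): build a token-to-chunk-ids map, then scan only the chunks the index points at.

    use std::collections::{HashMap, HashSet};

    // Tiny chunk size for the demo; a real index would cover MB-scale
    // regions of S3 objects and handle tokens spanning chunk edges.
    const CHUNK_SIZE: usize = 64;

    // Map each token to the set of chunk ids it appears in.
    fn build_index(log: &[u8]) -> HashMap<Vec<u8>, HashSet<usize>> {
        let mut index: HashMap<Vec<u8>, HashSet<usize>> = HashMap::new();
        for (id, chunk) in log.chunks(CHUNK_SIZE).enumerate() {
            // Naive tokenizer: split on anything that isn't alphanumeric or '.'
            for tok in chunk.split(|&b| !(b.is_ascii_alphanumeric() || b == b'.')) {
                if !tok.is_empty() {
                    index.entry(tok.to_vec()).or_default().insert(id);
                }
            }
        }
        index
    }

    fn main() {
        let log: &[u8] = b"10.0.0.1 GET /index\n10.0.0.2 POST /login\n10.0.0.1 GET /health\n";
        let index = build_index(log);
        // Query: only scan the chunks the index says contain the needle,
        // instead of reading the entire log set.
        if let Some(ids) = index.get(b"10.0.0.2".as_slice()) {
            for &id in ids {
                let start = id * CHUNK_SIZE;
                let end = (start + CHUNK_SIZE).min(log.len());
                println!("scan bytes {start}..{end} of the log");
            }
        }
    }

The point of the shape is that the index is tiny relative to the raw logs, so a rare needle prunes almost all of the data before any scanning happens.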
We’re also using Rust to maximize memory efficiency and speed; there are lots of great SIMD-optimized string search and regex libraries on crates.io.
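For instance, here's a minimal sketch using the memchr crate's memmem module, one of those SIMD-accelerated substring search libraries (the log line and IP here are invented, and this isn't necessarily how Scanner itself wires it up):

    // Cargo.toml: memchr = "2"
    use memchr::memmem;

    fn main() {
        // A precompiled Finder can be reused cheaply across many log chunks.
        let finder = memmem::Finder::new("203.0.113.7");
        let chunk: &[u8] = b"203.0.113.7 GET /login 200\n198.51.100.9 GET /health 200\n";
        for pos in finder.find_iter(chunk) {
            println!("match at byte offset {pos}");
        }
    }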
We’re early, so there are a lot of SIEM features, like detection rules, that we’re still building. But Splunk/Datadog users might find it useful if cost is a problem and they mostly use log search.