
Most people who went after this tried text-to-sql (e.g. ask a question and have the LLM generate a ton of SQL to answer it). That approach has pretty much failed. The LLM never has enough context to generate accurate SQL at a high enough rate to trust.

What we've found to actually work at Definite (I'm the founder) is text-to-semantic-query. This is an older video, but here's an example: https://www.youtube.com/watch?v=44mhLgUYOp8



How is this any different (text-to-sql vs text-to-semantic-query)? Isn't this just comparing text-to-sql to text-to-slightly-simpler-sql?


Yes, it's simpler, but there are a few key differences:

1. You have complete control over what the LLM can do / access through the semantic layer (e.g. you can remove tables that the LLM shouldn't consider for analytical questions).

2. One of the biggest choke points for text-to-sql is constructing joins. All the joins are already built into the semantic layer.

3. Calculating metrics / measures is handled in the semantic layer instead of on the fly with SQL (e.g. if you ask something like "how much revenue did we generate from product X", you wouldn't want the LLM to come up with a calculation for revenue on the fly. Instead, revenue is clearly defined in the semantic layer).

4. The query format for our semantic layer (we use cube.dev) is JSON, which is much easier to control than free-form SQL.

The semantic layer gives the LLM a well-defined, constrained space to operate within, whereas there are hundreds of ways for it to fail when writing raw SQL.
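
To make the four points concrete, here's a rough Python sketch of the general idea. It is not how Definite or Cube actually implement it. The `measures` / `dimensions` keys mirror the shape of a Cube-style JSON query, but the `SEMANTIC_LAYER` dict, the table and column names, and the `validate` / `compile_sql` helpers are all made up for illustration. The LLM only emits a small JSON object; joins and metric definitions are looked up, never generated.

```python
import json

# Hypothetical semantic layer. Every table, column, and member name here is
# invented for illustration.
SEMANTIC_LAYER = {
    "base_table": "orders",
    # Joins are written once by a human, never by the LLM.
    "joins": {"products": "JOIN products ON products.id = orders.product_id"},
    # "Revenue" has exactly one definition.
    "measures": {"orders.revenue": "SUM(orders.amount)"},
    "dimensions": {"products.name": "products.name", "orders.status": "orders.status"},
}
ALLOWED_KEYS = ("measures", "dimensions")


def validate(query):
    """Return a list of problems; an empty list means the query is allowed."""
    if not isinstance(query, dict):
        return ["query must be a JSON object"]
    errors = [f"unsupported key: {key}" for key in query if key not in ALLOWED_KEYS]
    selected = 0
    for kind in ALLOWED_KEYS:
        members = query.get(kind, [])
        if not isinstance(members, list):
            errors.append(f"{kind} must be a list")
            continue
        for member in members:
            if isinstance(member, str) and member in SEMANTIC_LAYER[kind]:
                selected += 1
            else:
                errors.append(f"unknown {kind[:-1]}: {member}")
    if selected == 0:
        errors.append("query selects nothing")
    return errors


def compile_sql(query):
    """Turn a validated semantic query into SQL using only predefined pieces."""
    errors = validate(query)
    if errors:
        raise ValueError("; ".join(errors))
    base = SEMANTIC_LAYER["base_table"]
    dims = [SEMANTIC_LAYER["dimensions"][d] for d in query.get("dimensions", [])]
    measures = [
        f'{SEMANTIC_LAYER["measures"][m]} AS "{m}"' for m in query.get("measures", [])
    ]
    # The table prefix of each member decides which predefined joins are needed.
    tables = {m.split(".")[0] for kind in ALLOWED_KEYS for m in query.get(kind, [])}
    joins = [SEMANTIC_LAYER["joins"][t] for t in sorted(tables - {base})]
    sql = f"SELECT {', '.join(dims + measures)} FROM {base}"
    if joins:
        sql += " " + " ".join(joins)
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql


if __name__ == "__main__":
    # Pretend this JSON string came back from the LLM.
    llm_output = '{"measures": ["orders.revenue"], "dimensions": ["products.name"]}'
    print(compile_sql(json.loads(llm_output)))
    # A member that isn't exposed in the layer is rejected before any SQL exists.
    print(validate({"measures": ["users.password"]}))
```

The failure modes shrink to "unknown member" or "unsupported key", both of which are easy to detect and feed back to the model, instead of subtly wrong joins or an improvised revenue formula.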



