Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Thanks for the link I think this is a really interesting example:

> In qSQL this is: aj[`sym`time; t; q], which means perform an asof-join on t, looking up the nearest match from table q based on the sym and time column.

> In standard SQL, again you’ll have difficulty: sql nearest date, sql closest date even just the closest lesser date isn’t elegant. One solution would be:

> WITH cte AS (SELECT t.sym, t.time, q.bid, ROW_NUMBER() OVER (PARTITION BY t.ID, t.time ORDER BY ABS(DATEDIFF(dd, t.time, p.time))) AS rowNum FROM t LEFT JOIN q ON t.sym = q.sym) SELECT sym,time,bid FROM cte WHERE rowNum = 1

> It’s worth pointing out this is one of the queries that is typically extremely slow (minutes) on row-oriented databases compared to column-oriented databases (at most a few seconds).

This is a really nice example but I think it’s more about this as of join being a really useful operation, it appears both pandas https://pandas.pydata.org/docs/reference/api/pandas.merge_as... And duckdb https://duckdb.org/docs/guides/sql_features/asof_join.html

Pandas:

> pd.merge_asof(t, q, on='time', by='sym', direction='backward')

Duckdb:

> SELECT t.sym, t.time, q.value FROM t ASOF JOIN q ON t.sym = q.sym AND t.time >= q.time;

So it seems more like this is benefitting from a useful feature of time series databases rather than the features of an APL-family language.

Personally I find the pandas syntax to be the most straightforward here.



To someone unfamiliar with the pandas api I think the SQL version is by far the most readable.

It's just a regular join with the `asof` keyword in front.

The pandas version assumes both fields have the same name, it separates the on and by conditions in a way that's not necessary in SQL version, and imo the "backwards" keyword is a bit ugly and error prone.


Yeah I can see the benefit of pandas for those reading the code. I think they learnt from kdb that both backwards and forwards was needed. Kdb had to create a new aj0 function. But your point is valid that kdb may be over optimising for first writing code.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: