Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I can remember the columnar approach used in Fortran back in the day. Fortran did not have records, only scalar arrays, so a natural way to represent a bunch of "objects" was to have a bunch of arrays for each property value.

IDK if such prior art is any helpful today, though :)



This approach is still used (conceptually, at least) in statistical computing. NumPy and R are good examples of this approach, and also have solutions for the indexing problems outlined in the article.

That being said, statistical computing is at the mercy of the high cost of matrix multiplication.

I think that you can alter the order of arrays in numpy, and examine the performance difference.


I think the desired functionality is closer to Pandas than Numpy — you want coordinated arrays of different data types, not higher-dimension arrays of the same data type.

R and Pandas aren’t exactly known for their outstanding native performance (they’re generally fast after converting dataframes to arrays and outsourcing to C or Fortran) so they’re great demonstrations of the syntax for some of these indexing problems but leave a lot to be desired.


ROOT TTrees will "split" class members to effectively transform them into AoS although this is mostly for IO efficiency.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: