I think the forEach issue is a bad example, and something that could (and arguably should) be handled by the native implementation. The reason they get faster execution here is that they break the spec.
A native implementation could have a single flag associated with the array recording whether it is sparse, and use the more efficient code path given here in the common case where it's non-sparse.
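To make the flag idea concrete, here is a rough user-land sketch. The function name `forEachDispatch` and the `isSparse` parameter are made up for illustration; a real engine would track that flag internally and would not expose it like this.

```javascript
// Hypothetical sketch only: `isSparse` stands in for a flag the engine
// would maintain itself whenever holes are created or filled.
function forEachDispatch(array, isSparse, fn, thisArg) {
  const len = array.length; // the spec reads the length once, up front

  if (!isSparse) {
    // Fast path: dense array, so no per-index existence check.
    for (let i = 0; i < len; i++) {
      fn.call(thisArg, array[i], i, array);
    }
    return;
  }

  // Slow path: skip holes, as the spec requires.
  for (let i = 0; i < len; i++) {
    if (i in array) {
      fn.call(thisArg, array[i], i, array);
    }
  }
}

const denseSeen = [];
forEachDispatch([1, 2, 3], false, (value) => denseSeen.push(value));

const sparseSeen = [];
forEachDispatch([1, , 3], true, (value, index) => sparseSeen.push(index));
```

One caveat: the callback can delete elements mid-iteration, so a real implementation would also need to notice when a dense array turns sparse partway through and fall back to the slow path.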
The point I took away from that one example was: The native functions have to deal with lots of edge cases, which causes them to be slower. By implementing similar functions which, per their spec, do not handle those edge cases, we can gain a significant performance improvement.
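A small sketch of the kind of edge case in question, assuming a simplified `fastForEach` (a hypothetical name, not any particular library's function) that loops over every index without checking for holes:

```javascript
// Hypothetical simplified forEach: no hole check, no thisArg.
function fastForEach(array, fn) {
  for (let i = 0; i < array.length; i++) {
    fn(array[i], i, array);
  }
}

const sparse = [1, , 3]; // index 1 is a hole, not an explicit undefined

const nativeVisited = [];
sparse.forEach((value, index) => nativeVisited.push(index));

const fastVisited = [];
fastForEach(sparse, (value, index) => fastVisited.push(index));

console.log(nativeVisited); // [ 0, 2 ]    (native skips the hole)
console.log(fastVisited);   // [ 0, 1, 2 ] (simplified version visits it)
```

On dense arrays the two behave the same, which is why skipping the check only matters if your code actually relies on sparse arrays.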