Ah, then I don’t see what’s wrong with “the number of ways in which the system can be non-uniform in temperature is much lower than the number of ways it can be uniform in temperature”. In equilibrium one doesn’t have a gradient of temperature because “…” indeed.
If you take "temperature" to mean "average kinetic energy of molecules" then it's fine. But that's sort of the same class of simplification as saying "entropy is the amount of disorder".
I don't follow you. Whatever you take temperature to mean, for an isolated system in equilibrium that intensive thermodynamic property will have the same value everywhere and the entropy of the system will thus be maximized given the constraints.
If you put two subsystems at different temperatures in thermal contact, the combined system will be in equilibrium only once the cold one has warmed up and the hot one has cooled down to a common temperature. The increase in the entropy of the cold subsystem is larger than the decrease in the entropy of the hot one (because ΔQ/T1 > ΔQ/T2 when T1 < T2, with T1 the cold temperature and T2 the hot one) and the total entropy increases.
No kinetic energies of molecules are involved in that phenomenological description of heat flowing from hot to cold.
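To put numbers on the two-subsystem argument, here is a rough sketch in Python. It uses only the phenomenological relation dS = dQ/T. The equal, constant heat capacity `c`, the example temperatures (300 and 400, in kelvin) and the function names are my own assumptions for illustration, not anything stated above.

```python
import math

def entropy_change_small_transfer(dq, t_cold, t_hot):
    """Total entropy change when a small amount of heat dq flows
    from the hot subsystem (t_hot) to the cold one (t_cold)."""
    # The cold side gains dq/t_cold, the hot side loses dq/t_hot.
    return dq / t_cold - dq / t_hot

def equilibrate(t_cold, t_hot, c=1.0):
    """Full equilibration of two subsystems.

    Assumption (not from the thread): both subsystems have the same
    constant heat capacity c, so the final temperature is the mean.
    """
    t_final = 0.5 * (t_cold + t_hot)
    # Integrating dS = c dT / T for each subsystem separately.
    ds_cold = c * math.log(t_final / t_cold)  # positive: it warms up
    ds_hot = c * math.log(t_final / t_hot)    # negative: it cools down
    return t_final, ds_cold, ds_hot

t_final, ds_cold, ds_hot = equilibrate(300.0, 400.0)
print("final temperature:", t_final)
print("entropy change, cold subsystem:", ds_cold)
print("entropy change, hot subsystem:", ds_hot)
print("total entropy change:", ds_cold + ds_hot)
```

Under those assumptions the cold subsystem's gain exceeds the hot subsystem's loss, so the total comes out positive, and nowhere does the calculation refer to molecules.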