It's the "god" part of "god king" that was the delusion, and all of the wasted effort that went into ensuring the Pharoah's resurrection and immortality after death. And yes, it's a delusion regardless of how many people believe in it.
That's all well and good until a really bad drought or a plague blows through and people start to wonder if maybe, just maybe, the inbred jackass on the golden throne doesn't control the weather after all.
Except that Egyptian society was quite stable for 3,000 years. Can you imagine the USA existing for 3,000 years? Will there ever be another human civilization that lasts as long as the ancient Egyptian civilization?
My understanding of Egyptian chronology is that Egypt was far from stable for 3000 years. In fact, Ancient Egypt is broken up into the Old, Middle, and New Kingdom periods, separated by "intermediate periods" of a few centuries. Even then, it's generally reckoned around 2500 years from the beginning of the Old Kingdom to the incorporation by the Persian Empire.
But, even during the intermediate periods, the invaders became the pharaohs and kept the old time religion going.
Imagine back when Europe was under the thumb of the Roman Catholic church, but then it went on pretty much the same for 3,000 years. There would be some hiccups along the way, but for the normal peasant, it would pretty much be the same old same old from millennium to millennium.
Isn't the Catholic church becoming the Roman Catholic church, or vice versa, sort of the same thing? Even with the Protestant split it's still essentially the same core, and looking back 5,000 years from now it would probably be reasonable to glue it all together as the Ancient Roman-Christian period / civilization.
Imagine that today there were not all these different European governments, but just the Catholic Church controlling all of the different governments, which are all really branches of the Catholic Church. Their kings are determined by the Catholic Church. All of them are under the Pope. Their laws have to be approved by the Catholic Church. Everyone is Catholic. Catholic bishops are more powerful than any king. Etc. And that is the way it is and continues for 3,000 years.
Building a big thing because you think it would be neat is not really a waste. It's a big thing, everyone knows it's not strictly necessary, but whether it's for the glorification of your nation or your people or your God doesn't matter so much. If people don't have gods they build big monuments to other stuff.
If we all have the delusion that you can fly with the power of your mind, that is still a delusion, because one can perform an experiment and see that you in fact can't fly with the power of your mind.
But if we all believe that you are the Easter Bunny, or the coolest dude on the planet, or the twice-crowned poet laureate, those are social constructs. We believe you are the Easter Bunny and that makes you the Easter Bunny, and that's no longer a delusion.
I think your hang-up is that you have a set of expectations you think a "god" should fulfill, and clearly the pharaoh did not fulfill them. And that is an objective fact. But there is no reason to expect that the ancient Egyptians shared your expectations about what a god is.
> ensuring the Pharaoh's resurrection and immortality after death
That does not sound correct. I don't think they believed that the Pharaoh would walk again after he died, which is what the word "resurrection" would imply. Their belief was that there is some form of afterlife where you need to perform certain rituals. The pyramids and the treasures were there to aid the pharaoh in performing those rituals so he could obtain a better position in the afterlife.
> I don't think they believed that the Pharaoh would walk again after he died.
Don't count on it. At times, they believed everybody would.
It's complicated because concepts varied over time, and people had maybe five or eight souls (alright, soul-aspects) and there were two or three thousand years over which this changed (sometimes for ideological reasons).
> one form of the ba that comes into existence after death is corporeal—eating, drinking and copulating.
> The idea of a purely immaterial existence was so foreign to Egyptian thought ...
> the ba of the deceased is depicted in the Book of the Dead returning to the mummy and participating in life outside the tomb in non-corporeal form ...
I was under the impression that the Greeks leaned into a practice that was already well-established as part of rulership in the region. At the very least it seems we have evidence that Tutankhamun's parents were brother-sister and he appears to have had some severe abnormalities as a result:
> The results of the DNA analyses show that Tutankhamun was, beyond doubt, the child born from a first-degree brother-sister relationship between Akhenaten and Akhenaten’s sister (see Fig. 3). ... Pharaoh Tutankhamun suffered from congenital equinovarus deformity (also called ‘clubfoot’). The tomography scans of Tutankhamun’s mummy also revealed that the Pharaoh had a bone necrosis for quite a long time, which might have caused a walking disability. This was supported by the objects found next to his mummy. Did you know that 130 sticks and staves were found in its tomb?
And then we have Cleopatra, the last Ptolemy, and she seems normal. Even the famously inbred Charles Habsburg had a relatively normal sister. Nature really plays dice sometimes.
https://acoup.blog/2023/05/26/collections-on-the-reign-of-cl... gives a good overview of Cleopatra's parents as we understand them. Note the two different family trees: the official one, which is so inbred it's hard to believe her parents could have been viable, and the unofficial one, which recognizes that nobles often slept around, so we really have no clue.
"I tell you, Winston, that reality is not external. Reality exists in the human mind, and nowhere else. Not in the individual mind, which can make mistakes, and in any case soon perishes: only in the mind of the Party, which is collective and immortal. Whatever the Party holds to be the truth, is truth. It is impossible to see reality except by looking through the eyes of the Party."
We like to think everyone was dumb, but I'm pretty sure that if those dudes could build pyramids, a lot of them also knew the Pharaoh wasn't a god even if lots of people believed. Same as today with religions or cult-of-personality leaders.
Then there's the royal family of the UK, or GB, or whatever the proper name is, which claims the same divine touch. You may call them ancient if you want, but they still get to make headlines.
You are missing the point here: while you might see a similar concept in the "divine right of kings", the lived experience of modern times was a lot different from anything BCE. Saying that there are similar social mechanics might be more appropriate.
"Lived experience" usually means first hand knowledge and experience, as opposed to the knowledge or information they would gain from external sources.
So, understanding this meaning, I hope it's quite obvious that lived experience is much different for people today than for ancient people. Our technology is far more advanced, and more information is available to us. And it is all influenced by the vast amount of information that is external to us, which puts our first-hand experience in different contexts.
All experience is necessarily firsthand. The word experience describes things that come in through the senses. Lived experience means something, but only if you buy into 20th century phenomenology.
re: changes. Yes, things have changed. The point of the discussion is that some people have asserted without argument that those differences lead to a fundamentally different concept of gods. I've seen no real reason to believe that, and yet people keep pointing out that things are different, as if differences in the world necessarily imply different experiences.
The Pharaoh wasn't a god; he was a ruler. I think they had the sun and other elements as gods. Kinda makes sense to praise the sun, as it makes their agriculture go.
The pharaohs were indeed worshipped as literal gods. Akhenaten famously negated them all except for the sun, with himself as its incarnation, but after his death everything was restored to the normal system of polytheistic theocracy. The sun god Ra was still important, but not the most important. It was a complicated system and very different from our modern thinking.
There is an implication here that the Fortran implementation of `SGEMM` is somehow inadequate. But any modern Fortran compiler will quite easily apply the AVX and FMA optimizations presented here without any additional changes. Both GNU and Intel make these substitutions with the correct flags.
The unrolling optimization is also just another flag away (`-funroll-all-loops`). The Intel Compiler will even do this without prompting. In fact, it appears to only do a modest 2x unroll on my machine, suggesting that the extreme unroll in this article would have been overkill.
Parallelization is certainly a lot to ask of Fortran 77 source, but there is little stopping you from adding OpenMP directives to the `SGEMM` function (see the sketch below). In fact, modern Fortran even offers its own parallelization constructs if you're willing to go there.
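To make that concrete, here is a minimal sketch of the idea, using the j/l/i loop nest from `SGEMM`'s no-transpose branch rewritten in free form. The subroutine name and directive placement are mine, not anything from the reference sources:

```fortran
! Sketch only (not from reference BLAS): OpenMP dropped onto the
! no-transpose loop nest of SGEMM, i.e. C := alpha*A*B + C after the
! beta scaling has been applied. Iterations over j write disjoint
! columns of C, so parallelizing the outer loop is safe.
subroutine sgemm_omp(m, n, k, alpha, a, b, c)
  implicit none
  integer, intent(in) :: m, n, k
  real, intent(in) :: alpha, a(m, k), b(k, n)
  real, intent(inout) :: c(m, n)
  real :: temp
  integer :: i, j, l
  !$omp parallel do private(i, l, temp)
  do j = 1, n
     do l = 1, k
        temp = alpha * b(l, j)
        do i = 1, m
           c(i, j) = c(i, j) + temp * a(i, l)
        end do
     end do
  end do
  !$omp end parallel do
end subroutine sgemm_omp
```

Compile with `-fopenmp` (gfortran) or `-qopenmp` (ifx) and the directive takes effect; without those flags it is an ordinary comment.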
Which is to say: Let's not belittle this old Fortran 77 function. Yes it is old, and does not even resemble modern Fortran. But the whole point of Fortran is to free the developer from these platform-specific details, and hand the job off to the compiler. If you don't like that approach, then you're welcome to go to C or C++. But this little block of Fortran code is already capable of doing just about everything in this article.
The Fortran implementation is just a reference implementation. The goal of reference BLAS [0] is to provide relatively simple and easy to understand implementations which demonstrate the interface and are intended to give correct results to test against. Perhaps an exceptional Fortran compiler which doesn't yet exist could generate code which rivals hand (or automatically) tuned optimized BLAS libraries like OpenBLAS [1], MKL [2], ATLAS [3], and those based on BLIS [4], but in practice this is not observed.
Justine observed that the threading model for LLaMA makes it impractical to integrate one of these optimized BLAS libraries, so she wrote her own hand-tuned implementations following the same principles they use.
Fair enough, this is not meant to be some endorsement of the standard Fortran BLAS implementations over the optimized versions cited above. Only that the mainstream compilers cited above appear capable of applying these optimizations to the standard BLAS Fortran without any additional effort.
I am basing these comments on quick inspection of the assembly output. Timings would be equally interesting to compare at each stage, but I'm only willing to go so far for a Hacker News comment. So all I will say is perhaps let's keep an open mind about the capability of simple Fortran code.
Check out The Science of Programming Matrix Computations by Robert A. van de Geijn and Enrique S. Quintana-Ortí. Chapter 5 walks through how to write an optimized GEMM. It involves clever use of block multiplication, choosing block sizes for optimal cache behavior on specific chips; a rough sketch of the blocking idea follows. Modern compilers just aren't able to do such things now. I've spent a little time debugging things in scipy.linalg by swapping out OpenBLAS with reference BLAS and have found the slowdown from using reference BLAS is typically at least an order of magnitude.
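For a flavor of what the book describes, here is a hedged sketch of cache blocking for C := A*B + C. The routine name and block sizes are placeholders of mine; real libraries additionally pack blocks into contiguous buffers and dispatch to vectorized microkernels, none of which is shown here:

```fortran
! Illustrative only: a cache-blocked C := A*B + C.
! Block sizes are placeholder guesses, not tuned values.
subroutine gemm_blocked(m, n, k, a, b, c)
  implicit none
  integer, intent(in) :: m, n, k
  real, intent(in) :: a(m, k), b(k, n)
  real, intent(inout) :: c(m, n)
  integer, parameter :: mb = 96, nb = 96, kb = 96
  integer :: i, j, p, ii, jj, pp
  do jj = 1, n, nb
     do pp = 1, k, kb
        do ii = 1, m, mb
           ! One block multiply; the operands now fit in cache.
           do j = jj, min(jj + nb - 1, n)
              do p = pp, min(pp + kb - 1, k)
                 do i = ii, min(ii + mb - 1, m)
                    c(i, j) = c(i, j) + a(i, p) * b(p, j)
                 end do
              end do
           end do
        end do
     end do
  end do
end subroutine gemm_blocked
```

The payoff is that each block is reused from cache many times instead of being streamed from memory once per use; the remaining gap to libraries like OpenBLAS lives in the packing and microkernel details.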
You are right. I just tested this out, and my speed going from reference BLAS to OpenBLAS went from 6 GFLOP/s to 150 GFLOP/s. I can only imagine what BLIS and MKL would give. I apologize for my ignorance. Apparently my faith in the compilers was wildly misplaced.
No, you can still trust compilers: 1) The hand-tuned BLAS routines are essentially a different algorithm with hard-coded information. 2) The default OpenBLAS build uses OpenMP parallelism, so much of the speed likely originates from multithreading. Set the OMP_NUM_THREADS environment variable to 1 before running your benchmarks. You will still see a significant performance difference due to a few factors, such as the extra hard-coded information in the OpenBLAS implementation.
I ran with OMP_NUM_THREADS=1, but your point is well taken.
As for the original post, I felt a bit embarrassed about my original comments, but I think the compilers actually did fairly well based on what they were given, which I think is what you are saying in your first part.
Using AVX/FMA and unrolling loops does extremely little in the way of compiling to fast (>80% of peak) GEMM code. These are very much intro steps that don't take into account many important ideas related to the cache hierarchy, uop interactions, and even instruction decode time. The Fortran implementation is entirely and unquestionably inadequate for real high-performance GEMMs.
I just did a test of Intel-compiled reference BLAS against OpenBLAS, and it was about 6 GFLOP/s vs 150 GFLOP/s, so I must admit that I was wrong here. Maybe in some sense 4% is not bad, but it's certainly not good. My faith in current compilers has certainly been shattered quite a bit today.
Anyway, I have come to eat crow. Thank you for your insight and helping me to get a much better perspective on this problem. I mostly work with scalar and vector updates, and do not work with matrices very often.
The inequality between matrix multiplication implementations is enormous. It gets even more extreme on GPU, where I've seen the difference between naïve and cuBLAS going as high as 1000x. Possibly 10000x. I have a lot of faith in myself as an optimization person to be able to beat compilers. I can even beat MKL and hipBLAS if I focus on specific shapes and sizes. But trying to beat cuBLAS at anything makes me feel like Saddam Hussein when they pulled him out of that bunker.
BLIS does that in their kernels. I've tried doing that but was never able to get something better than half as good as MKL. The BLIS technique of tiling across k also requires atomics or an array of locks to write output.
I don't disagree, but where are those techniques presented in the article? It seems like she exploits the particular shape of her matrix to align better with cache. No BLAS library is going to figure that out.
I am not trying to say that a simple 50+ year old matrix solver is somehow competitive with existing BLAS libraries. But I disagreed with its portrayal in the article, which associated the block with NumPy performance. Give that to a 2024 Fortran compiler, and it's going to get enough right to produce reasonable machine code.
Modern Fortran's only parallel feature is coarrays, which operate at the whole program level.
DO CONCURRENT is a serial construct with an unspecified order of iterations, not a parallel construct. A DO CONCURRENT loop imposes requirements that allow an arbitrary order of iterations but which are not sufficient for safe parallelization.
I have only looked at the GCC Fortran frontend. I would not say it is necessarily difficult but rather very ad-hoc and disconnected from the rest of gcc. I don't see many tools for specifying a grammar or tokens. Instead, there are a lot of constructs custom-written to handle Fortran. Looking around, the Go and Rust frontends feel equally disconnected.
I'm told that you eventually make your way to the "GCC IR", but it seems like you are largely on your own with how to get there.
I did not invest a lot of time here, so I could be wildly off base, but thought I'd at least try to give a very-slightly-hands-on perspective.
The motivation to "implement" physics in code is that you can't "cheat." You have to spell out every step in a formal way. The motivating example in SICM is that the usual way the Euler-Lagrange equations are written ... doesn't make sense.
The authors explain: "Classical mechanics is deceptively simple. It is surprisingly easy to get the right answer with fallacious reasoning or without real understanding. Traditional mathematical notation contributes to this problem. Symbols have ambiguous meanings that depend on context, and often even change within a given context."
And why not just "code" but "functional code"? Well, it makes a lot more sense to "take a derivative of a function" if that function doesn't have side effects (etc). There is a tighter correspondence between functions in the programming sense and in the mathematical sense.
I don't think the MIT guys have the same motivations as the author of this book. He (Walck) discusses the suitability of (a subset of) Haskell in this article: https://arxiv.org/abs/1412.4880
Maybe someone else can shed light on the MIT mindset. Certainly some of Walck's points apply to Scheme as much as to Haskell, but Scheme lacks the type system, syntax and syntactical "convenience" of curried functions. The basic strength of functional programming is the lack of complex imperative book-keeping: your code looks more like math.
My impression is that SICP and SICM are eccentric.
Yes, and that's like arguing that spaces between words are a syntactic distraction. They're clearly not; more syntax rules can make a language simpler to understand (for both humans and computers).
A very smart CS guy I know pitched functional programming for scientific computing: he said it would greatly speed up codes by not spending time computing results that weren't going to be used.
Although that's not a terrible idea, I have never actually seen any major scientific code that was based on functional programming and was significantly faster than its non-FP competitors. My guess is that the folks writing the codes are already pretty smart, aren't doing any extra work that could easily be removed, and already take advantage of algorithms in non-functional paradigms that give them significant speedups.
I've heard that before, usually from people with no experience in actual scientific computing. There's nothing wrong with using functional programming in scientific applications. I do. But I don't see how it's "specifically" good for scientific programming.
The thing about performance in scientific programming is that it is often binary: you either need the very best, or you don't care about it at all. Unlike other areas of programming, there is no middle ground. If you need your scientific code to be performant, then you need to squeeze every last bit of performance out of your hardware, which you can only do with something like Fortran or C. If you don't care about performance, then it doesn't matter. That's why Python is so popular.
Ideally I would love for something like F# to replace python in the scientific computing space, but the ecosystem is so much larger in python. That's what matters to most scientists.
Generally agree, but: the idea for FP in scientific computing would be for the FP-optimizing compiler to elide any computation that doesn't contribute to the final result.
The analogy I think of is tree traversal. A smart person can write an optimal tree traversal algorithm and make their program finish quickly whether or not the user requested that part of the algorithm's results, but FP can realize the program doesn't output the tree, so traversing it can be skipped. OK, that's not a great analogy, but the point is that in principle, FP optimization could find a cheaper way to produce the same exact values as a simulation written in a non-functional language.
How often are there competing implementations in scientific computing? Most of the time people are doing just enough to publish a paper, or maybe maintaining a single library that everyone uses. Few people have the inclination, and even fewer the funding, to "rewrite everything".
In finance, which has a lot of parallels with scientific computing but tends to end up with semi-secret, parallel, competing implementations of the same ideas, functional programming has had significant (though by no means universal) success in doing exactly what you describe.
Let's see. The two big codes I worked with, BLAST and AMBER, have competitors. For BLAST there is a long history of codes that attempted to do better, and I don't think anybody really succeeded, except possibly HMMER2. Both BLAST and HMMER2 had decades of effort poured into them. BLAST was rewritten a few times by its supporting agency (NCBI), and the author(s) of HMMER rewrote it to be HMMER2. I worked with the guy who wrote the leading competitor to HMMER; he was an independently wealthy computer programmer (with a physics background). In the case of AMBER, there are several competitors: GROMACS, NAMD, and a few others are all used frequently. AMBER has been continuously developed for decades (I first used it in '95 and already it was... venerable).
All the major players in these fields read each other's code and papers and steal ideas.
In other areas there are no competitors, there's just "write the minimal code to get your idea that contributes 0.01% more to scientific knowledge, publish, and then declare code bankruptcy". And a long tail of low to high quality stuff that lasts forever and turns out to be load-bearing but also completely inscrutable and unmodifiable.
After typing that out, I realize I just recapitulated what you said in your first paragraph. My knowledge of finance doesn't extend much beyond knowing that Jane Street Capital has been talking about FP for ages, and most of the people I've talked to say their work in finance (HPC mostly) is C++ or hardware-based.
Yes, a lot of things can be parallelised with OpenMP or MPI, just like in C/C++. These extensions and libraries are not core language features, though.
DO CONCURRENT is not a parallel construct. It is a serial loop with arbitrary ordering of iterations, and limitations on data accesses that were intended to ease parallelization but turned out to be incorrect.
Unfortunately, the necessary restrictions on data accesses to enable parallel execution are not required to hold true in the body of a DO CONCURRENT loop by its botched specification, and neither can they be verified at compilation time. And the committee has known about these problems for many years and has refused to fix them; Fortran 2023 still has them and the topic is not going to be brought up again for Fortran 2028.
So it is possible for a conforming program to be non-parallelizable, due to holes in the default data localization rules, despite the name of the construct and the obvious intent of the long list of restrictions imposed on code in the construct.
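To make the problem concrete, here is a sketch of the kind of conforming loop that defeats parallelization. This is a constructed example of mine, not one taken from the standard or the essay:

```fortran
! Sketch only: suppose the programmer knows that q(i) implies p(i),
! so t is always defined before it is referenced within an iteration
! and the loop conforms to the DO CONCURRENT restrictions.
subroutine demo(n, a, b, p, q)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a(n)
  real, intent(inout) :: b(n)
  logical, intent(in) :: p(n), q(n)
  real :: t
  integer :: i
  do concurrent (i = 1:n)
     if (p(i)) t = 2.0 * a(i)  ! t defined in some iterations only
     if (q(i)) b(i) = t        ! valid only because q(i) implies p(i)
  end do
end subroutine demo
```

The compiler cannot verify that implication, so it cannot tell whether `t` may safely be given a per-iteration copy or whether the loop has to run serially. Fortran 2018's `LOCAL(t)` locality specifier addresses this particular case, but only when the programmer writes it.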
It's an excellent essay, and the Fortran community owes you a debt of gratitude for publicizing these issues. But surely some loops can be safely parallelized if the iterations do not interact, e.g. per-element array arithmetic, and a compiler ought to be able to safely identify such arithmetic.
Also, isn't your employer promoting do concurrent as a method of GPU parallelization? Has this been controversial within Nvidia?
It was inspired by him, a sort of philosophically oriented professor of religious history, attending an interdisciplinary conference on game theory in which he was the token philosopher.
But his book is more about play, novelty, authenticity, open-endedness, paradox and "life in general".
It emphasizes always keeping in mind the nesting of competition and cooperation, and that one should always leave open the possibility of tweaking the rules to keep the game in play (reminds me of fallibilism).
I think he was influenced by Nicholas of Cusa's On Learned Ignorance, hence the title of another Carse book, The Religious Case Against Belief:
> "Therefore, every inquiry proceeds by means of a comparative relation, whether an easy or a difficult one. Hence, the infinite, qua infinite, is unknown; for it escapes all comparative relation." — Nicholas of Cusa, De Docta Ignorantia (On Learned Ignorance)
Also Carse's book has footnotes but no bibliography!
I actually read it through the lens of AGI but that was not his intention. Because of that it pairs well with Kenneth Stanley and Joel Lehman's Myth of the Objective.
Welp, guess I'm out.