Hacker News | matu3ba's comments

1. Does this mean that Sledgehammer and CoqHammer offer concolic testing based on an input framework (say, some formalization of a computing/math system) for some sort of system execution/evaluation, or does this only work for hand-rolled systems/mathematical expressions?

Sorry for my probably senseless questions, as I'm trying to map the computing model of math solvers to common PL semantics. Probably there is better overview literature; I'd like to get an overview of proof system runtime semantics for later usage.

2. Is there an equivalent of fuzz testing (of computing systems) in math, say to construct the general proof framework?

3. Or how are proof frameworks (based on ideas of how the proof could work) constructed?

4. Do I understand it correctly that math in proof systems works with term rewrite systems plus the used theory/logic as the computing model of valid representations and operations? How is the step semantics then formally defined?


These questions are hard to understand.

1. Yes, the Sledgehammer suite contains three testing systems that I believe are concolic (quickcheck, nitpick, nunchaku), but they only work due to hand-coded support for the mathematical constructs in question. They'd be really inefficient for a formalised software environment, because they'd be operating at a much lower level than the abstractions of the software environment, unless dedicated support for that software environment were provided.

2. Quickcheck is a fuzz-testing framework, but it doesn't help to construct anything (except as far as it constructs examples / counterexamples). Are you thinking of something that automatically finds and defines intermediary lemmas for arbitrary areas of mathematics? Because I'm not aware of any particular work in that direction: if computers could do that, there'd be little need for mathematicians.

3. By thinking really hard, then writing down the mathematics. Same way you write computer programs, really, except there's a lot more architectural work. (Most of the time, the computer can brute-force a proof for you, so you need only choose appropriate intermediary lemmas.)

4. I can't parse the question, but I suspect you're thinking of the meta-logic / object-logic distinction. The actual steps in the term rewriting are not represented in the object logic: the meta-logic simply asserts that it's valid to perform these steps. (Not even that: it just does them, accountable to none other than itself.) Isabelle's meta-logic is software, written in a programming language called ML.


Sorry, it's 4am and I should sleep, but got very interested. Thank you very much for the excellent overview. This explains all my current questions.


1. Sledgehammer/CoqHammer are automated proof-search/assistant tools that bridge interactive proof assistants (Isabelle/Coq) to external ATP/SMT provers; they aren't about concolic testing in a PL sense. They translate goals and context to formats the external provers understand, call those provers, and replay/translate returned proofs or proof fragments back into the ITP. That's search and translation of a complete proof, not running concrete+symbolic program executions like concolic testing.

2. there is no exact analogue of fuzz-testing bytecode for "theorem fuzzing", but arguably the closest match is counterexample generators: model finders, finite-model search, and SMT instantiation with concrete valuations all serve a role similar to fuzzers by finding invalid conjectures quickly.

these are weaker than fuzzing for finding execution bugs because mathematical statements are higher-level and provers operate symbolically.
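
To make the analogy concrete, here is a minimal hand-rolled sketch in Python of the fuzzer-like role such tools play; the conjecture and helper names are made up for illustration, and real counterexample generators (Quickcheck, Nitpick, model finders) search far more systematically:

    import random

    # Made-up conjecture: "(a - b) + b == a for all non-negative integers a, b",
    # which is false if subtraction saturates at zero.
    def saturating_sub(a, b):
        return max(a - b, 0)

    def conjecture(a, b):
        return saturating_sub(a, b) + b == a

    def find_counterexample(prop, trials=10_000, bound=1_000):
        """Randomly sample inputs looking for a falsifying assignment (fuzzer-style)."""
        for _ in range(trials):
            a, b = random.randint(0, bound), random.randint(0, bound)
            if not prop(a, b):
                return (a, b)
        return None  # nothing found: the conjecture *might* hold, but this is no proof

    print(find_counterexample(conjecture))  # e.g. (3, 7): saturating_sub(3, 7) + 7 == 7 != 3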

3. here's how proof frameworks are constructed, at a high level (a minimal kernel sketch follows after the list):

a. start by picking a logical foundation: e.g., first-order logic (FOL), higher-order logic (HOL), dependent type theory (Martin-Löf/CIC).

b. define syntax (terms, types, formulas) and typing/judgment rules (inference rules or typing rules).

c. define proof objects or proof rules: natural deduction, sequent calculus, Hilbert-style axioms, or proofs-as-programs (Curry-Howard).

d. pick or design the kernel: a small, trusted inference engine that checks proof objects (reduction rules, conversion, rule admissibility).

e. add automation layers: tactics, decision procedures, external ATP/SMT, rewriting engines.

f. provide libraries of axioms/definitions and extraction/interpretation mechanisms (code extraction, models).

g. implement proof search strategies and heuristics (backtracking, lemma databases, proof-producing solvers).

this is the standard engineering pipeline behind modern ITPs and automated systems.
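
As a toy illustration of steps c-e, here is a minimal LCF-style kernel sketch in Python; the tiny propositional object language and all names are invented for illustration, and real kernels (Isabelle, Coq, HOL Light) are of course far richer. The point is only that Thm values can be produced exclusively by the trusted rule functions, and every automation layer above has to go through them:

    from dataclasses import dataclass

    # --- object language: a tiny propositional logic (illustrative only) ---
    @dataclass(frozen=True)
    class Atom:
        name: str

    @dataclass(frozen=True)
    class Implies:
        lhs: object
        rhs: object

    # --- trusted kernel: Thm values should only be built by the rules below ---
    @dataclass(frozen=True)
    class Thm:
        prop: object  # the proposition this theorem asserts

    def axiom(p):
        """Accept p as an axiom/assumption (real kernels keep this set minimal)."""
        return Thm(p)

    def modus_ponens(thm_ab, thm_a):
        """From theorems A -> B and A, conclude B; reject anything that doesn't match."""
        if isinstance(thm_ab.prop, Implies) and thm_ab.prop.lhs == thm_a.prop:
            return Thm(thm_ab.prop.rhs)
        raise ValueError("modus ponens does not apply")

    # --- usage: tactics/hammers would only ever call the kernel rules ---
    A, B = Atom("A"), Atom("B")
    print(modus_ponens(axiom(Implies(A, B)), axiom(A)))  # Thm(prop=Atom(name='B'))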

4. yes, many proof assistants and automated provers treat computation inside proofs as:

a. term rewriting / reduction (beta-reduction, delta-unfolding, normalization) for computation-oriented parts; this is the "computation" layer.

b. a separate deductive/logic layer for reasoning (inference rules, quantifier instantiation, congruence closure).

the combined model is: terms represent data/computation; rewriting gives deterministic computation semantics; logical rules govern valid inference about those terms. Dependent type theories conflate computation and proof via conversion (definitional equality) in the kernel.
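
As a toy instance of that conflation (standard textbook material, not specific to any one system): the computation layer performs reductions such as beta-reduction, and the kernel's conversion rule lets the logical layer treat definitionally equal types interchangeably:

    % computation layer: beta-reduction as deterministic evaluation
    (\lambda x.\; x + 1)\ 2 \;\to_\beta\; 2 + 1 \;\to\; 3

    % logical layer: the conversion rule, where \equiv is definitional equality
    % (the equivalence generated by the reductions above)
    \frac{\Gamma \vdash t : T \qquad T \equiv T'}{\Gamma \vdash t : T'}\;(\mathrm{conv})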

5. here's how a proof step's semantics is defined:

proof steps are applications of inference rules transforming sequents/judgments. formally:

a. a judgment form J (e.g., Γ ⊢ t : T or Γ ⊢ φ) is defined.

b. inference rules are of the form: from premises J1,...,Jn infer J, written as (J1 ... Jn) / J.

c. a proof is a finite tree whose nodes are judgments; leaves are axioms or assumptions; each internal node follows an inference rule.

d. for systems with computation, a reduction relation → on terms defines definitional equality; many rules use conversion: if t →* t' and t' has the form required by the rule, the rule applies.

e. in type-theoretic kernels, the step semantics is checking that a proof object reduces/normalizes and that constructors/eliminators are used respecting typing/conversion; the kernel checks a small set of primitive steps (e.g., beta, iota reductions, rule applications).

operationally: a single proof step = instantiate a rule schema with substitutions for metavariables, perform any required reductions/conversions, and check side conditions (freshness, well-formedness).
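
To make (a)-(c) and that operational reading concrete (a standard natural-deduction example, not tied to any particular assistant): a rule schema with metavariables φ, ψ, and one proof step obtained by instantiating it with a substitution σ:

    % rule schema (\varphi, \psi are metavariables)
    \frac{\Gamma \vdash \varphi \rightarrow \psi \qquad \Gamma \vdash \varphi}{\Gamma \vdash \psi}\;(\rightarrow E)

    % one proof step: instantiate with \sigma = [\varphi := A,\ \psi := B \wedge C];
    % there are no side conditions to check for this rule
    \frac{\Gamma \vdash A \rightarrow (B \wedge C) \qquad \Gamma \vdash A}{\Gamma \vdash B \wedge C}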

equational reasoning and rewriting: a rewrite rule l → r can be applied to a term t at a position p if the subterm at p matches l under some substitution σ; the result is t with that subterm replaced by σ(r). Confluence/termination properties determine global behavior.
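
A minimal Python sketch of exactly that mechanism, with terms as nested tuples and positions as paths of child indices (the encoding and names are made up; confluence/termination are ignored):

    # Terms: a variable is a string starting with '?', anything else is a
    # constant or a function application written as a tuple ('f', arg1, ..., argN).
    def is_var(t):
        return isinstance(t, str) and t.startswith('?')

    def match(pattern, term, subst=None):
        """Match `pattern` against `term`, returning an extended substitution or None."""
        subst = dict(subst or {})
        if is_var(pattern):
            if pattern in subst:
                return subst if subst[pattern] == term else None
            subst[pattern] = term
            return subst
        if isinstance(pattern, str) or isinstance(term, str):
            return subst if pattern == term else None
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst

    def apply_subst(term, subst):
        if is_var(term):
            return subst.get(term, term)
        if isinstance(term, str):
            return term
        return (term[0],) + tuple(apply_subst(a, subst) for a in term[1:])

    def rewrite_at(term, pos, lhs, rhs):
        """Apply rule lhs -> rhs at position `pos` (a path of child indices), if it matches."""
        if not pos:
            subst = match(lhs, term)
            return apply_subst(rhs, subst) if subst is not None else None
        i = pos[0]
        new_child = rewrite_at(term[i], pos[1:], lhs, rhs)
        return None if new_child is None else term[:i] + (new_child,) + term[i + 1:]

    # Rule plus(0, ?x) -> ?x applied at position [2] inside mul(a, plus(0, b)):
    term = ('mul', 'a', ('plus', '0', 'b'))
    print(rewrite_at(term, [2], ('plus', '0', '?x'), '?x'))  # ('mul', 'a', 'b')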

higher-level tactics encode sequences of such primitive steps (rule applications + rewrites + searches) but are not part of kernel semantics; only the kernel rules determine validity.

relevant concise implications for mapping to PL semantics:

treat proof state as a machine state (context Γ, goal G, local proof term), tactics as programs that transform state; kernel inference rules are the atomic instruction set; rewriting/normalization are deterministic evaluation semantics used inside checks.

automation ≈ search processes over nondeterministic tactic/program choices; model finders/SMTs act as external oracles producing either counterexamples (concrete) or proof certificates (symbolic).
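
In that reading, a goal-directed prover has roughly this shape; a deliberately naive Python sketch with invented names, where the proof state is data, a tactic maps a state to zero or more successor states, and search just explores that nondeterministic transition system:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ProofState:
        context: tuple   # hypotheses Γ
        goals: tuple     # remaining goals to prove

    # A tactic maps a proof state to a list of successor states (empty list = failure).
    def assumption(state):
        """Close the first goal if it literally appears among the hypotheses."""
        if state.goals and state.goals[0] in state.context:
            return [ProofState(state.context, state.goals[1:])]
        return []

    def search(state, tactics, depth=5):
        """Naive depth-first exploration over nondeterministic tactic choices."""
        if not state.goals:
            return True              # all goals closed: "proof found"
        if depth == 0:
            return False
        return any(search(s, tactics, depth - 1)
                   for t in tactics for s in t(state))

    print(search(ProofState(context=("A", "B"), goals=("B", "A")), [assumption]))  # True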


> we could design something faster, safer, overall simpler, and easier to program

I do remain doubtful about this for general-purpose computing principles: hardware for low latency/high throughput is at odds with full security (absence of observable side-channels). Optimal latency/throughput requires time-constrained, i.e. hardware-level, programming with FPGAs or building custom hardware (high cost), usually programmed on dedicated hardware/software or via things like system-bypass solutions. Simplicity is at odds with generality, see weak/strong formal systems vs strong/weak semantics.

If you factor those compromises in, then you'll end up with the current state plus historical mistakes like the missing vertical system integration of software stacks above kernel space as TCB, bad APIs due to missing formalization, CHERI with its current shortcomings, etc.

I do expect things to change once security with a mandatory security processor becomes more of a requirement, leading to multi-CPU solutions and the potential for developers to use both complex and simple CPUs on the same system, meaning roughly time-accurate virtual and/or real ones.

> The second is that there isn’t strong demand.

This is not true for virtualization and security use cases, but it is not that obvious yet due to the lack of widespread attacks; see side-channel leaks of cloud solutions. Take a look at the growth of hardware security module vendors.


CHERI on its own does not fix many of the side-channels, which would need something like "BLACKOUT: Data-Oblivious Computation with Blinded Capabilities", but as I understand it, there is no consensus/infrastructure on how to do efficient capability revocation (potentially in hardware); see https://lwn.net/Articles/1039395/.

On top of that, as I understand it, CHERI has no widespread concept of how to allow disabling/separation of workloads for ultra-low-latency/high-throughput applications in practical mixed-criticality systems. The only system I'm aware of with practical timing guarantees that allows virtualization is seL4, but again there are no practical guides with trade-offs in numbers yet.


The blog looks nice, especially having simple-to-understand numbers. To me the memory subsystem articles are missing the spicier pieces like platform semantics, barriers, and de-virtualization (the latter is discussed in an article separate from the series). In the other articles I'd also expect debugging format trade-offs (DWARF vs ORC vs alternatives), virtualization performance and relocation effects to be briefly discussed, but could not find them. There are a few C++ articles missing: 1. cache-friendly structures in C++, because the standard std::map etc. are unfortunately not written to be cache-friendly (only std::vector and std::deque<T> with high enough block_size), ideally with performance numbers; 2. what to use for destructive moves or how to roll your own (they did not make it into C++26).


> I still think sans-io at the language level might be the future, but this isn't a complete solution. Maybe we should be simply compiling all fns to state machines (with the Rust polling implementation detail, a sans-io interface could be used to make such functions trivially sync - just do the syscall and return a completed future).

Can you be more specific about what is missing, i.e. why sans-io with an explicit state machine would not be a complete solution for static and dynamic analysis? Serializing the state machine sounds excellent for static and dynamic analysis. I'd guess the debugging infrastructure for optimization passes and run-time debugging is missing, or is there more?


Exactly the caveat that they themselves disclose: some scenarios are too dynamic for static analysis.


What would be saner alternatives, once it becomes obvious that any timing side-effect is a potential attack vector? See https://www.hertzbleed.com/ for frequency side channels. I only see dedicated security cores with fast data lanes to the CPU as an option, similar to what Apple is doing with the Secure Enclave, or do you have better suggestions that still allow performance and power savings?


A design such that it would actually make sense for a compiler to mark code that should permit data-dependent CPU optimizations differently from code that should not.

This could be done using an opcode prefix, which would bloat code but would work perfectly. Or it could use an RFLAGS bit or a bit in MXCSR or a new register, etc.

Almost anything would be better than an MSR that is only accessible to privileged code.


> Or it could use an RFLAGS bit or a bit in MXCSR or a new register, etc.

> Almost anything would be better than an MSR that is only accessible to privileged code.

ARM does that: their flag (DIT) is accessible by non-privileged code. If you know the architecture has that flag, either because your -march= is recent enough or because the operating system told you so through the hwcaps or the emulated id registers, you can use it freely without needing to switch to privileged mode through a syscall.


The one reason I can imagine to make it privileged-only is that it could be high-overhead to switch: if a CPU conditioned various kinds of predictors on it, it might have to flush those predictors when toggling it.


IMO the best design would be to keep the flag with the data. Give each register an extra bit indicating whether it’s sensitive. Any data-dependent-timing operation can’t possibly leak the data until the data is available to it, and that’s exactly when the ALU would find out that the data is sensitive anyway. No pipeline stalls.


Makes sense, but "give each register an extra bit" seems like something that would very easily get lost on a spill or in any number of other ways.


Sorry for necro-bumping, but there is a paper doing exactly that, besides various other things to eliminate timing channels, and claiming also to prevent attacks based on speculative execution etc.: "BLACKOUT: Data-Oblivious Computation with Blinded Capabilities" (https://arxiv.org/abs/2504.14654). They basically utilize another bit of CHERI for a "blinded capability" and add methods to mitigate the potential problems you identified.


My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as a transferable entity/medium. In other words, knowing how to manipulate the world directly and indirectly via deterministic actions and known inputs, and to teach others via various mediums. As an example, you can be very intelligent at software programming, but socially very dumb (for example unable to socially influence others).

As an example, if you do not understand another person (in language) and understand neither the person's work nor its influence, then you would have no assumption about the person's intelligence outside of your own context of how smart you assume humans to be.

ML/AI for text inputs is at best stochastic over context windows with language, or plain wrong, so it does not satisfy the definition. Well (formally) specified problems with smaller scope tend to work well, from what I've seen so far. The working ML/AI problems known to me are calibration/optimization problems.

What is your definition?


Forming deterministic actions is a sign of computation, not intelligence. Intelligence is probably (I guess) dependent on the nondeterministic actions.

Computation is when you query a machine that is on standby, doing nothing, and it computes a deterministic answer. Intelligence (or at least some sign of it) is when the machine queries you, the operator, of its own volition.


> Forming deterministic actions is a sign of computation, not intelligence.

What computations can process and formalize other computations as a transferable entity/medium, meaning to teach other computations via various mediums?

> Intelligence is probably (I guess) dependent on the nondeterministic actions.

I do agree, but I think intelligent actions should be deterministic, even if expressing non-deterministic behavior.

> Computation is when you query a machine that is on standby, doing nothing, and it computes a deterministic answer.

There are whole languages for stochastic programming https://en.wikipedia.org/wiki/Stochastic_programming to deterministically express non-deterministic behavior, so I think that is not true.

> Intelligence (or at least some sign of it) is when the machine queries you, the operator, of its own volition.

So you think the thing that holds more control/force to do arbitrary things as it sees fit is more intelligent? That sounds to me more like the definition of power, not intelligence.


> So you think the thing that holds more control/force to do arbitrary things as it sees fit is more intelligent? That sounds to me more like the definition of power, not intelligence.

I want to address this item. I am not thinking about control or about comparing something to something else. I think intelligence is having at least some/any voluntary thinking. A cat can't do math or write text, but it can think of its own volition and is therefore an intelligent being. A CPU running some externally predefined commands is not intelligent, yet.

I wonder whether LLMs can be a stepping stone to intelligence or not, but it is not clear to me.


I like the idea of voluntary thinking very much, but I have no idea how to properly formalize or define it.


> My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as a transferable entity/medium.

I don't think that's a good definition because many deterministic processes - including those at the core of important problems, such as those pertaining to the economy - are highly non-linear and we don't necessarily think that "more intelligence" is what's needed to simulate them better. I mean, we've proven that predicting certain things (even those that require nothing but deduction) requires more computational resources regardless of the algorithm used for the prediction. Formalising a process, i.e. inferring the rules from observation through induction, may also be dependent on available computational resources.

> What is your definition?

I don't have one except for "an overall quality of the mental processes humans present more than other animals".


> I mean, we've proven that predicting certain things (even those that require nothing but deduction) requires more computational resources regardless of the algorithm used for the prediction.

I do understand proofs as formalized deterministic actions for given inputs, and processing as the solving of various proofs.

> Formalising a process, i.e. inferring the rules from observation through induction, may also be dependent on available computational resources.

Induction is only one way to construct a process, and there are various informal processes (social norms etc.). It is true that the overall process depends on various things like available data points and resources.

> I don't have one except for "an overall quality of the mental processes humans present more than other animals".

How would you formalize the process of self-reflection and of believing in completely made-up stories, which is often used as an example of what distinguishes humans from animals? It is hard to make a clear distinction in language and math, since we mostly do not understand animal language and math or other well-observable behavior (based on that).


ML/AI is much less stochastic than an average human


Can you recommend a more factual and complete overview of Apple's security architecture and boot chain than this bug-ridden article? I'm interested in hardware security (models).


> Regulations are practically the only thing standing between the rich and the powerful and their incessant attempt to drive even more wealth into their own pockets at the expense of ordinary people's health, wealth, future, welfare, housing, etc.

Try to rethink how money is created and how money gets its value and how and by whom that wealth is distributed. Regulation as in "making rules" does not enforce rules, and enforcement is the definition of (political) power.

> The other important requirement is to increase the staffing of the regulatory agencies so that their individual workload doesn't become a bottleneck in the entire process. There is a scientific method to assess the staffing requirements of public service institutions. According to that, a significant number of government departments all over the world are understaffed.

Why are you claiming "There is a scientific method" and then not providing it? Governments do (risk) management by 1. rules, 2. checks and 3. punishment, and we already know from software that complexity in a system is only bounded by the system working with eventually necessary (ideally partial) resets. Ideally governments would be structured like that, but that goes against a government's interest in extending power/control. Also, whether the "system is working" is decided by the current ruling class/group, besides markets and physical constraints.


> Try to rethink how money is created and how money gets its value and how and by whom that wealth is distributed.

Please elaborate.


Money is created and distributed via 1. the banking system and 2. the government. Are 1. rules, 2. checks and 3. punishments enforced against the banking system and the government, or only to stabilize and extend those systems? I'd argue the introduction of (arbitrary) rules is often just an excuse to amass power, but the enforcement of checks and punishments decides who holds (political) power.


Money is printed out of thin air by the Fed and then loaned out to the government for them to spend, so it enters the economy. Something along those lines.


> Call me grumpy and sleep deprived, but every year I look at this talk again, and every year I wonder... "now, what" ? What am I supposed to do, as a programmer, to change this sad state of things ?

That depends on your goals. If you are into building systems for selling them (or production), then you are bound by the business model (platform vs library) and use cases (to make money). Otherwise, you are more limited in time.

To think more realistically about the reality you have to work with, take a look at https://www.youtube.com/watch?v=Cum5uN2634o about types of (software) systems and their decay, then decide what you would like to simplify and what you are willing to invest. If you want to properly fix stuff, unfortunately you often have to first properly (formally) specify the current system(s) (design space) to use it as a (test, etc.) reference for (partial) replacement/improvement/extension system(s).

What these types of lectures usually skip over (as the essentials) are the involved complexity, solution trade-offs and interoperability for meaningful use cases with current hw/sw/tools.

