
Yeah, given that it's been 5 years I would think there would be some followup.

Bob isn't giving you any actionable information. If Alice and Bob agree, you're more confident than you were before, but you're still going to be trusting Alice. If they disagree you're down to 50% confidence, but you still might as well trust Alice.

Better than 50% confidence: they only lie 20% of the time, so when they disagree it's still 64% likely to be heads (.8 x .8)

No, it's 50% -- given that e.g. the flip is H, the base probability is 16% (0.8 x 0.2) for the reports HT and likewise 16% (0.2 x 0.8) for TH.

To complete the circle, now that we have winnowed the space down to these options, we would normalize them and end up with 0.16 / (0.16 + 0.16) = 0.5 = 50% in both cases.

The reason I'm not putting % signs on there is that, until we normalize, those are measures and not probabilities. What that means is that an event which has a 16% chance of happening in the entire universe of possibilities has an "area" or "volume" (the strictly correct term being measure) of 0.16. Once we zoom in to a smaller subset of events, it no longer has a probability of 16%, but the measure remains unchanged.
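
If it helps, here's a quick Monte Carlo sketch of the setup as I understand it (fair coin, each reporter independently lies 20% of the time); the probability of heads given disagreement comes out at 50%, not 64%:

    import random

    N = 1_000_000
    disagree = heads_given_disagree = 0
    for _ in range(N):
        flip = random.random() < 0.5                         # True = heads
        alice = flip if random.random() < 0.8 else not flip  # lies 20% of the time
        bob = flip if random.random() < 0.8 else not flip
        if alice != bob:                                     # condition on disagreement
            disagree += 1
            heads_given_disagree += flip
    print(heads_given_disagree / disagree)                   # ~0.5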

In this previous comment I gave a longer explanation of the intuition behind measure theory and linked to some resources on YouTube.

https://news.ycombinator.com/item?id=35796740


In the US they don't hang it on trees, they just leave it by the side of the trail or road or whatever. But it is very common to see bags of dogshit on the sidewalk or by the side of a trail in the US.

Can't we just leapfrog to IPv7? Or 8, for that matter?


The first thing I do whenever I see a discussion about IPv6 is to search for the jokers who talk about IPv5 or IPv7.


I have fun, but I probably wouldn't if the AI was right all the time. Or if I was helpless when it was wrong. But for now I'm still in the centaur zone.


How is "more than 100" people "rallying" even remotely newsworthy? What's the threshold, three?


If you read the article instead of just criticizing the headline:

> They listened to Michigan Attorney General Dana Nessel criticizing the lack of transparency with DTE, the utility that's associated with the Saline Township proposal, and legislators who protested tax breaks for data center projects.

> ...

> "We're talking about 1.4 gigawatts, which is, of course, enough to provide energy to a city of a million people," Nessel said. "I think we should be taking this extremely seriously, don't you? Do you guys trust DTE? Do you trust Open AI? Do we trust Oracle to look out for our best interests here in Michigan?"

This wasn't just a random group of 100 people; they were organized enough to get the state AG as well as multiple state legislators to speak. Seems fairly newsworthy to me.


In Lansing, it was below freezing and windy most of the day. If I noticed 100 people standing around on the pavement for hours in that, I'd probably imagine they deserved at least some regard for their concerns. But then, I'm not a Michigan politician that needs to get gamer Johnny out of my basement and on to a cushy non-profit no-show kickback job, courtesy of whatever big tech outfit wants a data center.


Three people could be a group of friends. More than 100 is clearly different.

Given that there are usually _zero_ people rallying in Lansing, this is notable enough for the local newspaper.


It’s not just this group. A co-worker of mine went to his town meeting about a proposed data center. When he showed up it was standing room only and they had to move the meeting to a bigger venue. I’ve heard stories like this from a few people now around Michigan where they have been trying to put data centers. No one wants them.


> What's the threshold, three?

The threshold is an organization organizing it. Getting 100 people out demonstrates your political power to your supporters and the people you seek to influence. Getting 1,000 people demonstrates that you have more of it.


There is very little common space in Michigan. There is a lot of private land, and a lot of public land, but very few spaces where people congregate. So when they do, it stands out quite a bit.


Since the population is around 112k-114k people, that would mean around 111,900 people didn't rally, on the low end.


Lol, I prefer that version of the headline:

"99.9% of residents did not show up to protest new datacenters in Michigan"


You could use that same statistic for literally every protest ever; it doesn't mean the causes aren't worthwhile.


There are dozens of us!


It would be noteworthy if 100 people showed up to my 5-year-old's piano recital.

Not so much for a 300-acre, noisy, water-hogging data center.


It’s a movement at 50[1].

[1] A. Guthrie, 1967


It wouldn't be newsworthy if we could trust our representatives not to give extra weight to the opinions of the people who yell the loudest.


I don't understand why anyone thinks self-reported happiness scores mean anything at all. I don't see how they possibly could. If someone says he's a 10 on his personal scale I guess that means he can't imagine being much happier, but I don't see how that means he's particularly happy.


I read the page on Lindley's paradox, and it's astonishing bullshit. It's well known that with sufficiently insane priors you can come up with stupid conclusions. The page asserts that a Bayesian would accept as reasonable priors that it's equally likely that the probability of a child being born male is precisely 0.5 as it is that it has some other value, and also that if it has some other value, all values in the interval from zero to one are equally likely. But nobody on God's green earth would accept those as reasonable priors, least of all a Bayesian. A Bayesian would say there's zero chance of it being precisely 0.5, but that it is almost certainly really close to 0.5, just like a normal human being would.


A few points because I actually think Lindley’s paradox is really important and underappreciated.

(1) You can get the same effect with a prior distribution concentrated around a point instead of a point prior. The null hypothesis prior being a point prior is not what causes Lindley’s paradox.

(2) Point priors aren’t intrinsically nonsensical. I suspect that you might accept a point prior for an ESP effect, for example (maybe not—I know one prominent statistician who believes ESP is real).

(3) The prior probability assigned to each of the two models also doesn’t really matter, Lindley’s paradox arises from the marginal likelihoods (which depend on the priors for parameters within each model but not the prior probability of each model).
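
If anyone wants to see the numbers, here's a sketch using the birth counts from the Wikipedia article (49,581 boys out of 98,451 births; treat the exact figures as illustrative if I've misremembered them). The two-sided test rejects theta = 0.5 at the 5% level, while the posterior probability of the point null against a flat prior on the alternative is about 0.95:

    from scipy.stats import binom, binomtest

    n, k = 98_451, 49_581

    p_value = binomtest(k, n, 0.5).pvalue  # ~0.02: reject at the 5% level

    m0 = binom.pmf(k, n, 0.5)      # marginal likelihood under the point null
    m1 = 1 / (n + 1)               # marginal under a uniform prior on theta
    posterior_h0 = m0 / (m0 + m1)  # ~0.95, assuming 50/50 prior on the models
    print(p_value, m0 / m1, posterior_h0)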


Are you seriously saying that, because a point distribution may well make sense if the point in question is zero (or 1), other points are plausible also? Srsly?

The nonsense isn't just that they're assuming a point probability, it's that, conditional on that point probability not being true, there's only a 2% chance that theta is .5 ± .01. Whereas the actual a priori probability is more like 99.99%.


Srsly? Srsly.

> The nonsense isn't just that they're assuming a point probability, it's that, conditional on that point probability not being true, there's only a 2% chance that theta is .5 ± .01. Whereas the actual a priori probability is more like 99.99%.

The birth sex ratio in humans is about 51.5% male and 48.5% female, well outside of your 99.99% interval. That’s embarrassing.

You are extremely overconfident in the ratio because you have a lot of prior information (but not enough, clearly, to justify your extreme overconfidence). In many problems you don’t have that much prior information. Vague priors are often reasonable.


Wikipedia has a section on this that I thought was presented fine.

https://en.wikipedia.org/wiki/Lindley%27s_paradox#The_lack_o...

Indeed, Bayesian approaches need effort to correct bad priors, and indeed the original hypothesis was bad.

That said. First, in defense of the prior, it is infinitely more likely that the probability is exactly 0.5 than that it is any individual uniformly chosen number to either side. There are causal mechanisms that can explain exactly even splits. I agree that it's much safer to use simpler priors that can at least approximate any precise simple prior, and will learn any 'close enough' match, but some privileged probability on 0.5 is not crazy, and can even be nice as a reference to help you check the power of your data.

One really should separate out the update part of Bayes from the prior part of Bayes. The data fits differently under a lot of hypotheses. Like, it's good to check expected log odds against actual log odds, but Bayes updates are almost never going to tell you that a hypothesis is "true", because whether your log loss is good is relative to the baselines you're comparing it against. Someone might come up with a prior on the basis that particular ratios are evolutionarily selected for. Someone might come up with a model that predicts births sequentially using a genomics-over-time model and get a loss far better than any of the independent random variable hypotheses. The important part is the log-odds of hypotheses under observations, not the posterior.
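
To make the log-odds point concrete, here's a sketch (same assumed counts as above) comparing log marginal likelihoods for the point null, a flat prior, and a prior concentrated near 0.5. The concentrated prior behaves like the point null, which is why the paradox isn't an artifact of the point mass:

    import numpy as np
    from scipy.special import betaln, gammaln

    n, k = 98_451, 49_581
    log_choose = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

    def log_marginal(a, b):
        # log P(k) under a Beta(a, b) prior on theta (beta-binomial)
        return log_choose + betaln(k + a, n - k + b) - betaln(a, b)

    log_m_point = log_choose + n * np.log(0.5)    # theta exactly 0.5
    log_m_flat = log_marginal(1, 1)               # uniform prior
    log_m_tight = log_marginal(5_000, 5_000)      # sd ~ 0.005 around 0.5

    print(log_m_point - log_m_flat)  # log Bayes factor ~ +3: favours the null
    print(log_m_tight - log_m_flat)  # also positive: concentrated prior agrees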


Wikipedia is infamously bad at teaching math.

This Veritasium video does a great job at explaining how such skewed priors can easily appear in our current academic system and the paradox in general: https://youtu.be/42QuXLucH3Q?si=c56F7Y3RB5SBeL4m


Yeah, it may seem to the author like a "better" (because stronger) conclusion that if you have more pigeons than pigeonholes you must have more than one pigeon in a hole, even allowing negative or irrational numbers of pigeons. But you're pretty much only invoking the pigeonhole principle in discrete math, where "more than one" means "at least 2".


Yeah, I think that was something that irked me about the "general" formulation: it suddenly brought in an average, i.e. a real number, even though the "common" formulations only dealt with integers. This may be more general, but it makes reasoning and understanding harder.
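
For what it's worth, the averaging form both comments are circling fits in one line (my paraphrase, in LaTeX): with $k_i$ pigeons in hole $i$ and $\sum_{i=1}^{m} k_i = n$, the fullest hole holds at least the average, and taking the ceiling lands you back on integers:

    \max_i k_i \;\ge\; \left\lceil \frac{n}{m} \right\rceil \ge 2 \quad \text{whenever } n > m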


For a fingerprint to be useful it must not only be unique but also persistent. If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time, but the me of today can't be linked to the me of tomorrow, right?


Point still taken; however, you can only really check whether a given font is installed, not obtain a list of all fonts. Thus, installing a wacky font is pointless, as the fingerprinter won't bother to check for that particular font. There is queryLocalFonts on Chrome, but it requires a permission popup.


Correct, however:

> By following users over time, as their fingerprints changed, they could guess when a fingerprint was an ‘upgraded’ version of a previously observed browser’s fingerprint, with 99.1% of guesses correct.

https://coveryourtracks.eff.org/static/browser-uniqueness.pd...

https://mullvad.net/en/browser/browser-fingerprinting


>If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time, but the me of today can't be linked to the me of tomorrow, right?

See: https://xkcd.com/1105/

Services with a large enough fingerprinting database can filter out implausible values and flag you as faking your fingerprint, which is itself fingerprintable.


The problem we're falling into under this (ostensibly accurate) point is that we turn this into a game, where fingerprinting is either "100% effective and insidious" or "can't be 100% certain 100% of the time, so it's ineffective and nobody will use it against me".

The point is that a sufficiently motivated actor could use a very broad array of tactics, some automated and some manual, to identify, observe, track, and/or locate a target. Maybe they can’t pin you down with your browser fingerprints because you’ve been smart enough to use tools that obfuscate it, but that’s not happening in a vacuum. Correlating one otherwise useless datapoint that happens to persist long enough to tie things together at even low-ish confidence is still a hugely worthwhile sieve with which to filter people out of the possibility pool.

The problem isn't that it doesn't affect most average people, or that it's terribly imprecise. The problem is that it's even a little effective, while being nearly impossible to completely avoid. It's also a problem if that's used by a malicious state actor against a journalist, to pick a rather obvious example. Because even in isolation, this kind of violation of civil liberties necessarily impacts all of society.

The public should be given more information and control, broadly speaking, for when they are asked to trade their rights for convenience, security, and/or commerce. In particular, I think the United States has allowed bad faith arguments against regulatory actions and basic consumer rights so corporate lobbyists can steamroll any chance of even baseline protections. It would behoove all of us to be more distrustful of companies and moneyed interests, while being more engaged with, and demanding of, our governments.


But they still wouldn't be able to confidently connect his different fingerprints to the same individual, just that he is one of a group of individuals who fake their fingerprints.


It would depend on what your existing fingerprint is. If you're using some sort of rare browser/OS/hardware combination (e.g. Pale Moon/Gentoo Linux/IBM ThinkPad) it might be worth spoofing, but if your configuration is relatively "normie" (e.g. Firefox/Windows/relatively recent Intel or AMD CPU/iGPU) you're probably making yourself stick out more by faking your fingerprint.


The issue is that, especially on desktop, I doubt there are many fingerprints that more than 100 people have, given everything that they test. I would even suspect that most common desktop fingerprints are classified as bots.
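
A back-of-envelope check on that intuition, using the 18.1 bits of fingerprint entropy reported in the EFF paper linked elsewhere in this thread, plus an assumed population of 2 billion desktop browsers (both figures illustrative):

    import math

    users = 2e9          # assumed number of desktop browsers
    entropy_bits = 18.1  # 2010 lower bound from the EFF browser-uniqueness paper

    print(users / 2 ** entropy_bits)  # ~7,000 users per fingerprint on average
    print(math.log2(users / 100))     # ~24 bits needed before buckets drop under 100

The 2010 signals alone don't get buckets that small; the extra bits presumably come from everything newer that gets tested on top (canvas, WebGL, audio, fonts).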


> If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time

Technically for fonts, there’s no API for listing installed fonts, so trackers have to check each font by name. Likely they won’t be checking super obscure font names.

That method might help for other signals though.


It's likely that yes, you will end up with an alias that links you, because of a cookie somewhere, or a fingerprint of the elliptic curve when you do an SSL handshake, or any number of other ways.

The ironic thing is that because of GDPR and CCPA, ad tech companies got really good at "anonymizing" your data. So even if you were to somehow not have an alias linking your various anonymous profiles, you will still end up quickly bucketed into a persona (and multiple audiences) that resemble you quite well. And it's not multiple days of data we're talking about (although it could be); it's minutes, and in the case of contextual multi-armed bandits, your persona is likely updated "within" a single page load, and you are targeted in ~5ms within the request/response lifecycle of that page load.
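
To give a flavor of what that per-request loop looks like, here's a minimal sketch of a bandit update cheap enough to run inside that ~5ms budget: Thompson sampling over ad variants, non-contextual for brevity, all names hypothetical rather than any vendor's actual API:

    import random

    class ThompsonBandit:
        # One Beta(wins, losses) posterior over click-through rate per variant.
        def __init__(self, n_arms):
            self.wins = [1] * n_arms
            self.losses = [1] * n_arms

        def choose(self):
            # Sample a plausible CTR per arm; serve the best draw. O(arms).
            draws = [random.betavariate(w, l)
                     for w, l in zip(self.wins, self.losses)]
            return draws.index(max(draws))

        def update(self, arm, clicked):
            # O(1) posterior update, easily inside a request/response cycle.
            if clicked:
                self.wins[arm] += 1
            else:
                self.losses[arm] += 1

    bandit = ThompsonBandit(n_arms=3)
    arm = bandit.choose()              # pick a creative for this page load
    bandit.update(arm, clicked=False)  # fold the outcome back in immediately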

The good news is that most data platforms don't keep data around for more than 90 days because then they are automatically compliant with "right to be forgotten" without having to service requests for removal of personal data.

