
Site won't connect with https, this works for me.

http://www.econ.yale.edu/smith/econ116a/keynes1.pdf


Can someone explain to me where the feeling_lucky variable comes from in the match example? I don't get it. The variable isn't defined anywhere in the relevant scope.


It's not a full example; that code would not compile because that variable is not defined.
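
For illustration, here is a minimal, self-contained sketch of what a runnable version would have to look like. This is hypothetical Python 3.10+ (the article's snippet is presumably another language), but the point is the same: the variable has to be bound somewhere in the enclosing scope before the match can use it.

    # Hypothetical stand-in for the article's snippet: feeling_lucky
    # must be defined before the match can reference it in a guard.
    feeling_lucky = True

    match input("pick a door: "):
        case "1" if feeling_lucky:
            print("jackpot")
        case _:
            print("try again")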


> For the same reason, the enthusiasm for the idea of installing solar plants in north africa and exporting power over interconnectors to Europe is gone. It's cheaper to install twice as many panels in a field in cloudy northern Europe (where capacity factors are half of what they would be in the Sahara) than deal with the costs and complexity of underwater interconnectors and transmission system upgrades.

I don't know about that, because long-range transmission significantly reduces the intermittency, even for the same longitude. It's not just about smoothing out the curve of solar production: demand patterns (due to weather, industry, behavior) also vary from place to place, and these variations create opportunities for power export.

Some places will have more dispatchable and expensive power plants than others, which also creates an export opportunity: power gets exported so that one country can burn less fuel. However, I admit this argument is slightly undercut by the question "can't we just build solar panels in the country with dispatchable dirty power?" True, but there's also a connection to wind power, which is super local and random.


In general, yes, more interconnection and better transmission can only be good.

But in the specific example I gave, the finances just don't work. Underwater HVDC is just too expensive and panels are too cheap - even with the value of the uncorrelated intermittency.

And even if the numbers could be made to work financially, I can't see European countries lining up to become dependent on fixed infrastructure in politically unaligned and/or unstable countries like Algeria or Libya - especially given recent experience with Russia. Security-wise, Nord Stream has shown that underwater infrastructure is vulnerable to attack and difficult to protect/guard.


Winter solar in Northern Europe, when it produces 10-15% of its summer output, is not displacing the winter heating load supplied by fossil fuels. North African solar could do that.


Keep in mind the political instability in North Africa and the geopolitical risks.


It's still not believed. It's just on Twitter that I've seen the juicy stuff, like a (very sketchy) claimed replication from a Russian researcher, and now another claim from a Chinese group.

I could still see this being something "new" but not a true superconductor. If you read the link, there seems to be some kind of discovery brewing, but the original discoverers may not have understood what they found.


While you're correct that the evidence is currently scarce and sketchy, it is in the process of narrowing down. Or to put it better: the evidence is mounting that it might be a breakthrough. It's likely hard to manufacture correctly, but simulations done at Berkeley Labs seem to support the claims of the original paper (https://arxiv.org/abs/2307.16892). Aside from that, to be honest, I hope that we have time to prove or disprove those claims before any major news outlet jumps onto the hype train and ruins it.


The paper you referenced doesn’t say anything about the room-temperature superconductivity claims.


Why a throwaway account for this? It honestly makes me immediately question your take.


No no no, this eliminates the need for cryo, not crypto.


Fusion was already starting to heat up in recent years. The entire SPARC reactor concept is based on (low-temp) superconductor materials breakthroughs.

If these room-temp superconductors pan out, it will be dumping gasoline on the funding fire for new fusion attempts. Give it less than a year from scientific verification and fusion will go red hot.


The ideal conditions to stimulate daydreaming seem pretty obvious to me - sit me down in any class lecture.

Imagine that my mind is like a glider. The glider is connected to a Cessna by a tether, but it's a magical tether that can disappear. In fact, it takes active work for me to keep it connected, like I'm holding on to the end of the rope with my hands. The plane takes off and I'm following in lockstep with the main subject matter... looking down at the landscape below, to the left, to the right. After a quick climb, and at velocity, I've forgotten about keeping tethered - holy crud, I'm in the air! I want to bank, dive, maybe loop! What's over here? I can catch a thermal and go on forever depending on the landscape, but I still often see more appealing currents and need to switch over. Oh wait, where did the plane go?


Yeah, good to bring it back to the original point. Reading the article felt exciting, but in hindsight I am now missing a key detail.

The equations all seem to be matrix operations with a fixed number of rows / columns (you can take me as a real layman here). Unless you change that, I don't understand _how_ you can reduce memory needs. Granted, I'm probably putting my foot in my mouth not understanding transformers.


More ELI5 than the other comments. Considering the softmax network:

During quantization we find that values in the network vary from 0->5000, but 95% of values are <100. Quantizing this to 8bits would mean that our values would be in increments of about 20. Remembering that 95% of our values are below 100, we would only have about 5 discrete values for 95% of our values - so we would be losing a lot of "resolution" (entropy/information). For example (assuming rounding is used), an original value of 19 would be quantized to 20 and 30 would be quantized to 40. The original values differ by 11, but the quantized values differ by 20!

This is where exotic encodings come into play. We might try to use a logarithmic scheme, for example. This would result in higher value densities at lower values - but we would probably still waste bits and it would require more APU cycles.

Now switch to the softmax1 network:

The range of values is less important than the distribution - instead of 95% of the values falling in a small range, we would see the values more evenly spread out. Assuming that the range is now 105 (so the 5% outlying neurons from the softmax network are still >100), we would have 243 values to represent everything under 100. The same example with 19 and 30 would quantize to 18.87 and 29.94 respectively, a difference of 11.07 - which is very close to the unquantized difference of 11. We have retained more information in the quantized version of the network.

Information is lost either way, but what's important is how much information is lost.
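
A minimal sketch of the arithmetic above, assuming plain uniform quantization with rounding to the nearest level (the 0-5000 and 0-105 ranges are the illustrative numbers from this comment, not measurements from a real network):

    # Snap x to the nearest of 2**bits uniformly spaced levels.
    def quantize(x, max_val, bits=8):
        step = max_val / 2**bits
        return round(x / step) * step

    for max_val in (5000, 105):
        q19, q30 = quantize(19, max_val), quantize(30, max_val)
        print(f"range 0-{max_val}: 19 -> {q19:.2f}, 30 -> {q30:.2f}, "
              f"diff {q30 - q19:.2f}")

    # range 0-5000: 19 -> 19.53, 30 -> 39.06, diff 19.53
    # range 0-105: 19 -> 18.87, 30 -> 29.94, diff 11.07

(The step of ~19.5 in the wide range is what's rounded to "about 20" above.)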

The reason the large values appear is that the heads attempt to "scream really loud" when they are certain that they are right. This is an emergent behavior due to softmax - it ironically sucks at paying attention to a few of the heads: it boosts the volume of the heads that are trying to abstain, and mutes the volume of the heads that are trying to vote.


> During quantization we find that values in the network vary from 0->5000, but 95% of values are <100. Quantizing this to 8bits would mean that our values would be in increments of about 20.

Instead of using an 8bit integer with even step size quantization, wouldn't they still use an 8bit float?


Possibly, it depends on the distribution of the values. It would also make my examples far less straightforward :)

Either way you would still only have 256 discrete values.


No one quantizes blindly without accounting for the data. If 95% of your values are in 0-100, you'll probably do something like have 244 values for 0-100 and the remaining 12 for 101-5000. You don't have to apply a uniform distribution, and shouldn't when your data is that concentrated.
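
A minimal sketch of that kind of two-segment codebook, using the 244/12 split above (the allocation numbers are illustrative, not from any real quantizer):

    # Spend most of the 256 codes on the dense 0-100 region and the
    # few remaining codes on the sparse 101-5000 tail.
    def make_codebook(split=244, low_max=100.0, high_max=5000.0, bits=8):
        n = 2**bits
        low = [i * low_max / (split - 1) for i in range(split)]
        high = [low_max + (i + 1) * (high_max - low_max) / (n - split)
                for i in range(n - split)]
        return low + high

    # Snap x to the nearest codebook entry.
    def quantize(x, codebook):
        return min(codebook, key=lambda c: abs(c - x))

    cb = make_codebook()
    print(quantize(19, cb))    # ~18.9: fine resolution below 100
    print(quantize(2500, cb))  # ~2550: coarse resolution above 100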


Third paragraph.


If I'm following correctly, does this mean that with this change, along with a model being quantized, we could see models that are 5% of the size (on the file system) and memory usage but almost identical in output?


The values selected were arbitrary. The size reduction will be 32 bits / 8 bits - so it will be 4 times smaller.


It has to do with the precision of the values stored in those rows and columns. If they could be coerced into a narrower range (without losing information), then we could effectively store each of them with 8 bits or something. The +1 prevents blowups when the denominator in its current form approaches 0, and without those blowups we can use fewer bits, in theory.
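
For reference, a minimal sketch of the two denominators, assuming the +1-in-the-denominator form this thread is discussing (the name softmax_one is mine):

    import math

    def softmax(xs):
        exps = [math.exp(x) for x in xs]
        return [e / sum(exps) for e in exps]

    def softmax_one(xs):
        # Same numerator, but the +1 in the denominator means the
        # outputs can all go to ~0 together instead of being forced
        # to sum to 1.
        exps = [math.exp(x) for x in xs]
        return [e / (1 + sum(exps)) for e in exps]

    print(softmax([-10.0, -10.0, -10.0]))      # [0.333, 0.333, 0.333]
    print(softmax_one([-10.0, -10.0, -10.0]))  # [~5e-5, ~5e-5, ~5e-5]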


That is only true if using the new softmax changes the dynamic range of the values. We are using floating point, not fixed point. So if before our values went from 1 to 5000, and now they go from 0.0002 to 1, we still have the same dynamic range and so still need the same resolution.
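
A small numpy sketch of that point: float step size is relative to the value, so rescaling the whole range doesn't buy any resolution (the numbers are the illustrative ranges from this thread):

    import numpy as np

    # Gap to the next representable float16, relative to the value
    # itself: roughly constant across the range, so mapping 1-5000
    # down to 0.0002-1 doesn't change how finely values are resolved.
    for x in (5000.0, 1.0, 0.0002):
        x16 = np.float16(x)
        print(x, float(np.spacing(x16)) / x)  # all roughly 1e-3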


The quantized versions are not floats but ints.


The activations (outputs) of one layer must be encoded in the same way as the weights of that layer as well as the weights of the next layer or the computation fails (unless you manage to write clever kernels for doing math at different levels of precision simultaneously, but even then you're introducing even more lossiness than just using a binary representation for those values).

Example: multiplying a bunch of float16s together gives you a float16. That is passed on to the next layer of float16s. Why should forcing the output of the first step to be float8 confer any advantage here? The only way I can see this argument working is if you make all the layers float8 too, and the reason you can do that is that the output of the first step can be faithfully represented as float8 because it doesn't ever blow up. If that's what the author is saying, it wasn't very clear.
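
A small numpy sketch of the dtype-propagation point (the per-tensor scale factor here is a hypothetical illustration, not any particular library's scheme):

    import numpy as np

    a = np.random.rand(4, 4).astype(np.float16)
    b = np.random.rand(4, 4).astype(np.float16)
    h = a @ b
    print(h.dtype)  # float16: the activation inherits the layer's precision

    # Forcing the activation down to 8 bits only pays off if the next
    # layer consumes 8-bit inputs too; otherwise every layer boundary
    # needs an explicit quantize/dequantize round trip like this one.
    scale = float(np.abs(h).max()) / 127
    h8 = np.round(h / scale).astype(np.int8)
    h_restored = h8.astype(np.float16) * np.float16(scale)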


You can reduce the number of bits per float (scalar).


As in, they use cash to avoid reporting the earnings? That works fine for the scale of a lemonade stand.


Higher cash prices are a violation of the card-holder agreement.


kldavis4 meant that everyone pays the same higher price to account for the fees present in card payment processing, regardless of whether they are paying in cash, by debit card, or by credit card.

> Higher cash prices are a violation of the card-holder agreement.

I think what you mean is “higher card prices are a violation of the merchant agreement” – so a merchant can't offer different prices for cash payments to account for differences in processing costs, or they risk losing their merchant account with the card/payment processing company.

In some places this is no longer legal¹, you'd have to check your local legislation to know what applies in that regard where you are.

Also the reverse, offering discounts for card use rather than cash², is illegal in some places because it is unfair to those without cards⁴.

--

[1] so card processing companies can't (legally) punish merchants for offering a discount for cash

[2] usually because the merchant gets a kick-back from the card payment processor, though sometimes these days it is because cash has become the minority payment method³ and vendors would rather only deal with one so want to further discourage cash

[3] it is worth noting while discussing potential costs or kick-backs for card payment processing, that for businesses the act of dealing with cash has associated admin and/or costs too

[4] which is disproportionately the disadvantaged, because they find it harder to qualify for any card, or can only qualify for one with a monthly charge that they can ill afford


This hasn't been true since Dodd-Frank in 2010: https://en.wikipedia.org/wiki/Durbin_amendment

"included provisions which allow retailers to refuse to use credit cards for small purchases and offer incentives for using cash or another type of card."


I think you mean the merchant agreement. There are many gas stations in the area where I live that have different cash prices and still take credit cards, so I'm not sure if this is still a thing. And honestly, the fact that they would do that lends support to the idea that all of these credit card "benefits" are subsidized by cash payers.


Yes, I should have Googled that before using the wrong word.

I am also operating on old information; perhaps the law changed and it varies by state now.


AFAIK, this changed in 2010 at a federal level.

