I kinda disagree with the implied notion that there are only the two options mentioned in the article:
A) blindly copy paste whatever code shows up on stack overflow without even trying to understand it
- or -
B) don't ever look up stack overflow at all.
I'll keep doing what I've always done:
Look things up on stack overflow (and other sites), compare different answers and check the discussions as well, figure out WHY people are doing what they do - and then do my own experiments based on that.
For me personally, this leads to faster learning results than experimenting completely on my own, without any guidance.
You don't even need to compare answers, per se. I use SO as a way to know what documentation to check. That is, if I want to do X, the official documentation is oftentimes extremely difficult to parse: to do X you have to instantiate a Foo, with an instance of Bar, etc. etc. etc., and all you have are API-level docs. They're usually written in the reverse of what I need to solve the problem (given function X -> understand what it does, rather than given what I want to do -> use function X; this gets even worse with higher-level concepts, where what I want to do -> implies doing 4 different things in this API -> 4 different functions/data types that need to be strung together).
SO and its ilk give me an example that I can start to cross-reference with the API-level docs, to figure out how to pull it all together, and also where to look to find the "gotchas", tweaks, and options.
Definitely. Although one of the reasons I do check other answers is that (at least for Java) the accepted answer is oftentimes Java 8 and then one of the other answers has a Java 11+ version that is half the LoC.
I go one further. If I do use a piece of code from Stack Overflow (usually a workaround for some API pain point or specific technical gimmick), I link to the answer from a comment. I then built a tool to notify me if that answer gets new comments or if new answers are added. [1] Similar to how I check for new versions of my npm/pip/gem/cargo dependencies, I check for new info on my Stack Overflow dependencies.
I link to SO in a comment if there's some bug I'm working around or similar, with an explanation of why a strange-looking code snippet is doing what it does. There's usually a TODO so that the code can be removed when the bug is fixed.
99% of the value of StackOverflow for me as an older developer is just finding workarounds to bugs in platforms/APIs/tools that people have run into before me.
Example:
Compiler throws error 0x19822943 on what looks like valid code. Googling the error code along with the compiler name almost invariably returns stackoverflow results near the top, so I look at the post(s) with answers and see "Oh yeah this is a known issue, upgrade to version 1.29 and it goes away".
For all the issues I have with the site, it is still pretty great for these sorts of problems that should be covered in the support documentation for whatever it is that failed (but often isn't -- or isn't in a way that Google is likely to find easily).
> trying to find some kind of shortcut to just get my code working by copy pasta-ing random bits of spagetti and crossing my fingers. Stop. There’s a better way.
is about as honest as the Juice Loosener pitch from the Simpsons [0]. It's impossible to just copy and paste something you find online and have it work while ignoring all context and learning nothing.
I would be happy enough if Stack Overflow put as much emphasis on citing version numbers as it does on the less critical X-Y problem. At least the former benefits future readers.
>> I kinda disagree with the implied notion that there's only these two options mentioned in the article
I took that away from the article as well, although I doubt that is the author's original intent.
One thing I like to do, if I have the extra time, is pull out a couple of different algorithms books (Cormen always being one) and see if I can find a new way of viewing a problem in light of an "algorithmic approach."
Admittedly the research takes time, but it is always a lot of fun as I explore many other "rabbit holes", and it leads to other insights as well. What I try to do is capture a few bullet points that I may have gleaned during this exercise. The approach is not conducive to goal- or time-sensitive deadlines, however.
This. Even if the solution you come up with in the end is almost literally copied from SO, you'll still have learned something and seen different ways of doing things, which is often more valuable than just writing up a naive but working piece of code. Nothing wrong with that, but especially when still learning a language it's more interesting to figure out the typical way of doing things, because you'll come across it again and again. It also helps with reading other code which uses these constructs. And once you know them, they're often faster to write and easier to read/grasp.
The article seems geared towards someone working from a book or guide at their leisure. Building things up from first principles is a great way to learn but I personally never have time for the approach on the job.
I think what the article does get right is a fundamental technique for solving a problem when you're stuck: reduce to the essentials, get your terms / naming right, write a teeny self-contained thing (I call it 'building a model') to try it out. When the solution is understood and has proven to work, integrate. Especially when you're hard pressed for time, this may prove faster overall than fumbling with many screws at once.
I have a rule about ZERO copy-pasting from SO, or anywhere else on the internet for that matter. If you don't understand the code, seek help, test it, and figure it out now, while you have all the time in the world; you certainly won't suddenly understand it better when it breaks in production. Just because 100 idiots upvoted something doesn't make it right.
Personally, I avoid SO because I've seen too many wrong answers. I prefer documentation, or blogs explaining approaches. But it can be a resource, especially for new developers, to help you understand how you need to think and how to approach problems. As a replacement for coding it yourself, it might be hot garbage, so you should understand it before committing to having produced that code and assigning your name to it.
I do too, in theory. In reality though, 90 percent of the time it's essentially unnavigable / unsearchable / basically intractable to use.
What you want is simply something that tells you "how do I do X", where X is some real-world thing, like stripping the end-of-line characters from a file. What you get from "documentation" as such is a huge sprawl of text explaining the guts of 90 different command line options. Good luck digging your way through that.
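For instance, the answer you're actually after is usually a couple of lines, something like this (a minimal sketch; the filename is made up):

with open('somefile.txt') as f:
    # strip the trailing end-of-line character from each line
    lines = [line.rstrip('\n') for line in f]

That's the shape of thing SO hands you immediately, and the man page buries.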
Imperfect though it is, SO was created to address precisely this huge, gaping disparity between what users want, and what most "documentation" actually delivers. Did I mention it's imperfect? And that you actually have to (shudder) think about the examples given, before randomly cut-and-pasting them into production?
The whole point is, though, that it's better than nothing, and it's frequently mostly correct (and it's not that hard to tell when the answer is wrong or requires a bit of fine-tuning for your use case). And at the end of the day, it still saves me hundreds of hours compared to the nearly useless "documentation" that ships with most programming languages and platforms these days.
Some 15 years ago when I was an undergraduate, I took a class in Linear Algebra. A month or whatever in, we got to the chapter on transformation matrices: rotation and translation of 3D points. I immediately saw the utility: I could make 3D graphics with these tools! I pretty much ran home and started building a first person spaceship game, poring over the book trying to work out how to make all these bits fit together. The graphics were rudimentary, I used SDL and most stars were just white squares (SDL_FillRect), but eventually I got to the point where I could travel in "space" and have the stars zoom by.
Turns out, software matrix multiplication is not the best way of doing 3D graphics, but it's a great way of learning how the mathematics of it works. Somewhere along the way of doing all this, linear algebra had clicked for me. LA was one of the first "hard" classes we took, and a lot of people barely passed. I got top grades in that class.
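For the curious, the core of it is tiny. A minimal sketch of the kind of thing I was doing (pure Python, no GPU; the function and names are my own, not from any library):

import math

def rotate_z(point, angle):
    # rotate a 3D point around the z-axis by `angle` radians,
    # equivalent to multiplying by the 3x3 matrix
    # [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

star = (1.0, 0.0, 5.0)
print(rotate_z(star, math.pi / 2))  # ~ (0.0, 1.0, 5.0)

Apply that to every star each frame and you have the beginnings of a flight sim.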
The moral of the story is that if you want to learn how something works, then this (what OP is doing), is correct. This is a great way to learn. There probably is a faster or better way of doing it, but in terms of learning-efficiency, this is great.
> The separation of talent and skill is one of the greatest misunderstood concepts for people who are trying to excel, who have dreams, who want to do things. Talent you have naturally. Skill is only developed by hours and hours and hours of beating on your craft.
- Will Smith
This kind of exploration, banging your head against the wall, is exactly "beating on your craft".
No matter if the result is crappy by any metric, trying to wrap one's head around things and producing output - any output - over and over is oh so important.
And dare I say talent without practice is useless and a waste, because one's going to stay in the comfort zone, and usually get cocky about it, whereas relentless practice teaches humility.
There is also an epistemological aspect to a lot of this. In school you are often taught true statements ("facts"), but that is not the same as knowledge. If I accurately predict that the next time I flip a coin it's going to land heads, I did not know it would. I made a correct statement, but I did not know.
Knowledge is about understanding how and why things work. That is what I was attempting to figure out about these linear algebra operations. My classmates were attempting to retain the facts for the test; that typically doesn't work particularly well, and long-term retention is poor. I've done that too. I could barely tell you what it is I'm supposed to have learned in those classes. But I could still probably construct 3D rotation matrices by hand without looking them up, all these years later.
You also had a goal beyond "pass a test", which I think is more important still. After all, Geometry isn't generally something taught or tested as merely a collection of facts (but rather their application, a la "given this, prove that", and you have to intuit a path through applying other proofs to get to the final one), and yet it still sucks as a subject in school. Because there isn't a real "why".
A precursor form of talent is burning passion, focus, and interest, which leads to better and more practice. The person who is willing to push hard into their field is probably 80% of what people call talent. And when others encounter the obsessed after the fact, they call it talent, because they're effectively the same outliers.
Beyond that, what you describe as talent should be called built-in ability, much like IQ or similar, because the current usage of "talent" is too muddled with the well-practiced.
Software matrix multiplication is a perfectly reasonable way of doing 3D graphics, when you don't have a GPU. They were just starting to become a consumer product at that time: https://fabiensanglard.net/3dfx_sst1/
Moral of the story is that you were intrinsically motivated to learn… in which case it doesn’t matter how you learn it as long as you follow that motivation.
I had a similar story. I barely got through linear algebra in university. The professor never suggested that linear algebra was useful for anything (except passing the course). Apparently the goal was to manipulate symbols. Applications were something for technicians, not academics. Everything was so abstract and seemed arbitrary. Theorems were developed and proofs were presented without ever providing any sense of why you would want to do whatever it was the theorem did. I had absolutely no intuition about anything related to linear algebra, I only had memorization to go on.
Then a decade or so later I encountered matrices in computer graphics and suddenly I found that a lot of that stuff was actually useful and piece by piece it started to make sense to me.
The way that course was taught didn't work at all for me. I wasn't looking for motivation in the sense that I want to use it, but more about what was the motivation for developing any of it in the first place.
Right, seems to me that's still intrinsic. Even if the motivation somehow enters me from the outside, I still want to do the thing. (If I don't want to do it, then I'm by definition not motivated to do it.)
If someone was threatening to kill you you would be motivated to defend yourself, even going as far as hurting them… wouldn’t mean that you would want or desire to hurt them.
You can argue with the (somewhat squishy) motivational theory but money is pretty much the classic extrinsic motivator in the literature. There's also an idea of internalized extrinsic motivation however like peer recognition. (Basically external motivations without an explicit carrot/stick.)
The line is somewhat blurry but there's probably a difference between playing a game and doing a job you don't really like because you're paid to do it.
Incentives are more in the vein of psychology than behavioral economics but the two fields are clearly adjacent. And this "debunked pseudoscience" led to a Nobel prize a few years ago. And many of the observations that led to this--relating to people not always being economically "rational"--are very clearly true.
I don't think anyone other than an economist would assume that people were economically rational. Is it really worth a prize to demonstrate that a fundamental hypothesis underpinning your field of science is wrong?
It was apparently worth a prize to explain more rigorously why behaviors often deviate from an underlying assumption of maximizing expected value. And, yes, some of the economics community was quite resistant to the idea that the answer was anything more than people are stupid. (That may be a bit of an exaggeration but there was quite a bit of resistance in economics to behavioral economics being a field of serious inquiry.)
> Turns out, software matrix multiplication is not the best way of doing 3D graphics
ok, what is? all the 3d engines I've ever worked with do tons of software matrix multiplication. You generally need to know where things are for collision checking and other things so doing it all on the gpu is not an option
> You generally need to know where things are for collision checking and other things so doing it all on the gpu is not an option
Sure. You can do some of it in software. That's fine.
Point is I didn't use the GPU at all, and did all the rendering in software. No matter how you rotate or translate it, that's not a particularly efficient way of doing it.
A series of calls to OpenGL is a heck of a lot easier than setting up 3x3 rotation and translation matrices and implementing a library to do 3x1, 1x3, and 3x3 matrix multiplications, scalar transformations, transpositions, calculating determinants, planar projections, perspective shifts, and what have you.
Not to be mean, but TBH I rolled my eyes a bit because awkwardly dancing with the interpreter until something appears to work is kind of exactly like throwing spaghetti at the wall.
And as others have pointed out, a quick jaunt to Stack Overflow might have revealed to OP that leaning on the DB is what any seasoned engineer would do, which is probably a better engineering lesson than how to implement a reduce by hand.
AFAIK Stack Overflow's mission statement isn't "Be Rip-Offable for Devs Everywhere." Obviously I know about "copy-paste engineers" but that's a character trait, not the result of their tools. I've always treated SO more like a knowledgeable coworker to help when I get stuck.
> I've always treated SO more like a knowledgeable coworker to help when I get stuck.
Not too knowledgeable, though. In my experience, if you are the knowledgeable colleague in a domain, SO will not help you unless you're in one of the domains with a legendary SO contributor (e.g. C# because of Jon Skeet, if you manage to get their attention), and unlike a knowledgeable colleague it won't assist you in looking for a clarification or solution either.
Experimenting, playing around and "throwing spaghetti at the wall" can lead to an understanding of the problem.
SO will tell me to use DISTINCT, and that is correct.
But why? Why is this better? Is it really? Why is the home-brewed reduce slower? How could it be improved? Oh hey, but if I did this to the data upstream I could prevent non-distinct records from being in the DB in the first place, etc. etc.
Software engineering is, among many other things, a craft. Just like carpentry or stone carving or smithing, it must be honed, and mastery can only be achieved if one plays, tries, and experiments.
----
Of course, reading about the solutions to a problem can also help in its understanding. For me personally, if I do not understand a problem, I like to do both: Experiment, and read what the "canonical" way to solve it is.
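To make that concrete, a minimal sketch with sqlite3 (the table and column names are made up):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (thing TEXT)')
conn.executemany('INSERT INTO records VALUES (?)', [('a',), ('a',), ('b',)])

# let the DB deduplicate, instead of a home-brewed reduce
print(conn.execute('SELECT DISTINCT thing FROM records').fetchall())  # [('a',), ('b',)]

# or prevent non-distinct records upstream in the first place
conn.execute('CREATE TABLE things (thing TEXT UNIQUE)')

Experimenting with both is exactly the kind of play I mean.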
>> awkwardly dancing with the interpreter until something appears to work is kind of exactly like throwing spaghetti at the wall.
But awkwardly dancing with the interpreter until something clicks in your head is highly valuable learning. In this case I kept waiting for the author to have a revelation or two - one about tuples, and one about proper use of dictionaries - but that never happened. I think it will in part 2, and then idiomatic Python will fall out of the author's complete understanding of the new concepts.
It's probably possible to do this in a one-liner using half of the stuff in itertools, but just changing the output data structure to a dict (at least temporarily; one can always make a list from it at the end) would yield code that is short, understandable and gets the job done without linear searches.
Essentially, I'm agreeing with eska's comment, with the addition: "...but consider your data structures!"
groupby looks neat, but it requires sorted input, so it's very easy to convert an O(n) problem to an elegant but O(n log n) program.
EDIT: one could also do it with a one-liner in pandas, which means that as long as the input is less than ten million lines or so, the "import pandas as pd" statement is going to take longer than the actual program...
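The pandas one-liner would be something like this (a sketch, untested; assumes the same list of {'thing': ..., 'count': ...} dicts as the article):

import pandas as pd

newlist = pd.DataFrame(mylist).groupby('thing', as_index=False)['count'].sum().to_dict('records')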
Perhaps not one line, maybe not perfect Python but definitely O(n):
import collections

groupby_dict = collections.defaultdict(int)
for l in mylist:
    groupby_dict[l['thing']] += l['count']

newlist = []
for thing, count in groupby_dict.items():
    newlist.append(dict(thing=thing, count=count))
Knowing how to do that does require the deep knowledge of data structures I'd expect every professional programmer to have, and doing a quick scan of the language's standard library (maybe an hour's work) to see what it offers. After that you're set for solving not only this problem, but almost guaranteed to solve all problems you are likely to hit pretty optimally in Python. I have no idea where Stack Overflow fits into the picture.
If you go a step further and do the Python standard tutorial found in the docs, you would discover how a finger-weary, experienced Python programmer might write those last three lines:
newlist = [dict(thing=thing, count=count) for thing, count in groupby_dict.items()]
Comprehensions are sweet Python syntactic sugar, but unlike mapping your existing encyclopaedic knowledge of data structures onto the language's standard library, they aren't necessary for a casual user of the language. Indeed, some Pythonistas will tell you resisting such delights makes for clearer code.
But trusting a quick Stack Overflow search to tell you the optimal way to use a language's data structures - you must be kidding me.
Doesn't detract from any of your points, just as an FYI: Python has a Counter data structure in the standard library. The "pythonic" way would be something like this:
from collections import Counter

groupby_dict = Counter()
for r in response:
    groupby_dict.update({r['thing']: r['count']})
Counter is a subclass of dict, no further conversion needed.
Deep knowledge is overstating it, I think. Knowing a few data structures and their big-O costs for various operations goes a long way. You don't need to know exactly how they're implemented under the covers. SQL is similar: once you grasp what data structure it uses to store your data, you can intuit what should be fast and what is slow.
>Yup. This looks awful — and is error prone and slow and probably wins some kind of award for being unPythonic — but it works. And I wrote it and I can read it. We can improve tomorrow.
That's the key takeaway. Write it once to get it working and understand how to solve the problem, then write it better.
"Hence plan to throw one away; you will, anyhow." - Fred Brooks
This is great. It’s rare to find a well organized brain dump of a beginner struggling with a simple problem.
There is very little record in general of the intermediate states people pass through on their trajectory from novice to expert. I suspect one could build an entirely new framework for education around such data.
>>After some time with no progress in figuring out what sticks, it’s time to step away from that approach.
That's exactly what motivated me to make a habit of going to SO first. But I don't just go there looking to copy and paste. I go there to learn and always review all the solutions given and the history and comments on them.
It was a struggle, though, because I was ornery and had made a habit of trying to figure everything out on my own. I generally knew the logic required but struggled with the syntax to implement it.
Since I made that a habit I have learned so much from contributors there and made huge gains in productivity.
They don't really give a good or direct reason to "step away" from SO here though. It's really more about crafting code you can read and understand and that's good advice.
Sometimes I feel like I was fortunate in a way to have learned programming before the internet - the only way to accomplish something back then was to actually take time to understand the fundamentals of what you were working with.
The most frequent reason I turn to Stack Overflow is for solutions to things I just don't care about right now. Like, for example, how do I get Bootstrap to work with Webpacker for Rails, etc. I don't really care about this, I'd rather just be writing Ruby code. And even in these cases, SO isn't that helpful half the time, because either the answer is 3 years old and outdated, or there used to be a relevant answer but it has been deleted for some violation.
Sometimes I just want to know what the syntax is... sometimes documentation can be VERY dense, but someone on SO has posed a question or answer that can show me how the syntax works for that one specific thing I'm looking for.
Particularly useful for frameworks like Svelte or React where I'm like... I have no idea how that's written.
This approach is great if you are practicing writing algorithms on your own time.
If I was writing software professionally, I would think "this sounds like 'group by' in SQL, I wonder what the equivalent is in Python", and then I would google for the most common/idiomatic approach.
from itertools import groupby

keyfunc = lambda x: x['thing']
[{'thing': name, 'count': sum(t['count'] for t in things)}
 for name, things in groupby(sorted(mylist, key=keyfunc), keyfunc)]
I always look for a very fundamental way of doing X. But SO questions and replies are almost always about a very, very specific use case. It takes a lot of careful searching to find the one line of code I need for my use case.
Use stack overflow like you use Wikipedia. It’s good to check out but you should really be using it for the references. I find the comments tend to point to documentation quite well.
Agreed with marginalia_ru: if you can get the DB to do it, that's probably the place. If not, I'd go with
import collections

d = collections.defaultdict(int)
for item in mylist:
    d[item["thing"]] += item["count"]
newlist = [{"thing": key, "count": value} for key, value in d.items()]
(I'm assuming the order doesn't matter, and the specific output format is needed. E.g. if the next step then iterates over that list and unpacks it, obviously get rid of the last line and use the dict)
There is the collections.Counter type, but I think there's not really a beneficial way of using that here.
Generally, the "there is one elegant solution" aspect of Python is widely overstated :D
I think this defaultdict solution is the best, or at least most pythonic.
I really tried to see whether this was possible to do in a single dictionary comprehension, with some kind of d = {k: sum(v) for k, v in <something>}, but whatever goes in <something> ends up being super ugly.
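For the record, the least ugly <something> I came up with still needs the sort that groupby demands (a sketch):

import itertools

keyfunc = lambda x: x['thing']
d = {k: sum(item['count'] for item in g)
     for k, g in itertools.groupby(sorted(mylist, key=keyfunc), key=keyfunc)}

which rather proves the point.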
That's true, but while in SQL you can easily group by `a` and `b`, here that's difficult (you'll need to key the dict by a tuple of the values, and reassemble the dictionary afterwards, or something like that).
Yeah, fair enough. You'd have the same problem with any approach, though. In a sense, that's easy in SQL only because the hash-magic is abstracted away.
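A sketch of the tuple-key version, for completeness (assuming made-up grouping keys 'a' and 'b'):

from collections import defaultdict

agg = defaultdict(int)
for item in mylist:
    agg[(item['a'], item['b'])] += item['count']  # key the dict by a tuple of the values

# reassemble the dictionaries afterwards
newlist = [{'a': a, 'b': b, 'count': count} for (a, b), count in agg.items()]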
Using `itertools.groupby` will probably give the cleanest solution. Something like this (untested):
import itertools
keyfunc = lambda x: x['thing']
groups = [list(g) for _, g in itertools.groupby(sorted(mylist, key=keyfunc), key=keyfunc)]
newlist = [{**items[0], 'count': sum(item['count'] for item in items)} for items in groups]
I'm not overly fond of it; having to sort the list for `groupby` is unpleasant, and extracting values from dictionaries is verbose. If this were an array of tuples it could be made much more concise, but of course that doesn't allow storing extra information for each thing, which this solution does.
I'm not a pythonista, but I'd turn it into a dictionary where the value of `thing` becomes the key, and in the value we sum all those counts. Then turn it back into an array again. That's O(n).
On a code review, I'd say I'm impressed, but still demand it be written in a way that doesn't require the mind-bending this one asks for. We should write code that’s as naïve as possible.
agg = {}
for item in mylist:
    if item['thing'] in agg:
        agg[item['thing']] += item['count']
    else:
        agg[item['thing']] = item['count']

result_list = []
for key, value in agg.items():
    result_list.append(dict(thing=key, count=value))
In beginner land, reduce makes your eyes glaze over and dict.get(key, default) is understandable in principle but still confusing in practice.
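For what it's worth, dict.get collapses the if/else above into a single line with the same behavior:

agg = {}
for item in mylist:
    # fetch the running total (defaulting to 0) and add this item's count
    agg[item['thing']] = agg.get(item['thing'], 0) + item['count']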
With the recent changes in Javascript, I've developed an aversion to old fashioned for-loops. I prefer to .map.reduce.filter and if absolutely necessary, .forEach my way through the problem.
SQL was built to do this type of operation. It's generally a better pattern to fetch the data you need, rather than fetch all the data and then transform it into what you need in code.
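For the OP's shape of data, that looks something like this (a sketch with sqlite3; the filename, table, and column names are made up):

import sqlite3

conn = sqlite3.connect('mydata.db')  # made-up filename
rows = conn.execute('SELECT thing, SUM(count) FROM records GROUP BY thing').fetchall()
newlist = [{'thing': thing, 'count': count} for thing, count in rows]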
I'm actually working on a similar problem right now so I appreciate your comments. What if the data is already in the Python process so I don't have the ability of doing a `group by` at the database level? Should I perform this functionality in the Python code itself, or should I write to another database table and do the `group by` there?
Edit: upon further reflection I think my application is at the tipping point where it needs another database table.
An Array is a contiguous section of memory with like items. Those like items _could be_ object pointers. The closest thing in Python to this is a tuple.
A List is a collection of things with an order, often a linked list or doubly linked list, but could actually be an array as well. Sometimes "array" and "list" are interchangeable.
A Dictionary (aka HashMap) is a key/value store. They are denoted in Python with a bounding {} (similar to JSON objects). Unlike JS or JSON, the keys can be anything hashable, not just strings or ints.
I sort of get the impression that "every other language" means JavaScript, where we sort of abuse the syntax and treat objects as hashmaps (they aren't really) or arrays (also aren't really). Yet Python maps quite well onto JSON objects and lists.
Terminology definitely can be different from language to language, but I find Python to be pretty close to the C class of languages for that.
That's even worse. You don't need to throw in a massive library like pandas, especially not when OP is building a basic API. Sure, if you're in a data-science-y project where pandas is already a dependency, go nuts.
It really depends on the size of the list you want to process. If it's 10 items, pandas is overkill (and probably slower). If it's a million items, pandas is a great solution.
I have a nagging feeling there is an easier way to do this, but my quick and dirty solution was
from collections import defaultdict

def merge_list1(l):
    other_dict = defaultdict(int)
    for t, c in ((i['thing'], i['count']) for i in l):
        other_dict[t] += c
    return ({'thing': k, 'count': other_dict[k]} for k in other_dict)
which is still readable, but probably far from optimal.
> If it's a million items, pandas is a great solution.
Possibly not even then. It depends on how much you're doing, and I feel like the topic at hand might be around that tipping point. We have some rather slow code that, when profiled, turned out to spend something like 60-70% of its time just converting between Python types and native types when moving data in and out of the dataframe.
True. If there are millions of different “things”, conversion times will end up dominating. If there are just a handful, then the libraries will be able to do a lot more work with parallel operations, and converting the output will be very quick.