I kinda disagree with the implied notion that there are only the two options mentioned in the article:
A) blindly copy paste whatever code shows up on stack overflow without even trying to understand it
- or -
B) don't ever look up stack overflow at all.
I'll keep doing what I've always done:
Look things up on stack overflow (and other sites), compare different answers and check the discussions as well, figure out WHY people are doing what they do - and then do my own experiments based on that.
For me personally, this leads to faster learning results than experimenting completely on my own, without any guidance.
You don't even need to compare answers, per se. I use SO as a way to know what documentation to check. That is, if I want to do X, the official documentation is oftentimes extremely difficult to parse: to do X you have to instantiate a Foo, with an instance of Bar, etc. etc. etc., and all you have are API-level docs. They're usually written in the reverse of what I need to solve the problem (given function X -> understand what it does, rather than given what I want to do -> use function X; this gets even worse with higher-level concepts, where what I want to do -> implies doing 4 different things in this API -> 4 different functions/data types that need to be strung together).
SO and its ilk give me an example that I can start to cross-reference with the API-level docs, to figure out how to pull it all together, and also where to look to find the "gotchas", tweaks, and options.
Definitely. Although one of the reasons I do check other answers is that (at least for Java) the accepted answer is oftentimes Java 8 and then one of the other answers has a Java 11+ version that is half the LoC.
I go one further. If I do use a piece of code from Stack Overflow (usually a workaround for some API pain point or specific technical gimmick), I link to the answer from a comment. I then built a tool to notify me if that answer gets new comments or if new answers are added. [1] Similar to how I check for new versions of my npm/pip/gem/cargo dependencies, I check for new info on my Stack Overflow dependencies.
I link to SO in a comment if there's some bug I'm working around or similar, with an explanation of why a strange-looking code snippet is doing what it does. There's usually a TODO so that the code can be removed when the bug is fixed.
99% of the value of StackOverflow for me as an older developer is just finding workarounds to bugs in platforms/APIs/tools that people have run into before me.
Example:
Compiler throws error 0x19822943 on what looks like valid code. Googling the error code along with the compiler name almost invariably returns stackoverflow results near the top, so I look at the post(s) with answers and see "Oh yeah this is a known issue, upgrade to version 1.29 and it goes away".
For all the issues I have with the site, it is still pretty great for these sorts of problems that should be covered in the support documentation for whatever it is that failed (but often isn't -- or isn't in a way that Google is likely to find easily).
> trying to find some kind of shortcut to just get my code working by copy pasta-ing random bits of spagetti and crossing my fingers. Stop. There’s a better way.
is about as honest as the Juice Loosener pitch from the Simpsons [0]. It's impossible to just copy and paste something you find online and have it work while ignoring all context and learning nothing.
I would be happy enough if Stack Overflow put as much emphasis on citing version numbers as it does on the less critical X-Y problem. At least the former benefits future readers.
>> I kinda disagree with the implied notion that there's only these two options mentioned in the article
I took that away from the article as well, although I doubt that is the author's original intent.
One thing I like to do, if I have the extra time, is pull out a couple of different algorithms books (Cormen always being one) and see if I can find a new way of viewing a problem in light of an "algorithmic approach."
Admittedly the research takes time, but it is always a lot of fun as I explore many other "rabbit holes", and it leads to other insights as well. What I try to do is capture a few bullet points that I may have gleaned during this exercise. The approach is not conducive to goal- or time-sensitive deadlines, however.
This. Even if the solution you come up with in the end is almost literally copied from SO, you'll still have learned something and seen different ways of doing things, which is often more valuable than just writing up a naive but working piece of code. Nothing wrong with that, but especially when still learning a language it's more interesting to figure out the typical way of doing things, because you'll come across it again and again. It also helps with reading other code which uses these constructs. And once you know them, they're often faster to write and easier to read/grasp.
The article seems geared towards someone working from a book or guide at their leisure. Building things up from first principles is a great way to learn but I personally never have time for the approach on the job.
I think what the article does get right is a fundamental technique for solving a problem when you're stuck: reduce to the essentials, get your terms / naming right, write a teeny self-contained thing (I call it 'building a model') to try it out. When the solution is understood and has proven to work, integrate. Especially when you're hard pressed for time, this may prove faster overall than fumbling with many screws at once.
I have a rule about ZERO copy-pasting from SO, or anywhere else on the internet for that matter. If you don't understand the code, seek help, test it, and figure it out now, while you have all the time in the world; you certainly won't suddenly understand it better when it breaks in production. Just because 100 idiots upvoted something doesn't make it right.
Personally, I avoid SO because I've seen too many wrong answers. I prefer documentation, or blogs explaining approaches. But it can be a resource, especially for new developers, to help you understand how you need to think and how to approach problems. As a replacement for coding it yourself, it might be hot garbage, so you should understand it before committing to having produced that code and assigning your name to it.
I do too, in theory. In reality though, 90 percent of the time it's essentially unnavigable / unsearchable / basically intractable to use.
What you want is simply something that tells you "how do I do X", where X is some real-world thing, like stripping the end-of-line characters from a file. What you get from "documentation" as such is a huge sprawl of text explaining the guts of 90 different command line options. Good luck digging your way through that.
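For instance, the answer you're actually after is usually a couple of lines, something like this (a minimal sketch; the filename is made up):

with open('somefile.txt') as f:
    # strip the trailing end-of-line character from each line
    lines = [line.rstrip('\n') for line in f]

That's the shape of thing SO hands you immediately, and the man page buries.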
Imperfect though it is, SO was created to address precisely this huge, gaping disparity between what users want, and what most "documentation" actually delivers. Did I mention it's imperfect? And that you actually have to (shudder) think about the examples given, before randomly cut-and-pasting them into production?
The whole point is, though, that it's better than nothing, and it's frequently mostly correct (and it's not that hard to tell when the answer is wrong or requires a bit of fine-tuning for your use case). And at the end of the day, it still saves me hundreds of hours compared to the nearly useless "documentation" that ships with most programming languages and platforms these days.
Some 15 years ago when I was an undergraduate, I took a class in Linear Algebra. A month or whatever in, we got to the chapter on transformation matrices: rotation and translation of 3D points. I immediately saw the utility: I could make 3D graphics with these tools! I pretty much ran home and started building a first person spaceship game, poring over the book trying to work out how to make all these bits fit together. The graphics were rudimentary, I used SDL and most stars were just white squares (SDL_FillRect), but eventually I got to the point where I could travel in "space" and have the stars zoom by.
Turns out, software matrix multiplication is not the best way of doing 3D graphics, but it's a great way of learning how the mathematics of it works. Somewhere along the way of doing all this, linear algebra had clicked for me. LA was one of the first "hard" classes we took, and a lot of people barely passed. I got top grades in that class.
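For the curious, the core of it is tiny. A minimal sketch of the kind of thing I was doing (pure Python, no GPU; the function and names are my own, not from any library):

import math

def rotate_z(point, angle):
    # rotate a 3D point around the z-axis by `angle` radians,
    # equivalent to multiplying by the 3x3 matrix
    # [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

star = (1.0, 0.0, 5.0)
print(rotate_z(star, math.pi / 2))  # ~ (0.0, 1.0, 5.0)

Apply that to every star each frame and you have the beginnings of a flight sim.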
The moral of the story is that if you want to learn how something works, then this (what OP is doing), is correct. This is a great way to learn. There probably is a faster or better way of doing it, but in terms of learning-efficiency, this is great.
> The separation of talent and skill is one of the greatest misunderstood concepts for people who are trying to excel, who have dreams, who want to do things. Talent you have naturally. Skill is only developed by hours and hours and hours of beating on your craft.
- Will Smith
This kind of exploration, banging your head against the wall, is exactly "beating on your craft".
No matter if the result is crappy by any metric, trying to wrap one's head around things and producing output - any output - over and over is oh so important.
And dare I say talent without practice is useless and a waste, because one's going to stay in the comfort zone, and usually get cocky about it, whereas relentless practice teaches humility.
There is also an epistemological aspect to a lot of this. In school you are often taught true statements ("facts"), but that is not the same as knowledge. If I accurately predict that the next time I flip a coin it's going to land heads, I did not know it would. I made a correct statement, but I did not know.
Knowledge is about understanding how and why things work. That is what I was attempting to figure out about these linear algebra operations. My classmates were attempting to retain the facts for the test; that typically doesn't work particularly well, and long-term retention is poor. I've done that too. I could barely tell you what it is I'm supposed to have learned in those classes. But I could still probably construct 3D rotation matrices by hand without looking them up, all these years later.
You also had a goal beyond "pass a test", which I think is more important still. After all, Geometry isn't generally something taught or tested as merely a collection of facts (but rather their application, a la "given this, prove that", and you have to intuit a path through applying other proofs to get to the final one), and yet it still sucks as a subject in school. Because there isn't a real "why".
A precursor form of talent is burning passion, focus, and interest, which leads to better and more practice. The person who is willing to push hard into their field is probably 80% of what people call talent. And when others encounter the obsessed after the fact, they call it talent, because they're effectively the same outliers.
Beyond that, what you describe as talent should be called built-in ability, much like IQ or similar, because the current usage of "talent" is too muddled with the well-practiced.
Software matrix multiplication is a perfectly reasonable way of doing 3D graphics, when you don't have a GPU. They were just starting to become a consumer product at that time: https://fabiensanglard.net/3dfx_sst1/
Moral of the story is that you were intrinsically motivated to learn… in which case it doesn’t matter how you learn it as long as you follow that motivation.
I had a similar story. I barely got through linear algebra in university. The professor never suggested that linear algebra was useful for anything (except passing the course). Apparently the goal was to manipulate symbols. Applications were something for technicians, not academics. Everything was so abstract and seemed arbitrary. Theorems were developed and proofs were presented without ever providing any sense of why you would want to do whatever it was the theorem did. I had absolutely no intuition about anything related to linear algebra, I only had memorization to go on.
Then a decade or so later I encountered matrices in computer graphics and suddenly I found that a lot of that stuff was actually useful and piece by piece it started to make sense to me.
The way that course was taught didn't work at all for me. I wasn't looking for motivation in the sense that I want to use it, but more about what was the motivation for developing any of it in the first place.
Right, seems to me that's still intrinsic. Even if the motivation somehow enters me from the outside, I still want to do the thing. (If I don't want to do it, then I'm by definition not motivated to do it.)
If someone was threatening to kill you you would be motivated to defend yourself, even going as far as hurting them… wouldn’t mean that you would want or desire to hurt them.
You can argue with the (somewhat squishy) motivational theory but money is pretty much the classic extrinsic motivator in the literature. There's also an idea of internalized extrinsic motivation however like peer recognition. (Basically external motivations without an explicit carrot/stick.)
The line is somewhat blurry but there's probably a difference between playing a game and doing a job you don't really like because you're paid to do it.
Incentives are more in the vein of psychology than behavioral economics but the two fields are clearly adjacent. And this "debunked pseudoscience" led to a Nobel prize a few years ago. And many of the observations that led to this--relating to people not always being economically "rational"--are very clearly true.
I don't think anyone other than an economist would assume that people were economically rational. Is it really worth a prize to demonstrate that a fundamental hypothesis underpinning your field of science is wrong?
It was apparently worth a prize to explain more rigorously why behaviors often deviate from an underlying assumption of maximizing expected value. And, yes, some of the economics community was quite resistant to the idea that the answer was anything more than people are stupid. (That may be a bit of an exaggeration but there was quite a bit of resistance in economics to behavioral economics being a field of serious inquiry.)
> Turns out, software matrix multiplication is not the best way of doing 3D graphics
ok, what is? all the 3d engines I've ever worked with do tons of software matrix multiplication. You generally need to know where things are for collision checking and other things so doing it all on the gpu is not an option
> You generally need to know where things are for collision checking and other things so doing it all on the gpu is not an option
Sure. You can do some of it in software. That's fine.
Point is I didn't use the GPU at all, and did all the rendering in software. No matter how you rotate or translate it, that's not a particularly efficient way of doing it.
A series of calls to OpenGL is a heck of a lot easier than setting up 3x3 rotation and translation matrices and implementing a library to do 3x1, 1x3, and 3x3 matrix multiplications, scalar transformations, transpositions, calculating determinants, planar projections, perspective shifts, and what have you.
Not to be mean, but TBH I rolled my eyes a bit because awkwardly dancing with the interpreter until something appears to work is kind of exactly like throwing spaghetti at the wall.
And as others have pointed out, a quick jaunt to Stack Overflow might have revealed to OP that leaning on the DB is what any seasoned engineer would do, which is probably a better engineering lesson than how to implement a reduce by hand.
AFAIK Stack Overflow's mission statement isn't "Be Rip-Offable for Devs Everywhere." Obviously I know about "copy-paste engineers" but that's a character trait, not the result of their tools. I've always treated SO more like a knowledgeable coworker to help when I get stuck.
> I've always treated SO more like a knowledgeable coworker to help when I get stuck.
Not too knowledgeable, though. In my experience, if you are the knowledgeable colleague in a domain, SO will not help you unless you're in one of the domains with a legendary SO contributor (e.g. C# because of Jon Skeet, if you manage to get their attention), and unlike a knowledgeable colleague it won't assist you in looking for a clarification or solution either.
Experimenting, playing around and "throwing spaghetti at the wall" can lead to an understanding of the problem.
SO will tell me to use DISTINCT, and that is correct.
But why? Why is this better? Is it really? Why is the home-brewed reduce slower? How could it be improved? Oh hey, but if I did this to the data upstream I could prevent non-distinct records from being in the DB in the first place, etc. etc.
Software engineering is, among many other things, a craft. Just like carpentry or stone carving or smithing, it must be honed, and mastery can only be achieved if one plays, tries, and experiments.
----
Of course, reading about the solutions to a problem can also help in its understanding. For me personally, if I do not understand a problem, I like to do both: Experiment, and read what the "canonical" way to solve it is.
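To make that concrete, a minimal sketch with sqlite3 (the table and column names are made up):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (thing TEXT)')
conn.executemany('INSERT INTO records VALUES (?)', [('a',), ('a',), ('b',)])

# let the DB deduplicate, instead of a home-brewed reduce
print(conn.execute('SELECT DISTINCT thing FROM records').fetchall())  # [('a',), ('b',)]

# or prevent non-distinct records upstream in the first place
conn.execute('CREATE TABLE things (thing TEXT UNIQUE)')

Experimenting with both is exactly the kind of play I mean.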
>> awkwardly dancing with the interpreter until something appears to work is kind of exactly like throwing spaghetti at the wall.
But awkwardly dancing with the interpreter until something clicks in your head is highly valuable learning. In this case I kept waiting for the author to have a revelation or two - one about tuples, and one about proper use of dictionaries - but that never happened. I think it will in part 2, and then idiomatic Python will fall out of the author's complete understanding of the new concepts.
It's probably possible to do this in a one-liner using half of the stuff in itertools, but just changing the output data structure to a dict (at least temporarily; one can always make a list from it at the end) would yield code that is short, understandable and gets the job done without linear searches.
Essentially, I'm agreeing with eska's comment, with the addition: "...but consider your data structures!"
groupby looks neat, but it requires sorted input, so it's very easy to convert an O(n) problem to an elegant but O(n log n) program.
EDIT: one could also do it with a one-liner in pandas, which means that as long as the input is less than ten million lines or so, the "import pandas as pd" statement is going to take longer than the actual program...
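The pandas one-liner would be something like this (a sketch, untested; assumes the same list of {'thing': ..., 'count': ...} dicts as the article):

import pandas as pd

newlist = pd.DataFrame(mylist).groupby('thing', as_index=False)['count'].sum().to_dict('records')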
Perhaps not one line, maybe not perfect Python but definitely O(n):
import collections

groupby_dict = collections.defaultdict(int)
for l in mylist:
    groupby_dict[l['thing']] += l['count']

newlist = []
for thing, count in groupby_dict.items():
    newlist.append(dict(thing=thing, count=count))
Knowing how to do that does require the deep knowledge of data structures I'd expect every professional programmer to have, and doing a quick scan of the language's standard library (maybe an hour's work) to see what it offers. After that you're set for solving not only this problem, but almost guaranteed to solve all problems you are likely to hit pretty optimally in Python. I have no idea where Stack Overflow fits into the picture.
If you go a step further and do the Python standard tutorial found in the docs, you would discover how a finger-weary, experienced Python programmer might write those last three lines:
newlist = [dict(thing=thing, count=count) for thing, count in groupby_dict.items()]
Comprehensions are sweet Python syntactic sugar, but unlike mapping your existing encyclopaedic knowledge of data structures onto the language's standard library, they aren't necessary for a casual user of the language. Indeed, some Pythonistas will tell you resisting such delights makes for clearer code.
But trusting a quick Stack Overflow search to tell you the optimal way to use a language's data structures - you must be kidding me.
Doesn't detract from any of your points, just as an FYI: Python has a Counter data structure in the standard library. The "pythonic" way would be something like this:
from collections import Counter

groupby_dict = Counter()
for r in response:
    groupby_dict.update({r['thing']: r['count']})
Counter is a subclass of dict, no further conversion needed.
Deep knowledge is overstating it, I think. Knowing a few data structures and their big-O costs for various operations goes a long way. You don't need to know exactly how they're implemented under the covers. SQL is similar: once you grasp what data structure it uses to store your data, you can intuit what should be fast and what is slow.
>Yup. This looks awful — and is error prone and slow and probably wins some kind of award for being unPythonic — but it works. And I wrote it and I can read it. We can improve tomorrow.
That's the key takeaway. Write it once to get it working and understand how to solve the problem, then write it better.
"Hence plan to throw one away; you will, anyhow." - Fred Brooks
This is great. It’s rare to find a well organized brain dump of a beginner struggling with a simple problem.
There is very little record in general of the intermediate states people pass through on their trajectory from novice to expert. I suspect one could build an entirely new framework for education around such data.
>>After some time with no progress in figuring out what sticks, it’s time to step away from that approach.
That's exactly what motivated me to make a habit of going to SO first. But I don't just go there looking to copy and paste. I go there to learn and always review all the solutions given and the history and comments on them.
It was a struggle, though, because I was ornery and had made a habit of trying to figure everything out on my own. I generally knew the logic required but struggled with the syntax to implement it.
Since I made that a habit I have learned so much from contributors there and made huge gains in productivity.
They don't really give a good or direct reason to "step away" from SO here though. It's really more about crafting code you can read and understand and that's good advice.
Sometimes I feel like I was fortunate in a way to have learned programming before the internet - the only way to accomplish something back then was to actually take time to understand the fundamentals of what you were working with.
The most frequent reason I turn to Stack Overflow is for solutions to things I just don't care about right now. Like, for example, how do I get Bootstrap to work with Webpacker for Rails, etc. I don't really care about this, I'd rather just be writing Ruby code. And even in these cases, SO isn't that helpful half the time, because either the answer is 3 years old and outdated, or there used to be a relevant answer but it has been deleted for some violation.
Sometimes I just want to know what the syntax is... sometimes documentation can be VERY dense, but someone on SO has posed a question or answer that can show me how the syntax works for that one specific thing I'm looking for.
Particularly useful for frameworks like Svelte or React where I'm like... I have no idea how that's written.
This approach is great if you are practicing writing algorithms on your own time.
If I was writing software professionally, I would think "this sounds like 'group by' in SQL, I wonder what the equivalent is in Python", and then I would google for the most common/idiomatic approach.
from itertools import groupby

keyfunc = lambda x: x['thing']
[{'thing': name, 'count': sum(t['count'] for t in things)}
 for name, things in groupby(sorted(mylist, key=keyfunc), keyfunc)]
I always look for a very fundamental way of doing X. But SO questions and replies are almost always about a very, very specific use case. It takes a lot of careful searching to find the one line of code I need for my use case.
Use stack overflow like you use Wikipedia. It’s good to check out but you should really be using it for the references. I find the comments tend to point to documentation quite well.
Agreed with marginalia_ru: if you can get the DB to do it, that's probably the place. If not, I'd go with
import collections

d = collections.defaultdict(int)
for item in mylist:
    d[item["thing"]] += item["count"]
newlist = [{"thing": key, "count": value} for key, value in d.items()]
(I'm assuming the order doesn't matter, and the specific output format is needed. E.g. if the next step then iterates over that list and unpacks it, obviously get rid of the last line and use the dict)
There is the collections.Counter type, but I think there's not really a beneficial way of using that here.
Generally, the "there is one elegant solution" aspect of Python is widely overstated :D
I think this defaultdict solution is the best, or at least most pythonic.
I really tried to see whether this was possible to do in a single dictionary comprehension, with some kind of d = {k: sum(v) for k, v in <something>}, but whatever goes in <something> ends up being super ugly.
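For the record, the least ugly <something> I came up with still needs the sort that groupby demands (a sketch):

import itertools

keyfunc = lambda x: x['thing']
d = {k: sum(item['count'] for item in g)
     for k, g in itertools.groupby(sorted(mylist, key=keyfunc), key=keyfunc)}

which rather proves the point.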
That's true, but while in SQL you can easily group by `a` and `b`, here that's difficult (you'll need to key the dict by a tuple of the values, and reassemble the dictionary afterwards, or something like that).
Yeah, fair enough. You'd have the same problem with any approach, though. In a sense, that's easy in SQL only because the hash-magic is abstracted away.
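A sketch of the tuple-key version, for completeness (assuming made-up grouping keys 'a' and 'b'):

from collections import defaultdict

agg = defaultdict(int)
for item in mylist:
    agg[(item['a'], item['b'])] += item['count']  # key the dict by a tuple of the values

# reassemble the dictionaries afterwards
newlist = [{'a': a, 'b': b, 'count': count} for (a, b), count in agg.items()]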
Using `itertools.groupby` will probably give the cleanest solution. Something like this (untested):
import itertools
keyfunc = lambda x: x['thing']
groups = [list(g) for _, g in itertools.groupby(sorted(mylist, key=keyfunc), key=keyfunc)]
newlist = [{**items[0], 'count': sum(item['count'] for item in items)} for items in groups]
I'm not overly fond of it; having to sort the list for `groupby` is unpleasant, and extracting values from dictionaries is verbose. If this were an array of tuples it could be made much more concise, but of course that doesn't allow storing extra information for each thing, which this solution does.
I'm not a pythonista, but I'd turn it into a dictionary where the value of `thing` becomes the key, and in the value we sum all those counts. Then turn it back into an array again. That's O(n).
On a code review, I'd say I'm impressed, but still demand it be written in a way that doesn't require the mind-bending this one asks for. We should write code that’s as naïve as possible.
agg = {}
for item in mylist:
    if item['thing'] in agg:
        agg[item['thing']] += item['count']
    else:
        agg[item['thing']] = item['count']

result_list = []
for key, value in agg.items():
    result_list.append(dict(thing=key, count=value))
In beginner land, reduce makes your eyes glaze over and dict.get(key, default) is understandable in principle but still confusing in practice.
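For what it's worth, dict.get collapses the if/else above into a single line with the same behavior:

agg = {}
for item in mylist:
    # fetch the running total (defaulting to 0) and add this item's count
    agg[item['thing']] = agg.get(item['thing'], 0) + item['count']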
With the recent changes in Javascript, I've developed an aversion to old fashioned for-loops. I prefer to .map.reduce.filter and if absolutely necessary, .forEach my way through the problem.
SQL was built to do this type of operation. It's generally a better pattern to fetch the data you need, rather than fetch all the data and then transform it into what you need in code.
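For the OP's shape of data, that looks something like this (a sketch with sqlite3; the filename, table, and column names are made up):

import sqlite3

conn = sqlite3.connect('mydata.db')  # made-up filename
rows = conn.execute('SELECT thing, SUM(count) FROM records GROUP BY thing').fetchall()
newlist = [{'thing': thing, 'count': count} for thing, count in rows]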
I'm actually working on a similar problem right now so I appreciate your comments. What if the data is already in the Python process so I don't have the ability of doing a `group by` at the database level? Should I perform this functionality in the Python code itself, or should I write to another database table and do the `group by` there?
Edit: upon further reflection I think my application is at the tipping point where it needs another database table.
An Array is a contiguous section of memory with like items. Those like items _could be_ object pointers. The closest thing in Python to this is a tuple.
A List is a collection of things with an order, often a linked list or doubly linked list, but could actually be an array as well. Sometimes "array" and "list" are interchangeable.
A Dictionary (aka HashMap) is a key/value store. They are denoted in Python with a bounding {} (similar to JSON objects). Unlike JS or JSON, the keys can be anything hashable, not just strings or ints.
I sort of get the impression that "every other language" means JavaScript, where we sort of abuse the syntax and treat objects as hashmaps (they aren't really) or arrays (also aren't really). Yet Python maps quite well onto JSON objects and lists.
Terminology definitely can be different from language to language, but I find Python to be pretty close to the C class of languages for that.
That's even worse. You don't need to throw in a massive library like pandas, especially not when OP is building a basic API. Sure, if you're in a data-science-y project where pandas is already a dependency, go nuts.
It really depends on the size of the list you want to process. If it's 10 items, pandas is overkill (and probably slower). If it's a million items, pandas is a great solution.
I have a nagging feeling there is an easier way to do this, but my quick and dirty solution was
from collections import defaultdict

def merge_list1(l):
    other_dict = defaultdict(int)
    for t, c in ((i['thing'], i['count']) for i in l):
        other_dict[t] += c
    return ({'thing': k, 'count': other_dict[k]} for k in other_dict)
which is still readable, but probably far from optimal.
> If it's a million items, pandas is a great solution.
Possibly not even then. It depends on how much you're doing, and I feel like the topic at hand might be around that tipping point. We have some rather slow code that, when profiled, turned out to spend something like 60-70% of its time just converting between Python types and native types when moving data in and out of the dataframe.
True. If there are millions of different “things”, conversion times will end up dominating. If there are just a handful, then the libraries will be able to do a lot more work with parallel operations, and converting the output will be very quick.