Agreed with marginalia_ru: if you can get the DB to do it, that's probably the place. If not, I'd go with
import collections

d = collections.defaultdict(int)
for item in mylist:
    d[item["thing"]] += item["count"]
newlist = [{"thing": key, "count": value} for key, value in d.items()]
(I'm assuming the order doesn't matter, and the specific output format is needed. E.g. if the next step then iterates over that list and unpacks it, obviously get rid of the last line and use the dict)
There is the collections.Counter type, but I think there's not really a beneficial way of using that here.
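For reference (my sketch, with made-up sample data), a `Counter` version ends up looking nearly identical to the `defaultdict` one, which is why it doesn't buy you much here:

```python
from collections import Counter

# hypothetical sample data in the shape discussed above
mylist = [
    {"thing": "a", "count": 2},
    {"thing": "b", "count": 3},
    {"thing": "a", "count": 1},
]

totals = Counter()
for item in mylist:
    totals[item["thing"]] += item["count"]

newlist = [{"thing": k, "count": v} for k, v in totals.items()]
```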
Generally, the "there is one elegant solution" aspect of Python is widely overstated :D
I think this defaultdict solution is the best, or at least most pythonic.
I really wondered whether this was possible to do in a single dictionary comprehension, with some kind of d = {k: sum(v) for k, v in <something>}, but whatever goes in <something> ends up being super ugly.
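To illustrate (my own attempt, using made-up sample data): the least ugly single comprehension I can come up with rescans the list once per distinct key, so it's quadratic-ish on top of being hard to read:

```python
mylist = [
    {"thing": "a", "count": 2},
    {"thing": "b", "count": 3},
    {"thing": "a", "count": 1},
]

# one comprehension, but it walks the whole list once per distinct key
d = {k: sum(item["count"] for item in mylist if item["thing"] == k)
     for k in {item["thing"] for item in mylist}}
```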
That's true, but while in SQL you can easily group by `a` and `b`, here that's difficult (you'll need to key the dict by a tuple of the values, and reassemble the dictionary afterwards, or something like that).
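A sketch of what that would look like, assuming hypothetical grouping keys `a` and `b` on each row:

```python
import collections

# hypothetical rows with two grouping keys, a and b
rows = [
    {"a": 1, "b": "x", "count": 2},
    {"a": 1, "b": "x", "count": 5},
    {"a": 2, "b": "y", "count": 1},
]

# key the dict by a tuple of the grouping values
d = collections.defaultdict(int)
for row in rows:
    d[(row["a"], row["b"])] += row["count"]

# reassemble dictionaries from the tuple keys afterwards
grouped = [{"a": a, "b": b, "count": c} for (a, b), c in d.items()]
```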
Yeah, fair enough. You'd have the same problem with any approach, though. In a sense, that's easy in SQL only because the hash-magic is abstracted away.
Using `itertools.groupby` will probably give the cleanest solution. Something like this (note each group has to be materialized with `list()` before it can be iterated twice):

import itertools
import operator

key = operator.itemgetter('thing')
groups = itertools.groupby(sorted(mylist, key=key), key=key)
newlist = [
    {**(items := list(group))[0], 'count': sum(item['count'] for item in items)}
    for _, group in groups
]
I'm not overly fond of it, having to sort the list for `groupby` is unpleasant and extracting values from dictionaries is verbose. If this was an array of tuples it could be made much more concise, but of course that doesn't allow storing extra information for each thing, which this solution does.
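For comparison, here's roughly how the tuple variant could look, assuming hypothetical data as plain (thing, count) pairs (and, as noted, losing any extra per-thing fields):

```python
from collections import Counter

# hypothetical data as plain (thing, count) tuples
pairs = [("a", 2), ("b", 3), ("a", 1)]

totals = Counter()
for thing, count in pairs:
    totals[thing] += count

summed = sorted(totals.items())  # back to a list of tuples
```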
I'm not a pythonista, but I'd turn it into a dictionary where the value of `thing` becomes the key, and in the value we sum all those counts. Then turn it back into an array again. That's O(n).
On a code review, I'd say I'm impressed, but still demand it be written in a way that doesn't require the mind-bending this one asks for. We should write code that's as naïve as possible.
agg = {}
for item in mylist:
    if item['thing'] in agg:
        agg[item['thing']] += item['count']
    else:
        agg[item['thing']] = item['count']

result_list = []
for key, value in agg.items():
    result_list.append(dict(thing=key, count=value))
In beginner land, reduce makes your eyes glaze over and dict.get(key, default) is understandable in principle but still confusing in practice.
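For reference, the `dict.get(key, default)` variant mentioned above collapses the if/else into one line (shown with made-up sample data):

```python
mylist = [
    {"thing": "a", "count": 2},
    {"thing": "b", "count": 3},
    {"thing": "a", "count": 1},
]

agg = {}
for item in mylist:
    # .get returns 0 the first time a given thing is seen
    agg[item["thing"]] = agg.get(item["thing"], 0) + item["count"]
```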
With the recent changes in Javascript, I've developed an aversion to old-fashioned for loops. I prefer to .map/.reduce/.filter and, if absolutely necessary, .forEach my way through the problem.
SQL was built to do this type of operation. It's generally a better pattern to fetch the data you need, rather than fetch all the data and then to transform it into what you need in the code.
I'm actually working on a similar problem right now so I appreciate your comments. What if the data is already in the Python process, so I don't have the ability to do a `group by` at the database level? Should I perform this functionality in the Python code itself, or should I write to another database table and do the `group by` there?
Edit: upon further reflection I think my application is at the tipping point where it needs another database table.
So, pythonists, what’s the elegant answer?