I mean, looking at it and knowing python is strong at lists and dictionaries… th...

detaro · on Dec 23, 2021

Agreed with marginalia_nu: if you can get the DB to do it, that's probably the place. If not, I'd go with

    import collections

    d = collections.defaultdict(int)
    for item in mylist:
        d[item["thing"]] += item["count"]
    newlist = [{"thing": key, "count": value} for key, value in d.items()]

(I'm assuming the order doesn't matter, and the specific output format is needed. E.g. if the next step then iterates over that list and unpacks it, obviously get rid of the last line and use the dict)

There is the collections.Counter type, but I think there's not really a beneficial way of using that here.
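For comparison, here is a rough sketch of what a Counter-based version might look like. The sample `mylist` is made up for illustration and only assumes the `thing`/`count` shape used in this thread. It comes out nearly identical to the defaultdict loop, which is roughly why Counter buys little here:

```python
from collections import Counter

# Hypothetical sample data in the shape the thread assumes.
mylist = [
    {"thing": "A", "count": 4},
    {"thing": "B", "count": 2},
    {"thing": "A", "count": 6},
]

totals = Counter()
for item in mylist:
    # Missing keys read as 0, just like defaultdict(int).
    totals[item["thing"]] += item["count"]

newlist = [{"thing": key, "count": value} for key, value in totals.items()]
print(newlist)
```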

Generally, the "there is one elegant solution" aspect of Python is widely overstated :D

YurgenJurgensen · on Dec 23, 2021

I think this defaultdict solution is the best, or at least most pythonic.

I really wondered whether this was possible to do in a single dictionary comprehension, with some kind of d = {k: sum(v) for k, v in <something>}, but whatever goes in <something> ends up being super ugly.
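To make that concrete, one possible filling for <something> is a sorted `itertools.groupby`. This is only a sketch with made-up sample data, and the `by_thing` helper name is an assumption; the sort plus the repeated key function is where the ugliness creeps in:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data in the shape the thread assumes.
mylist = [
    {"thing": "A", "count": 4},
    {"thing": "B", "count": 2},
    {"thing": "A", "count": 6},
]

by_thing = itemgetter("thing")

# groupby only merges adjacent items, so the list has to be sorted
# by the same key first.
d = {
    k: sum(item["count"] for item in group)
    for k, group in groupby(sorted(mylist, key=by_thing), key=by_thing)
}
print(d)
```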

runekaagaard · on Dec 25, 2021

What do you mean, ugly? :)

    [{"thing": k, "count": v} for k, v in [[c, c.update({x["thing"]: x["count"]})] for c in [Counter()] for x in response][0][0].items()]

Denvercoder9 · on Dec 23, 2021

Note that this approach doesn't work as nicely if you need to store additional information alongside the name and count.

qsort · on Dec 23, 2021

If you need to store other info, it doesn't make sense to aggregate. Think of it in SQL:

    SELECT a, b, SUM(c) FROM table GROUP BY a

doesn't even compile in most SQL engines (SQLite accepts it, but picks `b` from an arbitrary row in each group). You either project away `b` or group over both `a` and `b`.

Denvercoder9 · on Dec 23, 2021

That's true, but while in SQL you can easily group by `a` and `b`, here that's difficult (you'll need to key the dict by a tuple of the values, and reassemble the dictionary afterwards, or something like that).
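A minimal sketch of the tuple-key idea, assuming hypothetical rows with grouping fields `a` and `b` and a summed field `c` (named after the SQL example, not taken from any real data):

```python
from collections import defaultdict

# Hypothetical rows: group by ("a", "b"), sum "c".
rows = [
    {"a": "x", "b": 1, "c": 4},
    {"a": "x", "b": 2, "c": 2},
    {"a": "x", "b": 1, "c": 6},
]

totals = defaultdict(int)
for row in rows:
    # Key the dict by a tuple of the grouping values.
    totals[(row["a"], row["b"])] += row["c"]

# Reassemble the dictionaries afterwards.
result = [{"a": a, "b": b, "c": total} for (a, b), total in totals.items()]
print(result)
```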

qsort · on Dec 23, 2021

Yeah, fair enough. You'd have the same problem with any approach, though. In a sense, that's easy in SQL only because the hash-magic is abstracted away.

Denvercoder9 · on Dec 23, 2021

Using `itertools.groupby` will probably give the cleanest solution. Something like this (untested):

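    import itertools
    key = lambda x: x['thing']
    # groupby yields (key, iterator) pairs; materialize each group so it can be indexed and summed
    groups = (list(g) for _, g in itertools.groupby(sorted(mylist, key=key), key=key))
    newlist = [{**group[0], 'count': sum(item['count'] for item in group)} for group in groups]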

I'm not overly fond of it: having to sort the list for `groupby` is unpleasant, and extracting values from dictionaries is verbose. If this was an array of tuples it could be made much more concise, but of course that doesn't allow storing extra information for each thing, which this solution does.

eesmith · on Dec 23, 2021

Don't know about elegant. The following doesn't have the O(N^2) performance:

    def merge_counts(mylist):
        thing_lookup = {}  # map thing -> item 
        newlist = []
        for item in mylist:
            thing = item["thing"]
            if thing in thing_lookup:
                # seen before; update count.
                thing_lookup[thing]["count"] += item["count"]
            else:
                # first time I've seen this thing
                newlist.append(item)
                thing_lookup[thing] = item
        return newlist

It, like the original, mutates the original dictionary. I would prefer:

                # first time I've seen this thing
                item = item.copy()  # don't mutate the original
                newlist.append(item)
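Put together, the non-mutating variant might look like the sketch below. The sample data and the extra `color` field are made up for illustration; note the copy is shallow, so nested values would still be shared with the input:

```python
def merge_counts(mylist):
    thing_lookup = {}  # map thing -> merged item
    newlist = []
    for item in mylist:
        thing = item["thing"]
        if thing in thing_lookup:
            # seen before; update the count on our copy
            thing_lookup[thing]["count"] += item["count"]
        else:
            # first time: copy so the caller's dict is left untouched
            item = item.copy()
            newlist.append(item)
            thing_lookup[thing] = item
    return newlist


# Hypothetical sample data with an extra field alongside thing/count.
mylist = [
    {"thing": "A", "count": 4, "color": "red"},
    {"thing": "B", "count": 2, "color": "blue"},
    {"thing": "A", "count": 6, "color": "red"},
]
merged = merge_counts(mylist)
print(merged)
```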

mcv · on Dec 23, 2021

I'm not a pythonista, but I'd turn it into a dictionary where the value of `thing` becomes the key, and in the value we sum all those counts. Then turn it back into an array again. That's O(n).

In javascript:

  const mylist = [
    {'thing': 'A', 'count': 4},
    {'thing': 'B', 'count': 2},
    {'thing': 'A', 'count': 6}];

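  // reduce passes the accumulator first, then the current item
  const dict = mylist.reduce((dict, {thing, count}) => ({
    ...dict,
    [thing]: dict[thing] ? dict[thing] + count : count
  }), {});

  // wrap the object literal in parentheses so it isn't parsed as a block
  const newList = Object.keys(dict).map(thing => ({ thing, count: dict[thing] }));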

(Not tested or anything, so no guarantee that this works right off the bat, but something like this should work.)

thr0waway164021 · on Dec 23, 2021

Same algorithm in Python

    from functools import reduce
    result = [
        {'thing': k, 'count': v}
        for k, v in reduce(
            lambda result, item: {
                **result,
                item['thing']: item['count'] + result.get(item['thing'], 0)
            },
            mylist,
            {}
        ).items()
    ]

rbanffy · on Dec 23, 2021

On a code review, I'd say I'm impressed, but still demand it written in a way that doesn't require the mindbending this one asks for. We should write code that’s as naïve as possible.

DangitBobby · on Dec 23, 2021

Something a bit more beginner friendly:

    agg = {}
    for item in mylist:
      if item['thing'] in agg:
        agg[item['thing']] += item['count']
      else:
        agg[item['thing']] = item['count']
    
    result_list = []
    for key, value in agg.items():
      result_list.append(dict(thing=key, count=value))

In beginner land, reduce makes your eyes glaze over and dict.get(key, default) is understandable in principle but still confusing in practice.

Izkata · on Dec 23, 2021

After almost a decade doing python, I'd probably use a plain loop like this but with ".get(key, default)"
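Roughly like this, as a sketch with made-up sample data in the same `thing`/`count` shape:

```python
# Hypothetical sample data in the shape the thread assumes.
mylist = [
    {"thing": "A", "count": 4},
    {"thing": "B", "count": 2},
    {"thing": "A", "count": 6},
]

agg = {}
for item in mylist:
    # .get returns 0 the first time a thing is seen, so no if/else is needed.
    agg[item["thing"]] = agg.get(item["thing"], 0) + item["count"]

result_list = [{"thing": key, "count": value} for key, value in agg.items()]
print(result_list)
```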

mcv · on Dec 23, 2021

With the recent changes in Javascript, I've developed an aversion to old-fashioned for-loops. I prefer to .map.reduce.filter and, if absolutely necessary, .forEach my way through the problem.

marginalia_nu · on Dec 23, 2021

I'd do GROUP BY in the query to SQLite.

deepstack · on Dec 23, 2021

Many of these problems that can be extremely difficult in an imperative style can be very easy in a declarative one.

marginalia_nu · on Dec 23, 2021

SQL was built to do this type of operation. It's generally a better pattern to fetch the data you need, rather than fetch all the data and then to transform it into what you need in the code.
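As a minimal sketch of that pattern, using Python's built-in `sqlite3` module with an in-memory database. The table name `items` and the column names `thing` and `cnt` are assumptions for illustration, not anything from the original question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; "cnt" stands in for the count field.
conn.execute("CREATE TABLE items (thing TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO items (thing, cnt) VALUES (?, ?)",
    [("A", 4), ("B", 2), ("A", 6)],
)

# Let the database aggregate, and fetch only the summed rows.
rows = conn.execute(
    "SELECT thing, SUM(cnt) FROM items GROUP BY thing ORDER BY thing"
).fetchall()
conn.close()

newlist = [{"thing": thing, "count": total} for thing, total in rows]
print(newlist)
```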

alexchantavy · on Dec 23, 2021

I'm actually working on a similar problem right now, so I appreciate your comments. What if the data is already in the Python process, so I don't have the ability to do a `group by` at the database level? Should I perform this functionality in the Python code itself, or should I write to another database table and do the `group by` there?

Edit: upon further reflection I think my application is at the tipping point where it needs another database table.

marginalia_nu · on Dec 25, 2021

Depending on your database, perhaps a materialized view is in order?

deepstack · on Dec 23, 2021

And Prolog would have an even better and cleaner syntax.

charlieyu1 · on Dec 23, 2021

I think there is something, maybe collections.Counter, that provides this functionality out of the box.

I can’t really remember it, so I’d probably go with defaultdict

pjc50 · on Dec 23, 2021

From a C#'er: do you people not have LINQ?