Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I mean, looking at it and knowing python is strong at lists and dictionaries… there sure is a better way which I would search for.

So, pythonists, what’s the elegant answer?



Agreed with marginalia_ru, if you can get the DB to do it, that's probably the place. if not, I'd go with

    d = collections.defaultdict(int)
    for item in mylist:
        d[item["thing"]] += item["count"]
    newlist = [{"thing": key, "count": value} for key, value in d.items()]
(I'm assuming the order doesn't matter, and the specific output format is needed. E.g. if the next step then iterates over that list and unpacks it, obviously get rid of the last line and use the dict)

There is the collections.Counter type, but I think there's not really a beneficial way of using that here.

Generally, the "there is one elegant solution" aspect of Python is widely overstated :D


I think this defaultdict solution is the best, or at least most pythonic.

I really thought if this was possible to do in a single dictionary comprehension, with some kind of d = {k: sum(v) for k, v in <something>}, but whatever goes in <something> ends up being super ugly.


What you do mean ugly :)

[{"thing": k, "count": v} for k, v in [[c, c.update({x["thing"]: x["count"]})] for c in [Counter()] for x in response][0][0].items()]


Note that this approach doesn't work as nicely if you need to store additional information alongside the name and count.


If you need to store other info, it doesn't make sense to aggregate. Think of it in SQL:

    SELECT a, b, SUM(c) FROM table GROUP BY a
doesn't even compile. You either project away `b` or group over both `a` and `b`.


That's true, but while in SQL you can easily group by `a` and `b`, here that's difficult (you'll need to key the dict by a tuple of the values, and reassemble the dictionary afterwards, or something like that).


Yeah, fair enough. You'd have the same problem with any approach, though. In a sense, that's easy in SQL only because the hash-magic is abstracted away.


Using `itertools.groupby` will probably give the cleanest solution. Something like this (untested):

    groups = itertools.groupby(sorted(mylist, key=lambda x: x['thing']), key=lambda x: x['thing'])
    newlist = [{**group[0], 'count': sum(item['count'] for item in group)} for group in groups]
I'm not overly fond of it, having to sort the list for `groupby` is unpleasant and extracting values from dictonaries is verbose. If this was an array of tuples it could be made much more concise, but of course that doesn't allow storing extra information for each thing, which this solution does.


Don't know about elegant. The following doesn't have the O(N^2) performance:

    def merge_counts(mylist):
        thing_lookup = {}  # map thing -> item 
        newlist = []
        for item in mylist:
            thing = item["thing"]
            if thing in thing_lookup:
                # seen before; update count.
                thing_lookup[thing]["count"] += item["count"]
            else:
                # first time I've seen this thing
                newlist.append(item)
                thing_lookup[thing] = item
        return newlist
It, like the original, mutates the original dictionary. I would prefer:

                # first time I've seen this thing
                thing = thing.copy() # don't mutate the original
                newlist.append(item)


I'm not a pythonista, but I'd turn it into a dictionary where the value of `thing` becomes the key, and in the value we sum all those counts. Then turn it back into an array again. That's O(n).

In javascript:

  const mylist = [
    {'thing': 'A', 'count': 4},
    {'thing': 'B', 'count': 2},
    {'thing': 'A', 'count': 6}];

  const dict = mylist.reduce(({thing, count}, dict) => ({
    [thing] : dict[thing] ? dict[thing]+count : count
  }, {});

  const newList = Object.keys(dict).map(thing => { thing, count: dict[thing] });
(Not tested or anything, so no guarantee that this works right off the bat, but something like this should work.)


Same algorithm in Python

    from functools import reduce
    result = [
        {'thing': k, 'count': v}
        for k, v in reduce(
            lambda result, item: {
                **result,
                item['thing']: item['count'] + result.get(item['thing'], 0)
            },
            mylist,
            {}
        ).items()
    ]


On a code review, I'd say I'm impressed, but still demand it written in a way that doesn't require the mindbending this one asks for. We should write code that’s as naïve as possible.


Something a bit more beginner friendly:

    agg = {}
    for item in mylist:
      if item['thing'] in agg:
        agg[item['thing']] += item['count']
      else:
        agg[item['thing']] = item['count']
    
    result_list = []
    for key, value in agg.items():
      result_list.append(dict(thing=key, count=value))
In beginner land, reduce makes your eyes glaze over and dict.get(key, default) is understandable in principle but still confusing in practice.


After almost a decade doing python, I'd probably use a plain loop like this but with ".get(key, default)"


With the recent changes in Javascript, I've developed an aversion to old fashioned for-loops. I prefer to .map.reduce.filter and if absolutely necessary, .forEach my way through the problem.


I'd do GROUP BY in the query to SQLite.


many of these which can be extremely difficult in imperative, can be very easy in declarative.


SQL was built to do this type of operation. It's generally a better pattern to fetch the data you need, rather than fetch all the data and then to transform it into what you need in the code.


I'm actually working on a similar problem right now so I appreciate your comments. What if the data is already in the Python process so I don't have the ability of doing a `group by` at the database level? Should I perform this functionality in the Python code itself, or should I write to another database table and do the `group by` there?

Edit: upon further reflection I think my application is at the tipping point where it needs another database table.


Depending on your database, perhaps a materialized view is in order?


and Prologo will be even better and cleaner syntax.


I think that is something, maybe collections.Counter, that provides this functionality out of the box.

I can’t really remember it, so I’d probably go with defaultdict


From a C#'er: do you people not have LINQ?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: