Hacker News | angusb's comments

this isn't quite right - if you use zero-carbon sources (e.g. solar) to power carbon removal, the operation would be net carbon negative.

"but why wouldn't you just use that solar capacity to displace coal plants on the electricity grid??????"

there's already a rapid switch of installed electrical capacity from fossil fuels to solar/wind, but electricity typically accounts for 20% of a developed economy's emissions, so after that we need to look to decarbonise other parts of the economy.

removing carbon from the wider economy involves some low-hanging fruit (e.g. better insulation in houses) followed by a succession of increasingly difficult and expensive removal processes, with a very long tail.

at the far end you have things like removing carbon emissions from long haul flights, which is extremely difficult - once it gets this hard, it's much more practical to use solar to power carbon removal.

we can't wait for emissions to go to zero before investing in carbon capture, because we'd be waiting decades for battery energy density to increase by orders of magnitude before batteries are good enough to support long haul flights. We don't have that long.


Agreed on the need to invest in carbon capture now. My main argument was that a) reducing emissions to ~0 is going to be necessary even if we invent some insanely efficient carbon sequestration tech; b) the longer we wait, the worse the problem gets, in an exponential way. Emissions reproduce; c) Therefore the bulk of our effort should be spent trying to get emissions down to 0. Because if we fail at that (a task that we know how to accomplish), then no amount of sequestration tech will save us.


Hmm, still not sure I agree. Reducing emissions to close to 0 is definitely very important, but it seems probable that we'll reach a steady state where some niche sources of carbon still persist and are accompanied by corresponding capture. Some niche sources are just unbelievably hard to decarbonise.

If in 20 years 1% of our emissions remain but they are accompanied by capture (with a multiplier to account for inefficiencies in capture) then that would be ok by me.

(You'd still want the extra carbon removal to reverse the warming caused by past emissions ofc)


what does all-stock mean when it's an IPO'd company?

I've always thought that a liquid asset like Uber stock is basically as good as cash, so why do they even bother mentioning it?


There are probably time restrictions on when you can sell the stock. Also, if you cashed out $2.65 billion of Uber stock, the price would go way down.


the skin in the game here is incredible


thanks for the link edit!


Yep, 10 years here and never seen it either. Maybe these people target more touristy areas. Author is a she btw :)


The touristy areas thing is probably a big deal. I've also spent ~10 years in London, some of it in Zone 1 without witnessing any attempted street thefts or knowing of any friends being mugged. Approx 1 week in Athens: two, one targeting me (neither successful)

There might be a difference in street crime between London and Athens but it's not that big.


It definitely happens. I live in N5 and my old flatmate had his phone taken from him right outside Arsenal tube station!

I've also been walking on the footpath on New North Road when two guys on a scooter sped past.


Also in N5 and know a few people that have had their phones stolen this way.


I lived in Peckham for 1.5 years and was threatened three times, each time having to take evasive action. And that was in 2001. It's worse now.


How sure can you be that it's worse now if you haven't lived in Peckham since 2001? There doesn't seem to be any discernible upward trend:

http://www.ukcrimestats.com/Neighbourhood/Metropolitan_Polic...

The overall rate of violent crime in London is lower now than it was in 2001. And Peckham was a much rougher neighborhood in 2001 than it is now.


20 years here (in this stint) and I've never seen it, but I do tend to avoid crowded places (social anxiety) except when travelling at peak times.

I do know people who've been moped'd and mugged - never suffered either myself (although I apparently have a "resting hate scowl" according to my sister which scares people away.)


My wife had her phone robbed from her in this way in Canonbury (N5) 2 years ago.

It was very scary for her since they went up onto the curb and pushed her onto the tarmac as they grabbed it. It happened at around 6am, at most 20 metres from our flat, which was on a quiet terrace of Victorian houses.

It's not just touristy areas. It's probably easier for the motorbike thugs to operate in places which have relatively quiet roads that they can escape on. I suspect that if you live in a nice area, but close to some of the inner city ghettos you are most at risk. I've also seen it happen in and around the parks.


This banner is crazily insufficient. It disappears forever after you visit any other page, without you having to acknowledge its existence.

I just checked fb, and went quickly to my first notification. I didn't really register what the banner was until maybe a second after it loaded - at which point I had already clicked on my first notification. By that point, the banner is gone forever. I can't find any way to get it back.

It's so, so easy to miss this message.


PolyAI | London/Singapore | Onsite/Remote

PolyAI is building the backend machinery to allow computers to have two-way conversations with people. If you've used Siri, you'll probably know that it's okay at handling single-sentence commands like "remind me to call Jane in 2 hours", but anything significantly more complicated - anything that requires a two-way conversation to establish what you want - is much more flaky and inconvenient. At PolyAI, we're building the first developer platform for making scalable, maintainable voice apps that rely on conversations rather than just single commands. This opens up the possibility of Alexa/Google Home skills for Deliveroo, Uber, Postmates etc. that are actually convenient to use and "just work".

We think this is the first step towards a future where voice interfaces are the de facto way of carrying out small to medium sized tasks.

We're a small but fast growing team of 8 people, comprised of (really good!) NLP researchers, and software engineers with a bunch of startup experience behind us. We're healthily funded by Passion and Amadeus and pay competitively.

We're looking for:

* backend engineers * machine learning/NLP engineers

I just want to point out that while we are currently a team of all men, we are super keen to move away from that, and regularly introspect about whether there's anything about our culture that may in any way exclude non-male candidates. Additionally, we are always open to feedback about anything we might be getting wrong here.

Get in contact at angus@poly-ai.com


Update: sorry to say that after some research internally we've come by some new information that means we are actually unable to support remote work for now, contrary to the ad above.


Kudos to you for fact-checking!


Fraud detection at GoCardless (YC 11). We use the same tech for training and production classification: logistic regression (sklearn).

-------

Training/retraining

- we train on an ad-hoc basis, every few months right now, moving to more frequent and regular retraining as we streamline the process

- training is done locally, in memory (we're "medium data", so no need for a distributed process), using a version-controlled IPython notebook

- we extract model and preprocessing parameters from the model and preprocessors fit in the retraining process, and dump them to a JSON config file in the production classification repo
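A rough sketch of that extract-and-dump step (the helper name and exact config fields here are illustrative, not our actual code):

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

def dump_model_config(model, scaler, feature_names, path):
    """Serialise a fitted model + preprocessing params to readable JSON."""
    config = {
        "intercept": float(model.intercept_[0]),
        "features": {
            name: {
                "coefficient": float(coef),
                # min/max seen in training, used for min-max scaling
                "range": [float(lo), float(hi)],
            }
            for name, coef, lo, hi in zip(
                feature_names, model.coef_[0],
                scaler.data_min_, scaler.data_max_,
            )
        },
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=4, sort_keys=True)

# Example: fit on toy data, then dump the config for the production repo
X = np.array([[0.1, 5.0], [0.9, 1.0], [0.4, 3.0], [0.8, 0.5]])
y = np.array([0, 1, 0, 1])
scaler = MinMaxScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)
dump_model_config(model, scaler, ["feature_1", "feature_2"], "model_config.json")
```

Because the dump is plain JSON, every retrain shows up as a readable git diff in the classification repo.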

-------

Production classification

- we classify activity in our system on a nightly cron*

- as part of the cron we instantiate the model using config dumped from the retraining process. This means the config is fully readable in the git history (amazing for debugging by wider eng. team if something goes wrong)

- classifications and P(fraud) get sent to the GoCardless core payments service, which then decides whether to escalate cases for manual review

-------

* We're a payments company, processing Direct Debit bank-to-bank transfers. Inter-bank Direct Debit payments happen slowly (typically >2 days) so we don't need a live service for classifications.

Quite simple as production ML services go, but it's currently only 2 people working on this (we're hiring!).


When using sklearn, I've seen a lot of folks just pickle the model and use that as the interchange format. I like the human-readable interchange format you are using better. I assume you just rolled your own. Why not something like PMML?


Yep, we made our own. I haven't heard of PMML before - quite cool! What we've made is a bit more readable for what we're using it for though, IMO. Looks like this:

    {
        "intercept": 1.0,

        "features": {
            "feature_1": {
                "coefficient": 1.0,
                "range": [0.1, 10.0],
                "mean_feature_score": 1.0,
                "imputation_value": 1.0
            },
            "feature_2": {
                ....
            }
        }
    }


Is this open source? We were looking for something like this.


Sadly not. I'd be totally up for open sourcing if there's clear demand. If you can find it, send me an email at angus@{company_I_work_at}.com

Note that it's very tied to our use case right now: it's only compatible with logistic regression, it currently assumes fixed hyperparameters (we will change this in future though), and it assumes a production pipeline of min-max scaling, imputation, then classification.
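As a rough sketch of the consuming side (assuming the config format shown upthread; the function name and feature names are hypothetical, and the imputation value is assumed to live in post-scaling space), nightly classification can score an account straight from the JSON:

```python
import math

def predict_fraud_probability(config, raw_features):
    """Score one account from a JSON model config: min-max scale each
    feature, impute missing ones, then apply logistic regression."""
    z = config["intercept"]
    for name, params in config["features"].items():
        value = raw_features.get(name)
        if value is None:
            # assumed: imputation value is stored in post-scaling space
            scaled = params["imputation_value"]
        else:
            lo, hi = params["range"]
            scaled = (value - lo) / (hi - lo)  # min-max scaling
        z += params["coefficient"] * scaled
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> P(fraud)

# Hypothetical config in the format shown upthread
config = {
    "intercept": -2.0,
    "features": {
        "payments_per_day": {"coefficient": 3.0, "range": [0.0, 50.0],
                             "imputation_value": 0.1},
        "account_age_days": {"coefficient": -1.5, "range": [0.0, 365.0],
                             "imputation_value": 0.1},
    },
}
# account_age_days is missing here, so its imputation_value is used
p = predict_fraud_probability(config, {"payments_per_day": 40.0})
```

The nice property is that production never needs to unpickle anything - the whole model is a pile of readable numbers in version control.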


PMML is fairly verbose and limited to a particular set of models. It's often easier to pickle the models and then keep tagged versions. I think a human readable format could be created, but since most models are just a pile of numbers it's unclear what is gained.
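The pickle-with-tagged-versions approach might look something like this (the directory layout and timestamp tagging scheme are made up for illustration):

```python
import pickle
from datetime import datetime, timezone
from pathlib import Path

def save_tagged_model(model, model_dir="models"):
    """Pickle a fitted model under a UTC timestamp tag so any past
    version can be reloaded later for comparison or rollback."""
    tag = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(model_dir) / f"model_{tag}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path

def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Works for any picklable object, e.g. a fitted sklearn estimator;
# a plain dict stands in for one here
saved_path = save_tagged_model({"coef": [1.0, -2.0], "intercept": 0.5})
restored = load_model(saved_path)
```

The trade-off versus a JSON config is exactly the one discussed above: pickles round-trip any model, but the git history only shows an opaque binary changed, not which coefficient moved.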


For Logistic Regression we find human readable config makes a lot of sense. It's pretty intuitive if there aren't too many features - if the model starts behaving weirdly, we can sometimes track it down to a change in a single feature using this (especially when viewing recent git diffs).


Sure. I tend to keep my postprocessing of a model under version control. In particular, what features were most helpful for predictions.


Can't really talk about features on here :(


What kind of features do you look at? Obviously I don't expect you to be able to talk specifics, but I'm curious about the generalities.

Also, how did you settle on logistic regression? Have you tried any other models?


Can't really talk about features on here. Any smart fraudster should be watching every single thing I say :)

We're using logistic regression not because it performs the best, but because it's the most understandable. When cases get flagged for manual review, people need to know exactly what seems dodgy about the account, and with logistic regression you can read the exact contribution from each feature to the final fraud probability. Seeing as the features mean something real and tangible (unlike in neural nets), this means a manual reviewer immediately knows which aspects of someone's behaviour are out of the ordinary when they get presented with a new case (we have a really nice internal UI for presenting this). This saves several minutes per case, which really adds up.

Performance-wise, logistic regression is good, but it can't automatically learn non-linearities between a feature's value and its propensity for fraud, and it can't learn about two features that together should indicate a probability of fraud greater than the sum of their individual effects*. If this becomes a problem for us we'll start looking into nonlinear models where the inner workings are somewhat communicable to the manual review team.

* You can alter feature definitions manually to capture nonlinearities (e.g. a feature like "user_has_done_x_and_has_done_y_too"), but this is very manual, and potentially needs to be rewritten/manually re-optimised on every retrain. We don't do this.
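Reading off those per-feature contributions is just coefficient × scaled feature value. A minimal sketch (feature names and config are hypothetical, not our real model):

```python
import math

def explain_case(config, scaled_features):
    """Split a logistic-regression score into per-feature contributions
    to the log-odds, ranked so a reviewer sees the dodgiest signals first."""
    contributions = {
        name: params["coefficient"] * scaled_features[name]
        for name, params in config["features"].items()
    }
    log_odds = config["intercept"] + sum(contributions.values())
    p_fraud = 1.0 / (1.0 + math.exp(-log_odds))  # sigmoid
    ranked = sorted(contributions.items(), key=lambda kv: -kv[1])
    return p_fraud, ranked

config = {
    "intercept": -1.0,
    "features": {
        "chargeback_rate": {"coefficient": 4.0},
        "account_age": {"coefficient": -2.0},
    },
}
p, ranked = explain_case(config, {"chargeback_rate": 0.9, "account_age": 0.1})
# ranked[0] is the feature pushing hardest towards "fraud"
```

This is the readout a neural net can't give you directly: each (feature, contribution) pair maps to something tangible a reviewer can check.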


Just a note on human readability of models: for sure a GLM gives you a human-readable representation for "free", but there are many ways to get the same kind of readability for neural networks. Great article, though, cheers!


Ah interesting! Blind spot in my knowledge right there, thanks for pointing it out


I like this approach - it's the first "medium data" real-world solution I've read about.


Can anyone explain why it's the ad buyers that lose out in this case, not the ad network? Surely it makes way more sense for the ad networks to bear the financial responsibility of preventing ad fraud, not the ad buyer? (The ad equivalent of a money-back guarantee)

How is an ad buyer ever supposed to make an informed decision about how susceptible their chosen ad vendor is to fraud?


The ad networks sometimes do refund money for invalid clicks. However, there's not much incentive to do so, other than maintaining a good reputation. For this reason, I suspect they put just enough work into fraud detection that you see some refunds, enough to convince you they are diligent.

I suspect it's similar for the R&D spend on preventing them in the first place.


At this point, advertisers just assume some level of fraud is part of the deal with any platform.

As long as you're hitting your target ROI, advertising there still makes sense.


The world can be divided up into two types of advertising. DR (Direct Response) or Branding/Awareness.

The goal of DR is to drive an immediate action, e.g. a purchase or newsletter signup. Branding/Awareness is more about keeping the brand/product top of mind for the eventual time when the purchasing is actually done.

Usually small and mid-sized advertisers focus on DR. That's why you see a lot of re-targeting type ads for buying products you abandoned in your shopping cart (exception: large ecommerce).

Then you have large advertisers like the Fortune 500 and beyond. They know that you're not making the purchase right there. Hardly any toothpaste, car, $25k server or retirement account ads lead to a conversion instantaneously. This is Branding/Product advertising. The hope is to keep their product top-of-mind so you'll consider it when you're driving by the dealership or in the toothpaste aisle at Target. This is like your traditional newspaper advertising. Traditional KPIs like CPA used in DR ads don't make sense here. And, due to fraud, CPC and CTR are not that useful.

A lot of brand/product advertisements don't have a good instantaneous KPI, and measuring long-term ROI for a year-long $25k server campaign is a nebulous art at best.

So to wrap up this story. The guys running this fraud operation were spoofing "premium" video sites with $13.00+ average CPMs (this is high); they were going for the most expensive inventory. The people buying ads on "premium" video sites are not DR advertisers. The goal was to capture Branding/Product advertisers dollars.

It's a bit of a misconception that all online advertising is ROI focused. This was true maybe 4 years ago. With younger audiences (40 and under) consuming more video content online versus linear television, there's been an influx of branding dollars coming to "premium" online video.

(Disclosure: My company Adfin provided data for financial estimate for this anti-fraud operation done by WhiteOps)


The frustrating thing about this is how obvious it should be that the publisher account getting paid for the impression is _not_ the premium publisher that it said it was. If your account with, say, AppNexus is registered to "Mikhail Gorsky's Ad Fraud Ring" and it's registering video plays on espn.com/video at a $30 CPM, there are some very, very basic heuristics that AppNexus could run to determine there is something off about this arrangement.


That's my personal position.

The vendors that are letting the supply in (SSPs and exchanges) should do more to verify the supply. This could include verifying the supplier's ID against the provided domain (which would get rid of a lot of crappy arbitrage), additional verification for new suppliers who are generating more traffic, and longer net payment terms for new suppliers to allow for clawing some of it back.

The problem is that many vendors are unwilling to do that. In many cases it's because they still make money on the fraudulent traffic/arbitrage that goes through their platform. Or because they tend to be more accepting of bad data, because adtech is so duct-taped together: people set up their tags/campaigns incorrectly, adservers re-wrap URLs, there's other incorrect re-wrapping (fraud/viewability/attribution), bad JavaScript, bad publisher sites and hostile browser environments. So they default to being more accepting, so as not to lose that revenue.


> younger audiences (40 and under)

Thank you.


They're probably following the Uber model of not considering themselves responsible for anything.


Because Google require you hold them harmless in order to buy on their network.


The ad networks are held responsible by basic economics -- the higher the fraud on a network, the lower the observed utility of advertising on that network, and, therefore, over time the less people are willing to pay to advertise on that network.


When I've purchased ads I've always assumed from the get-go that a certain percent will be fake clicks.

