HN isn't even absent of politics; it's really just the front page that is.

Everything we do is political. When we make software and publish it, whether for a company or for ourselves, for sale or for free, there are political implications to those actions.


You cannot pin a certain version, even today. If you are using some vendor LLM, the versions are transient; they are constantly making micro-optimizations and tweaks.

> Demonstrate that one junior plus AI can match a small team’s output.

I don't understand the take that a junior with AI is able to replace a small team. Maybe a horribly performing small team? Even then, wouldn't it just be logical to outfit the small team with AI and then have a small team of small teams?

The alleged increased AI output of developers has yet to be realized. Individuals perceive themselves as having greatly increased output, but the market has not yet demonstrated that with more products (or competitors to existing products) and/or improved products.


Protection from reassignment doesn't really exist in most languages anyway, does it? In C++ you should be able to cast away the const. Realistically you can probably achieve the same in any language with reflection.

Unless a const is literally a compile-time constant inlined into the program, it can likely be changed somehow in most languages.
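
Something like this, as a minimal C++ sketch (the function name is just illustrative; note that const_cast is only well-defined when the underlying object wasn't actually declared const):

    #include <iostream>

    // Illustrative only: strips const from the pointer and mutates
    // through it. Legal here because x below is not really const;
    // doing this to a true const object is undefined behavior.
    void sneakyIncrement(const int* p) {
        ++*const_cast<int*>(p);
    }

    int main() {
        int x = 41;             // not declared const
        sneakyIncrement(&x);    // the "const" promise is bypassed
        std::cout << x << "\n"; // prints 42
    }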


You can definitely protect from reassignment in those languages (e.g. `final` in Java) but they don't completely prevent you from changing the underlying data. Rust would be one that comes to mind that has true immutability. I guess the Go maintainers just didn't want to go down that road, which I get.
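
In C++ terms the same distinction looks roughly like this (a sketch; a `T* const` pointer behaves like Java's `final` reference, fixing the binding but not the data):

    #include <iostream>

    int main() {
        int data = 1;
        int other = 2;

        int* const p = &data;   // like Java's final: the binding is fixed
        // p = &other;          // compile error: cannot reassign p
        *p = 99;                // ...but the underlying data is mutable

        const int* q = &data;   // the other direction: data is read-only
        q = &other;             // ...while the pointer itself can move

        std::cout << data << " " << *q << "\n"; // prints "99 2"
    }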

Sorry, I guess I read "protect from reassignment" as "protect the underlying data" then.

I would argue that if you CAN change the underlying data, then the understanding of const that 99% of people have is incorrect. Therefore it's not really a good feature (in my opinion).


This understates the vegetable and fruit intake you should have: 3 servings of vegetables and 2 of fruit is under what you should aim for, and 2-4 servings of grains is a lot of grain.

Ideally the bulk of the volume you eat should be vegetables and fruits. Eat meat as nutritionally required/when you like it; meat at every meal/every day is not needed. Grains are a good filler, but vegetables and fruits are king.


There are bad questions (and ideas, like you said). Stack Overflow tried to incentivize asking good, novel questions. You grow up often being told "there are no stupid questions," but that is absolutely not the case.

A good question isn't just "how do I do x in y language?" but something more like "I'm trying to do x in y language. Here's what I've tried: <code>, and here is the issue I have: <output or description of issue>. <More details as relevant>"

This does two things: 1. It demonstrates that the asker actually cares about whatever it is they are doing and isn't just trying to get free homework answers. 2. Ideally it forces the asker to provide enough information that an answerer can answer without asking follow-ups.


I'm reminded of this link: http://www.catb.org/~esr/faqs/smart-questions.html

Biggest thing, as someone who has been in Discords geared towards support: you can gear towards either new people or professionals, but walking the line between both is almost impossible.


I believe in gearing towards teachers. Q&A sites are often at their best when the Q and A come from the same source. But it needs to be someone who understands that the Q is common and can speak the language of those who don't know the A. Unfortunately, not a common skillset (typically doesn't pay the bills).

> You grow up often being told "there are no stupid questions" but that is absolutely not the case.

There are no stupid questions, but there are stupid choices about whom to ask.

Often the right choice is yourself.


Of course it's true in the case of SQLite. It's one of the most used pieces of software ever, and user CPU time spent is going to dwarf any developer time.

Your example should instead be:

- 5 hours of developer time to run in 4 seconds * n

- 5 minutes of developer time to run in 5 seconds * n

As long as n <= 17,700, the developer time is "not worthwhile." This assumes that you value user time as much as developer time.
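
Spelled out as a toy calculation (the extra 4 h 55 m of developer time is 17,700 seconds, and each run saves 1 second):

    #include <iostream>

    int main() {
        const double slowDevSeconds = 5.0 * 60;   // 5 minutes of developer time
        const double fastDevSeconds = 5.0 * 3600; // 5 hours of developer time
        const double savedPerRun    = 5.0 - 4.0;  // 1 second saved per run

        // Break-even point: extra developer time / time saved per run.
        const double breakEvenRuns =
            (fastDevSeconds - slowDevSeconds) / savedPerRun;
        std::cout << breakEvenRuns << "\n";       // prints 17700
    }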

In the case of SQLite, the user time may as well be infinite for determining that side of the equation. It's just that widely used.


Also battery life. 20% less time, 20% more battery.

But OP is correct: companies don't care as long as it doesn't translate into higher sales (or lower sales because a competitor does better). That's why you see that sort of optimization mainly in FOSS projects, which are not PDD (profit-driven development).


Or you just make in-person exams the majority of the grade and make the exams brutal. If you can't pass the exams, you don't pass the class, so you need to learn enough to pass the exams.


It's not that the oral format should be dismissed; it's just that the idea of your exam being a conversation with a machine that judges the merits of your time in a course is dystopian. Talking to another human is fine.


How different is it, in essence, from checking boxes to be scanned by a machine and auto-evaluated into a one-dimensional numerical score?

Have exams ever been about humanity and the optics of it?


Very different. A scantron machine is deterministic and non-chaotic.

In addition to being non-deterministic, LLMs can produce vastly different output from very slightly different input.

That’s ignoring how vulnerable LLMs are to prompt injection, and if this becomes common enough that exams aren’t thoroughly vetted by humans, I expect prompt attacks to become common.

Also, if this is about avoiding in-person exams, what prevents students from just letting their AI talk to the test AI?


I saw this piece as the start of an experiment, and the use of a "council of AIs," as they put it, to average out the variability sounds like a decent path to standardization to me (prompt injection would not be impossible, but getting something past all the steps sounds like a pretty tough challenge).

They mention getting 100% agreement between the LLMs on some questions and lower rates on others, so if an exam were composed only of questions where there is near-100% convergence, we'd be pretty close to a stable state.
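
A toy sketch of that filtering idea (hypothetical; real grading would involve rubrics rather than a single integer per model, and the grades here are made up):

    #include <iostream>
    #include <vector>

    // Hypothetical helper: one independent grade per model for a question.
    bool modelsConverge(const std::vector<int>& grades) {
        for (int g : grades)
            if (g != grades.front()) return false; // any disagreement disqualifies
        return true;
    }

    int main() {
        // Rows are questions; columns are independent model grades
        // (no deliberation between models, to avoid context poisoning).
        std::vector<std::vector<int>> exam = {
            {5, 5, 5}, // unanimous -> keep on the exam
            {5, 3, 4}, // divergent -> drop the question
        };
        int i = 0;
        for (const auto& grades : exam)
            std::cout << "question " << i++
                      << (modelsConverge(grades) ? ": keep" : ": drop") << "\n";
    }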

I agree it would be reassuring to have a human somewhere in the loop, or perhaps to allow the students to appeal the evaluation (at cost?) if there is evidence of a disconnect between the exam and the other criteria. But depending on how the questions and format are tweaked, we could IMHO end up with something reliable for very basic assessments.

PS:

> Also, if this is about avoiding in-person exams, what prevents students from just letting their AI talk to the test AI?

Nothing indeed. The arms race has only started here, and it will keep going IMO.


> Nothing indeed.

So the whole thing is a complete waste of time then as an evaluation exercise.

> council of AIs

This only works if the errors and idiosyncrasies of different models are independent, which isn’t likely to be the case.

> 100% agreement

When different models independently graded tests, 0% of grades matched exactly and the average disagreement was huge.

They only reached convergence on some questions when they allowed the AIs to deliberate. This is essentially just context poisoning.

One model incorrectly grading a question will make the other models more likely to incorrectly grade that question.

If you don’t let models see each other’s assessments, all it takes is one person writing an answer in a slightly different way, causing disagreement among the models, to vastly alter the overall scores by getting a question tossed out.

This is not even close to something you want to use to make consequential decisions.


Imagine that LLMs reproduce the biases of their training sets, and that human data sets are biased against nonstandard speakers, treating rural accents/dialects/AAVE as markers of lower intelligence. Do you imagine their grades won't be slightly biased when the entire "council" is trained on the same stereotypes?

Appeals aren't a solution either, because students won't appeal (or possibly even notice) a small bias given the variability of all the other factors involved, nor can it be properly adjudicated in a dispute.


I might be giving them too much credit, but given the tone of the post, they're not trying to apply this to some super precise, extremely competitive check.

If the goal is to assess whether a student properly understood the work they submitted, or more generally whether they assimilated most concepts of a course, the evaluation can have a bar low enough for, let's say, 90% of the students to easily pass. That would give enough of a margin of error to account for small biases or misunderstandings.

I was comparing to mark-sheet tests as they're subject to similar issues, like students not properly understanding the wording (usually the questions and answers have to be worded in pretty twisted ways to work properly) or straight-up checking the wrong lines or boxes.

To me this method, and other largely scalable methods, shouldn't be used for precise evaluations, and the teachers proposing it also seem to be aware of these limitations.


A technological solution to a human problem is a lure we have fallen for too many times these last few decades.

Humans are incredibly good at solving problems, but while one person is solving 'how do we prevent students from cheating,' a student is thinking 'how do I bypass this limitation preventing me from cheating?' And when these problems are digital and scalable, it only takes one student to solve that problem for every other student to have access to the solution.


Not all environments are equal. Some vendor systems have basically non-existent debugging capabilities that end up dumping you into the wild west when things go wrong.

I have worked with more than one Fintech that provides no test systems/debugging capabilities and have spent time on calls with their developers as we walk through production logs. Not fun.

