Hacker News | bdod6's comments

An even better post here on how the Flynn Effect and Reverse Flynn Effect were never real in the first place: https://www.cremieux.xyz/p/the-demise-of-the-flynn-effect


Hi lancefit, you might be interested in looking at doowii.io as well. I'm the CEO, and you can email me at ben [at] doowii [dot] io


Doowii | Full Stack Engineer | Remote (US) | Full-Time

Doowii is on a mission to empower educators with AI-driven data analytics solutions that enhance educational outcomes. We’re a rapidly growing EdTech startup that recently closed a $4.1M fundraising round backed by GSV Ventures, Better VC, Avesta Fund, and more (press release: http://bit.ly/3Rfodmt)

What We Do:

- Provide AI-driven data analytics solutions for schools/universities, essentially serving as an AI data scientist for educators.
- Seamlessly integrate advanced data analytics into major EdTech platforms through partnerships.

About the Role: We’re looking for a Full Stack Engineer who is passionate about making a difference in education. You'll work on developing and maintaining scalable web applications using a modern tech stack. Your role will involve collaborating with other engineers to design and implement new features, optimizing our applications for genAI accuracy, speed, and scalability, and contributing to a high standard of code quality. You'll need a startup hustle mindset.

How to Apply:

Go to this link: https://jobs.ashbyhq.com/Doowii/23a76737-5510-4f44-a6c7-db0e...

Looking forward to hearing from you!



Doowii | Remote, US | Full-Time | Senior Backend Software Engineer

At Doowii (doowii.io), we're on a mission to change education with data. We need a Senior Backend Software Engineer who's eager to tackle backend challenges, from data engineering to ETL processes, and help us empower educators with actionable insights. If you're into building scalable data pipelines and making a real impact, let's talk.

What You'll Do:

- Lead backend projects to enhance our data analytics platform.
- Design systems that are scalable, secure, and efficient.
- Navigate the complex world of data with autonomy and creativity.

We design our data pipelines to work with LLMs.

Must-Haves:

- Bachelor's in Computer Science (or related) with 5+ years in engineering.
- Skills in Python, Java, SQL, plus cloud and server management know-how.
- Independent problem-solving and a knack for cutting through ambiguity.

If you've got a Master’s, are into EdTech, have startup experience, or have dabbled in advanced data analytics tech, we're especially keen to hear from you.

What We Offer:

- Competitive pay ($150K-$190K), equity, unlimited PTO, and comprehensive benefits from day one.
- A mission-driven culture that’s all about improving education.

Don't worry if you don't tick every box. If you're passionate about making a difference in education and have the skills to back it up, we'd love to hear from you. Apply here: https://jobs.ashbyhq.com/Doowii/609a13f1-12ff-4fbf-b058-0889...


I don't reside in the US, but I'm a close match. Are you only looking for US people?


We are flexible depending on timezone.


Are internship positions available?


Fastly


I don't understand the obsession with ClickHouse. While it seems like it fits this particular use case, it still deals with the same limitations and challenges of columnar DBs. Your queries will be very fast with counts/averages, but there's a tradeoff with other operations: inserts are efficient only in bulk, deletes and updates are slow, no secondary indexes...

While Clickhouse can be lightning fast, is it really designed to be a main backend database?


Clickhouse is not something you use for a CRUD backend.

The obsession with Clickhouse is the phenomenal performance for the OLAP use case, a scenario where there were not many open source, easy to install and maintain options. For the most part you can treat it as a “normal” database, insert data into it and query it without messing about with file format conversions and so on. The fact that it is blindingly fast is a big bonus!
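The row-store vs. column-store tradeoff behind this can be sketched in a few lines of pure Python (a toy illustration with made-up data, not ClickHouse itself):

```python
rows = [  # row store: one record per entry
    {"user": "a", "ts": 1, "bytes": 120},
    {"user": "b", "ts": 2, "bytes": 300},
    {"user": "a", "ts": 3, "bytes": 180},
]

cols = {  # column store: one contiguous list per column
    "user": ["a", "b", "a"],
    "ts": [1, 2, 3],
    "bytes": [120, 300, 180],
}

# An aggregate touches exactly one column in the columnar layout...
total = sum(cols["bytes"])                    # scans a single list
# ...but must walk every full record in the row layout.
total_rows = sum(r["bytes"] for r in rows)
assert total == total_rows == 600

# A point update is the reverse: trivial on rows, awkward on columns,
# where every column list must be kept in sync at the same index.
rows[1]["bytes"] = 310
i = cols["user"].index("b")
cols["bytes"][i] = 310
```

The aggregate-heavy OLAP workload lives almost entirely on the first path, which is why the columnar layout wins there.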


> For the most part you can treat it as a “normal” database

While Clickhouse is great at what it does, I would expect a "normal" database to support transactions. Don’t use it to handle bank accounts.


ClickHouse supports transactional (ACID) properties for INSERT queries. It can also replicate data across geographically distributed locations with automatic support for consistency, failover and recovery. Quorum writes are supported as well.

This makes it safe to use ClickHouse for billing data.



I believe Clickhouse is competing with other analytical, OLAP databases, not something like vanilla Postgres or MariaDB or Oracle.

This is not your application backend database.


Secondary indexes are supported by ClickHouse.

https://clickhouse.tech/docs/en/engines/table-engines/merget...

They are not point-lookup indexes; they are sparse (data-skipping) indexes. In practice that is the best you can do without blowing up the storage.
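The sparse-index idea can be sketched in pure Python (a toy with a made-up granule size; ClickHouse's real default is 8192 rows per granule):

```python
import bisect

# Toy sparse index: data is sorted by key, and we keep only one "mark"
# (the first key) per granule of GRANULE rows, instead of one entry per
# row as a B-tree point index would.
GRANULE = 4
data = sorted(range(0, 100, 3))   # 34 sorted keys
marks = [data[i] for i in range(0, len(data), GRANULE)]

def granule_for(key):
    """Index of the single granule that could hold `key`."""
    # Rightmost mark <= key identifies the candidate granule.
    return max(bisect.bisect_right(marks, key) - 1, 0)

g = granule_for(30)
scanned = data[g * GRANULE:(g + 1) * GRANULE]
assert 30 in scanned             # found while reading one granule,
assert len(scanned) <= GRANULE   # not the whole column
```

You trade exact row addressing for a tiny index that fits in memory, which is the "best you can do without blowing up storage" point above.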


No...it's designed to be an OLAP database.


Congrats team!


Can someone explain how this is more powerful than a Python/R-based workflow? E.g., I currently use a combination of .ipynb notebooks, Python scripts, and RStudio, and this feels like it covers everything I need for any data science project.


I think Julia has a cleaner focus on scientific and mathematical computing than either R or Python (both for performance and understanding). i.e. the language is designed in such a way that corresponds more directly to mathematical notation and ways of thinking. If you’ve been in a graduate program that’s heavily mathematical, where you spend equal time doing pen and paper proofs and hacking together simulations and such (and frantically trying to learn a language like R/MATLAB/Python while staying afloat in your courses), you’ll appreciate the advantage of this. To my eyes, Python is too verbose and “computer science-y” and R is too quirky to fulfill this niche (I say this as someone that bleeds RStudio blue, and enjoys using Python+SciPy). I don’t think Julia is aimed at garden-variety / enterprise data science workflows. Caveat—I’m not a Julia user currently, so this is sort of a hot take.

The “Ju” in Jupyter is for Julia, so it’s designed to be used as an interactive notebook language also. The Juno IDE is modeled after RStudio.


> R is too quirky to fulfill this niche

I'd like to offer a counter point or add on to this.

It's quirky enough to have many packages backed by some expert statistician.

I hope Julia get to be successful in this regard too.


The way I wrote that comes off as more dismissive than I intended. I think it’s quirky in the sense that there is a wide variance in styles of accomplishing things in (base) R, so something that appears perfectly natural to me can look foreign to someone else. I think this is partly the user base and partly the language itself, and of course the two are interdependent. To me, it’s a joy to write R code because of its flexibility and power, but I often have dreaded sharing it with others (especially as a beginner). It’s easy to look at someone else’s R scripts and think “this is horrifying”. By the way, this is referring more to scientific/statistical workflows; for more general-purpose data science in R, the Tidyverse (or even just the pipe operator %>% around which the Tidyverse is built) goes a long, long way toward helping people write expressive but readable code.

By contrast, Python feels a bit too rigid/standardized. Everyone’s code looks like it was copy+pasted from a book of truth somewhere. This is good for sharing and engineering, not as good for expressing mathematical ideas.

So whereas R has evolved organically over decades and Python is for everyone (and alternatives like MATLAB or SAS are first and foremost software for industry rather than languages), Julia seems to be thoughtfully purpose-built to be a modern language for numerical/scientific computing. It polishes off the rough edges and blends some of the best features of each language. Again, this is just an impression from someone who already thinks in R but is learning both Python/Julia.

More to your point, maybe Julia is at a stage of development where it’s good for both students (for developing computational and mathematical thinking) and experts (for slinging concise but performant code), but not yet the rank-and-file users looking to just get things done.


Fast for-loops, the ability to micro-optimize numerical code (skip bounds checking in array access, SIMD optimizations), and GPU vector computing that can use the exact same code as the CPU, since Julia functions are highly polymorphic. Your research code is your production code.

Also the macro system allows one to define powerful DSLs (see Gen.jl for AI).


I had this exact same thought when I read the headline. It seems like MS and others are viewing ML as a similar opportunity to Big Data/BI ten years ago. You saw the "democratization of data" as people with little technical skills could suddenly create analytics dashboards within tools like Tableau.

In my opinion, it's far too easy to make a critical mistake during design/implementation of ML to follow this same path. And what's more, if you mess up making an analytics dashboard, it's usually fairly obvious. In ML, there are MANY ways to mess up a model and you have no easy way to tell.

If someone doesn't have the technical experience behind creating these models, I would not trust any output they give me from using one of these tools. And if they do have the experience, they would certainly not be choosing to use one of these tools either.


Can you please elaborate on what kind of critical mistakes a machine would make that someone with a math background would not?

I am building a competing tool, so I am not affiliated with MS, but I do think that AutoML has value.

Machine learning is different from imperative programming in that most of the "programming" is done through experiments rather than an actual "program", hence there is an opportunity to replace programming with compute. I.e., an AutoML platform can create hundreds of models/pipelines and just try them all.

Also, why would you trust a model that was created manually over a model that was auto-created?

When a model is created by AutoML it passes the same validation process as a manually created model, so in both cases the quality of the model should be judged independently of the way it was created.

In addition, all models (regardless of how they were created, human or not) should be monitored for predictive performance. I.e., I will not "trust" any model without continuous verification.
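The "create many candidates and try them all" loop can be sketched as a toy search in pure Python (invented data and deliberately simple candidate models, not any real AutoML system):

```python
import random
import statistics

# Invented data: y is roughly 2x plus noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]
train_x, val_x = xs[:150], xs[150:]
train_y, val_y = ys[:150], ys[150:]

def fit_mean(x, y):                 # candidate 1: predict the mean
    m = statistics.fmean(y)
    return lambda q: m

def fit_linear(x, y):               # candidate 2: least squares via origin
    b = sum(a * t for a, t in zip(x, y)) / sum(a * a for a in x)
    return lambda q: b * q

def mse(model, x, y):
    return statistics.fmean((model(a) - t) ** 2 for a, t in zip(x, y))

# Every candidate is judged on the same held-out split; the winner is
# chosen by validation score, not by how the model was built.
candidates = {"mean": fit_mean, "linear": fit_linear}
scores = {name: mse(fit(train_x, train_y), val_x, val_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
assert best == "linear"
```

This is the core of the argument: validation treats human-built and auto-built models identically.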


A common error is target leakage. An AutoML system will likely consider this a "strong feature". This is where having someone who actually understands the business domain is critical.

There's no question that there's value in AutoML systems, yet most ML production systems I've worked on / seen were way more complex than feature vector -> model -> prediction. You likely have multiple models, pipelines, normalizations, and plain old conditionals. Hard to automate all of this.
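The target-leakage failure mode can be made concrete with a toy example in pure Python (invented churn data; the feature names are hypothetical):

```python
import random

# Hypothetical scenario: predict churn, but one "feature" (refund_issued)
# is actually recorded *after* the churn event, i.e. it leaks the label.
random.seed(1)
n = 1000
churned = [random.random() < 0.2 for _ in range(n)]
tenure = [random.randint(1, 60) for _ in range(n)]               # legit feature
refund_issued = [c and random.random() < 0.95 for c in churned]  # leaked feature

def accuracy(predict, feature):
    return sum(predict(f) == c for f, c in zip(feature, churned)) / n

# A trivial rule on the leaked feature looks phenomenal offline...
acc_leak = accuracy(lambda f: f, refund_issued)
# ...while an honest single-feature rule hovers near the base rate.
acc_tenure = accuracy(lambda f: f < 12, tenure)

assert acc_leak > 0.9          # leakage "wins" validation
assert acc_leak > acc_tenure   # but the feature won't exist at predict time
```

An automated search would happily rank the leaked feature as the strongest signal; only domain knowledge flags that it cannot be known at prediction time.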


Right. I am aiming at the group of companies that have 0 data scientists and would like to avoid hiring one. I assume that their use cases are simple/common and can be automated.

Note that automation is not only building the model, but automating the full life cycle: pre-processing, hyperparameter optimization, pipeline deployment, and monitoring/retraining.


> "Can you please elaborate more on what kind of critical mistakes a machine can make, while someone with math background would not make. I am building a competing tool"

the short answer is: go study stats and the fundamentals of ML instead of asking HN to build your product for you.

> "why would you trust a model which was created manually and not a model which was auto created."

one of many reasons: domain knowledge is important, and math alone can't tell you things are messed up. Contrived example: you build a linear regression model to predict home price, and square footage has a negative coefficient. Math conclusion: bigger house = lower price. Domain knowledge: oh, we are missing a feature and the model can't tell the difference between city homes and rural ones.

there is value to AutoML, but there is a lot of room to go horribly wrong


Again, my point is that for a given data set, an AutoML system is much more efficient and radically cheaper than a human modeler.

You are pointing to an area outside the realm of AutoML (feature engineering/generation), which is domain-specific. But this was not my original question.


This has nothing to do with feature engineering and generation. I never added or changed any features in the example. It is exactly in the realm of AutoML: you run a model, and because you are missing data, your model makes wrong assumptions.

You could argue (which you didn't) that this would fall under model interpretation, but a model in this example would probably fail to generalize and make bad predictions in the future, i.e. slamming home values because they have large square footage.


>In ML, there are MANY ways to mess up a model and you have no easy way to tell.

What about all those businesspeople who only hire analysts to tell them (and their peers) what they want to hear? Now they can tell themselves what they want to hear, having laundered it through a computer.

