deforciant · on May 27, 2025

finally a good alternative to e2b. Minor nit - it would be good to also have a screenshot in the docs showing how to inspect what the agent is working on or how to debug it if it's stuck

deforciant · on May 27, 2025

as a primarily backend developer I find cursor and chatgpt/grok (for more complex components) totally amazing. I can finally build the UIs that I want for my projects :) I think I have good taste (lol), I just could never spend those hours and days polishing.

Now I can ask it to do some frontend while I focus on backend in the meantime.

We just need the sales agent now.

deforciant · on May 15, 2024

going to try on my websites!

deforciant · on Feb 6, 2024

I always thought that fine tuning is more about picking up a style than memorizing information word for word, or at least the facts. What are the next steps to ensure that it doesn't start pulling info from the base knowledge and references the docs instead? How long does it usually take to train? 10-15 minutes on what doc size?

lewq · on Feb 6, 2024

Fine tuning is just more training -- so it's definitely possible to teach the model facts this way too.

In practice we've found that it's a bit of a balancing act to teach the model the new knowledge without destroying existing knowledge, but it's just a matter of tuning the parameters carefully. We're also researching whether we can fine-tune a brand new expert in a MoE model like Mixtral. I've also seen work on fine-tuning just a fixed set of weights. I'm sure there will be more developments in this space soon.

In terms of how you refer to new knowledge and not base knowledge, like many things in LLMs, you just ask the LLM :-) For example, if you look at this session https://app.tryhelix.ai/session/62905598-b1b7-4d93-bc39-5a93... and click "Show Info" at the top, you can see the system prompt is:

"You are an intelligent chatbot named Helix that has been fine-tuned on document(s) e1ef2e896c in document group 62905598b1. The document group contains 1 document(s). The user will ask you questions about these documents: you must ONLY answer with context from the documents listed. Do NOT refer to background knowledge."

It does a pretty good job at this, although I'm sure there are ways to improve it further.

Referencing the specific document IDs in the fine-tuning was an innovation that has really helped us.

In terms of training time, yeah - 5 minutes on a news article, 10 minutes on a typical length paper. Pretty usable. We're experimenting with reducing the number of epochs and increasing the learning rate to make it faster at that too.
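For readers who want to see the shape of that prompt, here is a minimal sketch that rebuilds the quoted system prompt from a list of document IDs. The function name `build_system_prompt` and its parameters are hypothetical, made up for illustration; this is not Helix's actual code, only a guess at how such a prompt could be templated.

```python
def build_system_prompt(doc_ids, group_id, bot_name="Helix"):
    """Hypothetical helper: template the system prompt quoted above.

    doc_ids and group_id are the identifiers the model was fine-tuned with,
    so the prompt can point the model at that specific knowledge.
    """
    ids = ", ".join(doc_ids)
    return (
        f"You are an intelligent chatbot named {bot_name} that has been fine-tuned on "
        f"document(s) {ids} in document group {group_id}. "
        f"The document group contains {len(doc_ids)} document(s). "
        "The user will ask you questions about these documents: you must ONLY answer "
        "with context from the documents listed. Do NOT refer to background knowledge."
    )


# IDs taken from the prompt quoted in the comment above.
prompt = build_system_prompt(["e1ef2e896c"], "62905598b1")
```

Keeping the IDs as parameters means the same template works for any document group, assuming the fine-tuning data referenced the same IDs.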

gbickford · on Feb 8, 2024

Have you tried generating two sets of qapairs, one with bad answers, and using DPO?

lewq · on Feb 8, 2024

Not yet, sounds promising!
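A rough sketch of what the data for that suggestion might look like: each question is paired with a preferred answer and a deliberately bad one. The `prompt` / `chosen` / `rejected` field names follow a common convention for preference datasets, but the class and function names here are hypothetical, the bad answer is invented for illustration, and no DPO training step is shown.

```python
from dataclasses import dataclass, asdict


@dataclass
class PreferencePair:
    prompt: str    # question about the document
    chosen: str    # answer grounded in the document
    rejected: str  # deliberately bad answer for the same question


def make_pairs(questions, good_answers, bad_answers):
    """Zip two sets of QA pairs into preference pairs (hypothetical helper)."""
    if not (len(questions) == len(good_answers) == len(bad_answers)):
        raise ValueError("all three lists must have the same length")
    return [
        PreferencePair(q, good, bad)
        for q, good, bad in zip(questions, good_answers, bad_answers)
    ]


pairs = make_pairs(
    ["How long does training take on a news article?"],
    ["About 5 minutes."],  # figure from the comment above
    ["About 5 days."],     # made-up bad answer
)
# Plain dicts are easy to dump to JSON lines for a training library.
records = [asdict(p) for p in pairs]
```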

aCoreyJ · on Feb 8, 2024

What is the advantage over using Retrieval Augmented Generation ?

mendeza · on Feb 8, 2024

RAG adds context to the user's question to reduce hallucination. https://docs.llamaindex.ai/en/stable/getting_started/concept...
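To make "adds context to the question" concrete, here is a toy sketch: pick the text chunk that best matches the question and prepend it to the prompt. Word overlap stands in for the embedding search a real system would use, and `tokenize`, `retrieve`, and `build_prompt` are hypothetical names, not LlamaIndex APIs.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, k=1):
    """Return the k chunks sharing the most words with the question.

    Toy stand-in for vector similarity search.
    """
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=1):
    """Prepend the retrieved chunks to the question as context."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy corpus for illustration only.
chunks = [
    "Training takes about 5 minutes on a news article.",
    "Cal.com can be self-hosted.",
]
rag_prompt = build_prompt("How long does training take?", chunks)
```

The model only ever sees the retrieved chunk plus the question, so no training step is needed; the trade-off is that answer quality depends on retrieval picking the right chunk.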

aCoreyJ · on Feb 8, 2024

Actually missed this was covered in the post, thanks

aCoreyJ · on Feb 8, 2024

Actually missed this is answered in the article!

drphilwinder · on Feb 6, 2024

Your sentiment is correct, but it's more of a spectrum. Fine tuning can learn facts (otherwise how would the foundation models learn facts?). But it needs those facts in the training dataset. If you have an infinite amount of facts, then you can memorise all of them.

The challenge arises when it becomes hard to generate that training data. If you just have the raw text and pop that in the context (i.e. RAG), then the LLM can be just as factual without any of that hassle.

Q2: identifiers in the prompt to say "you've been trained on this, only answer questions about this".

Q3: Depends on the size of the training data/docs. For the average PDF, about 30 minutes.

Give it a try!

gpderetta · on Feb 8, 2024

> If you have an infinite amount of facts, then you can memorise all of them

pigeon-hole?

gdiamos · on Feb 8, 2024

Not literally infinite, but Llama2 scale models can handle about 10 trillion tokens.

deforciant · on Aug 23, 2023

happened to me after riding maybe five times on an MT-09 :) Since my iPhone was a bit old anyway, I decided to buy a new one and this time also get the Apple insurance just in case. For navigation I now use Beeline. The way it gives instructions takes a bit more getting used to, but it's also way less distracting than having a phone on your bike.

Ref: https://beeline.co/pages/beeline-moto

deforciant · on May 25, 2023

have you tried other models to generate embeddings? I am going in that direction too, to create an additional layer of helpers for search. Also, I'm thinking that if the document is not too big, it might fit into the initial context with the prompt

deforciant · on Jan 24, 2023

the problem with cal.com "open source" self-hosting is that they have made it quite difficult to run yourself. For example this https://developer.cal.com/self-hosting/docker actually doesn't provide docker images; you need to build it yourself because for some reason the frontend needs a hardcoded hostname. I have not seen such limitations in any other app :) Also, an older version from a year ago just stopped working, couldn't fix it, couldn't update it either :D

It would be good if someone made a fork with fixed setup and docker images for self-hosting :)

saddist0 · on Jan 24, 2023

Why does it need to be a fork and not contribution to this repo?

There is no end of feature requests, but the polite way is to appreciate what the original authors have already done and file your requests as GitHub issues.

Disclaimer: not affiliated with them, just find your comment disrespectful.

candiddevmike · on Jan 24, 2023

One reason would be a CLA. Presumably to contribute to their main repo, you need to sign a CLA to ensure they can relicense this thing as needed. A separate fork wouldn't have that requirement, or shouldn't if it's in good faith.

Phrodo_00 · on Jan 24, 2023

That could be a reason, but I don't see a CLA in their contributing guide[1].

[1] https://github.com/calcom/cal.com/blob/main/CONTRIBUTING.md

candiddevmike · on Jan 24, 2023

IANAL, but that could have some interesting implications for their enterprise licensing/builds I believe. They can't relicense the code for their enterprise builds, so it stays AGPL due to linking/AGPL infection. Would be an interesting court case.

md5wasp · on Jan 24, 2023

I've contributed to this repo (and I also self host Calendly) and didn't have to sign anything.

It _is_ a pain to self-host, but unlike claims elsewhere in this thread I do also self-host the API and database.

pcthrowaway · on Jan 24, 2023

> and I also self host Calendly)

Is the core of Calendly open-source?

I skimmed their repos but it looks like they don't include the "secret sauce" to self-hosting your own event+booking platform: https://github.com/orgs/calendly/repositories

gumby · on Jan 24, 2023

Why would you not want such an agreement? It means the main maintainer(s) have standing if there is a license dispute.

candiddevmike · on Jan 24, 2023

IANAL. As a contributor? It means the company can relicense my contributions into a license that is wildly different from their current one (including no license/copyright). It affords me no benefit.

There are CLA alternatives like the Developer Certificate of Origin (DCO) that ensure the company has the "legal standing" to accept a contribution without infringing on copyright, but it doesn't give them the ability to relicense.

freedomben · on Jan 24, 2023

> It affords me no benefit.

Sure they could relicense it to something wildly different, but they can't retroactively take back old versions of it, so you can still run it as it was when you made the contributions.

I wish nobody required CLAs, but I'm glad that there are products like Cal that would (assumedly) be closed contribution otherwise due to (real or perceived) legal risk.

fragmede · on Jan 25, 2023

> It affords me no benefit.

It means they can sue for open source license violations on your behalf, something that's a bit harder if they don't actually wholly own the copyright.

deforciant · on Jan 24, 2023

Didn't want to sound disrespectful, I did like the UI and liked the idea of self-hosting. Regarding fork vs pr - just putting myself into their shoes I understand why there's probably no will to make the self-hosting easy and potentially make it a bit harder than it should be :)

nxmnxm99 · on Jan 24, 2023

Judging by their English they’re almost certainly from the Baltics so I don’t think their tone was intended as disrespectful

Peer_Rich · on Jan 24, 2023

what's wrong with this image? https://hub.docker.com/r/calcom/cal.com

let me update the docs

splitrocket · on Jan 24, 2023

Instructions aren't for the docker image, they are for the standard js installation.

junon · on Jan 24, 2023

This puzzles me, there are often cases where hostnames are baked into frontends. Also, not everyone wants to use docker, so it's not exactly mandatory to have docker images. Dockerizing most things is rather simple, anyway.

gumby · on Jan 24, 2023

I prefer to simply install rather than use docker.

deforciant · on Nov 20, 2022

Paying for copilot :) at least in Go it’s great for writing tests and sometimes some smaller functions :) totally worth paying for, even from your own pocket if the company wouldn’t allow expensing it

janoc · on Nov 20, 2022

If the company wouldn't pay for it then better think twice, because you could get in hot water with legal. It's not a tool worth risking one's job, or even the company's business, over.

Copilot has a ton of still unresolved legal and compliance issues (copyright violation problems, sending proprietary code to Microsoft as you are writing it, etc.) and most larger businesses won't touch it with a 10 foot pole for that reason. There is even a class action lawsuit against Microsoft over Copilot already.

deforciant · on Oct 2, 2022

It's great actually, you can run it with pretty much no maintenance overhead. I have been using it in prod in multiple companies for years now

deforciant · on May 14, 2022

Could you please split it into multiple tickets that are no bigger than 3 points

ben_w · on May 14, 2022

Splitting it into 3-point tickets is an 11-point ticket all by itself.

dekken_ · on May 14, 2022

I sure love doing things that aren't actual work