
This is genuinely the cutting edge of how you do interesting things with language models like GPT-3 at the moment.

Training these models with extra data turns out to be incredibly expensive and relatively ineffective.

Instead, the most interesting research is all around tricks like this - figuring out ways to round-trip with the language model: sending it a prompt, querying other sources of data for the information it needs, then sending follow-up prompts back to the model.

I wrote a tutorial about a pattern for doing that a couple of weeks ago, but this SQL trick is a lot more sophisticated than what I've done so far: https://simonwillison.net/2023/Jan/13/semantic-search-answer...
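The round-trip pattern can be sketched in a few lines. This is a toy illustration, not the approach from the tutorial or the SQL trick: `search_documents` is a stand-in for a real search index or embeddings lookup, and the final prompt would be sent to an actual LLM API rather than printed.

```python
import re

def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))

def search_documents(question, corpus, limit=3):
    # Toy retrieval step: rank documents by word overlap with the
    # question. A real system would use full-text or vector search.
    q = tokenize(question)
    scored = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return [doc for doc in scored if q & tokenize(doc)][:limit]

def build_prompt(question, context_docs):
    # Second trip to the model: ask it to answer using only the
    # context we just retrieved.
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

corpus = [
    "Datasette is a tool for exploring and publishing data.",
    "GPT-3 is a large language model trained by OpenAI.",
]
docs = search_documents("What is Datasette?", corpus)
prompt = build_prompt("What is Datasette?", docs)
# `prompt` is what you would now send to the language model.
```

The key point is that the model never needs the facts baked into its weights - they arrive in the prompt at query time.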



> Training these models with extra data turns out to be incredibly expensive and relatively ineffective.

I can see that it's expensive, but have you tried it for effectiveness?

BTW, your approach is very cool here.


I've only done two experiments with it myself - training a tagging model on my blog's content and using that to suggest tags for untagged entries - and I found the results very unimpressive for both a cheaper model and the most expensive one.

I've seen a few other people suggest that fine-tuning GPT is unlikely to give better results than just feeding the regular model a few examples in a regular prompt.
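For the tagging use case, that few-shot alternative is just a prompt with worked examples in it. A hypothetical sketch (the entry titles and tags here are invented for illustration):

```python
# Few-shot prompting: instead of fine-tuning, show the model a few
# (entry, tags) pairs directly in the prompt and let it complete
# the pattern for a new entry.

def few_shot_prompt(examples, new_entry):
    lines = ["Suggest tags for each blog entry.", ""]
    for title, tags in examples:
        lines += [f"Entry: {title}", f"Tags: {tags}", ""]
    lines += [f"Entry: {new_entry}", "Tags:"]
    return "\n".join(lines)

examples = [
    ("Notes on running SQL queries against JSON files", "sql, json"),
    ("Deploying a Python web app to Fly.io", "python, deployment"),
]
prompt = few_shot_prompt(examples, "Exploring embeddings for search")
# The model's completion after the final "Tags:" is the suggestion.
```

No training run needed - changing the examples changes the behavior immediately, which is part of why this tends to beat fine-tuning for small tasks.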

I've yet to see anyone talking about a GPT-3 fine-tuning project that went really well for them. Maybe I haven't looked in the right places.



