@karpathy
What if, instead of confirmatory AI research done by academics who write papers and receive funding/grants for hundred-thousand-dollar training runs, AI research could be done by the masses using exploratory techniques on their home PCs?
I've recognized for quite a while that consumer GPUs have become so powerful that one can cover a huge search space with things like toy 4-input/4-output NNs and ways to train different input/output mappings.
I got myself a 7985WX, 256 GB of 8-channel RAM, and dual 5090s to do just this. But I was just playing around with SD LoRA training along with my idea of reverse engineering the training of micro-sized NNs.
Then I discovered nanoGPT and saw that I "could" train GPT2-small in about 27 days on this system. Then, 3 days after nanochat was dropped, I stumbled on it and wow! Another huge perf improvement.
IDEA: With nanochat and the further perf improvements I'm working on, I think we can train a real(1) GPT/LLM in a few days on a 4090/5090, allowing the "throw ideas at the wall to see what sticks" approach. What if hundreds or thousands of people were exploring this landscape? NOTE: I believe I can do full training in under a day on a home system, which I'll post about separately.
(1) What do I mean by a real GPT/LLM? It is something that exhibits both the exciting behaviors seen in the commercial ones and the warts they have. A step up from the nanoGPT 'baby' model. It has to be just powerful enough to be worthy of deep exploration.
I'd also like to take the idea of #193 and expand it so that it is useful not just for an individual keeping track of their own experiments but also for collaboration by the community.