I just won an auction for 25 computers. What should I set up on them?

Trainguyrom@reddthat.com · edit-2 7 months ago

I just won an auction for 25 computers. What should I set up on them?

Diabolo96@lemmy.dbzer0.com · edit-2 7 months ago

Run 70b llama3 on one and have a 100% local, GPT-4-level home assistant. Hook it up with Coqui’s XTTS-v2 for mind-boggling natural-sounding speech (100% local too) that can imitate anyone’s voice. Now you’ve got yourself Jarvis from Iron Man.

Edit: I thought they were some kind of beast machines with 192 GB of RAM and so on. They’re just regular low-to-mid-tier PCs.

Possibly linux@lemmy.zip · 7 months ago

These are 10-year-old mid-range machines. Llama 7b won’t even run well.

Diabolo96@lemmy.dbzer0.com · edit-2 7 months ago

The key is quantized models. A full-precision model wouldn’t fit, but a 4-bit 8b llama3 would.
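To put rough numbers on "would fit", here is a back-of-the-envelope sketch. It only counts the weights, assuming exactly 16 or 4 bits per parameter; real 4-bit formats store extra scaling data and the context cache needs memory too, so treat the results as lower bounds. The function name `weights_gb` is made up for this example.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    # One billion parameters at 8 bits each is 1 GB, so scale by bits / 8.
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for name, params, bits in [
        ("llama3 70b, 16-bit", 70, 16),
        ("llama3 8b, 16-bit", 8, 16),
        ("llama3 8b, 4-bit", 8, 4),
    ]:
        print(f"{name}: ~{weights_gb(params, bits):.0f} GB of weights")
```

By this estimate the 4-bit 8b model needs about 4 GB for its weights, against about 16 GB at 16-bit.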

Possibly linux@lemmy.zip · 7 months ago

It would fit, but it would be very slow.

Diabolo96@lemmy.dbzer0.com · edit-2 7 months ago

No. Quantization makes it go faster. Not blazing fast, but decent.
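One way to see why: on a CPU, generating each token means reading more or less all of the weights from RAM, so memory bandwidth sets a ceiling on speed. The sketch below is only that ceiling, not a benchmark. The 25.6 GB/s figure is an assumption (the theoretical peak of dual-channel DDR3-1600, plausible for a machine of that age), and `max_tokens_per_second` is a made-up helper name.

```python
def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on generation speed if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    ASSUMED_BANDWIDTH_GB_S = 25.6  # assumption: dual-channel DDR3-1600 peak
    for label, size_gb in [("8b at 16-bit", 16.0), ("8b at 4-bit", 4.0)]:
        ceiling = max_tokens_per_second(ASSUMED_BANDWIDTH_GB_S, size_gb)
        print(f"{label}: at most ~{ceiling:.1f} tokens/s")
```

Under those assumptions the 4-bit model has a ceiling about four times higher than the 16-bit one, simply because there is a quarter as much data to move per token. Actual speeds will be lower.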

SaintWacko@midwest.social · 7 months ago

I tried doing that on my home server, but running it on the CPU is super slow, and the model won’t fit on the GPU. Not sure what I’m doing wrong.

Diabolo96@lemmy.dbzer0.com · 7 months ago

Sadly, I can’t really help you much. I have a potato PC, and the biggest model I’ve run on it was Microsoft’s phi-2 using the candle framework. I used to tinker with llama.cpp on Colab, but it seems it doesn’t handle llama3 yet. ollama says it does, but I’ve never tried it. As for the speed, it’s kind of expected for a 70b model to be really slow on the CPU. How slow is too slow? I don’t really know…

You can always try the 8b model. People say it’s really great and that it has even replaced the 70b models they’d been using.

SaintWacko@midwest.social · 7 months ago

Slow as in I waited a few minutes and finally killed it when it didn’t seem like it was going anywhere. And this was with the 7b model…

Diabolo96@lemmy.dbzer0.com · 7 months ago

That shouldn’t happen with an 8b model. Even on CPU, it’s supposed to be decently fast. There’s definitely something wrong here.

SaintWacko@midwest.social · 7 months ago

Hm… Alright, I’ll have to take another look at it. I kinda gave up, figuring my old server just didn’t have the specs for it

Diabolo96@lemmy.dbzer0.com · 7 months ago

Specs? Try mistral with llama.cpp.

SaintWacko@midwest.social · 7 months ago

It has an Intel Xeon E3-1225 V2, 20 GB of RAM, and a Strix GTX 970 with 4 GB of VRAM. I’ve actually tried Mistral 7b and Decapoda Llama 7b, running them in Python with Hugging Face’s Transformers library (from local models).

Diabolo96@lemmy.dbzer0.com · edit-2 7 months ago

Yeah, it’s not a potato, but it’s not that powerful either. Nonetheless, it should run 7b/8b/9b models, and maybe 13b ones, easily.

running them in Python with Huggingface’s Transformers library (from local models)

That’s your problem right there. Python is great for building LLMs but horrible at running them. With a computer as weak as yours, every bit of performance counts.

Just try ollama or llama.cpp. Their GitHub repos are also a goldmine for other projects you could try.
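For anyone who wants to script against ollama rather than use its command line, here is a minimal sketch using only the standard library. It assumes an ollama server is already running on its default local port and that a model tagged `llama3` has been pulled; the helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server up, `print(generate("Say hello in five words."))` should print the model’s reply. The long timeout is deliberate, since the first call on older hardware may spend a while loading the model.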

llama.cpp can run part of the model on the GPU for much faster inference.
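A rough sketch of what partial offload could look like from Python, using the third-party `llama-cpp-python` bindings (an assumption here; the thread only mentions llama.cpp itself). `layers_that_fit` is a made-up helper that guesses how many layers to offload by pretending every layer is the same size and keeping some VRAM in reserve. GPU offload also needs the bindings built with GPU support, and whether an older card is supported is something to check separately.

```python
def layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                    reserve_gb: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers."""
    per_layer_gb = model_gb / n_layers
    # Keep some VRAM free for the context cache and the desktop.
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))


def load_model(model_path: str, n_gpu_layers: int):
    """Load a GGUF model, placing n_gpu_layers layers on the GPU."""
    # Imported here so the estimate above works without the package installed.
    from llama_cpp import Llama  # third-party: pip install llama-cpp-python

    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers, n_ctx=2048)
```

For a roughly 4 GB, 32-layer model on a 4 GB card, `layers_that_fit(4.0, 32, 4.0)` suggests offloading 24 layers and leaving the rest on the CPU. Something like `llm = load_model("model.Q4.gguf", 24)` followed by `llm("Hello", max_tokens=32)["choices"][0]["text"]` would then generate text, where the file name is a placeholder for whatever quantized GGUF file you downloaded. If loading runs out of VRAM, lower the layer count.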

Piper is a pretty decent, very lightweight TTS engine that runs directly on your CPU, if you want to add TTS capabilities to your setup.

Good luck and happy tinkering!