In our company of 24 employees, we get by with two DGX Sparks. We don't use AI heavily, but each Spark can serve about 6-8 concurrent requests with a full context length of 256k, which is decent. We get about 35 t/s depending on the model we use (currently Qwen3.5 122B A10B and Qwen3 Coder Next), but we might set up a smaller model too for simpler tasks.
This works for us and will work for years to come. It is not SOTA, but it works darn well for our purposes, and we control the compute and data flowing through it, so totally worth it.
That's pretty nice actually. How much KV cache does that model require at full context? That tends to be the main limit on running concurrent requests locally. There's KV quantization, but it has an outsized negative impact on model quality.
I have experimented with both q8 and q4 for KV cache. I can't find any difference between q8 and fp16, but q4 suffers more when the context grows. q8 seems like a good compromise and gives us enough ctx for about 6-8 concurrent, full context sessions. But we have not fully tested those limits yet, as the context windows rarely reach the limit.
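For a rough sense of the numbers behind this, here is a back-of-the-envelope sketch in Python. It assumes a plain transformer where every layer stores one key and one value vector per KV head per token. The layer count, KV head count and head dimension below are made-up placeholders, not the real figures for any Qwen model, and models with hybrid or linear-attention layers will need less. q8 is treated as roughly 1 byte per element and fp16 as 2, ignoring the small overhead for quantization scales.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_element):
    """Estimate the KV cache size for ONE session of a standard attention model."""
    # Factor 2: one key vector and one value vector per token, per layer, per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


GIB = 1024 ** 3

# Hypothetical architecture, chosen only for illustration.
example = dict(n_layers=48, n_kv_heads=8, head_dim=128, context_tokens=256 * 1024)

fp16_gib = kv_cache_bytes(**example, bytes_per_element=2) / GIB
q8_gib = kv_cache_bytes(**example, bytes_per_element=1) / GIB

print(f"fp16: {fp16_gib:.0f} GiB per full-context session")
print(f"q8:   {q8_gib:.0f} GiB per full-context session")
```

Whatever the real architecture is, the cache scales linearly with bytes per element, so going from fp16 to q8 roughly doubles the number of full-context sessions that fit in the same memory.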
This is pretty cool. How would you say that these open models compare to SOTA on coding tasks? I pay $200/mo for Claude Max but honestly this sounds way more fun.
Nowadays I use our local setup 95% of the time, but it is not that long ago that this flipped for me personally.
Context: I have a $20 Claude Code subscription, and have used it for a handful of small-ish projects over the last year, in parallel with local models on my AMD RX 7900 XTX (24GB) at home. Mostly Ministral 14B and more recently Qwen3.6 27B Dense 4q.
Historically, the tooling (inference engines and harness) has been the biggest challenge when using local models. A lot of the benefit of Claude Code was a rather unified and well-oiled agent system. Local setups often bring with them subtle incompatibilities between models, inference engines and agent systems that are not obvious from initial testing, but cause trouble on projects larger than a couple of files.
The Spark setup at work is now at a point where I do not miss Claude, like at all. A big part of this is the harness and the tools available to the agent, most critically a good tool for searching online. I use my Kagi subscription to allow the models to fetch up-to-date information, and the Kagi MCP I use also has a summarizer which is very helpful in avoiding rapidly filling up the context window.
I mostly use Zed and its native agent, which only recently got muuuch better, and on the terminal I use Pi with a minimal selection of extensions (currently pi-kagi-search, pi-smart-fetch, pi-btw and pi-diffloop). I also have Pi in Zed via the ACP, but it does not work so well with some of the extensions. The lack of a built-in permission system is especially a problem when YOLO-mode is the only mode :)
Honestly, as long as you have a model that is decent at tool calling, you're good. Having a solid and stable frame around your model makes a huge difference. The only caveat in all of this is that I spend most of my time on smaller projects and debugging on Linux-based systems, not huge and complex code bases, so your mileage might vary.
The next phase at work is to set up a ChatGPT-like web interface, and so far LibreChat is at the top of my shortlist. We had OpenWebUI for a while, but it is so bad at using MCP tools that it is practically non-functional for us. LibreChat is a bit more work to set up, but the interface and its MCP story are much more solid. The goal is to plug our internal helpdesk, docs and task manager system into LibreChat via MCPs, to give us a quick way to query and gather information that is currently very time-consuming to collect on your own.
I remember using some kind of software around the time of Windows XP, I think, that could replace the chrome/shell so you could design your own GUI entirely, but I can't remember what it was called! I spent a lot of time iterating and experimenting back then, replacing iexplore.exe or whatever the main process was called.
Come to think of it, this could be a nice model to have as the first pass in a more complex agent system, where Needle hands off the results of a tool call to a larger model.
It is incredible how far you get with a single HTML file, containing styles and JS, when building dashboards, small apps and other utilities that can interact with an API or otherwise fetch data from somewhere.
I just drop it on my personal ~ folder on the shared server at work and voilà, everyone can check it out and use it immediately!
And you get sandboxing for free! My company got Tailscale recently, and it's just the final cherry on top: `tailscale serve` my `/tools`, and I don't even have to worry about auth!
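To make the single-file idea concrete, here is a minimal sketch in Python that writes one self-contained HTML page with inline CSS and JS. The endpoint URL, the function names and the page layout are all made up for illustration; the only point is that everything the browser needs ends up in one file that can be dropped into any statically served folder.

```python
import html
import json
from pathlib import Path

# Hypothetical endpoint; replace with whatever API the dashboard should query.
API_URL = "https://example.invalid/api/status"

PAGE_TEMPLATE = """<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{title}</title>
<style>
  body {{ font-family: sans-serif; margin: 2rem; }}
  pre {{ background: #f4f4f4; padding: 1rem; }}
</style>
</head>
<body>
<h1>{title}</h1>
<pre id="out">loading...</pre>
<script>
  // Fetch JSON from the API and show it pretty-printed.
  fetch({api_url})
    .then((response) => response.json())
    .then((data) => {{
      document.getElementById("out").textContent = JSON.stringify(data, null, 2);
    }})
    .catch((error) => {{
      document.getElementById("out").textContent = "Request failed: " + error;
    }});
</script>
</body>
</html>
"""


def build_page(title, api_url=API_URL):
    """Return a complete HTML document with inline CSS and JS."""
    return PAGE_TEMPLATE.format(
        title=html.escape(title),     # safe to place inside HTML text
        api_url=json.dumps(api_url),  # becomes a quoted JS string literal
    )


def write_page(path, title, api_url=API_URL):
    """Write the single-file dashboard to `path` and return the path."""
    target = Path(path)
    target.write_text(build_page(title, api_url), encoding="utf-8")
    return target
```

Calling `write_page("status.html", "Status")` produces the one file to share. Note that the browser will only be able to call the API if it is same-origin or allows cross-origin requests.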
If it weren't for the Norman invasion, English would probably still have the same levels of semi-mutual-intelligibility as the other Scandinavian languages.
well, if the Normans had simply spoken Norse as one would expect Norsemen to do...
I recently did some light research (ok, I ddg'ed) on this topic, as it wasn't that long between the Viking invasions and settling down in claimed territory: "how continuing-to-be-Norse were the Normans?" I was looking at a similar idea to another comment here from a Scandinavian: "would the Normans have maintained enough knowledge of the Norse language to have seen connections to Anglo-Saxon/Olde Ænglish?" (ok, I just wanted to use a ligature)
I didn't find it easy to find specifics in great detail, but interestingly, in William the Conqueror's family tree, his great^n-grandparents and their cohort were frequently marrying French noblewomen for local connections and prestige, but also having children with their "soulmate" Norsewoman side piece, made more convenient because the Norse marriage practice was more akin to "common law marriage" anyway.
I'm not reading or judging anything into this (what noble of any culture wouldn't pursue extramarital relations, hell, the peasants do it too), except that, from the variety of partners, they were clearly maintaining connections to their heritage at least as much as Italian- or Irish-Americans frequently do in the current day.
It's a bit unconnected to all-father. My impression would be that uuldurfadur would be literally "world-father". But it actually means "glory-father"[1]. It's more commonly spelled wuldorfæder. (Also unrelated to the word "wundor" meaning wonder.)
I can recommend the VW e-UP!s from 2013-2016ish. They have very little tech in them but are relatively modern. You can also quite easily tap into the control systems (climate etc) to remote control it with your own hardware: https://docs.openvehicles.com/en/latest/components/vehicle_v...
They are also super fun to drive and, although they have small batteries, they can charge at 40-50 kW, which translates to 10 minutes to ~85% full. We have used a 2013 e-UP! to travel across Europe (~900km) in two days, many times! One charge lasts between one and two hours, depending on speed and weather. We usually cruise at about 90km/h, and the car is basically sipping electrons! The newer models have double the range. I have not owned or tested them, but they might be a decent compromise for longer travels.
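The arithmetic behind those figures can be sketched in a few lines of Python. This assumes constant charging power and constant cruising speed, which real trips will not match exactly (charging tapers off as the battery fills). Battery capacity is not stated above, so the sketch only computes energy delivered and distance per charge, not percentages.

```python
def energy_added_kwh(charge_power_kw, minutes):
    """Energy delivered at a constant charging power (kW) over a time in minutes."""
    return charge_power_kw * minutes / 60


def distance_km(speed_kmh, hours):
    """Distance covered at a constant speed."""
    return speed_kmh * hours


for power_kw in (40, 50):
    print(f"{power_kw} kW for 10 min: {energy_added_kwh(power_kw, 10):.1f} kWh")

for hours in (1, 2):
    print(f"{hours} h at 90 km/h: {distance_km(90, hours):.0f} km")
```

Under these assumptions one charge covers roughly 90 to 180 km, so a 900 km trip works out to somewhere between five and ten driving legs.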
I run this model on my AMD RX7900XTX with 24GB VRAM with up to 4 concurrent chats and 512K context window in total. It is very fast (~100 t/s) and feels instant and very capable, and I have used Claude Code less and less these days.
Laptops and tablets are, as it turns out, not so cheap either. They need to be fixed or replaced at an alarming rate, and they lay claim to a much larger part of a school budget than books ever did. That is part of the reason that we are reverting to pen, paper and books in Norway. First for grades 1-4, but it will be pushed further up the grades as we go, I think.
Yeah this is for extremely poor schools where children share one phone between them. But compared to buying books for every subject that very soon wear out and become obsolete, this has basically no cost.