How does the S1000 connect to my computer?
The S1000 connects via your local network — Ethernet (recommended) or WiFi. It runs its own operating system (Ubuntu-based) and appears as a compute node on your network. The Bizon Z-Hub app on your Mac, PC, or iPhone communicates with it automatically. No Thunderbolt cable, no PCIe passthrough — it's a networked GPU appliance.
Can I use my own GPU?
Yes. The S1000 accepts any standard PCIe full-length GPU. Install your own NVIDIA RTX 5090, RTX PRO 6000, or other compatible card. You can also buy a pre-configured bundle with the GPU already installed and tested.
How does clustering work?
When you connect multiple S1000 units to the same network, the Bizon Z-Hub App detects them and forms a cluster. The cluster pools VRAM from all GPUs, letting you run models that exceed a single GPU's capacity. For example, two S1000s with RTX 5090s give you 64 GB of total VRAM, enough to run larger models that wouldn't fit on a single card.
Do I need to install GPU drivers on my Mac or PC?
No. GPU drivers run on the S1000 itself, not on your Mac or PC. Your computer just runs the Bizon Z-Hub App management app. This is what makes it work with macOS — you're not trying to install NVIDIA drivers on a Mac. The GPU compute happens entirely on the S1000.
What models can I run?
Any model supported by Ollama or vLLM. This includes Llama, DeepSeek, Qwen, Gemma, Mistral, Nemotron, GLM, MiniMax, and thousands more. The model library in Bizon Z-Hub App lets you search, filter, and download with one click.
How is this different from an eGPU (Thunderbolt enclosure)?
Traditional eGPUs connect via Thunderbolt, require GPU drivers on the host computer, and don't work with macOS for NVIDIA GPUs. The S1000 is a fully self-contained compute node with its own CPU, RAM, and OS. It connects over your network, so it works with any operating system — including macOS, which has no NVIDIA driver support. It's closer to having a private GPU server on your desk than an eGPU enclosure.
Can I access it remotely?
Yes. The Bizon Z-Hub App works over any network connection, including VPN. Manage your S1000 from anywhere in the world. The iPhone app gives you the same monitoring and control when you're away from your desk.
Can I use an NVIDIA RTX 5090 with a Mac?
Not inside the Mac itself — macOS has no native NVIDIA driver support, so you can't install an RTX 5090 in a Mac. The S1000 solves this by running the GPU on its own Ubuntu-based system and exposing it to your Mac over the network. You manage everything from the Bizon Z-Hub app on macOS while the RTX 5090, RTX PRO 6000, or H200 does the compute. It's the practical way to pair Apple Silicon with a dedicated NVIDIA GPU.
What is the largest LLM I can run on the S1000?
It comes down to GPU VRAM. A 32 GB RTX 5090 runs quantized models up to roughly 30B parameters; a 96 GB RTX PRO 6000 runs 70B models at full FP16; and a 141 GB H200 runs 100B+ parameter models. Need more than a single card holds? Cluster multiple S1000 units and their VRAM pools automatically via exo, letting you run models like DeepSeek-R1 or Llama 405B that don't fit on any one GPU.
How is this different from a Mac Studio with large unified memory?
Apple Silicon shares a single pool of unified memory between CPU and GPU at relatively low bandwidth and runs inference on Metal/MLX. The S1000 gives you a dedicated NVIDIA CUDA GPU with up to 4.8 TB/s of memory bandwidth and native support for CUDA inference engines like vLLM — typically much faster token generation on large models. You keep using your Mac as the frontend and offload the heavy compute to the S1000.
Is the S1000 a good alternative to the NVIDIA DGX Spark?
They target the same buyer but differ in one key way: the DGX Spark is a fixed mini-PC with unified memory you can't upgrade, while the S1000 lets you choose — and later swap — the NVIDIA GPU. You get dedicated VRAM with far higher bandwidth, the ability to scale by clustering multiple boxes, and compatibility with your existing Mac or PC. See the full S1000 vs. DGX Spark comparison above.
How much power does it use and how loud is it?
Power draw depends on the GPU — roughly 575 W for an RTX 5090, 600 W for an RTX PRO 6000, or 700 W for an H200 under full load, plus modest system overhead, all from a standard wall outlet. Noise is configurable: the Bizon Z-Hub app offers Quiet, Balanced, and Mad Max power modes with custom fan curves, so you can keep it near-silent at your desk or unlock maximum performance when you need it.
Can I fine-tune or train models, or is it inference only?
It's a full, dedicated NVIDIA GPU, so it handles fine-tuning and LoRA training as well as inference. You have the complete CUDA stack available — PyTorch, Hugging Face, vLLM, and more — over SSH or the app's built-in terminal. The one-click app workflow is tuned for running models, but nothing stops you from launching training and fine-tuning jobs on the same hardware.
Can I connect my own apps, like Open WebUI or coding assistants?
Yes. Models served through Ollama or vLLM on the S1000 expose an OpenAI-compatible API endpoint on your local network. Any tool that speaks the OpenAI API — Open WebUI, LM Studio, Continue, Cursor, or your own scripts — can point at your S1000 instead of a cloud provider, with no change beyond the base URL.