How to Run DeepSeek-R1 Locally, a Free Alternative to OpenAI’s o1 Model: Hardware Requirements

Introduction


Large Language Models (LLMs) have transformed natural language processing by enabling advanced use cases ranging from question answering to code generation. DeepSeek-R1 is one of the latest open reasoning models, designed to deliver strong performance on tasks such as question answering, code generation, text summarization, semantic search, and classification. Thanks to open-source tooling, you can deploy DeepSeek-R1 locally with frameworks such as Ollama, vLLM, and Transformers, keeping its capabilities on-premises or in your own private environment.


This article will walk you through everything you need to know to get DeepSeek-R1 up and running, including:

  • A step-by-step guide to installing DeepSeek-R1 with Ollama, vLLM, and Transformers
  • DeepSeek-R1’s GPU requirements and considerations
  • A detailed chart matching model types to NVIDIA GPUs and VRAM requirements
  • Recommended AI NVIDIA workstations from BIZON for powering your DeepSeek-R1 deployments

Prerequisites

Before installing DeepSeek-R1, ensure you have the following in place:

  • Python Environment: Python 3.8 or higher is recommended, plus Pip or Conda.
  • GPU (optional, but recommended): An NVIDIA GPU with sufficient VRAM. CPU-only mode works but is slower.
  • CUDA and Drivers: For GPU usage, install the appropriate NVIDIA drivers and CUDA toolkit.
  • Git (optional): Useful for cloning repositories.
  • Disk Space: LLMs can occupy multiple gigabytes, so ensure you have enough free space.

Step-by-Step Installation with Ollama

Ollama is a command-line application that simplifies running LLMs locally. Below are the general steps to install and use DeepSeek-R1 with Ollama.


1. Install Ollama

Mac (Apple Silicon / Intel):

brew install ollama

Linux: Use the official install script (manual packages are also available on Ollama’s GitHub):

curl -fsSL https://ollama.com/install.sh | sh

Windows (via WSL):

First, install WSL by opening PowerShell as an administrator and running:

wsl --install

Follow any on-screen instructions and reboot if prompted. Once WSL is installed, open your WSL distribution (e.g., Ubuntu) and follow the Linux steps above inside the WSL environment.


2. Download DeepSeek-R1 for Ollama

ollama pull deepseek-r1

This retrieves the DeepSeek-R1 model in GGUF format, optimized for local inference.


3. Run DeepSeek-R1 with Ollama

ollama run deepseek-r1:<MODEL_CODE>


Replace <MODEL_CODE> with the tag of the variant you pulled (for example, 1.5b, 7b, 14b, 32b, or 70b). An interactive session will start in your terminal, allowing you to enter prompts and receive responses.


4. GPU Acceleration (Optional)

If you have a supported GPU and the correct CUDA setup, Ollama can offload model parts to the GPU for faster inference. Refer to Ollama’s documentation to confirm GPU backend support and configurations.
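
Once the Ollama server is running, it also exposes a local REST API, by default at http://localhost:11434. Below is a minimal sketch of calling it from Python; it assumes the deepseek-r1 model pulled above, the default port, and that the requests package is installed.

import requests  # pip install requests

# Send a single prompt to the local Ollama server (default port 11434).
payload = {
    "model": "deepseek-r1",  # or a specific tag such as "deepseek-r1:7b"
    "prompt": "Summarize what DeepSeek-R1 is in two sentences.",
    "stream": False,         # return one JSON object instead of a token stream
}
response = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
response.raise_for_status()
print(response.json()["response"])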


Step-by-Step Installation with vLLM

vLLM is a high-throughput, optimized inference engine suitable for production scenarios. Below is how you can install and run DeepSeek-R1 with vLLM.


1. Install vLLM

conda create -n vllm_env python=3.9
conda activate vllm_env
pip install vllm

2. Download DeepSeek-R1

Clone from the Hugging Face Hub. The full model lives at deepseek-ai/DeepSeek-R1, but for local use a distilled variant such as DeepSeek-R1-Distill-Qwen-7B is far more practical:

git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

3. Run DeepSeek-R1 with vLLM

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
--port 8000

Replace the model identifier with a local path or another DeepSeek-R1 variant as needed. vLLM exposes an OpenAI-compatible API, so you can then send requests to http://127.0.0.1:8000.


4. Example Usage

curl -X POST http://127.0.0.1:8000/v1/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
"prompt": "Explain the significance of natural language processing:",
"max_tokens": 100
}'
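
Because the server speaks the OpenAI-compatible API, you can also query it from Python with the official openai client instead of curl. A minimal sketch, assuming the server from step 3 is running on port 8000 and serving the 7B distilled model:

from openai import OpenAI  # pip install openai

# Point the client at the local vLLM server; the API key is required by the client but unused.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # must match the model the server is running
    prompt="Explain the significance of natural language processing:",
    max_tokens=100,
)
print(completion.choices[0].text)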

5. GPU Utilization

To control which GPUs are used, set the CUDA_VISIBLE_DEVICES environment variable. For multi-GPU setups, pass --tensor-parallel-size N when starting the server to shard the model across N GPUs for higher throughput.
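
vLLM can also be used as a Python library for offline batch inference, without running a server. The sketch below assumes two local GPUs and the 7B distilled checkpoint; adjust tensor_parallel_size (and the model path) to match your hardware.

from vllm import LLM, SamplingParams

# Load the model sharded across two GPUs via tensor parallelism.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    tensor_parallel_size=2,  # set to 1 for a single GPU
    dtype="float16",
)

sampling = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What are the key use cases for semantic search?"], sampling)
print(outputs[0].outputs[0].text)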


Step-by-Step Installation with Transformers (Hugging Face)

Transformers by Hugging Face is one of the most popular libraries for working with LLMs. Below is how to install and use DeepSeek-R1 via Transformers.


1. Install Transformers

pip install transformers accelerate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Adjust the CUDA version in the torch installation URL to match your system.
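
After installing, a quick check confirms that PyTorch can actually see your GPU; if it prints False, inference will fall back to the CPU.

import torch

# True means CUDA is usable; otherwise re-check your NVIDIA driver and CUDA toolkit.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))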


2. Load DeepSeek-R1

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# The full 671B DeepSeek-R1 will not fit on a single GPU; use a distilled variant for local runs.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

If you have a local copy, set model_id to your local path instead.


3. Inference Example

prompt = "What are the key use cases for semantic search?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_tokens = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))

4. Performance Tuning

Use mixed precision (FP16, BF16) or 8-bit/4-bit quantization (via the bitsandbytes integration) to reduce VRAM usage. Hugging Face’s accelerate library can also distribute the model across multiple GPUs for better performance.
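
As an illustration, here is a minimal sketch of 4-bit loading with Transformers’ BitsAndBytesConfig. It assumes the bitsandbytes package is installed and, as above, uses the 7B distilled checkpoint.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with FP16 compute roughly quarters the weight memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers across the available GPUs/CPU
)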


DeepSeek-R1 and GPU Requirements


The following table outlines approximate VRAM requirements for DeepSeek-R1 variants at standard precision (e.g., FP16), along with recommended GPUs for inference. For the largest models (hundreds of billions of parameters), multi-GPU setups with tensor or pipeline parallelism are necessary.


| Model | Parameters | VRAM Requirement (GB) | Recommended GPU |
|---|---|---|---|
| DeepSeek-R1-Zero | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) |
| DeepSeek-R1 | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~3.5 GB | NVIDIA RTX 3060 12GB or higher |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~16 GB | NVIDIA RTX 4080 16GB or higher |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~18 GB | NVIDIA RTX 4080 16GB or higher |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~32 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x2) |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~74 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x4) |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~161 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x2) |

The table below covers approximate VRAM requirements when using 4-bit quantization. This significantly reduces memory usage compared to standard precision, but the largest variants may still require multi-GPU solutions:


| Model | Parameters | VRAM Requirement (GB, 4-bit) | Recommended GPU |
|---|---|---|---|
| DeepSeek-R1-Zero | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) |
| DeepSeek-R1 | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~1 GB | NVIDIA RTX 3050 8GB or higher |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~4 GB | NVIDIA RTX 3060 12GB or higher |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~4.5 GB | NVIDIA RTX 3060 12GB or higher |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~8 GB | NVIDIA RTX 4080 16GB or higher |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~18 GB | NVIDIA RTX 4090 24GB or higher |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~40 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 24GB x2) |

*Note: The VRAM requirements above are approximate. Additional techniques such as sharding the model across GPUs, offloading layers to CPU memory, or more aggressive quantization may reduce per-GPU memory usage further if your GPU falls short of these estimates.
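
As a rough rule of thumb behind these figures, weight memory is the parameter count times the bytes per parameter (2 bytes for FP16, about 0.5 bytes for 4-bit), plus overhead for activations and the KV cache. A small back-of-the-envelope helper (illustrative only):

def estimate_weight_vram_gb(params_billions: float, bits_per_param: float) -> float:
    """Weight-only VRAM estimate in GB; real usage adds activations and KV cache."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

print(estimate_weight_vram_gb(671, 16))  # ~1342 GB at FP16, matching the first table
print(estimate_weight_vram_gb(671, 4))   # ~336 GB at 4-bit, matching the second table
print(estimate_weight_vram_gb(7, 4))     # ~3.5 GB, close to the ~4 GB listed for the 7B distill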




Recommended AI NVIDIA Workstations from BIZON

If you’re deploying models like DeepSeek-R1 in a professional or research setting, having a powerful workstation is critical. BIZON offers custom-built AI workstations optimized for deep learning and heavy compute loads. Below are three recommended configurations:

    • Entry-Level to Mid-Range AI

      BIZON X5500

      x5500
      • CPU: Threadripper PRO CPUs up to 96 cores
      • GPU: Up to 2× NVIDIA RTX 4090/5090, or up to 4× NVIDIA A800/A6000
      • RAM: Up to 1 TB ECC DDR4/DDR5 (CPU dependent)

      Excellent for small to mid-scale projects, data science workloads, and running or fine-tuning the smaller DeepSeek-R1 distilled models.



  • Mid-Range to Advanced AI

    BIZON ZX5500

    zx5500
    • CPU: Threadripper PRO CPUs up to 96 cores
    • GPU: Up to 7× water-cooled NVIDIA H200 / H100 / RTX 4090 / RTX 5090 GPUs
    • RAM: Up to 1 TB ECC DDR4/DDR5 (CPU dependent)
    • Storage: Multiple NVMe SSDs (RAID configurations available)

    Perfect for research labs or enterprises needing robust multi-GPU performance for larger DeepSeek-R1 deployments and data-intensive workflows.


  • High-End AI & HPC

    BIZON X7000

    x7000
    • CPU: Dual AMD EPYC CPUs up to 384 cores
    • GPU: Up to 8× NVIDIA RTX 4090 / RTX 5090 / A100 / H100 / H200
    • RAM: Up to 3 TB ECC DDR4/DDR5
    • Storage: Multiple NVMe SSDs with optional RAID setups

    Designed for mission-critical, large-scale DeepSeek-R1 training or inference, capable of handling the largest model variants and complex HPC workloads.


Why BIZON?

  • Customization: Tailor CPU, GPU, memory, and storage to your specific workloads.
  • Thermal Design: BIZON’s high airflow and liquid cooling solutions ensure optimal GPU temperatures.
  • Support & Warranty: Specialized deep learning support and flexible upgrade paths.

Conclusion

DeepSeek-R1 is a powerful language model that you can run locally for maximum control and privacy. Whether you choose Ollama, vLLM, or Hugging Face’s Transformers, you have flexible deployment options to integrate DeepSeek-R1 into your workflow. GPU acceleration is highly recommended for any medium or large-scale project, with VRAM requirements scaling alongside model size.


Here’s a quick recap of the key points:

  • Ollama: User-friendly CLI for macOS, Linux, and Windows (via WSL), offering quick setups.
  • vLLM: Optimized for production with high throughput.
  • Transformers: A robust, feature-rich library for custom NLP workflows.
  • GPU Requirements: Match the model size to an appropriate GPU; from RTX 3060 for smaller models to A100/H100 for the largest models.
  • BIZON Workstations: A turnkey solution for AI, built for heavy workloads and backed by expert support.

With a properly configured system—be it a single-GPU workstation or a multi-GPU powerhouse—you can unlock the full potential of DeepSeek-R1 for applications like semantic search, question-answering, text summarization, and beyond. Refer to this guide whenever you need a refresher on installing and running DeepSeek-R1 locally, and don’t hesitate to explore BIZON workstations for a scalable, high-performance solution.

Need Help? We're here to help.

Unsure what to get? Have technical questions?
Contact us and we'll help you design a custom system that meets your needs.
