NousResearch Nous Hermes 2 Mixtral 8x7B DPO - Multi GPU

NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO is a high-performance model that combines the Mixtral 8x7B MoE LLM with SFT + DPO fine-tuning. Trained on over 1,000,000 entries of high-quality data, it achieves state-of-the-art results across a variety of tasks.

Model ID: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO

Supported Language(s): en

License: Apache-2.0

Modality: text

Hardware Support

GPU Model      Number of Accelerators    Max Input Tokens    Max New Tokens
NVIDIA H100    2                         32736               32768
NVIDIA H100    4                         32736               32768
NVIDIA H100    8                         32736               32768

Software Included

Package                       Version                      License
NousResearch/Nous-Hermes-2    Hermes-2-Mixtral-8x7B-DPO    Apache-2.0

Deploying this Offering using the Control Panel

Click the Deploy to DigitalOcean button to deploy this offering. If you aren’t logged in, this link will prompt you to log in with your DigitalOcean account.

Deploy NousResearch Nous Hermes 2 Mixtral 8x7B DPO - Multi GPU to DO

Getting Started After Deploying NousResearch Nous Hermes 2 Mixtral 8x7B DPO - Multi GPU

Quickly Get Started With Your 1-Click Models

  1. Access the Droplet Console:

    • Navigate to the GPU Droplets page.
    • Locate your newly created 1-Click Model Droplet and click on its name.
    • At the top of the screen, select and launch the Web Console. It opens in a new window.

  2. Log in via SSH:

    • If you selected an SSH key during Droplet creation, open your preferred SSH client (e.g., PuTTY, Terminal).
    • Use the Droplet’s public IP address to log in as root:

ssh root@your_droplet_public_IP

    • Ensure your SSH key is added to the SSH agent, or specify the key file directly:

ssh -i /path/to/your/private_key root@your_droplet_public_IP

    • Once connected, you are logged in as the root user without needing a password.

  3. Check the Message of the Day (MOTD) for the Access Token:

    • Upon successful login via console or SSH, the Message of the Day (MOTD) is displayed.
    • This message includes important information such as the bearer token. Take note of this token, as you’ll need it to call the inference API for your model; a quick way to re-display it is shown after this list.
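
If you clear your terminal or otherwise lose the token, you can usually re-print the MOTD without logging out. A minimal sketch, assuming an Ubuntu-based image where the generated MOTD is cached at /run/motd.dynamic (the exact path may vary by image):

# Re-print the login banner containing the bearer token;
# on Ubuntu-based images the generated MOTD is typically cached here.
cat /run/motd.dynamic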

Troubleshooting

  1. Please note that the model requires a couple of minutes to load while the Docker container for it starts. During this process, any API calls to the model will time out.
  2. To ensure that Caddy is working, run:
sudo systemctl status caddy
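
If API calls keep timing out, you can poll the endpoint until the container finishes loading. A minimal sketch, assuming the server exposes the OpenAI-compatible /v1/models endpoint on port 80 (replace the placeholders with your Droplet’s IP and the token from the MOTD):

# Poll until the API returns HTTP 200, i.e. the model has finished loading.
until curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer <your_token_here>" \
  "http://<your_droplet_ip>/v1/models" | grep -q '^200$'; do
  echo "Model still loading, retrying in 10s..."
  sleep 10
done
echo "Model is ready."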

Usage Examples

Using cURL

You can call the model’s API with the following cURL command. Replace the placeholders with your Droplet’s IP address, the bearer token from the MOTD, and the deployed model name:

curl -X 'POST' \
  'http://<your_droplet_ip>/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer <your_token_here>" \
  -d '{
    "model": "<model_name>",
    "messages": [{"role":"user", "content":"What is Deep Learning?"}],
    "max_tokens": 64,
    "stream": false
}'
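
To extract just the generated text from the JSON response, you can pipe the output through jq. A minimal sketch, assuming jq is installed on the calling machine:

# Print only the assistant's reply from the chat completion response.
curl -s -X 'POST' \
  'http://<your_droplet_ip>/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer <your_token_here>" \
  -d '{"model": "<model_name>", "messages": [{"role":"user","content":"What is Deep Learning?"}], "max_tokens": 64, "stream": false}' \
  | jq -r '.choices[0].message.content'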

Using Python with huggingface_hub
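
This example requires the huggingface_hub package (pip install huggingface_hub):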

from huggingface_hub import InferenceClient

# Point the client at your Droplet and authenticate with the
# bearer token from the MOTD.
client = InferenceClient(
    base_url="http://<your_droplet_ip>/v1",
    api_key="<your_token_here>",
)

output = client.chat.completions.create(
    model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    messages=[
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

# Print streamed tokens as they arrive; the final chunk may
# carry no content, so guard against None.
for chunk in output:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Using Python with OpenAI library
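
This example requires the openai package (pip install openai):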

from openai import OpenAI

# Point the client at your Droplet and authenticate with the
# bearer token from the MOTD.
client = OpenAI(
    api_key="<your_token_here>",
    base_url="http://<your_droplet_ip>/v1"
)
response = client.chat.completions.create(
    model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    messages=[
        {"role": "user", "content": "What is deep learning?"},
    ],
    stream=True,
    max_tokens=64,
)

# Iterate over the stream and print tokens as they arrive; the
# final chunk may carry no content, so guard against None.
for message in response:
    if message.choices[0].delta.content is not None:
        print(message.choices[0].delta.content, end="")

Because the endpoint is OpenAI-compatible, these examples work with any OpenAI client library, including the JavaScript SDK.