The google/gemma-2-9b-it model is a lightweight, state-of-the-art open model from Google, built using Gemini research and technology. It is designed for text generation tasks such as question answering, summarization, and reasoning, and can be deployed in resource-limited environments.
Supported Language(s): en
License: Gemma
Modality: text
GPU Model | Number of accelerators | Max Input Tokens | Max New Tokens |
---|---|---|---|
NVIDIA H100 | 1 | 4064 | 4096 |
NVIDIA H100 | 2 | 4064 | 4096 |
NVIDIA H100 | 4 | 4064 | 4096 |
NVIDIA H100 | 8 | 4064 | 4096 |
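The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, assuming the prompt's token count is already known (for example, from a tokenizer) and treating the two columns as independent caps, which is an assumption; the server may enforce a combined limit. The names `MAX_INPUT_TOKENS`, `MAX_NEW_TOKENS`, and `clamp_request` are hypothetical and not part of the 1-Click App.

```python
# Limits copied from the table above; the constant names are illustrative.
MAX_INPUT_TOKENS = 4064
MAX_NEW_TOKENS = 4096


def clamp_request(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return a max_tokens value that respects the documented limits.

    Raises ValueError if the prompt exceeds the input limit or if the
    requested number of new tokens is not positive.
    """
    if prompt_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; limit is {MAX_INPUT_TOKENS}"
        )
    if requested_new_tokens < 1:
        raise ValueError("requested_new_tokens must be at least 1")
    # Cap the generation budget at the documented maximum.
    return min(requested_new_tokens, MAX_NEW_TOKENS)
```

The returned value could then be passed as `max_tokens` in a chat completion request.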
Package | Version | License |
---|---|---|
Gemma 2 | gemma-2-9b-it | gemma |
Click the Deploy to DigitalOcean button to create a Droplet based on this 1-Click App. If you aren’t logged in, this link will prompt you to log in with your DigitalOcean account.
In addition to creating a Droplet from the Google Gemma 2 9B - Multi GPU 1-Click App using the control panel, you can also use the DigitalOcean API. As an example, to create a 4GB Google Gemma 2 9B - Multi GPU Droplet in the SFO2 region, you can use the following curl command. You need to either save your API access token to an environment variable or substitute it into the command below.
curl -X POST -H 'Content-Type: application/json' \
-H "Authorization: Bearer $TOKEN" -d \
'{"name":"choose_a_name","region":"sfo2","size":"s-2vcpu-4gb","image": "digitaloceanai-googlegemma29b"}' \
"https://api.digitalocean.com/v2/droplets"
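The same request can be issued from Python. This is a minimal sketch using only the standard library; it reuses the URL, headers, and JSON body from the curl command, while the function names `build_create_request` and `create_droplet` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request

API_URL = "https://api.digitalocean.com/v2/droplets"


def build_create_request(token: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "name": name,
        "region": "sfo2",
        "size": "s-2vcpu-4gb",
        "image": "digitaloceanai-googlegemma29b",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def create_droplet(token: str, name: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_create_request(token, name)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To mirror the curl command, read the token from the environment and call `create_droplet(os.environ["TOKEN"], "choose_a_name")` after importing `os`.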
Access the Droplet Console:
Log in as the root user using the password you set during Droplet creation, or connect over SSH as root:
ssh root@your_droplet_public_IP
+ Ensure your SSH key is added to the SSH agent, or specify the key file directly:
ssh -i /path/to/your/private_key root@your_droplet_public_IP
+ Once connected, you will be logged in as the root user without needing a password.
Check the Message of the Day (MOTD), displayed when you log in, for the access token.
To check the status of the Caddy service, run:
sudo systemctl status caddy
You can call the API with this cURL command, replacing the placeholders with your Droplet's IP address, access token, and model name:
curl -X 'POST' \
'http://<your_droplet_ip>/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <your_token_here>' \
-d '{
"model": "<model_name>",
"messages": [{"role":"user", "content":"What is Deep Learning?"}],
"max_tokens": 64,
"stream": false
}'
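For environments without extra packages, the cURL call can be reproduced with the Python standard library. The sketch below assumes the response follows the OpenAI chat completion layout (`choices[0].message.content`), which is what the request path suggests; `build_chat_request`, `extract_reply`, and `ask` are hypothetical names, not part of the deployed service.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, prompt: str,
                       max_tokens: int = 64) -> urllib.request.Request:
    """Build a non-streaming chat completion request like the cURL example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text, assuming an OpenAI-style response body."""
    return response_body["choices"][0]["message"]["content"]


def ask(base_url: str, token: str, model: str, prompt: str) -> str:
    """Send the request and return the generated text."""
    request = build_chat_request(base_url, token, model, prompt)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

A call such as `ask("http://<your_droplet_ip>", "<your_token_here>", "<model_name>", "What is Deep Learning?")` corresponds to the cURL command, with the same placeholders filled in.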
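Using the huggingface_hub Python client:
from huggingface_hub import InferenceClient

# Replace base_url and api_key with your endpoint and access token,
# for example http://<your_droplet_ip>/v1 as in the cURL command.
client = InferenceClient(
    base_url="http://0.0.0.0:8080/v1",
    api_key="-",
)
output = client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)
# Print tokens as they arrive; a chunk's content may be None.
for chunk in output:
    print(chunk.choices[0].delta.content or "", end="")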
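Using the openai Python client:
from openai import OpenAI

# Replace base_url and api_key with your endpoint and access token,
# for example http://<your_droplet_ip>/v1 as in the cURL command.
client = OpenAI(
    api_key="-",
    base_url="http://0.0.0.0:8080/v1",
)
response = client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[
        {"role": "user", "content": "What is deep learning?"},
    ],
    stream=True,
    max_tokens=64,
)
# Iterate and print the stream; a chunk's content may be None.
for message in response:
    print(message.choices[0].delta.content or "", end="")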
Because the endpoint follows the OpenAI chat completions format, this works with any OpenAI client library, including the JavaScript one.
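When the full reply is needed as a single string rather than printed piece by piece, the streamed chunks can be accumulated. This is a minimal sketch, assuming each chunk exposes `choices[0].delta.content` as in the streaming loops above; `collect_stream` and `fake_chunk` are hypothetical helpers, and `fake_chunk` exists only to demonstrate the function without a running server.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(chunks: Iterable) -> str:
    """Join the text of streamed chat completion chunks into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or an empty delta; skip them.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Stand-in for a streamed chunk, used only for local demonstration."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With either client, passing the streaming response object to `collect_stream` in place of the `for` loop would return the whole answer at once.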