This README explains how to run a demo of SGLang, an open-source library for fast and expressive LLM inference and serving that reports up to 5x higher throughput.
Install the latest SkyPilot and verify that your cloud credentials are set up:
pip install "skypilot-nightly[all]"
sky check
Create a SkyServe Service YAML with a service section:
service:
  # Specifying the path to the endpoint to check the readiness of the service.
  readiness_probe: /health
  # How many replicas to manage.
  replicas: 2
The entire Service YAML can be found here: llava.yaml.
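To make the role of readiness_probe more concrete, here is a minimal Python sketch of what a readiness check against the /health path could look like. It is an illustration only: the function name probe_ready and its opener parameter are hypothetical, it assumes a replica counts as ready when the path answers with HTTP 200, and SkyServe's actual probing logic may differ.

```python
import urllib.error
import urllib.request


def probe_ready(base_url, path="/health", timeout=5.0,
                opener=urllib.request.urlopen):
    """Return True if a GET on base_url + path answers with HTTP 200.

    `opener` is injectable so the check can be exercised without a network.
    Assumption: HTTP 200 on the probe path means the replica is ready.
    """
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False
```

For example, probe_ready("http://34.85.154.76:30001") would check one replica directly; the address here is only a placeholder, and the scheme has to be included because urllib does not accept a bare host:port.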
Start serving by using SkyServe CLI:
sky serve up -n sglang-llava llava.yaml
Use sky serve status to check the status of the service:
sky serve status sglang-llava
You should see output similar to the following:
Services
NAME          VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
sglang-llava  1        8m 16s  READY   2/2       34.32.43.41:30001

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES          STATUS  REGION
sglang-llava  1   1        34.85.154.76    16 mins ago  1x GCP({'L4': 1})  READY   us-east4
sglang-llava  2   1        34.145.195.253  16 mins ago  1x GCP({'L4': 1})  READY   us-east4
Get the endpoint of the service:
ENDPOINT=$(sky serve status --endpoint sglang-llava)
Once the service status is READY, you can use the endpoint to talk to the model with both text and image inputs:
curl -L $ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "liuhaotian/llava-v1.6-vicuna-7b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image"},
          {
            "type": "image_url",
            "image_url": {
              "url": "https://raw.githubusercontent.com/sgl-project/sglang/main/examples/quick_start/images/cat.jpeg"
            }
          }
        ]
      }
    ]
  }'
You should get a response similar to the following:
{
  "id": "b044d5f637694d3bba30a2d784441c6c",
  "object": "chat.completion",
  "created": 1707565348,
  "model": "liuhaotian/llava-v1.6-vicuna-7b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " This is an image of a cute, anthropomorphized cat character."
    },
    "finish_reason": null
  }],
  "usage": {
    "prompt_tokens": 2188,
    "total_tokens": 2204,
    "completion_tokens": 16
  }
}
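The request body and the response handling for the LLaVA example can also be expressed in Python. The sketch below only builds the JSON payload and extracts the assistant text; the helper names build_llava_payload and extract_reply are hypothetical and not part of SGLang or SkyPilot. Sending the payload is left to whichever HTTP client you prefer, ideally one that follows redirects the way curl -L does.

```python
import json

MODEL = "liuhaotian/llava-v1.6-vicuna-7b"


def build_llava_payload(prompt, image_url, model=MODEL):
    """Build a chat-completions body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def extract_reply(response):
    """Return the assistant text from a chat.completion response dict."""
    return response["choices"][0]["message"]["content"].strip()


if __name__ == "__main__":
    payload = build_llava_payload(
        "Describe this image",
        "https://raw.githubusercontent.com/sgl-project/sglang/main/"
        "examples/quick_start/images/cat.jpeg",
    )
    # Print the JSON body that would be passed to curl via -d.
    print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call makes it easy to reuse the same body with different clients, and extract_reply strips the leading space that appears in the sample response content.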
The process is the same as serving LLaVA, but with the model path changed to Llama-2. Below are example commands for reference.
Start serving by using SkyServe CLI:
sky serve up -n sglang-llama2 llama2.yaml --env HF_TOKEN=<your-huggingface-token>
The entire Service YAML can be found here: llama2.yaml.
Use sky serve status to check the status of the service:
sky serve status sglang-llama2
You should see output similar to the following:
Services
NAME           VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
sglang-llama2  1        8m 16s  READY   2/2       34.32.43.41:30001

Service Replicas
SERVICE_NAME   ID  VERSION  IP              LAUNCHED     RESOURCES          STATUS  REGION
sglang-llama2  1   1        34.85.154.76    16 mins ago  1x GCP({'L4': 1})  READY   us-east4
sglang-llama2  2   1        34.145.195.253  16 mins ago  1x GCP({'L4': 1})  READY   us-east4
Get the endpoint of the service:
ENDPOINT=$(sky serve status --endpoint sglang-llama2)
Once the service status is READY, you can use the endpoint to interact with the model:
curl -L $ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ]
  }'
You should get a response similar to the following:
{
  "id": "cmpl-879a58992d704caf80771b4651ff8cb6",
  "object": "chat.completion",
  "created": 1692650569,
  "model": "meta-llama/Llama-2-7b-chat-hf",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": " Hello! I'm just an AI assistant, here to help you"
    },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 31,
    "total_tokens": 47,
    "completion_tokens": 16
  }
}
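In the sample response, finish_reason is "length" and completion_tokens is 16, which suggests the reply was cut off at a token limit rather than ending on its own. The Python sketch below shows one way to detect that case and to request a longer completion. The helper names are hypothetical, and it assumes the server honors an OpenAI-style max_tokens field in the request body; check the SGLang documentation for the parameters your version supports.

```python
def summarize_completion(response):
    """Return (text, truncated, usage) from a chat.completion response dict."""
    choice = response["choices"][0]
    text = choice["message"]["content"].strip()
    # "length" indicates the reply stopped because a token limit was reached.
    truncated = choice.get("finish_reason") == "length"
    usage = response.get("usage", {})
    return text, truncated, usage


def with_max_tokens(payload, max_tokens):
    """Return a copy of the request body with max_tokens set.

    Assumption: the server accepts an OpenAI-style "max_tokens" field.
    """
    updated = dict(payload)
    updated["max_tokens"] = max_tokens
    return updated
```

If summarize_completion reports a truncated reply, resending the same messages through with_max_tokens(payload, 128) (the value 128 is arbitrary) is a simple way to ask for a longer answer.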