Nosana: Using OpenAI GPT-OSS:20B with Ollama
Welcome to this tutorial on running the latest OpenAI open-weight model, gpt-oss:20b, hosted on Nosana - a decentralized GPU network that allows for distributed model inference. This setup is fantastic for when you need more computational power than your local machine can offer, without the hassle of managing your own high-end hardware.
Want to learn how to actually spin up a job on Nosana and retrieve the `base_url`? Check out the resources on Nosana.com.
We'll cover:

- Setting up your environment.
- Connecting to a remote Nosana Base URL.
- Pulling and interacting with the `gpt-oss:20b` model.
- Basic text generation, streaming, and chat.
- A simple example of function calling.
## 1. Setup and Installation
First, we need to install the necessary Python libraries. We'll use `ollama` to communicate with the Ollama server and `python-dotenv` to manage our environment variables securely.

```python
%pip install ollama python-dotenv
```
### Environment Variables
To connect to our remote server, we need to tell the Ollama client its address. We'll store this in a `.env` file to keep our configuration clean and separate from our code.

Create a file named `.env` in the same directory as this notebook and add your remote server URL to it. If you're using Nosana, this will be your unique `NOSANA_BASE_URL`.

Your `.env` file should look like this:

```
NOSANA_BASE_URL=your_nosana_base_url_here
```
## 2. Loading Configuration and Connecting
Now, let's load the environment variable from our `.env` file. The `ollama` library will automatically pick up an `OLLAMA_HOST` environment variable if one is set; here we read our own `NOSANA_BASE_URL` variable and, for clarity, create a client with the host passed explicitly.
```python
import os

from dotenv import load_dotenv
import ollama
from IPython.display import display, Markdown

# Load environment variables from the .env file
load_dotenv()

# Get the remote server URL from environment variables
ollama_host = os.getenv("NOSANA_BASE_URL")

def short_link(link):
    """Shorten a long URL for display."""
    if link and len(link) > 40:
        return link[:20] + '...' + link[-20:]
    return link

if not ollama_host:
    print("NOSANA_BASE_URL environment variable not found!")
    print("Please create a .env file and add your remote server URL.")
else:
    print(f"Connecting to remote Ollama server at: {short_link(ollama_host)}")
    # Creating a client explicitly and passing the host is good practice
    client = ollama.Client(host=ollama_host)
```
```
Connecting to remote Ollama server at: https://4w9w89qshprb...node.k8s.prd.nos.ci/
```
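Before pulling anything, it's worth a quick sanity check that the server is reachable. A minimal sketch is to list the models already on the remote server; note that the response key names can differ slightly between `ollama-python` versions:

```python
# Connectivity check: list models already present on the remote server
models = client.list()['models']
print(f"{len(models)} model(s) available:")
for m in models:
    # newer ollama-python versions use 'model', older ones used 'name'
    print(' -', m.get('model') or m.get('name'))
```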
## 3. Interacting with the Model
With our connection established, we can start interacting with the `gpt-oss:20b` model. If the model isn't already available on the remote server, Ollama will download it automatically on the first run. You can also explicitly pull it.
```python
model_name = 'gpt-oss:20b'

try:
    display(Markdown(f"Pulling the '{model_name}' model. This may take a while..."))
    client.pull(model_name)
    display(Markdown("Model pulled successfully!"))
except Exception as e:
    display(Markdown(f"Error: {e}"))
```
```
Pulling the 'gpt-oss:20b' model. This may take a while...

Model pulled successfully!
```
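Once the pull completes, you can optionally confirm what was pulled with `client.show()`. The exact fields vary by version, but the model details typically include family, parameter count, and quantization:

```python
# Inspect the pulled model's metadata (field names may vary by version)
info = client.show(model_name)
details = info.get('details') or {}
print('Family:      ', details.get('family'))
print('Parameters:  ', details.get('parameter_size'))
print('Quantization:', details.get('quantization_level'))
```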
### Basic Generation
Let's start with a simple text generation request.
```python
response = client.generate(
    model=model_name,
    prompt='Explain the concept of a Large Language Model in one sentence.'
)

print(response['response'])
```
```
A Large Language Model is an AI system trained on vast text data that learns statistical patterns of language so it can generate, translate, or understand text in a way that mimics human style.
```
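`generate` also accepts an `options` dictionary for sampling parameters. As a small variation on the same call, two common Ollama options are `temperature` (lower values give more deterministic output) and `num_predict` (caps the response length in tokens):

```python
# Same prompt, with explicit sampling options
response = client.generate(
    model=model_name,
    prompt='Explain the concept of a Large Language Model in one sentence.',
    options={
        'temperature': 0.2,  # lower temperature -> more deterministic output
        'num_predict': 64,   # cap the number of generated tokens
    },
)
print(response['response'])
```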
### Streaming Responses
For more interactive applications, you can stream the response as it's being generated. This is great for showing a real-time typing effect.
```python
stream = client.generate(
    model=model_name,
    prompt='Write a short story about a robot who discovers music in 50 words.',
    stream=True
)

for chunk in stream:
    print(chunk['response'], end='', flush=True)
```
```
Steel heart, dormant in the workshop, scanned old vinyl. A crackling needle whispered rhythm. The robot's circuits sparked, translating harmonies into code. With each chord, gears wavered, emotions blooming. It recorded melodies, breathing life into metal. Music, the universe's pulse, filled his void, and he sang for eternal resonance always.
```
### Chat Interface

The `chat` method is designed for conversational interactions, where the model remembers the context of the conversation.
```python
messages = [
    {
        'role': 'user',
        'content': 'What is the most important programming language for AI development? Explain in 50 words.'
    }
]

chat_response = client.chat(model=model_name, messages=messages)
display(Markdown(chat_response['message']['content']))
```
```
Python remains the cornerstone of AI development, offering extensive libraries (TensorFlow, PyTorch, Scikit-learn), a clear syntax, and a massive community. Its rapid prototyping, readability, and cross-platform compatibility make researchers and engineers quickly build, test, and deploy models, keeping AI accessible to all developers in industry and academia alike and beyond.
```
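The `messages` list is what carries that conversational memory. To use it, append the assistant's reply, add a follow-up question, and call `chat` again; a minimal continuation of the cell above:

```python
# Append the assistant's reply so the model sees the full history
messages.append(chat_response['message'])

# Ask a follow-up that only makes sense with the earlier context
messages.append({
    'role': 'user',
    'content': 'And what would be your second choice? Explain in 30 words.'
})

follow_up = client.chat(model=model_name, messages=messages)
display(Markdown(follow_up['message']['content']))
```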
## 4. Advanced: Function Calling
The `gpt-oss` models have strong capabilities for function calling (or tool use). This allows the model to request the invocation of a function you've defined in your code to get external information or perform an action.
Here's a simple example where we define a tool to get the weather.
```python
import json

import requests

def get_current_weather(city: str):
    """Get the current weather in a given city using the Open-Meteo API."""
    # Geocode the city name to latitude and longitude
    geo_url = f"https://geocoding-api.open-meteo.com/v1/search?name={city}&count=1"
    geo_resp = requests.get(geo_url)
    geo_data = geo_resp.json()
    if not geo_data.get("results"):
        return json.dumps({"city": city, "temperature": "unknown", "unit": "celsius"})
    lat = geo_data["results"][0]["latitude"]
    lon = geo_data["results"][0]["longitude"]

    # Get the current weather for those coordinates
    weather_url = f"https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}&current_weather=true"
    weather_resp = requests.get(weather_url)
    weather_data = weather_resp.json()
    temp = weather_data.get("current_weather", {}).get("temperature")
    if temp is None:
        return json.dumps({"city": city, "temperature": "unknown", "unit": "celsius"})
    return json.dumps({"city": city, "temperature": temp, "unit": "celsius"})
```
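Before wiring the tool to the model, it's worth calling the helper directly once to confirm it returns sensible JSON:

```python
# Sanity-check the tool on its own; expected shape:
# {"city": "...", "temperature": <number>, "unit": "celsius"}
print(get_current_weather('Singapore'))
```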
```python
tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_current_weather',
            'description': 'Get the current weather in a given city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The city, e.g., San Francisco',
                    },
                },
                'required': ['city'],
            },
        },
    },
]
```
```python
messages = [{'role': 'user', 'content': 'What is the weather like in Singapore?'}]

# First, let the model decide which tool to call
response = client.chat(
    model=model_name,
    messages=messages,
    tools=tools,
)

messages.append(response['message'])

# Then, execute the tool and send the result back to the model
if response['message'].get('tool_calls'):
    tool_call = response['message']['tool_calls'][0]
    function_name = tool_call['function']['name']
    function_args = tool_call['function']['arguments']  # Already a dict

    # Call the function
    function_response = get_current_weather(city=function_args.get('city'))

    messages.append(
        {
            'role': 'tool',
            'content': function_response,
        }
    )

    # Get the final response from the model
    final_response = client.chat(model=model_name, messages=messages)
    print(final_response['message']['content'])
```
```
Singapore's weather is typically hot and humid year-round. Right now the temperature is about **27 °C** (≈80 °F). It's in the range of 25–31 °C most days, with high humidity and a chance of brief showers, especially during the monsoon seasons.
```
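The example above handles a single tool call, but a model may request several (for example, one per city). A small dispatch loop generalizes the same flow; this is a minimal sketch reusing the `client`, `tools`, and `get_current_weather` defined above:

```python
# Map tool names to Python functions so any requested call can be dispatched
available_functions = {'get_current_weather': get_current_weather}

messages = [{'role': 'user', 'content': 'Compare the current weather in Singapore and Oslo.'}]
response = client.chat(model=model_name, messages=messages, tools=tools)
messages.append(response['message'])

for tool_call in response['message'].get('tool_calls') or []:
    fn = available_functions.get(tool_call['function']['name'])
    if fn is None:
        continue  # the model asked for a tool we don't provide
    result = fn(**tool_call['function']['arguments'])  # arguments is already a dict
    messages.append({'role': 'tool', 'content': result})

final_response = client.chat(model=model_name, messages=messages)
print(final_response['message']['content'])
```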
## Conclusion
Congratulations! You've successfully connected to a remote Ollama server, interacted with the `gpt-oss:20b` model, and even explored its function-calling capabilities.
This remote setup unlocks the ability to work with powerful models from anywhere, without needing a supercomputer at your desk. From here, you can build complex applications, experiment with different models, or fine-tune models for your specific needs.
Ready to explore more models and power your AI apps?

Visit Nosana.com to discover more models and supercharge your AI projects!