StageHand: AI-Powered Browser Automation
Welcome to this tutorial on StageHand!
StageHand is a powerful open-source framework that lets you automate web browsers using natural language. It's built on top of Playwright, combining the precision of code with the flexibility of AI. This makes your automation scripts more robust and resilient to website changes.
In this notebook, we will cover the essential concepts to get you up and running with StageHand. We'll learn how to set it up, perform basic actions, extract data, and use its AI agent for more complex tasks.
Core Concepts
StageHand revolves around a few key functions:
- page.goto(): Navigates to a specific URL.
- page.act(): Performs an action on the page using a natural language instruction (e.g., "click the login button").
- page.extract(): Extracts structured data from the page based on a description and a schema.
- page.observe(): Previews the action that the AI will take without actually executing it, which is great for debugging and caching.
- agent.execute(): Executes a multi-step task described in natural language.
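The observe-then-act idea from the list above can be sketched as follows. This is a minimal sketch, not StageHand's documented caching mechanism: it assumes that page.observe() returns a list of candidate actions and that page.act() accepts one of them, which the "Performing act from an ObserveResult" log line later in this notebook suggests. FakePage, preview_then_act, and the dictionary-shaped suggestion are hypothetical names and shapes introduced only so the example runs without a browser session.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a StageHand page so the sketch runs offline."""

    def __init__(self):
        self.observe_calls = 0
        self.performed = []

    async def observe(self, instruction):
        # A real page would ask the model for candidate actions here.
        self.observe_calls += 1
        return [{"description": instruction, "selector": "xpath=//a[1]"}]

    async def act(self, action):
        # A real page would perform the click, fill, etc. described by `action`.
        self.performed.append(action)
        return True


async def preview_then_act(page, instruction, cache):
    """Observe once per instruction, cache the suggestion, then act on it."""
    if instruction not in cache:
        candidates = await page.observe(instruction)
        if not candidates:
            raise LookupError(f"no candidate action for: {instruction!r}")
        cache[instruction] = candidates[0]
    return await page.act(cache[instruction])


async def demo():
    page, cache = FakePage(), {}
    await preview_then_act(page, "click the quickstart link", cache)
    # Second call reuses the cached suggestion and skips observe().
    await preview_then_act(page, "click the quickstart link", cache)
    return page, cache


if __name__ == "__main__":
    # In a notebook cell, use `await demo()` instead of asyncio.run().
    page, cache = asyncio.run(demo())
    print(page.observe_calls, "observe call,", len(page.performed), "actions")
```

With a real session, the same helper would receive stagehand.page in place of FakePage, provided the assumptions above hold for the installed version.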
1. Setup and Installation
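First, let's install the necessary Python libraries. We need stagehand for the core functionality, python-dotenv to manage our API keys securely, and pydantic to define extraction schemas. The cells below also import nest_asyncio when they run inside a notebook.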
#%pip install stagehand python-dotenv pydantic -q
API Keys
StageHand requires a couple of API keys to function:
- Browserbase API Key & Project ID: StageHand uses Browserbase to run browsers in the cloud. You can get your free credentials from the Browserbase Dashboard.
- LLM API Key: You need an API key from an AI provider like OpenAI, Google, or Anthropic.
Create a file named .env in the same directory as this notebook and add your keys like this:
BROWSERBASE_API_KEY="your_browserbase_api_key"
BROWSERBASE_PROJECT_ID="your_browserbase_project_id"
OPENAI_API_KEY="your_openai_api_key"
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# Read env vars without exposing their values
BROWSERBASE_API_KEY = os.getenv("BROWSERBASE_API_KEY")
BROWSERBASE_PROJECT_ID = os.getenv("BROWSERBASE_PROJECT_ID")
MODEL_API_KEY = os.getenv("OPENAI_API_KEY")
# Print only a boolean status, not the keys themselves
keys_loaded = all([
    bool(BROWSERBASE_API_KEY),
    bool(BROWSERBASE_PROJECT_ID),
    bool(MODEL_API_KEY),
])
print("API keys loaded:", keys_loaded)
API keys loaded: True
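When the check above prints False, it helps to know which variable is missing. A small sketch of one way to report that, listing names only and never values; missing_keys and REQUIRED_KEYS are illustrative names, not part of StageHand or python-dotenv:

```python
import os

# The three variables this notebook reads from the .env file.
REQUIRED_KEYS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names (never the values) of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Call it after load_dotenv() so that values from the .env file are already in the environment.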
2. Your First Automation: goto and act
Let's start with a simple task: opening the StageHand documentation and clicking its quickstart link.
We will initialize StageHand, navigate to the docs site, and use page.act() to click the link. act() translates our plain English instruction into a browser action. We then call page.extract() to read the heading of the page we land on.
from stagehand import Stagehand, StagehandConfig
import asyncio
import os

async def main():
    server_url = os.getenv("STAGEHAND_SERVER_URL", "https://api.stagehand.browserbase.com/v1")
    config = StagehandConfig(
        env="BROWSERBASE",
        api_key=BROWSERBASE_API_KEY,
        project_id=BROWSERBASE_PROJECT_ID,
        model_api_key=MODEL_API_KEY,
        model_name="gpt-4o",
    )
    stagehand = Stagehand(config)
    # Override the server URL when the client exposes that attribute
    if hasattr(stagehand, 'server_url'):
        stagehand.server_url = server_url
    try:
        await stagehand.init()
        page = stagehand.page
        await page.goto("https://docs.stagehand.dev/")
        await page.act("click the quickstart link")
        result = await page.extract("extract the main heading of the page")
        print(f"Extracted: {result}")
    except Exception as e:
        print(f"Error occurred: {e}")
    finally:
        # Always try to release the remote browser session
        try:
            await stagehand.close()
        except Exception:
            pass

# Run the async main function in a notebook-safe way
try:
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    loop.run_until_complete(main())
except RuntimeError:
    asyncio.run(main())
2025-08-15 12:43:22 INFO - running act
2025-08-15 12:43:22 INFO - starting observation
2025-08-15 12:43:23 INFO - Getting accessibility tree data
2025-08-15 12:43:23 INFO - got accessibility tree in 5 ms
2025-08-15 12:43:30 INFO - Warning: found 3 iframe(s) on the page. If you wish to interact with iframe content, please make sure you are setting iframes: true
2025-08-15 12:43:30 INFO - Getting xpath for element
2025-08-15 12:43:30 INFO - Getting xpath for element
2025-08-15 12:43:30 INFO - Getting xpath for element
2025-08-15 12:43:30 INFO - Getting xpath for element
2025-08-15 12:43:30 INFO - found elements
2025-08-15 12:43:30 INFO - Performing act from an ObserveResult
2025-08-15 12:43:30 INFO - click, checking for page navigation
2025-08-15 12:43:31 INFO - click complete
2025-08-15 12:43:32 INFO - finished waiting for (possible) page navigation
2025-08-15 12:43:32 INFO - new page detected with URL
2025-08-15 12:43:34 INFO - running extract
2025-08-15 12:43:34 INFO - starting extraction using a11y tree
2025-08-15 12:43:35 INFO - got accessibility tree in 9 ms
2025-08-15 12:43:35 INFO - Got accessibility tree data
2025-08-15 12:43:37 INFO - received extraction response
2025-08-15 12:43:37 INFO - extraction completed successfully
Extracted: extraction='Quickstart'
3. Extracting Structured Data with extract
StageHand truly shines when it comes to data extraction. Instead of writing complex selectors (like XPath or CSS selectors), you can just describe the data you want.
For structured output, we define the expected fields in a Pydantic model. Here, we'll navigate to the StageHand GitHub repository, ask for the repository's name and star count as JSON, and validate the response against that model.
import asyncio
import os
from pydantic import BaseModel, Field
from stagehand import Stagehand, StagehandConfig

# Define the data structure we want to extract
class RepoInfo(BaseModel):
    name: str = Field(description="The name of the repository")
    stars: str = Field(description="The star count of the repository")

async def main():
    # Get the server URL, defaulting to the hosted StageHand API endpoint
    server_url = os.getenv("STAGEHAND_SERVER_URL", "https://api.stagehand.browserbase.com/v1")
    config = StagehandConfig(
        env="BROWSERBASE",
        api_key=BROWSERBASE_API_KEY,
        project_id=BROWSERBASE_PROJECT_ID,
        model_api_key=MODEL_API_KEY,
        model_name="gpt-4o",
    )
    stagehand = Stagehand(config)
    # Override the server URL when the client exposes that attribute
    if hasattr(stagehand, 'server_url'):
        stagehand.server_url = server_url
    try:
        await stagehand.init()
        page = stagehand.page
        await page.goto("https://github.com/browserbase/stagehand")
        print(f"Navigated to: {page.url}")
        # Ask for the data as JSON by spelling out the format in the instruction
        result = await page.extract(
            instruction="""Extract the repository information and return it in this exact JSON format:
            {
            "name": "repository name from the page",
            "stars": "star count number with k suffix if applicable"
            }
            Only return the JSON object, nothing else."""
        )
        print(f"Raw extraction result: {result}")
        # Try to parse the result
        try:
            import json
            if hasattr(result, 'extraction'):
                data_str = result.extraction
            else:
                data_str = str(result)
            # Try to extract JSON from the string
            if '{' in data_str and '}' in data_str:
                start = data_str.find('{')
                end = data_str.rfind('}') + 1
                json_str = data_str[start:end]
                data = json.loads(json_str)
                # Validate and print the extracted data
                repo_data = RepoInfo(**data)
                print(f"✅ Structured Extraction Success!")
                print(f" Repository Name: {repo_data.name}")
                print(f" Stars: {repo_data.stars}")
            else:
                print(f"ℹ️ Extraction result (text format): {data_str}")
        except Exception as parse_error:
            print(f"⚠️ Could not parse as JSON: {parse_error}")
            print(f" Raw result: {result}")
    except Exception as e:
        print(f"❌ Error occurred: {e}")
        import traceback
        traceback.print_exc()
    finally:
        # Always try to release the remote browser session
        try:
            await stagehand.close()
        except Exception:
            pass
        print("Session closed.")

# Run the async main function in a notebook-safe way
try:
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    loop.run_until_complete(main())
except RuntimeError:
    asyncio.run(main())
Navigated to: https://github.com/browserbase/stagehand
2025-08-15 12:50:51 INFO - running extract
2025-08-15 12:50:51 INFO - starting extraction using a11y tree
2025-08-15 12:50:52 INFO - got accessibility tree in 57 ms
2025-08-15 12:50:52 INFO - Got accessibility tree data
2025-08-15 12:50:57 INFO - received extraction response
2025-08-15 12:50:57 INFO - extraction completed successfully
Raw extraction result: extraction='{\n "name": "stagehand",\n "stars": "16.3k"\n}'
✅ Structured Extraction Success!
Repository Name: stagehand
Stars: 16.3k
Session closed.
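The cell above locates the JSON with find() and rfind(), which breaks if the model adds text containing a closing brace after the object, and it leaves the star count as a string such as "16.3k". A possible post-processing step using only the standard library is sketched here; first_json_object and parse_star_count are hypothetical helper names, and the "k"/"m" suffix handling is an assumption about how the count is abbreviated.

```python
import json
from decimal import Decimal, InvalidOperation

# Assumed abbreviations for large counts.
_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def first_json_object(text):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


def parse_star_count(raw):
    """Convert strings such as '16.3k' or '1,204' to an integer count."""
    cleaned = raw.strip().lower().replace(",", "")
    multiplier = 1
    if cleaned and cleaned[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        # Decimal avoids binary floating-point rounding on values like 16.3.
        return int(Decimal(cleaned) * multiplier)
    except InvalidOperation:
        raise ValueError(f"unrecognized star count: {raw!r}") from None


if __name__ == "__main__":
    raw = 'Here you go:\n{\n "name": "stagehand",\n "stars": "16.3k"\n} Hope that {helps}!'
    data = first_json_object(raw)
    print(data["name"], parse_star_count(data["stars"]))  # stagehand 16300
```

The parsed dictionary can still be passed to RepoInfo(**data) for validation, exactly as in the cell above.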
4. Advanced Control with the StageHand Agent
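For more complex, multi-step workflows, you can use the StageHand Agent. The agent.execute() method can follow a higher-level instruction by breaking it down into smaller steps automatically.
Let's try a research task: find the official documentation for StageHand and get its introductory sentence. Note that the cell below does not call agent.execute(); it approximates agent-style behavior by chaining goto(), extract(), and act() by hand, which keeps each step visible.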
import asyncio
import os
from stagehand import Stagehand, StagehandConfig

async def main():
    # Initialize StageHand, defaulting to the hosted API endpoint
    server_url = os.getenv("STAGEHAND_SERVER_URL", "https://api.stagehand.browserbase.com/v1")
    config = StagehandConfig(
        env="BROWSERBASE",
        api_key=BROWSERBASE_API_KEY,
        project_id=BROWSERBASE_PROJECT_ID,
        model_api_key=MODEL_API_KEY,
        model_name="gpt-4o",
    )
    stagehand = Stagehand(config)
    # Override the server URL when the client exposes that attribute
    if hasattr(stagehand, 'server_url'):
        stagehand.server_url = server_url
    try:
        await stagehand.init()
        print("✓ StageHand initialized successfully")
        page = stagehand.page
        # Demonstrate multi-step automation (agent-like behavior using page methods)
        print("🤖 Performing multi-step automation...")
        # Step 1: Navigate to StageHand docs
        await page.goto("https://docs.stagehand.dev/")
        print(" ✓ Navigated to StageHand documentation")
        # Step 2: Extract the main introduction
        intro = await page.extract("extract the main heading and first description paragraph")
        print(f" ✓ Extracted introduction: {intro}")
        # Step 3: Try to navigate to getting started
        await page.act("click on the getting started or quickstart link")
        print(" ✓ Clicked on getting started")
        # Step 4: Extract information from the new page
        getting_started_info = await page.extract("extract the main heading and first paragraph from this page")
        print(f" ✓ Extracted getting started info: {getting_started_info}")
        print("\n🎉 Multi-step automation completed successfully!")
        print("This demonstrates how StageHand can perform complex, multi-step workflows")
        print("by chaining together goto(), act(), and extract() operations.")
    except Exception as e:
        print(f"❌ Error occurred: {e}")
        import traceback
        traceback.print_exc()
    finally:
        # Always try to release the remote browser session
        try:
            await stagehand.close()
        except Exception:
            pass
        print("Session closed.")

# Run the async main function in a notebook-safe way
try:
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    loop.run_until_complete(main())
except RuntimeError:
    asyncio.run(main())
✓ StageHand initialized successfully
🤖 Performing multi-step automation...
✓ Navigated to StageHand documentation
2025-08-15 12:52:41 INFO - running extract
2025-08-15 12:52:41 INFO - starting extraction using a11y tree
2025-08-15 12:52:42 INFO - got accessibility tree in 7 ms
2025-08-15 12:52:42 INFO - Got accessibility tree data
2025-08-15 12:52:42 INFO - Warning: found 3 iframe(s) on the page. If you wish to interact with iframe content, please make sure you are setting iframes: true
2025-08-15 12:52:46 INFO - received extraction response
2025-08-15 12:52:46 INFO - extraction completed successfully
✓ Extracted introduction: extraction='**Main Heading:**\nWhat is Stagehand?\n\n**First Description Paragraph:**\nStagehand allows you to automate browsers with natural language and code.'
2025-08-15 12:52:48 INFO - running act
2025-08-15 12:52:48 INFO - starting observation
2025-08-15 12:52:48 INFO - Getting accessibility tree data
2025-08-15 12:52:49 INFO - got accessibility tree in 8 ms
2025-08-15 12:52:51 INFO - Warning: found 3 iframe(s) on the page. If you wish to interact with iframe content, please make sure you are setting iframes: true
2025-08-15 12:52:51 INFO - Getting xpath for element
2025-08-15 12:52:51 INFO - Getting xpath for element
2025-08-15 12:52:51 INFO - Getting xpath for element
2025-08-15 12:52:51 INFO - Getting xpath for element
2025-08-15 12:52:51 INFO - Getting xpath for element
2025-08-15 12:52:51 INFO - found elements
2025-08-15 12:52:51 INFO - Performing act from an ObserveResult
2025-08-15 12:52:51 INFO - click, checking for page navigation
2025-08-15 12:52:52 INFO - click complete
2025-08-15 12:52:53 INFO - finished waiting for (possible) page navigation
2025-08-15 12:52:53 INFO - new page detected with URL
✓ Clicked on getting started
2025-08-15 12:52:55 INFO - running extract
2025-08-15 12:52:55 INFO - starting extraction using a11y tree
2025-08-15 12:52:56 INFO - got accessibility tree in 6 ms
2025-08-15 12:52:56 INFO - Got accessibility tree data
2025-08-15 12:53:00 INFO - received extraction response
2025-08-15 12:53:00 INFO - extraction completed successfully
✓ Extracted getting started info: extraction='**Main Heading:**\nQuickstart\n\n**First Paragraph:**\nYou can get started with Stagehand in just 1 minute! Choose your preferred language below.'

🎉 Multi-step automation completed successfully!
This demonstrates how StageHand can perform complex, multi-step workflows
by chaining together goto(), act(), and extract() operations.
Session closed.
Conclusion and Next Steps
Congratulations! 🎉 You've learned the fundamentals of StageHand.
We've covered:
- Setting up StageHand and managing API keys.
- Basic navigation and actions with goto and act.
- Structured data extraction with extract and Pydantic.
- Chaining goto, act, and extract into multi-step workflows, the kind of task the StageHand agent is designed to automate.
StageHand is a versatile tool for building reliable browser automations. From web scraping and data entry to testing and research, it empowers you to automate browser tasks with simple, natural language commands.
To dive deeper, check out the official StageHand documentation at https://docs.stagehand.dev/.