StageHand: AI-Powered Browser Automation

Last updated: July 21 2025

Welcome to this tutorial on StageHand!

StageHand is an open-source framework that lets you automate web browsers using natural language. It's built on top of Playwright, so you can combine the precision of code with the flexibility of AI. Because AI-driven steps locate elements from a plain-English description instead of a hard-coded selector, scripts tend to be more resilient to changes in a site's markup.

In this notebook, we will cover the essential concepts to get you up and running with StageHand: how to set it up, perform basic actions, extract data, and chain steps into the kind of multi-step workflow its AI agent automates.

Core Concepts

StageHand revolves around a few key functions:

page.goto(): Navigates to a specific URL.
page.act(): Performs an action on the page using a natural language instruction (e.g., "click the login button").
page.extract(): Extracts structured data from the page based on a description and a schema.
page.observe(): Previews the action that the AI will take without actually executing it, which is great for debugging and caching.
agent.execute(): Executes a multi-step task described in natural language.
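All of these page methods are coroutines, so each call is awaited and finishes before the next one starts. To see that calling pattern without a browser or API keys, here is a minimal sketch against a hypothetical FakePage stand-in. It is not part of StageHand: it only records the instructions it receives and returns a canned answer. The FakeExtractResult class is also an assumption, modeled on the extraction='...' values printed later in this notebook.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeExtractResult:
    # Hypothetical: mirrors the `extraction=...` field seen in printed results
    extraction: str


@dataclass
class FakePage:
    """Hypothetical stand-in that logs calls instead of driving a browser."""
    calls: list = field(default_factory=list)

    async def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    async def act(self, instruction: str) -> None:
        self.calls.append(("act", instruction))

    async def extract(self, instruction: str) -> FakeExtractResult:
        self.calls.append(("extract", instruction))
        return FakeExtractResult(extraction="Quickstart")  # canned answer


async def workflow(page) -> str:
    # Each step is awaited, so it completes before the next one begins
    await page.goto("https://docs.stagehand.dev/")
    await page.act("click the quickstart link")
    result = await page.extract("extract the main heading of the page")
    return result.extraction


page = FakePage()
# Plain script: asyncio.run. In a Jupyter cell, use `await workflow(page)` instead.
print(asyncio.run(workflow(page)))        # → Quickstart
print([name for name, _ in page.calls])   # → ['goto', 'act', 'extract']
```

Swapping FakePage for a real StageHand page keeps the same sequence of awaited calls.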

1. Setup and Installation

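First, let's install the necessary Python libraries: stagehand for the core functionality, python-dotenv to load our API keys from a .env file, pydantic to define data schemas, and nest_asyncio so the async examples can run inside Jupyter's already-running event loop. Uncomment the line below to run the install.

#%pip install stagehand python-dotenv pydantic nest_asyncio -q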

API Keys

StageHand needs the following credentials when it runs browsers on Browserbase, as this notebook does:

Browserbase API Key & Project ID: StageHand uses Browserbase to run browsers in the cloud. You can get free credentials from the Browserbase Dashboard.
LLM API Key: You need an API key from an AI provider like OpenAI, Google, or Anthropic. This notebook uses OpenAI's gpt-4o, so it reads OPENAI_API_KEY.

Create a file named .env in the same directory as this notebook and add your keys like this:

BROWSERBASE_API_KEY="your_browserbase_api_key"
BROWSERBASE_PROJECT_ID="your_browserbase_project_id"
OPENAI_API_KEY="your_openai_api_key"

import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Read env vars without exposing their values
BROWSERBASE_API_KEY = os.getenv("BROWSERBASE_API_KEY")
BROWSERBASE_PROJECT_ID = os.getenv("BROWSERBASE_PROJECT_ID")
MODEL_API_KEY = os.getenv("OPENAI_API_KEY")

# Print only a boolean status, not the keys themselves
keys_loaded = all([
    bool(BROWSERBASE_API_KEY),
    bool(BROWSERBASE_PROJECT_ID),
    bool(MODEL_API_KEY),
])
print("API keys loaded:", keys_loaded)

API keys loaded: True
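keys_loaded only tells you whether all keys were found. When it prints False, a helper that names the missing variables is quicker to act on. A minimal sketch; missing_env_vars is a hypothetical helper, not part of python-dotenv or StageHand, and like the cell above it never prints the values themselves.

```python
import os

# The variable names this notebook reads from .env
REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    # An empty string counts as missing, matching the bool() checks above
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing from .env:", ", ".join(missing))
else:
    print("All required keys are set.")
```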

2. Your First Automation: `goto` and `act`

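Let's start with a simple task: opening the StageHand documentation, clicking the Quickstart link, and reading the heading of the page that opens.

We will initialize StageHand, navigate to the docs with page.goto(), use page.act() to click the link, and use page.extract() to read the heading. act() translates our plain English instruction into a browser action.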

from stagehand import Stagehand, StagehandConfig
import asyncio
import os

async def main():
    server_url = os.getenv("STAGEHAND_SERVER_URL", "https://api.stagehand.browserbase.com/v1")
    
    config = StagehandConfig(
        env="BROWSERBASE",
        api_key=BROWSERBASE_API_KEY,
        project_id=BROWSERBASE_PROJECT_ID,
        model_api_key=MODEL_API_KEY,
        model_name="gpt-4o",
    )
    
    stagehand = Stagehand(config)
    
    # Override the server URL if this SDK version exposes a server_url attribute
    if hasattr(stagehand, 'server_url'):
        stagehand.server_url = server_url
    
    try:
        await stagehand.init()
        page = stagehand.page
        
        await page.goto("https://docs.stagehand.dev/")
        await page.act("click the quickstart link")
        
        result = await page.extract("extract the main heading of the page")
        
        print(f"Extracted: {result}")
        
    except Exception as e:
        print(f"Error occurred: {e}")
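    finally:
        try:
            await stagehand.close()
        except Exception:
            pass  # ignore errors while closing the session

# Run main() in a notebook-safe way
try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No event loop running (plain Python script)
    asyncio.run(main())
else:
    # Jupyter already runs an event loop; nest_asyncio makes it re-entrant
    import nest_asyncio
    nest_asyncio.apply()
    loop.run_until_complete(main())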

2025-08-15 12:43:22 INFO -  running act

2025-08-15 12:43:22 INFO -  starting observation

2025-08-15 12:43:23 INFO -  Getting accessibility tree data

2025-08-15 12:43:23 INFO -  got accessibility tree in 5 ms

2025-08-15 12:43:30 INFO -  Warning: found 3 iframe(s) on the page. If you wish to interact with iframe content, 
please make sure you are setting iframes: true

2025-08-15 12:43:30 INFO -  Getting xpath for element

2025-08-15 12:43:30 INFO -  Getting xpath for element

2025-08-15 12:43:30 INFO -  Getting xpath for element

2025-08-15 12:43:30 INFO -  Getting xpath for element

2025-08-15 12:43:30 INFO -  found elements

2025-08-15 12:43:30 INFO -  Performing act from an ObserveResult

2025-08-15 12:43:30 INFO -  click, checking for page navigation

2025-08-15 12:43:31 INFO -  click complete

2025-08-15 12:43:32 INFO -  finished waiting for (possible) page navigation

2025-08-15 12:43:32 INFO -  new page detected with URL

2025-08-15 12:43:34 INFO -  running extract

2025-08-15 12:43:34 INFO -  starting extraction using a11y tree

2025-08-15 12:43:35 INFO -  got accessibility tree in 9 ms

2025-08-15 12:43:35 INFO -  Got accessibility tree data

2025-08-15 12:43:37 INFO -  received extraction response

2025-08-15 12:43:37 INFO -  extraction completed successfully

Extracted: extraction='Quickstart'
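Every cell in this notebook ends with a block that runs main() both in Jupyter (where an event loop is already running) and as a plain script. One subtlety: if the except RuntimeError clause also wraps the call that runs main(), a RuntimeError escaping main() in the Jupyter branch is mistaken for "no running loop", and main() runs a second time. A minimal sketch of a reusable helper that checks for a loop first and only then runs main(); run_main is a hypothetical name, and the Jupyter branch assumes nest_asyncio is installed.

```python
import asyncio
from typing import Any, Awaitable, Callable


def run_main(main: Callable[[], Awaitable[Any]]) -> Any:
    """Run an async entry point from a plain script or a Jupyter cell."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script): asyncio.run creates and closes one.
        # An exception raised by main() here propagates instead of being retried.
        return asyncio.run(main())
    # A loop is already running (Jupyter). nest_asyncio patches it so that
    # run_until_complete can be called re-entrantly (assumed to be installed).
    import nest_asyncio
    nest_asyncio.apply()
    return loop.run_until_complete(main())
```

With this helper, the last lines of each cell reduce to run_main(main). Passing the function rather than main() defers creating the coroutine until a loop is ready to run it.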

3. Extracting Structured Data with `extract`

StageHand truly shines when it comes to data extraction. Instead of writing complex selectors (like XPath or CSS selectors), you can just describe the data you want.

For structured output, we define the schema as a Pydantic model. Here, we'll navigate to the StageHand GitHub repository, ask extract() to return the repository's name and star count as JSON, and then validate the parsed result against the model.
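Because the instruction asks the model for JSON, the reply arrives as text that must be parsed before Pydantic can validate it. A minimal stdlib-only sketch of that parsing step, assuming the reply contains a JSON object possibly surrounded by prose. first_json_object is a hypothetical helper; it uses json.JSONDecoder.raw_decode, which stops at the end of the first complete object, so a stray closing brace later in the text does not break parsing the way slicing up to the last "}" can.

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, or raise ValueError."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in text")


reply = 'Here you go: {"name": "stagehand", "stars": "16.3k"} (see the sidebar {info})'
print(first_json_object(reply))  # → {'name': 'stagehand', 'stars': '16.3k'}
```

The returned dict can then be passed to the model, for example RepoInfo(**data).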

import asyncio
import os
from pydantic import BaseModel, Field
from stagehand import Stagehand, StagehandConfig

# Define the data structure we want to extract
class RepoInfo(BaseModel):
    name: str = Field(description="The name of the repository")
    stars: str = Field(description="The star count of the repository")

async def main():
    # Get the server URL - using the correct StageHand API endpoint
    server_url = os.getenv("STAGEHAND_SERVER_URL", "https://api.stagehand.browserbase.com/v1")
    
    config = StagehandConfig(
        env="BROWSERBASE",
        api_key=BROWSERBASE_API_KEY,
        project_id=BROWSERBASE_PROJECT_ID,
        model_api_key=MODEL_API_KEY,
        model_name="gpt-4o",
    )
    
    stagehand = Stagehand(config)
    
    # Override the server URL if this SDK version exposes a server_url attribute
    if hasattr(stagehand, 'server_url'):
        stagehand.server_url = server_url
    
    try:
        await stagehand.init()
        page = stagehand.page
        
        await page.goto("https://github.com/browserbase/stagehand")
        print(f"Navigated to: {page.url}")

        # Method 1: Extract using structured format with specific instruction
        result = await page.extract(
            instruction="""Extract the repository information and return it in this exact JSON format:
            {
                "name": "repository name from the page",
                "stars": "star count number with k suffix if applicable"
            }
            
            Only return the JSON object, nothing else."""
        )
        
        print(f"Raw extraction result: {result}")
        
        # Try to parse the result
        try:
            import json
            if hasattr(result, 'extraction'):
                data_str = result.extraction
            else:
                data_str = str(result)
            
            # Try to extract JSON from the string
            if '{' in data_str and '}' in data_str:
                start = data_str.find('{')
                end = data_str.rfind('}') + 1
                json_str = data_str[start:end]
                data = json.loads(json_str)
                
                # Validate and print the extracted data
                repo_data = RepoInfo(**data)
                print("✅ Structured Extraction Success!")
                print(f"   Repository Name: {repo_data.name}")
                print(f"   Stars: {repo_data.stars}")
            else:
                print(f"ℹ️ Extraction result (text format): {data_str}")
                
        except Exception as parse_error:
            print(f"⚠️ Could not parse as JSON: {parse_error}")
            print(f"   Raw result: {result}")

    except Exception as e:
        print(f"❌ Error occurred: {e}")
        import traceback
        traceback.print_exc()
    finally:
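        try:
            await stagehand.close()
        except Exception:
            pass  # ignore errors while closing the session
        print("Session closed.")

# Run main() in a notebook-safe way
try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No event loop running (plain Python script)
    asyncio.run(main())
else:
    # Jupyter already runs an event loop; nest_asyncio makes it re-entrant
    import nest_asyncio
    nest_asyncio.apply()
    loop.run_until_complete(main())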

Navigated to: https://github.com/browserbase/stagehand

2025-08-15 12:50:51 INFO -  running extract

2025-08-15 12:50:51 INFO -  starting extraction using a11y tree

2025-08-15 12:50:52 INFO -  got accessibility tree in 57 ms

2025-08-15 12:50:52 INFO -  Got accessibility tree data

2025-08-15 12:50:57 INFO -  received extraction response

2025-08-15 12:50:57 INFO -  extraction completed successfully

Raw extraction result: extraction='{\n    "name": "stagehand",\n    "stars": "16.3k"\n}'
✅ Structured Extraction Success!
   Repository Name: stagehand
   Stars: 16.3k
Session closed.
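The schema keeps stars as a string because GitHub displays abbreviated counts such as 16.3k. To sort or compare repositories you would convert that to a number. A minimal sketch, assuming counts only use the k and m suffixes and optional thousands separators; parse_star_count is a hypothetical helper. It uses Decimal so that 16.3k becomes exactly 16300 with no floating-point rounding. Keep in mind that an abbreviated count is itself rounded, so the result is approximate.

```python
from decimal import Decimal, InvalidOperation

# Assumed suffixes for abbreviated counts
SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_star_count(text: str) -> int:
    """Convert a count like '16.3k' or '1,204' to an int."""
    cleaned = text.strip().lower().replace(",", "")
    multiplier = 1
    if cleaned and cleaned[-1] in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        value = Decimal(cleaned) * multiplier
    except InvalidOperation:
        raise ValueError(f"not a star count: {text!r}") from None
    return int(value)


print(parse_star_count("16.3k"))  # → 16300
```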

4. Advanced Control with the StageHand Agent

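For more complex, multi-step workflows, StageHand also provides an agent: agent.execute() takes a higher-level instruction and breaks it down into smaller steps automatically. The cell below does not call the agent. Instead, it chains goto(), act(), and extract() by hand, which keeps every step visible and easy to debug.

Let's try a small research task: open the StageHand documentation, extract its introduction, then follow the Quickstart link and extract that page's heading and first paragraph.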

import asyncio
import os
from stagehand import Stagehand, StagehandConfig

async def main():
    # Initialize StageHand with correct API endpoint
    server_url = os.getenv("STAGEHAND_SERVER_URL", "https://api.stagehand.browserbase.com/v1")
    
    config = StagehandConfig(
        env="BROWSERBASE",
        api_key=BROWSERBASE_API_KEY,
        project_id=BROWSERBASE_PROJECT_ID,
        model_api_key=MODEL_API_KEY,
        model_name="gpt-4o",
    )
    
    stagehand = Stagehand(config)
    
    # Override the server URL if this SDK version exposes a server_url attribute
    if hasattr(stagehand, 'server_url'):
        stagehand.server_url = server_url
    
    try:
        await stagehand.init()
        print("✓ StageHand initialized successfully")
        
        page = stagehand.page
        
        # Demonstrate multi-step automation (agent-like behavior using page methods)
        print("🤖 Performing multi-step automation...")
        
        # Step 1: Navigate to StageHand docs
        await page.goto("https://docs.stagehand.dev/")
        print("   ✓ Navigated to StageHand documentation")
        
        # Step 2: Extract the main introduction
        intro = await page.extract("extract the main heading and first description paragraph")
        print(f"   ✓ Extracted introduction: {intro}")
        
        # Step 3: Try to navigate to getting started
        await page.act("click on the getting started or quickstart link")
        print("   ✓ Clicked on getting started")
        
        # Step 4: Extract information from the new page
        getting_started_info = await page.extract("extract the main heading and first paragraph from this page")
        print(f"   ✓ Extracted getting started info: {getting_started_info}")
        
        print("\n🎉 Multi-step automation completed successfully!")
        print("This demonstrates how StageHand can perform complex, multi-step workflows")
        print("by chaining together goto(), act(), and extract() operations.")

    except Exception as e:
        print(f"❌ Error occurred: {e}")
        import traceback
        traceback.print_exc()
            
    finally:
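        try:
            await stagehand.close()
        except Exception:
            pass  # ignore errors while closing the session
        print("Session closed.")

# Run main() in a notebook-safe way
try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No event loop running (plain Python script)
    asyncio.run(main())
else:
    # Jupyter already runs an event loop; nest_asyncio makes it re-entrant
    import nest_asyncio
    nest_asyncio.apply()
    loop.run_until_complete(main())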

✓ StageHand initialized successfully
🤖 Performing multi-step automation...
   ✓ Navigated to StageHand documentation

2025-08-15 12:52:41 INFO -  running extract

2025-08-15 12:52:41 INFO -  starting extraction using a11y tree

2025-08-15 12:52:42 INFO -  got accessibility tree in 7 ms

2025-08-15 12:52:42 INFO -  Got accessibility tree data

2025-08-15 12:52:42 INFO -  Warning: found 3 iframe(s) on the page. If you wish to interact with iframe content, 
please make sure you are setting iframes: true

2025-08-15 12:52:46 INFO -  received extraction response

2025-08-15 12:52:46 INFO -  extraction completed successfully

   ✓ Extracted introduction: extraction='**Main Heading:**\nWhat is Stagehand?\n\n**First Description Paragraph:**\nStagehand allows you to automate browsers with natural language and code.'

2025-08-15 12:52:48 INFO -  running act

2025-08-15 12:52:48 INFO -  starting observation

2025-08-15 12:52:48 INFO -  Getting accessibility tree data

2025-08-15 12:52:49 INFO -  got accessibility tree in 8 ms

2025-08-15 12:52:51 INFO -  Warning: found 3 iframe(s) on the page. If you wish to interact with iframe content, 
please make sure you are setting iframes: true

2025-08-15 12:52:51 INFO -  Getting xpath for element

2025-08-15 12:52:51 INFO -  Getting xpath for element

2025-08-15 12:52:51 INFO -  Getting xpath for element

2025-08-15 12:52:51 INFO -  Getting xpath for element

2025-08-15 12:52:51 INFO -  Getting xpath for element

2025-08-15 12:52:51 INFO -  found elements

2025-08-15 12:52:51 INFO -  Performing act from an ObserveResult

2025-08-15 12:52:51 INFO -  click, checking for page navigation

2025-08-15 12:52:52 INFO -  click complete

2025-08-15 12:52:53 INFO -  finished waiting for (possible) page navigation

2025-08-15 12:52:53 INFO -  new page detected with URL

   ✓ Clicked on getting started

2025-08-15 12:52:55 INFO -  running extract

2025-08-15 12:52:55 INFO -  starting extraction using a11y tree

2025-08-15 12:52:56 INFO -  got accessibility tree in 6 ms

2025-08-15 12:52:56 INFO -  Got accessibility tree data

2025-08-15 12:53:00 INFO -  received extraction response

2025-08-15 12:53:00 INFO -  extraction completed successfully

   ✓ Extracted getting started info: extraction='**Main Heading:**\nQuickstart\n\n**First Paragraph:**\nYou can get started with Stagehand in just 1 minute! Choose your preferred language below.'

🎉 Multi-step automation completed successfully!
This demonstrates how StageHand can perform complex, multi-step workflows
by chaining together goto(), act(), and extract() operations.
Session closed.
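Without a schema, both extractions in this section came back as free text with **Label:** markers (for example **Main Heading:** followed by Quickstart). If you need the pieces separately, a small parser can split such text into a dictionary. A minimal sketch, assuming the model keeps this exact **Label:** layout, which is not guaranteed; asking for JSON, as in section 3, is more reliable for anything you depend on. parse_labeled_text is a hypothetical helper.

```python
import re

# Matches a bold label such as "**Main Heading:**" on its own line
LABEL = re.compile(r"^\*\*(.+?):\*\*\s*$", re.MULTILINE)


def parse_labeled_text(text: str) -> dict:
    """Split '**Label:**' sections into {label: body} pairs."""
    matches = list(LABEL.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section's body runs until the next label (or the end of the text)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


raw = ("**Main Heading:**\nQuickstart\n\n**First Paragraph:**\n"
       "You can get started with Stagehand in just 1 minute! "
       "Choose your preferred language below.")
print(parse_labeled_text(raw)["Main Heading"])  # → Quickstart
```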

Conclusion and Next Steps

Congratulations! 🎉 You've learned the fundamentals of StageHand.

We've covered:

Setting up StageHand and managing API keys.
Basic navigation and actions with goto and act.
Structured data extraction with extract and Pydantic.
Chaining goto, act, and extract into multi-step workflows, the kind of task the StageHand agent automates.

StageHand is a versatile tool for building reliable browser automations. From web scraping and data entry to testing and research, it empowers you to automate browser tasks with simple, natural language commands.

To dive deeper, check out the official StageHand documentation at https://docs.stagehand.dev/.
