Skill

web-scraping

Extracts structured data from websites using Python browser automation with openbrowser-ai. Handles JS-rendered content, pagination, multi-page scraping, and data processing with pandas.

Python

Javascript

Bash

automation

data-engineering

npx claudepluginhub billy-enrizky/openbrowser-ai --plugin openbrowser

Tool Access

This skill is limited to using the following tools:

Bash(openbrowser-ai:*) Bash(curl:*) Bash(uv:*) Bash(irm:*) Read Write

Preview

Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.

SKILL.md

Similar Skills

web-scraper

36.4k

Extracts structured data like tables, lists, and prices from web pages using multi-strategy scraping with pagination, validation, transforms, and CSV/JSON export.

3 files

antigravity-awesome-skills

extract

1.5k

Extracts structured data from websites like product listings, tables, or search results, generating executable Playwright scripts and JSON/CSV output.

actionbook

scraper-builder

Builds production-ready web scrapers for any website using Bright Data APIs including Web Unlocker, Browser, and SERP. Guides site analysis, selector extraction, pagination handling, and code implementation in Python or Node.js.

5 files

brightdata-plugin

Stats

Parent Repo Stars213

Parent Repo Forks17

Last CommitMar 17, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Web Scraping

Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.

All code runs via openbrowser-ai -c. The daemon starts automatically and persists variables across calls. All browser functions are async -- use await.

Setup

Before running, verify openbrowser-ai is installed:

openbrowser-ai --help

If not found, install:

# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.ps1 | iex

Workflow

Step 1 -- Navigate and get content overview

openbrowser-ai -c - <<'EOF'
await navigate("https://example.com/data")

# Get browser state to see page title, URL, element count
state = await browser.get_browser_state_summary()
print(f"Title: {state.title}")
print(f"URL: {state.url}")
print(f"Elements: {len(state.dom_state.selector_map)}")
EOF

Step 2 -- Extract data with JavaScript

Use evaluate() to run JS in the browser and return structured data directly as Python objects:

openbrowser-ai -c - <<'EOF'
data = await evaluate("""
(function(){
  return Array.from(document.querySelectorAll(".product-card")).map(el => ({
    name: el.querySelector(".title")?.textContent?.trim(),
    price: el.querySelector(".price")?.textContent?.trim(),
    url: el.querySelector("a")?.href
  }))
})()
""")

import json
print(json.dumps(data, indent=2))
EOF

Step 3 -- Process data with Python

Use pandas, regex, or other Python tools to clean and transform extracted data:

openbrowser-ai -c - <<'EOF'
import json

# Filter and transform
filtered = [item for item in data if item.get("price")]
for item in filtered:
    # Extract numeric price
    price_str = item["price"].replace("$", "").replace(",", "")
    item["price_float"] = float(price_str)

# Sort by price
filtered.sort(key=lambda x: x["price_float"])
print(json.dumps(filtered, indent=2))
EOF

Or with pandas if available:

openbrowser-ai -c - <<'EOF'
import pandas as pd
df = pd.DataFrame(data)
print(df.to_string())
EOF

Step 4 -- Handle pagination

openbrowser-ai -c - <<'EOF'
results = []
page = 1

while True:
    # Extract data from current page
    page_data = await evaluate("""
    (function(){
      return Array.from(document.querySelectorAll(".item")).map(el => ({
        name: el.textContent.trim()
      }))
    })()
    """)
    results.extend(page_data)
    print(f"Page {page}: {len(page_data)} items")

    # Check for next button
    has_next = await evaluate("""
    (function(){ return !!document.querySelector(".pagination .next:not(.disabled)") })()
    """)

    if not has_next:
        break

    # Replace with the actual index from browser.get_browser_state_summary()
    await click(next_button_index)
    await wait(2)
    page += 1

print(f"Total: {len(results)} items")
EOF

Step 5 -- Handle infinite scroll

openbrowser-ai -c - <<'EOF'
results = []
prev_count = 0

for _ in range(20):  # Max 20 scroll attempts
    # Get current items
    count = await evaluate("""
    (function(){ return document.querySelectorAll(".item").length })()
    """)

    if count == prev_count:
        break  # No new content loaded

    prev_count = count
    await scroll(down=True, pages=3)
    await wait(1)

# Now extract all loaded items
results = await evaluate("""
(function(){
  return Array.from(document.querySelectorAll(".item")).map(el => ({
    text: el.textContent.trim()
  }))
})()
""")
print(f"Extracted {len(results)} items")
EOF

Step 6 -- Multi-page scraping

openbrowser-ai -c - <<'EOF'
urls = [
    "https://example.com/page-1",
    "https://example.com/page-2",
    "https://example.com/page-3",
]

all_data = []
for url in urls:
    await navigate(url)
    await wait(1)

    page_data = await evaluate("""
    (function(){
      return document.querySelector("h1")?.textContent?.trim()
    })()
    """)
    all_data.append({"url": url, "title": page_data})
    print(f"{url}: {page_data}")

import json
print(json.dumps(all_data, indent=2))
EOF

Tips

Code is piped via stdin using heredoc (-c - <<'EOF'), so all Python syntax works without shell escaping issues.
Use evaluate() for structured DOM extraction -- it returns Python objects directly.
Use Python for post-processing: filtering, sorting, deduplication, format conversion.
For large datasets, process pages incrementally rather than loading everything into memory.
Check for rate limiting; add await wait(2) between page loads if needed.
Variables persist between -c calls while the daemon is running, so you can build up results across multiple calls.