From openbrowser
Extracts structured data from websites using Python browser automation with openbrowser-ai. Handles JS-rendered content, pagination, multi-page scraping, and data processing with pandas.
npx claudepluginhub billy-enrizky/openbrowser-ai --plugin openbrowserThis skill is limited to using the following tools:
Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.
Extracts structured data like tables, lists, and prices from web pages using multi-strategy scraping with pagination, validation, transforms, and CSV/JSON export.
Extracts structured data from websites like product listings, tables, or search results, generating executable Playwright scripts and JSON/CSV output.
Builds production-ready web scrapers for any website using Bright Data APIs including Web Unlocker, Browser, and SERP. Guides site analysis, selector extraction, pagination handling, and code implementation in Python or Node.js.
Share bugs, ideas, or general feedback.
Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.
All code runs via openbrowser-ai -c. The daemon starts automatically and persists variables across calls. All browser functions are async -- use await.
Before running, verify openbrowser-ai is installed:
openbrowser-ai --help
If not found, install:
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh
# Windows (PowerShell)
irm https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.ps1 | iex
openbrowser-ai -c - <<'EOF'
await navigate("https://example.com/data")
# Get browser state to see page title, URL, element count
state = await browser.get_browser_state_summary()
print(f"Title: {state.title}")
print(f"URL: {state.url}")
print(f"Elements: {len(state.dom_state.selector_map)}")
EOF
Use evaluate() to run JS in the browser and return structured data directly as Python objects:
openbrowser-ai -c - <<'EOF'
data = await evaluate("""
(function(){
return Array.from(document.querySelectorAll(".product-card")).map(el => ({
name: el.querySelector(".title")?.textContent?.trim(),
price: el.querySelector(".price")?.textContent?.trim(),
url: el.querySelector("a")?.href
}))
})()
""")
import json
print(json.dumps(data, indent=2))
EOF
Use pandas, regex, or other Python tools to clean and transform extracted data:
openbrowser-ai -c - <<'EOF'
import json
# Filter and transform
filtered = [item for item in data if item.get("price")]
for item in filtered:
# Extract numeric price
price_str = item["price"].replace("$", "").replace(",", "")
item["price_float"] = float(price_str)
# Sort by price
filtered.sort(key=lambda x: x["price_float"])
print(json.dumps(filtered, indent=2))
EOF
Or with pandas if available:
openbrowser-ai -c - <<'EOF'
import pandas as pd
df = pd.DataFrame(data)
print(df.to_string())
EOF
openbrowser-ai -c - <<'EOF'
results = []
page = 1
while True:
# Extract data from current page
page_data = await evaluate("""
(function(){
return Array.from(document.querySelectorAll(".item")).map(el => ({
name: el.textContent.trim()
}))
})()
""")
results.extend(page_data)
print(f"Page {page}: {len(page_data)} items")
# Check for next button
has_next = await evaluate("""
(function(){ return !!document.querySelector(".pagination .next:not(.disabled)") })()
""")
if not has_next:
break
# Replace with the actual index from browser.get_browser_state_summary()
await click(next_button_index)
await wait(2)
page += 1
print(f"Total: {len(results)} items")
EOF
openbrowser-ai -c - <<'EOF'
results = []
prev_count = 0
for _ in range(20): # Max 20 scroll attempts
# Get current items
count = await evaluate("""
(function(){ return document.querySelectorAll(".item").length })()
""")
if count == prev_count:
break # No new content loaded
prev_count = count
await scroll(down=True, pages=3)
await wait(1)
# Now extract all loaded items
results = await evaluate("""
(function(){
return Array.from(document.querySelectorAll(".item")).map(el => ({
text: el.textContent.trim()
}))
})()
""")
print(f"Extracted {len(results)} items")
EOF
openbrowser-ai -c - <<'EOF'
urls = [
"https://example.com/page-1",
"https://example.com/page-2",
"https://example.com/page-3",
]
all_data = []
for url in urls:
await navigate(url)
await wait(1)
page_data = await evaluate("""
(function(){
return document.querySelector("h1")?.textContent?.trim()
})()
""")
all_data.append({"url": url, "title": page_data})
print(f"{url}: {page_data}")
import json
print(json.dumps(all_data, indent=2))
EOF
-c - <<'EOF'), so all Python syntax works without shell escaping issues.evaluate() for structured DOM extraction -- it returns Python objects directly.await wait(2) between page loads if needed.-c calls while the daemon is running, so you can build up results across multiple calls.