Scrapeless provides a flexible, feature-rich data collection service with extensive parameter customization and support for exporting in multiple formats. These capabilities help LangChain integrate and use external data more effectively. The core feature modules are listed below (an import sketch follows the list):

DeepSerp
  • Google Search: enables comprehensive extraction of Google SERP data across all result types.
    • Supports selecting localized Google domains (e.g., google.com, google.ad) to retrieve region-specific search results.
    • Supports pagination for retrieving results beyond the first page.
    • Supports a result-filtering toggle that controls whether duplicate or similar content is excluded.
  • Google Trends: retrieves keyword trend data from Google, including popularity over time, regional interest, and related searches.
    • Supports multi-keyword comparison.
    • Supports multiple data types: interest_over_time, interest_by_region, related_queries, related_topics.
    • Allows filtering by a specific Google property (Web, YouTube, News, Shopping) for source-specific trend analysis.
Universal Scraping
  • Designed for modern, JavaScript-heavy websites, enabling dynamic content extraction.
    • Supports global premium proxies for bypassing geo-restrictions and improving reliability.
Crawler
  • Crawl: recursively crawls a website and its linked pages to extract site-wide content.
    • Supports configurable crawl depth and scoped URL targeting.
  • Scrape: extracts content from a single webpage with high precision.
    • Supports "main content only" extraction to exclude ads, footers, and other non-essential elements.
    • Allows batch scraping of multiple independent URLs.
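Each of these modules is exposed as a tool class in the langchain-scrapeless package. The two Crawler tools are documented in detail on this page; the Google Trends tool also appears in the examples below:

from langchain_scrapeless import (
    ScrapelessCrawlerCrawlTool,  # Crawler: Crawl
    ScrapelessCrawlerScrapeTool,  # Crawler: Scrape
    ScrapelessDeepSerpGoogleTrendsTool,  # DeepSerp: Google Trends
)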

Overview

Integration details

| Class | Package | Serializable | JS support | Version |
| --- | --- | --- | --- | --- |
| ScrapelessCrawlerScrapeTool | langchain-scrapeless | — | — | PyPI |
| ScrapelessCrawlerCrawlTool | langchain-scrapeless | — | — | PyPI |

Tool features

| Native async | Returns artifact | Return data |
| --- | --- | --- |
| — | — | markdown, rawHtml, screenshot@fullPage, json, links, screenshot, html |

Setup

The integration lives in the langchain-scrapeless package:

!pip install langchain-scrapeless

Credentials

You'll need a Scrapeless API key to use this tool. You can set it as an environment variable:
import os

os.environ["SCRAPELESS_API_KEY"] = "your-api-key"
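If you prefer not to hardcode the key, a minimal prompt-based setup reads it interactively:

import getpass
import os

# Prompt only when SCRAPELESS_API_KEY is not already set in the environment.
if not os.environ.get("SCRAPELESS_API_KEY"):
    os.environ["SCRAPELESS_API_KEY"] = getpass.getpass("Scrapeless API key: ")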

Instantiation

ScrapelessCrawlerScrapeTool

The ScrapelessCrawlerScrapeTool lets you scrape content from one or more websites using Scrapeless's Crawler Scrape API. You can extract the main content and control the output formats, headers, wait time, and output type. The tool accepts the following parameters (a combined sketch follows the list):
  • urls (required, List[str]): One or more URLs of the websites you want to scrape.
  • formats (optional, List[str]): Defines the format(s) of the scraped output. Default is ['markdown']. Options include:
    • 'markdown'
    • 'rawHtml'
    • 'screenshot@fullPage'
    • 'json'
    • 'links'
    • 'screenshot'
    • 'html'
  • only_main_content (optional, bool): Whether to return only the main page content, excluding headers, navigation, footers, etc. Default is True.
  • include_tags (optional, List[str]): A list of HTML tags to include in the output (e.g., ['h1', 'p']). If set to None, no tags are explicitly included.
  • exclude_tags (optional, List[str]): A list of HTML tags to exclude from the output. If set to None, no tags are explicitly excluded.
  • headers (optional, Dict[str, str]): Custom headers to send with the request (e.g., cookies or a user-agent). Default is None.
  • wait_for (optional, int): Time to wait in milliseconds before scraping, useful for letting the page load fully. Default is 0.
  • timeout (optional, int): Request timeout in milliseconds. Default is 30000.
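As referenced above, a minimal sketch of how several of these parameters combine in a single call (illustrative values; the response shape matches the Invocation examples below):

from langchain_scrapeless import ScrapelessCrawlerScrapeTool

tool = ScrapelessCrawlerScrapeTool()

# Illustrative parameter combination: markdown plus link extraction,
# main content only, skipping nav/footer tags, with a 1s settle delay.
result = tool.invoke(
    {
        "urls": ["https://example.com"],
        "formats": ["markdown", "links"],
        "only_main_content": True,
        "exclude_tags": ["nav", "footer"],
        "wait_for": 1000,
        "timeout": 30000,
    }
)
print(result)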

ScrapelessCrawlerCrawlTool

The ScrapelessCrawlerCrawlTool lets you crawl a website starting from a base URL using Scrapeless's Crawler Crawl API. It supports advanced URL filtering, crawl depth control, content scraping options, header customization, and more. The tool accepts the following parameters (a scoped-crawl sketch follows the list):
  • url (required, str): The base URL to start crawling from.
  • limit (optional, int): Maximum number of pages to crawl. Default is 10000.
  • include_paths (optional, List[str]): URL pathname regex patterns that URLs must match to be included in the crawl. For example, setting ["blog/.*"] includes only URLs under the /blog/ path. Default is None.
  • exclude_paths (optional, List[str]): URL pathname regex patterns whose matches are excluded from the crawl. For example, setting ["blog/.*"] excludes URLs under the /blog/ path. Default is None.
  • max_depth (optional, int): Maximum crawl depth relative to the base URL, measured by the number of slashes in the URL path. Default is 10.
  • max_discovery_depth (optional, int): Maximum crawl depth based on discovery order. The root and sitemapped pages have depth 0. For example, setting this to 1 with the sitemap ignored crawls only the entered URL and its direct links. Default is None.
  • ignore_sitemap (optional, bool): Whether to ignore the website's sitemap while crawling. Default is False.
  • ignore_query_params (optional, bool): Whether to ignore query-parameter differences to avoid re-scraping similar URLs. Default is False.
  • deduplicate_similar_urls (optional, bool): Whether to deduplicate similar URLs. Default is True.
  • regex_on_full_url (optional, bool): Whether regex matching applies to the full URL rather than just the path. Default is True.
  • allow_backward_links (optional, bool): Whether to allow crawling backlinks outside the URL hierarchy. Default is False.
  • allow_external_links (optional, bool): Whether to allow crawling links to external websites. Default is False.
  • delay (optional, int): Delay in seconds between page scrapes, to respect rate limits. Default is 1.
  • formats (optional, List[str]): Format(s) of the scraped content. Default is ['markdown']. Options include:
    • 'markdown'
    • 'rawHtml'
    • 'screenshot@fullPage'
    • 'json'
    • 'links'
    • 'screenshot'
    • 'html'
  • only_main_content (optional, bool): Whether to return only the main content, excluding headers, navigation bars, footers, etc. Default is True.
  • include_tags (optional, List[str]): A list of HTML tags to include in the output (e.g., ['h1', 'p']). Default is None (no explicit include filter).
  • exclude_tags (optional, List[str]): A list of HTML tags to exclude from the output. Default is None (no explicit exclude filter).
  • headers (optional, Dict[str, str]): Custom HTTP headers to send with the request, such as cookies or a user-agent string. Default is None.
  • wait_for (optional, int): Time in milliseconds to wait before scraping the content, allowing the page to load fully. Default is 0.
  • timeout (optional, int): Request timeout in milliseconds. Default is 30000.
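As referenced above, a sketch of a scoped crawl using the path, depth, and limit controls (illustrative values):

from langchain_scrapeless import ScrapelessCrawlerCrawlTool

tool = ScrapelessCrawlerCrawlTool()

# Illustrative scoped crawl: only /blog/ URLs, at most 2 levels deep, 20 pages total.
result = tool.invoke(
    {
        "url": "https://example.com",
        "include_paths": ["blog/.*"],
        "max_depth": 2,
        "limit": 20,
        "formats": ["markdown"],
    }
)
print(result)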

Invocation

ScrapelessCrawlerCrawlTool

Usage with Parameters

from langchain_scrapeless import ScrapelessCrawlerCrawlTool

tool = ScrapelessCrawlerCrawlTool()

# Advanced usage
result = tool.invoke({"url": "https://exmaple.com", "limit": 4})
print(result)
{'success': True, 'status': 'completed', 'completed': 1, 'total': 1, 'data': [{'markdown': '# Well hello there.\n\nWelcome to exmaple.com.\n\nChances are you got here by mistake (example.com, anyone?)', 'metadata': {'scrapeId': '547b2478-a41a-4a17-8015-8db378ee455f', 'sourceURL': 'https://exmaple.com', 'url': 'https://exmaple.com', 'statusCode': 200}}]}
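The response is a plain dict, so the per-page markdown can be collected directly. A small sketch, assuming the data/markdown keys shown in the output above:

# Join the markdown of every crawled page into one string.
pages = [item["markdown"] for item in result.get("data", []) if "markdown" in item]
print("\n\n---\n\n".join(pages))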

Use within an agent

from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerCrawlTool
from langchain.agents import create_agent


model = ChatOpenAI()

tool = ScrapelessCrawlerCrawlTool()

# Use the tool with an agent
tools = [tool]
agent = create_agent(model, tools)

for chunk in agent.stream(
    {
        "messages": [
            (
                "human",
                "Use the scrapeless crawler crawl tool to crawl the website https://example.com and output the markdown content as a string.",
            )
        ]
    },
    stream_mode="values",
):
    chunk["messages"][-1].pretty_print()
================================ Human Message =================================

Use the scrapeless crawler crawl tool to crawl the website https://example.com and output the markdown content as a string.
================================== Ai Message ==================================
Tool Calls:
  scrapeless_crawler_crawl (call_Ne5HbxqsYDOKFaGDSuc4xppB)
 Call ID: call_Ne5HbxqsYDOKFaGDSuc4xppB
  Args:
    url: https://example.com
    formats: ['markdown']
    limit: 1
================================= Tool Message =================================
Name: scrapeless_crawler_crawl

{"success": true, "status": "completed", "completed": 1, "total": 1, "data": [{"markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this\ndomain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)", "metadata": {"viewport": "width=device-width, initial-scale=1", "title": "Example Domain", "scrapeId": "00561460-9166-492b-8fed-889667383e55", "sourceURL": "https://example.com", "url": "https://example.com", "statusCode": 200}}]}
================================== Ai Message ==================================

The crawl of the website https://example.com has been completed. Here is the markdown content extracted from the website:

```
# Example Domain

This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.

[More information...](https://www.iana.org/domains/example)
```

You can find more information on the website [here](https://www.iana.org/domains/example).

ScrapelessDeepSerpGoogleTrendsTool

Usage with Parameters

from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool

tool = ScrapelessDeepSerpGoogleTrendsTool()

# Basic usage
result = tool.invoke("Funny 2048,negamon monster trainer")
print(result)
{'parameters': {'engine': 'google.trends.search', 'hl': 'en', 'data_type': 'INTEREST_OVER_TIME', 'tz': '0', 'cat': '0', 'date': 'today 1-m', 'q': 'Funny 2048,negamon monster trainer'}, 'interest_over_time': {'timeline_data': [{'date': 'Jul 11, 2025', 'timestamp': '1752192000', 'value': [0, 0]}, {'date': 'Jul 12, 2025', 'timestamp': '1752278400', 'value': [0, 0]}, {'date': 'Jul 13, 2025', 'timestamp': '1752364800', 'value': [0, 0]}, {'date': 'Jul 14, 2025', 'timestamp': '1752451200', 'value': [0, 0]}, {'date': 'Jul 15, 2025', 'timestamp': '1752537600', 'value': [0, 0]}, {'date': 'Jul 16, 2025', 'timestamp': '1752624000', 'value': [0, 0]}, {'date': 'Jul 17, 2025', 'timestamp': '1752710400', 'value': [0, 0]}, {'date': 'Jul 18, 2025', 'timestamp': '1752796800', 'value': [0, 0]}, {'date': 'Jul 19, 2025', 'timestamp': '1752883200', 'value': [0, 0]}, {'date': 'Jul 20, 2025', 'timestamp': '1752969600', 'value': [0, 0]}, {'date': 'Jul 21, 2025', 'timestamp': '1753056000', 'value': [0, 0]}, {'date': 'Jul 22, 2025', 'timestamp': '1753142400', 'value': [0, 0]}, {'date': 'Jul 23, 2025', 'timestamp': '1753228800', 'value': [0, 0]}, {'date': 'Jul 24, 2025', 'timestamp': '1753315200', 'value': [0, 0]}, {'date': 'Jul 25, 2025', 'timestamp': '1753401600', 'value': [0, 0]}, {'date': 'Jul 26, 2025', 'timestamp': '1753488000', 'value': [0, 0]}, {'date': 'Jul 27, 2025', 'timestamp': '1753574400', 'value': [0, 0]}, {'date': 'Jul 28, 2025', 'timestamp': '1753660800', 'value': [0, 0]}, {'date': 'Jul 29, 2025', 'timestamp': '1753747200', 'value': [0, 0]}, {'date': 'Jul 30, 2025', 'timestamp': '1753833600', 'value': [0, 0]}, {'date': 'Jul 31, 2025', 'timestamp': '1753920000', 'value': [0, 0]}, {'date': 'Aug 1, 2025', 'timestamp': '1754006400', 'value': [0, 0]}, {'date': 'Aug 2, 2025', 'timestamp': '1754092800', 'value': [0, 0]}, {'date': 'Aug 3, 2025', 'timestamp': '1754179200', 'value': [0, 0]}, {'date': 'Aug 4, 2025', 'timestamp': '1754265600', 'value': [0, 0]}, {'date': 'Aug 5, 2025', 'timestamp': '1754352000', 'value': [0, 0]}, {'date': 'Aug 6, 2025', 'timestamp': '1754438400', 'value': [0, 0]}, {'date': 'Aug 7, 2025', 'timestamp': '1754524800', 'value': [0, 0]}, {'date': 'Aug 8, 2025', 'timestamp': '1754611200', 'value': [0, 0]}, {'date': 'Aug 9, 2025', 'timestamp': '1754697600', 'value': [0, 0]}, {'date': 'Aug 10, 2025', 'timestamp': '1754784000', 'value': [0, 100]}, {'date': 'Aug 11, 2025', 'timestamp': '1754870400', 'value': [0, 0]}], 'averages': [{'value': 0}, {'value': 3}], 'isPartial': True}}
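The nested response can be walked like any dict. A small sketch, assuming the interest_over_time/timeline_data keys shown in the output above:

# Print the first few (date, value) points from the trends timeline.
for point in result["interest_over_time"]["timeline_data"][:5]:
    print(point["date"], point["value"])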

ScrapelessCrawlerScrapeTool

Usage with Parameters

from langchain_scrapeless import ScrapelessCrawlerScrapeTool

tool = ScrapelessCrawlerScrapeTool()

result = tool.invoke(
    {
        "urls": ["https://exmaple.com", "https://www.scrapeless.com/en"],
        "formats": ["markdown"],
    }
)
print(result)
{'success': True, 'status': 'completed', 'completed': 1, 'total': 1, 'data': [{'markdown': "[🩵 Don't just take our word for it. See what our users say on Product Hunt.](https://www.producthunt.com/posts/scrapeless-deep-serpapi)\n\n# Effortless Web Scraping Toolkit  for Business and Developers\n\nThe ultimate scraper's companion: an expandable suite of tools, including\n\nScraping Browser, Scraping API, Universal Scraping API\n\nand Anti-Bot Solutions—designed to work together or independently.\n\n[**4.8**](https://www.g2.com/products/scrapeless/reviews) [**4.5**](https://www.trustpilot.com/review/scrapeless.com) [**4.8**](https://slashdot.org/software/p/Scrapeless/) [**8.5**](https://tekpon.com/software/scrapeless/reviews/)\n\nNo credit card required\n\n## A Flexible Toolkit for Accessing Public Web Data\n\nAI-powered seamless data extraction, effortlessly bypassing blocks with a single API call.\n\n[scrapeless](https://www.scrapeless.com/en)\n\n[![Deep SerpApi](https://www.scrapeless.com/_next/image?url=%2Fassets%2Fimages%2Ftoolkit%2Flight%2Fimg-2.png&w=750&q=100)\\\\\n\\\\\nView more\\\\\n\\\\\n20+ custom parameters\\\\\n\\\\\n20+ Google SERP scenarios\\\\\n\\\\\nPrecision Search Fueling LLM & RAG AI\\\\\n\\\\\n1-2s response; $0.1/1k queries](https://www.scrapeless.com/en/product/deep-serp-api) [![Scraping Browser](https://www.scrapeless.com/_next/image?url=%2Fassets%2Fimages%2Ftoolkit%2Flight%2Fimg-4.png&w=750&q=100)\\\\\n\\\\\nView more\\\\\n\\\\\nHuman-like Behavior\\\\\n\\\\\nHigh Performance\\\\\n\\\\\nBypassing Risk Control\\\\\n\\\\\nConnect using the CDP Protocol](https://www.scrapeless.com/en/product/scraping-browser) [![Universal Scraping API](https://www.scrapeless.com/_next/image?url=%2Fassets%2Fimages%2Ftoolkit%2Flight%2Fimg-1.png&w=750&q=100)\\\\\n\\\\\nView more\\\\\n\\\\\nSession Mode\\\\\n\\\\\nCustom TLS\\\\\n\\\\\nJs Render](https://www.scrapeless.com/en/product/universal-scraping-api)\n\n### Customized Services\n\nContact our technical experts for custom solutions.\n\nBook a demo\n\n## From Simple Data Scraping to Complex Anti-Bot Challenges,   Scrapeless Has You Covered.\n\nFlexible Toolkit for Adapting to Diverse Data Extraction Needs.\n\n[Try for Free](https://app.scrapeless.com/passport/register)\n\n### Fully Compatible with Key Programming Languages and Tools\n\nSeamlessly integrate across all devices, OS, and languages. 
Worry-free compatibility ensures smooth data collection.\n\nGet all example codes on the dashboard after login\n\n![scrapeless](https://www.scrapeless.com/_next/image?url=%2Fassets%2Fimages%2Fcode%2Fcode-l.jpg&w=3840&q=75)\n\n## Enterprise-level Data Scraping Solution\n\nHigh-quality, tailored web scraping solutions and expert services designed for critical business projects.\n\n### Customized Data Scraping Solutions\n\nTailored web scraping services designed to address your\xa0 unique business requirements and deliver actionable insights.\n\n### High Concurrency and High-Performance Scraping\n\nEfficiently gather massive volumes of data with unparalleled speed and reliability,\xa0ensuring optimal performance even under heavy load.\n\n### Data Cleaning and Transformation\n\nEnhance data accuracy and usability through comprehensive\xa0 cleaning and transformation processes, turning raw data into\xa0 valuable information.\n\n### Real-Time Data Push and API Integration\n\nSeamlessly integrate and access live data streams with robust APIs,\xa0ensuring your applications are always up-to-date with the latest information.\n\n### Data Security and Privacy Protection\n\nProtect your data with state-of-the-art security measures and strict\xa0compliance standards, ensuring privacy and confidentiality at every step.\n\n### Enterprise-level SLA\n\nThe Service Level Agreement (SLA) serves as a safeguard for your project,\xa0ensuring a contract for anticipated outcomes, automated oversight, prompt issue\xa0resolution, and a personalized maintenance plan.\n\n## Why Scrapeless: Simplify Your Data Flow Effortlessly.\n\nAchieve all your data scraping tasks with more power, simplicity, and cost-effectiveness in less time.\n\n### Articles\n\nNews articles/Blog posts/Research papers\n\n### Organized Fresh Data\n\n### Prices\n\nProduct prices/Discount information/Market trend analysis\n\n### No need to hassle with browser maintenance\n\n### Reviews\n\nProduct reviews/User feedback/Social media reviews\n\n### Only pay for successful requests\n\n### Products\n\nProduct Launches/Tech Specs/Product Comparisons\n\n### Fully scalable\n\n## Unleash Your Competitive Edge  in Data within the Industry\n\n## Regulate Compliance for All Users\n\nContact us\n\nWe are committed to using technology for the benefit of humanity and firmly oppose any illegal activities and misuse of our products. We support the collection of publicly available data to improve human life, while strongly opposing the collection of unauthorized or unapproved sensitive information. If you find anyone abusing our services, please provide us with feedback! To further enhance user confidence and control, we have established a dedicated Privacy Center aimed at empowering users with more capabilities and information rights.\n\n![scrapeless](https://www.scrapeless.com/_next/image?url=%2Fassets%2Fimages%2Fregulate-compliance.png&w=640&q=75)\n\n## Web Scraping Blog\n\nMost comprehensive guide, created for all Web Scraping developers.\n\n[View All Blogs](https://www.scrapeless.com/en/blog)\n\n[**Scrapeless MCP Server Is Officially Live! Build Your Ultimate AI-Web Connector** \\\\\n\\\\\nDiscover how the Scrapeless MCP Server gives LLMs real-time web browsing and scraping abilities. 
Learn how to build AI agents that search, extract, and interact with dynamic web content seamlessly.\\\\\n\\\\\n![Michael Lee](https://www.scrapeless.com/_next/image?url=https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fimages%2Fauthor-avatars%2Fmichael-lee.png&w=48&q=75)Michael Lee\\\\\n\\\\\n17-Jul-2025\\\\\n\\\\\n![Scrapeless MCP Server](https://www.scrapeless.com/_next/image?url=https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fposts%2Fscrapeless-mcp-server%2Fc85738fc1c504abe930fd4514e4a2190.jpeg&w=3840&q=75)](https://www.scrapeless.com/en/blog/scrapeless-mcp-server) [**Product Updates \\| New Profile Feature** \\\\\n\\\\\nProduct Updates \\| Introducing the new Profile feature to enable persistent browser data storage, streamline cross-session workflows, and boost automation efficiency.\\\\\n\\\\\n![Emily Chen](https://www.scrapeless.com/_next/image?url=https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fimages%2Fauthor-avatars%2Femily-chen.png&w=48&q=75)Emily Chen\\\\\n\\\\\n17-Jul-2025\\\\\n\\\\\n![Product Updates | New Profile Feature: Make Browser Data Persistent, Efficient, and Controllable](https://www.scrapeless.com/_next/image?url=https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fposts%2Fscrapeelss-profile%2F3194244c16c9b56e1592640ea95c389e.jpeg&w=3840&q=75)](https://www.scrapeless.com/en/blog/scrapeelss-profile) [**How to Track Your Ranking on ChatGPT?** \\\\\n\\\\\nLearn why traditional SEO tools fall short and how Scrapeless helps you monitor and optimize your AI rankings effortlessly.\\\\\n\\\\\n![Michael Lee](https://www.scrapeless.com/_next/image?url=https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fimages%2Fauthor-avatars%2Fmichael-lee.png&w=48&q=75)Michael Lee\\\\\n\\\\\n01-Jul-2025\\\\\n\\\\\n![ChatGPT Scraper](https://www.scrapeless.com/_next/image?url=https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fposts%2Fchatgpt-scraper%2F7c5b1ac494b6838a7eca2964df15ef59.png&w=3840&q=75)](https://www.scrapeless.com/en/blog/chatgpt-scraper)\n\nContact our sales team\n\nMonday to Friday, 9:00 AM - 18:00 PMSingapore Standard Time (UTC+08:00)\n\nScrapeless offers AI-powered, robust, and scalable web scraping and automation services trusted by leading enterprises. Our enterprise-grade solutions are tailored to meet your project needs, with dedicated technical support throughout. With a strong technical team and flexible delivery times, we charge only for successful data, enabling efficient data extraction while bypassing limitations.\n\nContact us now to fuel your business growth.\n\n[**4.8**](https://www.g2.com/products/scrapeless/reviews) [**4.5**](https://www.trustpilot.com/review/scrapeless.com) [**4.8**](https://slashdot.org/software/p/Scrapeless/) [**8.5**](https://tekpon.com/software/scrapeless/reviews/)\n\nBook a demo\n\nProvide your contact details, and we'll promptly reach out to offer a product demo and introduction. We ensure your information remains confidential, complying with GDPR standards.\n\nGet a demo\n\nRegister and Claim Free Trial\n\nYour free trial is ready! Sign up for a Scrapeless account for free, and your trial will be instantly activated in your account.\n\n[Sign up](https://app.scrapeless.com/passport/register)\n\nWe value your privacy\n\nWe use cookies to analyze website usage and do not record any of your personal information. 
View [Privacy Policy](https://www.scrapeless.com/en/legal/privacy-policy)\n\nReject\n\nAccept", 'metadata': {'language': 'en', 'description': 'Scrapeless is the best full-stack web scraping toolkit offering Scraping API, Scraping Browser, Universal Scraping API, Captcha Solver, and Proxies, designed to handle all your data collection needs with ease and reliability, empowering businesses and developers with efficient data extraction solutions.', 'google-site-verification': 'xj1xDpU8LpGG_h-2lIBVW_6GNW5Vtx0h5M3lz43HUXc', 'viewport': 'width=device-width, initial-scale=1', 'keywords': 'Scraping API, Scraping Browser, Universal Scraping API, Captcha Solver, and Proxies, web scraping,  web scraper, web scraping api, Web scraper,data scraping, web crawler', 'next-size-adjust': '', 'favicon': 'https://www.scrapeless.com/favicon.ico', 'title': 'Effortless Web Scraping Toolkit - Scrapeless', 'scrapeId': 'c7189211-7034-4e86-9afd-89fa5268b013', 'sourceURL': 'https://www.scrapeless.com/en', 'url': 'https://www.scrapeless.com/en', 'statusCode': 200}}]}
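When scraping several URLs at once, a quick per-page summary helps verify what came back. A sketch, assuming the data/metadata keys shown in the output above:

# Report URL, HTTP status code, and markdown length for each scraped page.
for item in result.get("data", []):
    meta = item.get("metadata", {})
    print(meta.get("url"), meta.get("statusCode"), len(item.get("markdown", "")))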

Use within an agent

from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerScrapeTool
from langchain.agents import create_agent


model = ChatOpenAI()

tool = ScrapelessCrawlerScrapeTool()

# Use the tool with an agent
tools = [tool]
agent = create_agent(model, tools)

for chunk in agent.stream(
    {
        "messages": [
            (
                "human",
                "Use the scrapeless crawler scrape tool to get the website content of https://example.com and output the html content as a string.",
            )
        ]
    },
    stream_mode="values",
):
    chunk["messages"][-1].pretty_print()
================================ Human Message =================================

Use the scrapeless crawler scrape tool to get the website content of https://example.com and output the html content as a string.
================================== Ai Message ==================================
Tool Calls:
  scrapeless_crawler_scrape (call_qrPMGLjXmzb5QlVoIZgMuyPN)
 Call ID: call_qrPMGLjXmzb5QlVoIZgMuyPN
  Args:
    urls: ['https://example.com']
    formats: ['html']
================================= Tool Message =================================
Name: scrapeless_crawler_scrape

{"success": true, "status": "completed", "completed": 1, "total": 1, "data": [{"metadata": {"viewport": "width=device-width, initial-scale=1", "title": "Example Domain", "scrapeId": "63070ee5-ebef-4727-afe7-2b06466c6777", "sourceURL": "https://example.com", "url": "https://example.com", "statusCode": 200}, "html": "<!DOCTYPE html><html>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n\n\n<div id=\"div-f3t6fv31hyl\" style=\"display: none;\"></div></body></html>"}]}
================================== Ai Message ==================================

The HTML content of the website "https://example.com" is as follows:

```html
<!DOCTYPE html><html>
<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>

<div id="div-f3t6fv31hyl" style="display: none;"></div></body></html>
```

API reference

