
Scrapegraph-ai

Category: Common tools | Rating: 4 | Views: 13394 | Updated: 2025-01-03


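Scrapegraph-ai project overview

Scrapegraph-ai is a Python web scraping library that combines large language models (LLMs) with direct graph logic to build scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, and so on). What sets it apart is how little the user has to specify: you describe the information you want to extract, and the library carries out the scraping.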

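Core features

Smart scraping: uses an LLM to interpret the user's request and work out a scraping strategy automatically.

Multiple sources: extracts information from local documents as well as web pages.

Flexible configuration: offers several scraping pipelines to suit different scenarios.

Multiple model backends: works with APIs such as OpenAI, Groq, Azure, and Gemini, and with local models through Ollama.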
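As a rough illustration of switching backends, the sketch below contrasts a hosted-API configuration with a local one. It assumes the provider is selected by the prefix of the "model" string, as in the "openai/gpt-4o-mini" value used in the usage example; the Ollama model name and the "format" key are assumptions and should be checked against the library's own documentation.

```python
# Hosted API: an API key is required (placeholder value shown).
openai_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "model": "openai/gpt-4o-mini",
    },
}

# Local model through Ollama: no API key.
# Assumptions: an Ollama server is running locally, the named model has
# already been pulled, and the library accepts a "format" key for JSON output.
ollama_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "format": "json",
    },
}

# The provider is the part of the model string before the slash.
for name, cfg in [("hosted", openai_config), ("local", ollama_config)]:
    provider = cfg["llm"]["model"].split("/")[0]
    print(name, "->", provider)
```

Either dictionary would be passed as the config argument of a scraping graph; only the "llm" section changes between backends.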

Installation and usage

Install Scrapegraph-ai with pip, then run playwright install to download the browsers that Playwright needs:

pip install scrapegraphai
playwright install

Installing into a virtual environment is recommended to avoid conflicts with other libraries.

Usage example

The following simple example uses Scrapegraph-ai's SmartScraperGraph to scrape a single page:

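import json
from scrapegraphai.graphs import SmartScraperGraph

# Configuration for the scraping pipeline
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",  # replace with your own key
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,    # verbose output while the graph runs
    "headless": False,  # run the browser with a visible window
}

# Create the graph: the prompt states what to extract, the source is the page URL
smart_scraper_graph = SmartScraperGraph(
    prompt="Find some information about what does the company do, the name and a contact email.",
    source="https://scrapegraphai.com/",
    config=graph_config
)

# Run the pipeline and print the result as formatted JSON
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))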
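The example above hard-codes the API key. A minimal sketch of one alternative is to read the key from an environment variable and build the same configuration dictionary from it. The helper name build_graph_config and the variable name OPENAI_API_KEY are assumptions made for illustration; neither is part of Scrapegraph-ai.

```python
import os


def build_graph_config(model="openai/gpt-4o-mini",
                       env_var="OPENAI_API_KEY",
                       headless=False):
    """Return a config dict shaped like the one in the usage example.

    Hypothetical helper: the key is read from the environment variable
    named by env_var instead of being written into the source file.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "llm": {
            "api_key": api_key,
            "model": model,
        },
        "verbose": True,
        "headless": headless,
    }
```

The returned dictionary can be passed where graph_config is used above, for example SmartScraperGraph(prompt=..., source=..., config=build_graph_config()). Failing early when the variable is unset gives a clearer error than a rejected request later in the pipeline.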