
law-site-link-discovery

Hybrid knowledge.webimport

Fetches a seed URL on a government or law portal and extracts same-origin direct file links (PDF, DOCX, PPT, XLS, ZIP). Site-specific handling: Shanghai uses /cmsres/ links plus generic hrefs. PBOC tiaofasi uses articleFileDir links and generic hrefs, and follows the detail pages of the articles on the list; when a detail page has no file attachment in its HTML, the full article page is included as an HTML download. For NFRA ItemList pages and the gov.cn gjgzk index, the seed page itself is added as an HTML line item (a single GET). NPC links are matched by file extension. Pagination is not followed, and the gene does not access the filesystem.
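The generic link filtering described above can be sketched as follows. This is a minimal sketch, not the gene's actual code: it assumes the accepted extensions are exactly the five listed types, treats "same origin" as a matching scheme and host (including port), and uses hypothetical helper names (`extract_file_links`, `detail_page_items`). The second helper mirrors the PBOC fallback of keeping the article page itself when it yields no file links. Site-specific rules such as /cmsres/ or articleFileDir are not modeled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumption: extension set taken from the file types named in the description.
FILE_EXTENSIONS = (".pdf", ".docx", ".ppt", ".xls", ".zip")


class _AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_file_links(html, base_url):
    """Return same-origin direct file links as {"url", "title"} items (hypothetical helper)."""
    base = urlsplit(base_url)
    parser = _AnchorCollector()
    parser.feed(html)
    items, seen = [], set()
    for href, text in parser.anchors:
        url = urljoin(base_url, href)  # resolve relative hrefs against the seed page
        parts = urlsplit(url)
        # Same origin: scheme and host (including port) must match the seed URL.
        if (parts.scheme, parts.netloc) != (base.scheme, base.netloc):
            continue
        # Match on the path only, so query strings do not hide the extension.
        if not parts.path.lower().endswith(FILE_EXTENSIONS):
            continue
        # Assumption (this sketch): duplicates are dropped, and empty anchor
        # text falls back to the file name so every item has a title.
        if url not in seen:
            seen.add(url)
            items.append({"url": url, "title": text or parts.path.rsplit("/", 1)[-1]})
    return items


def detail_page_items(detail_url, detail_title, detail_html):
    """PBOC-style fallback: if a detail page has no file links, keep the page itself."""
    attachments = extract_file_links(detail_html, detail_url)
    return attachments or [{"url": detail_url, "title": detail_title}]
```

Real portals may also need variant extensions (for example .doc or .xlsx); the tuple is the only place to change that in this sketch.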

Author: @sharesummer
v0.2.4, May 7, 2026
A newer version is available: v0.4.1

README

No documentation yet.

The gene author can add a README when publishing.

Phenotype

Input

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| seedUrl | string | Yes | HTTP(S) URL of the page to fetch. Only the first list page is processed; pagination is not followed. |
| linkScope | string (enum: default, single_page_downloadable) | No | With single_page_downloadable: for pbc_tiaofasi, behaves the same as default (fetches the first list page and its article detail pages; no sitewide pagination). For all other sites, makes a single GET to seedUrl and collects only generic hrefs matched by file extension, with no subpages. For NFRA ItemList and gov.cn /zhengce/xxgk/gjgzk/, the list/index page URL itself is also included as a downloadable item (HTML snapshot). |
| followDetailPages | boolean | No | For PBOC tiaofasi list pages: also fetch each article detail page linked from the list to collect its attachments. Defaults to true. |
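A sketch of how the input fields might be validated and combined, using hypothetical helpers `normalize_input` and `fetches_detail_pages`. The followDetailPages default of true comes from the table above; treating an omitted linkScope as "default", and limiting detail-page fetches to the pbc_tiaofasi site key, are assumptions drawn from the descriptions rather than confirmed behavior.

```python
from urllib.parse import urlsplit

LINK_SCOPES = ("default", "single_page_downloadable")


def normalize_input(raw):
    """Validate an input object and fill defaults (hypothetical helper)."""
    seed = raw.get("seedUrl")
    parts = urlsplit(seed) if isinstance(seed, str) else None
    # seedUrl is required and must be an absolute HTTP(S) URL.
    if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("seedUrl must be an HTTP(S) URL string")
    # Assumption: an omitted linkScope behaves like "default".
    scope = raw.get("linkScope", "default")
    if scope not in LINK_SCOPES:
        raise ValueError(f"linkScope must be one of {LINK_SCOPES}")
    # Documented default: followDetailPages is true unless set explicitly.
    follow = raw.get("followDetailPages", True)
    if not isinstance(follow, bool):
        raise ValueError("followDetailPages must be a boolean")
    return {"seedUrl": seed, "linkScope": scope, "followDetailPages": follow}


def fetches_detail_pages(site, params):
    """Whether article detail pages on the first list page would be fetched."""
    # Assumption: only pbc_tiaofasi is documented as following detail pages, and it
    # behaves the same under both linkScope values; followDetailPages can turn it off.
    # Other sites are treated as a single GET to seedUrl with no subpages.
    return site == "pbc_tiaofasi" and params["followDetailPages"]
```

Note that linkScope does not appear in `fetches_detail_pages`: per the table, single_page_downloadable changes nothing for pbc_tiaofasi.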

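Output

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| site | string | Yes | Detected site key, or single_page_downloadable when linkScope is single_page_downloadable. |
| error | string | No | Set when the fetch or parse failed. |
| items | array | Yes | Discovered links; each item is an object with required url and title strings. |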
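Because the output schema requires site and items, a caller could check results with a small validator like the one below. `validate_output` is a hypothetical helper written against the schema using only the standard library; it reports problems as strings rather than raising, and it is not part of the gene itself.

```python
def validate_output(result):
    """Check a result object against the output schema; return a list of problems."""
    problems = []
    # Per outputSchema: site and items are required, error is optional.
    if not isinstance(result.get("site"), str):
        problems.append("site: required string")
    if "error" in result and not isinstance(result["error"], str):
        problems.append("error: must be a string when present")
    items = result.get("items")
    if not isinstance(items, list):
        problems.append("items: required array")
        return problems
    # Each item must be an object with string url and title.
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"items[{i}]: must be an object")
            continue
        for key in ("url", "title"):
            if not isinstance(item.get(key), str):
                problems.append(f"items[{i}].{key}: required string")
    return problems
```

Since items is required even when error is set, a failed run would presumably return an empty items array; that reading is an inference from the schema, not documented behavior.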
Raw JSON Schema

inputSchema

{
  "type": "object",
  "required": [
    "seedUrl"
  ],
  "properties": {
    "seedUrl": {
      "type": "string",
      "description": "HTTP(S) page URL to fetch (first list page only; pagination not followed)."
    },
    "linkScope": {
      "enum": [
        "default",
        "single_page_downloadable"
      ],
      "type": "string",
      "description": "When single_page_downloadable: for pbc_tiaofasi same as default (list + article detail fetches on the first page, no sitewide pagination). For other sites, one GET to seedUrl and extension-based generic hrefs only, no subpages. NFRA ItemList and gov.cn /zhengce/xxgk/gjgzk/ also include the list/index page URL as a downloadable item (HTML snapshot)."
    },
    "followDetailPages": {
      "type": "boolean",
      "description": "For PBOC tiaofasi list pages: also fetch each linked article detail page on the same list to collect attachments (default true)."
    }
  }
}

outputSchema

{
  "type": "object",
  "required": [
    "site",
    "items"
  ],
  "properties": {
    "site": {
      "type": "string",
      "description": "Detected site key, or single_page_downloadable when linkScope is set accordingly"
    },
    "error": {
      "type": "string",
      "description": "Set when fetch or parse failed"
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "url",
          "title"
        ],
        "properties": {
          "url": {
            "type": "string"
          },
          "title": {
            "type": "string"
          }
        }
      }
    }
  }
}