opt:: subtract json field

Dew-OF-Aurora 2 weeks ago
parent
commit
60aa89b73d

+ 3 - 1
CLAUDE.md

@@ -85,6 +85,7 @@ sudo bash scripts/uninstall_debian.sh --keep-auth-files
 - Extracts candidates via `parser.field_paths`, `parser.json_paths`, or regex fallback.
 - Normalizes and de-duplicates domains/IPs.
 - Applies include/exclude filtering (`domain_filter`).
+- Applies record-level exclusion rules from `record_filter` (API-specific strategy via config).
 - Optionally ranks records (`scoring`) using API fields/time windows or API order.
 - Optionally healthchecks candidates with TLS handshake (`healthcheck`).
 - Selects winner from scored/check results (`selection.top_n`).
@@ -123,8 +124,9 @@ sudo bash scripts/uninstall_debian.sh --keep-auth-files
 ## Config model (`config.json`)
 
 Key top-level blocks:
-- `api`: endpoint/method/headers/body/timeout
+- `api`: endpoint/method/headers/params/body/timeout
 - `parser`: domain extraction paths/regex
+- `record_filter`: optional record-level exclusion rules (API-specific)
 - `domain_filter`: include suffixes / exclude regex
 - `scoring`: record ranking fields and strategy
 - `healthcheck`: TLS probe settings

+ 54 - 18
README.md

@@ -40,7 +40,7 @@
 
 ### 3.1 Configuration
 
-Edit `config.json`.
+Edit `config.json` (you can first copy `config.example.json` and then adjust it for your API).
 
 Typical parse path (e.g. when the API returns `data.good[].ip`):
 
@@ -52,20 +52,56 @@
 }
 ```
 
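-### 3.2 Syntax check
+### 3.2 Config blocks (API-agnostic)
+
+You can use the same scripts for different APIs and change only the configuration:
+
+- `api`: request URL, method, headers, query parameters, timeout
+- `parser`: how to extract candidate domains from the returned JSON
+- `record_filter`: exclude records by field values (API-specific policies belong here)
+- `domain_filter`: include/exclude by domain string
+- `scoring`: how to rank by score fields
+- `healthcheck`: optional TLS check
+- `selection`: how many candidates to keep
+- `output`: runtime output directory and file names
+- `v2ray`: template-substitution output (optional)
+- `notify`: post-run command hook (optional)
+
+`record_filter` example (excludes records whose locationCountry/locationCity contains "泛播"):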
+
+```json
+"record_filter": {
+  "enabled": true,
+  "records_path": "data.good[]",
+  "domain_field": "ip",
+  "exclude_if_any": [
+    { "field_path": "locationCountry", "contains": "泛播", "case_sensitive": false },
+    { "field_path": "locationCity", "contains": "泛播", "case_sensitive": false }
+  ]
+}
+```
+
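+Supported rule types:
+- `contains`
+- `equals`
+- `regex`
+
+> Note: `record_filter` only affects the record-filtering policy; the main service flow, log output, and runtime file writing are unchanged.
+
+### 3.3 Syntax check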
 
 ```bash
 python3 -m py_compile scripts/domain_updater.py
 python3 -m py_compile scripts/update_vmess_links.py
 ```
 
-### 3.3 Run once
+### 3.4 Run once
 
 ```bash
 python3 scripts/domain_updater.py --config config.json
 ```
 
-### 3.4 View results locally
+### 3.5 View results locally
 
 ```bash
 cat runtime/current_domain.txt
@@ -163,20 +199,20 @@ sudo journalctl -u vmess-domain-rotator.service -n 120 --no-pager
 sudo bash scripts/install_debian.sh [options]
 ```
 
-| Parameter                      | Description                    | Default                              |
-| ------------------------------ | ------------------------------ | ------------------------------------ |
-| `--user <name>`              | Service user                   | Current `sudo` user                |
-| `--group <name>`             | Service group                  | Primary group of the current `sudo` user |
-| `--interval <value>`         | Timer interval (e.g. `1h`/`5min`) | `1h`                               |
-| `--git-push <0                 | 1>`                            | Whether to push automatically        |
-| `--git-push-remote <name>`   | Remote name                    | `origin`                           |
-| `--git-http-username <u>`    | HTTPS auth username            | `git`                              |
-| `--git-http-token <t>`       | HTTPS token (plain-text argument) | empty                             |
-| `--git-http-token-file <f>`  | Read token from a file         | empty                                |
-| `--git-use-credential-store <0 | 1>`                            | Whether to use `credential.helper store` |
-| `--git-credentials-file <f>` | Path to the credential store file | empty (Git default)               |
-| `--no-install-deps`          | Skip installing dependencies via apt | off                            |
-| `-h, --help`                 | Show help                      | -                                    |
+| Parameter | Description | Default |
+|---|---|---|
+| `--user <name>` | Service user | Current `sudo` user |
+| `--group <name>` | Service group | Primary group of the current `sudo` user |
+| `--interval <value>` | Timer interval (e.g. `1h`/`5min`) | `1h` |
+| `--git-push <0\|1>` | Whether to push automatically | `1` |
+| `--git-push-remote <name>` | Remote name | `origin` |
+| `--git-http-username <u>` | HTTPS auth username | `git` |
+| `--git-http-token <t>` | HTTPS token (plain-text argument) | empty |
+| `--git-http-token-file <f>` | Read token from a file | empty |
+| `--git-use-credential-store <0\|1>` | Whether to use `credential.helper store` | `1` |
+| `--git-credentials-file <f>` | Path to the credential store file | empty (Git default) |
+| `--no-install-deps` | Skip installing dependencies via apt | off |
+| `-h, --help` | Show help | - |
 
 Notes:
 

+ 86 - 0
config.example.json

@@ -0,0 +1,86 @@
+{
+  "api": {
+    "url": "https://example.com/api/domains",
+    "method": "GET",
+    "headers": {
+      "Authorization": "Bearer <token>"
+    },
+    "params": {
+      "page": 1
+    },
+    "body": null,
+    "timeout_sec": 10
+  },
+  "parser": {
+    "field_paths": [
+      "data.good[].ip"
+    ],
+    "json_paths": [],
+    "regex": "[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
+  },
+  "record_filter": {
+    "enabled": false,
+    "records_path": "data.good[]",
+    "domain_field": "ip",
+    "exclude_if_any": [
+      {
+        "field_path": "locationCountry",
+        "contains": "泛播",
+        "case_sensitive": false
+      },
+      {
+        "field_path": "provider",
+        "equals": "internal",
+        "case_sensitive": false
+      },
+      {
+        "field_path": "tags",
+        "regex": "(test|staging)",
+        "case_sensitive": false
+      }
+    ]
+  },
+  "domain_filter": {
+    "include_suffixes": [],
+    "exclude_regex": [
+      "^(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(?:\\.(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}$"
+    ]
+  },
+  "scoring": {
+    "enabled": true,
+    "records_path": "data.good[]",
+    "ip_field": "ip",
+    "created_time_field": "createdTime",
+    "score_fields": [
+      "avgScore"
+    ],
+    "within_hours": 24,
+    "prefer_lower": false,
+    "use_api_order": false
+  },
+  "healthcheck": {
+    "enabled": false,
+    "attempts": 2,
+    "timeout_ms": 1800,
+    "port": 443,
+    "tls_verify": true
+  },
+  "selection": {
+    "top_n": 3
+  },
+  "output": {
+    "runtime_dir": "./runtime",
+    "current_domain_file": "current_domain.txt",
+    "current_domain_json": "current_domain.json",
+    "state_file": "state.json",
+    "substore_vars_file": "substore_vars.json"
+  },
+  "v2ray": {
+    "template_file": "",
+    "output_file": "",
+    "replace_token": "__AUTO_DOMAIN__"
+  },
+  "notify": {
+    "command": ""
+  }
+}
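
The `exclude_regex` entry in `domain_filter` is stored with JSON escaping, so `\\d` in the file becomes `\d` once the config is parsed. The sketch below loads that one pattern and checks what it matches. It assumes the pattern is applied with `re.search`; the real `apply_filter` is not part of this diff, and `looks_like_ipv4` is a hypothetical helper used only for illustration.

```python
import json
import re

# The exclude_regex list as it appears in config.example.json (JSON-escaped).
CONFIG_FRAGMENT = r'''
{
  "exclude_regex": [
    "^(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(?:\\.(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}$"
  ]
}
'''

# json.loads turns each "\\d" in the file into the regex token "\d".
pattern = json.loads(CONFIG_FRAGMENT)["exclude_regex"][0]
ipv4 = re.compile(pattern)


def looks_like_ipv4(candidate):
    """Hypothetical helper: True when the candidate is a dotted-quad IPv4 address."""
    # Assumption: the updater applies exclude patterns with re.search.
    return ipv4.search(candidate) is not None


for candidate in ["192.168.0.1", "256.1.1.1", "a.lma.de5.net"]:
    print(candidate, looks_like_ipv4(candidate))
```

With this pattern, bare IPv4 addresses are dropped from the candidate list while hostnames pass through, and out-of-range octets such as `256` are not treated as addresses.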

+ 17 - 0
config.json

@@ -13,6 +13,23 @@
     "json_paths": [],
     "regex": "[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
   },
+  "record_filter": {
+    "enabled": true,
+    "records_path": "data.good[]",
+    "domain_field": "ip",
+    "exclude_if_any": [
+      {
+        "field_path": "locationCountry",
+        "contains": "泛播",
+        "case_sensitive": false
+      },
+      {
+        "field_path": "locationCity",
+        "contains": "泛播",
+        "case_sensitive": false
+      }
+    ]
+  },
   "domain_filter": {
     "include_suffixes": [],
     "exclude_regex": [

+ 6 - 6
runtime/current_domain.json

@@ -1,26 +1,26 @@
 {
-  "domain": "cloudflare.182682.xyz",
-  "updated_at": "2026-04-17T18:04:28Z",
+  "domain": "a.lma.de5.net",
+  "updated_at": "2026-04-18T13:09:34Z",
   "status": "ok",
   "source_count": 20,
   "checked_count": 0,
   "top_candidates": [
     {
-      "domain": "cloudflare.182682.xyz",
+      "domain": "a.lma.de5.net",
       "scores": [
-        137.0
+        135.0
       ],
       "created_raw": "2026-04-18 00:00:00"
     },
     {
-      "domain": "a.lma.de5.net",
+      "domain": "bbs.alipansou.com",
       "scores": [
         135.0
       ],
       "created_raw": "2026-04-18 00:00:00"
     },
     {
-      "domain": "bbs.alipansou.com",
+      "domain": "nexusmods.com",
       "scores": [
         135.0
       ],

+ 1 - 1
runtime/current_domain.txt

@@ -1 +1 @@
-cloudflare.182682.xyz
+a.lma.de5.net

+ 2 - 2
runtime/state.json

@@ -1,6 +1,6 @@
 {
-  "updated_at": "2026-04-17T18:04:28Z",
-  "last_good_domain": "cloudflare.182682.xyz",
+  "updated_at": "2026-04-18T13:09:34Z",
+  "last_good_domain": "a.lma.de5.net",
   "status": "ok",
   "source_count": 20,
   "checked_count": 0,

+ 2 - 2
runtime/substore_vars.json

@@ -1,5 +1,5 @@
 {
-  "AUTO_DOMAIN": "cloudflare.182682.xyz",
-  "UPDATED_AT": "2026-04-17T18:04:28Z",
+  "AUTO_DOMAIN": "a.lma.de5.net",
+  "UPDATED_AT": "2026-04-18T13:09:34Z",
   "STATUS": "ok"
 }

+ 97 - 0
scripts/domain_updater.py

@@ -159,6 +159,98 @@ def parse_created_time(s):
         return None
 
 
+def record_field_value(record, field_path):
+    if not isinstance(record, dict) or not field_path:
+        return None
+    return get_by_json_path(record, field_path)
+
+
+def rule_matches(value, rule):
+    if value is None or not isinstance(rule, dict):
+        return False
+
+    values = flatten_values(value)
+    if not values:
+        values = [value]
+
+    case_sensitive = bool(rule.get("case_sensitive", False))
+
+    if "contains" in rule:
+        needle = str(rule.get("contains", ""))
+        if not needle:
+            return False
+        for item in values:
+            hay = str(item)
+            if case_sensitive:
+                if needle in hay:
+                    return True
+            else:
+                if needle.lower() in hay.lower():
+                    return True
+        return False
+
+    if "equals" in rule:
+        target = str(rule.get("equals", ""))
+        for item in values:
+            item_s = str(item)
+            if case_sensitive:
+                if item_s == target:
+                    return True
+            else:
+                if item_s.lower() == target.lower():
+                    return True
+        return False
+
+    if "regex" in rule:
+        pattern = str(rule.get("regex", ""))
+        if not pattern:
+            return False
+        flags = 0 if case_sensitive else re.IGNORECASE
+        try:
+            rx = re.compile(pattern, flags)
+        except Exception:
+            return False
+        for item in values:
+            if rx.search(str(item)):
+                return True
+        return False
+
+    return False
+
+
+def collect_excluded_domains(payload, record_filter_cfg, scoring_cfg):
+    if not record_filter_cfg.get("enabled", False):
+        return set()
+
+    rules = record_filter_cfg.get("exclude_if_any", [])
+    if not rules:
+        return set()
+
+    records_path = record_filter_cfg.get("records_path", scoring_cfg.get("records_path", "data.good[]"))
+    domain_field = record_filter_cfg.get("domain_field", scoring_cfg.get("ip_field", "ip"))
+
+    blocked = set()
+    for record in get_values_by_path(payload, records_path):
+        if not isinstance(record, dict):
+            continue
+
+        domain_raw = record_field_value(record, domain_field)
+        domain = str(domain_raw or "").strip().lower().rstrip(".")
+        if not domain:
+            continue
+
+        for rule in rules:
+            field_path = str(rule.get("field_path", "")).strip()
+            if not field_path:
+                continue
+            value = record_field_value(record, field_path)
+            if rule_matches(value, rule):
+                blocked.add(domain)
+                break
+
+    return blocked
+
+
 def parse_scored_records(payload, scoring_cfg):
     if not scoring_cfg.get("enabled", False):
         return []
@@ -393,6 +485,11 @@ def main():
         parsed = parse_domains(payload, cfg.get("parser", {}))
         filtered = apply_filter(parsed, cfg.get("domain_filter", {}))
 
+        record_filter_cfg = cfg.get("record_filter", {})
+        blocked_domains = collect_excluded_domains(payload, record_filter_cfg, cfg.get("scoring", {}))
+        if blocked_domains:
+            filtered = [d for d in filtered if d not in blocked_domains]
+
         scored_records = parse_scored_records(payload, cfg.get("scoring", {}))
         scored_records = [r for r in scored_records if r["domain"] in set(filtered)]
         ranked_scored = rank_scored_records(scored_records, cfg.get("scoring", {}))
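
The record-level exclusion added in this commit can be tried out in isolation. The following is a simplified sketch, not the code from `scripts/domain_updater.py`: `rule_hits` and `blocked_domains` are hypothetical stand-ins that only read top-level fields (the real code goes through `get_by_json_path`, `get_values_by_path`, and `flatten_values`, which are not shown in this diff), and the sample payload is invented. The rules are the three from `config.example.json`.

```python
import re


def rule_hits(value, rule):
    """Simplified stand-in for rule_matches(): a list counts as several values."""
    if value is None:
        return False
    values = value if isinstance(value, list) else [value]
    case_sensitive = bool(rule.get("case_sensitive", False))
    fold = (lambda s: s) if case_sensitive else str.lower
    texts = [fold(str(v)) for v in values]
    if "contains" in rule:
        needle = fold(str(rule["contains"]))
        return bool(needle) and any(needle in t for t in texts)
    if "equals" in rule:
        target = fold(str(rule["equals"]))
        return any(t == target for t in texts)
    if "regex" in rule and rule["regex"]:
        # Unlike the original, an invalid pattern raises here instead of returning False.
        rx = re.compile(str(rule["regex"]), 0 if case_sensitive else re.IGNORECASE)
        return any(rx.search(str(v)) for v in values)
    return False


def blocked_domains(records, rules, domain_field="ip"):
    """Collect normalized domains of records that match at least one rule."""
    blocked = set()
    for record in records:
        domain = str(record.get(domain_field) or "").strip().lower().rstrip(".")
        if domain and any(rule_hits(record.get(r.get("field_path")), r) for r in rules):
            blocked.add(domain)
    return blocked


# Invented sample data shaped like `data.good[]`.
records = [
    {"ip": "edge.example.net", "locationCountry": "泛播", "provider": "acme", "tags": ["prod"]},
    {"ip": "cdn.example.org.", "locationCountry": "US", "provider": "Internal", "tags": ["prod"]},
    {"ip": "Alt.Example.com", "locationCountry": "JP", "provider": "acme", "tags": ["edge", "Staging"]},
    {"ip": "ok.example.io", "locationCountry": "DE", "provider": "acme", "tags": ["prod"]},
]
rules = [
    {"field_path": "locationCountry", "contains": "泛播", "case_sensitive": False},
    {"field_path": "provider", "equals": "internal", "case_sensitive": False},
    {"field_path": "tags", "regex": "(test|staging)", "case_sensitive": False},
]

blocked = blocked_domains(records, rules)
filtered = ["edge.example.net", "cdn.example.org", "alt.example.com", "ok.example.io"]
kept = [d for d in filtered if d not in blocked]
print(sorted(blocked), kept)
```

As in `main()`, the blocked set is subtracted from the already domain-filtered list, so scoring only sees domains that survive both filters. Domains are lowercased and stripped of a trailing dot before comparison, which is why `cdn.example.org.` and `Alt.Example.com` still match their filtered forms.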