
[feat] engine ChinaSo: support source filter for ChinaSo-News

* filter ChinaSo-News results by source (option ``chinaso_news_source``)
* add ChinaSo engine to the online docs https://docs.searxng.org/dev/engines/online/chinaso.html
* fix SearXNG categories in the settings.yml
* deactivate the ChinaSo engines (``inactive: true``) until [1] is fixed
* configure the network of the ChinaSo engines

[1] https://github.com/searxng/searxng/issues/4694

Signed-off-by: @BrandonStudio
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
BrandonStudio 3 days ago
commit d47cf9db24

+ 2 - 1
docs/admin/settings/settings_engines.rst

@@ -148,6 +148,8 @@ engine is shown.  Most of the options have a default value or even are optional.
 ``display_error_messages`` : default ``true``
   When an engine returns an error, the message is displayed on the user interface.
 
+.. _engine network:
+
 ``network`` : optional
   Use the network configuration from another engine.
   In addition, there are two default networks:
@@ -257,4 +259,3 @@ Example configuration in settings.yml for a German and English speaker:
 
 When searching, the default google engine will return German results and
 "google english" will return English results.
-

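The ``network`` option lets one engine reuse the network configuration of another engine. The sketch below shows one way such a reference could be resolved, assuming engine entries are plain dicts parsed from settings.yml. ``resolve_network`` and its ``default_networks`` parameter are hypothetical and are not SearXNG's API. The sketch also follows only one level of reference.

```python
# Hypothetical sketch, not SearXNG's implementation: resolve which network
# each engine uses, given engine entries parsed from settings.yml as dicts.

ENGINES = [
    {"name": "chinaso news", "engine": "chinaso"},
    {"name": "chinaso images", "engine": "chinaso", "network": "chinaso news"},
    {"name": "chinaso videos", "engine": "chinaso", "network": "chinaso news"},
]


def resolve_network(engines, default_networks=()):
    """Map each engine name to the name of the network it uses.

    An engine without a ``network`` key gets its own network (keyed by its
    name); otherwise ``network`` must name another engine or one of the
    ``default_networks`` supplied by the caller.
    """
    known = {e["name"] for e in engines} | set(default_networks)
    resolved = {}
    for entry in engines:
        network = entry.get("network", entry["name"])
        if network not in known:
            raise ValueError(f"{entry['name']!r}: unknown network {network!r}")
        resolved[entry["name"]] = network
    return resolved


if __name__ == "__main__":
    for name, network in resolve_network(ENGINES).items():
        print(f"{name} -> {network}")
```

When run as a script, it prints ``chinaso news -> chinaso news``, ``chinaso images -> chinaso news`` and ``chinaso videos -> chinaso news``. These lines match the ``network: chinaso news`` entries in the settings.yml hunk.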
+ 8 - 0
docs/dev/engines/online/chinaso.rst

@@ -0,0 +1,8 @@
+.. _chinaso engine:
+
+=======
+ChinaSo
+=======
+
+.. automodule:: searx.engines.chinaso
+   :members:

+ 83 - 3
searx/engines/chinaso.py

@@ -1,5 +1,60 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-"""ChinaSo: A search engine from ChinaSo."""
+"""ChinaSo_, a search engine for the Chinese language area.
+
+.. attention::
+
+   The ChinaSo engines do not return the real URLs; the links from these
+   search engines violate the privacy of the users!
+
+   We are trying to find a solution to this problem; please follow `issue #4694`_.
+
+   As long as the problem has not been resolved, these engines are
+   not active in a standard setup (``inactive: true``).
+
+.. _ChinaSo: https://www.chinaso.com/
+.. _issue #4694: https://github.com/searxng/searxng/issues/4694
+
+Configuration
+=============
+
+The engine has the following additional settings:
+
+- :py:obj:`chinaso_category` (:py:obj:`ChinasoCategoryType`)
+- :py:obj:`chinaso_news_source` (:py:obj:`ChinasoNewsSourceType`)
+
+In the example below, all three ChinaSo engines use the :ref:`network
+<engine network>` from the ``chinaso news`` engine.
+
+.. code:: yaml
+
+   - name: chinaso news
+     engine: chinaso
+     shortcut: chinaso
+     categories: [news]
+     chinaso_category: news
+     chinaso_news_source: all
+
+   - name: chinaso images
+     engine: chinaso
+     network: chinaso news
+     shortcut: chinasoi
+     categories: [images]
+     chinaso_category: images
+
+   - name: chinaso videos
+     engine: chinaso
+     network: chinaso news
+     shortcut: chinasov
+     categories: [videos]
+     chinaso_category: videos
+
+
+Implementations
+===============
+
+"""
+
+import typing
 
 from urllib.parse import urlencode
 from datetime import datetime
@@ -20,13 +75,31 @@ paging = True
 time_range_support = True
 results_per_page = 10
 categories = []
-chinaso_category = 'news'
+
+ChinasoCategoryType = typing.Literal['news', 'videos', 'images']
 """ChinaSo supports news, video, and image search.
 
 - ``news``: search for news
 - ``videos``: search for videos
 - ``images``: search for images
+
+In the category ``news`` you can additionally filter by option
+:py:obj:`chinaso_news_source`.
+"""
+chinaso_category = 'news'
+"""Configure ChinaSo category (:py:obj:`ChinasoCategoryType`)."""
+
+ChinasoNewsSourceType = typing.Literal['CENTRAL', 'LOCAL', 'BUSINESS', 'EPAPER', 'all']
+"""Filtering ChinaSo-News results by source:
+
+- ``CENTRAL``: central publication
+- ``LOCAL``: local publication
+- ``BUSINESS``: business publication
+- ``EPAPER``: E-Paper
+- ``all``: all sources
 """
+chinaso_news_source: ChinasoNewsSourceType = 'all'
+"""Configure the ChinaSo-News source filter (:py:obj:`ChinasoNewsSourceType`)."""
 
 time_range_dict = {'day': '24h', 'week': '1w', 'month': '1m', 'year': '1y'}
 
@@ -35,7 +108,9 @@ base_url = "https://www.chinaso.com"
 
 def init(_):
     if chinaso_category not in ('news', 'videos', 'images'):
-        raise SearxEngineAPIException(f"Unsupported category: {chinaso_category}")
+        raise ValueError(f"Unsupported category: {chinaso_category}")
+    if chinaso_category == 'news' and chinaso_news_source not in typing.get_args(ChinasoNewsSourceType):
+        raise ValueError(f"Unsupported news source: {chinaso_news_source}")
 
 
 def request(query, params):
@@ -56,6 +131,11 @@ def request(query, params):
             'params': {'start_index': (params["pageno"] - 1) * results_per_page, 'rn': results_per_page},
         },
     }
+    if chinaso_news_source != 'all':
+        if chinaso_news_source == 'EPAPER':
+            category_config['news']['params']["type"] = 'EPAPER'
+        else:
+            category_config['news']['params']["cate"] = chinaso_news_source
 
     query_params.update(category_config[chinaso_category]['params'])
 

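The new branch in ``request()`` maps ``chinaso_news_source`` onto query parameters. ``EPAPER`` is sent as ``type``, the other sources are sent as ``cate``, and ``all`` adds nothing. The standalone sketch below reproduces that mapping and applies the same ``typing.get_args`` check that ``init()`` uses. ``news_source_params`` is a hypothetical helper. The ``q`` key is a placeholder and may not be the parameter ChinaSo actually expects for the search term.

```python
import typing
from urllib.parse import urlencode

# Copied from the Literal type this commit adds to searx/engines/chinaso.py.
ChinasoNewsSourceType = typing.Literal['CENTRAL', 'LOCAL', 'BUSINESS', 'EPAPER', 'all']


def news_source_params(source):
    """Return the extra ChinaSo-News query parameters for ``source``.

    Hypothetical helper mirroring the branch added to request(): ``all``
    adds nothing, ``EPAPER`` is sent as ``type``, any other source as ``cate``.
    """
    # Same membership check as init(): only values listed in the Literal pass.
    if source not in typing.get_args(ChinasoNewsSourceType):
        raise ValueError(f"Unsupported news source: {source}")
    if source == 'all':
        return {}
    if source == 'EPAPER':
        return {'type': 'EPAPER'}
    return {'cate': source}


if __name__ == "__main__":
    # Only the query string is built here; 'q' is a placeholder key.
    print(urlencode({'q': 'test', **news_source_params('LOCAL')}))  # q=test&cate=LOCAL
```

The helper returns a fresh dict on every call, so it never mutates shared state. In the diff, ``category_config`` appears to be rebuilt inside ``request()`` on each call, so updating ``category_config['news']['params']`` in place there looks safe as well.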
+ 17 - 3
searx/settings.yml

@@ -619,23 +619,37 @@ engines:
     # to show premium or plus results too:
     # skip_premium: false
 
+  # WARNING: links from chinaso.com violate users' privacy
+  # Before activating these engines, it is mandatory to read
+  # - https://github.com/searxng/searxng/issues/4694
+  # - https://docs.searxng.org/dev/engines/online/chinaso.html
+
   - name: chinaso news
-    chinaso_category: news
     engine: chinaso
     shortcut: chinaso
+    categories: [news]
+    chinaso_category: news
+    chinaso_news_source: all
     disabled: true
+    inactive: true
 
   - name: chinaso images
-    chinaso_category: images
     engine: chinaso
+    network: chinaso news
     shortcut: chinasoi
+    categories: [images]
+    chinaso_category: images
     disabled: true
+    inactive: true
 
   - name: chinaso videos
-    chinaso_category: videos
     engine: chinaso
+    network: chinaso news
     shortcut: chinasov
+    categories: [videos]
+    chinaso_category: videos
     disabled: true
+    inactive: true
 
   - name: cloudflareai
     engine: cloudflareai
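All three ChinaSo entries ship with ``inactive: true``, and each declares both ``categories`` and ``chinaso_category``. A small pre-deployment check can therefore catch an entry that was re-activated or placed in the wrong category. The sketch below is a hypothetical lint over entries parsed as dicts. The rule that ``categories`` must contain the ``chinaso_category`` value is only the convention visible in this hunk. As far as this commit shows, SearXNG does not enforce that rule.

```python
import typing

# Copied from the Literal type this commit adds to searx/engines/chinaso.py.
ChinasoCategoryType = typing.Literal['news', 'videos', 'images']

# The three ChinaSo entries from the settings.yml hunk, written as parsed dicts.
CHINASO_ENGINES = [
    {"name": "chinaso news", "engine": "chinaso", "categories": ["news"],
     "chinaso_category": "news", "chinaso_news_source": "all",
     "disabled": True, "inactive": True},
    {"name": "chinaso images", "engine": "chinaso", "network": "chinaso news",
     "categories": ["images"], "chinaso_category": "images",
     "disabled": True, "inactive": True},
    {"name": "chinaso videos", "engine": "chinaso", "network": "chinaso news",
     "categories": ["videos"], "chinaso_category": "videos",
     "disabled": True, "inactive": True},
]


def check_chinaso_entry(entry):
    """Return a list of problems found in one ChinaSo engine entry (hypothetical lint)."""
    problems = []
    # chinaso.py defaults to 'news' when chinaso_category is not set.
    category = entry.get("chinaso_category", "news")
    if category not in typing.get_args(ChinasoCategoryType):
        problems.append(f"unsupported chinaso_category {category!r}")
    elif category not in entry.get("categories", []):
        # Convention taken from this diff, not a rule enforced by SearXNG.
        problems.append(f"categories does not include {category!r}")
    if not entry.get("inactive", False):
        problems.append("missing 'inactive: true' (see issue #4694)")
    return problems


if __name__ == "__main__":
    for entry in CHINASO_ENGINES:
        print(entry["name"], check_chinaso_entry(entry) or "ok")
```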