.. _searx_filtron:

==========================
How to protect an instance
==========================

.. sidebar:: further reading

   - :ref:`filtron.sh`

.. contents:: Contents
   :depth: 2
   :local:
   :backlinks: entry

.. _filtron: https://github.com/asciimoo/filtron

Searx depends on external search services.  To avoid abuse of these services,
it is advised to limit the number of requests processed by searx.

An application firewall, filtron_, solves exactly this problem.  Filtron is just
a middleware between your web server (nginx, apache, ...) and searx.  We describe
such infrastructures in chapter :ref:`architecture`.

filtron & go
============

.. _Go: https://golang.org/
.. _filtron README: https://github.com/asciimoo/filtron/blob/master/README.md

Filtron needs Go_ installed.  If Go_ is preinstalled, filtron_ can simply be
installed with ``go get`` (see `filtron README`_).  If you use filtron as
middleware, a more isolated setup is recommended.  To simplify such an
installation and its maintenance, use our script :ref:`filtron.sh`.

Sample configuration of filtron
===============================

.. sidebar:: Tooling box

   - :origin:`/etc/filtron/rules.json <utils/templates/etc/filtron/rules.json>`

An example configuration can be found below.  This configuration limits the
access of:

- scripts or applications (roboagent limit)
- webcrawlers (botlimit)
- IPs which send too many requests (IP limit)
- requests for json, csv or rss output (rss/json limit)
- user agents which send too many requests (useragent limit)

.. code:: json

   [
       { "name": "search request",
         "filters": [
             "Param:q",
             "Path=^(/|/search)$"
         ],
         "interval": "<time-interval-in-sec (int)>",
         "limit": "<max-request-number-in-interval (int)>",
         "subrules": [
             {
                 "name": "roboagent limit",
                 "interval": "<time-interval-in-sec (int)>",
                 "limit": "<max-request-number-in-interval (int)>",
                 "filters": [
                     "Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
                 ],
                 "actions": [
                     { "name": "log"},
                     { "name": "block",
                       "params": {
                           "message": "Rate limit exceeded"
                       }
                     }
                 ]
             },
             {
                 "name": "botlimit",
                 "limit": 0,
                 "stop": true,
                 "filters": [
                     "Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"
                 ],
                 "actions": [
                     { "name": "log"},
                     { "name": "block",
                       "params": {
                           "message": "Rate limit exceeded"
                       }
                     }
                 ]
             },
             {
                 "name": "IP limit",
                 "interval": "<time-interval-in-sec (int)>",
                 "limit": "<max-request-number-in-interval (int)>",
                 "stop": true,
                 "aggregations": [
                     "Header:X-Forwarded-For"
                 ],
                 "actions": [
                     { "name": "log"},
                     { "name": "block",
                       "params": {
                           "message": "Rate limit exceeded"
                       }
                     }
                 ]
             },
             {
                 "name": "rss/json limit",
                 "interval": "<time-interval-in-sec (int)>",
                 "limit": "<max-request-number-in-interval (int)>",
                 "stop": true,
                 "filters": [
                     "Param:format=(csv|json|rss)"
                 ],
                 "actions": [
                     { "name": "log"},
                     { "name": "block",
                       "params": {
                           "message": "Rate limit exceeded"
                       }
                     }
                 ]
             },
             {
                 "name": "useragent limit",
                 "interval": "<time-interval-in-sec (int)>",
                 "limit": "<max-request-number-in-interval (int)>",
                 "aggregations": [
                     "Header:User-Agent"
                 ],
                 "actions": [
                     { "name": "log"},
                     { "name": "block",
                       "params": {
                           "message": "Rate limit exceeded"
                       }
                     }
                 ]
             }
         ]
       }
   ]

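
The rules above combine two ideas: ``filters`` select the requests a rule
applies to, and ``interval`` / ``limit`` cap how many of those requests are
accepted per aggregation value (for example per ``X-Forwarded-For`` address).
The following Python sketch only illustrates that idea and is not filtron's
implementation.  The class ``WindowLimiter``, the helper ``matches_roboagent``,
the fixed-window counting strategy and all numeric values are assumptions made
for this example.

.. code:: python

   import re
   import time


   class WindowLimiter:
       """Fixed-window request counter keyed by an aggregation value.

       Hypothetical illustration; filtron may count requests differently.
       """

       def __init__(self, interval, limit, clock=time.monotonic):
           self.interval = interval  # window length in seconds
           self.limit = limit        # requests accepted per window and key
           self.clock = clock        # injectable time source, eases testing
           self.windows = {}         # key -> (window start, request count)

       def allow(self, key):
           """Count one request for ``key``; False once the limit is exceeded."""
           now = self.clock()
           start, count = self.windows.get(key, (now, 0))
           if now - start >= self.interval:
               start, count = now, 0  # the old window expired, start a new one
           count += 1
           self.windows[key] = (start, count)
           return count <= self.limit


   # Same pattern as the "roboagent limit" filter in the rules above.
   ROBO_AGENTS = re.compile(
       r"(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
   )


   def matches_roboagent(headers):
       """Return True if the User-Agent header matches the roboagent filter."""
       return bool(ROBO_AGENTS.search(headers.get("User-Agent", "")))


   # Assumed values: at most 3 requests per 300 seconds and client address.
   ip_limit = WindowLimiter(interval=300, limit=3)
   print([ip_limit.allow("203.0.113.7") for _ in range(5)])
   # [True, True, True, False, False]

In this sketch a ``limit`` of ``0``, as used by the ``botlimit`` rule, rejects
every matching request, because the first counted request already exceeds the
limit.
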
Route requests through filtron
==============================

Filtron can be started using the following command:

.. code:: sh

   $ filtron -rules rules.json

It listens on ``127.0.0.1:4004`` and forwards filtered requests to
``127.0.0.1:8888`` by default.

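
Before putting a web server in front of filtron, it can help to confirm that
both default addresses mentioned above accept TCP connections.  The following
is a minimal sketch, assuming the default addresses are unchanged; the helper
name ``is_listening`` is hypothetical and not part of filtron or searx.

.. code:: python

   import socket


   def is_listening(host, port, timeout=1.0):
       """Return True if a TCP connection to host:port can be established."""
       try:
           with socket.create_connection((host, port), timeout=timeout):
               return True
       except OSError:
           # Covers refused connections, timeouts and unreachable hosts.
           return False


   if __name__ == "__main__":
       # Default addresses named in the text above.
       for name, port in (("filtron", 4004), ("searx", 8888)):
           state = "up" if is_listening("127.0.0.1", port) else "DOWN"
           print(f"{name:8s} 127.0.0.1:{port} {state}")

A successful connection only shows that something is listening on the port; it
does not verify that the listener is actually filtron or searx.
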
Use it together with ``nginx``, for example with the following configuration:

.. code:: nginx

   location / {
       proxy_set_header Host $http_host;
       proxy_set_header X-Real-IP $remote_addr;
       proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
       proxy_set_header X-Scheme $scheme;
       proxy_pass http://127.0.0.1:4004/;
   }

Requests arrive at port 4004, pass through filtron, and are then forwarded to
port 8888, where searx is running.