  1. ==========================
  2. How to protect an instance
  3. ==========================
  4. Searx depens on external search services. To avoid the abuse of these services
  5. it is advised to limit the number of requests processed by searx.
  6. An application firewall, ``filtron`` solves exactly this problem. Information
  7. on how to install it can be found at the `project page of filtron
  8. <https://github.com/asciimoo/filtron>`__.

Sample configuration of filtron
===============================

An example configuration is shown below. It limits access by:

- scripts or applications (roboagent limit)
- web crawlers (botlimit)
- IPs that send too many requests (IP limit)
- too many json, csv, etc. requests (rss/json limit)
- clients sharing the same User-Agent, if together they send too many requests
  (useragent limit)

.. code:: json

    [{
        "name": "search request",
        "filters": [
            "Param:q",
            "Path=^(/|/search)$"
        ],
        "interval": "<time-interval-in-sec (int)>",
        "limit": "<max-request-number-in-interval (int)>",
        "subrules": [
            {
                "name": "roboagent limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "filters": [
                    "Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "botlimit",
                "limit": 0,
                "stop": true,
                "filters": [
                    "Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "IP limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "stop": true,
                "aggregations": [
                    "Header:X-Forwarded-For"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "rss/json limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "stop": true,
                "filters": [
                    "Param:format=(csv|json|rss)"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "useragent limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "aggregations": [
                    "Header:User-Agent"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            }
        ]
    }]
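
To make the interplay of ``filters``, ``aggregations``, ``interval`` and
``limit`` more concrete, the following Python sketch models a single sub-rule
as a fixed-window counter. It is an illustration only and not filtron's
implementation: the class ``SubRule``, its methods and the numeric values are
hypothetical, and it assumes that a filter is an unanchored regular-expression
search and that all counters are reset once per interval.

.. code:: python

    import re
    import time


    class SubRule:
        """Hypothetical model of one rule: count matching requests per window."""

        def __init__(self, name, interval, limit, ua_filter=None, aggregate_by=None):
            self.name = name
            self.interval = interval          # window length in seconds
            self.limit = limit                # allowed requests per window
            self.ua_filter = re.compile(ua_filter) if ua_filter else None
            self.aggregate_by = aggregate_by  # header name used as counter key
            self.window_start = None
            self.counters = {}

        def matches(self, headers):
            # Assumption: a filter is an unanchored regex search on the header.
            if self.ua_filter is None:
                return True
            return bool(self.ua_filter.search(headers.get("User-Agent", "")))

        def exceeded(self, headers, now=None):
            """Return True if this request pushes its counter above the limit."""
            now = time.monotonic() if now is None else now
            if self.window_start is None or now - self.window_start >= self.interval:
                self.window_start = now       # new window: forget old counts
                self.counters = {}
            if not self.matches(headers):
                return False
            key = headers.get(self.aggregate_by, "") if self.aggregate_by else ""
            self.counters[key] = self.counters.get(key, 0) + 1
            return self.counters[key] > self.limit


    # Example values (60 s window, 2 requests) are made up for this sketch.
    ip_limit = SubRule("IP limit", interval=60, limit=2,
                       aggregate_by="X-Forwarded-For")
    client_a = {"X-Forwarded-For": "203.0.113.7"}
    client_b = {"X-Forwarded-For": "203.0.113.8"}
    results = [ip_limit.exceeded(headers, now=t)
               for t, headers in [(0, client_a), (1, client_a), (2, client_a),
                                  (3, client_b), (61, client_a)]]
    print(results)

In this model the third request from the first client exceeds the limit, the
second client is counted separately because of the aggregation key, and the
first client is allowed again once a new interval has started. A rule with
``limit`` set to ``0``, like ``botlimit`` above, rejects every matching
request.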

Route requests through filtron
==============================

Filtron can be started with the following command:

.. code:: sh

    $ filtron -rules rules.json

By default, it listens on ``127.0.0.1:4004`` and forwards filtered requests to
``127.0.0.1:8888``.
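
Before putting nginx in front, it can help to confirm that both ends of the
chain accept TCP connections. The sketch below is a hypothetical helper
(``is_listening`` is not part of filtron or searx) that uses only the Python
standard library and assumes the default addresses mentioned above.

.. code:: python

    import socket


    def is_listening(host, port, timeout=1.0):
        """Return True if a TCP connection to host:port can be established."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False


    if __name__ == "__main__":
        # Default addresses from this section; adjust them if filtron or searx
        # run with different settings.
        for label, port in [("filtron", 4004), ("searx", 8888)]:
            state = "up" if is_listening("127.0.0.1", port) else "down"
            print(f"{label} on 127.0.0.1:{port}: {state}")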

Use it together with ``nginx``, for example with the following configuration:

.. code:: nginx

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Scheme $scheme;
        proxy_pass http://127.0.0.1:4004/;
    }

Requests arriving at nginx are passed to filtron on port 4004. Filtron applies
its rules and forwards the remaining requests to port 8888, where searx is
running.
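
To see the rules in action, one can send a search request with a User-Agent
that the ``roboagent limit`` filter matches and inspect the response. The
following sketch uses only the Python standard library; the helper names
``build_search_request`` and ``probe`` and the base URL ``http://localhost``
are assumptions made for illustration. How a blocked request is reported
depends on filtron, so the script only prints what it receives.

.. code:: python

    import urllib.error
    import urllib.parse
    import urllib.request


    def build_search_request(base_url, query, user_agent):
        """Build a GET request for /search?q=<query> with a custom User-Agent."""
        params = urllib.parse.urlencode({"q": query})
        url = base_url.rstrip("/") + "/search?" + params
        return urllib.request.Request(url, headers={"User-Agent": user_agent})


    def probe(request, timeout=5.0):
        """Send the request and return (status, body) without raising on 4xx/5xx."""
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status, response.read().decode("utf-8", "replace")
        except urllib.error.HTTPError as err:
            return err.code, err.read().decode("utf-8", "replace")


    if __name__ == "__main__":
        # Assumption: the nginx example above is reachable at http://localhost.
        req = build_search_request("http://localhost", "test", "curl/8.0")
        try:
            status, body = probe(req)
            print(status, body[:80])
        except OSError as err:
            print("could not reach the instance:", err)

Repeating the call with a browser-like User-Agent gives a baseline for
comparison, since such a request is not matched by the ``roboagent limit``
filter.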