How to protect an instance
==========================

Searx depends on external search services. To avoid abuse of these services, it is advisable to limit the number of requests processed by searx.

An application firewall, ``filtron``, solves exactly this problem. Information on how to install it can be found at the `project page of filtron <https://github.com/asciimoo/filtron>`__.

Sample configuration of filtron
-------------------------------

An example configuration can be found below. This configuration limits access for

* scripts or applications (roboagent limit)
* web crawlers (botlimit)
* IPs which send too many requests (IP limit)
* clients which send too many json, csv, etc. requests (rss/json limit)
* a single User-Agent which sends too many requests (useragent limit)

.. code:: json

    [{
        "name": "search request",
        "filters": [
            "Param:q",
            "Path=^(/|/search)$"
        ],
        "interval": "<time-interval-in-sec (int)>",
        "limit": "<max-request-number-in-interval (int)>",
        "subrules": [
            {
                "name": "roboagent limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "filters": [
                    "Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "botlimit",
                "limit": 0,
                "stop": true,
                "filters": [
                    "Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "IP limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "stop": true,
                "aggregations": [
                    "Header:X-Forwarded-For"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "rss/json limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "stop": true,
                "filters": [
                    "Param:format=(csv|json|rss)"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            },
            {
                "name": "useragent limit",
                "interval": "<time-interval-in-sec (int)>",
                "limit": "<max-request-number-in-interval (int)>",
                "aggregations": [
                    "Header:User-Agent"
                ],
                "actions": [
                    {
                        "name": "block",
                        "params": {
                            "message": "Rate limit exceeded"
                        }
                    }
                ]
            }
        ]
    }]
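
The sample above uses placeholder strings for ``interval`` and ``limit``, which have to be replaced with integers before the rules are used. The following is a minimal sketch, not part of filtron, of a check that walks a rule list and reports fields that are still placeholders. The function name ``find_unset_limits`` and the numeric values in the sample data are hypothetical and only serve as illustration.

.. code:: python

    import json


    def find_unset_limits(rules, path=""):
        """Return (rule path, field) pairs whose value is not an integer yet."""
        problems = []
        for rule in rules:
            name = f"{path}/{rule.get('name', '<unnamed>')}"
            for field in ("interval", "limit"):
                # Only check fields that are present (e.g. "botlimit" has no interval).
                if field not in rule:
                    continue
                value = rule[field]
                # bool is a subclass of int in Python, so exclude it explicitly.
                if not isinstance(value, int) or isinstance(value, bool):
                    problems.append((name, field))
            # Subrules use the same structure, so recurse into them.
            problems.extend(find_unset_limits(rule.get("subrules", []), name))
        return problems


    # Shortened rule set; the numbers 300 and 60 are made-up example values.
    sample = json.loads("""
    [{
      "name": "search request",
      "interval": "<time-interval-in-sec (int)>",
      "limit": 300,
      "subrules": [
        {"name": "botlimit", "limit": 0, "stop": true},
        {"name": "IP limit", "interval": 60,
         "limit": "<max-request-number-in-interval (int)>"}
      ]
    }]
    """)

    for name, field in find_unset_limits(sample):
        print(f"{name}: '{field}' is still a placeholder")

To check a real file, the sample data could be replaced with ``json.load(open("rules.json"))``, assuming the file follows the structure shown above.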

Route requests through filtron
------------------------------

Filtron can be started using the following command:

.. code:: bash

    $ filtron -rules rules.json

It listens on ``127.0.0.1:4004`` and forwards filtered requests to ``127.0.0.1:8888`` by default.

Use it along with ``nginx`` with the following example configuration.

.. code:: nginx

    location / {
        proxy_set_header   Host             $http_host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header   X-Scheme         $scheme;
        proxy_pass         http://127.0.0.1:4004/;
    }

Requests arrive at filtron on port 4004, are filtered, and are then forwarded to port 8888, where searx is running.
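
Both hops of this chain have to be up for it to work. As a rough sketch, assuming the default addresses mentioned above, the following script checks whether something accepts TCP connections on each port. The helper name ``port_open`` is hypothetical, and the check only shows that a port is open, not that filtron or searx is the process behind it.

.. code:: python

    import socket


    def port_open(host, port, timeout=1.0):
        """Return True if a TCP connection to host:port can be established."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Connection refused, timed out, or host unreachable.
            return False


    if __name__ == "__main__":
        # Default ports from the text above: filtron on 4004, searx on 8888.
        for label, port in (("filtron", 4004), ("searx", 8888)):
            state = "listening" if port_open("127.0.0.1", port) else "not reachable"
            print(f"{label} on 127.0.0.1:{port}: {state}")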