Best method(s) to minimize/compact traffic consumption of project



  • Hi there,

    As I'm using more and more proxies that are billed per gigabyte, I'm wondering: what is the best approach to achieve minimal traffic consumption in a BAS project?

    I assume the least traffic is used by a bot based on pure HTTP requests, but that requires a lot of technical knowledge, and I think it can easily be recognized by the sites being automated since there is no mouse movement, no clicks and so on.

    Also, for a browser-based project, I read something about pre-loading and caching all the needed elements of the website and re-using these cached elements again and again for creating different accounts (maybe also across different streams of the same project). That way most of the traffic is loaded only once, cached, and the cached elements are re-used later. But I don't know how exactly to handle this approach in BAS, or how to avoid possible anonymity leaks through the cached elements.
    Would be great if someone could help and point me in the right direction.

    And what other solutions are there to minimize traffic usage while staying under the radar of the sites' anti-bot mechanisms?

    Thanks for your help and for sharing your experience :-)





    Thank you for the links. But I meant the traffic consumption side rather than pure CPU/RAM usage.

    E.g. I use Luminati proxies and they charge about $15/GB. So if I create, say, 1k GMX accounts, it would cost me about $45 if I load all the elements of their sign-up page and cache nothing (not to mention the shitty ads and "news" on their main site if you want to make it look more legit). I already tried disabling all images and unnecessary scripts on the site, but the traffic per account still stays at about 1 MB.

    I think that if I can cache all the elements that stay the same on every sign-up (but avoid the fingerprinting/tracking ones that need to be loaded fresh every time), I can reduce the cost to about 25%. But I don't know how to correctly cache all the needed things and reload the other ones in BAS...

    Maybe there is software out there that works like AdGuard (an intercepting "proxy" server that filters advertisements out of your browsing traffic), acting as an intermediary that caches most of the elements of specified websites and lets you (re)load only the things that really need to be fetched again, saving most of the bandwidth normally required. I read something about Squid, but it seems too complicated to configure, so an easy-to-set-up solution would be preferred.
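
    From what I've read, a minimal Squid caching setup along these lines would look roughly like the sketch below. The listening port, paths and sizes are just assumptions, and as far as I understand HTTPS traffic can't be cached at all without SSL bumping, which adds a lot of complexity (and possibly detection surface):

    # Listen locally so the browser/BAS can use 127.0.0.1:3128 as its proxy
    http_port 3128
    
    # On-disk cache; path and size (1 GB here) are placeholders
    cache_dir ufs /var/spool/squid 1024 16 256
    maximum_object_size 8 MB
    
    # Cache static assets aggressively, even when the origin discourages it
    # (overriding cache headers may itself be noticeable, so use with care)
    refresh_pattern -i \.(jpg|jpeg|png|gif|ico|css|js|woff2?)$ 1440 80% 10080 override-expire ignore-no-store ignore-private
    refresh_pattern . 0 20% 4320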

    Thank you guys!



  • The only thing I know is to use Mask Deny and have it ignore unnecessary files, things like *.jpg, *.png, etc. If you can get away with it, ignoring *.js should save some GB but might break the site. Also search the source code of the site and deny any ad network domains (Google AdSense etc.), since these also eat away bandwidth.

    Also use cookies for logging in, if possible.

    If anyone else has some tips I'd love to hear them, because I'm also trying to get my bandwidth consumption down.



  • @spockthe40oz Thank you for sharing your thoughts. I already blocked pictures and other things, but too much blocking raises red flags and gets accounts banned on most sites.

    I'm currently playing around with a local Squid caching proxy. In general it works well and reduces data consumption a lot, but there seem to be some problems with the way BAS sets up proxy connections. I'll have to ask about that in another thread.



  • @morpheus93

    I'm currently playing around with a local Squid caching proxy. In general it works well and reduces data consumption a lot, but there seem to be some problems with the way BAS sets up proxy connections. I'll have to ask about that in another thread.

    I have the same question, so I'm watching this thread.
    When you use Squid, what about "staying under the radar of the sites' anti-bot mechanisms"? Does the fingerprint still work correctly?



  • @morpheus93 How have you set up the local Squid proxy? I mean, you also use another proxy, so you set up Squid locally and then route through it to Luminati? Can you please explain? I don't need to save traffic, I need faster page loads and don't need to cache much data. I also wonder how you restricted images and how that affects the fingerprint.



  • Same question, did you find a solution? I need to load pages from cache.



  • Has anyone found a solution they want to share here? It would be interesting to discuss the pros and cons of the different approaches.



  • I'm wondering whether sharing cache/cached files between different accounts (profiles) lowers anonymity (e.g. whether the target website can link accounts together because they share some cached pictures or CSS/JS files)?

    What are your recommendations/experiences, guys?



  • @morpheus93, @NotWegant

    I believe there is a possible way to proceed:

    • Launch your browser without a proxy or with a cheap proxy.
    • Perform some browsing tasks to fill up the profile cache.
    • Clear all cookies.
    • Save your browser profile.
    • Copy the saved profile to the new browser profile and run your production script.

    Additionally, you may need to deal with local storage, depending on the nature of your production script. Make sure to factor this into your process as needed.



  • @morpheus93, @NotWegant

    In my experience, to save on traffic, I implemented the following solution:

    • Installed my own proxy software on a server
    • Configured the scripts to use the server's address
    • Redirected requests to either an expensive or cheap proxy on the server depending on the website's address.

    I maintained my own list of websites for the cheap proxy. If a request was made to one of these websites, it was redirected to the cheap proxy. Otherwise, it was redirected to an expensive one. For example, if the request was for static files (CDN), I redirected it to the cheaper proxy.

    I used Squid software for this purpose and wrote a script to maintain the required configuration files. This approach helped me optimize traffic and reduce costs for the websites I managed.

    In addition, to further reduce traffic, I also banned some websites like google-analytics.com, hexagon-analytics.com, and many others instead of redirecting them to the cheaper proxy.
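
    In Squid terms, that blocking part is just a couple of lines. A sketch (the domain list is only an example; extend it with whatever trackers show up in your own traffic, and keep the deny rule above your http_access allow rules):

    acl blocked_trackers dstdomain .google-analytics.com .hexagon-analytics.com
    http_access deny blocked_trackers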

    By using a cheaper proxy for requests to static files and maintaining a list of websites that could use the cheaper proxy, I was able to significantly reduce the cost of mobile traffic for that project.

    However, I'm not sure that Squid is the best solution for this task. In the future, I plan to try HAProxy, because I think redirecting TCP connections may offer better performance and could improve things even further.



  • @sergerdn Wow, thank you very much for sharing your experiences and many helpful hints.

    I'm currently working on an approach quite similar to the one you described in your post above (collecting cache data with a cheap proxy and then switching to the more expensive one).

    I hope this works as expected and doesn't open up new anti-bot detection vectors.

    Your approach of using a centralized redirection server sounds very sophisticated. I will also look into this and into how it can be implemented.

    Thank you!



  • @morpheus93 said in Best method(s) to minimize/compact traffic consumption of project:

    Your approach of using a centralized redirection server sounds very sophisticated.

    Thank you for your feedback on my previous approach. I agree that using a man-in-the-middle proxy server can be a more efficient solution. By centralizing the redirection logic in a proxy server, you can avoid making changes to each individual BAS script and instead make updates only to the proxy server configuration. This can save time and effort in managing and maintaining the redirection process.

    In addition, a proxy server can provide extra features such as filtering. For example, you may want to block network requests to certain domains, which helps with both traffic costs and security.

    But of course, you can implement both: a man-in-the-middle proxy and profiles prepared with cheap proxies to fill up the local cache. This can help improve performance and further optimize your traffic flow.

    Overall, a man-in-the-middle proxy server combined with a cheap-proxy cache can be a powerful solution for managing and optimizing traffic.



  • @sergerdn Thank you for sharing. Can I get a copy of your config for reference? Thank you very much



  • @Xxxxxx said in Best method(s) to minimize/compact traffic consumption of project:

    @sergerdn Thank you for sharing. Can I get a copy of your config for reference? Thank you very much

    I'm not sure if it will be of much help to you. Please note that I only copied and pasted a small part of the project, and you'll need some more information to make it work.

    I apologize if some of the comments are incorrect, as I no longer remember exactly which lines they referred to.

    Here is a snippet:

    squid.conf:

    # This ACL is used to exempt requests to any backup proxies from the other ACL rules.
    # I do not remember why I did it.
    acl no_backup_proxy_acl dstdomain .google.com
    
    # These ACLs are used to match requests to specific domains that are associated with the backup proxy.
    # For example, the following ACL matches requests that are destined for 'api64.ipify.org'.
    acl proxy_backup_domain_acl dstdomain api64.ipify.org
    acl proxy_backup_domain_acl dstdomain .mradx.net
    acl proxy_backup_domain_acl dstdomain .yastatic.net
    acl proxy_backup_domain_acl dstdomain .yandex.net
    
    # This line sets up a port for Squid to listen on.
    http_port 10.66.66.5:14199 name=port_14199 tcpkeepalive=60,30,3
    
    # This ACL is used to match requests that are destined for the port set up above.
    # For example, the following ACL matches requests that are destined for port 14199.
    acl port_14199_acl myportname port_14199
    
    # This line forces requests matching the above ACL to always go through a parent proxy (never directly).
    never_direct allow port_14199_acl
    
    # This line sets up a cache peer with the name 'proxy14199'.
    cache_peer proxy_ip_of_proxy_provider_1 parent 9599 0 connect-fail-limit=100 connect-timeout=10 no-tproxy no-query proxy-only no-digest no-netdb-exchange name=proxy14199 login=my_login_of_proxy_provider_1
    
    # This line sets up another cache peer with the name 'proxy14199_backup'.
    cache_peer proxy_ip_of_proxy_provider_2 parent 22225 0 connect-fail-limit=100 connect-timeout=10 no-tproxy no-query proxy-only no-digest no-netdb-exchange name=proxy14199_backup login=my_login_of_proxy_provider_2
    
    
    # This rule allows traffic that matches the specified ACLs to access the cache peer named 'proxy14199'.
    cache_peer_access proxy14199 allow port_14199_acl no_backup_proxy_acl 
    
    # This rule denies traffic that matches the specified ACLs from accessing the cache peer named 'proxy14199'.
    cache_peer_access proxy14199 deny port_14199_acl proxy_backup_domain_acl
    cache_peer_access proxy14199 allow port_14199_acl
    cache_peer_access proxy14199 deny all
    
    # This rule denies traffic that does not match the specified ACLs from accessing the cache peer named 'proxy14199_backup'.
    cache_peer_access proxy14199_backup deny !port_14199_acl
    cache_peer_access proxy14199_backup deny !proxy_backup_domain_acl
    
    # This rule allows traffic that matches the specified ACLs to access the cache peer named 'proxy14199_backup'.
    cache_peer_access proxy14199_backup allow port_14199_acl proxy_backup_domain_acl
    
    # This rule denies all other traffic from accessing the cache peer named 'proxy14199_backup'.
    # The 'deny all' directive at the end of this block sets the default behavior for requests that do not match the above rules.
    cache_peer_access proxy14199_backup deny all
    


  • @sergerdn Thanks!!! It helped me avoid a lot of trouble!



  • @Xxxxxx said in Best method(s) to minimize/compact traffic consumption of project:

    @sergerdn Thanks!!! It helped me avoid a lot of trouble!

    You are welcome. I understand that using Squid ACL rules can be a challenge for everyone.



  • @sergerdn Hello, I'm now trying to use the way you recommended (Squid), but I've run into an issue with forwarding HTTPS requests to a specific proxy. I wrote a config, but it can only forward HTTP requests. I'd be glad to receive your reply.



  • @Xxxxxx said in Best method(s) to minimize/compact traffic consumption of project:

    @sergerdn Hello, I'm now trying to use the way you recommended (Squid), but I've run into an issue with forwarding HTTPS requests to a specific proxy. I wrote a config, but it can only forward HTTP requests. I'd be glad to receive your reply.

    Squid is overkill for many tasks because its ACL system is not easy for everyone. The first time I had such a task, I spent about a week getting it to work.

    And please note that I don't know Squid very well and I'm not a Linux system administrator. I'm just an ordinary user.
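
    That said, when only plain HTTP goes through the parent, the usual suspects are roughly the ones below. This is just a sketch of what I would check; the ACL names, the network range and the peer line are placeholders, not a drop-in fix:

    # CONNECT (HTTPS) requests must be allowed by http_access before a peer is selected.
    acl SSL_ports port 443
    acl CONNECT method CONNECT
    acl localnet src 10.0.0.0/8          # whatever range your BAS machines use
    http_access deny CONNECT !SSL_ports
    http_access allow localnet
    http_access deny all
    
    # Without never_direct, Squid may try to serve CONNECT requests directly
    # instead of handing them to the cache_peer.
    never_direct allow all
    
    # Finally, the parent itself must accept CONNECT; an HTTP-only upstream
    # will refuse it, which looks like "only HTTP works" from the client side.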

