@Fox Thank you very much. I can jump to the next video now.
Does this website really have built-in protection against scraping?
-
The website is https://www.proxyrotator.com/free-proxy-list/
After months of mastering BAS (and learning JavaScript, Node, regex, XPath, etc.), I had become confident there was no website I couldn't extract information from... until I tried the one above.
If anyone knows or has an idea how to extract the proxies from it, please share.
I'm really curious what solutions we have in BAS for this (ideally other than the obvious one: taking a screenshot and running OCR on it).
-
@hungrym What exactly is the problem? I looked at this site, and in my opinion it is simple: a little inconvenient to parse, but in general not a problem.
-
@usertrue How exactly? Did you check the source code? You can't even copy and paste a proxy (just try it in a normal browser), so how would you write a BAS script to parse it? I mean: which XPath, CSS selector, or regex would you use to extract the full proxy and add it to a list in BAS?
-
@hungrym Yes, I looked at the page code; it has everything you need.
-
@hungrym I wrote a JS snippet that runs in the browser and collects the data. However, the port comes as a base64-encoded image, so you'll have to work out that part yourself. There are image-recognition modules for Node.js, but I don't have time for that.
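```js
{
  // Run in the page context (e.g. the browser DevTools console).
  const proxy = [];
  // Data rows are the <tbody> rows that have no class attribute
  const rows = Array.from(document.querySelectorAll('tbody tr:not([class])'));
  rows.forEach(row => {
    // Keep only the IP-cell children that are actually rendered on top at
    // their own top-left corner, which filters out hidden or covered elements.
    // elementFromPoint only hit-tests inside the viewport, so rows scrolled
    // out of view may yield an empty IP.
    const ip = Array.from(row.querySelectorAll('td:nth-of-type(2) > *'))
      .filter(el => {
        const xy = el.getBoundingClientRect();
        return el === document.elementFromPoint(xy.x, xy.y);
      })
      .map(el => el.textContent)
      .slice(0, -1) // drop the last visible element
      .join('');
    // The port is an <img> with a base64 data: URL; keep only the payload
    // (everything after the first comma) for later image recognition.
    const src = row.querySelector('td:nth-of-type(3) > img').src;
    const port = src.slice(src.indexOf(',') + 1);
    const loc = row.querySelector('td:nth-of-type(4)').textContent.trim();
    const type = row.querySelector('td:nth-of-type(6)').textContent;
    proxy.push({ ip, port, type, loc });
  });
  // Completion value of the block: the collected data as a JSON string
  JSON.stringify(proxy);
}
```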
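The snippet above leaves each port as a base64 image payload. As a hypothetical next step (not something posted in the thread), the JSON string it returns could be decoded in Node.js into one image file per proxy, ready to hand to whatever OCR module you pick. This is a minimal sketch, assuming the `port` field holds standard base64 (optionally still carrying a `data:...;base64,` prefix) and that the images are PNGs; `decodePortImage`, `savePortImages` and the output directory are names made up for illustration.

```javascript
// Hypothetical Node.js post-processing for the JSON produced in the browser.
const fs = require('fs');
const path = require('path');

// Decode a port image's base64 payload into raw bytes.
// Also strips a "data:...;base64," prefix if the full src was kept
// (base64 itself never contains a comma).
function decodePortImage(port) {
  const payload = port.includes(',') ? port.slice(port.indexOf(',') + 1) : port;
  return Buffer.from(payload, 'base64');
}

// Write one image file per proxy so an OCR tool can read the port later.
// Assumes PNG images; the index prefix keeps file names unique.
function savePortImages(json, outDir) {
  const proxies = JSON.parse(json);
  fs.mkdirSync(outDir, { recursive: true });
  return proxies.map(({ ip, port, type, loc }, i) => {
    const safeIp = String(ip).replace(/[^0-9.]/g, '_');
    const file = path.join(outDir, `${i}_${safeIp}.png`);
    fs.writeFileSync(file, decodePortImage(port));
    return { ip, type, loc, file };
  });
}

module.exports = { decodePortImage, savePortImages };
```

For example, `savePortImages(jsonFromBrowser, './ports')` returns one `{ ip, type, loc, file }` entry per row; reading the actual port numbers from those files is still left to an OCR step, which this sketch does not attempt.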