KarmaCoverage Posted October 29, 2015
When did this take place? I assume all the content over there is gone? I guess the redirect is a good thing.
karlos Posted October 29, 2015
It started redirecting yesterday, I think. We might be able to see the old content on the web archive site.
KarmaCoverage (Author) Posted October 29, 2015
web archive site?
Phintech Posted October 29, 2015
You can still use Google's cache; just substitute the URL you want into the query string below. Not sure how long this will last with the redirect in place, though.
General Discussion: http://webcache.googleusercontent.com/search?q=cache:https://xrptalk.org/forum/7-general-discussion/
Topic example: http://webcache.googleusercontent.com/search?q=cache:https://xrptalk.org/topic/6306-ripple-labs-responded-to-bitstamp-lawsuit-jed-co-exceeded-the-agreed-upon-xrp-sales-limits/
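For anyone scripting this, a minimal sketch of building those cache lookup URLs, assuming Python; the cache prefix is the one from the links above, and `cache_url` is a made-up helper name, not part of any real API.

```python
# Minimal sketch: prepend Google's cache prefix to any page URL.
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"

def cache_url(page_url: str) -> str:
    """Return the Google cache lookup URL for page_url."""
    return CACHE_PREFIX + page_url

print(cache_url("https://xrptalk.org/forum/7-general-discussion/"))
```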
karlos Posted October 29, 2015
https://web.archive.org/web/*/Xrptalk.org
Phintech Posted October 29, 2015
2 minutes ago, karlos said:
https://web.archive.org/web/*/Xrptalk.org
A lot of the thread pages from the general discussion page were dead in the latest archive there. If you use the archived version from Sep 15, though, they seem to be working.
https://web.archive.org/web/20150915122939/https://xrptalk.org/forum/7-general-discussion/
FNetV1 Posted October 30, 2015
It's a shame that Hurukan decided to take down the forum now, when he said he was going to leave it up for a few more months. I was not done saving a copy of the site: I only managed to save a little over 4 GB of content within a period of about 30 days, and a great deal of links are still missing. For example, on the General page my HTTrack crawler managed to crawl pages 1 to 6 and the last 5 pages, but nothing in the middle yet.

Here's why my crawler was working so slowly: the server was configured so that if you made repeated requests for web pages faster than one every 0.2 seconds, it would ban your IP address for up to 6 hours. I learned this the hard way when I set my crawler to crawl multi-threaded, up to 8 threads, as quickly as possible; barely 2 minutes later, after I had crawled the first 50 MB or so, my IP was banned.

To make matters worse, WinHTTrack Website Copier's proxy settings only apply to HTTP sites. If you try to crawl an HTTPS site through a proxy server, WinHTTrack silently bypasses the proxy. Again, I learned this the hard way and earned myself another ban for a few hours. I tried everything I could to get proxied crawling working for the HTTPS site, and I failed. I also tried many other crawlers, including trial versions of the paid ones, and most of them didn't even have proxy support.

Next, I tried running WinHTTrack inside FreeCap (formerly known as SocksCap back in the day), a program that forcefully proxifies all networking for any program you choose, in an attempt to make WinHTTrack use my proxy for HTTPS sites as well as HTTP ones. XRPtalk.org is served over HTTPS only, and forcefully so: I tried going to http://xrptalk.org and was immediately forwarded to https://xrptalk.org (probably a server-wide .htaccess rule Hurukan had defined to force HTTPS onto everyone; nothing wrong with that, except that programs like WinHTTrack then can't crawl through a proxy). I also had a piece of software called MultiProxy installed, where each HTTP or HTTPS request was supposed to be served through a unique IP address drawn from a large list of HTTPS proxy servers I had downloaded, tested, and configured. It was all for nothing, as I could not get the darned WinHTTrack to crawl through my master proxy server for the HTTPS site; it only worked for HTTP sites. I took the liberty of submitting a bug report to the creator of WinHTTrack, informing him that the program's proxy support only works for HTTP and not for HTTPS websites, and that for HTTPS sites the proxy simply gets bypassed and your real IP address is used.
So, since I was forcibly limited to using my real IP address (I could not use a proxy server to crawl the HTTPS site), I set it to crawl at 0.2 pages per second, the value I determined by trial and error would not eventually result in a ban. It's slow as molasses, so I just delegated the job to my TV's computer, which I leave on 24/7. I was not willing to deal with VPN servers, as it would have been very manual: I would have had to change the IP address by hand each time one got banned, and not all connection attempts succeed due to bad or offline servers, meaning more time wasted disconnecting and reconnecting to another VPN server. So I went the slow, normal way, which has gotten me over 4 GB worth of material so far.

My goal was to get the whole site's content, fully 100%, and then host it on a domain name I had registered for this purpose. I was, and still am, planning to use it for discussions related to speculation, not only for XRP, BTC, and all cryptos, but for all kinds of assets including the stock exchange, penny stocks, etc., and to give access to the crawled content as a reference for traders and speculators.
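For concreteness, here is a minimal sketch of that kind of throttled, resumable single-threaded crawl, in Python with the `requests` library. The 5-second gap (0.2 pages per second) matches the figure in the post; the URL, file layout, and the exact user agent string are illustrative assumptions, not the actual HTTrack configuration.

```python
import time
from pathlib import Path
from urllib.parse import urlparse

import requests

DELAY_SECONDS = 5.0  # 0.2 pages per second: the rate that avoided bans
# Assumed Firefox-like UA; the post says the default crawler UA was blocked.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:40.0) "
                         "Gecko/20100101 Firefox/40.0"}

def crawl(urls, out_dir="mirror"):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for url in urls:
        # Resume support: skip pages already saved to disk.
        name = urlparse(url).path.strip("/").replace("/", "_") or "index"
        target = out / (name + ".html")
        if target.exists():
            continue
        resp = requests.get(url, headers=HEADERS, timeout=30)
        if resp.ok:
            target.write_text(resp.text, encoding="utf-8")
        time.sleep(DELAY_SECONDS)  # stay under the server's rate limit

crawl(["https://xrptalk.org/forum/7-general-discussion/"])
```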
T8493 Posted October 30, 2015 (edited)
You could use Tor as an HTTPS proxy. Tor automatically changes your IP address. Tor + wget works like a charm, even for large websites.
Edited October 30, 2015 by T8493
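As a rough illustration of this suggestion, a minimal sketch of fetching a page through a local Tor client from Python, assuming `requests` with SOCKS support installed (`pip install requests[socks]`) and Tor listening on its default SOCKS port, 9050:

```python
import requests

# Tor's default SOCKS port; the "socks5h" scheme routes DNS resolution
# through Tor as well, so hostnames are not leaked to the local resolver.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

resp = requests.get("https://xrptalk.org/", proxies=TOR_PROXIES, timeout=60)
print(resp.status_code, len(resp.text))
```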
xrp Posted October 31, 2015
Wow, FNetV1... kudos to you for your efforts archiving the material! That sounds like an incredible amount of work. Hopefully it preserves some great value for future use.
Parabellum Posted October 31, 2015
Very strange that it doesn't link to XRPchat.com. I guess it fits Hurukan's attitude of resentment towards large parts of the community, and serves as a kind of final 'revenge'. I used to have a lot of respect for Hurukan, but that move with xrptalk made me lose it all at once.
D-fault123 Posted October 31, 2015
5 minutes ago, Parabellum said:
Very strange that it doesn't link to XRPchat.com. I guess it fits Hurukan's attitude of resentment towards large parts of the community, and serves as a kind of final 'revenge'. I used to have a lot of respect for Hurukan, but that move with xrptalk made me lose it all at once.
Yep, I'm sure he knows about this site.
nik Posted October 31, 2015
Should there ever be an initiative aimed at somehow balancing the effort karlos puts in against some benefit for him, please count me in as well.
kanaas Posted October 31, 2015
1 hour ago, D-fault123 said:
Yep, I'm sure he knows about this site.
Are you sure? Just because every other single XRPTalk user has found out about this beauty doesn't mean that one single user like The Hurukan cannot have lost his way.
kanaas Posted October 31, 2015 (edited)
Wonder if Hurukan handed over his XTK IOUs to the Ripple folks as well. I still have some 100K of them: http://ledgermonitor.heartbit.io/app?{"address":"raXpsscPp99gDrsm6qzTy9c6wQitr6q1h"}
Edited October 31, 2015 by kanaas
FNetV1 Posted October 31, 2015 (edited)
On 10/30/2015, 3:21:09, T8493 said:
You could use Tor as an HTTPS proxy. Tor automatically changes your IP address. Tor + wget works like a charm, even for large websites.
The idea of using Tor never crossed my mind, because I was more interested in getting my multi-threaded scraper to crawl the site automatically and as quickly as possible. Using the Tor Browser (based on Firefox) would have been manual. Sure, there is a plugin I could have used, and I explored that too, but it was a much more bastardized version of a web crawler: slower than HTTrack in single-threaded mode, prone to instability when attempting to crawl such a large website, and with no way to save state and resume later if the browser crashed or closed. HTTrack does have that resume-where-it-left-off functionality. As for routing HTTrack through Tor: HTTrack is a web browser of its own, and while installing Tor gives you a local proxy (127.0.0.1:????) that you can point other locally installed programs at, I would have gotten the same result if I had configured HTTrack to connect through Tor's local proxy, since HTTrack bypasses the proxy for HTTPS websites. It would not have worked, just as it did not work with MultiProxy and the other proxy servers I used.

As for wget for Windows: I was able to get wget to crawl the HTTPS site using a proxy server (MultiProxy at 127.0.0.1:8080), with MultiProxy configured to serve a unique proxy IP per request. This was working fine, but I was deeply disappointed to find that wget lacks multi-thread support, so I was limited to just one HTTPS request at a time. I tried running multiple wget instances at once with the "skip pages already present on disk" option enabled, but each instance wasted time connecting to the site and probing each page's content before realizing I already had a copy and moving on to the next link. That was obviously wasting too much time, so I left just one thread (instance) running with proxy support and no wait limits, scraping the next page as soon as the last one was done. After leaving that wget-with-proxy experiment running for a few days on a dedicated PC, parallel to the other dedicated PC running HTTrack without any proxy, I realized I was crawling more MBs of content with HTTrack than with wget. The wget experiment was a clear failure: it performed slower than my non-proxied HTTrack at 0.2 pages per second. Part of the reason was that many of the proxy servers in my compiled list of HTTPS proxies were simply too slow, and when wget hit a slow proxy, that single request could take as much as 15 (!!) seconds to finish before wget moved on to the next page. This could of course have been mitigated if wget *had* multi-thread support: I could easily have run 20 to 40 threads at once to compensate for the slow proxies. But the developers of wget never thought to implement multi-threading; for an open-source project, how come no one has thought of this?? I was very frustrated that I had no viable way of crawling an HTTPS website in multi-threaded mode through a proxy!
All the programs, including HTTrack, that come with proxy support (and only for HTTP sites at that) failed me! It feels as if the developers of these crawling programs don't want you to crawl HTTPS websites and deliberately make it difficult. HTTrack is a good, stable, multi-threaded piece of software that gives you lots of features: the ability to change the user agent (and I had to, because Hurukan had blocked HTTrack's default user agent, so I had to mimic Firefox's for it to work), and the ability to tell the program to wait X seconds before crawling the next page (I had to set 0.2 requests per second to avoid getting banned for hours). It also offers multi-threaded support, a feature that is completely useless if you can't use proxies, since most servers are configured by default to ban you if you perform too many requests at once. HTTrack mitigates this by letting you use a proxy server, but that proxy support only works for HTTP-only sites, when over 90% of all websites out there today are configured to serve HTTPS-only pages, completely nullifying whatever benefit you could get from crawling multiple pages at once. If it weren't for this, I would probably have been done with the job in less than a week.
Edited November 1, 2015 by FNetV1: adding more info and correcting typographical errors
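For what it's worth, the missing piece described above (multi-threaded fetching where each worker goes out through a different HTTPS-capable proxy) is only a few lines in Python with `requests` and a thread pool. This is a minimal sketch under stated assumptions: the proxy addresses are placeholders from the TEST-NET range and the page URL pattern is illustrative, not the forum's real pagination scheme.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

import requests

# Placeholder proxies; substitute a real, tested list.
# Each job is assigned the next proxy in rotation.
PROXIES = cycle([
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
])

def fetch(url, proxy):
    try:
        # The timeout caps how long one slow proxy can stall a worker.
        resp = requests.get(url, proxies={"http": proxy, "https": proxy},
                            timeout=15)
        return url, resp.status_code
    except requests.RequestException as exc:
        return url, exc

# Illustrative page list; a slow proxy now delays only one of the
# eight workers instead of the whole crawl.
urls = [f"https://xrptalk.org/forum/7-general-discussion/page-{n}"
        for n in range(1, 9)]
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch, u, next(PROXIES)) for u in urls]
    for fut in futures:
        print(*fut.result())
```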