The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

Forums closing at the end of January - Alternatives?

Discussion in 'Dell Latitude, Vostro, and Precision' started by mdsurveyor, Jan 18, 2022 at 10:10 AM.

  1. Aaron44126

    Aaron44126 Notebook Prophet

    Reputations:
    874
    Messages:
    5,543
    Likes Received:
    2,038
    Trophy Points:
    331
    Trying again with a much more throttled download process. No simultaneous connections, just one download thread. I am also limiting it to one page hit every 3 seconds, instead of basically "pull stuff down as fast as you can". We'll see what happens.
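
    The throttling described above (one download thread, a fixed pause between page hits) can be sketched roughly as follows. This is only an illustration, not the tool actually used in the thread; the names `Throttle` and `crawl` and the injectable `clock`/`sleep` parameters are hypothetical.

```python
import time


class Throttle:
    """Allow at most one request per `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until `min_interval` has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl(urls, fetch, throttle):
    """Fetch URLs strictly one at a time, pausing between hits."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

    With `Throttle(3.0)` and a real `fetch` function, this would issue one request every 3 seconds with no simultaneous connections.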
     
  2. Aaron44126

    Aaron44126 Notebook Prophet

    Reputations:
    874
    Messages:
    5,543
    Likes Received:
    2,038
    Trophy Points:
    331
    So I had to restart it a few times to tweak the "scope" of the pull, but I think that I am happy with it now. It is running at one page hit every 2 seconds, it's been going for an hour now, and I haven't hit a block. (I'm downloading member profile pages too but I had to make sure that it doesn't crawl off into everyone's "recent postings" which would massively blow up the scope of the pull.)

    ...At this rate, my quick math shows that it would take nearly 2 days to download the entire Dell section — if every thread just had one page, which is not the case. Still, I think that I should be able to get through the Dell section before the site goes down. If not, then I'll still make the portion that I have crawled available which should be a good chunk of it. Getting through the entire site at this speed doesn't seem feasible.

    [Edit] Realized that the "similar threads" linked at the bottom of each thread page are probably going to blow up the scope anyway. Not sure if there's anything that I can really do about that. I'm just going to let it run as long as I can, I guess... and maybe do some separate ones to make sure that certain threads that I really want are saved.
     
    Last edited: Jan 19, 2022 at 10:01 AM
  3. unnoticed

    unnoticed Notebook Consultant

    Reputations:
    29
    Messages:
    117
    Likes Received:
    59
    Trophy Points:
    41
    Exactly that happened to me; I wondered why it took so long, and Offline Explorer was ripping every thread in this box.
    Threw all the projects in the bin and started over.
    Found an alternative: the wizard made specifically for forum threads, which won't include anything other than the actual URL of the forum thread, used as a wildcard.
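
    The idea of limiting a crawl to a wildcard on the thread URL can be sketched as a simple scope filter. The URL patterns below are hypothetical placeholders (the real forum layout may differ), and `in_scope` is an illustrative name, not a feature of any particular download tool.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, for illustration only.
INCLUDE = ["https://forum.example.com/threads/*"]
EXCLUDE = ["*/members/*", "*/recent-content*"]


def in_scope(url, include=INCLUDE, exclude=EXCLUDE):
    """Return True if url matches an include pattern and no exclude pattern."""
    if any(fnmatchcase(url, pat) for pat in exclude):
        return False
    return any(fnmatchcase(url, pat) for pat in include)
```

    To save a single thread, the include list could be narrowed to one thread's URL followed by `*`, so that page 2, page 3 and so on match but links to other threads do not.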


    I followed your settings and it seems to work.
    2 threads and 2 seconds delay and speed throttled down to medium (20 480 bytes/s).
    If I hit the firewall I'll go down to 1 thread. Slow and steady wins the race...if I have enough time.
    Dammit...hit the firewall, 18 pages of M4800 thread saved.

    Next time 1 thread and 5 seconds delay.


    .....hang on. This software has proxies. I might be able to start a Tor proxy.
    Could not get the proxy working inside the software, but I set the internal browser to Internet Explorer, set a SOCKS5 proxy in Internet Explorer and pointed it to Tor at 127.0.0.1:9050.
    Checkmate! When I hit the firewall, in theory I can just switch Tor identity.
     
  4. Reciever

    Reciever D! For Dragon!

    Reputations:
    1,491
    Messages:
    5,320
    Likes Received:
    4,090
    Trophy Points:
    431
    Well, I can download from 3 different sources: work, home, and home via VPN. @Aaron44126 I'm sure there are many others who would assist in this endeavor if we allocate members to certain portions of the forum.

    Please let me know when you have had a good balance for several hours without being cut off. Then we can spread the word, come up with a plan on that side of the effort, and create a few repositories that everyone who is interested can retrieve the archives from.
     
    Tenoroon likes this.
  5. Aaron44126

    Aaron44126 Notebook Prophet

    Reputations:
    874
    Messages:
    5,543
    Likes Received:
    2,038
    Trophy Points:
    331
    I seem to be able to run at 1 thread / 1 second. I'm now trying with scope limited to just the Latitude/Precision forum and we'll see how that goes. I'm afraid that it will still spider out too much with the "Similar threads" and I'll just have to pick and choose threads to save. (Saving an individual thread seems to be straightforward at least.)
     
  6. etern4l

    etern4l Notebook Virtuoso

    Reputations:
    2,911
    Messages:
    3,524
    Likes Received:
    3,442
    Trophy Points:
    331
    Just short of 100k documents to download the whole site... Gosh. Interestingly, I'm not seeing the JS link replacement issue (I can see a replacement happening when I hover over a link, but it just replaces the relative URL with an absolute local URL). Most downloaded pages have no CSS either (the home page is fine); not sure if the relevant CSS downloads are still pending or something is messed up. HTTrack 3.49-2, pretty much default mirror settings + filtering out of Tapatalk stuff and rate limiting.
     
  7. Reciever

    Reciever D! For Dragon!

    Reputations:
    1,491
    Messages:
    5,320
    Likes Received:
    4,090
    Trophy Points:
    431
    If we can get a good template, then we can try to coordinate with the members who are willing to assist, determine the priority of threads to be saved, and then work our way through the rest.
     
  8. Aaron44126

    Aaron44126 Notebook Prophet

    Reputations:
    874
    Messages:
    5,543
    Likes Received:
    2,038
    Trophy Points:
    331
    I'm not having an issue with CSS on downloaded pages. I am noticing that some images like user avatars do not show up. It looks like XenForo puts a ? and timestamp or something in those image URLs and because that's not part of the filename, it doesn't work when you open the page locally... but if you threw it on a web server then it should be fine because that part of the URL would be effectively ignored.
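
    One possible post-processing step for the avatar problem is to drop the query string from local image references in the saved pages. This is a rough sketch that assumes the saved HTML still carries the `?timestamp` suffix while the file on disk does not; the function name `strip_image_queries` and the sample path in the usage note are hypothetical.

```python
import re

# Matches the src attribute of an <img> tag (double-quoted values only).
_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def strip_image_queries(html):
    """Remove '?...' from local <img src> values; leave remote URLs alone."""

    def fix(match):
        url = match.group(2)
        if "://" in url or url.startswith("//"):
            return match.group(0)  # remote URL: keep as-is
        return match.group(1) + url.split("?", 1)[0] + match.group(3)

    return _SRC.sub(fix, html)
```

    For example, `<img src="data/avatars/m/12/345.jpg?1642600000">` would become `<img src="data/avatars/m/12/345.jpg">`. A regular expression is a blunt tool for HTML, so this is best tried on a copy of the archive first.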

    Almost 2 hours on my current run and no issues with the firewall. It's gotten through about 14 out of the 381 pages of threads in the Latitude/Precision section, so it seems like a completable task (even allowing it to do some "similar threads" spidering), though many of the longer threads with many pages are not fully downloaded yet either. I also updated the default config to have it download any .zip files that it runs across; that might include some unnecessary stuff, but I know there are some vBIOS attachments and stuff like that which I would like to have included.
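
    The progress figures above allow a rough extrapolation. The sketch below assumes a constant crawl rate and rounds "almost 2 hours" up to 2; since the longer threads are not fully downloaded yet, the real total would likely be higher. The function name is hypothetical.

```python
def estimate_total_hours(pages_done, pages_total, hours_elapsed):
    """Linear extrapolation of total crawl time at a constant rate."""
    if pages_done <= 0:
        raise ValueError("need at least one completed page to extrapolate")
    return hours_elapsed * pages_total / pages_done


total = estimate_total_hours(pages_done=14, pages_total=381, hours_elapsed=2)
print(f"{total:.1f} hours, about {total / 24:.1f} days")
# prints "54.4 hours, about 2.3 days"
```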

    [Edit] Did start getting some "403 forbidden" errors; they all came in one batch and then it started working again. Found this page on how to clear them out, which I will try when the download process finishes. https://forum.httrack.com/readmsg/33890/31062/index.html
     
    Last edited: Jan 19, 2022 at 4:14 PM
  9. etern4l

    etern4l Notebook Virtuoso

    Reputations:
    2,911
    Messages:
    3,524
    Likes Received:
    3,442
    Trophy Points:
    331
    Yeah, the extended-parsing option doesn't work properly. All the links end up pointing to the root directory, and when disabled, the JS substitutes the original hostname. Would need some postprocessing, as you said.
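
    One way such post-processing might look is to strip every script element from the saved pages. This is a blunt sketch, not a confirmed fix: it assumes the link rewriting is done by code in `<script>` tags, it removes all other scripted behavior too, and the names `strip_scripts` and `clean_tree` are hypothetical.

```python
import re
from pathlib import Path

_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Remove every <script>...</script> element from an HTML string."""
    return _SCRIPT.sub("", html)


def clean_tree(root):
    """Rewrite each .html file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8", errors="replace")
        cleaned = strip_scripts(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

    Since `clean_tree` overwrites files, it would be safer to run it on a copy of the mirror directory.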
     
  10. etern4l

    etern4l Notebook Virtuoso

    Reputations:
    2,911
    Messages:
    3,524
    Likes Received:
    3,442
    Trophy Points:
    331
    Can you spare me some gawking and post the command to automatically remove the offending JavaScript?
     