| 00:00:53 | | fangfufu joins |
| 00:01:04 | | fangfufu is now authenticated as fangfufu |
| 00:19:37 | | G4te_Keep3r quits [Client Quit] |
| 00:19:52 | | G4te_Keep3r joins |
| 00:33:53 | | hexa- quits [Quit: WeeChat 3.3] |
| 00:37:44 | | HP_Archivist (HP_Archivist) joins |
| 00:41:24 | | hexa- (hexa-) joins |
| 00:45:01 | <@JAA> | More fun with TechnologyGuide: if the URL contains 'nonexistant', the server resets the connection: http://forum.notebookreview.com/nonexistant/ |
| 00:45:54 | <Frogging101> | notebookreview is closing? :/ |
| 00:46:08 | <@JAA> | Yep |
| 00:46:40 | <Frogging101> | that sucks |
| 00:46:49 | <@JAA> | My qwarc archive of the thread pages just finished a few minutes ago. Should be complete apart from the countless broken shit like the above. |
| 00:47:17 | <Jake> | what even.... 'temp' and 'nonexistant' |
| 00:47:43 | <@JAA> | http://forum.notebookreview.com/threads/asus-v6j-everest-benchmarks.43048/ returns an empty page. |
| 00:47:57 | <@JAA> | It's kind of hilarious just how broken these forums are. |
| 00:50:31 | <@JAA> | http://forum.notebookreview.com/nessus/ |
| 00:50:36 | <@JAA> | ¯\_(ツ)_/¯ |
| 00:52:12 | <Jake> | they have some terribly broken WAF paired with a 10 year old corrupted forum database? |
| 00:58:35 | | Jaro joins |
| 00:59:25 | | Jaro quits [Remote host closed the connection] |
| 01:00:02 | | dm4v quits [Client Quit] |
| 01:00:48 | | chrismeller (chrismeller) joins |
| 01:01:00 | | chrismeller quits [Remote host closed the connection] |
| 01:02:08 | | chrismeller (chrismeller) joins |
| 01:02:30 | | chrismeller quits [Remote host closed the connection] |
| 01:03:38 | | chrismeller (chrismeller) joins |
| 01:04:00 | | chrismeller quits [Remote host closed the connection] |
| 01:04:29 | | dm4v joins |
| 01:04:29 | | dm4v is now authenticated as dm4v |
| 01:04:29 | | dm4v quits [Changing host] |
| 01:04:29 | | dm4v (dm4v) joins |
| 01:05:08 | | chrismeller (chrismeller) joins |
| 01:05:30 | | chrismeller quits [Remote host closed the connection] |
| 01:05:53 | | chrismeller (chrismeller) joins |
| 01:15:14 | | Atom-- joins |
| 01:17:59 | | Atom quits [Ping timeout: 252 seconds] |
| 01:19:35 | | datechnoman (datechnoman) joins |
| 01:33:05 | | Atom-- quits [Read error: Connection reset by peer] |
| 02:03:22 | | dm4v_ joins |
| 02:03:49 | | dm4v quits [Ping timeout: 252 seconds] |
| 02:03:49 | | dm4v_ is now known as dm4v |
| 02:03:49 | | dm4v is now authenticated as dm4v |
| 02:03:49 | | dm4v quits [Changing host] |
| 02:03:49 | | dm4v (dm4v) joins |
| 02:27:14 | | thetechrobo_ joins |
| 02:27:36 | | thetechrobo_ quits [Remote host closed the connection] |
| 02:28:43 | | thetechrobo_ joins |
| 02:29:06 | | thetechrobo_ quits [Remote host closed the connection] |
| 02:29:26 | | thetechrobo_ joins |
| 02:30:15 | | TheTechRobo quits [Ping timeout: 265 seconds] |
| 03:04:53 | | anarcat quits [Quit: rebooting] |
| 03:04:53 | | thermospheric quits [Remote host closed the connection] |
| 03:42:14 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 04:00:06 | | jacobk joins |
| 04:18:46 | | LegitSi quits [Ping timeout: 244 seconds] |
| 04:28:26 | | G4te_Keep3r quits [Ping timeout: 240 seconds] |
| 04:32:19 | | mutantmnky quits [Remote host closed the connection] |
| 04:33:25 | | mutantmnky (mutantmonkey) joins |
| 04:43:49 | | G4te_Keep3r joins |
| 04:54:33 | | OverhaulDeskwork6 joins |
| 04:58:21 | | OverhaulDeskwork quits [Ping timeout: 252 seconds] |
| 04:58:22 | | OverhaulDeskwork6 is now known as OverhaulDeskwork |
| 05:12:27 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:14:23 | <@JAA> | TechnologyGuide forum archive post counts: 1841025 from Brighthand (1880674 on homepage), 53604 from DigitalCameraReview (58092), 4180267 from NotebookReview (no global stats, about 9.35M from adding up the subforum numbers, but those don't include everything), 508834 from TabletPCReview (521529) |
| 05:15:05 | <@JAA> | The NotebookReview discrepancy seems pretty bad, but no idea where it comes from. I didn't see any systematic problems in the data. |
| 05:17:39 | <@JAA> | Adding up the 'Messages' numbers for the subforums gives 9347229 there. This doesn't include http://forum.notebookreview.com/forums/nbr-marketplace.18/ (which is shown as a link on the homepage instead of a forum entry). |
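To put the discrepancies above in proportion, here is a minimal sketch that turns the quoted post counts into coverage ratios. The numbers are copied from the messages above; the `counts` dictionary and the `coverage` helper are illustrative names only, and the NotebookReview reference figure is the summed subforum 'Messages' total, which the log says does not include everything.

```python
# Post counts quoted in the log: (posts in the archive, posts reported by the site).
# For NotebookReview the reported figure is the sum of the subforum 'Messages'
# numbers, which is itself incomplete according to the log.
counts = {
    "Brighthand": (1841025, 1880674),
    "DigitalCameraReview": (53604, 58092),
    "NotebookReview": (4180267, 9347229),
    "TabletPCReview": (508834, 521529),
}


def coverage(archived: int, reported: int) -> float:
    """Return the fraction of reported posts that are present in the archive."""
    return archived / reported


ratios = {site: coverage(a, r) for site, (a, r) in counts.items()}

for site, ratio in ratios.items():
    print(f"{site}: {ratio:.1%}")
```

On these figures, three of the forums come out above 90% while NotebookReview sits near 45%, which matches the remark that its discrepancy looks much worse than the others.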
| 05:50:39 | | qwertyasdfuiopghjkl joins |
| 05:55:36 | | HP_Archivist quits [Remote host closed the connection] |
| 05:56:02 | | HP_Archivist (HP_Archivist) joins |
| 06:09:02 | | march_happy (march_happy) joins |
| 06:12:26 | | HP_Archivist quits [Ping timeout: 240 seconds] |
| 06:48:51 | | vexr leaves |
| 06:50:46 | | stormy joins |
| 06:53:25 | <stormy> | looking for someone who helped to archive soup.io |
| 07:08:46 | | Mateon1 quits [Remote host closed the connection] |
| 07:08:59 | | Mateon1 joins |
| 07:09:48 | <@OrIdow6> | stormy: Unless you're looking for someone to describe their personal experiences on the project, it's best to just ask your question |
| 07:13:22 | <stormy> | fair enough. so if I understood correctly, the archived data gets uploaded directly to archive.org and nothing else. what I'm trying to find out is: did the crawler get around the "Content Warning" pages, and, once on archive.org, how do I get around the content warning pages. |
| 07:20:02 | <@OrIdow6> | Do you have an example? |
| 07:20:43 | <stormy> | sure: https://web.archive.org/web/20191201125421/http://einefragevonstil.soup.io/ |
| 07:21:27 | <stormy> | I have enough experience in web crawling to know how it can work on the crawling side, but not with archive.org... |
| 07:23:09 | | wyatt8740 quits [Ping timeout: 265 seconds] |
| 07:23:31 | | wyatt8740 joins |
| 07:24:07 | | Eighty quits [Ping timeout: 265 seconds] |
| 07:24:27 | | Eighty (Eighty) joins |
| 07:33:54 | <@OrIdow6> | So I don't see anything to indicate that the project got those |
| 07:34:21 | <stormy> | where can I find a list of soup.io hosts that the project did cover? |
| 07:35:30 | <@OrIdow6> | I do not think any such list exists |
| 07:35:50 | <@OrIdow6> | Well, I expect it was deleted since that project was 2 years ago |
| 07:36:18 | <@OrIdow6> | Also, that site was "covered" by the project, I just don't see anything getting around the content warning |
| 07:36:54 | | adia quits [Client Quit] |
| 07:37:20 | <@OrIdow6> | If you do want a list I can give you some quick info on how to generate it |
| 07:37:55 | <stormy> | that'd be great, thanks |
| 07:37:57 | | adia (adia) joins |
| 07:41:24 | <spirit> | stormy: https://web.archive.org/web/*/http://example.com/* will list all URLs |
| 07:41:29 | <@OrIdow6> | Use https://archive.org/services/docs/api/internetarchive/cli.html to download all the .os.cdx.gz files in collection:archiveteam:soup.io , then decompress them, extract the URLs, and then the domains |
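The "decompress, extract the URLs, and then the domains" step could look roughly like the sketch below. It assumes the downloaded files are space-separated CDX text whose first line is a ` CDX ...` header, with the field letter `a` marking the original URL column. The function names are hypothetical. In the sample line, the timestamp and URL come from the example later in this log; every other field is a placeholder.

```python
import gzip
from urllib.parse import urlsplit


def hosts_from_cdx_lines(lines):
    """Collect the distinct hostnames from an iterable of CDX text lines."""
    url_index = None
    hosts = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" CDX "):
            # Header line: the letters after "CDX" name the columns;
            # 'a' is the original URL.
            url_index = line.split()[1:].index("a")
            continue
        if url_index is None or not line:
            continue
        fields = line.split(" ")
        if len(fields) <= url_index:
            continue  # Skip truncated or malformed records.
        try:
            host = urlsplit(fields[url_index]).hostname
        except ValueError:
            continue  # Skip URLs that cannot be parsed.
        if host:
            hosts.add(host)
    return hosts


def hosts_from_cdx_file(path):
    """Read one gzip-compressed CDX file and return its hostnames."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return hosts_from_cdx_lines(f)


# Placeholder record for illustration; only the timestamp and URL are real.
sample = [
    " CDX N b a m s k r M S V g\n",
    "io,soup,einefragevonstil)/ 20191201125421 "
    "http://einefragevonstil.soup.io/ text/html 200 - - - 0 0 example.warc.gz\n",
]
sample_hosts = hosts_from_cdx_lines(sample)
print(sorted(sample_hosts))
```

Merging the sets returned by `hosts_from_cdx_file` for every downloaded file would give the list of soup.io hosts that appear in the collection.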
| 07:42:13 | <@OrIdow6> | It would be better to tell us what you're trying to do, though, since asking for what the project covered is fairly specific |
| 07:44:07 | <stormy> | thanks, will try both of these later. what I'm trying to do is get an offline mirror of some soups that I remember. anything that still exists. I was hoping that it would exist in other forms than on archive.org, but I guess I'll have to make do with that. |
| 07:44:30 | <stormy> | gotta run for now, back in an hour. |
| 07:47:20 | | adia quits [Client Quit] |
| 07:49:03 | | stormy quits [Ping timeout: 244 seconds] |
| 07:49:19 | | adia (adia) joins |
| 07:49:26 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 07:50:46 | | fiftysix_k_modem (fiftysix_k_modem) joins |
| 08:00:42 | | sec^nd quits [Remote host closed the connection] |
| 08:01:41 | | adia quits [Ping timeout: 252 seconds] |
| 08:01:51 | | sec^nd (second) joins |
| 08:01:57 | | adia (adia) joins |
| 08:33:58 | | Lord_Nightmare quits [Client Quit] |
| 08:37:21 | | Lord_Nightmare (Lord_Nightmare) joins |
| 08:38:49 | | stormy joins |
| 08:51:03 | <stormy> | OrIdow6 are you sure that collection:archiveteam:soup.io is the correct identifier? since it gives me a query error https://archive.org/advancedsearch.php?q=collection%3Aarchiveteam%3Asoup.io&fl%5B%5D=identifier&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=50&page=1&output=json&callback=callback&save=yes |
| 08:58:23 | <@rewby> | Usually you just search for a file from the project in the UI and you can find the collection that way |
| 08:58:54 | <@rewby> | Also, if you're worried the data only lives on the ia, you can just list the items using the cli (and collection name) and just download them all |
| 08:59:33 | <@rewby> | Usually you can get most of it to replay by loading the warcs into pywb3. Sometimes you need to find some additional warcs with static site data like javascript and css files. |
| 08:59:38 | <@rewby> | The IA has them somewhere usually |
| 09:01:56 | <stormy> | I'm still having trouble getting an item list from a url prefix, either with cli or web |
| 09:02:31 | <@OrIdow6> | You're right, I said it wrong, that should be archiveteam_soupio, not archiveteam:soupio |
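With the corrected collection name, the search query that failed earlier can be rebuilt as in this minimal sketch. The endpoint and the `q`, `fl[]`, `rows`, `page` and `output` parameters are taken from the URL stormy pasted above. `build_search_url` is a hypothetical helper, and leaving out the `callback` parameter to get plain JSON is an assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL pasted earlier in the log.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(collection: str, rows: int = 50, page: int = 1) -> str:
    """Build a search URL that lists item identifiers in a collection."""
    params = [
        ("q", f"collection:{collection}"),  # Only one colon: field, then value.
        ("fl[]", "identifier"),
        ("rows", rows),
        ("page", page),
        ("output", "json"),
    ]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("archiveteam_soupio")
print(url)
```

The earlier query had a second colon inside the value (`archiveteam:soup.io`), which is what the correction above replaces with an underscore.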
| 09:09:12 | <stormy> | thanks, that's getting me somewhere |
| 09:40:36 | | Mateon1 quits [Remote host closed the connection] |
| 09:40:49 | | Mateon1 joins |
| 09:46:17 | | hackiter joins |
| 09:46:28 | | hackiter leaves |
| 09:49:35 | | IDK (IDK) joins |
| 09:50:04 | | Megame (Megame) joins |
| 09:52:32 | <pabs> | this company is allegedly shutting down https://cyberninjas.com/ https://edition.cnn.com/2022/01/07/politics/cyber-ninjas-shutting-down-arizona/index.html |
| 09:59:33 | | essowicz joins |
| 10:00:08 | | essowicz quits [Remote host closed the connection] |
| 10:18:31 | | dsadsada joins |
| 10:18:45 | | dsadsada quits [Remote host closed the connection] |
| 10:27:24 | <IDK> | Anyone know how to request specific IDs with the API of https://www.thiswebsitewillselfdestruct.com/api/get_letter |
| 10:27:52 | <IDK> | Currently all IDs are randomized and random messages are displayed |
| 10:44:13 | | march_happy quits [Ping timeout: 265 seconds] |
| 10:50:36 | | mateoooo joins |
| 10:51:27 | <mateoooo> | ee |
| 10:52:00 | | mateoooo quits [Remote host closed the connection] |
| 11:11:53 | | mati joins |
| 11:13:29 | | mati quits [Remote host closed the connection] |
| 12:19:48 | | sonick (sonick) joins |
| 12:46:06 | | G4te_Keep3r4 joins |
| 12:46:54 | | G4te_Keep3r quits [Read error: Connection reset by peer] |
| 12:46:54 | | G4te_Keep3r4 is now known as G4te_Keep3r |
| 12:47:00 | | Megame quits [Client Quit] |
| 12:48:46 | | yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/] |
| 12:50:49 | | yano (yano) joins |
| 13:06:35 | | stormy quits [Remote host closed the connection] |
| 13:08:47 | | march_happy (march_happy) joins |
| 13:13:02 | | trenjikan joins |
| 13:13:06 | | march_happy quits [Ping timeout: 240 seconds] |
| 13:13:32 | | trenjikan quits [Remote host closed the connection] |
| 13:20:21 | | march_happy (march_happy) joins |
| 13:36:30 | | HP_Archivist (HP_Archivist) joins |
| 13:36:47 | | HP_Archivist quits [Remote host closed the connection] |
| 13:38:00 | | HP_Archivist (HP_Archivist) joins |
| 13:38:17 | | HP_Archivist quits [Remote host closed the connection] |
| 13:39:30 | | HP_Archivist (HP_Archivist) joins |
| 13:39:47 | | HP_Archivist quits [Remote host closed the connection] |
| 13:40:46 | | Arcorann quits [Ping timeout: 240 seconds] |
| 13:41:00 | | HP_Archivist (HP_Archivist) joins |
| 13:41:17 | | HP_Archivist quits [Remote host closed the connection] |
| 13:41:45 | | HP_Archivist (HP_Archivist) joins |
| 14:48:55 | | IDK quits [Client Quit] |
| 15:00:35 | | jonboy3452 quits [Read error: Connection reset by peer] |
| 15:03:17 | | march_happy quits [Remote host closed the connection] |
| 15:05:29 | | sec^nd quits [Remote host closed the connection] |
| 15:19:03 | | march_happy (march_happy) joins |
| 15:23:04 | | Jonboy345 joins |
| 15:23:43 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 15:28:57 | | Hifihedgehog joins |
| 15:29:01 | | Jonboy345 joins |
| 15:29:27 | <Hifihedgehog> | Hey JAA. If you don't mind me asking, how goes the archiving effort for the TechnologyGuide sites? |
| 15:30:26 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 15:34:10 | | Jonboy345 joins |
| 15:36:10 | <Hifihedgehog> | Thanks JAA! |
| 15:36:12 | <Hifihedgehog> | https://archive.org/details/technologyguide_forums_20220125 |
| 15:42:23 | | Hifihedgehog quits [Remote host closed the connection] |
| 15:49:41 | | chrismeller quits [Ping timeout: 265 seconds] |
| 15:51:43 | | Hifihedgehog joins |
| 15:54:43 | | Hifihedgehog quits [Remote host closed the connection] |
| 15:55:19 | | knecht420 quits [Client Quit] |
| 15:57:16 | | knecht420 (knecht420) joins |
| 15:57:34 | | fiftysix_k_modem quits [Client Quit] |
| 16:31:43 | | march_happy quits [Ping timeout: 252 seconds] |
| 16:43:20 | | lennier1 quits [Read error: Connection reset by peer] |
| 16:43:34 | | lennier1 (lennier1) joins |
| 17:13:55 | | sec^nd (second) joins |
| 17:43:06 | | nimaje quits [Ping timeout: 240 seconds] |
| 17:43:28 | | fiftysix_k_modem (fiftysix_k_modem) joins |
| 17:43:48 | | nimaje joins |
| 17:51:37 | | fiftysix_k_modem quits [Remote host closed the connection] |
| 18:02:21 | | IDK (IDK) joins |
| 18:04:18 | <IDK> | Apparently roblox is banning all users with YT or any other reference to off site platforms, be ready to see some 404s |
| 18:04:26 | <IDK> | *some |
| 18:07:36 | <IDK> | https://www.youtube.com/watch?v=9DBb6_aVS4M |
| 18:24:21 | | blop joins |
| 18:27:45 | | blop quits [Remote host closed the connection] |
| 18:43:27 | | fiftysix_k_modem joins |
| 18:43:27 | | fiftysix_k_modem is now authenticated as fiftysix_k_modem |
| 18:56:56 | | mutantmnky quits [Remote host closed the connection] |
| 18:58:00 | | mutantmnky (mutantmonkey) joins |
| 20:00:44 | <h2ibot> | JustAnotherArchivist edited TechnologyGuide (+53): https://wiki.archiveteam.org/?diff=48220&oldid=48214 |
| 20:00:45 | <h2ibot> | Gridkr edited Coronavirus/Affected companies (+401): https://wiki.archiveteam.org/?diff=48221&oldid=45063 |
| 20:00:48 | <duce1337> | anyone archive all of the CIA World Factbook? https://www.cia.gov/the-world-factbook/ |
| 20:41:22 | <AK> | Making sure a copy is grabbed now duce1337 |
| 20:43:05 | <duce1337> | ok |
| 20:43:17 | | dm4v quits [Client Quit] |
| 20:44:13 | | xyzfootage joins |
| 20:46:15 | | dm4v joins |
| 20:46:17 | | dm4v is now authenticated as dm4v |
| 20:46:17 | | dm4v quits [Changing host] |
| 20:46:17 | | dm4v (dm4v) joins |
| 20:47:07 | | spirit quits [Client Quit] |
| 20:47:27 | | xyzfootage quits [Remote host closed the connection] |
| 20:53:08 | | spirit joins |
| 21:02:06 | | HP_Archivist quits [Ping timeout: 240 seconds] |
| 21:33:31 | <wessel1512> | OrIdow6 I'm currently very busy with my day job so I haven't had the time to spend on the Ukrainian archive |
| 21:34:19 | <wessel1512> | But I got some more lists of URLs that I need to clean before they can be added to the wiki |
| 21:39:20 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 21:50:28 | | march_happy (march_happy) joins |
| 21:53:36 | | BlueMaxima joins |
| 21:56:26 | | fiftysix_k_modem quits [Client Quit] |
| 22:02:11 | | dm4v quits [Client Quit] |
| 22:04:26 | | dm4v joins |
| 22:04:26 | | dm4v is now authenticated as dm4v |
| 22:04:26 | | dm4v quits [Changing host] |
| 22:04:26 | | dm4v (dm4v) joins |
| 22:26:50 | | Arcorann (Arcorann) joins |
| 23:02:03 | | simon816 quits [Remote host closed the connection] |
| 23:06:45 | | simon816 (simon816) joins |
| 23:09:26 | | jacobk quits [Ping timeout: 240 seconds] |
| 23:15:43 | | LegitSi joins |
| 23:36:22 | | march_happy quits [Remote host closed the connection] |
| 23:36:40 | | march_happy (march_happy) joins |
| 23:41:25 | | march_happy quits [Ping timeout: 265 seconds] |
| 23:47:12 | | user_ quits [Remote host closed the connection] |
| 23:47:26 | | user_ (gazorpazorp) joins |
| 23:48:57 | | HP_Archivist (HP_Archivist) joins |
| 23:49:12 | | HP_Archivist quits [Remote host closed the connection] |
| 23:50:24 | | HP_Archivist (HP_Archivist) joins |
| 23:50:42 | | HP_Archivist quits [Remote host closed the connection] |
| 23:51:55 | | HP_Archivist (HP_Archivist) joins |
| 23:52:12 | | HP_Archivist quits [Remote host closed the connection] |
| 23:53:24 | | HP_Archivist (HP_Archivist) joins |
| 23:53:42 | | HP_Archivist quits [Remote host closed the connection] |
| 23:54:10 | | HP_Archivist (HP_Archivist) joins |