00:00:53fangfufu joins
00:19:37G4te_Keep3r quits [Client Quit]
00:19:52G4te_Keep3r joins
00:33:53hexa- quits [Quit: WeeChat 3.3]
00:37:44HP_Archivist (HP_Archivist) joins
00:41:24hexa- (hexa-) joins
00:45:01<@JAA>More fun with TechnologyGuide: if the URL contains 'nonexistant', the server resets the connection: http://forum.notebookreview.com/nonexistant/
00:45:54<Frogging101>notebookreview is closing? :/
00:46:08<@JAA>Yep
00:46:40<Frogging101>that sucks
00:46:49<@JAA>My qwarc archive of the thread pages just finished a few minutes ago. Should be complete apart from the countless broken shit like the above.
00:47:17<Jake>what even.... 'temp' and 'nonexistant'
00:47:43<@JAA>http://forum.notebookreview.com/threads/asus-v6j-everest-benchmarks.43048/ returns an empty page.
00:47:57<@JAA>It's kind of hilarious just how broken these forums are.
00:50:31<@JAA>http://forum.notebookreview.com/nessus/
00:50:36<@JAA>¯\_(ツ)_/¯
00:52:12<Jake>they have some terribly broken WAF paired with a 10 year old corrupted forum database?
00:58:35Jaro joins
00:59:25Jaro quits [Remote host closed the connection]
01:00:02dm4v quits [Client Quit]
01:00:48chrismeller (chrismeller) joins
01:01:00chrismeller quits [Remote host closed the connection]
01:02:08chrismeller (chrismeller) joins
01:02:30chrismeller quits [Remote host closed the connection]
01:03:38chrismeller (chrismeller) joins
01:04:00chrismeller quits [Remote host closed the connection]
01:04:29dm4v joins
01:04:29dm4v quits [Changing host]
01:04:29dm4v (dm4v) joins
01:05:08chrismeller (chrismeller) joins
01:05:30chrismeller quits [Remote host closed the connection]
01:05:53chrismeller (chrismeller) joins
01:15:14Atom-- joins
01:17:59Atom quits [Ping timeout: 252 seconds]
01:19:35datechnoman (datechnoman) joins
01:33:05Atom-- quits [Read error: Connection reset by peer]
02:03:22dm4v_ joins
02:03:49dm4v quits [Ping timeout: 252 seconds]
02:03:49dm4v_ is now known as dm4v
02:03:49dm4v quits [Changing host]
02:03:49dm4v (dm4v) joins
02:27:14thetechrobo_ joins
02:27:36thetechrobo_ quits [Remote host closed the connection]
02:28:43thetechrobo_ joins
02:29:06thetechrobo_ quits [Remote host closed the connection]
02:29:26thetechrobo_ joins
02:30:15TheTechRobo quits [Ping timeout: 265 seconds]
03:04:53anarcat quits [Quit: rebooting]
03:04:53thermospheric quits [Remote host closed the connection]
03:42:14qwertyasdfuiopghjkl quits [Remote host closed the connection]
04:00:06jacobk joins
04:18:46LegitSi quits [Ping timeout: 244 seconds]
04:28:26G4te_Keep3r quits [Ping timeout: 240 seconds]
04:32:19mutantmnky quits [Remote host closed the connection]
04:33:25mutantmnky (mutantmonkey) joins
04:43:49G4te_Keep3r joins
04:54:33OverhaulDeskwork6 joins
04:58:21OverhaulDeskwork quits [Ping timeout: 252 seconds]
04:58:22OverhaulDeskwork6 is now known as OverhaulDeskwork
05:12:27DogsRNice quits [Read error: Connection reset by peer]
05:14:23<@JAA>TechnologyGuide forum archive post counts: 1841025 from Brighthand (1880674 on homepage), 53604 from DigitalCameraReview (58092), 4180267 from NotebookReview (no global stats, about 9.35M from adding up the subforum numbers, but those don't include everything), 508834 from TabletPCReview (521529)
05:15:05<@JAA>The NotebookReview discrepancy seems pretty bad, but no idea where it comes from. I didn't see any systematic problems in the data.
05:17:39<@JAA>Adding up the 'Messages' numbers for the subforums gives 9347229 there. This doesn't include http://forum.notebookreview.com/forums/nbr-marketplace.18/ (which is shown as a link on the homepage instead of a forum entry).
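For reference, the archived-versus-reported post counts quoted above can be turned into coverage ratios. This is a minimal Python sketch, not part of the original discussion: the names `COUNTS` and `coverage_percent` are illustrative, and the NotebookReview denominator is the subforum sum, which by JAA's own account does not include everything.

```python
# (archived posts, posts reported by the site), as quoted in the log.
# The NotebookReview "reported" figure is the sum of subforum 'Messages'
# counts, which is known to be incomplete (it excludes the marketplace).
COUNTS = {
    "Brighthand": (1841025, 1880674),
    "DigitalCameraReview": (53604, 58092),
    "NotebookReview": (4180267, 9347229),
    "TabletPCReview": (508834, 521529),
}


def coverage_percent(counts):
    """Return archived/reported as a percentage for each site."""
    return {
        site: 100.0 * archived / reported
        for site, (archived, reported) in counts.items()
    }


if __name__ == "__main__":
    for site, pct in coverage_percent(COUNTS).items():
        print(f"{site}: {pct:.1f}%")
```

With these figures, NotebookReview comes out at well under half of the subforum total, while the other three sites are each above 90%.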
05:50:39qwertyasdfuiopghjkl joins
05:55:36HP_Archivist quits [Remote host closed the connection]
05:56:02HP_Archivist (HP_Archivist) joins
06:09:02march_happy (march_happy) joins
06:12:26HP_Archivist quits [Ping timeout: 240 seconds]
06:48:51vexr leaves
06:50:46stormy joins
06:53:25<stormy>looking for someone who helped to archive soup.io
07:08:46Mateon1 quits [Remote host closed the connection]
07:08:59Mateon1 joins
07:09:48<@OrIdow6>stormy: Unless you're looking for someone to describe their personal experiences on the project, it's best just to ask your question
07:13:22<stormy>fair enough. so if I understood correctly, the archived data gets uploaded directly to archive.org and nothing else. what I'm trying to find out is: did the crawler get around the "Content Warning" pages, and, once on archive.org, how do I get around the content warning pages.
07:20:02<@OrIdow6>Do you have an example?
07:20:43<stormy>sure: https://web.archive.org/web/20191201125421/http://einefragevonstil.soup.io/
07:21:27<stormy>I have enough experience in web crawling to know how it can work on the crawling side, but not with archive.org...
07:23:09wyatt8740 quits [Ping timeout: 265 seconds]
07:23:31wyatt8740 joins
07:24:07Eighty quits [Ping timeout: 265 seconds]
07:24:27Eighty (Eighty) joins
07:33:54<@OrIdow6>So I don't see anything to indicate that the project got those
07:34:21<stormy>where can I find a list of soup.io hosts that the project did cover?
07:35:30<@OrIdow6>I do not think any such list exists
07:35:50<@OrIdow6>Well, I expect it was deleted since that project was 2 years ago
07:36:18<@OrIdow6>Also, that site was "covered" by the project, I just don't see anything getting around the content warning
07:36:54adia quits [Client Quit]
07:37:20<@OrIdow6>If you do want a list I can give you some quick info on how to generate it
07:37:55<stormy>that'd be great, thanks
07:37:57adia (adia) joins
07:41:24<spirit>stormy: https://web.archive.org/web/*/http://example.com/* will list all URLs
07:41:29<@OrIdow6>Use https://archive.org/services/docs/api/internetarchive/cli.html to download all the .os.cdx.gz files in collection:archiveteam:soup.io , then decompress them, extract the URLs, and then the domains
07:42:13<@OrIdow6>It would be better to tell us what you're trying to do, though, since asking for what the project covered is fairly specific
07:44:07<stormy>thanks, will try both of these later. what I'm trying to do is get an offline mirror of some soups that I remember. anything that still exists. I was hoping that it would exist in other forms than on archive.org, but I guess I'll have to make do with that.
07:44:30<stormy>gotta run for now, back in an hour.
07:47:20adia quits [Client Quit]
07:49:03stormy quits [Ping timeout: 244 seconds]
07:49:19adia (adia) joins
07:49:26BlueMaxima quits [Read error: Connection reset by peer]
07:50:46fiftysix_k_modem (fiftysix_k_modem) joins
08:00:42sec^nd quits [Remote host closed the connection]
08:01:41adia quits [Ping timeout: 252 seconds]
08:01:51sec^nd (second) joins
08:01:57adia (adia) joins
08:33:58Lord_Nightmare quits [Client Quit]
08:37:21Lord_Nightmare (Lord_Nightmare) joins
08:38:49stormy joins
08:51:03<stormy>OrIdow6 are you sure that collection:archiveteam:soup.io is the correct identifier? since it gives me a query error https://archive.org/advancedsearch.php?q=collection%3Aarchiveteam%3Asoup.io&fl%5B%5D=identifier&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=50&page=1&output=json&callback=callback&save=yes
08:58:23<@rewby>Usually you just search for a file from the project in the UI and you can find the collection that way
08:58:54<@rewby>Also, if you're worried the data only lives on the ia, you can just list the items using the cli (and collection name) and just download them all
08:59:33<@rewby>Usually you can get most of it to replay by loading the warcs into pywb3. Sometimes you need to find some additional warcs with static site data like javascript and css files.
08:59:38<@rewby>The IA has them somewhere usually
09:01:56<stormy>I'm still having trouble getting an item list from a url prefix, either with cli or web
09:02:31<@OrIdow6>You're right, I said it wrong, that should be archiveteam_soupio, not archiveteam:soup.io
09:09:12<stormy>thanks, that's getting me somewhere
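The "decompress, extract the URLs, then the domains" steps described above can be sketched roughly as follows. This is a hedged Python example, not the project's own tooling: it assumes the `.os.cdx.gz` files have already been downloaded, that they are space-separated CDX files whose header legend marks the original-URL column with `a` (falling back to the third field if no header is seen), and the function names are made up for illustration.

```python
import gzip
from urllib.parse import urlsplit


def hosts_from_cdx_lines(lines):
    """Collect distinct hostnames from the original-URL column of CDX lines."""
    hosts = set()
    url_col = 2  # assumption: common 'N b a ...' layout, URL is the third field
    for line in lines:
        fields = line.strip().split(" ")
        if not fields or fields[0] == "":
            continue  # skip blank lines
        if fields[0] == "CDX":
            # Header line such as " CDX N b a m s k r M S V g":
            # locate the 'a' (original URL) column from the legend.
            legend = fields[1:]
            if "a" in legend:
                url_col = legend.index("a")
            continue
        if len(fields) <= url_col:
            continue  # malformed or truncated record
        try:
            host = urlsplit(fields[url_col]).hostname
        except ValueError:
            continue  # unparseable URL, ignore it
        if host:
            hosts.add(host)
    return sorted(hosts)


def hosts_from_cdx_files(paths):
    """Run hosts_from_cdx_lines over several gzip-compressed CDX files."""
    hosts = set()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            hosts.update(hosts_from_cdx_lines(fh))
    return sorted(hosts)
```

Calling `hosts_from_cdx_files` on the downloaded files would give a sorted list of hostnames that appear in the index; whether a given page was captured past its content warning still has to be checked separately.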
09:40:36Mateon1 quits [Remote host closed the connection]
09:40:49Mateon1 joins
09:46:17hackiter joins
09:46:28hackiter leaves
09:49:35IDK (IDK) joins
09:50:04Megame (Megame) joins
09:52:32<pabs>this company is allegedly shutting down https://cyberninjas.com/ https://edition.cnn.com/2022/01/07/politics/cyber-ninjas-shutting-down-arizona/index.html
09:59:33essowicz joins
10:00:08essowicz quits [Remote host closed the connection]
10:18:31dsadsada joins
10:18:45dsadsada quits [Remote host closed the connection]
10:27:24<IDK>Anyone know how to request specific IDs with the API of https://www.thiswebsitewillselfdestruct.com/api/get_letter
10:27:52<IDK>Currently all IDs are randomized and random messages are displayed
10:44:13march_happy quits [Ping timeout: 265 seconds]
10:50:36mateoooo joins
10:51:27<mateoooo>ee
10:52:00mateoooo quits [Remote host closed the connection]
11:11:53mati joins
11:13:29mati quits [Remote host closed the connection]
12:19:48sonick (sonick) joins
12:46:06G4te_Keep3r4 joins
12:46:54G4te_Keep3r quits [Read error: Connection reset by peer]
12:46:54G4te_Keep3r4 is now known as G4te_Keep3r
12:47:00Megame quits [Client Quit]
12:48:46yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/]
12:50:49yano (yano) joins
13:06:35stormy quits [Remote host closed the connection]
13:08:47march_happy (march_happy) joins
13:13:02trenjikan joins
13:13:06march_happy quits [Ping timeout: 240 seconds]
13:13:32trenjikan quits [Remote host closed the connection]
13:20:21march_happy (march_happy) joins
13:36:30HP_Archivist (HP_Archivist) joins
13:36:47HP_Archivist quits [Remote host closed the connection]
13:38:00HP_Archivist (HP_Archivist) joins
13:38:17HP_Archivist quits [Remote host closed the connection]
13:39:30HP_Archivist (HP_Archivist) joins
13:39:47HP_Archivist quits [Remote host closed the connection]
13:40:46Arcorann quits [Ping timeout: 240 seconds]
13:41:00HP_Archivist (HP_Archivist) joins
13:41:17HP_Archivist quits [Remote host closed the connection]
13:41:45HP_Archivist (HP_Archivist) joins
14:48:55IDK quits [Client Quit]
15:00:35jonboy3452 quits [Read error: Connection reset by peer]
15:03:17march_happy quits [Remote host closed the connection]
15:05:29sec^nd quits [Remote host closed the connection]
15:19:03march_happy (march_happy) joins
15:23:04Jonboy345 joins
15:23:43Jonboy345 quits [Read error: Connection reset by peer]
15:28:57Hifihedgehog joins
15:29:01Jonboy345 joins
15:29:27<Hifihedgehog>Hey JAA. If you don't mind me asking, how goes the archiving effort for the TechnologyGuide sites?
15:30:26Jonboy345 quits [Read error: Connection reset by peer]
15:34:10Jonboy345 joins
15:36:10<Hifihedgehog>Thanks JAA!
15:36:12<Hifihedgehog>https://archive.org/details/technologyguide_forums_20220125
15:42:23Hifihedgehog quits [Remote host closed the connection]
15:49:41chrismeller quits [Ping timeout: 265 seconds]
15:51:43Hifihedgehog joins
15:54:43Hifihedgehog quits [Remote host closed the connection]
15:55:19knecht420 quits [Client Quit]
15:57:16knecht420 (knecht420) joins
15:57:34fiftysix_k_modem quits [Client Quit]
16:31:43march_happy quits [Ping timeout: 252 seconds]
16:43:20lennier1 quits [Read error: Connection reset by peer]
16:43:34lennier1 (lennier1) joins
17:13:55sec^nd (second) joins
17:43:06nimaje quits [Ping timeout: 240 seconds]
17:43:28fiftysix_k_modem (fiftysix_k_modem) joins
17:43:48nimaje joins
17:51:37fiftysix_k_modem quits [Remote host closed the connection]
18:02:21IDK (IDK) joins
18:04:18<IDK>Apparently roblox is banning all users with YT or any other reference to off site platforms, be ready to see some 404s
18:04:26<IDK>*some
18:07:36<IDK>https://www.youtube.com/watch?v=9DBb6_aVS4M
18:24:21blop joins
18:27:45blop quits [Remote host closed the connection]
18:43:27fiftysix_k_modem joins
18:56:56mutantmnky quits [Remote host closed the connection]
18:58:00mutantmnky (mutantmonkey) joins
20:00:44<h2ibot>JustAnotherArchivist edited TechnologyGuide (+53): https://wiki.archiveteam.org/?diff=48220&oldid=48214
20:00:45<h2ibot>Gridkr edited Coronavirus/Affected companies (+401): https://wiki.archiveteam.org/?diff=48221&oldid=45063
20:00:48<duce1337>anyone archive all CIA world factbook? https://www.cia.gov/the-world-factbook/
20:41:22<AK>Making sure a copy is grabbed now duce1337
20:43:05<duce1337>ok
20:43:17dm4v quits [Client Quit]
20:44:13xyzfootage joins
20:46:15dm4v joins
20:46:17dm4v quits [Changing host]
20:46:17dm4v (dm4v) joins
20:47:07spirit quits [Client Quit]
20:47:27xyzfootage quits [Remote host closed the connection]
20:53:08spirit joins
21:02:06HP_Archivist quits [Ping timeout: 240 seconds]
21:33:31<wessel1512>OrIdow6 I'm currently very busy with my day job so I haven't had the time to spend on the Ukrainian archive
21:34:19<wessel1512>But I got some more lists of URLs that I need to clean before they can be added to the wiki
21:39:20qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
21:50:28march_happy (march_happy) joins
21:53:36BlueMaxima joins
21:56:26fiftysix_k_modem quits [Client Quit]
22:02:11dm4v quits [Client Quit]
22:04:26dm4v joins
22:04:26dm4v quits [Changing host]
22:04:26dm4v (dm4v) joins
22:26:50Arcorann (Arcorann) joins
23:02:03simon816 quits [Remote host closed the connection]
23:06:45simon816 (simon816) joins
23:09:26jacobk quits [Ping timeout: 240 seconds]
23:15:43LegitSi joins
23:36:22march_happy quits [Remote host closed the connection]
23:36:40march_happy (march_happy) joins
23:41:25march_happy quits [Ping timeout: 265 seconds]
23:47:12user_ quits [Remote host closed the connection]
23:47:26user_ (gazorpazorp) joins
23:48:57HP_Archivist (HP_Archivist) joins
23:49:12HP_Archivist quits [Remote host closed the connection]
23:50:24HP_Archivist (HP_Archivist) joins
23:50:42HP_Archivist quits [Remote host closed the connection]
23:51:55HP_Archivist (HP_Archivist) joins
23:52:12HP_Archivist quits [Remote host closed the connection]
23:53:24HP_Archivist (HP_Archivist) joins
23:53:42HP_Archivist quits [Remote host closed the connection]
23:54:10HP_Archivist (HP_Archivist) joins