00:01:50<pabs>and that means offsite links probably need doing in a separate job with -u firefox or similar
00:02:14<pabs>AB needs some URL to UA regex functionality :)
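The "URL to UA regex" idea above could look roughly like the following. This is only a sketch of the concept, not an ArchiveBot feature: the rule table, the function name `pick_user_agent_alias`, and the example.com pattern are all hypothetical. The only thing taken from the log is the `firefox` alias used with `-u`.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical rule table: (URL pattern, user-agent alias).
# The first matching rule wins; nothing here is real ArchiveBot configuration.
Rule = Tuple[Pattern[str], str]

RULES: List[Rule] = [
    (re.compile(r"^https?://([^/]+\.)?example\.com/"), "firefox"),
]


def pick_user_agent_alias(url: str,
                          rules: List[Rule] = RULES,
                          default: Optional[str] = None) -> Optional[str]:
    """Return the alias of the first rule whose pattern matches the URL."""
    for pattern, alias in rules:
        if pattern.search(url):
            return alias
    return default
```

With a table like this, offsite links matching a rule would get the alias from that rule, and everything else would keep the job's default.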
00:05:21janos777 quits [Ping timeout: 260 seconds]
00:09:42janos777 joins
00:14:41janos777 quits [Ping timeout: 260 seconds]
00:26:47etnguyen03 (etnguyen03) joins
00:34:31pabs quits [Ping timeout: 260 seconds]
00:41:23BitsNBytesNBagels quits [Client Quit]
00:42:23pabs (pabs) joins
00:55:09etnguyen03 quits [Client Quit]
01:07:48ericgallager quits [Quit: This computer has gone to sleep]
01:24:53DogsRNice_ joins
01:26:26ericgallager joins
01:27:20DogsRNice quits [Ping timeout: 258 seconds]
01:35:03dabs quits [Read error: Connection reset by peer]
01:35:54DogsRNice joins
01:38:41DogsRNice_ quits [Ping timeout: 260 seconds]
01:47:59ericgallager quits [Client Quit]
01:48:08DogsRNice_ joins
01:50:20DogsRNice quits [Ping timeout: 258 seconds]
01:52:04DogsRNice joins
01:54:10DogsRNice_ quits [Ping timeout: 258 seconds]
01:56:13cyanbox joins
01:56:46DopefishJustin quits [Ping timeout: 260 seconds]
01:59:07DogsRNice quits [Read error: Connection reset by peer]
02:02:18DopefishJustin joins
02:20:41ericgallager joins
02:24:46katocala quits [Ping timeout: 260 seconds]
02:25:07katocala joins
02:43:30ericgallager quits [Client Quit]
02:44:21NeonGlitch (NeonGlitch) joins
02:45:39<mind_combatant>would it be worth considering/possible to archive amazon product pages? i've on multiple occasions had product pages linked either on a list of mine or that i'd ordered before either disappear completely or be replaced with an entirely different product when i went to check on them for one reason or another.
03:20:55Webuser298098 quits [Quit: Ooops, wrong browser tab.]
03:22:52NeonGlitch quits [Client Quit]
03:30:10Megame quits [Quit: Leaving]
03:43:10<pabs>if you have an example page we can try it in #archivebot and #jseater and you can try it in web.archive.org/save
03:43:23<pabs>IIRC it doesn't work in AB though
03:47:22<pabs>https://www.morpheus-research.com/backblaze/ https://news.ycombinator.com/item?id=43802675
03:50:13Webuser012683 joins
04:06:05SootBector quits [Remote host closed the connection]
04:11:28SootBector (SootBector) joins
04:14:54<wickedplayer494>Well that explains a heck of a lot of why BLZE's chart deflated to the pittance it's been trading at most of its public life
04:22:01cmlow quits [Ping timeout: 260 seconds]
04:38:46<mind_combatant>an example of an item that disappeared: www.amazon.com/dp/B083J229RJ/?coliid=I18Y7V6X9MXSBH&colid=3FEA242LHQKWQ&psc=0&ref_=cm_sw_r_apann_lstpd_M02PAJ67GM8MBTES9HGT&language=en-US
04:38:46<mind_combatant>an example of a product that seems to be in the process of changing (surprised i came across this one, the title is still the old product while the images are for an unrelated screen protector it seems): https://www.amazon.com/dp/B00N1BXIEA?ref=ppx_pop_mob_ap_share
04:38:46<mind_combatant>another one that's disappeared: https://www.amazon.com/dp/B077PMSQPZ?ref=ppx_pop_mob_ap_share
04:38:46<mind_combatant>another one in the process of being changed: https://www.amazon.com/dp/B010U1DNE8?ref=ppx_pop_mob_ap_share
04:38:46<mind_combatant>i've got more examples of completely deleted ones, and a lot of the ones that were fully changed before seem to be deleted now, but this seems to be happening at scale, given how much i can find just in my own history.
04:39:40PredatorIWD25 quits [Read error: Connection reset by peer]
04:55:08PredatorIWD25 joins
04:57:33<@hook54321>currently scraping comments from ovarit (AB didn't get them), not getting recorded into a WARC as i don't know how to code
04:58:26<@Fusl>hook54321: https://github.com/internetarchive/warcprox
05:01:19<@hook54321>thanks, will try to rig that into the python script. doubt it'll make it playable in WBM though
05:03:08<@Fusl>maybe not but still worth scraping through warcprox and then dumping the raw warc into IA just in case someone wants to play around with it
05:05:11<@hook54321>yeah. i really don't know what i'm doing, i played around with network console to see how it's getting the comments, replicated the request with CURL and then chatgpt gave me a script which... works but is probably ugly
05:12:36<@hook54321>yeah, don't think i'm gonna be able to get that to work tonight
05:12:36<@hook54321>"Fetching post ID 154...
05:12:36<@hook54321>💥 Error fetching post 154: HTTPSConnectionPool(host='ovarit.com', port=443): Max retries exceeded with url: /graphql (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))"
05:27:38Webuser907855 joins
05:28:04Webuser012683 quits [Client Quit]
05:28:15Webuser907855 quits [Client Quit]
05:40:44<pabs>mind_combatant: the non-404 ones seem to work with -u firefox
05:40:52<pabs>(in AB)
05:44:49<pabs>hook54321: if you are using the Python requests lib, add verify=False to the get() calls
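A minimal sketch of the suggestion above, assuming the script uses the Python requests library and sends its traffic through a local warcprox instance. A recording proxy of this kind re-signs HTTPS traffic with its own CA, which is a likely cause of the CERTIFICATE_VERIFY_FAILED error quoted earlier. The proxy address `http://localhost:8000`, the helper names, and the idea of passing a CA file are assumptions for illustration, not taken from the log.

```python
from typing import Optional


def warcprox_kwargs(proxy_url: str = "http://localhost:8000",
                    ca_cert: Optional[str] = None) -> dict:
    """Build keyword arguments for requests.get()/requests.post().

    proxy_url: address of the recording proxy (assumed value, adjust as needed).
    ca_cert:   path to the proxy's CA certificate, if available. When it is
               omitted, certificate verification is switched off (verify=False),
               which is what was suggested in the channel.
    """
    return {
        "proxies": {"http": proxy_url, "https": proxy_url},
        "verify": ca_cert if ca_cert else False,
    }


def fetch_via_proxy(url: str, **kwargs):
    """Fetch a URL through the proxy. Needs the third-party requests package."""
    import requests  # imported lazily so the helper above has no dependencies

    return requests.get(url, timeout=30, **warcprox_kwargs(**kwargs))
```

Passing the proxy's CA file as `ca_cert` keeps verification on and is the safer choice; `verify=False` is the quick fix and makes requests emit an InsecureRequestWarning on each call.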
06:10:46Webuser738386 joins
06:13:18Island joins
06:13:19Webuser738386 quits [Client Quit]
06:29:33Webuser823470 joins
07:16:35Webuser823470 quits [Client Quit]
07:23:02Webuser685269 joins
07:29:37Webuser831171 quits [Quit: Ooops, wrong browser tab.]
07:46:39Island quits [Read error: Connection reset by peer]
07:50:31sec^nd quits [Remote host closed the connection]
07:50:53sec^nd (second) joins
08:15:56ducky quits [Ping timeout: 260 seconds]
08:21:07ducky (ducky) joins
08:23:31lflare quits [Quit: Bye]
08:24:14lflare (lflare) joins
08:29:41ducky quits [Remote host closed the connection]
08:29:47ducky (ducky) joins
08:49:57<c3manu>pabs++
08:49:58<eggdrop>[karma] 'pabs' now has 83 karma!
08:50:09<c3manu>it works! \o/ thank you so much
09:39:50Medaka joins
09:49:38<Medaka>Hello. I have created URL lists for pr.fc2.com and vote.fc2.com.
09:49:56<Medaka>Both services are scheduled to shut down on June 2, 2025.
09:49:57<Medaka>Please help archive them.
09:50:22<Medaka>https://transfer.archivete.am/inline/B1oRr/pr_fc2_urls.txt
09:50:29<Medaka>https://transfer.archivete.am/inline/pX8Tz/vote_fc2_urls.txt
10:01:45Webuser692884 joins
10:02:04NatTheCat quits [Quit: Ping timeout (120 seconds)]
10:02:27NatTheCat (NatTheCat) joins
10:05:42BearFortress quits []
10:06:35Webuser685269 quits [Client Quit]
10:10:41Webuser768109 joins
10:13:55<c3manu>Medaka: thanks for those lists! i know nothing about the context (yet); are those URLs not discoverable from their respective index URLs?
10:15:55<c3manu>..and i’m assuming those are going to be pretty big right?
10:18:53<c3manu>at least pr.fc2.com has a sitemap, so that might just not be complete then? https://pr.fc2.com/sitemap.xml
10:19:47<c3manu>i’m gonna wait to see what pabs can say about this, since they queued a bunch of https://piyo.fc2.com/ URLs yesterday and probably know more ;)
10:49:44Webuser768109 quits [Client Quit]
11:00:01Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
11:00:29BearFortress joins
11:02:44Bleo18260072271962345 joins
11:08:21Medaka quits [Read error: Connection reset by peer]
11:08:39Medaka joins
11:12:44<Medaka>c3manu: Thank you for your reply! Ah, I didn't realize there was a sitemap.
11:12:44<Medaka>There are indeed several index URLs there that I had missed.
11:23:11@Sanqui quits [Ping timeout: 260 seconds]
11:23:42<c3manu>Medaka: the sitemap seems way shorter than your list though, so i don't know what's going on there
11:41:37BlankEclair quits [Remote host closed the connection]
11:47:35lflare quits [Quit: Bye]
11:51:58BlankEclair (BlankEclair) joins
11:57:53Sanqui joins
11:57:53Sanqui quits [Changing host]
11:57:53Sanqui (Sanqui) joins
11:57:53@ChanServ sets mode: +o Sanqui
12:01:09lflare (lflare) joins
12:07:25<Medaka>c3manu: I mainly collected only the pr.fc2.com/{username}/ (user profile) pages. The sitemap includes a small number of pr.fc2.com/{username}/ pages, as well as some pages that link to multiple {username} profiles. That's probably why my list looks much larger. (I missed several pages like /contents/birthday/ and /contents/age/, etc.)
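The split described above (user profile pages versus /contents/ index pages) can be sketched as a small classifier for checking a URL list against the sitemap. The function name, the category labels, and the rule that a path segment containing a dot is not a username are assumptions for illustration; the real username format is not stated in the log.

```python
from urllib.parse import urlsplit


def classify(url: str) -> str:
    """Classify a pr.fc2.com URL as 'profile', 'index', or 'other'."""
    parts = urlsplit(url)
    if parts.hostname != "pr.fc2.com":
        return "other"  # also covers URLs written without a scheme
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] == "contents":
        return "index"  # e.g. /contents/birthday/, /contents/age/
    # Assumption: a single path segment without a dot is a username.
    if len(segments) == 1 and "." not in segments[0]:
        return "profile"
    return "other"


def missing_from(sitemap_urls, collected_urls):
    """Return collected URLs that do not appear in the sitemap, sorted."""
    return sorted(set(collected_urls) - set(sitemap_urls))
```

Counting the URLs per category in both the collected list and the sitemap would show where the two differ.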
12:08:29<Medaka>Also, it seems that even the URLs listed in the sitemap are missing some entries. I will try to create a more complete URL list.
12:23:58etnguyen03 (etnguyen03) joins
12:28:22<pabs>Medaka++
12:28:22<eggdrop>[karma] 'Medaka' now has 1 karma!
12:28:26<pabs>thanks for working on this
12:29:54<pabs>there are some other domains that seem fc2 related in the existing jobs, might want to process the ab2f job log from https://ab2f.archivingyoursh.it/
12:37:37ducky_ (ducky) joins
12:37:37ducky quits [Remote host closed the connection]
12:37:54ducky_ quits [Remote host closed the connection]
12:53:56makeworld quits [Read error: Connection reset by peer]
12:54:42makeworld joins
12:59:35<Medaka>pabs: Thank you.
12:59:46<Medaka>However, when I try to access https://ab2f.archivingyoursh.it/, I get a 403 Forbidden error.
13:00:16<pabs>katia: ^ ab2f broke again :)
13:00:31<pabs>some more about ab2f and related things on https://wiki.archiveteam.org/index.php/ArchiveBot/Monitoring
13:01:48<Medaka>Is that different from https://archive.fart.website/archivebot/viewer/?
13:03:27<pabs>yeah, it's realtime, same as archivebot.com
13:03:46<pabs>the viewer only indexes jobs after they are uploaded to IA and complete
13:06:08janos777 joins
13:07:04<Medaka>I see
13:21:28etnguyen03 quits [Client Quit]
13:24:01<katia>pabs: sorry. Works again I think? If it was a 404
13:24:32<pabs>looks good
13:29:30Webuser007948 joins
13:29:50Webuser007948 quits [Client Quit]
13:35:11cmlow joins
13:42:01BennyOtt quits [Ping timeout: 260 seconds]
13:45:07BennyOtt (BennyOtt) joins
13:51:43ducky (ducky) joins
13:56:36ducky quits [Ping timeout: 260 seconds]
14:14:28BornOn420 quits [Remote host closed the connection]
14:15:01BornOn420 (BornOn420) joins
14:28:50ducky (ducky) joins
14:33:21ducky quits [Ping timeout: 260 seconds]
14:54:26cyanbox quits [Read error: Connection reset by peer]
14:55:43ducky (ducky) joins
15:11:34pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat]
15:11:37Megame (Megame) joins
15:48:16pedantic-darwin joins
15:58:08grill (grill) joins
16:18:13Hackerpcs quits [Quit: Hackerpcs]
16:32:52SootBector quits [Remote host closed the connection]
16:34:03SootBector (SootBector) joins
16:35:47DopefishJustin quits [Remote host closed the connection]
16:39:09Hackerpcs (Hackerpcs) joins
16:39:28DopefishJustin joins
16:58:54etnguyen03 (etnguyen03) joins
17:05:46Webuser846077 joins
17:05:56Webuser846077 quits [Client Quit]
17:26:28nicolas17 joins
17:26:49ericgallager joins
17:29:53grill quits [Ping timeout: 258 seconds]
17:34:33etnguyen03 quits [Client Quit]
17:52:07tzt quits [Ping timeout: 258 seconds]
18:08:13<h2ibot>Manu edited Mailman/2 (+31, /* Queued lists.erlangen.ccc.de */): https://wiki.archiveteam.org/?diff=55508&oldid=55097
18:14:13tzt (tzt) joins
18:16:11eroc19906 quits [Ping timeout: 260 seconds]
18:16:50grill (grill) joins
18:28:26janos777 quits [Ping timeout: 260 seconds]
18:30:15ericgallager quits [Client Quit]
18:38:13janos777 joins
18:43:22ericgallager joins
18:44:19<h2ibot>Manu edited Mailman/2 (+31, /* Queued lists.ccc-mannheim.de */): https://wiki.archiveteam.org/?diff=55509&oldid=55508
18:50:01janos777 quits [Ping timeout: 260 seconds]
18:52:01FiTheArchiver joins
18:52:03FiTheArchiver quits [Remote host closed the connection]
19:02:43ericgallager quits [Client Quit]
19:02:44janos777 joins
19:37:29janos777 quits [Read error: Connection reset by peer]
19:56:15ericgallager joins
19:57:58janos777 joins
20:29:17Webuser381280 joins
20:30:53janos778 joins
20:35:01janos777 quits [Ping timeout: 260 seconds]
20:38:31grill quits [Ping timeout: 260 seconds]
20:43:16janos778 quits [Read error: Connection reset by peer]
20:50:39eroc1990 (eroc1990) joins
21:13:57dabs joins
21:21:04Island joins
21:24:36nicolas17 quits [Quit: Konversation terminated!]
21:24:47nicolas17 joins
21:25:51nicolas17 quits [Client Quit]
21:30:25nicolas17 joins
21:42:58murb quits [Quit: gone]
22:00:07murb (murb) joins
22:22:03ericgallager quits [Client Quit]
23:04:56ThreeHM quits [Ping timeout: 260 seconds]
23:06:32ThreeHM (ThreeHeadedMonkey) joins
23:06:56NeonGlitch (NeonGlitch) joins
23:07:33NeonGlitch quits [Client Quit]
23:08:06NeonGlitch (NeonGlitch) joins
23:08:23etnguyen03 (etnguyen03) joins
23:30:39etnguyen03 quits [Client Quit]
23:48:41tzt quits [Read error: Connection reset by peer]
23:49:12tzt (tzt) joins
23:54:56ericgallager joins
23:55:12ericgallager quits [Remote host closed the connection]