00:28:47iwanttohelp joins
00:29:53<iwanttohelp>hello, unfortunately i know of a shortener that's been running for 8 years with 9k links. how would i go about archiving it? i already have the list of ids->urls.
00:30:17<iwanttohelp>also, i added it to the urlteam wiki (but i got thrown into a moderation review queue or something)
00:30:49<iwanttohelp>is there a preferred format or method for sharing backups of url shorteners?
00:39:48<@JAA>iwanttohelp: Hi. Edit approved, thanks! If you have it as a CSV or similar, you could just share it here or upload to archive.org directly.
00:40:32<@JAA>Generally, the preferred format would be WARC, but that must be produced at retrieval time. Other than that, any mapping will do.
00:42:04<@JAA>The data produced by the project here is BEACON, cf. the 'Archives' section on the wiki page for details.
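A minimal sketch of turning an id→url CSV export into a BEACON link dump, which iwanttohelp does later in this log. It assumes the export has a header row with columns named `id` and `url` (hypothetical names; the real export's columns may differ). It writes only the `#FORMAT` and `#PREFIX` meta lines followed by `source|target` link lines. Check the 'Archives' section of the URLTeam wiki page for the exact header fields the project uses.

```python
import csv
import gzip
import io


def csv_to_beacon(csv_text, prefix, id_field="id", url_field="url"):
    """Convert an id,url CSV export into a minimal BEACON link dump.

    id_field/url_field are assumed column names; adjust them to the real export.
    Returns (beacon_text, skipped_ids).
    """
    out = ["#FORMAT: BEACON", f"#PREFIX: {prefix}"]
    skipped = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sid = (row.get(id_field) or "").strip()
        url = (row.get(url_field) or "").strip()
        # "|" separates tokens and line breaks separate link lines, so values
        # containing either (or empty values) cannot be written as-is.
        if not sid or not url or any(c in sid + url for c in "|\r\n"):
            skipped.append(sid)
            continue
        out.append(f"{sid}|{url}")
    return "\n".join(out) + "\n", skipped


def convert_file(src_gz, dst_gz, prefix):
    """Read a gzipped CSV export and write a gzipped BEACON file."""
    with gzip.open(src_gz, "rt", encoding="utf-8", newline="") as f:
        beacon, skipped = csv_to_beacon(f.read(), prefix)
    with gzip.open(dst_gz, "wt", encoding="utf-8") as f:
        f.write(beacon)
    return skipped
```

For example, `convert_file("database-export-2023-02-01.csv.gz", "database-export-2023-02-01.beacon.gz", "https://s.wikicharlie.cl/")`. Skipped rows are returned rather than silently dropped so they can be looked at separately.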
00:43:06<@JAA>9k is tiny, so I'll create a WARC for it myself.
00:43:28nomad-geek joins
00:48:55<@JAA>Looks like the charset is 0-9a-zA-Z, not a-zA-Z0-9.
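Why the order can matter: if codes are issued by counting in base 62 (an assumption about this shortener, not confirmed here), the alphabet order decides which string corresponds to which number. Enumerating the first N codes with the wrong order then yields a different set of strings. The sketch below shows base-62 encoding and decoding under both orders; the constant and function names are illustrative.

```python
import string

# Two candidate alphabet orderings for base-62 shortcodes.
DIGITS_FIRST = string.digits + string.ascii_lowercase + string.ascii_uppercase   # 0-9a-zA-Z
LETTERS_FIRST = string.ascii_lowercase + string.ascii_uppercase + string.digits  # a-zA-Z0-9


def encode(n, alphabet):
    """Encode a non-negative integer as a shortcode in the given alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def decode(code, alphabet):
    """Inverse of encode(); raises ValueError on characters outside the alphabet."""
    base = len(alphabet)
    n = 0
    for ch in code:
        n = n * base + alphabet.index(ch)
    return n
```

With `DIGITS_FIRST`, the numbers 60 to 63 encode as `Y`, `Z`, `10`, `11`. With `LETTERS_FIRST` they encode as `8`, `9`, `ba`, `bb`.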
01:02:41<iwanttohelp>JAA: https://s.wikicharlie.cl/database-export-2023-02-01.csv.gz
01:02:55<iwanttohelp>i got access to the server last week :) i'll make it a beacon
01:05:07<@JAA>Ah lovely, thank you!
01:19:38<@JAA>Running something now to retrieve it as WARC so it can go into the Wayback Machine.
01:22:39<@JAA>I'm also retrieving the redirects at https://wikicharlie.cl/s/x so those will work in the WBM as well.
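Here is a sketch of building a crawl seed list so that each shortcode is captured under both the subdomain form and the `/s/` path form. The two base URLs come from this conversation. The function name is illustrative, and the sketch assumes shortcodes are URL-safe; they are percent-quoted defensively anyway.

```python
from urllib.parse import quote

BASES = ("https://s.wikicharlie.cl/", "https://wikicharlie.cl/s/")


def seed_urls(short_ids, bases=BASES):
    """Yield one URL per (shortcode, base) pair, skipping repeated shortcodes."""
    seen = set()
    for sid in short_ids:
        if sid in seen:
            continue
        seen.add(sid)
        for base in bases:
            # quote() leaves letters, digits and "_.-~" untouched and
            # percent-escapes anything else.
            yield base + quote(sid, safe="")
```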
01:27:26<iwanttohelp>nice, thanks
01:27:30<iwanttohelp>not sure how to validate it, but i read the beacon link dump spec and made a https://s.wikicharlie.cl/database-export-2023-02-01.beacon.gz
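No validator is mentioned here. The following rough self-check covers a few structural rules as I understand the BEACON link dump spec: `#NAME: value` meta lines come first, then link lines of one to three `|`-separated tokens with a non-empty source. It is a sketch covering a subset of the rules, not a complete validator; consult the spec itself for the full requirements.

```python
import gzip
import re

# Matches "#NAME: value" or "#NAME value".
META_RE = re.compile(r"^#([A-Za-z]+)(?::\s*|\s+)(.*)$")


def check_beacon(lines):
    """Return (meta, problems) for an iterable of BEACON lines.

    problems is a list of (line_number, message); line 0 means the whole file.
    """
    meta, problems = {}, []
    in_header = True
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if in_header and line.startswith("#"):
            m = META_RE.match(line)
            if m:
                meta[m.group(1).upper()] = m.group(2).strip()
            else:
                problems.append((lineno, "unrecognised meta line"))
            continue
        in_header = False
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            problems.append((lineno, "meta line after link lines"))
            continue
        tokens = line.split("|")
        if len(tokens) > 3:
            problems.append((lineno, f"{len(tokens)} tokens, expected 1-3"))
        elif not tokens[0].strip():
            problems.append((lineno, "empty source token"))
    if meta.get("FORMAT") != "BEACON":
        problems.insert(0, (0, "missing '#FORMAT: BEACON'"))
    return meta, problems


def check_file(path):
    """Run check_beacon over a gzipped BEACON file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return check_beacon(f)
```

Spam "URLs" that contain extra `|` characters show up here as lines with too many tokens.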
01:28:21<iwanttohelp>there's some serious attempt at sql injection somewhere in there. a lot of spam with invalid "urls" full of queries
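Rows like these can be separated out with a heuristic check. The sketch below keeps only values that parse as absolute http(s) URLs with a host and contain no whitespace. This rule is an assumption, not something discussed here, and it may reject odd but real targets. For an archive, it is arguably better to keep every row in the dump and apply a filter like this only when choosing what to crawl.

```python
from urllib.parse import urlsplit


def looks_like_url(value):
    """Heuristic: True if value is an absolute http(s) URL with a hostname."""
    if not value or any(ch.isspace() for ch in value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:  # e.g. an unbalanced "[" in the host part
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def partition(mapping):
    """Split an id -> url mapping into (plausible, rejected) dicts."""
    kept, rejected = {}, {}
    for sid, url in mapping.items():
        (kept if looks_like_url(url) else rejected)[sid] = url
    return kept, rejected
```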
01:29:07<@JAA>Thanks, ran both of the exports through ArchiveBot, so they'll be safe.
01:29:23<iwanttohelp>awesome!
01:29:55<iwanttohelp>i'll add the dump url to the urlteam wiki page and i'll upload the csv/beacon to archive.org
01:30:16<iwanttohelp>i'll try to make monthly backups but i'm not sure for how long i'll have access to that server
01:30:34<@JAA>Sounds good. I'll add my archive as well when it's up.
01:33:52n9nes quits [Ping timeout: 252 seconds]
01:35:04<@JAA>Looks like the WARCing will take another hour or so.
01:35:38n9nes joins
01:41:52<datechnoman>JAA when it's uploaded can you flick me a link to it? Will throw the urls in #// for processing
01:55:45<@JAA>datechnoman: I could, but I was going to do that myself anyway. :-)
02:08:53<@JAA>https://wikicharlie.cl/s/1ht is a 403 for some reason rather than redirecting to the subdomain.
02:17:06<iwanttohelp>not sure what's happening there
02:17:47<iwanttohelp>in the subdomain it works fine, it 302s to www.elamaule.cl
02:48:40<datechnoman>No worries. Will leave it in your capable hands :) JAA
02:50:59<@JAA>iwanttohelp: Looks like any Xht does that. Oh well, at least it works on the subdomain. :-)
02:52:05<@JAA>Finished about 15 minutes ago, will upload and process shortly.
02:59:58<@JAA>Done, s.wikicharlie.cl_20230203, should show up in the WBM within the next couple days.
03:02:14<iwanttohelp>does that mean all of the urls will be available publicly at once via https://web.archive.org/web/*/s.wikicharlie.cl/abc123 ?
03:03:45<@JAA>Correct
03:04:05<@JAA>The redirect targets are also being archived via #// (aka https://wiki.archiveteam.org/index.php/URLs ).
03:04:18<iwanttohelp>amazing, thanks
03:04:31<@JAA>Or well, they were probably done within seconds given datechnoman's resources on that project. :-)
03:06:11<datechnoman>hehe oopies?
03:06:12<@JAA>Note that the WBM is case-insensitive. So shortcodes differing only in case will collide on the WBM with no easy way to tell them apart.
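To see how many shortcodes this affects, you can group the IDs by their lowercased form; any group with more than one member collides on the WBM. The sketch assumes the input is the list of shortcodes from the export, and the function name is illustrative.

```python
from collections import defaultdict


def case_collisions(short_ids):
    """Map lowercased shortcode -> sorted list of distinct IDs sharing it,
    keeping only groups with two or more members."""
    groups = defaultdict(set)
    for sid in short_ids:
        groups[sid.lower()].add(sid)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}
```

`sum(len(v) for v in case_collisions(ids).values())` then gives the number of shortcodes that cannot be told apart by case on the WBM.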
03:06:15<datechnoman>150,000 URLs per minute atm
03:21:03<fuzzy8021>come on datechnoman there's 30 million in the queue ;)
03:25:08<datechnoman>fuzzy8021 It keeps going up with the sitemaps that get queued every month lol. Cut me some slack!
03:28:21<datechnoman>Peaked at 251,000 URLs in a min a little bit ago
04:03:47iwanttohelp quits [Ping timeout: 265 seconds]
04:26:30mutantmnky (mutantmonkey) joins
04:28:51mutantmonkey quits [Remote host closed the connection]
06:28:03treora quits [Remote host closed the connection]
06:28:04treora joins
07:37:58user_ quits [Ping timeout: 252 seconds]
09:49:25Matthww1 quits [Ping timeout: 252 seconds]
09:52:31Matthww1 joins
12:41:53mutantmnky quits [Ping timeout: 276 seconds]
12:47:01mutantmnky (mutantmonkey) joins
14:37:15Chris5010 quits [Quit: ]
14:58:58mutantmnky quits [Remote host closed the connection]
14:59:23mutantmnky (mutantmonkey) joins
16:32:16mutantmnky quits [Remote host closed the connection]
16:32:40mutantmnky (mutantmonkey) joins
17:09:02benjinsm joins
17:12:10benjins quits [Ping timeout: 252 seconds]
17:25:02benjinsm is now known as benjins
17:30:28Chris5010 (Chris5010) joins
17:40:58Minkafighter72 quits [Quit: The Lounge - https://thelounge.chat]
17:41:41Minkafighter722 joins
17:52:58eroc19904 (eroc1990) joins
17:54:31eroc1990 quits [Ping timeout: 252 seconds]
18:00:00eroc1990 (eroc1990) joins
18:00:56eroc19904 quits [Ping timeout: 252 seconds]
18:59:04monoxane4 (monoxane) joins
18:59:25monoxane quits [Ping timeout: 252 seconds]
18:59:26monoxane4 is now known as monoxane
19:51:40benjins quits [Ping timeout: 252 seconds]
20:34:36Minkafighter722 quits [Client Quit]
20:39:29Minkafighter722 joins
21:08:32jacksonchen666 (jacksonchen666) joins
22:48:02benjins joins
23:28:08tzt quits [Ping timeout: 265 seconds]