00:09:53 | | riteo quits [Ping timeout: 260 seconds] |
00:10:14 | | sighsloth1090 joins |
00:10:34 | | HP_Archivist (HP_Archivist) joins |
00:15:09 | | scurvy_duck joins |
00:16:20 | <pokechu22> | Junie: I'm not sure if you're already aware of this, but the main format for things we download is https://en.wikipedia.org/wiki/WARC_(file_format) which includes metadata such as HTTP headers. Some projects also produce logs in various forms but there isn't anything standard for it |
00:16:43 | <Junie> | Thank you! That'll help! |
00:18:17 | <pokechu22> | oh, and archive.org generates CDX files for each WARC that act as an index to them (https://archive.org/web/researcher/cdx_file_format.php) and can also be bulk queried (https://archive.org/developers/wayback-cdx-server.html) |
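A rough sketch of the bulk query mentioned above, assuming the public Wayback CDX endpoint at web.archive.org and its `url`, `matchType`, `output` and `limit` parameters. The helper name `cdx_url` and the `example.com/` target are placeholders for illustration, not anything from the channel.

```shell
#!/usr/bin/env bash
# Build a Wayback CDX server query URL for everything under a URL prefix.
# cdx_url is a hypothetical helper; example.com/ is a placeholder target.
cdx_url() {
    local target=$1 limit=${2:-10}
    printf 'https://web.archive.org/cdx/search/cdx?url=%s&matchType=prefix&output=json&limit=%d\n' "$target" "$limit"
}

# Print the query URL only; nothing is fetched here.
cdx_url 'example.com/' 5
# To actually run the query: curl --silent "$(cdx_url 'example.com/' 5)"
```

Printing the URL first makes it easy to check the query before sending it.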
00:19:55 | | eroc1990 quits [Quit: Ping timeout (120 seconds)] |
00:20:14 | | eroc1990 (eroc1990) joins |
00:21:30 | | etnguyen03 quits [Client Quit] |
00:27:48 | | scurvy_duck quits [Ping timeout: 250 seconds] |
00:33:22 | | sighsloth1090 is now authenticated as sighsloth1090 |
00:34:58 | | sighsloth1090 quits [Client Quit] |
00:46:08 | | riteo (riteo) joins |
00:52:58 | | scurvy_duck joins |
00:53:07 | | sighsloth1090 (sighsloth1090) joins |
00:55:55 | | Junie quits [Quit: Ooops, wrong browser tab.] |
01:20:45 | | Webuser113856 joins |
01:22:38 | | xmagik joins |
01:23:11 | | Webuser113856 quits [Client Quit] |
01:24:48 | | scurvy_duck quits [Client Quit] |
01:25:24 | <xmagik> | Does anyone have any resources for dumping a DDS-1 tape using modern-ish equipment? |
01:36:09 | | Hackerpcs quits [Quit: Hackerpcs] |
01:39:44 | | Hackerpcs (Hackerpcs) joins |
01:41:34 | | xkey quits [Quit: WeeChat 4.4.3] |
01:42:36 | | xkey (xkey) joins |
01:44:55 | | pixel leaves [Error from remote client] |
01:45:22 | | gust quits [Ping timeout: 250 seconds] |
02:11:01 | | sighsloth1090 leaves |
02:12:22 | | NeonGlitch (NeonGlitch) joins |
02:13:12 | | etnguyen03 (etnguyen03) joins |
02:15:56 | | nulldata (nulldata) joins |
02:18:04 | <pabs> | from the #sdf channel: <kmc> Urban Dead is shutting down. Another UK Online Safety Act victim. RIP. |
02:18:44 | <pabs> | https://www.urbandead.com/shutdown.html |
02:32:17 | | NeonGlitch quits [Client Quit] |
02:45:17 | <pabs> | how do I tell if the wpull db for an AB job has been uploaded yet? and where would I find it? looking for the one for https://archive.fart.website/archivebot/viewer/job/202411210253059dr05 |
02:50:44 | | BornOn420 quits [Remote host closed the connection] |
02:51:21 | | BornOn420 (BornOn420) joins |
02:52:13 | <monoxane> | rewby it's called traffic engineering :P |
03:00:48 | | notarobot1 joins |
03:02:11 | | Webuser149129 joins |
03:06:51 | <balrog> | can we make sure everything related to ex: https://efabless.com is archived? There's also https://github.com/efabless and the site has a lot of links to stuff like google sheets from pages such as https://efabless.com/shuttle-status ... |
03:06:56 | <balrog> | we already lost some subdomains |
03:07:04 | <balrog> | (the company recently went out of business) |
03:09:32 | | etnguyen03 quits [Client Quit] |
03:10:21 | <pabs> | website got saved https://archive.fart.website/archivebot/viewer/?q=efabless |
03:10:38 | <pabs> | GitHub got saved on #gitgud |
03:10:54 | <TheTechRobo> | pabs: I don't think AB uploads the DBs on its own, does it? |
03:10:56 | <TheTechRobo> | Cf. https://github.com/ArchiveTeam/ArchiveBot/issues/465 |
03:11:30 | <pabs> | TheTechRobo: right, J_A_A does that manually |
03:13:09 | <@JAA> | pabs: No good method. You can use https://archive.org/details/archivebot?tab=collection&query=format%3AZSTANDARD&sort=-publicdate to find items that have a DB (probably, occasionally there might be other .zst files). |
03:13:13 | <pabs> | ahh I remember now, you have to search on IA for ZSTANDARD items IIRC |
03:14:40 | <pabs> | ok I will add this to the wiki |
03:16:18 | | loug83181422 quits [Quit: The Lounge - https://thelounge.chat] |
03:17:29 | | BlueMaxima quits [Read error: Connection reset by peer] |
03:19:35 | <h2ibot> | PaulWise edited ArchiveBot (+370, tips: ignores, wpull databases): https://wiki.archiveteam.org/?diff=54521&oldid=54417 |
03:20:35 | <h2ibot> | PaulWise edited ArchiveBot (+41, uploads are manual/infrequent): https://wiki.archiveteam.org/?diff=54522&oldid=54521 |
03:26:34 | | etnguyen03 (etnguyen03) joins |
03:34:59 | | NeonGlitch (NeonGlitch) joins |
04:07:29 | | etnguyen03 quits [Remote host closed the connection] |
04:31:17 | <pabs> | -feed/#hackernews-firehose- Local mirrors of long-defunct historical websites https://retromirror.org/ https://news.ycombinator.com/item?id=43242455 |
04:35:05 | | caylin quits [Remote host closed the connection] |
04:35:24 | | caylin (caylin) joins |
04:37:17 | | caylin quits [Client Quit] |
04:37:39 | | caylin (caylin) joins |
04:53:01 | | Webuser149129 quits [Client Quit] |
05:34:05 | | nicolas17 is now authenticated as nicolas17 |
05:45:39 | | StarletCharlotte joins |
05:52:25 | <StarletCharlotte> | Hello. I'm currently trying to enumerate all buckets on GRH's bucket API using curl requests (I have a premium account so I can use the API). The API's output is in JSON and I would like to download each JSON the API outputs to [start number].json. The API itself uses this format for its URLs: https://buckets.grayhatwarfare.com/api/v2/buckets?start=[start number]&limit=1000 where start is the result you want to start at, 0 being the first result. |
05:52:54 | <StarletCharlotte> | How would I go about doing that? What kind of script would I need? |
05:54:56 | <StarletCharlotte> | oh yeah, it'd need to increment by 1000 each time... |
05:56:41 | <StarletCharlotte> | https://stackoverflow.com/questions/9289015/how-do-i-make-parts-of-a-url-to-increment-at-the-same-time-in-curl This seems promising, though I'd have to figure out how to get it to increment by 1000 each time up to about 430000. |
05:56:56 | <StarletCharlotte> | And I have never written a bash script before so... |
05:58:28 | | NeonGlitch quits [Client Quit] |
06:01:00 | <pokechu22> | StarletCharlotte: If you don't need the data from the previous request and just want to generate a large list of URLs, there's a bunch of ways you can do that (`seq <start> <end>` in bash is one way, or something simple in python). If you need to download the data to get the next one you're probably going to want to use python and requests (or any other language you're familiar with) |
06:01:42 | <pokechu22> | basically, the way I usually approach that is to get the numbers I want into a text document, and then I edit that text document into the format I need. Or I write a script in a language I'm familiar with to output all the numbers I care about |
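The "get the numbers into a text document" approach described above could look roughly like this with `seq`. This is a minimal sketch: `gen_urls` is a made-up helper name, and the upper bound of 430000 is only the rough estimate given earlier in the conversation, not a confirmed total.

```shell
#!/usr/bin/env bash
# Print one API URL per page of 1000 results.
# gen_urls is a hypothetical helper; `seq FIRST STEP LAST` counts up in steps of 1000.
base='https://buckets.grayhatwarfare.com/api/v2/buckets'

gen_urls() {
    local last=$1 start
    for start in $(seq 0 1000 "$last"); do
        printf '%s?start=%d&limit=1000\n' "$base" "$start"
    done
}

# Show the first three URLs; redirect the full run to a file to edit or feed to a downloader,
# e.g. gen_urls 430000 > urls.txt
gen_urls 2000
```

The resulting list can then be edited by hand or passed to whatever download tool is preferred.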
06:02:34 | <StarletCharlotte> | @pokechu22: I am trying to download all of the data, though I am not familiar with either bash or Python. I'm currently looking into bash but I don't know if that's the best. |
06:06:53 | <StarletCharlotte> | https://www.geeksforgeeks.org/bash-scripting-until-loop/ |
06:07:02 | <StarletCharlotte> | Okay this should help as well... |
06:15:27 | | SootBector quits [Remote host closed the connection] |
06:15:52 | | SootBector (SootBector) joins |
06:16:02 | | sparky14925 (sparky1492) joins |
06:17:56 | | sparky1492 quits [Ping timeout: 250 seconds] |
06:17:57 | | sparky14925 is now known as sparky1492 |
06:32:12 | <StarletCharlotte> | Alright @pokechu22, I managed to bodge together a bash script that should do it. https://pastecode.io/s/89cnng9t |
06:33:20 | <StarletCharlotte> | the header is set to "nice try" because I'm not giving people my auth code for obvious reasons |
06:36:01 | <@JAA> | IIRC, you can use --output multiple times interleaved with URLs to specify a filename for each download. That lets you reuse the connection rather than starting a new process for each request. |
06:36:37 | <StarletCharlotte> | I uh... I am not sure what that means I'll be honest |
06:36:53 | <StarletCharlotte> | This is my first time doing this. |
06:37:20 | <@JAA> | `curl --header '...' --output 0.json 'https://buckets.grayhatwarfare.com/api/v2/buckets?start=0&limit=1000' --output 1000.json 'https://buckets.grayhatwarfare.com/api/v2/buckets?start=1000&limit=1000' ...` |
06:37:40 | <pokechu22> | Doesn't curl --verbose mention it leaves connections open? I never looked into how that actually works though |
06:37:40 | <StarletCharlotte> | oh thanks |
06:37:54 | <@JAA> | You could generate that with a Bash loop and an array. |
06:38:19 | <StarletCharlotte> | well, it doesn't seem like I'm getting rate limited and I don't have much left. I'll remember for next time though. |
06:38:56 | <@JAA> | `curl=(curl --header '...'); for ((i = 0; i <= 400000; i += 1000)); do curl+=(--output "${i}.json" "https://buckets.grayhatwarfare.com/api/v2/buckets?start=${i}&limit=1000"); done; "${curl[@]}"` |
06:39:26 | <StarletCharlotte> | yeah it just got done |
06:39:32 | <StarletCharlotte> | thanks btw |
06:40:17 | | pixel (pixel) joins |
06:40:25 | <StarletCharlotte> | I have copied that down for later |
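For reference, JAA's array one-liner above can be spread out and dry-run before use. The following is a sketch under a few assumptions: `build_cmd` is a made-up helper name, `'...'` is the auth header that is elided in the log and stays elided here, and the last step prints the arguments instead of running curl.

```shell
#!/usr/bin/env bash
# Build one curl command line that pairs each --output filename with its URL,
# so a single curl process can reuse the connection across requests.
# build_cmd is a hypothetical helper; '...' stands in for the real header, which is elided in the log.
build_cmd() {
    local last=$1 i
    cmd=(curl --header '...')
    for ((i = 0; i <= last; i += 1000)); do
        cmd+=(--output "${i}.json" "https://buckets.grayhatwarfare.com/api/v2/buckets?start=${i}&limit=1000")
    done
}

build_cmd 400000
# Dry run: print the first few arguments, one per line, for review.
# Replace this line with "${cmd[@]}" to actually run the download.
printf '%s\n' "${cmd[@]:0:6}"
```

Keeping the command in an array avoids quoting problems with the `&` in each URL, since every element is passed to curl as a separate argument.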
06:57:31 | | ArchivalEfforts quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
06:57:41 | | ArchivalEfforts joins |
07:15:28 | <steering> | JAA: xargs! |
07:17:19 | <steering> | (I'm testing if that would actually work because why not) |
07:19:15 | <steering> | ah, no, because -I implies -L 1 |
07:31:37 | <@JAA> | Yeah |
07:33:41 | <@JAA> | `curl --header '...' $(printf -- '--output %d.json https://buckets.grayhatwarfare.com/api/v2/buckets?start=%d&limit=1000 ' {0..400000..1000}{,})` would work, but I can't recommend that. |
07:35:45 | <@arkiver> | so, does anyone know where we can contact "Junie"? |
08:02:01 | | xmagik quits [Quit: Ooops, wrong browser tab.] |
08:03:42 | | loug83181422 joins |
08:10:59 | | astrinaut leaves [][] |
08:47:22 | | dvb leaves |
09:01:17 | | Island quits [Read error: Connection reset by peer] |
09:25:17 | | pabs quits [Read error: Connection reset by peer] |
09:26:39 | | pabs (pabs) joins |
10:32:22 | | PAARCLiCKS quits [Quit: The Lounge - https://thelounge.chat] |
10:32:23 | | Wohlstand (Wohlstand) joins |
10:35:35 | | DopefishJustin quits [Read error: Connection reset by peer] |
10:35:49 | | DopefishJustin joins |
10:35:49 | | DopefishJustin is now authenticated as DopefishJustin |
10:36:55 | | LunarianBunny1147 quits [Quit: Ping timeout (120 seconds)] |
10:37:12 | | LunarianBunny1147 (LunarianBunny1147) joins |
10:38:43 | | abirkill- (abirkill) joins |
10:38:48 | | abirkill quits [Ping timeout: 250 seconds] |
10:39:06 | | abirkill- is now known as abirkill |
10:44:24 | | PAARCLiCKS (s4n1ty) joins |
10:45:08 | | asie quits [Ping timeout: 260 seconds] |
10:45:20 | | yourfate1 (yourfate) joins |
10:45:26 | | sec^nd quits [Remote host closed the connection] |
10:45:43 | | yourfate quits [Ping timeout: 260 seconds] |
10:45:49 | | sec^nd (second) joins |
10:46:18 | | simon816 quits [Ping timeout: 260 seconds] |
10:46:45 | | asie joins |
10:50:25 | | simon816 (simon816) joins |
10:53:09 | | ducky quits [Ping timeout: 260 seconds] |
12:00:05 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
12:00:58 | | Wohlstand quits [Ping timeout: 260 seconds] |
12:02:56 | | Bleo18260072271962345 joins |
12:24:37 | <@rewby> | monoxane: What I did was much more than TE. |
12:25:13 | <katia> | :o |
12:25:31 | <@rewby> | O10 is needed for other purposes temporarily so I've shut off the targets on it |
12:25:52 | <@rewby> | No there is nothing broken with it |
12:27:23 | | ducky (ducky) joins |
12:34:22 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:35:08 | | SkilledAlpaca418962 joins |
12:40:11 | <monoxane> | ah funky |
12:49:23 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
13:00:41 | | StarletCharlotte joins |
13:10:33 | | Meroje quits [Quit: bye!] |
13:10:58 | | pokechu22 quits [Ping timeout: 260 seconds] |
13:11:30 | | Meroje joins |
13:11:30 | | Meroje is now authenticated as Meroje |
13:11:30 | | Meroje quits [Changing host] |
13:11:30 | | Meroje (Meroje) joins |
13:14:22 | | StarletCharlotte quits [Ping timeout: 250 seconds] |
13:58:30 | | NeonGlitch (NeonGlitch) joins |
13:59:08 | | StarletCharlotte joins |
13:59:27 | | archiver1966 joins |
14:04:10 | | archiver1966 quits [Client Quit] |
14:05:38 | | eroc1990 quits [Quit: The Lounge - https://thelounge.chat] |
14:06:06 | | eroc1990 (eroc1990) joins |
14:18:56 | | StarletCharlotte quits [Ping timeout: 250 seconds] |
14:31:44 | | StarletCharlotte joins |
14:47:13 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
14:47:42 | | moth_ joins |
15:06:36 | | benjins3 quits [Ping timeout: 250 seconds] |
15:13:10 | | Wohlstand (Wohlstand) joins |
15:21:04 | | katocala quits [Ping timeout: 260 seconds] |
15:23:58 | | balrog quits [Ping timeout: 260 seconds] |
15:24:27 | | katocala joins |
15:33:17 | | katocala is now authenticated as katocala |
15:39:41 | | StarletCharlotte joins |
15:48:19 | | zhongfu (zhongfu) joins |
15:51:22 | | zhongfu quits [Client Quit] |
15:52:05 | | zhongfu (zhongfu) joins |
16:01:39 | | Webuser083892 joins |
16:03:33 | <Webuser083892> | Nice, this channel is also publicly logged. As it should be :D |
16:04:55 | | HP_Archivist quits [Quit: Leaving] |
16:06:14 | <katia> | hi mom |
16:12:52 | | StarletCharlotte quits [Remote host closed the connection] |
16:13:05 | | StarletCharlotte joins |
16:13:07 | <Webuser083892> | Hi Katia :D |
16:13:38 | <Webuser083892> | Is there something wrong with the trackers? The upload seems... stuck? |
16:13:41 | <katia> | ur not my mom she is not in belgium |
16:14:01 | <katia> | yeah, things are slowed down temporarily |
16:18:48 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
16:20:37 | | StarletCharlotte joins |
16:23:09 | <Webuser083892> | Alright, if it's expected that's good. :) |
16:38:42 | | Webuser083892 quits [Client Quit] |
16:44:07 | <@arkiver> | all should start getting in better shape in the coming days! |
16:49:23 | <that_lurker> | \o/ |
17:05:21 | | i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat] |
17:05:40 | | i_have_n0_idea (i_have_n0_idea) joins |
17:06:09 | | NeonGlitch quits [Quit: My Mac Mini has gone to sleep. ZZZzzz…] |
17:12:13 | | ThreeHM quits [Quit: WeeChat 4.4.3] |
17:15:27 | | NeonGlitch (NeonGlitch) joins |
17:16:20 | | VerifiedJ quits [Quit: The Lounge - https://thelounge.chat] |
17:16:59 | | VerifiedJ (VerifiedJ) joins |
17:17:29 | | ThreeHM (ThreeHeadedMonkey) joins |
17:18:34 | | benjins3 joins |
17:21:17 | | unlobito (unlobito) joins |
17:23:14 | | lennier2 joins |
17:26:08 | | lennier2_ quits [Ping timeout: 250 seconds] |
17:35:58 | | sparky1492 quits [Quit: No Rain, No Rainbows] |
17:37:55 | | sparky1492 (sparky1492) joins |
17:38:46 | | pokechu22 (pokechu22) joins |
17:39:14 | | sparky14920 (sparky1492) joins |
17:42:48 | | sparky1492 quits [Ping timeout: 260 seconds] |
17:42:49 | | sparky14920 is now known as sparky1492 |
17:48:39 | | sparky14929 (sparky1492) joins |
17:52:08 | | sparky1492 quits [Ping timeout: 250 seconds] |
17:52:09 | | sparky14929 is now known as sparky1492 |
17:52:14 | | lflare quits [Quit: Ping timeout (120 seconds)] |
17:53:11 | | lflare (lflare) joins |
17:54:54 | | lflare quits [Client Quit] |
18:00:11 | | Webuser331643 joins |
18:11:26 | | Webuser331643 quits [Client Quit] |
18:15:33 | | Sluggs quits [Excess Flood] |
18:23:44 | | Sluggs joins |
18:41:01 | | Webuser201432 joins |
18:41:06 | | Webuser201432 quits [Client Quit] |
18:42:27 | <Exorcism> | arkiver: for zhubai, will a DPoS project be planned or should we archive with archivebot? |
18:48:11 | | flotwig quits [Quit: ZNC - http://znc.in] |
18:53:40 | | riteo quits [Ping timeout: 250 seconds] |
19:07:05 | | flotwig joins |
19:13:42 | | riteo (riteo) joins |
19:26:16 | | gust joins |
19:33:38 | | kansei quits [Quit: ZNC 1.9.1 - https://znc.in] |
19:34:24 | | kansei (kansei) joins |
19:51:25 | | Webuser990617 joins |
19:52:02 | | Webuser990617 quits [Client Quit] |
20:06:42 | | Megame (Megame) joins |
20:08:33 | | spirit joins |
20:33:37 | | gust quits [Remote host closed the connection] |
20:33:56 | | gust joins |
20:35:20 | | balrog (balrog) joins |
20:51:06 | | sparky1492 quits [Ping timeout: 250 seconds] |
21:00:34 | | SkilledAlpaca4189621 joins |
21:02:18 | | SkilledAlpaca418962 quits [Ping timeout: 260 seconds] |
21:02:19 | | SkilledAlpaca4189621 is now known as SkilledAlpaca418962 |
21:14:04 | | spirit quits [Ping timeout: 250 seconds] |
21:14:22 | | spirit joins |
21:19:52 | | nicolas17 quits [Quit: Konversation terminated!] |
21:37:53 | | BearFortress quits [Ping timeout: 260 seconds] |
21:38:32 | | NeonGlitch quits [Quit: My Mac Mini has gone to sleep. ZZZzzz…] |
21:48:38 | | nicolas17 joins |
21:52:30 | | trc (trc) joins |
22:06:28 | | Sokar quits [Ping timeout: 260 seconds] |
22:16:29 | | spirit quits [Remote host closed the connection] |
22:27:28 | | BlueMaxima joins |
22:59:54 | | sighsloth1090 (sighsloth1090) joins |
23:51:24 | | utulien joins |
23:53:55 | | StarletCharlotte quits [Remote host closed the connection] |
23:54:14 | | StarletCharlotte joins |