Week of February 23rd summary for marin-community/marin

Milestone: Reliable, repeatable, enjoyable infrastructure
115 PRs merged · 5 opened · 44 issues closed · 14 contributors · 0 epics · 368 comments this week

The week centered on Iris reliability hardening, Grug's module-first API refactor, and early MoE training experiments on TPU. The first CoreWeave GPU canary ferry was stood up, and @ClassicLarry got Grug MoE running with replicated weights on v4 and v5p.

Other Changes


@gonzalobenegas added DNA experiments covering promoters, genomic regions, and k-mer tokenization #2992, plus auto-detection of BOS/EOS tokens in the DNA batch tokenizer #3055. @teetone updated Evalchemy non-math evaluation domains #3128. Agent recipe and scrub skill improvements by @dlwh #3129 and @rjpower #3056.
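The BOS/EOS auto-detection in #3055 is only described at the PR level here; below is a minimal sketch of the general technique, assuming a HuggingFace-style tokenizer interface. The function name and attributes are illustrative assumptions, not marin's internal API.

```python
# Hypothetical sketch: add BOS/EOS only when the tokenizer actually defines them.
# `bos_token_id` / `eos_token_id` follow the HuggingFace convention and are
# assumptions about the interface, not the marin DNA batch tokenizer itself.
def wrap_with_special_tokens(token_ids: list[int], tokenizer) -> list[int]:
    bos = getattr(tokenizer, "bos_token_id", None)
    eos = getattr(tokenizer, "eos_token_id", None)
    out = list(token_ids)
    if bos is not None and (not out or out[0] != bos):
        out.insert(0, bos)
    if eos is not None and (not out or out[-1] != eos):
        out.append(eos)
    return out
```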

120 PRs this week, 193 new comments, and 44 issues closed (44 total)

Top 15 runs (by FLOPs) this week (completed, running, crashed)


Dense isoflop scaling experiments continued with a full v8 rerun of the scaling_v3 sweeps at 3e20 and 2e20 FLOP budgets, showing slightly improved losses vs. v3 (e.g., 2.576 vs. 2.577 at the 2.5B/3e20 point). The first Grug MoE runs appeared: dlwh ran a 286M MoE with replicated weights on v5p-8 to 11.3B tokens at 21.6% MFU as a flopmatch baseline against the dense canary, and attempted a 2-host v5p-16 trial that crashed. ClassicLarry submitted a 300M speedrun on v5p-16. The first CoreWeave GPU canary ferry launched on H100x8 but crashed early at 3.7% MFU, beginning the GPU platform bring-up. The AdamH hyperparameter mega-sweep continued with 50+ additional 5B-token runs.
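The table below lists both a model-FLOP budget and the hardware FLOPs actually spent; MFU is simply their ratio. A quick sanity check on the first row, as a generic arithmetic sketch rather than marin code:

```python
# MFU relates the model-FLOP budget to the total hardware FLOPs consumed:
#   MFU = model_flops / hardware_flops
model_flops = 6.45e20   # model FLOP budget from the first row below
hw_flops = 2.61e21      # hardware FLOPs actually executed for that run
mfu = model_flops / hw_flops
print(f"MFU = {mfu:.1%}")  # ~24.7%, shown as 25% in the table
```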

Run | Owner | Hardware | FLOPs (model / HW, MFU) | Wall time | Final loss | BPB
(done) exp2262pt2f_sft_qwen2pt5_ot4_30k_math_qwen3_235b_a22b_32768token-ad1022 | Moo Jin Kim | TPU v6 lite (128 chips) | 6.45e20 / 2.61e21 (25% MFU) | 23.3h | 0.0434 | 0.500
(done) exp2262pt2g_sft_qwen2pt5_ot4_30k_math_n1_rejsamp_qwen3_32b_32768-689abd | Moo Jin Kim | TPU v6 lite (128 chips) | 6.45e20 / 2.61e21 (25% MFU) | 17.6h | 0.0042 | 0.279
(done) exp2262pt3d_qwen3_1pt7b_base_ot4_240k_math_qwen3_32b_32768tokens-dec321 | Moo Jin Kim | TPU v5 (16 chips) | 1.30e21 / 2.60e21 (50% MFU) | 2.8d | 0.202 | 0.129
(done) exp2262pt3g_qwen3_1pt7b_base_ot4_30k_math_n8_rejsamp_soft_qwen3_-63d27f | Moo Jin Kim | TPU v5 (32 chips) | 1.01e21 / 2.19e21 (46% MFU) | 2.0d | 0.1069 | 0.062
(done) exp2262pt3c_qwen3_1pt7b_base_ot4_240k_math_qwen3_4b_32768tokens-557d96 | Moo Jin Kim | TPU v5 lite (128 chips) | 1.30e21 / 2.08e21 (62% MFU) | 2.3d | 0.0973 | 0.056
isoflop-3e+20-d4096-L40-B16-adamh_scaling_v6 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.36e21 (22% MFU) | 2.4d | 2.968 | 0.955
isoflop-3e+20-d4096-L40-B16-adamh_scaling_v3 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.36e21 (22% MFU) | 2.3d | 2.8507 | 0.955
(done) exp2262pt2g_3_llama3pt1_ot4_30k_math_n1_rejsamp_qwen3_32b_32768t-662abd | Moo Jin Kim | TPU v5 lite (256 chips) | 4.76e20 / 1.30e21 (36% MFU) | 8.9h | 0.0854 | 0.173
isoflop-3e+20-d768-L8-B1024-adamh_scaling_v5 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.29e21 (23% MFU) | 1.8d | 2.9591 | 1.005
isoflop-3e+20-d768-L8-B1024-adamh_scaling_v8 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.29e21 (23% MFU) | 1.8d | 2.9525 | 1.002
isoflop-3e+20-d768-L8-B1024-adamh_scaling_v3 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.29e21 (23% MFU) | 1.8d | 2.9795 | 1.013
isoflop-3e+20-d768-L8-B1024-adamh_scaling_v6 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.29e21 (23% MFU) | 1.8d | 2.9579 | 1.004
isoflop-3e+20-d1024-L11-B512-adamh_scaling_v7 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.23e21 (24% MFU) | 1.7d | 2.8145 | 0.947
isoflop-3e+20-d1024-L11-B512-adamh_scaling_v8 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.23e21 (24% MFU) | 1.7d | 2.8047 | 0.946
isoflop-3e+20-d1024-L11-B512-adamh_scaling_v6 | Will Held | TPU v4 (32 chips) | 3.00e20 / 1.23e21 (24% MFU) | 1.7d | 2.8069 | 0.947

Sweeps and grouped runs

Run group | Owner | Hardware | MFU | Wall time | Results
Dense isoflop v8 sweep (3e20 budget, 273M-4.3B, 7 sizes) | @Helw150 | v4-32 (16 chips) | 23-44% | 30-48h per run | loss 2.576-2.937, 11-223B tokens per run
Dense isoflop v8 sweep (2e20 budget, 273M-2.5B, 6 sizes) | @Helw150 | v4-16 (8 chips) | 30-42% | 25-39h per run | loss 2.605-2.911, 11-134B tokens per run
Grug MoE flopmatch daily (286M MoE, replicated weights) | @dlwh | v5p-8 (4 chips) | 21.6% | 12.5h | loss 3.122, 11.3B tokens
Grug MoE v5p-16 trial (286M, 2-host), crashed | @dlwh | v5p-16 (8 chips) | 21.5% | 2.9h | loss N/A, 4.2B tokens (crashed)
300M speedrun (stdattn, 4096 ctx) | @ClassicLarry | v5p-16 (8 chips) | 20.9% | 2.2h | loss 2.926, 6.0B tokens
CoreWeave GPU canary ferry (H100x8), crashed | @rjpower | H100x8 (8 GPUs) | 3.7% | 0.3h | 20M tokens (crashed early)
AdamH hyperparameter mega-sweep v3 (loop 3-9, 157M) | @Helw150 | v5p-8 (4 chips) | 23% | 5h per run | loss 3.48-3.75, 5B tokens each, ~50+ runs