Building Truly Open, Ethical, and Accessible AI.
Training now — ERNIE 4.5 21B enhanced with 37.84B tokens of pure reasoning data
The LibreModel Project is a community-driven initiative to create powerful, state-of-the-art language models with an unwavering commitment to transparency and ethical principles. We are proving that foundational AI development can be done affordably, ethically, and in the open.
Lumen is our flagship reasoning model, built by enhancing ERNIE 4.5 21B with massive-scale continued pre-training focused on mathematics and reasoning. Rather than training a weak model from scratch, we're targeting the specific weaknesses in an already-strong foundation model.
Phase 1: Continued Pre-Training (CPT) — 37.84B tokens targeting ERNIE's documented weaknesses in mathematics and reasoning. Our dataset includes OpenMathReasoning (26B tokens from the AIMO-2 winning solution), DeepSeek reasoning traces, and curated math problems.
Phase 2: Supervised Fine-Tuning (SFT) — 180K high-quality examples teaching tool use, persona, and safe behavior. No reinforcement learning. No reward modeling. Just clean, ethical supervised learning.
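The two phases above can be summarized as a simple recipe. This is only an illustrative sketch: the dataset names and headline numbers come from the project description, but the exact split of the remaining CPT tokens between DeepSeek reasoning traces and curated math problems is not specified, so it appears here only as a combined remainder. The structure and helper are hypothetical, not the project's actual training code.

```python
# Illustrative summary of the Lumen training recipe (not actual project code).
# Headline numbers are from the project description; the split of non-OpenMath
# CPT tokens is unspecified, so only the combined remainder is shown.

CPT_TOTAL_B = 37.84   # Phase 1: continued pre-training budget, billions of tokens
OPENMATH_B = 26.0     # OpenMathReasoning (from the AIMO-2 winning solution)
# DeepSeek reasoning traces + curated math problems (exact split unspecified)
OTHER_CPT_B = round(CPT_TOTAL_B - OPENMATH_B, 2)

SFT_EXAMPLES = 180_000  # Phase 2: supervised fine-tuning examples (no RL, no reward model)

def recipe_summary():
    """Return the headline numbers for both training phases."""
    return {
        "phase1_cpt_tokens_b": CPT_TOTAL_B,
        "phase1_openmath_tokens_b": OPENMATH_B,
        "phase1_other_reasoning_tokens_b": OTHER_CPT_B,
        "phase2_sft_examples": SFT_EXAMPLES,
    }

print(recipe_summary())
```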
Target: Match Apriel-15B and K2-Think-32B performance at only 3B active parameters (21B total). These models achieved frontier results through curriculum training and quality fine-tuning — the same approach we're using.
Training starts: This weekend (November 2025)
Expected completion: 10-30 days depending on throughput
Release: Full model weights, training code, dataset recipes, and technical report
Gigi was our proof-of-concept: a 960M parameter model trained on 100% public domain data for under $500. Named for its training data (Gutenberg & Government reports), Gigi validated our curriculum learning approach and proved that ethical, affordable model development is possible.
Key Lesson: Gigi lacked an "autobiographical voice" due to its heavy reliance on classic literature and limited post-training. For Lumen, we have designed 180K examples of multi-turn chat behavior for SFT.
We believe the future of AI should be open, ethical, and accessible to everyone.
Watch us train Lumen in real-time. All code, weights, and documentation will be released upon completion.