Speed Like Never Before
Jan v0.6.4 delivers our biggest performance update yet. Models load faster, inference is smoother, and memory usage is dramatically reduced. This is the Jan you've been waiting for.
Inference Speed Improvements
Dramatic performance gains across the board:
- 3x Faster Model Loading: Optimized model initialization reduces wait times
- 50% Faster Inference: Improved CUDA kernels and memory management
- Instant Model Switching: Switch between models with near-zero delay
- Background Preloading: Frequently used models stay ready in memory
- Smart Caching: Intelligent context caching reduces repeated work
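These notes do not say how Jan keeps frequently used models ready in memory. One common way to get near-instant switching is a small least-recently-used (LRU) cache of loaded models. The sketch below is illustrative only: `ModelCache`, `loader`, and `capacity` are hypothetical names, not part of Jan's API.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelCache:
    """Keep up to `capacity` loaded models and evict the least recently used.

    Illustrative sketch only; this is not Jan's actual implementation.
    """

    def __init__(self, loader: Callable[[str], Any], capacity: int = 2) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader  # hypothetical function that loads a model by name
        self._capacity = capacity
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._models:
            # Fast path: the model is already resident, so mark it most recent.
            self._models.move_to_end(name)
            return self._models[name]
        # Slow path: load the model, then evict the oldest entry if over capacity.
        model = self._loader(name)
        self._models[name] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)
        return model

    def loaded(self) -> List[str]:
        """Names of resident models, least recently used first."""
        return list(self._models)
```

With `capacity=2`, switching back to a model that is still resident skips the loader entirely; only a third, different model forces an eviction.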
GPU Optimization Revolution
Completely rewritten GPU acceleration:
- Auto-GPU Detection: Automatically finds and uses your best GPU
- Multi-GPU Support: Distribute model layers across multiple GPUs
- Memory Optimization: 40% reduction in VRAM usage
- Dynamic Offloading: Automatically balance between GPU and CPU
- CUDA 12 Support: Latest NVIDIA drivers and optimizations
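Balancing a model between GPU and CPU usually comes down to deciding how many layers fit in free VRAM. The sketch below shows that arithmetic under simplifying assumptions: every layer is treated as the same size, and `split_layers`, `layer_bytes`, and `reserve_bytes` are hypothetical names chosen for illustration. It does not describe Jan's actual heuristic.

```python
from typing import Tuple


def split_layers(n_layers: int, layer_bytes: int, free_vram_bytes: int,
                 reserve_bytes: int = 0) -> Tuple[int, int]:
    """Return (gpu_layers, cpu_layers) for a model with equally sized layers.

    `reserve_bytes` is VRAM held back for things like the context cache
    (an assumption of this sketch, not a documented Jan setting).
    """
    if n_layers < 0 or layer_bytes <= 0:
        raise ValueError("n_layers must be >= 0 and layer_bytes must be > 0")
    usable = max(free_vram_bytes - reserve_bytes, 0)
    # Put as many whole layers on the GPU as fit; the rest stay on the CPU.
    gpu_layers = min(n_layers, usable // layer_bytes)
    return gpu_layers, n_layers - gpu_layers


MIB = 2 ** 20
# 32 layers of 200 MiB each, 4 GiB free VRAM, 512 MiB held in reserve.
print(split_layers(32, 200 * MIB, 4096 * MIB, reserve_bytes=512 * MIB))  # (17, 15)
```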
Smarter Memory Management
Revolutionary memory handling:
- Adaptive Memory: Automatically adjusts to available system memory
- Memory Pressure Detection: Gracefully handles low-memory situations
- Efficient Model Unloading: Frees memory when models aren't needed
- Context Length Optimization: Handle longer conversations without slowdown
- Memory Usage Dashboard: Real-time visibility into memory consumption
Startup Speed Breakthrough
Jan now starts in seconds, not minutes:
- Cold Start Optimization: 5x faster first launch
- Background Services: Core services start in parallel
- Lazy Loading: Only load components when needed
- Configuration Caching: Settings load instantly
- Progressive Initialization: UI appears immediately, features load progressively
Model Management Overhaul
Streamlined model experience:
- One-Click Downloads: Simplified model acquisition
- Download Resume: Interrupted downloads continue automatically
- Parallel Downloads: Download multiple models simultaneously
- Storage Optimization: Automatic cleanup of unused model files
- Model Recommendations: AI suggests optimal models for your hardware
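Resuming an interrupted download is typically done with an HTTP range request: the client asks the server for only the bytes it does not have yet. The sketch below builds the request header from the size of a partial file on disk. The helper names and the `.part` file are assumptions for illustration; the notes do not describe Jan's downloader internals.

```python
import os
from typing import Dict


def range_header(bytes_on_disk: int) -> Dict[str, str]:
    """Header asking the server to send everything after the bytes we already have."""
    if bytes_on_disk <= 0:
        return {}  # nothing saved yet, so request the whole file
    return {"Range": "bytes=%d-" % bytes_on_disk}


def resume_headers(partial_path: str) -> Dict[str, str]:
    """Build resume headers from a partially downloaded file, if one exists."""
    size = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return range_header(size)
```

A server that supports range requests answers with `206 Partial Content`, and the client appends the body to the partial file. A `200 OK` response means the server ignored the range, so the client should start the file over.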
Storage Efficiency
Dramatic reduction in disk usage:
- Model Compression: 30% smaller model files without quality loss
- Duplicate Detection: Automatically removes duplicate models
- Incremental Updates: Only download model changes, not entire files
- Smart Cleanup: Removes temporary files and caches automatically
- Storage Analytics: See exactly what's using your disk space
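Duplicate model files can be found by comparing content hashes rather than file names. The sketch below groups files under a directory by SHA-256 digest. It is one plausible approach, not a description of how Jan detects duplicates, and `find_duplicates` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large models do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> List[List[Path]]:
    """Return groups of files under `root` whose contents are identical."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Hashing every file is slow for multi-gigabyte models, so a real tool would likely compare file sizes first and hash only the files whose sizes match.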
Network Optimizations
Faster downloads and better connectivity:
- CDN Integration: Download models from the closest server
- Connection Pooling: Efficient network resource usage
- Retry Logic: Automatic recovery from network interruptions
- Bandwidth Adaptation: Adjusts download speed to network conditions
- Offline Mode: Better handling when internet is unavailable
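Automatic recovery from network interruptions is commonly built on retries with exponential backoff. The sketch below shows the pattern with hypothetical defaults (`attempts=4`, `base_delay=0.5`); it treats `OSError` as the transient failure type and is not taken from Jan's code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op` until it succeeds, doubling the wait after each failure.

    Only OSError (which includes ConnectionError and TimeoutError) is retried;
    the last failure is re-raised once `attempts` is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real delays. Production code would usually add random jitter to the delay so that many clients do not retry at the same moment.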
Performance Monitoring
New tools to understand performance:
- Real-time Metrics: See inference speed, memory usage, and GPU utilization
- Performance History: Track performance over time
- Bottleneck Detection: Identify what's slowing down your system
- Benchmark Tools: Compare performance across different configurations
- Performance Profiles: Save optimal settings for different use cases
Critical Fixes
Major stability improvements:
- Fixed memory leaks during long conversations
- Resolved GPU driver compatibility issues
- Eliminated random crashes during model switching
- Fixed model corruption during interrupted downloads
- Resolved race conditions in multi-threaded operations
Technical Details
This release includes fundamental changes to our inference engine, memory management, and GPU acceleration systems. The release is backwards compatible, but you may notice different memory usage patterns along with significantly improved performance.
Experience the fastest Jan ever. Download v0.6.4 and feel the difference.