New and Improved Extremely Large Model!
Our new and improved xlarge model delivers better generation quality and 4x faster prediction. It now supports a maximum length of 2048 tokens, as well as frequency and presence penalties.
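As a rough illustration of what these two penalties do, here is a minimal sketch of the standard formulation: a presence penalty applies a flat logit subtraction to any token that has already appeared, while a frequency penalty scales with how many times it appeared. This is the common definition of these sampling parameters, not necessarily the model's exact internal implementation; the function and token values below are hypothetical.

```python
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of `logits` with presence/frequency penalties applied.

    presence_penalty: flat subtraction for any token already generated.
    frequency_penalty: subtraction scaled by the token's occurrence count.
    """
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= presence_penalty + frequency_penalty * count
    return adjusted

# Example with made-up logits: "the" was generated twice, "cat" once.
logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
out = apply_penalties(logits, ["the", "the", "cat"],
                      presence_penalty=0.5, frequency_penalty=0.1)
# "the": 2.0 - (0.5 + 0.1 * 2) = 1.3
# "cat": 1.5 - (0.5 + 0.1 * 1) = 0.9
# "sat": unchanged at 1.0
```

Higher penalty values push the model away from repeating itself, which is why a correct implementation (see the bug fix below) matters for repetition control.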
Updated Small, Medium, and Large Generation Models
The updated small, medium, and large models are more stable and resilient against abnormal inputs thanks to an FP16 quantization fix. We also fixed a bug in the generation presence and frequency penalties, so the penalties now take effect as intended.