TapTechNews, August 7 news: AI unicorn Dark Side of the Moon (Moonshot AI) announced today that the context cache (Cache) storage cost on the Kimi Open Platform has been cut by 50%, from $1.57/1M tokens/min to $0.79/1M tokens/min, effective immediately.
On July 1, the Context Caching feature of the Kimi Open Platform entered public beta. The company says that, with API prices unchanged, the technology can cut developers' cost of using the flagship long-context large model by up to 90% while improving model response speed.
TapTechNews attaches the details of the Kimi Open Platform context cache public beta below:
According to the introduction, context caching is a data-management technique that lets the system pre-store large amounts of data or information that will be requested frequently. When a user requests the same information, the system serves it directly from the cache instead of recomputing it or retrieving it from the original data source.
Context caching suits scenarios with frequent requests and repeated references to the same large initial context, where it can reduce the cost of long-context models and improve efficiency. The company says costs fall by up to 90% and first-token latency drops by 83%. Applicable business scenarios include:
Q&A bots that serve a large amount of preset content, such as the Kimi API assistant
Frequent queries against a fixed set of documents, such as a Q&A tool for listed-company information disclosures
Periodic analysis of static code repositories or knowledge bases, such as various Copilot agents
AI applications that experience sudden traffic spikes, such as the 哄哄 simulator and LLM Riddles
Agent-type applications with complex interaction rules, etc.
Context cache billing is divided into three parts:
Cache creation fee: when the Cache creation interface is called and the Cache is created successfully, a one-time fee is charged on the actual number of tokens stored: $3.84/M tokens
Cache storage fee: charged per minute for as long as the Cache is alive: $1.57/M tokens/min
Cache call fee, in two components: incremental tokens in each call are charged at the model's original price, and each successful call is charged $0.03. A call is billed when, during the Cache's lifetime, a user's request through the chat interface matches a live Cache with its chat message content.
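The fixed portion of the bill (everything except incremental tokens, which are charged at the model's own price) can be worked through with the public-beta unit prices above. The cache size, lifetime, and call count below are hypothetical example inputs:

```python
# Unit prices from the public-beta announcement.
CREATE_PER_MTOK = 3.84      # $ per 1M tokens, one-time on Cache creation
STORE_PER_MTOK_MIN = 1.57   # $ per 1M tokens per minute of Cache lifetime
CALL_FEE = 0.03             # $ per successful Cache call

def cache_cost(cached_mtokens: float, minutes_alive: float, calls: int) -> float:
    """Fixed cache cost in dollars, excluding incremental tokens
    (those are billed separately at the model's original price)."""
    creation = cached_mtokens * CREATE_PER_MTOK
    storage = cached_mtokens * STORE_PER_MTOK_MIN * minutes_alive
    call_fees = calls * CALL_FEE
    return creation + storage + call_fees

# e.g. a 100k-token context (0.1M tokens) cached for 1 hour and hit 1,000 times:
total = cache_cost(0.1, 60, 1000)
print(round(total, 3))  # 0.384 + 9.42 + 30.0 = 39.804
```

Note that after the August 7 price cut, the storage constant would drop to $0.79/M tokens/min; the figure above reflects the beta pricing as originally announced.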
Public beta period: the beta runs for 3 months from launch, and prices during the beta may be adjusted at any time.
Public beta eligibility: during the beta, the Context Caching feature is opened first to Tier 5 users; availability for other user tiers is to be determined.