TapTechNews, August 7 news: AI unicorn Dark Side of the Moon (Moonshot AI) announced today that the context cache (Cache) storage cost on the Kimi Open Platform has been cut by 50%, from $1.57/1M tokens/min to $0.79/1M tokens/min, effective immediately.
On July 1, the Context Caching feature of the Kimi Open Platform entered public beta. The company says that, with API prices unchanged, the technology can cut developers' cost of using the flagship long-context large model by up to 90% while improving model response speed.
TapTechNews attaches the details of the Kimi Open Platform context cache public beta below:
According to the introduction, context caching is a data-management technique that lets the system pre-store large amounts of data or information that will be requested frequently. When a user requests the same information, the system serves it directly from the cache instead of recomputing it or retrieving it from the original data source.
Context caching suits scenarios with frequent requests and repeated references to the same large initial context, where it can reduce the cost of long-context models and improve efficiency. The company says costs fall by up to 90% and first-token latency drops by 83%. Applicable business scenarios include:
Q&A bots that serve a large amount of preset content, such as the Kimi API assistant
Frequent queries against a fixed set of documents, such as a Q&A tool for listed-company information disclosures
Periodic analysis of static code repositories or knowledge bases, such as various Copilot agents
AI applications that experience sudden traffic spikes, such as the 哄哄 simulator and LLM Riddles
Agent-type applications with complex interaction rules, etc.
Context cache billing is divided into three parts:
Cache creation fee: when the Cache creation interface is called and the Cache is created successfully, a one-time fee is charged on the actual number of tokens stored: $3.84/M tokens
Cache storage fee: charged per minute for as long as the Cache is alive: $1.57/M tokens/min
Cache call fee, in two components: incremental tokens in each call are charged at the model's original price, and each successful call is charged $0.03. A call is billed when, during the Cache's lifetime, a user's request through the chat interface matches a live Cache with its chat message content.
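The fixed portion of the bill (everything except incremental tokens, which are charged at the model's own price) can be worked through with the public-beta unit prices above. The cache size, lifetime, and call count below are hypothetical example inputs:

```python
# Unit prices from the public-beta announcement.
CREATE_PER_MTOK = 3.84      # $ per 1M tokens, one-time on Cache creation
STORE_PER_MTOK_MIN = 1.57   # $ per 1M tokens per minute of Cache lifetime
CALL_FEE = 0.03             # $ per successful Cache call

def cache_cost(cached_mtokens: float, minutes_alive: float, calls: int) -> float:
    """Fixed cache cost in dollars, excluding incremental tokens
    (those are billed separately at the model's original price)."""
    creation = cached_mtokens * CREATE_PER_MTOK
    storage = cached_mtokens * STORE_PER_MTOK_MIN * minutes_alive
    call_fees = calls * CALL_FEE
    return creation + storage + call_fees

# e.g. a 100k-token context (0.1M tokens) cached for 1 hour and hit 1,000 times:
total = cache_cost(0.1, 60, 1000)
print(round(total, 3))  # 0.384 + 9.42 + 30.0 = 39.804
```

Note that after the August 7 price cut, the storage constant would drop to $0.79/M tokens/min; the figure above reflects the beta pricing as originally announced.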
Public beta period: the beta runs for 3 months from launch, and prices during the beta may be adjusted at any time.
Public beta eligibility: during the beta, the Context Caching feature is opened first to Tier 5 users; availability for other user tiers is to be determined.