Over the past month, the term "middleman" has frequently appeared on many people's radar. Some of the previous airdrop hunters in the crypto world have quietly transformed into "API middleman" merchants, engaging in the business of token import and export.
The so-called "middleman" is not a new technological invention but a form of arbitrage built on global AI price differences and access barriers. Despite the privacy, security, and compliance challenges, the field has drawn a large number of individuals and small teams.
So what exactly is an "API middleman," and how does it run Token arbitrage on top of those global price gaps and access barriers?
Let's now dissect its essence and operational process.
I. What is a Middleman?
The essence of an API middleman is an intermediary service that resells foreign AI vendors' API Tokens to domestic users at lower prices and with easier access: a "global Token porter."
Its operational process is roughly as follows:

· Choose a foreign AI vendor model (such as OpenAI/Claude)
· The resource provider obtains low-priced tokens through "gray" or technical means
· Build a middleman for encapsulation, billing, and distribution
· Provide to end-users such as developers/companies/individuals
Functionally, it operates as an "AI transfer station"; from a business perspective, it resembles a liquidity intermediary in the Token secondary market.
The premise of this chain is not a technological barrier but the coexistence of several long-standing differences:
· Official API pricing is high
· There is a cost mismatch between subscription and API models
· Different regions have different access and payment conditions
· Users have a strong demand for model capabilities but find the official integration path unfriendly
Only when these factors combine does the "intermediary" gain its foothold.
II. Why Do People Use Intermediaries?
The surge in "Token import" stems from the high costs that come with AI's new role as a worker, as well as the capability gap between domestic and international models.
1. Good Models Require a Lot of Tokens to Operate
With the maturity of desktop-level AI agents such as Codex and Claude Code, AI has truly gained the ability to "work," such as assisting in programming, video editing, financial transactions, and office automation. These tasks rely heavily on high-performance large models, with costs billed in Tokens.
Take Claude Code, for example, with an official price of about $5 per million Tokens. Intensive use for one hour could cost tens of dollars, while heavy developers or enterprises could consume over $100 per day. This cost far exceeds the expectations of many people, even surpassing the cost of hiring junior programmers, making "how to use top-notch AI at a low cost" a necessity.
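As a rough illustration of how these costs add up, here is a back-of-the-envelope sketch. The $5-per-million price is the article's illustrative figure, and the hourly/daily token volumes are assumptions chosen to match its "tens of dollars an hour, $100+ a day" claim, not official pricing.

```python
# Back-of-the-envelope Token cost model. The $5-per-million price and
# the usage volumes are illustrative figures, not official pricing.

PRICE_PER_MILLION = 5.0  # USD per 1M Tokens (illustrative)

def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """USD cost for a given Token count at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million

hourly = cost_usd(4_000_000)    # an intensive agent hour: $20.00
daily = cost_usd(25_000_000)    # a heavy developer's day: $125.00
```

At these assumed volumes, a month of heavy daily use lands in the thousands of dollars, which is where the "cheaper than a junior programmer?" comparison comes from.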
2. Clear Advantage of Top Overseas Models
Although domestic models have progressed rapidly in the past year and have highly competitive prices, in scenarios such as complex code tasks, toolchain collaboration, long-chain reasoning, multi-modal stability, etc., top overseas models still have a clear advantage.
This is why many developers, researchers, and content teams, even knowing that the price is higher, are still willing to prioritize the model capabilities of OpenAI, Anthropic, and Google.
In simple terms, users do not insist on using an "intermediary"; users just want:
· Better models
· Lower prices
· Easier access
When these three things cannot be obtained simultaneously through official channels, intermediaries naturally emerge.
3. Cost Mismatch Between Subscription and API Models
Another reason why intermediaries have become popular is that there is not always a linear correspondence between subscription benefits and API billing.
A common practice in the market has always been to purchase official subscriptions, team packages, enterprise credits, or other discounted resources, and then resell some of these capabilities to end users after encapsulating them.
Take OpenAI, for example. A Plus subscription grants access to Codex's service; logging in through OAuth to use OpenClaw is effectively equivalent to calling an API. The $20 monthly Plus fee can yield approximately 26 million tokens, which at an output price of $10-12 per million would be worth $260-312 at API rates. Buying a subscription and proxying the tokens out is therefore a very cost-effective way to use them.
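The arbitrage math above can be checked in a few lines; all figures are the article's estimates, not official numbers.

```python
# Subscription-vs-API arbitrage math, using the article's estimates.

SUB_PRICE = 20.0            # USD/month for a Plus subscription
USABLE_TOKENS_M = 26        # estimated usable output tokens, in millions
API_PRICE_LOW, API_PRICE_HIGH = 10.0, 12.0  # USD per million output tokens

value_low = USABLE_TOKENS_M * API_PRICE_LOW    # $260 at API rates
value_high = USABLE_TOKENS_M * API_PRICE_HIGH  # $312 at API rates
multiple = value_low / SUB_PRICE               # at least 13x the subscription price
```

That 13x+ gap between the subscription price and the equivalent API value is exactly the margin a reseller is arbitraging.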
From the experience of some users, this approach may indeed be cheaper at certain stages than directly using the official API. However, it must be emphasized:
· This is not the official pricing system
· Nor does it mean it can be a stable and equivalent alternative to API calls
· Furthermore, this method is by no means sustainable in the long term
Many people only see the "cheap" aspect but overlook the fact that these cheap deals are often built on unstable resources, gray areas, or strategic loopholes.
III. Can Intermediaries Be Used?
There is no absolute answer to whether intermediaries can be used.
The real question is: What risks are you willing to take?
The profit model of intermediaries seems straightforward: buy low, sell high. But dissect it and you find it usually involves at least three layers, each carrying different risks.
1. Upstream: Where Do the Low-Cost Token Resources Come From?
This is the starting point of the entire ecosystem and also the grayest layer.
Some resource providers obtain model call capabilities far below market price through various means, such as:
· Utilizing enterprise support programs and cloud credits
· Registering accounts in bulk for rotation
· Redistributing subscription benefits, team accounts, or discounted resources
· In more extreme cases, it may also involve credit card fraud, fraudulent account opening, and other illegal methods
The different sources of resources determine the stability ceiling of the intermediary. If the upstream resources themselves are based on unstable or even illegal methods, then what end users are buying is not a bargain but a temporary interface that may fail at any time.
2. Middleman: Whose Servers Is Your Data Passing Through?
This is often the most easily overlooked question.
When you invoke a model through a middleman, user inputs such as prompts, contexts, file contents, and model outputs usually pass through the middleman's own servers first.
This data is highly valuable, reflecting real user intent, industry-specific prompts, and model output quality, and can be used for evaluating or fine-tuning proprietary models. The middleman may anonymize this data, package it, and sell it to domestic big model companies, data brokers, or academic research institutions. Users, while paying, also contribute training data for free, becoming a typical example of "customers also being the product."
A recent rant by OpenClaw founder @steipete made exactly this point.

Furthermore, the middleman may also inject content into the request chain (for example, stealthily adding hidden system prompts), thereby altering model behavior, inflating Token consumption, and even introducing additional security vulnerabilities. This risk is particularly critical in AI Agent scenarios.
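To make the risk concrete, here is a hypothetical sketch of what a dishonest relay could do to a request before forwarding it. Every name in it (`HIDDEN_PROMPT`, `relay_request`, the model ids) is invented for illustration; this is not any real vendor's or relay's code.

```python
# Hypothetical relay tampering, for illustration only.

HIDDEN_PROMPT = {"role": "system", "content": "<injected instructions>"}

def relay_request(payload: dict) -> dict:
    """What a dishonest relay could do before forwarding upstream."""
    messages = payload.get("messages", [])
    # 1. Prepend a hidden system prompt: alters behavior, inflates input tokens
    payload["messages"] = [HIDDEN_PROMPT] + messages
    # 2. Silently downgrade the requested model to cut costs
    if payload.get("model") == "flagship-model":
        payload["model"] = "cheap-model"
    return payload  # the user's prompts were fully visible at this hop

request = {"model": "flagship-model",
           "messages": [{"role": "user", "content": "ping"}]}
tampered = relay_request(request)
```

Note that both tricks happen entirely on the relay's server, which is why none of this is visible from the client side.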
3. Endpoint: Is the Flagship Model You Paid For the One You Actually Get?
This is the third common risk: model degradation or model swapping.
What users see when they pay is a certain high-end model name, but what the actual request is directed to may not necessarily be the corresponding version. The reason is simple—for some merchants, the most direct way to reduce costs is not optimization but substitution.
For example, a user buys the flagship Opus 4.7, but the actual call is routed to the sub-flagship Sonnet 4.6 or the lightweight Haiku. Because the API format can remain compatible, ordinary users find it hard to detect at first. Only when a task becomes sufficiently complex do they clearly feel that the "effect is off," "the stability is insufficient," or "the context quality has deteriorated," yet they cannot prove it.
According to a research team's testing of 17 third-party API platforms, 45.83% of platforms have an "identity mismatch" issue, where users pay the price for GPT-4 but actually run a cheap open-source model, with performance differences of up to 40%.
In conclusion, using unofficial intermediaries poses risks such as data leakage, privacy issues, service interruptions, model inconsistencies, and exit scams. Therefore, for sensitive operations, business projects, or tasks involving personal privacy, it is highly recommended to use the official API.
IV. Can the Intermediary Business Still Thrive?
Despite the high risks involved, this business has not disappeared. On the contrary, it is still evolving.
If the early "Token Import" was about importing overseas models at a low cost, then there is now another concept in the market: Token Export.
1. Why Do Some Still Engage in This Business?
Because there is genuine demand, startup costs are low, and the prepaid model brings fast cash flow. But risk-management pressure is immense: Claude has recently stepped up user KYC and account bans, while OpenAI has patched many "0-payment" loopholes. Meanwhile, unstable service means that behind the cheap prices lies a persistently high after-sales cost. Add intense industry competition, and many intermediaries now face both falling volume and falling prices.
Therefore, this industry is more of a high-turnover, low-stability, high-risk short-term window, hard to package as a long-term, steady, sustainable business.
2. Why is "Token Export" Resurfacing?
If "Token Import" was about leveraging the price difference of overseas models, then "Token Export" is about utilizing the cost-effectiveness advantage of domestic models, packaging them for sale to overseas users, forming a "reverse export" path.
The price advantage of domestic models is significant. By early-2026 figures, Qwen3.5 costs as little as 0.8 RMB (about $0.11) per million Tokens, roughly 1/18 of Gemini 3 Pro and about 1/27 of Claude Sonnet 4.6's $3 input price. GLM-5 surpasses Gemini 3 Pro on programming benchmarks and nears Claude Opus 4.5, yet its API price is a mere fraction of the latter's.
These domestic models have relatively low accessibility overseas due to registration barriers, payment restrictions, language interfaces, and a lack of information among overseas developers about the capabilities of domestic models, creating invisible entry barriers.
Therefore, some relay operators bulk-purchase domestic model API quota in RMB, expose an OpenAI-compatible interface to overseas developers and startups through a protocol conversion layer, and sell access priced in USDT/USDC, with a considerable profit margin.
For example, Alibaba Cloud's Bailian coding plan bundles four major models: Qwen3.5, GLM-5, MiniMax M2.5, and Kimi K2.5. New users get a quota of 18,000 requests for just 7.9 RMB in the first month; relays then resell this in USD on the overseas market at margins above 200%.
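The "protocol conversion layer" mentioned above can be sketched minimally as a request translator. The model mapping, upstream URL, and default values here are invented placeholders, not any vendor's real API.

```python
# Minimal sketch of a protocol conversion layer: accept an OpenAI-style
# chat payload and rewrite it for a hypothetical domestic upstream.
# MODEL_MAP and UPSTREAM_URL are invented placeholders.

MODEL_MAP = {"gpt-4o": "domestic-flagship"}          # hypothetical mapping
UPSTREAM_URL = "https://upstream.example/api/chat"   # placeholder endpoint

def convert(openai_payload: dict) -> dict:
    """Translate an OpenAI-compatible request into the upstream's shape."""
    return {
        "url": UPSTREAM_URL,
        "body": {
            "model": MODEL_MAP.get(openai_payload["model"], "domestic-flagship"),
            "messages": openai_payload["messages"],   # same chat schema
            "max_tokens": openai_payload.get("max_tokens", 1024),
        },
    }

req = {"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}
upstream_req = convert(req)
```

Because both sides use a similar chat-message schema, the conversion is mostly renaming and remapping, which is why these layers are so cheap to build.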
From a pure business logic perspective, this certainly has profit potential.
However, in the long term, it also cannot avoid one issue: stability and compliance.
3. Is This Approach Stable?
It is not. Not long ago, MiniMax announced it would regulate third-party relay nodes because some were cutting corners and damaging MiniMax's reputation. Not to mention that if a Token's source involves theft or fraud, it may constitute a criminal offense. And if a user's data is leaked or misused through a relayed Token, trouble can land on you as the Token's seller.
So the real question is not "can you make money," but rather: can the money you make cover the underlying systemic risks?
V. How Can Ordinary Users Identify Relay Risks?
With the API relay market so mixed in quality, choosing a reliable service is crucial.
Since some relays engage in model substitution and adulteration, users can apply a few detection methods:
· The "ping + self-reported model" instruction-following test
System prompt: Always say "pong" exactly, and tell me which series of models you are from, preferably with the specific version number. Reply in Chinese.
User input: ping
Features of a Genuine Model:
· Strictly replies with "pong" (in lowercase, without additional chatter)
· input_tokens are usually around 60-80
· Simple style, no emojis, not sycophantic
Fake Model/Misleading Features:
· abnormally high input_tokens count (often exceeding 1500, indicating massive injection of hidden system prompts)
· responds with "Pong! + gibberish + emoji"
· does not strictly adhere to the "exactly say 'pong'" command
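The ping heuristics above can be wrapped into a simple pass/fail check; the thresholds mirror the article's rules of thumb and are not authoritative.

```python
# Pass/fail check for the "ping" relay test described above.
# Thresholds follow the article's heuristics; they are rules of thumb.

def looks_genuine(reply: str, input_tokens: int) -> bool:
    """Strict lowercase 'pong' and a sane input token count."""
    strict_pong = reply.strip() == "pong"   # no chatter, no emoji, no "Pong!"
    sane_tokens = input_tokens < 200        # >1500 hints at hidden prompt injection
    return strict_pong and sane_tokens

# looks_genuine("pong", 70)                      -> True  (matches the genuine profile)
# looks_genuine("Pong! Happy to help!", 1800)    -> False (chatty reply, inflated tokens)
```

In practice you would feed in the actual reply text and the `input_tokens` figure from the API's usage metadata; a failing check is a red flag, not proof.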
For reference, @billtheinvestor's detection methods:
1. 0.01 Temperature Sort Test: Input "5, 15, 77, 19, 53, 54" and ask AI to sort or select the maximum value. Genuine Claude almost consistently outputs 77, while a fake GPT-4o-latest often outputs 162. If the results are wildly inconsistent over 10 consecutive attempts, it is likely a fake model.
2. Long Text Input Sniffing: If a simple ping operation results in an input_tokens count exceeding 200, it may imply that a relay station has concealed a massive prompt, with the probability of a fake model exceeding 90%.
3. Refusal-Style Identification: Deliberately ask a policy-violating question and observe the AI's refusal style. Genuine Claude replies politely but firmly ("sorry but I can't assist..."), while a fake model tends to be verbose, use emojis, or adopt flattering tones like "Sorry, master~".
4. Feature-Deficiency Detection: If a model lacks function calling, image recognition, or stable long-context responses, it is very likely a weaker model doing the impersonating.
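The low-temperature consistency test (method 1 above) boils down to scoring agreement across repeated runs. The answer lists below are made-up examples standing in for real API responses, so the scoring logic itself runs offline.

```python
# Scoring logic for the repeated low-temperature sort/max test.
from collections import Counter

NUMBERS = "5, 15, 77, 19, 53, 54"
PROMPT = f"Return only the maximum value in this list: {NUMBERS}"

def consistency_score(answers: list[str]) -> float:
    """Fraction of runs that agree with the most common answer."""
    top_count = Counter(a.strip() for a in answers).most_common(1)[0][1]
    return top_count / len(answers)

# A genuine model at temperature ~0.01 should say "77" nearly every time;
# wildly inconsistent answers across ~10 runs suggest a swapped model.
stable = ["77"] * 10
flaky = ["77", "162", "54", "77", "15", "77", "162", "19", "77", "53"]
```

A score near 1.0 is what a genuine flagship at near-zero temperature should produce; a score like 0.4 is the "wildly inconsistent over 10 attempts" pattern the method warns about.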
Additionally, you can use relay-detection websites to assess the "purity" of your Token, but be aware this may expose your key in plain text. The safest approach is still to go through official channels.
It is important to emphasize:
Even if you have mastered the identification techniques, it does not mean you can completely avoid risks because many risks are invisible to ordinary users.
Final Thoughts
A relay station is not the ultimate answer of the AI era; it is more like a temporary arbitrage window opened by the mismatch of global model capabilities, pricing mechanisms, payment terms, and access permissions.
For ordinary users, it may indeed be a low-cost entry point to access top-notch models; but for developers, teams, and entrepreneurs, the real cost has never been the Token itself, but the underlying stability, security, compliance, and trust cost.
Cheap can be replicated, and interface compatibility can also be replicated. What is truly difficult to replicate is never the price but long-term reliability.
A friendly reminder: ordinary users who want to try should stick to non-sensitive, non-critical scenarios and never input core data, business secrets, or personal privacy; developers should prioritize official APIs or self-hosted proxies for stability, compliance, and a more secure experience; entrepreneurs planning to enter this field must set up a clear exit mechanism in advance to avoid getting stuck deep in the gray zone.
