
Game-theoretic policy computing and simulation for blockchained buffering system via diffusion approximation

Published online by Cambridge University Press:  12 January 2024

Wanyang Dai*
Affiliation:
Department of Mathematics and State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China
*
Corresponding author: Wanyang Dai; Email: nan5lu8@nju.edu.cn

Abstract

We study 2-stage game-theoretic problem oriented 3-stage service policy computing, convolutional neural network (CNN) based algorithm design, and simulation for a blockchained buffering system with federated learning. More precisely, based on the game-theoretic problem consisting of both “win-lose” and “win-win” 2-stage competitions, we derive a 3-stage dynamical service policy via a saddle point to a zero-sum game problem and a Nash equilibrium point to a non-zero-sum game problem. This policy concerns users-selection, dynamic pricing, and online rate resource allocation via stable digital currency for the system. The main focus is on the design and analysis of the joint 3-stage service policy for given queue/environment state dependent pricing and utility functions. The asymptotic optimality and fairness of this dynamic service policy are justified by diffusion modeling with approximation theory. A general CNN based policy computing algorithm flow chart along the line of the so-called big model framework is presented. Simulation case studies are conducted for a system with three users, where only two of the three users can be selected into service at a time point by a zero-sum dual-cost game competition policy. The selected two users then enter service and share the system's rate service resource through a non-zero-sum dual-cost game competition policy. Applications of our policy in the future blockchain based Internet (e.g., metaverse and web3.0) and supply chain finance are also briefly illustrated.

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press.

1. Introduction

In this paper, we study a blockchained buffering system with federated learning as shown in Figure 1.

Figure 1. A blockchained buffering system with federated learning, which consists of J users and V pools.

The main focus is threefold: 2-stage game-theoretic problem oriented 3-stage service policy computing of users-selection and rate scheduling/dynamic pricing, convolutional neural network (CNN) based algorithm design, and simulation case studies. The game-theoretic problem consists of both zero-sum and non-zero-sum 2-stage game competitions (representing “win-lose” and “win-win” 2-stage competitions). Furthermore, the computed policy is proved to be asymptotically optimal and fair via diffusion approximation. Asymptotic optimality means that the whole workload and total cost of the system are asymptotically minimized. Asymptotic fairness means that no user can unilaterally change his personal policy for profit. The computed 3-stage service policy shown in Figure 2 is based on a 2-stage game-theoretic problem whose solution is represented by a saddle point to a zero-sum game problem and a Nash equilibrium point to a non-zero-sum game problem (see also the related concepts in Dai [Reference Dai8, Reference Dai9], Marchi [Reference Marchi19], Nash [Reference Nash21], and Rosen [Reference Rosen25]). Furthermore, in the 2-stage game problems, each user has his own utility function in terms of price, queue length, and service rate. This utility function is a generalization of the existing ones in Dai [Reference Dai6, Reference Dai8, Reference Dai9], Ye and Yao [Reference Ye and Yao28], and references therein. Examples of such a utility function are the so-called proportionally fair allocation, minimal potential delay allocation, and $(\beta,\alpha)$-proportionally fair allocation (see e.g., Ye and Yao [Reference Ye and Yao28]), which are widely used in internet protocols and communication systems.
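For concreteness, writing $x$ for the rate allocated to user $j$ and $p_{j}q_{j}$ for its priced queue length weight, illustrative instances of these utilities (given here only as assumed examples in the weighted form common in this literature) are

\begin{align*} \left\{\begin{array}{ll} U_{j}(x)&=\;\;\;p_{j}q_{j}\log x\;\;\;\mbox{(proportionally fair allocation)},\\ U_{j}(x)&=\;\;\;-p_{j}q_{j}/x\;\;\;\mbox{(minimal potential delay allocation)},\\ U_{j}(x)&=\;\;\;\beta_{j}\,x^{1-\alpha}/(1-\alpha),\;\alpha \gt 0,\;\alpha\neq 1\;\;\;\mbox{($(\beta,\alpha)$-proportionally fair allocation)}. \end{array} \right. \end{align*}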

Figure 2. A 3-stage processing flow chart of users-selection, dynamic pricing, and rate scheduling for a multiple pool service system with J-users, where J is taken to be 3 for an illustration.

In Figure 1, we present such a generalized service system consisting of V service pools and J buffer queues corresponding to J-parallel users for two positive integers V and J. A blockchain system is added to this system for security and distributed data storage. A federated learning center is also added to this system for dynamic policy computing and online payment transactions via stable digital currency. This blockchained buffering and federated learning system is a generalized platform of the recent studies (see e.g., Ayaz et al. [Reference Ayaz, Sheng, Tian and Guan2], Dai [Reference Dai8, Reference Dai9, Reference Dai11], Demertzis et al. [Reference Demertzis, Iliadis, Pimenidis, Tziritas, Koziri and Kikiras14], Qu et al. [Reference Qu, Pokhrel, Garg, Gao and Xiang22]), which come up in different research areas such as metaverse, sixth generation of wireless communication (6G), internet of vehicles (IoV), web3.0, etc. Due to the security consideration of the system, blockchains are used to protect privacy among different users as shown in Figure 1. Moreover, the approach developed in Dai [Reference Dai8, Reference Dai9] uses a single-dimensional aggregated total workload process of the system to dynamically design a general-dimensional decision parameter vector at each time point in the federated learning (FL) center. The FL center then sends the computed parameters back to their corresponding individual service pools for local information upgrades and local model training. The idea of employing the single-dimensional aggregated workload process in the design of Dai [Reference Dai8, Reference Dai9] is motivated by the state space collapse property (or, more classically, Little’s law) widely studied in the queueing literature (see e.g., Bramson [Reference Bramson3] and Little [Reference Little17]). The purpose of using this idea in a multiclass queueing network environment is to reduce the system’s dimension and to avoid the curse of dimensionality. When the state space collapse phenomenon happens in a queueing system, the multiple class queueing processes display a certain proportional relationship with the single-dimensional aggregated workload process, which significantly reduces the system’s computational complexity. In the design of Dai [Reference Dai8, Reference Dai9], the proportional relationship represented by the state space collapse property is generalized to be represented by a solution to a game-theoretic problem. This solution can be trained and computed in the FL center and used as an online scheduling policy. Under certain traffic flow and service discipline assumptions, this dynamic policy is proved to be asymptotically fair and optimal in a certain sense (that will be elaborated later) through diffusion approximation under the so-called diffusive scaling and the well-known heavy-traffic regime. When the game-theoretic problem reduces to an optimization problem without the blockchain security consideration, readers are also referred to Dai [Reference Dai6], Ye and Yao [Reference Ye and Yao28] for related studies. Furthermore, the studies in Dai [Reference Dai8, Reference Dai9] consider a general multiple service pool system with a more general input flow process (i.e., a J-dimensional triply stochastic renewal reward process (TSRRP)). Meanwhile, the studies in Dai [Reference Dai6], Ye and Yao [Reference Ye and Yao28] focus on the analysis when the input process is a conventional J-dimensional renewal process.

The contribution of this research is threefold. The first is to add dynamic pricing capability to the study in Dai [Reference Dai9]. The second is to design a general CNN based policy computing algorithm flow chart along the line of the so-called big model framework. The third is to provide detailed simulation examples. However, the main focus of our current paper is on the design and analysis of a joint 3-stage service policy, extended from the previously mentioned 2-stage game-theoretic policy, concerning users-selection, dynamic pricing, and online rate resource allocation for given queue/environment state dependent pricing and utility functions. The asymptotic optimality and fairness of this designed 3-stage service policy are justified by applying the well-known heavy traffic approximation and modeling technique together with our newly added dynamic pricing functionality. Meanwhile, it is also supported by our newly conducted simulation case studies with three users and through explicitly constructing the solutions to their corresponding dual-cost game competition problems. Note that the rate allocated to a user corresponds to the service time for the user. Furthermore, the service for the user includes four steps: user registration, security checking, dynamic policy computing, and online payment transaction through stable digital currency as shown in Figure 1. Thus, the service time for the user is the summation of the processing times corresponding to the four service steps.

The design procedure is illustrated in Figure 2, where J (e.g., J = 3) users want to receive services from the V service pools. However, only two of them can be selected to receive services at each time point according to chosen state dependent pricing and utility functions. After the selected two users get into service, they need to share the limited capacity from different service pools in a cooperative way. The selection process is determined by a dynamic policy corresponding to a solution (called a saddle point) to a zero-sum dual-cost game competition problem at each particular time point. The sharing process for the selected two users is to compete for the system's rate service resource and is determined by a dynamic policy through a solution (called a Nash equilibrium point) to a non-zero-sum dual-cost game competition problem. Note that, as shown in Figure 2, the associated 2-stage game-theoretic problem is a J × V-dimensional problem. In general, an explicit solution is not available. Thus, we design a CNN based algorithm flow chart for general usage. However, to support our currently designed policy, we choose smaller J and V to conduct simulation case studies, which are presented in Sections 4–5.

The input data flows from different users to our system as indicated in Figure 1 are characterized by a J-dimensional TSRRP, which can be further approximated by a diffusion process for the purpose of effective simulation. Furthermore, the service rate capacity available for resource-competing users at each pool is modeled as a random capacity region evolving with a finite-state continuous-time Markov chain (FS-CTMC). The arriving data flows can immediately get into service if the servers are available. Otherwise, they will be stored in the queueing buffers waiting for service. Besides the buffers, the decision information after service for each user will be stored in a distributed database called a blockchain as designed in Figure 1.

There are many reasons (e.g., security, decentralization, smart contracts) for today's FinTech systems and the future Internet (e.g., metaverse and web3.0) to choose blockchain as a key technology (see, e.g., Buterin [Reference Buterin4], Dai [Reference Dai8,Reference Dai9,Reference Dai11], Iansiti and Lakehani [Reference Iansiti and Lakehani16], Nakamoto [Reference Nakamoto20], Rajan and Visser [Reference Rajan and Visser23]). A blockchain, consisting of data blocks, is an ordered distributed database with encryption and is frequently referred to as a ledger. Each data block contains the proposed (or calculated) decision information with the customer's private and public keys at a single time point, together with a time-stamp and a link to the previous block. The management of a blockchain can be realized via various smart contracts in a decentralized way. Traditionally, a smart contract within a blockchain can be considered as a digitalized regulation rule with common sense. In this paper, we will make the rule evolve dynamically according to an online decision-making policy, i.e., a solution to a dynamic game based competition problem.
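A minimal sketch of the block contents just described (a Python illustration with assumed field names; real blockchain implementations differ in many details) could look as follows:

```python
import hashlib, json, time

def make_block(decision, public_key, prev_hash):
    """Illustrative data block: the computed decision information, the user's
    public key, a time-stamp, and a hash link to the previous block."""
    body = {"decision": decision, "public_key": public_key,
            "timestamp": time.time(), "prev_hash": prev_hash}
    # The block's own hash links it immutably to its contents and predecessor.
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body
```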

More precisely, we will study the resource allocation and dynamic pricing of a joint saddle & Nash equilibrium service policy for such a system. When users get into service either from the buffers or from outside of the system, the computation of their processing policy concerning resource allocation may depend on long history data stored in the blockchain (e.g., represented by a conditional mean defined process) and can be dynamically priced via stable digital currency. Each queue may be served at the same time through multiple smart service pools while each pool may also serve multiple queues simultaneously by running intelligent policies. Note that, to reflect the dynamically evolving nature of real-world systems and to realize the decentralized operation in a blockchain, the set of users to be selected at a time is random, the number of pools to serve a specific queue is random, and the number of queues to be served by a given pool is also random. The effectiveness of our proposed policy is in terms of revenue, profit, cost, system delay, etc. We model them through some utility (or hash) functions in terms of the performance measures of their internal data flow dynamics such as queue length and workload processes. To demonstrate the usefulness of our policy, we derive a reflecting diffusion with regime-switching (RDRS) model for the performance measures under our designed policy to offer services to different users in a cost-effective, efficient, and fair way. Based on this RDRS model, our proposed policy is effectively implemented with numerical simulations for some case studies of the system in Figures 1–2. Concerning the numerical scheme through Monte Carlo simulation for a Brownian motion driven stochastic system, readers are referred to Dai [Reference Dai10] and references therein for more details.

Since dynamic resource pricing is our major concern, we give some discussion of the concept of stable digital currency and its applications. More precisely, a stable digital currency is a digital token used in digital informational and data network systems. It can be traced back to the optimal pricing of bits (or ports) in telecommunication management and admission control through token buffers in communication networks around the mid and late 90s by Bell Labs' researchers (see e.g., the related discussions in Dai [Reference Dai6], Elwalid and Mitra [Reference Elwalid and Mitra15]). Along this line, Nakamoto [Reference Nakamoto20] extended the concept of the bit (or port) to the bitcoin in 2008 and Buterin [Reference Buterin4] further enhanced this concept to Ethereum in 2013. Note that neither the bitcoin nor the Ethereum is yet a real stable digital currency. However, during this evolution, Dai (see e.g., Maker [18]) made his effort to endow the Ethereum with real value and invented DaiCoin. Since then, this concept and its lawful implementations have become more and more popular with the emergence of the US digital dollar (DD), (China Central Bank) digital currency/electronic payment (DC/EP), and (European) Central Bank digital currency (CBDC). The latest application of dynamic resource pricing can be found in a metaverse system (see e.g., Dai [Reference Dai11]).

To show the importance of stable digital currency, an example based on a supply chain finance service system is displayed in Figure 3. The business model presented in this example can be considered as a generalized online digital payment system with service lead time involvement and consists of four typical service stages through which goods are eventually delivered to customers: raw material procurement, make to order (MTO), assemble to order (ATO), and agent sales (see e.g., the upper-half of Figure 3, where agents are further classified into two levels of suppliers). Usually, during the procurement and service stages, the cash cannot be paid until the delivery of the procured products. Thus, bank notes, e-bills, receipts, etc., backed by credit, mortgage, and third-party warrants as shown in the middle of Figure 3, are widely used in real-world practice. To improve the efficiency and security of this type of payment, a stable digital currency such as the US DD, (China) DC/EP, or (European) CBDC as designed in Figure 3 is a suitable choice. Furthermore, as shown in the upper-right corner of Figure 3, data information among companies can be asymmetric or symmetric. Frequently, they are not exchangeable. Thus, our policy can be integrated into this system to develop a smart online algorithm in terms of dynamic resource pricing to solve this problem. From the lower-half design in Figure 3, we can see that our supply chain system can be mapped into and interact with an information system through a wireless 5G/6G network or a wireline IP network (and even the future IoB). Then, many online payments and transactions with lawful services can be handled as shown in the middle of Figure 3.

Figure 3. A generalized supply chain with finance transactions via stable digital currencies. In this figure, ATO means assemble to order, MTO means make to order, DD means digital dollar, DC/EP means (China Central Bank) digital currency/electronic payment, CBDC means (European) Central Bank digital currency, TCP means transmission control protocol, IoB means Internet of Blockchains, and Asym and sym mean asymmetry and symmetry, respectively.

The remainder of this paper is organized as follows. In Section 2, we formulate our system model for dynamic resource pricing via stable digital currency. In Section 3, we present our main theorem based on a 3-stage (i.e., users-selection, dynamic pricing, and resource-competition scheduling) policy. A corresponding general CNN based algorithm flow chart is also presented in this section. In Section 4, we present two illustrative policy examples. In Section 5, we conduct simulation case studies to show the effectiveness of our policy. In Section 6, we theoretically prove our main theorem. In Section 7, we give the conclusion of this paper.

2. System model

In this section, we present our service model with dynamic resource pricing capability. It owns V service pools indexed by a set of positive integers ${\cal V}\equiv\{1,\ldots,V\}$ and J queues for J-parallel users indexed by a set of positive integers ${\cal J}\equiv\{1,\ldots,J\}$. Furthermore, we assume that the buffer storage in each queue is nonnegative. Each pool $v\in{\cal V}$ owns $J_{v}$ flexible parallel servers. Let the prime denote the transpose of a vector or a matrix. Then, associated with the queues, there is a J-dimensional arrival process $A=\{A(t)=(A_{1}(t),\ldots,A_{J}(t))',t\geq 0\}$, which is called a data packet arrival process. Here, $A_{j}(t)$ for each $j\in{\cal J}$ and $t\geq 0$ is the number of data packets that arrive at the jth queue during the time interval $(0,t]$. In addition, in a real-world service system such as a banking service or a supply chain system, the associated input ethereum/cash flows and supply/demand processes can be digitalized and mapped into the data packet based framework. The size of a data packet is a random number $\zeta\in\{1,2,\ldots\}$. Our system is assumed to be under an external random environment driven by a stationary FS-CTMC $\alpha=\{\alpha(t),t\in[0,\infty)\}$ with a finite state space ${\cal K}\equiv\{1,\ldots,K\}$, whose generator matrix is given by $G=(g_{il})$ with $i,l\in{\cal K}$, and

(2.1)\begin{eqnarray} &&g_{il}=\left\{\begin{array}{ll} -\gamma(i)&\mbox{if}\;\;i=l,\\ \gamma(i)q_{il}&\mbox{if}\;\;i\neq l, \end{array} \right. \end{eqnarray}

where $\gamma(i)$ is the holding rate for the continuous time chain staying in a state $i\in{\cal K}$ and $Q=(q_{il})$ is the corresponding transition matrix of its embedded discrete-time Markov chain (see e.g., Resnick [Reference Resnick24]). Moreover, define $\tau_{n}$ for each nonnegative integer $n\in\{0,1,\ldots\}$ by:

(2.2)\begin{align} \tau_{0}\equiv 0,\;\;\tau_{n}\equiv\inf\{t \gt \tau_{n-1}:\alpha(t)\neq\alpha(t^{-})\}. \end{align}
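A minimal simulation sketch of such an environment chain, assuming the generator is specified through the holding rates $\gamma(i)$ and the embedded transition matrix $Q=(q_{il})$ of (2.1) (the function and variable names below are illustrative only), is as follows:

```python
import numpy as np

def simulate_fs_ctmc(gamma, Q, T, i0=0, rng=None):
    """Simulate one path of the FS-CTMC alpha(t) on [0, T].

    gamma : array of holding rates gamma(i), one entry per state
    Q     : embedded discrete-time transition matrix (q_il), rows summing to 1
    Returns the jump times tau_n of (2.2) and the state held on [tau_n, tau_{n+1})."""
    rng = np.random.default_rng() if rng is None else rng
    times, states = [0.0], [i0]
    t, i = 0.0, i0
    while True:
        t += rng.exponential(1.0 / gamma[i])      # exponential holding time in state i
        if t >= T:
            break
        i = rng.choice(len(gamma), p=Q[i])        # jump according to the embedded chain
        times.append(t)
        states.append(i)
    return np.array(times), np.array(states)
```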

Note that, the external random environment may be caused by different factors (see e.g., the explanations in Choudhury et al. [Reference Choudhury, Mandelbaum, Reiman and Whitt5], Dai [Reference Dai6], Wang and Moayeri [Reference Wang and Moayeri27]). Here, in our blockchained case, the FS-CTMC $\alpha(\cdot)$ may be considered as a history-data dependent system randomly evolving parameter for a targeted stationary random variable $\alpha(\infty)$, which can be generated by (or approximated through) a conditional mean defined process, i.e.,

(2.3)\begin{eqnarray} \alpha(t)=E\Big[\alpha(\infty)\,\Big|\,{\cal F}_{t}\Big], \end{eqnarray}

where $\{{\cal F}_{t},t\in[0,\infty)\}$ is a filtration generated by blockchain history information. For example, when the blockchain database read and write (R/W) times are exponentially distributed, the martingale representation theorem for a jump-diffusion process (see e.g., Applebaum [Reference Applebaum1]) can be applied to (2.3) to generate the required FS-CTMC assumption. Then, we can model the arrival process $A_{j}(\cdot)$ for each positive integer $j\in{\cal J}$ as a big data flow stream through a TSRRP as in Dai [Reference Dai8]. More precisely, the process $A_{j}(\tau_{n}+\cdot)$ for each $n\in\{0,1,\ldots\}$ is a counting process corresponding to a (conditional) delayed renewal reward process with arrival rate $\lambda_{j}(\alpha(\tau_{n}))$ and mean reward $m_{j}(\alpha(\tau_{n}))$ associated with finite squared coefficients of variations $\alpha^{2}_{j}(\alpha(\tau_{n}))$ and $\zeta^{2}_{j}(\alpha(\tau_{n}))$ during time interval $[\tau_{n},\tau_{n+1})$.

Now, we let $\{u_{j}(k),k=1,2,\ldots\}$ be the sequence of times between the arrivals of the $(k-1)$th and the kth reward batches of packets at the jth queue. The associated batch reward is given by $w_{j}(k)$ and all the data packets arrived with it are indexed in certain successive order. Therefore, we can present the renewal counting process corresponding to the inter-arrival time sequence $\{u_{j}(k),k=1,2,\ldots\}$ for each $j\in{\cal J}$ as follows,

(2.4)\begin{eqnarray} &&N_{j}(t)=\sup\left\{n\geq 0:\sum_{k=1}^{n}u_{j}(k)\leq t\right\}. \end{eqnarray}

Thus, we can restate the definition of a TSRRP $A_{j}(\cdot)$ quantitatively through the expression,

(2.5)\begin{eqnarray} &&A_{j}(t)=\sum_{k=1}^{N_{j}(t)}w_{j}(k). \end{eqnarray}
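Within a fixed environment state, (2.4)-(2.5) can be sampled directly. The sketch below is a minimal illustration (assumed names; exponential inter-arrival times and geometric batch rewards are used only as placeholder distributions with the required means):

```python
import numpy as np

def sample_tsrrp_arrivals(lam, m, t, rng=None):
    """Sample A_j(t) = sum_{k <= N_j(t)} w_j(k) for one user j while the
    environment state is fixed: N_j is a renewal counting process with rate
    lam, and the batch rewards w_j(k) have mean m (placeholder distributions)."""
    rng = np.random.default_rng() if rng is None else rng
    total, clock = 0, 0.0
    while True:
        clock += rng.exponential(1.0 / lam)   # inter-arrival time u_j(k)
        if clock > t:
            break
        total += rng.geometric(1.0 / m)       # batch reward w_j(k) with mean m (m >= 1)
    return total
```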

Each data packet will first get service in the system and then leave it. The service is managed by a blockchain. In this blockchain, the service for a data packet is composed of two parts: security checking and policy computation (or real data payload transmission). After completing the service, the security information and the policy (or the transmission result) will be stored and copied to all the participating partner nodes for storage and, in the meanwhile, to produce nonce values and private keys. Moreover, we denote by $\{v_{j}(k),k=1,2,\ldots\}$ the sequence of successively arriving packet lengths at queue j, which is assumed to be a sequence of strictly positive i.i.d. random variables with average packet length $1/\mu_{j}\in(0,\infty)$ and squared coefficient of variation $\beta_{j}^{2}\in(0,\infty)$. In addition, we suppose that all the inter-arrival and service time processes are mutually (conditionally) independent when the environmental state is fixed. Associated with each $j\in{\cal J}$ and each nonnegative constant h, we employ $S_{j}(\cdot)$ to denote the renewal counting process corresponding to $\{v_{j}(k),k=1,2,\ldots\}$. In other words,

(2.6)\begin{eqnarray} &&S_{j}(h)=\sup\left\{n\geq 0:\sum_{k=1}^{n}v_{j}(k)\leq h\right\}. \end{eqnarray}

Define $Q_{j}(t)$ to be the jth queue length with $j\in{\cal J}$ at each time $t\in[0,\infty)$ and $D_{j}(t)$ to be the number of packet departures from the jth queue in $(0,t]$. Therefore, the queueing dynamics governing the evolving of the internal data flow in and out within our unified service platform can be modeled by:

(2.7)\begin{eqnarray} &&Q_{j}(t)=Q_{j}(0)+A_{j}(t)-D_{j}(t), \end{eqnarray}

where each queue is assumed to have an infinite storage capacity to buffer data packets (jobs) arrived from a given user.

Note that, in a DaiCoin and blockchain based mortgage system (see e.g., Maker [18]), $Q_{j}(t)$ is the number of Ethereums available at time t. In this case, we need to dynamically determine how many Dais should be loaned to customer j for each Ethereum at time t according to the value of Q(t). Similarly, in a banking system, $Q_{j}(t)$ can be the number of loan demands waiting at time t. In this case, we need to determine the loan interest rate at time t according to the value of Q(t). Furthermore, in communication and cloud-computing based service systems, we need to price the bit service ratio at time t according to the value of Q(t). In all, we need to dynamically price our service in a real-world system according to the evolution of Q(t) over time t. For convenience, we will use the unified terminology “price $P_{j}(t)$” to denote the price (the number of Dais or the interest ratio) associated with $Q_{j}(t)$ and $\alpha(t)$ at time t. In economics, there are different pricing functions with respect to Q(t) and $\alpha(t)$ (see e.g., Dai and Jiang [Reference Dai and Jiang13]). Here, we assume that $P_{j}(t)$ is a positive function in terms of $Q_{j}(t)$ and $\alpha(t)$, i.e.,

(2.8)\begin{eqnarray} &&P_{j}(t)=f_{j}(Q_{j}(t),\alpha(t)). \end{eqnarray}
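One simple instance of a pricing rule of the form (2.8) — a state-dependent base price plus a term linear in the queue length, clipped between lower and upper price bounds, in the spirit of the price bounds used in the simulations of Section 5 — can be sketched as follows (the parameter names are illustrative only):

```python
def price(q_j, state, base, slope, p_low, p_high):
    """Illustrative pricing function P_j(t) = f_j(Q_j(t), alpha(t)):
    base[state] + slope[state] * q_j, clipped to [p_low, p_high].
    Clipping a linear function keeps f_j Lipschitz continuous in q_j."""
    p = base[state] + slope[state] * q_j
    return min(max(p, p_low), p_high)
```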

Note that, $P_{j}(t)$ is a specific value at time t for given values of $Q_{j}(t)$ and $\alpha(t)$. In general, $P_{j}(t)$ is a random variable at time t since both $Q_{j}(t)$ and $\alpha(t)$ are random variables at time t. From the expression of (2.8), we can see that the price also depends on the random environment movement (e.g., the season’s movement). In addition, we suppose that $f_{j}(\cdot,\cdot)$ in (2.8) is Lipschitz continuous with respect to $Q_{j}(t)$. Then, we can introduce a utility (or a hash) function with respect to the valued queue length $P_{j}(t)Q_{j}(t)$ for user $j\in{\cal J}$ at each service pool $v\in{\cal V}$ as follows,

(2.9)\begin{eqnarray} &&U_{vj}(P(t)Q(t),\Lambda(t))\;\;\mbox{with}\;\;P(t)Q(t)=(P_{1}(t)Q_{1}(t),\ldots,P_{J}(t)Q_{J}(t)), \end{eqnarray}

where $P(t)=(P_{1}(t),\ldots,P_{J}(t))$ and $\Lambda(t)=(\Lambda_{1}(t),\ldots,\Lambda_{J}(t))$. Moreover, $\Lambda_{j}(t)$ for each $t\in[0,\infty)$ and $j\in{\cal J}$ is the summation of all service rates allocated to the jth user at time t from all possible pools and servers. Here, we remark that $\Lambda_{j}(t)$ may be given in a feedback control form and it depends on the current price P(t), the current queue length Q(t), and the system state $\alpha(t)$ at a given time t. In other words, we have that $\Lambda_{j}(t)=\Lambda_{j}(P(t)Q(t),\alpha(t))$. Furthermore, we note that the upper case $\Lambda_{j}(t)$ used here denotes the service capacity allocation process. It is not directly related to the lower case $\lambda_{j}$ as used in (6.3) of Subsubsection 6.1.3, which denotes the nominal arrival rate for user j.

Now, we define W(t) and $W_{j}(t)$ to be the (expected) total workload in the system at time t and the one associated with user j at time t, to wit,

(2.10)\begin{eqnarray} &&W(t)=\sum_{j=1}^{J}W_{j}(t),\;\;\;\;\;\;\;W_{j}(t)=\frac{Q_{j}(t)}{\mu_{j}}. \end{eqnarray}

In the following study, we will use W(t) and Q(t) as performance measures, $f=(f_{1},\ldots,f_{J})$ in (2.8) as the pricing function, and $\{U_{vj},j\in{\cal J},v\in{\cal V}\}$ in (2.9) as utility (or hash) functions. Based on these measures and functions, we can propose a joint dynamical pricing and rate scheduling policy $(P,\Lambda)$ with users-selection at each time point for different service pools and servers to all the users. Under the policy, the total workload W(t) and its corresponding total cost are minimized, where the total cost is a summation of the costs of serving all users in a certain sense. The exact definition of the total cost is given in (3.22) through the so-called dual-costs. Furthermore, we assume that the available resources from different pools and servers can be flexibly allocated and shared between the system and users, i.e., the system operates under a concurrent resource occupancy service regime. Based on these facts, we can define $T_{j}(t)$ to be the cumulative amount of service given to the jth queue up to time t, i.e.,

(2.11)\begin{eqnarray} &&T_{j}(t)=\int_{0}^{t}\Lambda_{j}(P(s)Q(s),\alpha(s))ds. \end{eqnarray}

Hence, if we let $S_{j}(t)$ be the total number of jobs (packets) that finishes service in the system by time t, we know that $D_{j}(t)=S_{j}(T_{j}(t))$.
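In a time-discretized simulation of this model, the bookkeeping implied by (2.7), (2.10), and (2.11) can be carried out step by step. The sketch below is a fluid-level approximation only (it replaces the renewal process $S_{j}$ by its mean rate $\mu_{j}$, and arrivals are assumed to be added separately; all names are illustrative):

```python
def bookkeeping_step(Q, T, Lam, mu, dt):
    """One fluid-level update for (2.7), (2.10), (2.11): T_j grows by Lam_j*dt
    (cumulative service), which at mean packet length 1/mu_j clears roughly
    mu_j*Lam_j*dt packets; the workload of (2.10) is W = sum_j Q_j / mu_j."""
    for j in range(len(Q)):
        T[j] += Lam[j] * dt                          # cumulative service, eq. (2.11)
        Q[j] = max(Q[j] - mu[j] * Lam[j] * dt, 0.0)  # departures at fluid rate mu_j*Lam_j
    W = sum(Q[j] / mu[j] for j in range(len(Q)))     # total workload, eq. (2.10)
    return Q, T, W
```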

3. Main theorem

TSRRPs can effectively model big data arrival streams. However, it is difficult to directly conduct the analysis of the associated physical queueing model in (2.7) or its related physical workload model in (2.10) due to the non-Markovian characteristics of TSRRPs. Thus, in this paper, we develop a scheme that applies the well-known heavy traffic approximation and modeling technique to establish the RDRS model corresponding to our newly designed game-competition based dynamic resource pricing and scheduling policy, by considering our queueing system in the asymptotic regime where it is heavily loaded (load balanced), i.e., under the so-called heavy traffic condition. Furthermore, we will prove the correctness of the RDRS modeling via diffusion approximation, and we will also show the effectiveness of the identified model for our newly proposed pricing and scheduling policy by presenting simulation case studies. The corresponding simulation results are displayed in Figures 4–5 (as follows) and Figure 11, and their interpretations are presented in Section 5.

Figure 4. In this simulation, the number of simulation iterations is $N=6,000$ and the simulation time interval is $[0,T]$ with T = 20, which is further divided into $n=5,000$ subintervals as explained in Section 5. Other values of simulation parameters introduced in Definition 3.1 and Section 4 are as follows: initialprice1 = 2.25, initialprice2 = 1.5, initialprice3 = 2.25, upperboundprice1 = 4, upperboundprice2 = 2, upperboundprice3 = 4, lowerboundprice1 = 0.49, lowerboundprice2 = 0.7, lowerboundprice3 = 0.49, queuepolicylowerbound1 = 0, queuepolicylowerbound2 = 0, queuepolicylowerbound3 = 0, $\lambda_{1}=10/3$, $\lambda_{2}=5$, $\lambda_{3}=10/3$, $m_{1}=3$, $m_{2}=1$, $m_{3}=3$, $\mu_{1}=1/10$, $\mu_{2}=1/20$, $\mu_{3}=1/10$, $\alpha_{1}=\sqrt{10/3}$, $\alpha_{2}=\sqrt{20}$, $\alpha_{3}=\sqrt{10/3}$, $\beta_{1}=\sqrt{10}$, $\beta_{2}=\sqrt{20}$, $\beta_{3}=\sqrt{10}$, $\zeta_{1}=1$, $\zeta_{2}=\sqrt{2}$, $\zeta_{3}=1$, $\rho_{1}=\rho_{2}=\rho_{3}=1,000$, $\theta_{1}=-1$, $\theta_{2}=-1.2$, $\theta_{3}=-1$.

Figure 5. In this simulation, the number of simulation iterations is $N=6,000$ and the simulation time interval is $[0,T]$ with T = 20, which is further divided into $n=5,000$ subintervals as explained in Section 5. Other values of simulation parameters introduced in Definition 3.1 and Section 4 are as follows: initialprice1 = 1, initialprice2 = 1, initialprice3 = 1, upperboundprice1 = 1, upperboundprice2 = 1, upperboundprice3 = 1, lowerboundprice1 = 1, lowerboundprice2 = 1, lowerboundprice3 = 1, queuepolicylowerbound1 = 0, queuepolicylowerbound2 = 0, queuepolicylowerbound3 = 0, $\lambda_{1}=10/3$, $\lambda_{2}=5$, $\lambda_{3}=10/3$, $m_{1}=3$, $m_{2}=1$, $m_{3}=3$, $\mu_{1}=1/10$, $\mu_{2}=1/20$, $\mu_{3}=1/10$, $\alpha_{1}=\sqrt{10/3}$, $\alpha_{2}=\sqrt{20}$, $\alpha_{3}=\sqrt{10/3}$, $\beta_{1}=\sqrt{10}$, $\beta_{2}=\sqrt{20}$, $\beta_{3}=\sqrt{10}$, $\zeta_{1}=1$, $\zeta_{2}=\sqrt{2}$, $\zeta_{3}=1$, $\rho_{1}=\rho_{2}=\rho_{3}=1,000$, $\theta_{1}=-1$, $\theta_{2}=-1.2$, $\theta_{3}=-1$.

3.1. RDRS model

In this subsection, we first present the basic idea of our main claim in terms of our RDRS modeling under a smart contract policy. Second, for convenience, we introduce the definition of an RDRS model. More precisely, for each $t\geq 0$ and $j\in{\cal J}$, we introduce two sequences of diffusion-scaled processes, $\hat{Q}^{r}(\cdot)$ and $\hat{W}^{r}(\cdot)$, by:

(3.1)\begin{eqnarray} &&\hat{Q}_{j}^{r}(t)\equiv\frac{Q_{j}^{r}(r^{2}t)}{r},\;\;\;\;\;\;\;\hat{W}^{r}(t)\equiv\frac{W^{r}(r^{2}t)}{r}, \end{eqnarray}

where $\{r,r\in{\cal R}\}$ is a strictly increasing sequence of positive real numbers that tends to infinity. Then, our main claim can be presented as follows.

The sequence of 2-tuple scaled processes in (3.1) corresponding to a game-competition based dynamic resource pricing and scheduling policy with users’ selection, which is designed in the subsequent subsection, converges jointly in distribution. More precisely, under the heavy traffic condition described in Section 6, we have that:

(3.2)\begin{eqnarray} &&(\hat{Q}^{r}(\cdot),\hat{W}^{r}(\cdot)) \Rightarrow(\hat{Q}(\cdot),\hat{W}(\cdot)) \;\;\;\mbox{along}\;\;\;r\in{\cal R}, \end{eqnarray}

where $\hat{W}(\cdot)$ is presented by an RDRS model and $\hat{Q}(\cdot)$ is an asymptotic queue policy process with dynamic pricing globally over $[0,\infty)$ through a saddle point to a zero-sum game-competition problem and a Pareto minimal-dual-cost Nash equilibrium point to a non-zero-sum game-competition problem.

Definition 3.1. A u-dimensional stochastic process $\hat{Z}(\cdot)$ with $u\in{\cal J}$ is claimed as an RDRS with oblique reflection if it can be uniquely represented as:

(3.3)\begin{eqnarray} &&\left\{\begin{array}{ll} \hat{Z}(t)&=\;\;\;\hat{X}(t) +\int_{0}^{t}R(\alpha(s),s)d\hat{Y}(s)\geq 0,\\ d\hat{X}(t)&=\;\;\;b(\alpha(t),t)dt+\sigma^{E}(t)d\hat{H}^{E}(t)+\sigma^{S}(t)d\hat{H}^{S}(t). \end{array} \right. \end{eqnarray}

Furthermore, $b(\alpha(t),t)=(b_{1}(\alpha(t),t),\ldots,b_{u}(\alpha(t),t))'$ is a u-dimensional vector, $\sigma^{E}(t)$ and $\sigma^{S}(t)$ are u × J matrices, $R(\alpha(t),t)$ with $t\in R_{+}$ is a u × u matrix, and $(\hat{Z}(\cdot),\hat{Y}(\cdot))$ is a coupled almost surely continuous solution of (3.3) with the following properties for each $j\in\{1,\ldots,u\}$,

\begin{align*} \left\{\begin{array}{ll} \hat{Y}_{j}(0)=0;\\ \mbox{Each component}\;\;\hat{Y}_{j}(\cdot)\;\;\mbox{of}\;\; \hat{Y}(\cdot)=(\hat{Y}_{1}(\cdot),\ldots,\hat{Y}_{u}(\cdot))'\;\;\mbox{is non-decreasing};\\ \mbox{Each component}\;\;\hat{Y}_{j}(\cdot)\;\;\mbox{can increase only at a time}\;\;t\in[0,\infty)\;\;\mbox{that}\;\;\hat{Z}_{j}(t)=0,\;\mbox{i.e.},\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \int_{0}^{\infty}\hat{Z}_{j}(t)d\hat{Y}_{j}(t)=0. \end{array} \right. \end{align*}

In addition, a solution to the RDRS in (3.3) is called a strong solution if it is in the pathwise sense and is called a weak solution if it is in the sense of distribution.
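In the one-dimensional case with $R\equiv 1$ (the case arising for the workload limit in Theorem 3.5), the pair $(\hat{Z},\hat{Y})$ is given explicitly by the Skorokhod reflection map: $\hat{Y}(t)=\sup_{0\leq s\leq t}\max(-\hat{X}(s),0)$ and $\hat{Z}=\hat{X}+\hat{Y}$. A discrete-path sketch of this map (illustrative only, applied to a sampled path of $\hat{X}$ with $\hat{X}(0)\geq 0$) is:

```python
import numpy as np

def skorokhod_reflect(X):
    """One-dimensional Skorokhod map for R = 1: given a sampled path X with
    X[0] >= 0, return (Z, Y) with Z = X + Y >= 0, Y non-decreasing and starting
    at 0, and Y increasing only when Z = 0 (the complementarity condition of
    Definition 3.1)."""
    X = np.asarray(X, dtype=float)
    Y = np.maximum.accumulate(np.maximum(-X, 0.0))   # running maximum of (-X)^+
    Z = X + Y
    return Z, Y
```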

In terms of the well-posedness of an RDRS, readers are referred to a general discussion in Dai [Reference Dai7]. Furthermore, in Definition 3.1, the stochastic processes $B^{E}(\cdot)$ and $B^{S}(\cdot)$ are, respectively, two J-dimensional standard Brownian motions, which are independent of each other. For each state $i\in{\cal K}$ and a time $t\in[0,\infty)$, the nominal arrival rate vector $\lambda(i)$, the mean reward vector m(i), the nominal throughput vector $\rho(i)$, and a constant parameter vector $\theta(i)$ are given as follows,

(3.4)\begin{align} \left\{\begin{array}{ll} \lambda(i)&=\;\;\;(\lambda_{1}(i),\ldots,\lambda_{J}(i))',\\ m(i)&=\;\;\;(m_{1}(i),\ldots,m_{J}(i))',\\ \rho(i)&=\;\;\;\left(\rho_{1}(i),\ldots,\rho_{J}(i)\right)',\\ \theta(i)&=\;\;\;(\theta_{1}(i),\ldots,\theta_{J}(i))'. \end{array} \right. \end{align}

The covariance matrices are given by:

(3.5)\begin{eqnarray} &&\left\{\begin{array}{ll} \Gamma^{E}(i)&=\;\;\;\left(\Gamma^{E}_{kl}(i)\right)_{J\times J}\\ &\equiv\;\;\;\mbox{diag}\left(\lambda_{1}(i)m_{1}^{2}(i)\zeta^{2}_{1}(i)+\lambda_{1}(i)m^{2}_{1}(i)\alpha_{1}^{2}(i),\right.\\ &\;\;\;\;\;\;\;\;\;\;\;\;\left.\;\;\;\;\ldots,\lambda_{J}(i)m_{J}^{2}(i) \zeta^{2}_{J}(i)+\lambda_{J}(i)m^{2}_{J}(i)\alpha^{2}_{J}(i)\right),\\ \Gamma^{S}(i)&=\;\;\;\left(\Gamma^{S}_{kl}(i)\right)_{J\times J}\\ &\equiv\;\;\;\mbox{diag}\left(\lambda_{1}(i)m_{1}(i)\beta_{1}^{2},\ldots, \lambda_{J}(i)m_{J}(i)\beta_{J}^{2}\right). \end{array} \right. \end{eqnarray}

The Itô integrals with respect to the Brownian motions are defined as:

(3.6)\begin{eqnarray} &&\left\{\begin{array}{ll} \hat{H}^{e}(t)&=\;\;\;\left(\hat{H}^{e}_{1}(t),\ldots,\hat{H}^{e}_{J}(t)\right)'\;\;\mbox{with}\;\;e\in\{E,S\},\\ \hat{H}^{e}_{j}(t)&=\;\;\;\int_{0}^{t}\sqrt{\Gamma^{e}_{jj}(\alpha(s))}dB^{e}_{j}(s). \end{array} \right. \end{eqnarray}

3.2. A 3-stage users-selection and dynamic pricing/rate scheduling policy

In this subsection, we design a 3-stage users-selection, dynamic pricing, and rate scheduling policy through a 2-stage game-theoretic problem consisting of both zero-sum and non-zero-sum game-competitions myopically at each time point for the purpose as stated in the introduction of the paper.

3.2.1. General service capacity region

In our system, the jobs in the jth queue for each $j\in{\cal J}$ may be served at the same time by a random number, at most $V_{j}$ ($\leq V$), of service pools corresponding to selected utility (hash) functions at a particular time point. With this simultaneous service mechanism, the total service rate for the jth queue at the time point is the summation of the rates from all the pools that can possibly serve the jth queue. More precisely, we index these pools by a subset ${\cal V}(\,j)$ of the set ${\cal V}$ as follows,

(3.7)\begin{eqnarray} &&{\cal V}(\,j)\equiv\Big\{v_{1j},\ldots,v_{V_{j}j}\Big\}\subseteq{\cal V}, \end{eqnarray}

where $v_{lj}$ with $l\in\{1,\ldots,V_{j}\}$ denotes the $v_{lj}$th pool in ${\cal V}(\,j)$. In the same way, a pool denoted by $v\in{\cal V}$ can possibly serve at most $J_{v}$ job classes represented by a subset ${\cal J}(v)$ of the set ${\cal J}$, i.e.,

(3.8)\begin{eqnarray} &&{\cal J}(v)\equiv\Big\{\,j_{v1},\ldots,j_{vJ_{v}}\Big\}\subseteq{\cal J}, \end{eqnarray}

where $j_{vl}$ with $l\in\{1,\ldots,J_{v}\}$ indexes the $j_{vl}$th job class in ${\cal J}(v)$. To be more illustrative, let L be a J × V constituent matrix such that:

\begin{align*} L_{jv}=\left\{\begin{array}{ll} 1&\mbox{if}\;\;\mbox{user $j$ can be served by pool $v$},\\ 0&\mbox{otherwise}. \end{array} \right. \end{align*}

Then, ${\cal V}(\,j)$ consists of the indices of all the non-zero components of the jth row of L while ${\cal J}(v)$ consists of the indices of all the non-zero components of the vth column of L. Furthermore, in each pool v, there are $J_{v}$ flexible parallel servers with rate allocation vector:

(3.9)\begin{eqnarray} &&c_{v\cdot}(t)=(c_{j_{v1}}(t),\ldots,c_{j_{vJ_{v}}}(t))', \end{eqnarray}

where $c_{j_{vl}}(t)$ with $l\in\{1,\ldots,J_{v}\}$ is the service rate assigned to the $j_{vl}$th user at pool v and time t. Similarly, corresponding to each $l\in\{1,\ldots,J_{v}\}$, we will also denote the rate $c_{j_{vl}}(t)$ by $c_{vj}(t)$ for an index $j\in{\cal J}(v)$.

Note that, the vector in (3.9) takes values in a capacity region ${\cal R}_{v}(\alpha(t))$ driven by the FS-CTMC $\alpha=\{\alpha(t),t\in[0,\infty)\}$. For each given $i\in{\cal K}$ and $v\in{\cal V}$, the set ${\cal R}_{v}(i)$ is a convex region containing the origin and has $L_{v}$ $( \gt J_{v})$ boundary pieces (see e.g., the upper-left graph in Figure 2). In this region, every point is defined according to the associated users, i.e., $x=(x_{j_{v1}},\ldots,x_{j_{vJ_{v}}})$. On the boundary of ${\cal R}_{v}(i)$ for each $i\in{\cal K}$, $J_{v}$ of the pieces are $(J_{v}-1)$-dimensional linear facets along the coordinate axes. The remaining part, denoted by ${\cal O}_{v}(i)$, is located in the interior of $R^{J_{v}}_{+}$. It is called the capacity surface of ${\cal R}_{v}(i)$ and it has $B_{v}=L_{v}-J_{v}\;( \gt 0)$ linear or smooth curved facets $h_{vk}(c_{v\cdot},i)$ on $R_{+}^{J_{v}}$ for $k\in{\cal U}_{v}\equiv\{1,2,\ldots,B_{v}\}$, i.e.,

(3.10)\begin{eqnarray} &&{\cal R}_{v}(i)\equiv\left\{c_{v\cdot}\in R_{+}^{J_{v}}:\;h_{vk}(c_{v\cdot},i)\leq 0,\;k\in{\cal U}_{v}\right\}. \end{eqnarray}

Furthermore, if we define $C_{U_{v}}(i)$ to be the sum capacity upper bound for ${\cal R}_{v}(i)$, the facet in the center of ${\cal O}_{v}(i)$ is linear and is assumed to be a non-degenerate $(J_{v}-1)$-dimensional region. More precisely, it can be represented by

(3.11)\begin{eqnarray} &&h_{vk_{U_{v}}}(c_{v\cdot},i)=\sum_{j\in{\cal J}(v)}c_{j}-C_{U_{v}}(i), \end{eqnarray}

where $k_{U_{v}}\in{\cal U}_{v}$ is the index corresponding to $C_{U_{v}}(i)$. In addition, we suppose that any one of the $J_{v}$ linear facets along the coordinate axes forms a $(J_{v}-1)$-user capacity region associated with a particular group of $J_{v}-1$ users if the queue corresponding to the other user is empty. In the same manner, we can provide an interpretation for the $(J_{v}-l)$-user capacity region for each $l\in\{2,\ldots,J_{v}-1\}$.

Concerning the allocation of the service resources over the capacity regions to different users, we adopt the so-called head of line service discipline. Equivalently, the service goes to the packet at the head of the line for a serving queue where packets are stored in the order of their arrivals. The service rates are determined by a utility (or hash) function of the environmental state, the price for each user, and the number of packets in each of the queues. More precisely, for each state $i\in{\cal K}$, a price vector $p=(p_{1},\ldots,p_{J})$, and a queue length vector $q=(q_{1},\ldots,q_{J})'$, we define $\Lambda_{\cdot j}(pq,i)$ with $j\in{\cal J}$ to be the rate vector (in qubits/ps) of serving the jth queue at all its possible service pools, i.e.,

(3.12)\begin{eqnarray} &&\Lambda_{\cdot j}(pq,i)=c^{{\cal Q}(pq)}_{\cdot j}(i)=(c^{{\cal Q}(pq)}_{v_{1j}}(i),\ldots,c^{{\cal Q}(pq)}_{v_{V_{j}j}}(i)), \end{eqnarray}

where,

(3.13)\begin{eqnarray} &&{\cal Q}(pq)\equiv\{\,j\in{\cal J},q_{j}=0\}. \end{eqnarray}

Furthermore, let $\Lambda_{v\cdot}(pq,i)$ for each $v\in{\cal V}$ be the rate vector for all the users possibly served at service pool v, i.e.,

(3.14)\begin{eqnarray} &&\Lambda_{v\cdot}(pq,i)=c^{{\cal Q}(pq)}_{v\cdot}(i)=(c^{{\cal Q}(pq)}_{j_{v1}}(i),\ldots,c^{{\cal Q}(pq)}_{j_{vJ_{v}}}(i)). \end{eqnarray}

Thus, $c^{{\cal Q}(pq)}_{v_{lj}}(i)=c^{{\cal Q}(pq)}_{j}(i)$ if the pool index $v_{lj}\in{\cal V}(\,j)$ for an integer $l\in\{1,\ldots,V_{j}\}$ with $j\in{\cal J}$ while the total rate used in (2.11) can be represented by:

(3.15)\begin{eqnarray} &&\Lambda_{j}(P(s)Q(s),\alpha(s))=\sum_{v\in{\cal V}(\,j)}c_{vj}^{{\cal Q}(P(s)Q(s))}(\alpha(s)). \end{eqnarray}

In the end, we impose the convention that an empty queue should not be served. Then, for each $v\in{\cal V}$ and ${\cal Q}\subseteq{\cal J}$ (e.g., a set as given by (3.13)), we can define:

(3.16)\begin{align} c^{{\cal Q}}_{j_{vl}}(i)\equiv\left\{\begin{array}{ll} =0&\mbox{if}\;\;j_{vl}\in{\cal Q}\;\;\mbox{with}\;\;l\in\{1,\ldots,J_{v}\},\\ \gt 0&\mbox{if}\;\;j_{vl}\notin {\cal Q}\;\;\mbox{with}\;\;l\in\{1,\ldots,J_{v}\}, \end{array} \right. \end{align}
(3.17)\begin{align} \;\;\;\;\;\;\;c^{{\cal Q}}_{vj}(i)\equiv c^{{\cal Q}}_{j_{vl}}(i)\;\;\mbox{for some}\;\;j\in{\cal J}(v)\;\;\mbox{corresponding to each}\;\; l\in\{1,\ldots,J_{v}\}, \end{align}
(3.18)\begin{align} F^{v}_{{\cal Q}}(i)\equiv\bigg\{x\in{\cal R}_{v}(i):\;x_{j_{vl}}=0\;\; \mbox{for all}\;\;j_{vl}\in{\cal Q}\;\;\mbox{with}\;\; l\in\{1,\ldots,J_{v}\}\bigg\}. \end{align}

Therefore, for all ${\cal Q}$ such that $\emptyset\subsetneqq{\cal Q}\subseteq{\cal J}(v)$ corresponding to each $v\in{\cal V}$, if $c^{\cal Q}_{v\cdot}(i)$ is on the boundaries of the capacity region ${\cal R}_{v}(i)$, we have the following observation that:

(3.19)\begin{eqnarray} &\left\{\begin{array}{ll} \sum_{j\in{\cal J}(v)}c_{vj}^{\emptyset}(i)&\geq\;\;\;\sum_{j\in{\cal J}(v)}c_{vj}^{\cal Q}(i),\\ \sum_{j\in{\cal J}(v)\setminus{\cal Q}}c_{vj}^{\emptyset}(i)&\leq\;\;\;\sum_{j\in{\cal J}(v)\setminus{\cal Q}}c_{vj}^{\cal Q}(i), \end{array} \right. \end{eqnarray}

where $c^{\emptyset}_{v\cdot}(i)\in{\cal O}_{v}(i)$ and $\emptyset$ denotes the empty set. Typical examples of our capacity region are referred to the upper-left graph in Figure 2 for more details.
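As a toy instance of (3.10)-(3.11) (purely illustrative; the per-user caps and the sum capacity below are assumed numbers), a two-user pool whose capacity surface consists of two per-user rate caps together with the linear sum-capacity facet can be encoded and tested for feasibility as follows:

```python
def in_capacity_region(c, caps, C_U):
    """Membership test for a simple region of the form (3.10)-(3.11):
    c_j >= 0 and c_j <= caps[j] for each user served by the pool, and
    sum_j c_j <= C_U (the sum-capacity facet (3.11))."""
    if any(x < 0 for x in c):
        return False
    if any(x > cap for x, cap in zip(c, caps)):
        return False
    return sum(c) <= C_U

# Example: a two-user pool with per-user caps 3 and 4 and sum capacity 5.
print(in_capacity_region([2.0, 2.5], caps=[3.0, 4.0], C_U=5.0))  # True
print(in_capacity_region([3.0, 3.0], caps=[3.0, 4.0], C_U=5.0))  # False (sum exceeds 5)
```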

3.2.2. A dynamic pricing and scheduling policy with users-selection

For our purpose, we classify all the users into two types. More precisely, we first need to smartly choose the users to be served. In other words, at each time point and for each pool v, we intelligently select a set ${\cal M}(i,v)\equiv\{\,j_{v1}(i),\ldots,j_{vM_{v}}(i)\}$ of users to get into service with $j_{vl}\in{\cal J}$ and $l\in\{1,\ldots,M_{v}\}$ for a given positive integer $M_{v}\leq J_{v}$. Among these chosen users, we need to conduct the dynamic pricing while realizing optimal and fair resource allocation. Therefore, we design a strategy by mixing a saddle point and a static Pareto maximal-utility Nash equilibrium policy myopically at each time point t to a mixed zero-sum and non-zero-sum game problem for each state $i\in{\cal K}$ and a given valued queue length vector $pq=(p_{1}q_{1},\ldots,p_{J}q_{J})'$. Here we note that $p=(p_{1},\ldots,p_{J})'$ is a given price vector and $q=(q_{1},\ldots,q_{J})'$ is a given queue length vector such that $p_{j}=f_{j}(q_{j},i)$ as in (2.8) for each $j\in{\cal J}$ and $i\in{\cal K}$. The saddle point corresponds to the users’ selection, the Pareto optimality represents the full utilization of resources in the whole game system, and the Nash equilibrium represents the fairness to all the chosen users. More exactly, in this game, there are J users (players) associated with the J queues. Each of them has his own utility function $U_{vj}(p_{j}q_{j},c_{vj})$ with $j\in{\cal J}(v)$ and $v\in{\cal V}(\,j)$. This utility function is a generalization of the existing ones in Dai [Reference Dai6, Reference Dai8, Reference Dai9], Ye and Yao [Reference Ye and Yao28], and references therein. Examples of such a utility function are the so-called proportionally fair allocation, minimal potential delay allocation, and $(\beta,\alpha)$-proportionally fair allocation (see e.g., Ye and Yao [Reference Ye and Yao28]), which are widely used in internet protocols and communication systems. Every chosen user selects a policy to maximize his own utility function at each service pool v while the summation of all the users’ utility functions and the summation of the utility functions associated with the chosen users are also maximized. To wit, we can formulate a generalized users-selection, pricing, and resource-scheduling game problem by extending the ones in Examples 4.1–4.3 as follows,

(3.20)\begin{eqnarray} &&\left\{\begin{array}{ll} \max_{c_{v\cdot}\in F^{v}_{{\cal Q}}(i)}U_{00}(pq,c)&=\;\;\;U_{00}(pq,c^{*}(i)),\\ \max_{c_{v\cdot}\in F^{v}_{{\cal Q}}(i)}U_{0j}(pq,c)&=\;\;\;U_{0j}(pq,c^{*}(i)),\;j\in{\cal M}(i,v)\cap({\cal J}(v)\setminus{\cal Q}(q)),\\ \max_{c_{v\cdot}\in F^{v}_{{\cal Q}}(i)}(-U_{0j}(pq,c))&=\;\;\;-U_{0j}(pq,c^{*}(i)),\;j\in({\cal J}(v)\setminus{\cal Q}(q))\setminus{\cal M}(i,v) \end{array} \right. \end{eqnarray}

while we have that:

(3.21)\begin{eqnarray} &&\left\{\begin{array}{ll} \max_{c_{v\cdot}\in F^{v}_{{\cal Q}}(i),\;j\in{\cal M}(i,v)\bigcap\left({\cal J}(v)\setminus{\cal Q}(q)\right)}U_{vj}(pq,c)&=\;\;\;U_{vj}(pq,c^{*}(i)),\\ \max_{c_{v\cdot}\in F^{v}_{{\cal Q}}(i),\;j\in\left({\cal J}(v)\setminus{\cal Q}(q)\right)\setminus{\cal M}(i,v)}(-U_{vj}(pq,c))&=\;\;\;-U_{vj}(pq,c^{*}(i)). \end{array} \right. \end{eqnarray}

Note that, the rate vector c in (3.20)-(3.21) is given by:

\begin{align*} c=((c_{j_{11}},\ldots,c_{j_{1J_{1}}}),\ldots,(c_{j_{V1}},\ldots,c_{j_{VJ_{V}}})), \end{align*}

and the utility functions used in (3.20)-(3.21) are defined by:

\begin{align*} \left\{\begin{array}{ll} U_{00}(pq,c)&=\;\;\;\sum_{j\in{\cal J}(v)\setminus{\cal Q}(q)}\sum_{v\in{\cal V}(\,j)}U_{vj}(p_{j}q_{j},c_{vj}),\\ U_{0j}(pq,c)&=\;\;\;\sum_{v\in{\cal V}(\,j)}U_{vj}(p_{j}q_{j},c_{vj}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \mbox{for each}\;\;j\in{\cal J}(v)\setminus{\cal Q}(q),\\ U_{vj}(pq,c)&=\;\;\;U_{vj}(p_{j}q_{j},c_{vj}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \mbox{for each}\;\;j\in{\cal J}(v)\setminus{\cal Q}(q)\;\; \mbox{and}\;\;v\in{\cal V}(\,j). \end{array} \right. \end{align*}
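Once the first (zero-sum, saddle point) stage has fixed the set of users to be served, one generic way to approximate the Nash equilibrium rates of the second (non-zero-sum) stage numerically is best-response iteration: each selected user in turn maximizes their own utility over their own rate while the other rates are held fixed, until the allocation stabilizes. The sketch below is such a generic routine and not the paper's CNN-based algorithm; it assumes a single pool, concave single-variable utilities, the simple sum-capacity constraint (3.11), and a grid search in place of a proper line search:

```python
import numpy as np

def best_response_nash(utilities, C_U, iters=200, grid=400):
    """Approximate a Nash equilibrium for the selected users sharing one pool.

    utilities : list of concave functions u_j(c_j), e.g. lambda x: w_j * np.log(x)
    C_U       : sum-capacity bound of facet (3.11)
    In each pass, user j best-responds on (0, C_U - sum of the others' rates]."""
    J = len(utilities)
    c = np.full(J, C_U / (2 * J))                 # feasible interior starting point
    for _ in range(iters):
        for j in range(J):
            budget = C_U - (c.sum() - c[j])       # capacity left for user j
            xs = np.linspace(budget / grid, budget, grid)
            c[j] = xs[np.argmax([utilities[j](x) for x in xs])]
    return c

# Illustrative use with proportionally fair utilities and weights 2 and 1:
# best_response_nash([lambda x: 2 * np.log(x), lambda x: np.log(x)], C_U=5.0)
```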

Then, by extending the concepts of Nash equilibrium point, saddle point, and Pareto optimality in Dai [Reference Dai8, Reference Dai9], Marchi [Reference Marchi19], Nash [Reference Nash21] and Rosen [Reference Rosen25], we have the following definition concerning a utility based 2-stage game-theoretic policy via a saddle point to a zero-sum game problem and a static Pareto maximal-utility Nash equilibrium point to a non-zero-sum game problem myopically at each particular time point.

Definition 3.2. For each state $i\in{\cal K}$, a price vector $p\in R^{J}_{+}$, and a queue length vector $q\in R^{J}_{+}$ such that $(2.8)$ is satisfied, we call the rate vector:

\begin{align*} c^{*}(i)\in F_{{\cal Q}(q)}(i)\equiv F^{1}_{{\cal Q}(q)}(i)\times\ldots\times F^{V}_{{\cal Q}(q)}(i), \end{align*}

a utility based 2-stage game-theoretic policy if it is a solution to the game problem in (3.20)-(3.21), which is obtained by firstly obtaining a saddle point to a zero-sum game problem and secondly obtaining a static Pareto maximal-utility Nash equilibrium point to a non-zero-sum game problem, such that, for each $j\in{\cal J}(v)\setminus{\cal Q}(q)$ and any given $c(i)\in F_{{\cal Q}(q)}(i)$, the following facts are true,

\begin{align*} \left\{\begin{array}{ll} U_{00}(pq,c^{*}(i))&\geq\;\;\;U_{00}(pq,c(i)),\\ U_{vj}(pq,c^{*}(i))&\geq\;\;\;U_{vj}(pq,c^{*}_{\cdot-j}(i))\;\;\;\mbox{if}\;\;j\in{\cal M}(i,v)\cap({\cal J}(v)\setminus{\cal Q}(q)),\;v\in\{0\}\cup{\cal V}(\,j),\\ -U_{vj}(pq,c^{*}(i))&\geq\;\;\;-U_{vj}(pq,c^{*}_{\cdot-j}(i))\;\;\mbox{if}\;\;j\in({\cal J}(v)\setminus{\cal Q}(q))\setminus{\cal M}(i,v),\;v\in\{0\}\cup{\cal V}(\,j),\\ c^{*}_{\cdot -j}(i)&\equiv\;\;\;(c_{\cdot 1}^{*}(i),\ldots,c^{*}_{\cdot j-1}(i),c_{\cdot j}(i),c^{*}_{\cdot j+1}(i),\ldots,c^{*}_{\cdot J}(i)). \end{array} \right. \end{align*}

3.3. Main theorem under the policy

Before stating our main theorem, we first introduce another concept, the so-called 2-stage dual-cost game-theoretic policy, via a saddle point to a zero-sum game problem and a minimal-dual-cost Pareto Nash equilibrium point to a non-zero-sum game problem myopically at each given time point for a given price parameter $p\in R_{+}^{J}$. Then, based on the 2-stage dual-cost game-theoretic policy, we can inversely obtain the price vector and determine the target rate vector. To do so, we formulate a 2-stage minimal-dual-cost game problem in (3.22), which corresponds to the utility based one in (3.20)-(3.21). More precisely, for a given $i\in{\cal K}$, a price parameter $p\in R_{+}^{J}$, a rate vector $c\in {\cal R}(i)\equiv{\cal R}_{1}(i)\times\ldots\times {\cal R}_{V}(i)$, and a parameter $w\geq 0$, the 2-stage minimal-dual-cost game problem can be presented as follows:

(3.22)\begin{eqnarray} &&\left\{\begin{array}{ll} \min_{q\in R^{J}_{+}}C_{00}(pq,c),\\ \min_{q_{j}\in R_{+},\;j\in{\cal M}(i,v)\bigcap\left({\cal C}(c)\bigcap{\cal J}(v)\right)}C_{vj}(pq,c),\\ \min_{q_{j}\in R_{+},\;j\in\left({\cal C}(c)\bigcap{\cal J}(v)\right)\setminus{\cal M}(i,v)}(-C_{vj}(pq,c)), \end{array} \right. \end{eqnarray}

subject to

\begin{align*} \sum_{j\in{\cal M}(i,v)\cap{\cal C}(c)}\frac{q_{j}}{\mu_{j}}\geq w, \end{align*}

where, the cost function $C_{vj}(pq,c)$ for each $j\in{\cal J}(v)$ and $v\in\{0\}\cup{\cal V}(\,j)$ is defined by:

\begin{align*} \left\{\begin{array}{ll} C_{00}(pq,c)&=\;\;\;\sum_{j\in{\cal C}(c)\bigcap{\cal J}(v)}\sum_{v\in{\cal V}(\,j)}C_{j}(p_{j}q_{j},c_{vj}),\\ C_{0j}(pq,c)&=\;\;\;\sum_{v\in{\cal V}(\,j)}C_{vj}(p_{j}q_{j},c_{vj}),\\ C_{vj}(pq,c)&=\;\;\;C_{vj}(p_{j}q_{j},c_{vj})=\frac{1}{\mu_{j}}\int_{0}^{q_{j}}\frac{\partial U_{vj}(p_{j}u,c_{vj})}{\partial c_{vj}}du\;\;\mbox{for} \;\;j\in{\cal C}(c)\cap{\cal J}(v),\;v\in{\cal V}(\,j), \end{array} \right. \end{align*}

and ${\cal C}(c)$ is an index set associated with the non-zero rates and non-empty queues, i.e.,

\begin{align*} {\cal C}(c)\equiv\bigg\{\,j:c_{\cdot j}\neq 0\;\;\mbox{componentwise with}\;\;j\in{\cal J}\bigg\}. \end{align*}

In other words, if the environment is in state $i\in{\cal K}$, we try to find a queue state q for a given $c\in{\cal R}(i)$, a price parameter vector $p\in R_{+}^{J}$, and a given parameter $w\geq 0$ such that the individual user’s dual-costs and the total dual-cost over the system are all minimized at the same time while the (average) workload meets or exceeds w. Then, we have the following definitions.

Definition 3.3. For each state $i\in{\cal K}$, a price vector $p\in R_{+}^{J}$, and a rate vector $c(i)\in{\cal R}(i)$, a queue length vector $q^{*}\in R^{J}_{+}$ is called a dual-cost based 2-stage game-theoretic policy if it is a solution to the game problem in (3.22), which is obtained by firstly obtaining a saddle point to a zero-sum game problem and secondly obtaining a static Pareto minimal-dual cost Nash equilibrium point to a non-zero-sum game problem, such that, for each $j\in{\cal C}(c)$, $v\in\{0\}\cup{\cal V}$, and any given $q\in R^{J}_{+}$ with $q_{j}=0$ when $j\in{\cal J}\setminus {\cal C}(c)$, we have that:

(3.23)\begin{eqnarray} &&\left\{\begin{array}{ll} C_{00}(pq^{*},c(i))&\leq\;\;\;C_{00}(pq,c(i)),\\ C_{vj}(pq^{*},c(i))&\leq\;\;\;C_{vj}(pq^{*}_{-j},c(i))\;\;\mbox{if}\;\;j\in{\cal M}(i,v)\cap({\cal C}(c)\cap{\cal J}(v)),\\ -C_{vj}(pq^{*},c(i))&\leq\;\;\;-C_{vj}(pq^{*}_{-j},c(i))\;\;\mbox{if}\;\;j\in({\cal C}(c)\cap{\cal J}(v))\setminus{\cal M}(i,v),\\ q^{*}_{-j}&\equiv\;\;\;(q_{1}^{*},\ldots,q^{*}_{j-1},q_{j},q^{*}_{j+1},\ldots,q^{*}_{J}). \end{array} \right. \end{eqnarray}

Note that, once we obtain the queue policy point $q^{*}$ with respect to the given price vector p from Definition 3.3, we can inversely deduce the corresponding price policy vector p in terms of $q^{*}$, i.e., $p=g(q^{*})$ as in (2.8). This relationship can be used to design iterative algorithms in our numerical simulations. Furthermore, in Definition 3.3, we have used the stricter concept of a “Pareto optimal Nash equilibrium point”; this concept can be relaxed to a “Pareto optimal point” and the related theoretical discussion remains true. In certain cases, and when it is necessary, we can shift the Pareto optimal point to the Pareto optimal Nash equilibrium point by some mapping techniques.
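As a concrete illustration of the dual cost in (3.22) (an assumed example, not the paper's specific choice): if user $j$ adopts the proportionally fair utility $U_{vj}(p_{j}q_{j},c_{vj})=p_{j}q_{j}\log c_{vj}$, then $\partial U_{vj}(p_{j}u,c_{vj})/\partial c_{vj}=p_{j}u/c_{vj}$ and

\begin{align*} C_{vj}(p_{j}q_{j},c_{vj})=\frac{1}{\mu_{j}}\int_{0}^{q_{j}}\frac{p_{j}u}{c_{vj}}\,du=\frac{p_{j}q_{j}^{2}}{2\mu_{j}c_{vj}}, \end{align*}

so minimizing this dual cost trades a small priced queue against a large allocated rate.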

Definition 3.4. Let $\hat{Q}^{r,(P,G)}(\cdot)$ and $\hat{W}^{r,(P,G)}(\cdot)$ be the diffusion-scaled queue length and workload processes, respectively, under an arbitrarily feasible dynamic pricing and rate scheduling policy (P, G) satisfying the Lipschitz condition in (2.8). A vector process $\hat{Q}(\cdot)$ is called an asymptotic dual-cost based 2-stage game-theoretic policy globally over the whole time horizon if, for any $t\geq 0$ and $v\in\{0\}\cup{\cal V}(\,j)$ with $j\in{\cal J}$, we have that:

(3.24)\begin{eqnarray} &&\liminf_{r\rightarrow\infty}C_{00}(P(t)\hat{Q}^{r,(P,G)}(t),\rho_{j}(\alpha(t)))\geq C_{00}(P(t)\hat{Q}(t),\rho_{j}(\alpha(t))). \end{eqnarray}

Furthermore, for each $j\in{\cal M}(\alpha(t),v,t)\cap({\cal C}(c)\cap{\cal J}(v))$, we have that:

(3.25)\begin{eqnarray} &&\liminf_{r\rightarrow\infty}C_{vj}(P(t)\hat{Q}_{-j}^{r,(P,G)}(t),\rho_{j}(\alpha(t)))\geq C_{vj}(P(t)\hat{Q}(t),\rho_{j}(\alpha(t))). \end{eqnarray}

In addition, for each $j\in({\cal C}(c)\cap{\cal J}(v))\setminus{\cal M}(\alpha(t),v,t)$, we have that:

(3.26)\begin{eqnarray} &&\liminf_{r\rightarrow\infty}\left(-C_{vj}(P(t)\hat{Q}_{-j}^{r,(P,G)}(t),\rho_{j}(\alpha(t)))\right)\geq -C_{vj}(P(t)\hat{Q}(t),\rho_{j}(\alpha(t))). \end{eqnarray}

Note that, in (3.25)-(3.26) and for each $j\in{\cal J}$, we have that:

(3.27)\begin{eqnarray} \hat{Q}_{-j}^{r,(P,G)}(t)&=&(\hat{Q}_{1}(t),\ldots,\hat{Q}_{j-1}(t),\hat{Q}_{j}^{r,(P,G)}(t),\hat{Q}_{j+1}(t),\ldots,\hat{Q}_{J}(t)). \end{eqnarray}

Next, let $q^{*}(w,p,\rho(i))$ be a dual-cost based 2-stage game-theoretic policy corresponding to the game problem in (3.22) in terms of each given number $w\geq 0$, $p\in R_{+}^{J}$, and $i\in{\cal K}$ at a given time t. Furthermore, let $p(w,q^{*},\rho(i))$ denote its corresponding inverse price vector with respect to $q^{*}$ and construct price policy vector:

(3.28)\begin{eqnarray} &&p^{*}(w,q^{*},\rho(i))=f(p(w,q^{*},\rho(i))), \end{eqnarray}

such that the Lipschitz condition in (2.8) is satisfied. Then, our main theorem can be presented as follows.

Theorem 3.5. For the game-competition based users-selection, dynamic pricing, and scheduling policy determined by (3.20)-(3.22) and (3.28) with $Q^{r}(0)=0$ and conditions (6.3)-(6.8) (that will be detailed in Section 6), the convergence in (3.2) is true. Furthermore, the limit queue length $\hat{Q}(\cdot)$ and the total workload $\hat{W}(\cdot)$ in (3.2) have the relationship:

(3.29)\begin{align} \left\{\begin{array}{ll} \hat{Q}(t)&=\;\;\;q^{*}(\hat{W}(t),\hat{P}(t),\rho(\alpha(t))),\\ \hat{P}(t)&=\;\;\;p^{*}(\hat{W}(t),\hat{Q}(t),\rho(\alpha(t))), \end{array} \right. \end{align}

where $\hat{P}(\cdot)$ is the inverse price vector process defined through (3.28) and $\hat{W}(\cdot)$ is a 1-dimensional RDRS in the strong sense with:

(3.30)\begin{align} \left\{\begin{array}{ll} b(i,t)&=\;\;\;\sum_{j\in\bigcup_{v\in{\cal V}}{\cal M}(i,v,t)}\frac{\theta_{j}(i)}{\mu_{j}}, \\ \sigma^{E}(t)&=\;\;\;\sigma^{S}(t)=\left(\hat{\sigma}_{1}(t),\ldots,\hat{\sigma}_{J}(t)\right), \\ \hat{\sigma}_{j}(t)&=\;\;\;\left\{\begin{array}{ll} \frac{1}{\mu_{j}}&\mbox{if}\;\;j\in\bigcup_{v\in{\cal V}}{\cal M}(i,v,t),\\ 0&\mbox{otherwise,} \end{array} \right.\\ R(i,t)&=\;\;\;1 \end{array} \right. \end{align}

for $t\in[0,\infty)$ and some constant $\theta_{j}(i)$ for each $j\in\bigcup_{v\in{\cal V}}{\cal M}(i,v,t)$. In addition, there is a common supporting probability space, under which and with probability one, the limit queue length $\hat{Q}(\cdot)$ is an asymptotic dual-cost based 2-stage game-theoretic policy globally over the time interval $[0,\infty)$. Finally, the limit workload $\hat{W}(\cdot)$ is also asymptotically minimal in the sense that:

(3.31)\begin{align} \liminf_{r\rightarrow\infty}\hat{W}^{r,(P,G)}(t)\geq\hat{W}(t). \end{align}

The proof of Theorem 3.5 will be given in Section 6.

3.4. CNN-based algorithm flow chart

Based on the policy derived in Theorem 3.5, we can design a CNN based algorithm flow chart in Figure 6.

Figure 6. CNN based algorithm flow chart.

More precisely, for a constant $T\in[0,\infty)$, we divide the interval $[0,T]$ equally into n subintervals $\{[t_{i},t_{i+1}],i\in\{0,1,\ldots,n-1\}\}$ with $t_{0}=0$, $t_{n}=T$, and $\Delta t_{i}=t_{i+1}-t_{i}=\frac{T}{n}$. Furthermore, let

(3.32)\begin{eqnarray} &&\Delta F(t_{i})\equiv F(t_{i})-F(t_{i-1}), \end{eqnarray}

for each process $F(\cdot)\in\{B^{E}(\cdot),B^{S}(\cdot),\hat{W}(\cdot),\hat{Y}(\cdot)\}$. Then, we can develop an iterative procedure as shown in Figure 6 to conduct RDRS model based simulation studies and to illustrate the efficiency of our designed policy. In this algorithm, the main part is the policy computing concerning users-selection, dynamic pricing via queueing strategy, and rate scheduling in the federated learning center at each time $t_{i+1}$ for given initial conditions at $t_{0}$. As shown in Figure 2 and as mentioned in the Introduction of this paper, our game-theoretic problem is a J × V-dimensional problem. In general, an explicit solution is not available. Thus, we need a numerical method as designed in Figure 6 to solve the 2-stage game-theoretic problem. The target is to get an associated saddle point to its corresponding zero-sum game problem and a Nash equilibrium point to its non-zero-sum game problem. In a real-world system, J and V may take large values and our CNN algorithm exhibits big model behavior. Since our designed platform is a cloud computing (or even the near future quantum-cloud computing) based one, this big model can be effectively solved. However, to illustrate the usage of our designed algorithm and demonstrate the efficiency of our designed policy, we choose smaller J and V to conduct simulation case studies, which are presented in Sections 4–5.
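
To make the time discretization in (3.32) concrete, the following is a minimal Python sketch of the Euler-type recursion that the simulation loop in Figure 6 is built around: it advances a 1-dimensional reflected workload path over the n subintervals and records the regulator increments. The function name, the constant drift and diffusion coefficients, and the simple reflection step are illustrative assumptions only; in the actual algorithm, b and σ are the environment-dependent coefficients in (3.30) and the policy computation of Stages 0-III is carried out at every grid point.

\begin{verbatim}
import numpy as np

def simulate_workload(T=200.0, n=5000, b=-0.5, sigma=1.0, w0=0.0, seed=0):
    # Euler-type recursion for a 1-dimensional reflected diffusion on [0, T].
    # b and sigma are held constant here for illustration; in (3.30) they
    # depend on the environment state alpha(t).
    rng = np.random.default_rng(seed)
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    W = np.zeros(n + 1)   # reflected workload path
    Y = np.zeros(n + 1)   # regulator (reflection) process
    W[0] = w0
    for i in range(n):
        # free (unreflected) increment over [t_i, t_{i+1}]
        free = W[i] + b * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # Skorokhod reflection at zero: push up just enough to keep W >= 0
        push = max(0.0, -free)
        W[i + 1] = free + push
        Y[i + 1] = Y[i] + push
    return t, W, Y
\end{verbatim}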

Finally, note that the limit total workload $\hat{W}(t_{i+1})$ in Figure 6 can be replaced by the total workload in a real-world system if the input load of the system is close to heavy traffic (as will be detailed in Section 6). Furthermore, for the CNN algorithm flow chart designed in Figure 6, there are actually 4 processing stages in the federated learning center. The 2-stage game-theoretic problem is divided into 3 stages with an additional Stage II(b) to handle the dynamic pricing. Stage III corresponds directly to the integral relationship between the utility functions and their dual cost functions in (3.22).

4. Three illustrative policy examples

To be more illustrative and as a preparation for the following simulation case studies, we here present three examples: a 2-stage dynamic pricing and rate scheduling example and a 3-stage users-selection, dynamic pricing, and rate scheduling example, both based on a single-pool service system as shown in Figure 2, together with a 2-pool variant. For the single-pool examples, we will omit the pool index v.

4.1. The first example

The first example corresponds to a 2-user case as shown in Figure 7, which can be used to model the DaiCoin based digital payment system with two types of Ethereums (corresponding to two DaiCoins) as in Maker [18]. It can also be used to model a MIMO wireless communication channel (i.e., a single base station equipped with two antennas) shared by two users or a quantum computer with two eigenmodes (see e.g., Dai [Reference Dai9, Reference Dai12]). In this case, we are interested in the problem of how to price the two users and how to allocate the computing rate (i.e., power) resource cooperatively inside a service system. More precisely, we take V = 1 with J = 2 and assume that the state space of the FS-CTMC $\alpha(t)$ defined in Subsection 2 consists only of a single state (i.e., $\alpha(t)\equiv 1$ for all $t\in[0,\infty)$). In a MIMO wireless environment, this case is associated with the so-called pseudo static channels. Then, the capacity region denoted by ${\cal R}$ is supposed to be a non-degenerate convex one confined by five boundary lines including the two on the x-axis and y-axis as shown in Figure 7. The capacity upper bound of the region satisfies $c_{1}+c_{2}=2,000$. This region corresponds to a degenerate (fixed) special case of the generally randomized MIMO wireless channel in Dai [Reference Dai6]. For each price vector $p=(p_{1},p_{2})\in R_{+}^{2}=[0,\infty)\times[0,\infty)$ corresponding to the process P(t) in (2.8) and each queue length vector $q=(q_{1},q_{2})\in R_{+}^{2}=[0,\infty)\times[0,\infty)$ corresponding to the process Q(t) defined in (2.7) at a particular time point, we take the utility functions in terms of the rate vector $c=(c_{1},c_{2})\in{\cal R}$ for user 1 and user 2, respectively, as:

(4.1)\begin{eqnarray} &&U_{1}(pq,c)=U_{1}(p_{1}q_{1},c_{1})=p_{1}q_{1}\ln(c_{1}),\;\;\;U_{2}(pq,c)=U_{2}(p_{2}q_{2},c_{2})=-\frac{(p_{2}q_{2})^{2}}{c^{2}_{2}}, \end{eqnarray}

Figure 7. A 2-stage processing flow chart of dynamic pricing and rate scheduling for a single pool service system with 2-users.

where $\ln(\cdot)$ is the logarithm function with base e. Note that the utility functions U 1 and U 2 in (4.1) are called proportionally fair and minimal potential delay allocations, respectively, which are widely used in communication systems. Furthermore, in a (quantum) blockchain system, these utility functions can be considered as generalized hash functions to replace the currently used random number generators in generating partial nonce values and private keys, i.e., $((p_{1},p_{2}),(q_{1},q_{2}),(c_{1},c_{2}))$.

Example 4.1. For the upper graph case in Figure 7 and by the utility functions in (4.1), we can propose a 2-stage pricing and rate-scheduling policy at each time point $t\in[0,\infty)$ by a Pareto maximal-utility Nash equilibrium point to the following non-zero-sum game problem:

(4.2)\begin{eqnarray} &&\max_{c\in{\cal R}}U_{j}(pq,c)\;\;\mbox{for each}\;\;j\in\{0,1,2\}\;\;\mbox{and a fixed}\;\;pq\in R_{+}^{2}, \end{eqnarray}

where $U_{0}(pq,c)=U_{1}(pq,c)+U_{2}(pq,c)$. To wit, if $c^{*}=(c_{1}^{*},c_{2}^{*})$ is a solution to the game problem in (4.2), we have that:

(4.3)\begin{eqnarray} &&\left\{\begin{array}{ll} U_{0}(pq,c^{*})\geq U_{0}(pq,c),&\\ U_{1}(pq,c^{*})\geq U_{1}(pq,c^{*}_{-1})&\mbox{with}\;\;\;c^{*}_{-1}=(c_{1},c^{*}_{2}),\\ U_{2}(pq,c^{*})\geq U_{2}(pq,c^{*}_{-2})&\mbox{with}\;\;\;c^{*}_{-2}=(c^{*}_{1},c_{2}). \end{array} \right. \end{eqnarray}

Furthermore, it follows from the inequalities in (4.3) that, if a game player’s (i.e., a user’s) rate service policy is unilaterally changed, his utility cannot be improved.
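
Since the outer boundary of the capacity region in Figure 7 is the line $c_{1}+c_{2}=2,000$ and both utilities in (4.1) are increasing in the user's own rate, the total-utility maximizer on that line also satisfies the unilateral-deviation inequalities in (4.3). The following minimal Python sketch finds this point by a one-dimensional bisection on the derivative of $U_{0}$; the routine, its tolerance, and the default capacity bound are illustrative assumptions and are not the CNN-based computation of Section 3.

\begin{verbatim}
def two_user_rate_allocation(p1q1, p2q2, C=2000.0, tol=1e-8):
    # Maximize U0(c1) = p1q1*ln(c1) - (p2q2)^2/(C - c1)^2 over 0 < c1 < C.
    # The derivative below is strictly decreasing in c1, so bisection on it
    # locates the unique maximizer, i.e., the point c* in (4.3).
    def dU0(c1):
        return p1q1 / c1 - 2.0 * (p2q2 ** 2) / (C - c1) ** 3

    lo, hi = tol, C - tol
    if dU0(hi) >= 0.0:   # utility still increasing at the right end
        return hi, C - hi
    if dU0(lo) <= 0.0:   # utility already decreasing at the left end
        return lo, C - lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dU0(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    c1 = 0.5 * (lo + hi)
    return c1, C - c1
\end{verbatim}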

Remark 4.2. Due to the non-degenerate assumption imposed in (3.10)-(3.11), the strictly positive outer normal vector ζ to the facet ${\cal O}_{v}$ with v = 1 at the point $\rho^{*}_{v}$ exists, which is as shown in Figure 7. Hence, the so-called complete resource pooling condition (CRP) as introduced in Subsubsection 6.1.2 holds. In this case, a so-called fixed point defined as a Pareto minimal Nash equilibrium point to a dual-cost non-zero-sum game problem can be explicitly constructed, which is presented in (5.5). Interested readers are also referred to Dai [Reference Dai8], Ye and Yao [Reference Ye and Yao28] for more related discussions.

4.2. The second example

The second example is obtained by adding Stage 0 for users’ selection in Figure 8. Compared with the first case with J = 2, we here consider a 3-user case (i.e., J = 3) and add one more user-selection layer. At each time point, we choose two of the three users for service according to a zero-sum game competition policy. When any two users $i,j\in\{1,2,3\}$ with $i\neq j$ are selected, they will be served based on a non-zero-sum game competition policy. The capacity upper bound of the corresponding capacity region satisfies $c_{i}+c_{j}=2,000$ as in the first 2-user case. Furthermore, suppose that, at a particular time point, there is a price vector $p=(p_{1},p_{2},p_{3})\in R_{+}^{3}$ corresponding to the process P(t) in (2.8) and a queue length vector $q=(q_{1},q_{2},q_{3})\in R_{+}^{3}$ corresponding to the process Q(t) defined in (2.7). Then, for each $(c_{i},c_{j})\in{\cal R}$, the corresponding utility functions are taken as in (4.1) if $i,j\in\{1,2\}$. However, if i = 3 or j = 3, the corresponding utility function is taken to be the following one,

(4.4)\begin{eqnarray} &&U_{3}(pq,c)=U_{3}(p_{3}q_{3},c_{3})=p_{3}q_{3}\ln(c_{3}). \end{eqnarray}

Example 4.3. For the second case corresponding to both the upper and lower graphs in Figure 8 and by the utility functions in (4.1) and (4.4), we can design a 3-stage users-selection, pricing and rate-scheduling policy myopically at each time point $t\in[0,\infty)$, which involves two steps as follows. First, we choose two users for service by a saddle point policy via the solution to the zero-sum game problem,

(4.5)\begin{eqnarray} &&\;\;\;\;\max_{c\in{\cal R}}U_{0}(pq,c),\;\max_{c\in{\cal R}}U_{0j}(pq,c),\;\max_{c\in{\cal R}}\left(-U_{0j_{1}}(pq,c)\right),\;\max_{c\in{\cal R}}\left(-U_{0j_{2}}(pq,c)\right), \end{eqnarray}

Figure 8. A 3-stage processing flow chart of users-selection, dynamic pricing, and rate scheduling for a single pool service system with 3-users.

for each $j\in\{1,2,3\}$, $j_{1}\in\{1,2,3\}\setminus\{\,j\}$, $j_{2}\in\{1,2,3\}\setminus\{\,j,j_{1}\}$, and a fixed $pq\in R_{+}^{3}$, where,

(4.6)\begin{eqnarray} &&\left\{\begin{array}{ll} U_{0}(pq,c)&=\;\;U_{1}(pq,c)+U_{2}(pq,c)+U_{3}(pq,c),\\ U_{01}(pq,c)&=\;\;U_{1}(pq,c)+U_{2}(pq,c),\\ U_{02}(pq,c)&=\;\;U_{1}(pq,c)+U_{3}(pq,c),\\ U_{03}(pq,c)&=\;\;U_{2}(pq,c)+U_{3}(pq,c). \end{array} \right. \end{eqnarray}

In other words, if $c^{*}=(c_{1}^{*},c_{2}^{*},c_{3}^{*})$ is a solution to the game problem in (4.5), and if

\begin{align*} c^{*}_{-j}=\left\{\begin{array}{ll} (c_{j},c^{*}_{j_{1}},c^{*}_{j_{2}})&\mbox{if}\;\;j=1,\\ (c^{*}_{j_{1}},c_{j},c^{*}_{j_{2}})&\mbox{if}\;\;j=2,\\ (c^{*}_{j_{1}},c^{*}_{j_{2}},c_{j})&\mbox{if}\;\;j=3, \end{array} \right. \end{align*}

then, for a fixed $pq\in R_{+}^{3}$, we have that:

(4.7)\begin{eqnarray} &&\left\{\begin{array}{ll} U_{0}(pq,c^{*})&\geq\;\;U_{0}(pq,c),\\ U_{0j}(pq,c^{*})&\geq\;\;U_{0j}(pq,c^{*}_{-j}),\\ -U_{0j_{1}}(pq,c^{*})&\geq\;\;-U_{0j_{1}}(pq,c^{*}_{-j_{1}}),\\ -U_{0j_{2}}(pq,c^{*})&\geq\;\;-U_{0j_{2}}(pq,c^{*}_{-j_{2}}). \end{array} \right. \end{eqnarray}

Second, when two users corresponding to the summation $U_{0j}=U_{k}+U_{l}$ for an index $j\in\{1,2,3\}$ with two associated indices $k,l\in\{1,2,3\}$ as in one of (4.6) are selected, we can propose a 2-stage pricing and rate-scheduling policy at each time point by a Pareto maximal-utility Nash equilibrium point to the non-zero-sum game problem for a fixed $pq\in R_{+}^{3}$,

(4.8)\begin{align} \max_{c\in{\cal R}}U_{0j}(pq,c),\;\;\max_{c\in{\cal R}}U_{k}(pq,c),\;\;\max_{c\in{\cal R}}U_{l}(pq,c). \end{align}

To wit, if $c^{*}=(c_{k}^{*},c_{l}^{*})$ is a solution to the game problem in (4.8), we have that:

(4.9)\begin{eqnarray} &&\left\{\begin{array}{ll} U_{0j}(pq,c^{*})\geq U_{0j}(pq,c),&\\ U_{k}(pq,c^{*})\geq U_{k}(pq,c^{*}_{-k})&\mbox{with}\;\;\;c^{*}_{-k}=(c_{k},c^{*}_{l}),\\ U_{l}(pq,c^{*})\geq U_{l}(pq,c^{*}_{-l})&\mbox{with}\;\;\;c^{*}_{-l}=(c^{*}_{k},c_{l}). \end{array} \right. \end{eqnarray}
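
To illustrate the two-layer structure of Example 4.3 numerically, one can enumerate the three candidate pairs, compute each pair's Pareto maximal allocation on its boundary $c_{k}+c_{l}=2,000$ by a simple grid search, and then serve the selected pair. The Python sketch below does exactly that; note that picking the pair with the largest achievable total utility is only a heuristic stand-in for the saddle point defined through (4.5)-(4.7), and the function names, the grid size, and the selection rule are illustrative assumptions.

\begin{verbatim}
import numpy as np

def pair_allocation(Uk, Ul, C=2000.0, grid=20000):
    # Grid search for the maximizer of Uk(ck) + Ul(C - ck) over 0 < ck < C.
    ck = np.linspace(C / grid, C - C / grid, grid)
    total = Uk(ck) + Ul(C - ck)
    i = int(np.argmax(total))
    return ck[i], C - ck[i], total[i]

def select_and_allocate(p, q, C=2000.0):
    # Stage 0 (users-selection) followed by the rate allocation stage:
    # keep the pair with the largest achievable total utility (a heuristic)
    # and serve it at its Pareto maximal allocation.
    U = {1: lambda c: p[0] * q[0] * np.log(c),        # proportionally fair, (4.1)
         2: lambda c: -(p[1] * q[1]) ** 2 / c ** 2,   # minimal potential delay, (4.1)
         3: lambda c: p[2] * q[2] * np.log(c)}        # proportionally fair, (4.4)
    best = None
    for k, l in [(1, 2), (1, 3), (2, 3)]:
        ck, cl, val = pair_allocation(U[k], U[l], C)
        if best is None or val > best[2]:
            best = ((k, l), {k: ck, l: cl}, val)
    return best  # selected pair, its rates, and the pair's total utility
\end{verbatim}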

4.3. The third example

The third example is corresponding to a case with 2 pools (i.e., V = 2) as shown in Figure 9.

Figure 9. A 3-stage processing flow chart of users-selection, dynamic pricing, and rate scheduling for a service system with 2-pools and 3-users.

In reality, this system corresponds to a MIMO wireless communication system with two base stations. Each of the base stations is equipped with a single antenna. However, the two base stations can cooperate with each other to form a transmission channel with a capacity region as shown in Figure 9. The rest of the illustration is similar to the one for the second example. Hence, we omit it here.

5. Simulation case studies via RDRS models

In this section, we conduct simulation case studies for Examples 4.1–4.3 presented in Section 4. The simulation for the third example with 2 pools in Section 4 is similar to the one for Example 4.3. Hence, we omit it here. The main point of these simulation studies is to illustrate that our policies proposed in these examples outperform several alternative policies in certain ways. The policies used for the purpose of comparisons include an existing constant pricing policy, an existing 2D-Queue policy, a newly designed random users-selection stochastic pooling policy, and an arbitrarily selected dynamic pricing policy. As mentioned in Section 4, Examples 4.1 and 4.3 correspond to a single-pool system with two users and three users, respectively. Thus, we will omit all the related pool indices v. In an associated real-world system, the parameter vectors p and q in (4.2) (or (4.5)) are the randomly evolving pricing process P(t) in (2.8) and the queue length process Q(t) in (2.7). Hence, the concern of this section is how to employ the RDRS performance model in Definition 3.1 to evaluate the usefulness of our proposed myopic users-selection, dynamic pricing, and scheduling policies globally over the whole time horizon $[0,\infty)$ for Examples 4.1 and 4.3. To interpret our numerical simulation implementations, we first identify the corresponding dual-cost functions $C_{j}(q,c)$ as defined in (3.22) with $j\in\{1,2,3\}$ for the associated $U_{j}(q,c)$ given in (4.1) and (4.4). More precisely,

(5.1)\begin{eqnarray} &&\left\{\begin{array}{ll} C_{1}(p_{1}q_{1},c_{1})&=\;\;\frac{1}{\mu_{1}}\int_{0}^{q_{1}}\frac{\partial U_{1}(p_{1}u,c_{1})}{\partial c_{1}}du\;\;=\;\;\frac{(p_{1}q_{1})^{2}}{2\mu_{1}c_{1}},\\ C_{2}(p_{2}q_{2},c_{2})&=\;\;\frac{1}{\mu_{2}}\int_{0}^{q_{2}}\frac{\partial U_{2}(p_{2}u,c_{2})}{\partial c_{2}}du\;\;=\;\;\frac{2(p_{2}q_{2})^{3}}{3\mu_{2}c^{3}_{2}},\\ C_{3}(p_{3}q_{3},c_{3})&=\;\;\frac{1}{\mu_{3}}\int_{0}^{q_{3}}\frac{\partial U_{3}(p_{3}u,c_{3})}{\partial c_{3}}du\;\;=\;\;\frac{(p_{3}q_{3})^{2}}{2\mu_{3}c_{3}}, \end{array} \right. \end{eqnarray}

where $1/\mu_{j}$ for all $j\in\{1,2,3\}$ are average quantum packet lengths associated with the three users as explained just before (2.6).

5.1. The simulation for example 4.1

Based on the first two dual-cost functions in (5.1), we can formulate a corresponding 2-stage minimal dual-cost non-zero-sum game problem for a price parameter $p\in R_{+}^{2}$ as follows,

(5.2)\begin{eqnarray} &&\min_{q\in R_{+}^{2}}C_{j}(pq,c)\;\;\mbox{subject to}\;\;\frac{q_{1}}{\mu_{1}}+\frac{q_{2}}{\mu_{2}}\geq w, \end{eqnarray}

for a fixed constant w > 0, a fixed $c\in{\cal R}$, and all $j\in\{0,1,2\}$ with $C_{0}(pq,c)=C_{1}(pq,c)+C_{2}(pq,c)$. Since $C_{j}(p_{j}q_{j},c_{j})$ for each $j\in\{1,2\}$ is strictly increasing with respect to $p_{j}q_{j}$ (or simply qj), a Pareto minimal dual-cost Nash equilibrium point to the problem in (5.2) must be located on the line where the constraint inequality holds with equality (i.e., $q_{1}/\mu_{1}+q_{2}/\mu_{2}=w$). Thus, we know that:

(5.3)\begin{eqnarray} &&q_{j}=\mu_{j}\left(w-\frac{q_{2-j+1}}{\mu_{2-j+1}}\right)\;\;\;\mbox{with}\;\;\;j\in\{1,2\}. \end{eqnarray}

Hence, it follows from (5.3) that:

(5.4)\begin{eqnarray} &&\bar{f}(q_{1})\equiv\sum_{j=1}^{2}C_{j}(p_{j}q_{j},c_{j}) =\frac{p_{1}^{2}q^{2}_{1}}{2\mu_{1}c_{1}}+\frac{2p_{2}^{3}\mu_{2}^{2}}{3c_{2}^{3}}\left(w-\frac{q_{1}}{\mu_{1}}\right)^{3}. \end{eqnarray}

Then, by solving the equation $\frac{\partial\bar{f}(q_{1})}{\partial q_{1}}=0$, we can get the minimal value of the function $\bar{f}(q_{1})$ for each $p\in R_{+}^{2}$. More precisely, the unique Pareto minimal point $q^{*}(p,w)=(q_{1}^{*},q_{2}^{*})(p,w)$ to the problem in (5.2) can be explicitly given by:

(5.5)\begin{eqnarray} &&\;\;\;\;\left\{\begin{array}{ll} q_{1}^{*}(p,w)&\equiv\;\;\bar{g}_{1}(p,w)\;=\;\frac{1}{2}\left(\frac{2w}{\mu_{1}} +\frac{p_{1}^{2}c_{2}^{3}}{2p_{2}^{3}c_{1}\mu_{2}^{2}}\right)\mu_{1}^{2}-\sqrt{\frac{1}{4}\left(\frac{2w}{\mu_{1}} +\frac{p_{1}^{2}c_{2}^{3}}{2p_{2}^{3}c_{1}\mu_{2}^{2}}\right)^{2}\mu_{1}^{4}-\mu_{1}^{2}w^{2}},\\ q_{2}^{*}(p,w)&\equiv\;\;\bar{g}_{2}(p,q_{1}^{*}(p,w),w)\;=\;\mu_{2}\left(w-\frac{q_{1}^{*}(p,w)}{\mu_{1}}\right). \end{array} \right. \end{eqnarray}
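
For completeness, the intermediate step behind (5.5) can be recorded as follows, with the shorthand A introduced only here for readability:

\begin{align*}
\frac{\partial\bar{f}(q_{1})}{\partial q_{1}}
=\frac{p_{1}^{2}q_{1}}{\mu_{1}c_{1}}
-\frac{2p_{2}^{3}\mu_{2}^{2}}{\mu_{1}c_{2}^{3}}\left(w-\frac{q_{1}}{\mu_{1}}\right)^{2}=0
\;\;\Longleftrightarrow\;\;
\frac{q_{1}^{2}}{\mu_{1}^{2}}-\left(\frac{2w}{\mu_{1}}+A\right)q_{1}+w^{2}=0,
\;\;\;\;A\equiv\frac{p_{1}^{2}c_{2}^{3}}{2p_{2}^{3}c_{1}\mu_{2}^{2}},
\end{align*}

and the root of this quadratic lying in $(0,\mu_{1}w)$ (i.e., the one taking the minus sign) is exactly the expression of $q_{1}^{*}(p,w)$ in (5.5).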

From the green curve in the upper-left graph of Figure 10, where $p_{1}=p_{2}=1$ and $w=10,000$, we can see that this point is close to the one corresponding to $q_{1}=0$, and we can consider it as a Pareto minimal Nash equilibrium point near the boundary. Another Nash equilibrium point is the intersection point of the red and blue curves in the left graph of Figure 10. Obviously, this point is not a minimal total cost point. However, we can use some transformation technique to shift the minimal point to this one and thus design a more fairly balanced decision policy. Nevertheless, for the purpose of this research, namely finding the Pareto utility-maximization Nash equilibrium policy, we use the point in (5.5) as our decision policy. In this case, for the price parameter $p\in R_{+}^{2}$, we have:

Figure 10. Pareto optimal Nash equilibrium policies with dynamic pricing, where Price1/10,000 in the lower-right graph means that Price1 is divided by 10,000.

(5.6)\begin{eqnarray} &&\left\{\begin{array}{ll} C_{0}(pq^{*},c)\leq C_{0}(pq,c), &\\ C_{1}(pq^{*},c)\leq C_{1}(pq_{-1}^{*},c)&\mbox{with}\;\;\;q^{*}_{-1}=(q_{1},q_{2}^{*}),\\ C_{2}(pq^{*},c)\leq C_{2}(pq_{-2}^{*},c)&\mbox{with}\;\;\;q^{*}_{-2}=(q_{1}^{*},q_{2}). \end{array} \right. \end{eqnarray}

Then, associated with a given queue length based Pareto minimal Nash equilibrium point in (5.5), we can obtain the relationship between prices p 1 and p 2 as follows,

(5.7)\begin{eqnarray} &&\frac{p^{2}_{1}(q_{1},w)}{p_{2}^{3}(q_{1},w)}\equiv\kappa(q_{1},w)=\left(\frac{2c_{1}\mu_{2}^{2}}{c_{2}^{3}}\right) \left(\frac{\mu_{1}^{2}w^{2}+q_{1}^{2}}{\mu_{1}^{2}q_{1}}-\frac{2w}{\mu_{1}}\right). \end{eqnarray}

From (5.7), we can see that there are different choices of dynamic pricing policies corresponding to Pareto minimal Nash equilibrium point $q^{*}(p,w)$ in (5.5). For the current study, we take

(5.8)\begin{eqnarray} &&\;\;\;\;\left\{\begin{array}{ll} p_{1}(q_{1},w)&=\;\;\;\kappa^{2}(q_{1},w),\\ p_{2}(q_{1},w)&=\;\;\;\kappa(q_{1},w), \end{array} \right. \end{eqnarray}

whose dynamic evolutions with the queue length q 1 are shown in the upper-right graph of Figure 10.
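
The closed-form expressions (5.5) and (5.7)-(5.8) translate directly into code. The following is a minimal Python sketch (the function names and argument order are ours); it can be evaluated, for instance, at the values $p_{1}=p_{2}=1$ and $w=10,000$ used for Figure 10, and a guard for $q_{1}=0$ in the pricing map is omitted for brevity.

\begin{verbatim}
import numpy as np

def q_star(p1, p2, w, mu1, mu2, c1, c2):
    # Pareto minimal dual-cost point (q1*, q2*) from (5.5).
    A = (p1 ** 2 * c2 ** 3) / (2.0 * p2 ** 3 * c1 * mu2 ** 2)
    s = 2.0 * w / mu1 + A
    q1 = 0.5 * s * mu1 ** 2 - np.sqrt(0.25 * s ** 2 * mu1 ** 4 - mu1 ** 2 * w ** 2)
    q2 = mu2 * (w - q1 / mu1)
    return q1, q2

def price_from_queue(q1, w, mu1, mu2, c1, c2):
    # Dynamic prices from (5.7)-(5.8): p1 = kappa^2 and p2 = kappa, so that
    # p1^2 / p2^3 = kappa as required by (5.7).
    kappa = (2.0 * c1 * mu2 ** 2 / c2 ** 3) * (
        (mu1 ** 2 * w ** 2 + q1 ** 2) / (mu1 ** 2 * q1) - 2.0 * w / mu1)
    return kappa ** 2, kappa
\end{verbatim}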

Figure 11. In this simulation, the number of simulation iterations is $N=6,000$, and the simulation time interval is $[0,T]$ with T = 200, which is further divided into $n=5,000$ subintervals as explained in Subsection 5. Other values of simulation parameters introduced in Definition 3.1 and Subsubsection 4 are as follows: initialprice1 = 9, initialprice2 = 3, lowerboundprice1 = 0.64, lowerboundprice2 = 0.8, $\lambda_{1}=10/3$, $\lambda_{2}=5$, $m_{1}=3$, $m_{2}=1$, $\mu_{1}=1/10$, $\mu_{2}=1/20$, $\alpha_{1}=\sqrt{10/3}$, $\alpha_{2}=\sqrt{20}$, $\beta_{1}=\sqrt{10}$, $\beta_{2}=\sqrt{20}$, $\zeta_{1}=1$, $\zeta_{2}=\sqrt{2}$, $\rho_{1}=\rho_{2}=1,000$, $c_{1}^{2}=c_{2}^{1}=1,500$, $\theta_{1}=-1$, $\theta_{2}=-1.2$.

Next, by Theorem 3.5, we know that the coefficients of the 1-dimensional RDRS under our dynamic pricing and game-based scheduling policy for the physical workload process $\hat{W}$ can be denoted by:

(5.9)\begin{eqnarray} &&\left\{\begin{array}{ll} \hat{b}&=\;\;\theta_{1}/\mu_{1}+\theta_{2}/\mu_{2},\;\;\hat{\sigma}^{E}=\hat{\sigma}^{S}=\left(1/\mu_{1},1/\mu_{2}\right),\;\;\hat{R}=1,\\ \hat{\sigma}&=\;\;\sqrt{\left(\sum_{j=1}^{2}\hat{\sigma}_{j}^{E}\sqrt{\Gamma_{jj}^{E}}\right)^{2} +\left(\sum_{j=1}^{2}\hat{\sigma}_{j}^{S}\sqrt{\Gamma_{jj}^{S}}\right)^{2}}. \end{array} \right. \end{eqnarray}

Then, based on $\hat{W}$, we can get the dynamic queueing policy by (5.5) and its associated dynamic pricing policy through (5.8):

(5.10)\begin{eqnarray} &&\hat{Q}(t)=q^{*}(\hat{P}(t),\hat{W}(t)),\;\;\;\hat{P}(t)=\hat{P}(\hat{Q}_{1}(t),\hat{W}(t)). \end{eqnarray}

After determining the initial prices $\hat{P}(0)=(initialprice1, initialprice2)$, we suppose that $\hat{P}(t)$ has the lower bound price protection functionality, i.e., $\hat{P}(t)\in[lowerboundprice1,\infty)$ × $[lowerboundprice2,\infty)$. Corresponding to (5.8), this truncated price process still satisfies the Lipschitz continuity imposed in (2.8). Then, by combining the policy in (5.10) with the simulation algorithms for RDRSs, we can illustrate that our policy in (5.10) is cost-effective in comparison with a constant pricing policy, a 2D-Queue policy, and an arbitrarily selected dynamic pricing policy. These simulation comparisons are presented in Figure 11 with different parameters. The number N of simulation iterations for these comparisons is 6,000 and the simulation time interval is $[0,T]$ with T = 200, which is further divided into $n=5,000$ subintervals. The first graph on the left-column in Figure 11 is the mean total cost difference (MTCD) at each time point ti with $i\in\{0,1,\ldots,5,000\}$ between our current dynamic pricing policy in (5.10) and the constant pricing policy with $P_{1}(t)=P_{2}(t)=1$ for all $t\in[0,\infty)$, i.e.,

(5.11)\begin{eqnarray} &&\mbox{MTCD}(t_{i})=\frac{1}{N}\sum_{j=1}^{N}\left(C_{0}(\hat{P}(\omega_{j},t_{i})\hat{Q}(\omega_{j},t_{i}),\rho) -C_{0}(P(\omega_{j},t_{i})Q(\omega_{j},t_{i}),\rho))\right), \end{eqnarray}

where ωj denotes the jth sample path and the $Q(\omega_{j},t_{i})$ in (5.11) is the queue length corresponding to the constant pricing policy at each time point ti. The second graph on the left-column in Figure 11 is the MTCD between our newly designed dynamic pricing policy in (5.10) and a 2D-Queue policy used as an alternative comparison policy in Dai [Reference Dai8]. For this 2D-Queue policy, the constant pricing with $P_{1}(t)=P_{2}(t)=1$ is employed and the associated Q(t) is presented as a two-dimensional RDRS model as in Dai [Reference Dai8]. The third graph on the left-column in Figure 11 is the MTCD between our current dynamic pricing policy in (5.10) and an arbitrarily selected dynamic pricing policy given by:

(5.12)\begin{eqnarray} &&\left\{\begin{array}{ll} P_{1}(t)&=\;\;\mbox{lowerboundprice1}+\frac{20}{0.05+\sqrt{Q_{1}(t)}},\\ P_{2}(t)&=\;\;\mbox{lowerboundprice2}+\frac{30}{0.1+\sqrt{Q_{2}(t)}} \end{array} \right. \end{eqnarray}

with the associated queue policy $Q(t)=\hat{Q}(t)$. The first and second graphs on the right-column in Figure 11 display the dynamics of $\hat{Q}(t)$ for both users. The third graph on the right-column in Figure 11 shows the price evolutions corresponding to two users. From the first graph in Figure 11, we can see that the cost is relatively large if the initial prices are relatively high. All of the other comparisons in Figure 11 show the cost-effectiveness of our policy in (5.10).
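
The comparison statistic (5.11) is a plain Monte Carlo average over the N sample paths. A minimal sketch is given below, assuming the simulated paths of P and Q under the two policies are stored as arrays of shape (N, n+1, 2); this array layout and the function names are ours, and total_cost implements $C_{0}=C_{1}+C_{2}$ from (5.1).

\begin{verbatim}
import numpy as np

def total_cost(P, Q, c, mu):
    # C_0 = C_1 + C_2 from (5.1), evaluated pathwise; P, Q have shape (N, n+1, 2).
    pq = P * Q
    C1 = pq[..., 0] ** 2 / (2.0 * mu[0] * c[0])
    C2 = 2.0 * pq[..., 1] ** 3 / (3.0 * mu[1] * c[1] ** 3)
    return C1 + C2

def mtcd(P_dyn, Q_dyn, P_base, Q_base, c, mu):
    # MTCD(t_i) in (5.11): average over the N sample paths of the cost difference.
    return (total_cost(P_dyn, Q_dyn, c, mu)
            - total_cost(P_base, Q_base, c, mu)).mean(axis=0)
\end{verbatim}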

5.2. The simulation for example 4.3

Based on the three dual-cost functions in (5.1), we can first select any two of the three users for service by formulating the following minimal dual-cost zero-sum game problem for a price parameter $p\in R_{+}^{3}$, a constant w > 0, and a fixed $c\in{\cal R}$,

(5.13)\begin{eqnarray} &&\left\{\begin{array}{ll} \min_{q\in R_{+}^{3}}C_{0}(pq,c),&\\ \min_{q\in R_{+}^{3}}C_{0j}(pq,c)&\mbox{subject to}\;\;\;\;q_{1}/\mu_{1}+q_{2}/\mu_{2}\geq w,\\ \min_{q\in R_{+}^{3}}\left(-C_{0j_{1}}(pq,c)\right)&\mbox{subject to}\;\;\;\;q_{1}/\mu_{1}+q_{3}/\mu_{3}\geq w,\\ \min_{q\in R_{+}^{3}}\left(-C_{0j_{2}}(pq,c)\right)&\mbox{subject to}\;\;\;\;q_{2}/\mu_{2}+q_{3}/\mu_{3}\geq w, \end{array} \right. \end{eqnarray}

where $j\in\{1,2,3\}$, $j_{1}\in\{1,2,3\}\setminus\{\,j\}$, and $j_{2}\in\{1,2,3\}\setminus\{\,j,j_{1}\}$, and

(5.14)\begin{eqnarray} &&\left\{\begin{array}{ll} C_{0}(pq,c)&=\;\;C_{1}(p_{1}q_{1},c_{1})+C_{2}(p_{2}q_{2},c_{2})+C_{3}(p_{3}q_{3},c_{3}),\\ C_{01}(pq,c)&=\;\;C_{1}(p_{1}q_{1},c_{1})+C_{2}(p_{2}q_{2},c_{2}),\\ C_{02}(pq,c)&=\;\;C_{1}(p_{1}q_{1},c_{1})+C_{3}(p_{3}q_{3},c_{3}),\\ C_{03}(pq,c)&=\;\;C_{2}(p_{2}q_{2},c_{2})+C_{3}(p_{3}q_{3},c_{3}). \end{array} \right. \end{eqnarray}

In other words, if $q^{*}=(q_{1}^{*},q_{2}^{*},q_{3}^{*})$ is a solution to the game problem in (5.13), and if,

(5.15)\begin{eqnarray} &&q^{*}_{-j}=\left\{\begin{array}{ll} (q_{j},q^{*}_{j_{1}},q^{*}_{j_{2}})&\mbox{if}\;\;j=1,\\ (q^{*}_{j_{1}},q_{j},q^{*}_{j_{2}})&\mbox{if}\;\;j=2,\\ (q^{*}_{j_{1}},q^{*}_{j_{2}},q_{j})&\mbox{if}\;\;j=3, \end{array} \right. \end{eqnarray}

then, for any two fixed $p,c\in R_{+}^{3}$, we have that:

(5.16)\begin{eqnarray} &&\left\{\begin{array}{ll} C_{0}(pq^{*},c)&\leq\;\;C_{0}(pq,c),\\ C_{0j}(pq^{*},c)&\leq\;\;C_{0j}(pq^{*}_{-j},c),\\ -C_{0j_{1}}(pq^{*},c)&\leq\;\;-C_{0j_{1}}(pq^{*}_{-j_{1}},c),\\ -C_{0j_{2}}(pq^{*},c)&\leq\;\;-C_{0j_{2}}(pq^{*}_{-j_{2}},c). \end{array} \right. \end{eqnarray}

Furthermore, when two users corresponding to the summation $C_{0j}=C_{k}+C_{l}$ for an index $j\in\{1,2,3\}$ with two associated indices $k,l\in\{1,2,3\}$ as in one of (5.13) are selected, we can propose a 2-stage pricing and queueing policy at each time point by a Pareto minimal dual cost Nash equilibrium point to the non-zero-sum game problem for two fixed $p,c\in R_{+}^{3}$,

(5.17)\begin{eqnarray} &&\min_{q\in R_{+}^{3}}C_{0j}(pq,c),\;\;\min_{q\in R_{+}^{3}}C_{k}(pq,c),\;\;\min_{q\in R_{+}^{3}}C_{l}(pq,c). \end{eqnarray}

To wit, if $q^{*}=(q_{k}^{*},q_{l}^{*})$ is a solution to the game problem corresponding to the two users, we have that:

(5.18)\begin{eqnarray} &&\left\{\begin{array}{ll} C_{0j}(pq^{*},c)&\leq\;\;C_{0j}(pq,c),\\ C_{k}(pq^{*},c)&\leq\;\;C_{k}(pq^{*}_{-k},c)\;\;\;\mbox{with}\;\;\;q^{*}_{-k}=(q_{k},q^{*}_{l}),\\ C_{l}(pq^{*},c)&\leq\;\;C_{l}(pq^{*}_{-l},c)\;\;\;\;\mbox{with}\;\;\;q^{*}_{-l}=(q^{*}_{k},q_{l}). \end{array} \right. \end{eqnarray}

Thus, for the price parameter $p\in R_{+}^{3}$ and each $w\geq 0$, it follows from (5.13)-(5.16) and (5.17)-(5.18) that our queueing policy $(q_{1}^{*}(p,w),q_{2}^{*}(p,w),q_{3}^{*}(p,w))$ can be designed by:

(5.19)\begin{align} \;\;\;\;\left\{\begin{array}{ll} \left\{\begin{array}{ll} q_{1}^{*}(p,w)=\bar{g}_{1}(p_{1},p_{2},w),\\ q_{2}^{*}(p,w)=\bar{g}_{2}(p_{1},p_{2},q_{1}^{*},w), \end{array} \right. &\mbox{if}\;\; C_{01}(pq^{*},c)\leq\min\left\{C_{02}(pq^{*},c),C_{03}(pq^{*},c)\right\}, \\ \left\{\begin{array}{ll} q_{1}^{*}(p,w)=\hat{g}_{1}(p_{1},p_{3},w),\\ q_{3}^{*}(p,w)=\hat{g}_{3}(p_{1},p_{3},q_{1}^{*},w), \end{array} \right. &\mbox{if}\;\; C_{02}(pq^{*},c)\leq\min\left\{C_{01}(pq^{*},c),C_{03}(pq^{*},c)\right\}, \\ \left\{\begin{array}{ll} q_{2}^{*}(p,w)=\bar{g}_{2}(p_{3},p_{2},q_{3}^{*},w),\\ q_{3}^{*}(p,w)=\bar{g}_{1}(p_{3},p_{2},w), \end{array} \right. &\mbox{if}\;\; C_{03}(pq^{*},c)\leq\min\left\{C_{01}(pq^{*},c),C_{02}(pq^{*},c)\right\}, \end{array} \right. \end{align}

where the function $\bar{g}_{1}$ is given in (5.5) and $\hat{g}_{1}$ is calculated in a similar way as follows,

(5.20)\begin{align} \;\;\;\;\left\{\begin{array}{ll} \hat{g}_{1}(p_{1},p_{3},w)&=\frac{(p_{3}^{2}\mu_{3}w)/(\mu_{1}c_{3})}{(p_{1}^{2}/\mu_{1}c_{1})+(p_{3}^{2}\mu_{3}/\mu_{1}^{2}c_{3})},\\ \hat{g}_{3}(p_{1},p_{3},q_{1},w)&=\;\;\mu_{3}\left(w-(q_{1}/\mu_{1})\right). \end{array} \right. \end{align}

The intersection point of $\hat{g}_{1}$ and $\hat{g}_{3}$ in terms of q 1 is a Pareto optimal Nash equilibrium point as shown in the lower-left graph of Figure 10. Furthermore, based on (5.19)-(5.20), we can inversely determine our pricing policy $p=(p_{1},p_{2},p_{3})$ as follows,

(5.21)\begin{align} \;\;\;\;\left\{\begin{array}{ll} \left\{\begin{array}{ll} p_{1}(q^{*}_{1},w)=\kappa^{2}(q_{1}^{*},w),\\ p_{2}(q^{*}_{1},w)=\kappa(q^{*}_{1},w) \end{array} \right. &\mbox{if}\;\; C_{01}(pq^{*},c)\leq\min\left\{C_{02}(pq^{*},c),C_{03}(pq^{*},c)\right\}, \\ \left\{\begin{array}{ll} p_{1}(q^{*}_{1},w)=\varpi(q^{*}_{1})\hat{\kappa}(q^{*}_{1},w),\\ p_{3}(q^{*}_{1},w)=\varpi(q^{*}_{1})\sqrt{\hat{\kappa}(q_{1}^{*},w)} \end{array} \right. &\mbox{if}\;\; C_{02}(pq^{*},c)\leq\min\left\{C_{01}(pq^{*},c),C_{03}(pq^{*},c)\right\}, \\ \left\{\begin{array}{ll} p_{2}(q_{3}^{*},w)=\kappa(q_{3}^{*},w),\\ p_{3}(q_{3}^{*},w)=\kappa^{2}(q^{*}_{3},w) \end{array} \right. &\mbox{if}\;\; C_{03}(pq^{*},c)\leq\min\left\{C_{01}(pq^{*},c),C_{02}(pq^{*},c)\right\}, \end{array} \right. \end{align}

where, κ is defined in (5.7) and $\hat{\kappa}$ can be calculated in the same way as follows,

(5.22)\begin{align} \hat{\kappa}(q^{*}_{1},w)=\frac{\mu_{3}c_{1}}{q^{*}_{1}c_{3}}\left(w-\frac{q^{*}_{1}}{\mu_{1}}\right). \end{align}

Furthermore, $\varpi(q^{*}_{1})$ in (5.21) is a non-negative function in terms of $q^{*}_{1}$, and it is taken to be unity when drawing the dynamic pricing evolution in the lower graph of Figure 10 with $w=10,000$.
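
In code, the branch structure of (5.19)-(5.20) amounts to computing one candidate queue vector per admissible pair and then keeping the branch of minimal dual cost as in (5.19). The sketch below builds only the three candidates, reusing the q_star function from the earlier sketch of (5.5); evaluating $C_{01}$, $C_{02}$, and $C_{03}$ at these candidates via (5.1) and selecting the smallest then completes the policy. This reading of the branch selection and all names here are our own illustrative assumptions.

\begin{verbatim}
def candidate_queues(p, w, mu, c):
    # Candidate queue vectors (q1, q2, q3) for the three branches of (5.19);
    # p = (p1, p2, p3), mu = (mu1, mu2, mu3), c = (c1, c2, c3).
    p1, p2, p3 = p
    m1, m2, m3 = mu
    c1, c2, c3 = c
    cand = {}
    # pair {1, 2}: the g-bar functions from (5.5)
    q1, q2 = q_star(p1, p2, w, m1, m2, c1, c2)
    cand[(1, 2)] = (q1, q2, 0.0)
    # pair {1, 3}: the g-hat functions from (5.20)
    g1 = (p3 ** 2 * m3 * w / (m1 * c3)) / (
        p1 ** 2 / (m1 * c1) + p3 ** 2 * m3 / (m1 ** 2 * c3))
    cand[(1, 3)] = (g1, 0.0, m3 * (w - g1 / m1))
    # pair {2, 3}: the g-bar functions with (p3, mu3, c3) in the role of user 1
    q3, q2b = q_star(p3, p2, w, m3, m2, c3, c2)
    cand[(2, 3)] = (0.0, q2b, q3)
    return cand
\end{verbatim}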

To show the cost-effectiveness of our queueing policy in (5.19) with its associated pricing policy in (5.21), we present an arbitrarily selected stochastic pooling policy for the purpose of comparisons as follows,

(5.23)\begin{align} \;\;\left\{\begin{array}{ll} \left\{\begin{array}{ll} q_{1}^{*}(p,w)=\bar{g}_{1}(p_{1},p_{2},w),\\ q_{2}^{*}(p,w)=\bar{g}_{2}(p_{1},p_{2},q_{1}^{*},w) \end{array} \right. &\mbox{if}\;\; u\in\left[0,\frac{1}{3}\right), \\ \left\{\begin{array}{ll} q_{1}^{*}(p,w)=\hat{g}_{1}(p_{1},p_{3},w),\\ q_{3}^{*}(p,w)=\hat{g}_{3}(p_{1},p_{3},q_{1}^{*},w) \end{array} \right. &\mbox{if}\;\; u\in\left[\frac{1}{3},\frac{2}{3}\right), \\ \left\{\begin{array}{ll} q_{2}^{*}(p,w)=\bar{g}_{2}(p_{3},p_{2},q_{3}^{*},w),\\ q_{3}^{*}(p,w)=\bar{g}_{1}(p_{3},p_{2},w) \end{array} \right. &\mbox{if}\;\; u\in\left[\frac{2}{3},1\right], \end{array} \right. \end{align}

where u is a uniformly distributed random number.
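
The comparison policy (5.23) simply randomizes which pair of users is served. A tiny sketch of the pair-selection step is as follows (the function name and the optional generator argument are ours); the queues of the selected pair are then set by the same formulas as in (5.23).

\begin{verbatim}
import numpy as np

def stochastic_pooling_pair(rng=None):
    # Random pair selection of (5.23): u ~ Uniform[0, 1];
    # [0, 1/3) -> users {1, 2}, [1/3, 2/3) -> {1, 3}, [2/3, 1] -> {2, 3}.
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform()
    if u < 1.0 / 3.0:
        return (1, 2)
    if u < 2.0 / 3.0:
        return (1, 3)
    return (2, 3)
\end{verbatim}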

After determining the initial price vector $\hat{P}(0)=(initialprice1,initialprice2,initialprice3)$, we suppose that $\hat{P}(t)$ has the lower bound price protection and the upper bound constraint functionalities, i.e., $\hat{P}(t)\in[lowerboundprice1$, $upperboundprice1)$ × $[lowerboundprice2$, $upperboundprice2)$ × $[lowerboundprice3$, $upperboundprice3)$. Corresponding to (5.21), this truncated price process still owns the Lipschitz continuity as imposed in (2.8). Then, by explanations similar to those used for (5.11), we can conduct the corresponding simulation comparisons for this example as shown in Figures 4–5. The cost value evolution based on our queueing policy in (5.19) with its associated pricing policy in (5.21) is shown in the first graph of the left-column in each of Figures 4–5. Its MTCD in (5.11) compared with the arbitrarily selected stochastic pooling policy in (5.23) is displayed in the first graph of the right-column in each of Figures 4–5. The cost value evolution based on our queueing policy in (5.19) with constant pricing (i.e., $p_{1}=p_{2}=p_{3}$) is shown in the second graph of the left-column in each of Figures 4–5. In this constant pricing case, its MTCD in (5.11) compared with the arbitrarily selected stochastic pooling policy in (5.23) is displayed in the second graph of the right-column in each of Figures 4–5. The MTCD based on our queueing policy in (5.19) with its associated pricing policy in (5.21) and with the constant pricing policy is shown in the third graph of the left-column in each of Figures 4–5. The price evolutions for the three users are shown in the third graph of the right-column in each of Figures 4–5. In the special case with parameters as shown in Figure 5, the three price evolutions are the same. Furthermore, the MTCD between our dynamic pricing policy in (5.21) and the constant pricing policy is identically 0, as shown in the third graph of the left-column in Figure 5.

6. Justification of RDRS modeling

In this section, we theoretically prove the correctness of our RDRS modeling presented in Theorem 3.5. For the reader's convenience, we first outline the proof of Theorem 3.5, which is a technical generalization of the corresponding proofs in the existing discussions in Dai [Reference Dai8, Reference Dai9]. The major breakthrough in this generalization is to incorporate into the proof the dynamic pricing functions in (2.8)-(2.9) in terms of both the queue length $Q_{j}(t)$ with $j\in\{1,\ldots,J\}$ and the random environment $\alpha(t)$. In Dai [Reference Dai8], we prove a corresponding theorem for a generic game platform where the resource scheduling policy is solely a “win-win” fairly sharing non-zero-sum game oriented one, designed for multiple resources-sharing and fairly competing users. The platform studied in Dai [Reference Dai8] consists of multiple intelligent (quantum) cloud-computing pools and parallel queues. The arrival data streams associated with all the users are modeled by TSRRPs. Every user in the system can be served at the same time by multiple service pools, and in the meantime every service pool consisting of parallel servers can also provide services to multiple users simultaneously. The associated RDRS performance model is established under the “win-win” fairly resources-sharing scheduling policy via diffusion approximation. The study in Dai [Reference Dai8] is extended to a “win-lose & win-win” 2-stage zero-sum and non-zero-sum mixed game-theoretic scheduling policy based platform in Dai [Reference Dai9]. However, the study in Dai [Reference Dai9] is still a 2-stage resources-competition and resources-sharing oriented one, which does not consider the dynamic pricing issue. Therefore, in the discussions of the following two subsections for proving our current Theorem 3.5, we take the dynamic pricing functions in (2.8)-(2.9) into consideration, along the line of the corresponding proof in Dai [Reference Dai9].

6.1. The required conditions

In this subsection, we present the required conditions in proving our RDRS modeling.

6.1.1. Conditions on utility functions

The utility functions can either be simply taken as the well-known proportionally fair and minimal potential delay allocations as used in (4.1) in Section 4, or be taken generally such that the existence of a utility based 2-stage game-theoretic policy corresponding to the game problem in (3.20)-(3.21) is guaranteed. More precisely, for each given $p\in R_{+}^{J}$, we can assume that $U_{vj}(p_{j}q_{j},c_{vj})$ for each $j\in{\cal J}(v)$ and $v\in{\cal V}(\,j)$ is defined on $R_{+}^{J}$, is second-order differentiable, and satisfies:

(6.1)\begin{eqnarray} &&\;\;\left\{\begin{array}{ll} U_{vj}(0,c_{vj})=0,\\ U_{vj}(p_{j}q_{j},c_{vj})=\Phi_{vj}(p_{j}q_{j})\Psi_{v}(c_{vj})\;\mbox{is strictly increasing/concave in}\;c_{vj}\;\mbox{for}\;p_{j}q_{j} \gt 0,\\ \Psi_{v}(\nu_{j}c_{vj})=\Psi_{v}(\nu_{j})\Psi_{v}(c_{vj})\;\;\mbox{or}\;\;\Psi_{v}(\nu_{j}c_{vj})=\Psi_{v}(\nu_{j})+\Psi_{v}(c_{vj})\;\;\mbox{for constant}\;\;\nu_{j}\geq 0,\\ \frac{\partial U_{vj}(p_{j}q_{j},c_{vj})}{\partial c_{vj}}\;\;\mbox{is strictly increasing in}\;\;p_{j}q_{j}\geq 0,\;\;\\ \frac{\partial U_{vj}(0,c_{vj})}{\partial c_{vj}}=0\;\;\mbox{and}\;\;\lim_{q_{j}\rightarrow\infty}\frac{\partial U_{vj}(p_{j}q_{j},c_{vj})}{\partial c_{vj}}=+\infty\;\;\mbox{for each}\;\;c_{vj} \gt 0. \end{array} \right. \end{eqnarray}

Furthermore, we suppose that $\{U_{vj}(p_{j}q_{j},c_{vj}),j\in{\cal J}(v),v\in{\cal V}(\,j)\}$ satisfies the radial homogeneity condition at each given time point $t\in[0,\infty)$. In other words, for any scalar a > 0, each q > 0, $i\in{\cal K}$, $v\in{\cal V}$, and each $j_{l}\in{\cal M}(i,v,t)$ with $l\in\{1,\ldots,M_{v}\}$, its Pareto maximal utility Nash equilibrium point for the game has the radial homogeneity:

(6.2)\begin{eqnarray} &&c_{vj_{l}}(apq,i)=c_{vj_{l}}(pq,i). \end{eqnarray}

6.1.2. Complete resource pooling condition

The complete resource pooling (CRP) condition is commonly used in the queueing network scheduling literature (see e.g., Stolyar [Reference Stolyar26], Ye and Yao [Reference Ye and Yao28]). Roughly speaking, under this condition, the network service resources can be completely shared in a certain way by all allowed users. There are different (but essentially equivalent) ways to describe the CRP condition (see e.g., Stolyar [Reference Stolyar26], Ye and Yao [Reference Ye and Yao28]). Here, we adopt the way in Stolyar [Reference Stolyar26] to present our CRP condition. More precisely, let $\bar{{\cal R}}_{v}(i)$ for $i\in{\cal K}$ and $v\in{\cal V}$ denote the boundary of ${\cal R}_{v}(i)$ in (3.10). Moreover, let $\bar{{\cal R}}^{*}_{v}(i)$ for $i\in{\cal K}$ and $v\in{\cal V}$ denote the outer (“north-east”) boundary of $\bar{{\cal R}}_{v}(i)$. Then, we have the following concepts.

A vector $\rho^{*}_{v}(i)$ with $i\in{\cal K}$ and $v\in{\cal V}$ is said to satisfy the resource pooling (RP) condition if $\rho^{*}_{v}(i)\in\bar{{\cal R}}^{*}_{v}(i)$ and the outer normal vector ζ to $\bar{{\cal R}}_{v}(i)$ at $\rho^{*}_{v}(i)$ is unique (up to a scaling). In other words, the RP condition holds if $\rho^{*}_{v}(i)$ lies in the (relative) interior of one of the curved facets of $\bar{{\cal R}}^{*}_{v}(i)$. Furthermore, $\rho^{*}_{v}(i)$ is said to satisfy the CRP condition if it satisfies the RP condition and all components of the corresponding normal vector ζ are strictly positive.

6.1.3. Heavy traffic condition

In addition, we introduce a sequence of independent Markov processes indexed by $r\in{\cal R}$, i.e., $\{\alpha^{r}(\cdot),r\in{\cal R}\}$. These systems all have the same basic structure as presented in the last section except for the arrival rates $\lambda^{r}_{j_{l}}(i)$ and the holding time rates $\gamma^{r}(i)$ for all $i\in{\cal K}$, which may vary with $r\in{\cal R}$. Here, we suppose that they satisfy the heavy traffic condition:

(6.3)\begin{eqnarray} && r\left(\lambda_{j_{l}}^{r}(i)-\lambda_{j_{l}}(i)\right)m_{j_{l}}(i) \rightarrow\theta_{j_{l}}(i)\;\;\mbox{as}\;\;r\rightarrow\infty, \;\;\gamma^{r}(i)=\frac{\gamma(i)}{r^{2}}, \end{eqnarray}

where, $\theta_{j_{l}}(i)\in R$ is some constant for each $i\in{\cal K}$. Moreover, we suppose that the nominal arrival rate $\lambda_{j_{l}}(i)$ is given by:

(6.4)\begin{eqnarray} &&\lambda_{j_{l}}(i)m_{j_{l}}(i)\equiv\mu_{j_{l}}\rho_{j_{l}}(i), \end{eqnarray}

and $\rho_{j_{l}}(i)$ in (6.4) for $j_{l}\in{\cal M}(i,v,t)$ with $l\in\{1,\ldots,M_{v}\}$ is the nominal throughput determined by:

(6.5)\begin{eqnarray} &&\rho_{j_{l}}(i)=\sum_{v\in{\cal V}(\,j_{l})}\rho_{vj_{l}}(i)\;\;\; \mbox{and}\;\;\;\rho_{vj_{l}}(i)=\nu_{vj_{l}}\bar{\rho}_{vj_{l}}(i), \end{eqnarray}

with $\rho_{v\cdot}(i)\in{\cal O}_{v}(i)$ that is corresponding to the dimension Mv. In addition, $\nu_{v\cdot}$ and $\bar{\rho}_{v\cdot}(i)$ are a Jv-dimensional constant vector and a reference service rate vector, respectively, at service pool v, satisfying:

(6.6)\begin{eqnarray} \sum_{j_{l}\in{\cal M}(i,v,t)\bigcap{\cal J}(v)}\nu_{j_{l}}&=&J_{v},\;\nu_{j_{l}}\geq 0\;\;\;\mbox{are constants for all}\;\;j_{l}\in{\cal M}(i,v,t)\cap{\cal J}(v), \end{eqnarray}
(6.7)\begin{eqnarray} \;\;\;\;\;\;\;\;\; \sum_{j_{l}\in{\cal M}(i,v,t)\cap{\cal J}(v)}\bar{\rho}_{vj_{l}}(i)&=&{\cal C}_{U_{v}}(i)\;\mbox{and}\;\bar{\rho}_{vj_{1}}(i)=\bar{\rho}_{vj_{l}}(i)\;\;\mbox{for all}\;\;j_{l}\in{\cal M}(i,v,t)\cap{\cal J}(v). \end{eqnarray}

Remark 6.1. By (3.11), $\bar{\rho}_{v\cdot}(i)$ for each $i\in{\cal K}$ and $v\in{\cal V}(\,j_{l})$ can indeed be selected, which satisfy the second condition in (6.7). Thus, the CRP condition is true. Hence, the nominal throughput $\rho(i)$ in (6.4) can be determined. One simple example that satisfies these conditions is to take $\nu_{vj_{l}}=1$ for all $j_{l}\in{\cal M}(i,v,t)\cap{\cal J}(v)$ and $v\in{\cal V}(\,j_{l})$. Thus, the conditions in (6.4)-(6.7) mean that the system manager wishes to maximally and fairly allocate capacity to all users. Moreover, the design parameters $\lambda_{j_{l}}(i)$ for all $j_{l}\in{\cal M}(i,v,t)\cap{\cal J}$ and each $i\in{\cal K}$ can be determined by (6.4).

Next, we assume that the inter-arrival time associated with the kth arriving job batch to the system indexed by $r\in{\cal R}$ is given by:

(6.8)\begin{eqnarray} &&u_{j_{l}}^{r}(k,i)=\frac{\hat{u}_{j_{l}}(k)}{\lambda_{j_{l}}^{r}(i)}\;\;\mbox{for each}\;\;j_{l}\in{\cal M}(i,v,t)\cap{\cal J},\;k\in\{1,2,\ldots\}, \;i\in{\cal K}, \end{eqnarray}

where the $\hat{u}_{j_{l}}(k)$ does not depend on r and i. Moreover, it has mean one and finite squared coefficient of variation $\alpha_{j_{l}}^{2}$. In addition, the number of packets, $w_{j_{l}}(k)$, and the packet length $v_{j_{l}}(k)$ are assumed not to change with r. Thus, it follows from the heavy traffic condition in (6.3) for the rth environmental state process $\alpha^{r}(\cdot)$ with $r\in{\cal R}$ that $\alpha^{r}(r^{2}\cdot)$ and $\alpha(\cdot)$ are equal to each other in distribution since they own the same generator matrix (see e.g., the definition in pages 384–388 of Resnick [Reference Resnick24]). Therefore, in the sense of distribution, all of the systems indexed by $r\in{\cal R}$ in (3.1) have the same random environment over any time interval $[0,t]$.

6.2. Proof of Theorem 3.5

First, it follows from the second condition in (6.3) that the processes $\alpha^{r}(r^{2}\cdot)$ for each $r\in{\cal R}$ and $\alpha(\cdot)$ are equal in distribution. Hence, without loss of generality, we can assume that:

(6.9)\begin{eqnarray} &&\alpha^{r}(r^{2}t)=\alpha(t)\;\;\;\mbox{for each}\;\;\;r\in{\cal R}\;\;\;\mbox{and}\;\;\;t\in[0,\infty). \end{eqnarray}

Thus, for each $j\in{\cal J}$, $r\in{\cal R}$ and by the radial homogeneity of $\Lambda(pq,i)$ of the policy in (6.2), we can define the fluid and diffusion scaled processes as follows,

(6.10)\begin{eqnarray} E^{r}_{j}(\cdot)&\equiv&A^{r}_{j}(r^{2}\cdot), \end{eqnarray}
(6.11)\begin{eqnarray} \bar{T}^{r}_{j}(\cdot)&\equiv&\int_{0}^{\cdot}\Lambda_{j}\left(\bar{P}^{r}(s)\bar{Q}^{r}(s),\alpha(s),s\right)ds=\frac{1}{r^{2}}T_{j}^{r}(r^{2}\cdot), \end{eqnarray}
(6.12)\begin{eqnarray} \bar{Q}_{j}^{r}(t)&\equiv&\frac{1}{r^{2}}Q^{r}_{j}(r^{2}t), \end{eqnarray}
(6.13)\begin{eqnarray} \bar{P}^{r}_{j}(t)&=&f_{j}(\bar{Q}^{r}_{j}(t),\alpha(t)), \end{eqnarray}
(6.14)\begin{eqnarray} \bar{E}^{r}_{j}(t)&\equiv&\frac{1}{r^{2}}E_{j}^{r}(t), \end{eqnarray}
(6.15)\begin{eqnarray} \bar{S}^{r}_{j}(t)&\equiv&\frac{1}{r^{2}}S_{j}^{r}(r^{2}t). \end{eqnarray}

Then, it follows from (2.7), (6.9), and the assumptions on the arrival and service processes that

(6.16)\begin{eqnarray} &&\hat{Q}^{r}_{j}(\cdot)=\frac{1}{r}E^{r}_{j}(\cdot)-\frac{1}{r}S^{r}_{j}(\bar{T}^{r}_{j}(\cdot)). \end{eqnarray}

Furthermore, for each $j\in{\cal J}$, let

(6.17)\begin{eqnarray} &&\hat{E}^{r}(\cdot)=(\hat{E}^{r}_{1}(\cdot),\ldots,\hat{E}^{r}_{J}(\cdot))'\;\;\mbox{with}\;\;\hat{E}^{r}_{j}(\cdot)=\frac{1}{r} \left(A^{r}_{j}(r^{2}\cdot)-r^{2}\bar{\lambda}^{r}_{j}(\cdot)\right), \end{eqnarray}
(6.18)\begin{eqnarray} &&\hat{S}^{r}(\cdot)=(\hat{S}^{r}_{1}(\cdot),\ldots,\hat{S}^{r}_{J}(\cdot))'\;\;\;\;\mbox{with}\;\; \hat{S}_{j}^{r}(\cdot)=\frac{1}{r}\left(S_{j}(r^{2}\cdot)-\mu_{j}r^{2}\cdot\right), \end{eqnarray}

where

(6.19)\begin{eqnarray} &&\bar{\lambda}^{r}_{j}(\cdot)\equiv\int_{0}^{\cdot}m_{j}(\alpha(s),s)\lambda_{j}^{r}(\alpha(s),s)ds =\frac{1}{r^{2}}\int_{0}^{r^{2}\cdot}m_{j}(\alpha^{r}(s),r^{2}s)\lambda_{j}^{r}(\alpha^{r}(s),r^{2}s)ds. \end{eqnarray}

For convenience, we define

(6.20)\begin{eqnarray} \bar{\lambda}^{r}(\cdot)&=&\left(\bar{\lambda}^{r}_{1}(\cdot),\ldots,\bar{\lambda}^{r}_{J}(\cdot)\right)'. \end{eqnarray}

In addition, we let $\bar{Q}^{r}(\cdot)$, $\bar{E}^{r}(\cdot)$, $\bar{S}^{r}(\cdot)$, and $\bar{T}^{r}(\cdot)$ be the associated vector processes. Then, for the processes in (6.10)-(6.16), we define the corresponding fluid limit related processes,

(6.21)\begin{eqnarray} \bar{Q}_{j}(t)&=&\bar{Q}_{j}(0)+\bar{\lambda}_{j}(t,\zeta_{t}(\cdot))-\mu_{j}\bar{T}_{j}(t)\;\;\mbox{for each}\;\;j\in{\cal J}, \end{eqnarray}

where $\zeta_{t}(\cdot)$ denotes a process depending on the external environment, i.e.,

(6.22)\begin{eqnarray} \bar{\lambda}(t) &=&\left(\bar{\lambda}_{1}(t),\ldots,\bar{\lambda}_{J}(t)\right)',\;\; \bar{\lambda}_{j}(t)\equiv\int_{0}^{t}m_{j}\lambda_{j}(\alpha(s),s)ds. \end{eqnarray}

Furthermore, we have that

(6.23)\begin{eqnarray} \bar{T}_{j}(t)&=&\int_{0}^{t}\bar{\Lambda}_{j}(\bar{P}(s)\bar{Q}(s),\alpha(s),s)ds, \end{eqnarray}
(6.24)\begin{eqnarray} \bar{P}(t)&=&f(\bar{Q}(t),\alpha(t)), \end{eqnarray}

where for each $i\in{\cal K}$ and $t\in[0,\infty)$, we have that,

(6.25)\begin{eqnarray} \bar{\Lambda}_{j}(pq,i,t)&=&\left\{\begin{array}{ll} \Lambda_{j}(pq,i,t)&\mbox{if}\;\;q_{j} \gt 0,j\in\bigcup_{v\in{\cal V}}{\cal M}(i,v,t),\\ \rho_{j}(i,t)&\mbox{if}\;\;q_{j} \gt 0,j\nsubseteq\bigcup_{v\in{\cal V}}{\cal M}(i,v,t),\\ \rho_{j}(i,t)\;\;\;\;&\mbox{if}\;\;q_{j}=0. \end{array}\right. \end{eqnarray}

Then, we have the following lemma concerning the weak convergence to a stochastic fluid limit process under our game-competition based dynamic pricing and scheduling strategy.

Lemma 6.2. Assume that the initial queue length $\bar{Q}^{r}(0)\Rightarrow\bar{Q}(0)$ along $r\in{\cal R}$. Then, the joint convergence in distribution along a subsequence of ${\cal R}$ is true under our game-competition based dynamic pricing and scheduling strategy in (3.20) and (3.28) with the conditions required by Theorem 3.5,

(6.26)\begin{eqnarray} &&\left(\bar{E}^{r}(\cdot),\bar{S}^{r}(\cdot),\bar{T}^{r}(\cdot),\bar{Q}^{r}(\cdot)\right)\Rightarrow\left(\bar{E}(\cdot),\bar{S}(\cdot), \bar{T}(\cdot),\bar{Q}(\cdot)\right). \end{eqnarray}

In addition, if $\bar{Q}(0)=0$, the convergence is true along the whole ${\cal R}$ and the limit satisfies:

(6.27)\begin{eqnarray} &&\bar{E}(\cdot)=\bar{\lambda}(\cdot),\;\;\bar{S}(\cdot)=\mu(\cdot),\;\;\bar{T}(\cdot)=\bar{c}(\cdot),\;\;\bar{Q}(\cdot)=0, \end{eqnarray}

where $\bar{\lambda}(\cdot)$ is defined in (6.22), $\mu(\cdot)\equiv(\mu_{1},\ldots,\mu_{J})'\cdot$, and $\bar{c}(\cdot)$ is defined by:

(6.28)\begin{eqnarray} \;\;\;\;\bar{c}(t)&=&\left(\bar{c}_{1}(t),\ldots,\bar{c}_{J}(t)\right)'\;\;\mbox{and}\;\;\bar{c}_{j}(t)\equiv\int_{0}^{t}\rho_{j}(\alpha(s),s)ds\;\; \mbox{for each}\;\;j\in{\cal J}. \end{eqnarray}

Proof. First, by the proof of Lemma 1 in Dai [Reference Dai6] and the implicit function theorem, we can show that the pricing function f constructed through (3.22), (3.20), and (3.28) can be assumed to be Lipschitz continuous. Then, by extending the proof of Lemma 3 in Dai [Reference Dai6] and under the conditions in (6.1)-(6.2) and the just illustrated Lipschitz continuity of f, we know that, if $\Lambda(pq,i)\in F_{{\cal Q}}(i)$ for each $i\in{\cal K}$ is a given utility based 2-stage game-theoretic policy corresponding to the game problem in (3.20) and $\{p^{l}q^{l},l\in{\cal R}\}$ is a sequence of priced queue length vectors satisfying $p^{l}q^{l}\rightarrow pq\in R_{+}^{J}$ as $l\rightarrow\infty$, then, for each $j\in{\cal J}\setminus {\cal Q}(q)$ and $v\in{\cal V}(\,j)$, we have that:

(6.29)\begin{eqnarray} &&\Lambda_{vj}(p^{l}q^{l},i)\rightarrow\Lambda_{vj}(pq,i)\;\;\mbox{as}\;\;l\rightarrow\infty. \end{eqnarray}

Second, due to the proof of Lemma 7 in Dai [Reference Dai6], we only need to prove that a weak fluid limit on the RHS of (6.26) satisfies (6.28). In doing so, we suppose that the weak fluid limit on the RHS of (6.26) corresponds to a subsequence of the RHS of (6.26), which is indexed by $r_{l}\in{\cal R}$ with $l\in\{1,2,\ldots\}$. Furthermore, it follows from (6.11), (2.11), and the discussion in the proof of Lemma 7 of Dai [Reference Dai6] that the fluid limit process on the right-hand side of (6.26) is uniformly Lipschitz continuous almost surely. Thus, our discussion can focus on a fixed sample path and each regular point t > 0 over an interval $(\tau_{n-1},\tau_{n})$ with $n\in\{1,2,\ldots\}$ for $\bar{T}_{j}$ with $j\in{\cal J}$. More precisely, it follows from (6.21) that $\bar{Q}$ is differentiable at t and satisfies:

(6.30)\begin{eqnarray} &&\frac{d\bar{Q}_{j}(t)}{dt}=m_{j}\lambda_{j}(\alpha(t),t)-\mu_{j}\frac{d\bar{T}_{j}(t)}{dt} \end{eqnarray}

for each $j\in{\cal J}$. If $\bar{Q}_{j}(t)=0$ for some $j\in{\cal J}$, then it follows from $\bar{Q}_{j}(\cdot)\geq 0$ that

(6.31)\begin{eqnarray} &&\frac{d\bar{Q}_{j}(t)}{dt}=0\;\;\mbox{which implies that}\;\;\frac{d\bar{T}_{j}(t)}{dt}=\frac{m_{j}\lambda_{j}(\alpha(t),t)}{\mu_{j}} =\rho_{j}(\alpha(t),t). \end{eqnarray}

If $\bar{Q}_{j}(t) \gt 0$ for this $j\in{\cal J}$, there is a finite interval $(a,b)\subset[0,\infty)$ containing t such that $\bar{Q}_{j}(s) \gt 0$ for all $s\in(a,b)$, and hence we can take a sufficiently small δ > 0 such that $\bar{Q}_{j}(t+s) \gt 0$ for $s\in(0,\delta)$. Furthermore, by (2.8), $P_{j}(t+s) \gt 0$. Now, let rl with $l\in\{1,2,\ldots\}$ index the subsequence of ${\cal R}$ and let $\delta_{l}\in(0,\delta]$ be a sequence such that $\delta_{l}\rightarrow 0$ as $l\rightarrow\infty$ while $\Lambda_{j}$ is determined by the same group of users over $(0,\delta_{l}]$. Then, it follows from (6.11) that:

(6.32)\begin{eqnarray} &&\left|\frac{1}{\delta_{l}}\left(\bar{T}^{r_{l}}_{j}(t+\delta_{l})-\bar{T}^{r_{l}}_{j}(t)\right)-\Lambda_{j}(\bar{P}(t)\bar{Q}(t),\alpha(t),t)\right|\\ &\leq&\frac{1}{\delta_{l}}\int_{0}^{\delta_{l}}\Big|\Lambda_{j}(\bar{P}^{r_{l}}(t+s)\bar{Q}^{r_{l}}(t+s),\alpha(t+s),t+s) \nonumber\\ &&\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;-\Lambda_{j}(\bar{P}(t+s)\bar{Q}(t+s),\alpha(t+s),t+s)\Big|ds \nonumber\\ &&+\frac{1}{\delta_{l}}\int_{0}^{\delta_{l}}\left|\Lambda_{j}(\bar{P}(t+s)\bar{Q}(t+s),\alpha(t+s),t+s) -\Lambda_{j}(\bar{P}(t)\bar{Q}(t),\alpha(t),t)\right|ds \nonumber\\ \nonumber\\ &\rightarrow&0\;\;\;\;\mbox{as}\;\;l\rightarrow\infty, \nonumber \end{eqnarray}

where the last claim in (6.32) follows from the Lebesgue dominated convergence theorem, the right-continuity of $\alpha(\cdot)$, the Lipschitz continuity of $\bar{Q}(\cdot)$, and the fact in (6.29). Since t is a regular point of $\bar{T}$, it follows from (6.32) that:

(6.33)\begin{eqnarray} &&\frac{d\bar{T}_{j}(t)}{dt} =\frac{d\bar{T}_{j}(t^{+})}{dt}=\bar{\Lambda}_{j}(\bar{Q}(t),\alpha(t),t)\;\;\mbox{for each}\;\;j\in{\cal J}, \end{eqnarray}

which implies that the claims in (6.23)-(6.25) are true.

Along the line of the proofs for Lemma 4.2 in Dai [Reference Dai8], Lemma 4.1 in Dai [Reference Dai9], and Lemma 7 in Dai [Reference Dai6], it suffices to prove the claim that $\bar{Q}(\cdot)=0$ in (6.27) holds for the purpose of our current paper. In fact, for each $i\in{\cal K}$ and $l\in\{1,\ldots,M_{v}\}$, we define

(6.34)\begin{eqnarray} &&\psi(pq,i)\equiv\sum_{v\in{\cal V}}\psi_{v}(pq,i)=\sum_{v\in{\cal V}}\sum_{j_{l}\in{\cal M}(i,v)\bigcap{\cal J}(v)}C_{vj_{l}}(p_{j_{l}}q_{j_{l}},\rho_{vj_{l}}(i)). \end{eqnarray}

Then, at each regular time $t\geq 0$ of $\bar{Q}(t)$ over time interval $(\tau_{n-1},\tau_{n})$ with a given $n\in\{1,2,\ldots\}$, we have that:

(6.35)\begin{align} &\;\;\;\;\;\frac{d\psi(\bar{P}(t)\bar{Q}(t),\alpha(t))}{dt}\\ &=\sum_{v\in{\cal V}}\sum_{j_{l}\in{\cal M}(i,v,t)\bigcap{\cal J}(v)}\Bigg(\rho_{vj_{l}}(\alpha(t),t)-\Lambda_{vj_{l}}(\bar{P}(t)\bar{Q}(t),\alpha(t),t)\Bigg) \nonumber\\ &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \frac{\partial U_{vj}(\bar{P}(t)\bar{Q}_{j_{l}}(t),\rho_{vj_{l}}(\alpha(t),t))}{\partial\rho_{vj_{l}}(\alpha(t),t)}I_{\{\bar{P}_{j_{l}}(t)\bar{Q}_{j_{l}}(t) \gt 0\}} \nonumber\\ &\leq0. \nonumber \end{align}

Note that the last inequality in (6.35) follows from the concavity of the utility functions and the fact that $\Lambda_{vj}(\bar{P}(t)\bar{Q}(t),\alpha(t),t)$ is the Pareto maximal Nash equilibrium policy to the utility-maximal game problem in (3.20) when the system is in a particular state. Thus, for any given $n\in\{0,1,2,\ldots\}$ and each $t\in[\tau_{n},\tau_{n+1})$,

(6.36)\begin{align} \;\;\;\;\;\;\;\;0&\leq\psi(\bar{P}(t)\bar{Q}(t),\alpha(t))\\ &\leq\psi(\bar{P}(\tau_{n})\bar{Q}(\tau_{n}),\alpha(\tau_{n})) \nonumber\\ &=\sum_{v\in{\cal V}}\sum_{j_{l}\in{\cal M}(i,v,\tau_{n})\bigcap{\cal J}(v)}\frac{1}{\mu_{j_{l}}}\int_{0}^{\bar{Q}_{j_{l}}(\tau_{n})}\frac{\partial U_{vj_{l}}(\bar{P}(\tau_{n})u,\rho_{vj_{l}}(\alpha(\tau_{n})))}{\partial C_{vj_{l}}}du \nonumber\\ &=\sum_{v\in{\cal V}}\left(\frac{d\psi_{v}(\bar{\rho}_{vj_{1}}(\alpha(\tau_{n})))} {dc_{vj_{1}}}\right)\left(\frac{d\psi_{v}(\bar{\rho}_{vj_{1}}(\alpha(\tau_{n-1})))}{dc_{vj_{1}}} \right)^{-1}\psi_{v}(\bar{P}(\tau_{n})\bar{Q}(\tau_{n}), \alpha(\tau_{n-1})) \nonumber\\ &\ldots \nonumber\\ &\leq\sum_{v\in{\cal V}}\left(\frac{d\psi_{v}(\bar{\rho}_{vj_{1}}(\alpha(\tau_{n})))} {dc_{vj_{1}}}\right)\left(\frac{d\psi_{v}(\bar{\rho}_{vj_{1}}(\alpha(\tau_{0})))}{dc_{vj_{1}}} \right)^{-1}\psi_{v}(\bar{P}(0)\bar{Q}(0),\alpha(0)) \nonumber\\ &\leq\kappa\psi(\bar{P}(0)\bar{Q}(0),\alpha(0)), \nonumber \end{align}

where κ is a positive constant, i.e.,

\begin{equation*} \kappa=\max_{v\in{\cal V}}\max_{i,j\in{\cal K}}\left(\frac{d\psi_{v}(\bar{\rho}_{vj_{1}}(i))} {dc_{vj_{1}}}\right)\left(\frac{d\psi_{v}(\bar{\rho}_{vj_{1}}(\,j))}{dc_{vj_{1}}}\right)^{-1}. \end{equation*}

Then, by the fact in (6.36) and the assumption $\bar{Q}(0)=0$ (which gives $\psi(\bar{P}(0)\bar{Q}(0),\alpha(0))=0$), we know that $\bar{Q}(t)=0$ for all $t\geq 0$. Therefore, we complete the proof of the lemma.

In the end, by considering a specific state $i\in{\cal K}$ and by the indexing approach used in the proof of Lemma 6.2, we can extend the proofs of Lemmas 4.3-4.5 in Dai [Reference Dai8] to the current setting. Then, by applying the results of these lemmas to the proof of Theorem 1 in Dai [Reference Dai6], we can reach a proof of Theorem 3.5 of this paper. $\Box$

7. Conclusion

In this paper, we study 2-stage game-theoretic problem oriented 3-stage service policy computing, CNN based algorithm design, and simulation for a blockchained buffering system with federated learning. More precisely, based on the game-theoretic problem consisting of both “win-lose” and “win-win” 2-stage competitions, we derive a 3-stage dynamical service policy via a saddle point to a zero-sum game problem and a Nash equilibrium point to a non-zero-sum game problem. This policy is concerning users-selection, dynamic pricing, and online rate resource allocation via stable digital currency for the system. The main focus is on the design and analysis of the joint 3-stage service policy for given queue/environment state dependent pricing and utility functions. The asymptotic optimality and fairness of this dynamic service policy is justified by diffusion modeling with approximation theory. A general CNN based policy computing algorithm flow chart along the line of the so-called big model framework is presented. Simulation case studies are conducted for the system with three users, where only two of the three users can be selected into the service by a zero-sum dual cost game competition policy at a time point. Then, the selected two users get into service and share the system rate service resource through a non-zero-sum dual cost game competition policy. Applications of our policy in the future blockchain based Internet (e.g., metaverse and web3.0) and supply chain finance are also briefly illustrated.

Acknowledgments

The project is funded by the National Natural Science Foundation of China under Grant No. 11771006.

Competing interests

The author declares that he has no conflict of interest.

References

Applebaum, D. (2005). Lévy Processes and Stochastic Calculus. Cambridge: Cambridge University Press.
Ayaz, F., Sheng, Z., Tian, D., & Guan, Y.L. (2022). A blockchain based federated learning for message dissemination in vehicular networks. IEEE Transactions on Vehicular Technology 71(2): 1927–1940.
Bramson, M. (1998). State space collapse with application to heavy traffic limits for multiclass queueing networks. Queueing Systems 30(1-2): 89–148.
Buterin, V. (2013). Ethereum: a next-generation smart contract and decentralized application platform. http://ethereum.org/ethereum.html.