
A brief introduction to the stochastic ranking process


A note on observing the time evolution of the ranking number

Let us first go back to the schematic drawing of the time evolution of the Amazon.co.jp ranking number of a book. On the main page I showed a figure covering 1 year; below I reproduce 1 month of it, to keep the graph simple.

[Figure: time evolution of web ranking; schematic drawings]

Note: The figure is a schematic drawing and oversimplifies the actual time evolution. However, it is based on actual observations, and the numbers are not exaggerated. I was astonished to receive a comment from a referee of a mathematics research journal, which said

'A book falls from #1 among all books to #200,000 in only 100 hours (5 days)?!? I completely fail to believe that.'

The number DOES fall by 200 thousand in 5 days. Seeing once is better than hearing 100 times! Anyone can observe it. Ignorance breeds prejudice. Prejudice sustains ignorance. The referee's comment convinced me that the mere discovery that the ranking number falls by 200 thousand in 5 days near the top, while it moves much more slowly near the tail, is a new fact worth pointing out.
The figure above is actually part of the conclusions drawn from all the observations of the Amazon.co.jp ranking and from the mathematical study of the stochastic ranking process.

It isn't easy to discover this kind of behavior from the very small amount of data available. I emphasized that the above graph is a schematic drawing, partly because it is more a theoretical consequence than actual data, and should be regarded as a conjecture. We need a large data set taken every hour for, say, a month, and for that we need a computer program that automates regular access to Amazon.co.jp and the accumulation of data.

So far, all the data I have were taken manually (by myself). For example, the actual data corresponding to the above figure (Oct. 2007, Random walk and renormalization group, Tetsuya Hattori) are the set of points in the left graph below. Superposing the above graph on the left graph below gives the right graph below, which shows that the theory explains reality, at least qualitatively.

[Figure, left: time evolution of web ranking; a data from early studies. Figure, right: time evolution of web ranking; fit of data to theory - schematic drawings]

It is, however, not easy to deduce the behavior in the first figure solely from the data I have. I needed to do a bit of abstract thinking to find a mathematical model that would explain what must be going on, and what I found was the stochastic ranking process.


Stochastic ranking process

Let us return to the first graph on this page. We notice 2 distinct kinds of motion in the figure: intervals of smooth concave curve (denoted by a in the figure), and abrupt jumps down close to the horizontal axis (denoted by b).

One can check that the points b correspond to the times when the book is ordered, by ordering at Amazon.co.jp and watching the changes in the ranking. The jump may look like it touches the top rank (number 1) in the graph, but actually the furthest one can go by ordering 1 book is slightly below 10,000. Apparently, there are constantly about 10 thousand books that sell 2 or more copies every hour (i.e., between the updates of the ranking number). However, we are usually observing ranking numbers of order 100 thousand, hence 10 thousand is relatively negligible, and can be regarded as 1 in a simplified model. In summary,

  I. The ranking number jumps to 1 each time the book is sold (ordered).

Next, consider the interval a in the figure, with its gradual increase in the ranking number. Since no order is coming in for the book under observation, the changes must be a side effect of sales of other books.

  II. The ranking number increases (is pushed toward the tail) each time another book is sold and its ranking number jumps to 1.
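
The two rules so far can be written down as a small update on an ordered list. The following is only a minimal sketch of my reading of them, not a description of how Amazon.co.jp computes its ranking. The function names `sell` and `rank_of` are hypothetical, and I assume that only the titles that were ahead of the sold title are pushed down by one, while titles already behind it keep their ranking number.

```python
def sell(ranking, title):
    """Return the new ranking after one copy of `title` is sold.

    `ranking` is a list of titles ordered from rank 1 (index 0) to the tail.
    Rule I: the sold title jumps to rank 1.
    Rule II (assumed reading): titles that were ahead of it move down by one;
    titles that were behind it are unaffected.
    """
    return [title] + [t for t in ranking if t != title]


def rank_of(ranking, title):
    """Ranking number (1-based) of `title` in `ranking`."""
    return ranking.index(title) + 1


# Illustration with four hypothetical titles.
before = ["A", "B", "C", "D"]
after = sell(before, "C")
print(after)                 # ['C', 'A', 'B', 'D']
print(rank_of(after, "A"))   # 2
```

In this toy example the title "A" is not sold, yet its ranking number increases from 1 to 2, which is the gradual motion seen in the intervals a.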

Some people might like to regard the increase in the ranking number while the book is not sold as a decrease in popularity (= decrease in contribution to the bookstore's sales) due to time elapsing without additional sales. From this viewpoint of the ranking number representing popularity or contribution, the two properties I and II above may be explained as follows. Since popularity is a relative evaluation among the books in the store,

Thus the properties I and II are consistent with the meaning a ranking number is expected to represent.

We need one more definition, which introduces the randomness of sales transactions in such a way that quantitative calculations are possible.

  III. Denote by N the total number of book titles in the online bookstore. Then the ranking number of each book title is a distinct integer between 1 and N. Moreover, the jump times of the ranking are random, and for each i=1,2,...,N, w_i, the average rate of sales (average jump rate) of the i-th book, is a fixed positive constant. (The value can differ among book titles.)

A stochastic process (particle system) defined by I, II, and III, a model of a ranking number with wild and random time evolution, is what we call the stochastic ranking process.
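
To make the definition concrete, here is a minimal simulation sketch. It rests on an assumption that goes beyond the text above: I take the sales of title i to form an independent Poisson process with rate w_i, so that the waiting time until the next sale of any title is exponential with rate equal to the sum of the w_i, and the title sold is chosen with probability proportional to w_i. The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(rates, t_end, seed=0):
    """Simulate rules I-III up to time t_end.

    rates[i] is the jump rate w_i of title i (assumed Poisson sales).
    Titles start ranked in index order: title 0 at rank 1, and so on.
    Returns (events, ranking), where events is a list of (time, title)
    and ranking lists the titles from rank 1 to rank N at time t_end.
    """
    rng = random.Random(seed)
    n = len(rates)
    ranking = list(range(n))
    total_rate = sum(rates)
    t = 0.0
    events = []
    while True:
        # Waiting time until the next sale of any title.
        t += rng.expovariate(total_rate)
        if t > t_end:
            break
        # Title i is the one sold with probability w_i / sum of all w_j.
        i = rng.choices(range(n), weights=rates)[0]
        # Rule I: the sold title jumps to rank 1.
        # Rule II: the titles that were ahead of it move down by one.
        ranking.remove(i)
        ranking.insert(0, i)
        events.append((t, i))
    return events, ranking


# Three hypothetical titles with different average sales rates.
events, ranking = simulate([1.0, 0.5, 2.0], t_end=10.0, seed=1)
print(len(events), ranking)
```

Recording the position of one fixed title after each event gives a trajectory of the kind sketched in the first figure: slow increases interrupted by jumps to 1.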