An experimental approach to the study of limit order books relies on the availability of data. Most of the results presented in this book (in any case, those we have produced ourselves) use the Thomson Reuters Tick History (TRTH) database. All exchange-traded assets worldwide are present in the TRTH database, where they are identified by their Reuters Identification Code (RIC). Similar to most historical databases directly provided by the exchanges, the TRTH data come in the form of two separate files: a trade file recording all transactions, and an event file recording every change in the limit order book. Some very specific information, such as traders’ identities, cannot be publicly disclosed for obvious confidentiality reasons, but in theory, one could reconstruct the sequence of order arrivals of all types using these trade and event files.
After explaining the algorithm used for the processing of limit order book data, we describe the specific data sets that have been used at various places in this book. That way, our results can be reproduced, extended and, of course, challenged, based on the very same data sets we have used.
Limit Order Book Data Processing
Because one cannot distinguish market orders from cancellations just by observing changes in the limit order book (the “event” file), and since the timestamps of the “trade” and “event” files are asynchronous, we use a matching procedure to reconstruct the order book events.
In a nutshell, we proceed as follows for each stock and each trading day:
1. Parse the “event” file to compute order book state variations:
• If the variation is positive (volume at one or more price levels has increased), then label the event as a limit order.
• If the variation is negative (volume at one or more price levels has decreased), then label the event as a “likely market order”.
• If there is no variation (this happens when there is just a renumbering in the field “Level” that does not affect the state of the book), do not count an event.
2. Parse the “trade” file and for each trade:
• Compare the trade price and volume to likely market orders whose timestamps are in [tTr − Δt, tTr + Δt], where tTr is the trade timestamp and Δt is a predefined, market-dependent time window.
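The labelling rule of step 1 and the candidate selection of step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the results in this book: the record type BookEvent, its field names, the functions label_event and candidates, and the use of seconds as the time unit are all hypothetical, and the window is taken to be symmetric around the trade timestamp. The rule used to compare price and volume once the candidates are found is not specified here, so the sketch stops at selecting them.

```python
from dataclasses import dataclass


@dataclass
class BookEvent:
    """One variation of the order book (hypothetical layout, not the TRTH schema)."""
    timestamp: float  # assumed unit: seconds since midnight
    price: float      # price level at which the volume changed
    delta: int        # signed change in displayed volume at that level


def label_event(delta):
    """Step 1: label a book variation by the sign of the volume change."""
    if delta > 0:
        return "limit order"
    if delta < 0:
        return "likely market order"
    return None  # level renumbering only: not counted as an event


def candidates(trade_time, events, dt):
    """Step 2: likely market orders whose timestamps fall within dt of the trade.

    Assumes a symmetric window [trade_time - dt, trade_time + dt].
    """
    return [
        e for e in events
        if label_event(e.delta) == "likely market order"
        and abs(e.timestamp - trade_time) <= dt
    ]
```

For instance, with a trade at time 10.25 and a window of 0.25, candidates keeps only the volume decreases recorded between 10.0 and 10.5; volume increases and unchanged levels are ignored regardless of their timestamps.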