Bitcoin is a peer-to-peer electronic payment system that has rapidly grown in popularity in recent years1,2,3,4. As a distributed ledger technology (DLT), Bitcoin records newly generated transactions in a decentralized way, eliminating the need for intermediaries like banks and reducing transaction costs5,6,7.
Bitcoin records unspent transaction outputs (UTXOs) to efficiently verify newly generated transactions8,9,10,11. An illustrative example of UTXOs is shown in Fig. 1. A UTXO can be generated either as a block reward or as an output of a transaction. Block rewards are newly minted bitcoins (BTC) distributed to miners for their work to maintain the network, such as routing transactions and validating blocks. In fact, all UTXOs can be traced back to block rewards. A timestamp is recorded when a UTXO is generated. A UTXO is spent and converted into a spent transaction output (STXO) when it is used as the input of a transaction. A timestamp is again recorded when the UTXO is spent, and each UTXO can be spent only once. This feature allows us to calculate the age of each UTXO and the lifespan of each STXO, as is done with population data. Take Fig. 1 as an example. As of July 1, 2020, UTXOs 1–3 are 8.5 years, 1 year, and 1 day old, respectively. Immediately after Alice’s payment to Bob on January 1, 2021, UTXOs 1–3 are converted to STXOs with lifespans of 9 years, 1.5 years, and 0.5 years plus 1 day, respectively.
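The age and lifespan arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the birth dates are not given in the text but are inferred here from the ages stated for July 1, 2020.

```python
from datetime import date

def age_days(born: date, as_of: date) -> int:
    """Age of an unspent output: days elapsed from its creation to the reference date."""
    return (as_of - born).days

def lifespan_days(born: date, spent: date) -> int:
    """Lifespan of a spent output: days elapsed from its creation to its spending."""
    return (spent - born).days

# Reference date and spending date taken from the Fig. 1 example.
AS_OF = date(2020, 7, 1)
SPENT = date(2021, 1, 1)

# Assumption: birth dates inferred from the stated ages (8.5 years, 1 year, 1 day).
BORN = {
    1: date(2012, 1, 1),
    2: date(2019, 7, 1),
    3: date(2020, 6, 30),
}

for utxo_id, born in BORN.items():
    age = age_days(born, AS_OF)
    life = lifespan_days(born, SPENT)
    # Dividing by 365.25 gives an approximate value in years.
    print(f"UTXO {utxo_id}: age {age / 365.25:.1f} y, lifespan {life / 365.25:.1f} y")
```

Because each output carries exactly one creation timestamp and at most one spending timestamp, both quantities reduce to a single date subtraction.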
Noticing the unique structure of the Bitcoin blockchain data, we apply cohort analysis12,13,14,15,16, originally developed for population data, to analyze it. To continue the analogy with population data, we say a UTXO is born when it is generated as a block reward or as the output of a transaction, and we say a UTXO dies when it is spent as the input of another transaction. In this way, all UTXOs generated on the same day form a daily birth cohort, and all UTXOs spent on the same day form a daily death cohort. We define the age of a UTXO as the difference between “now” (the date on which we are working) and the time when it was born. We define the lifespan of an STXO as the difference between the time when it died and the time when it was born. Thus, all UTXOs within an age range form an age cohort, and all STXOs within a lifespan range form a lifespan cohort. With this framework, we replicate in Bitcoin blockchain data the trinity of birth, death, and age cohorts used in population cohort analysis.
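A minimal sketch of the daily birth and death cohorts defined above is given below. The `TxOutput` record, the function names, and the sample values are hypothetical and serve only to make the definitions concrete; they are not the data schema used in this work.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional

@dataclass
class TxOutput:
    value_btc: float
    born: date                    # day the output was generated
    spent: Optional[date] = None  # day it was spent; None means it is still a UTXO

def birth_cohorts(outputs: Iterable[TxOutput]) -> Dict[date, float]:
    """Total BTC generated per day (daily birth cohorts)."""
    totals: Dict[date, float] = defaultdict(float)
    for out in outputs:
        totals[out.born] += out.value_btc
    return dict(totals)

def death_cohorts(outputs: Iterable[TxOutput]) -> Dict[date, float]:
    """Total BTC spent per day (daily death cohorts); unspent outputs are skipped."""
    totals: Dict[date, float] = defaultdict(float)
    for out in outputs:
        if out.spent is not None:
            totals[out.spent] += out.value_btc
    return dict(totals)

# Hypothetical sample: two outputs born on the same day, one of them still unspent.
SAMPLE = [
    TxOutput(1.0, date(2020, 6, 30), date(2021, 1, 1)),
    TxOutput(0.5, date(2020, 6, 30)),
    TxOutput(0.25, date(2019, 7, 1), date(2021, 1, 1)),
]

print(birth_cohorts(SAMPLE))
print(death_cohorts(SAMPLE))
```

Each output belongs to exactly one birth cohort and at most one death cohort, so every daily cohort can be processed on its own.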
Usually, we need to query the complete history of Bitcoin blockchain data to acquire variables with economic meaning. With over 1.6 billion historical transactions on the Bitcoin blockchain, downloading the complete records has become increasingly difficult and computationally intensive. It is thus important to query Bitcoin transaction data in a way that is more efficient and provides economic insights17. Cohort analysis provides a new perspective from which we can analyze data within each cohort separately before integrating them into a time series.
Our workflow is displayed in Fig. 2. We query and process Bitcoin transaction input and output data within each daily cohort. By doing so, we create datasets and visualizations for some key Bitcoin transaction indicators, including the daily lifespan distributions of STXOs as percentages (Fig. 3) and the cumulative daily age distributions of UTXOs (Fig. 4). These visualizations can be used to study the functions of bitcoin (BTC) as a currency. The three functions of a currency are to act as a store of value, a unit of account, and a medium of exchange. For example, Fig. 4 shows the number of BTCs in UTXOs (i.e., BTCs that have not been spent) by age distribution. By the end of 2020, approximately 2 million BTCs had not been transacted for more than 10 years. Approximately 2 million, 4.5 million, and 3 million BTCs had remained inactive for 5–10 years, 2–5 years, and 1–2 years, respectively. In total, approximately 11.5 million BTCs had not been transacted for more than 1 year. These BTCs serve as a time deposit and act as a store of value. Moreover, approximately 5 million BTCs were between 1 month and 1 year old. These BTCs are similar to a demand deposit. Frequently transacted BTCs are those with ages between 1 day and 1 month (2 million) and less than 1 day (0.2 million). These BTCs act as a medium of exchange.
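The age ranges used in this discussion can be expressed as a simple bucketing rule. The sketch below is illustrative only: the bucket edges assume a 30-day month and a 365-day year, the handling of ages exactly on a boundary is an assumption, and the names are hypothetical. The approximate end-of-2020 figures are copied from the text (in millions of BTC) to check the stated total for coins older than 1 year.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Upper edges of the age buckets in days.
# Assumption: 1 month = 30 days and 1 year = 365 days.
EDGES = [1, 30, 365, 2 * 365, 5 * 365, 10 * 365]
LABELS = [
    "<1 day", "1 day-1 month", "1 month-1 year",
    "1-2 years", "2-5 years", "5-10 years", ">10 years",
]

def age_bucket(age_in_days: float) -> str:
    """Label of the age cohort; an age equal to an edge falls in the older bucket."""
    return LABELS[bisect_right(EDGES, age_in_days)]

def age_distribution(utxos: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Sum BTC per age bucket from (age_in_days, value_btc) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    for age_in_days, value_btc in utxos:
        totals[age_bucket(age_in_days)] += value_btc
    return dict(totals)

# Approximate end-of-2020 figures quoted in the text, in millions of BTC.
APPROX_END_2020 = {
    ">10 years": 2.0, "5-10 years": 2.0, "2-5 years": 4.5, "1-2 years": 3.0,
    "1 month-1 year": 5.0, "1 day-1 month": 2.0, "<1 day": 0.2,
}

OLDER_THAN_1_YEAR = ["1-2 years", "2-5 years", "5-10 years", ">10 years"]
store_of_value = sum(APPROX_END_2020[label] for label in OLDER_THAN_1_YEAR)
print(f"Inactive for more than 1 year: about {store_of_value} million BTC")
```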
Our final datasets comprise one dataset that characterizes STXOs and one that characterizes UTXOs, each smaller than 1 MB. Moreover, cohort analysis keeps data querying and processing to a minimum for future updates and enables automated updates. We thus provide a computationally feasible approach for characterizing BTC transactions, which paves the way for future economic studies of Bitcoin. Our methods can be generally applied to other cryptocurrencies that adopt UTXO protocols, including Litecoin, Dash, Zcash, Dogecoin, and Bitcoin Cash.