Deciphering Bitcoin Blockchain Data by Cohort Analysis

Bitcoin is a peer-to-peer electronic payment system that has rapidly grown in popularity in recent years1,2,3,4. As a distributed ledger technology (DLT), Bitcoin records newly generated transactions in a decentralized way, eliminating the need for intermediaries like banks and reducing transaction costs5,6,7.

Bitcoin relies on recording unspent transaction outputs (UTXOs) to efficiently verify newly generated transactions8,9,10,11. An illustrative example of UTXOs is shown in Fig. 1. A UTXO is generated either as a block reward or as the output of a transaction. Block rewards are newly minted bitcoins (BTC) distributed to miners for their work in maintaining the network, such as routing transactions and validating blocks. In fact, every UTXO can be traced back to block rewards. A timestamp is recorded when a UTXO is generated. A UTXO is spent, and thereby converted into a spent transaction output (STXO), when it is used as the input of a transaction. A second timestamp is recorded when the UTXO is spent, and each UTXO can be spent only once. This feature allows us to calculate the age of each UTXO and the lifespan of each STXO, as is done with population data. Take Fig. 1 as an example. As of July 1, 2020, UTXOs 1–3 are 8.5 years, 1 year, and 1 day old, respectively. Immediately after Alice’s payment to Bob on January 1, 2021, UTXOs 1–3 are converted to STXOs with lifespans of 9 years, 1.5 years, and 0.5 years plus 1 day, respectively.
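Because every output carries a birth timestamp and, once spent, a death timestamp, age and lifespan reduce to date differences. The minimal Python sketch below reproduces the Fig. 1 example under stated assumptions: the birth dates are back-calculated from the ages given in the text (for example, UTXO 1 is assumed to be born on January 1, 2012), `age_days` and `lifespan_days` are hypothetical helper names, and a year is approximated as 365.25 days for display only.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: mean year length, used only for display


def age_days(born: date, now: date) -> int:
    """Age of a UTXO: days elapsed from its birth to the reference date `now`."""
    return (now - born).days


def lifespan_days(born: date, died: date) -> int:
    """Lifespan of an STXO: days elapsed from its birth to the day it was spent."""
    return (died - born).days


# Hypothetical birth dates, back-calculated from the ages stated for Fig. 1.
births = {
    "UTXO 1": date(2012, 1, 1),
    "UTXO 2": date(2019, 7, 1),
    "UTXO 3": date(2020, 6, 30),
}
now = date(2020, 7, 1)    # reference date in the example
spent = date(2021, 1, 1)  # date of Alice's payment to Bob

for name, born in births.items():
    life = lifespan_days(born, spent)
    print(f"{name}: {age_days(born, now)} days old on {now}; "
          f"lifespan {life} days (~{life / DAYS_PER_YEAR:.1f} years)")
```

With these assumed dates, the ages on July 1, 2020 are 3104, 366, and 1 days, and the lifespans after the payment are 3288, 550, and 185 days, or roughly 9.0, 1.5, and 0.5 years, consistent with the example in the text.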

Fig. 1

An example of UTXO birth and death. UTXOs 1, 2, and 3 were spent in a transaction from Alice to Bob and were converted into UTXOs 4 and 5. UTXOs 1, 2, and 3 became STXOs after the transaction.

Given this structure, we analyze Bitcoin blockchain data with cohort analysis12,13,14,15,16, a method originally developed for population data. Extending the analogy with population data, we say that a UTXO is born when it is generated as a block reward or as the output of a transaction, and that it dies when it is spent as the input of another transaction. All UTXOs generated on the same day thus form a daily birth cohort, and all UTXOs spent on the same day form a daily death cohort. We define the age of a UTXO as the time elapsed between its birth and “now” (the date on which we are working), and the lifespan of an STXO as the time elapsed between its birth and its death. All UTXOs within an age range form an age cohort, and all STXOs within a lifespan range form a lifespan cohort. This framework reproduces in Bitcoin blockchain data the trinity of birth, death, and age cohorts used in population cohort analysis.
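The cohort definitions above can be expressed as key functions over output records. This is a minimal sketch, not the authors' pipeline: the `Output` record, the helper names, and the bin edges (1 month taken as 30 days and 1 year as 365 days, with ranges borrowed from the age groups discussed later in the text) are all assumptions for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Dict, Hashable, Iterable, List, Optional

# Assumed bin edges in days (1 month ~ 30 days, 1 year ~ 365 days).
EDGES_DAYS = [1, 30, 365, 2 * 365, 5 * 365, 10 * 365]
LABELS = ["<1d", "1d-1m", "1m-1y", "1y-2y", "2y-5y", "5y-10y", ">10y"]


@dataclass
class Output:
    """Hypothetical transaction-output record: BTC value plus birth/death times."""
    btc: float
    born: datetime
    died: Optional[datetime] = None  # None while the output is unspent


def duration_bin(days: float) -> str:
    """Map an age or lifespan (in days) to its cohort label."""
    return LABELS[bisect_right(EDGES_DAYS, days)]


def birth_cohort(o: Output) -> date:
    """Daily birth cohort: the calendar day the output was created."""
    return o.born.date()


def death_cohort(o: Output) -> Optional[date]:
    """Daily death cohort: the calendar day the output was spent, if any."""
    return o.died.date() if o.died is not None else None


def age_cohort(o: Output, now: datetime) -> Optional[str]:
    """Age cohort of an output that is still unspent at `now`."""
    if o.died is not None and o.died <= now:
        return None  # already an STXO at `now`
    return duration_bin((now - o.born).total_seconds() / 86400)


def lifespan_cohort(o: Output) -> Optional[str]:
    """Lifespan cohort of a spent output (STXO)."""
    if o.died is None:
        return None
    return duration_bin((o.died - o.born).total_seconds() / 86400)


def group(outputs: Iterable[Output],
          key: Callable[[Output], Optional[Hashable]]) -> Dict[Hashable, List[Output]]:
    """Collect outputs into cohorts by `key`, skipping outputs with no cohort."""
    cohorts: Dict[Hashable, List[Output]] = defaultdict(list)
    for o in outputs:
        k = key(o)
        if k is not None:
            cohorts[k].append(o)
    return dict(cohorts)
```

For example, `group(outputs, birth_cohort)` yields daily birth cohorts, and `group(outputs, lambda o: age_cohort(o, now))` yields age cohorts as of `now`. Keeping each cohort definition as a separate key function lets the same grouping code serve all four cohort types.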

Acquiring variables with economic meaning usually requires querying the complete history of the Bitcoin blockchain. With over 1.6 billion historical transactions on the Bitcoin blockchain, downloading the complete record has become increasingly difficult and computationally intensive. It is thus important to query Bitcoin transaction data in a way that is more efficient and provides economic insights17. Cohort analysis offers such a perspective: data within each cohort can be analyzed separately and then integrated into a time series.
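Processing each daily cohort independently and then combining the results into a time series can be sketched as follows. This is an illustration under assumptions, not the authors' query code: `fetch_day` stands in for whatever per-day query is actually run, its `(btc_value, kind)` record format is hypothetical, and `summarize_day` simply totals the BTC born and spent on that day. Days already summarized are skipped, so a later update touches only new daily cohorts.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical per-day query result: (btc_value, kind) pairs, where kind is
# "born" (a newly created output) or "spent" (an output used as an input).
DayRecords = List[Tuple[float, str]]


def summarize_day(records: DayRecords) -> Dict[str, float]:
    """Reduce one daily cohort to a small summary row."""
    born = sum(v for v, kind in records if kind == "born")
    spent = sum(v for v, kind in records if kind == "spent")
    return {"born_btc": born, "spent_btc": spent}


def update_series(series: Dict[date, Dict[str, float]],
                  fetch_day: Callable[[date], DayRecords],
                  start: date, end: date) -> Dict[date, Dict[str, float]]:
    """Fill in missing days of `series` in place; return a date-sorted copy.

    Each day is queried and summarized on its own; days already present in
    `series` are not queried again.
    """
    day = start
    while day <= end:
        if day not in series:
            series[day] = summarize_day(fetch_day(day))
        day += timedelta(days=1)
    return dict(sorted(series.items()))
```

Because each day's summary depends only on that day's records, the per-day work is small and independent, and the time series is just the ordered collection of summaries.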

Our workflow is displayed in Fig. 2. We query and process Bitcoin transaction input and output data within each daily cohort. In this way, we create datasets and visualizations for key Bitcoin transaction indicators, including the daily lifespan distributions of STXOs as percentages (Fig. 3) and the cumulative daily age distributions of UTXOs (Fig. 4). These visualizations can be used to study the functions of BTC as a currency, namely as a store of value, a unit of account, and a medium of exchange. For example, Fig. 4 shows the number of BTCs in UTXOs (i.e., BTCs that have not been spent) by age. By the end of 2020, approximately 2 million BTCs had not been transacted for more than 10 years, and approximately 2 million, 4.5 million, and 3 million BTCs had remained inactive for 5–10 years, 2–5 years, and 1–2 years, respectively. In total, approximately 11.5 million BTCs had not been transacted for more than 1 year. These BTCs resemble a time deposit and act as a store of value. Moreover, approximately 5 million BTCs had been unspent for 1 month to 1 year; these BTCs are similar to a demand deposit. Frequently transacted BTCs are those aged between 1 day and 1 month (approximately 2 million) and less than 1 day (approximately 0.2 million). These BTCs act as a medium of exchange.
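The arithmetic behind these groupings can be checked with a short Python sketch. The bucket values are the approximate figures quoted above, in millions of BTC; the bucket labels and the mapping of age buckets to currency functions are illustrative names that follow the interpretation given in the text.

```python
# Approximate BTC (millions) held in UTXOs by age, as quoted in the text.
btc_by_age = {
    ">10y": 2.0,
    "5y-10y": 2.0,
    "2y-5y": 4.5,
    "1y-2y": 3.0,
    "1m-1y": 5.0,
    "1d-1m": 2.0,
    "<1d": 0.2,
}

# Interpretation from the text: which age buckets play which role.
roles = {
    "time deposit / store of value": [">10y", "5y-10y", "2y-5y", "1y-2y"],
    "demand deposit": ["1m-1y"],
    "medium of exchange": ["1d-1m", "<1d"],
}

for role, buckets in roles.items():
    total = sum(btc_by_age[b] for b in buckets)
    print(f"{role}: ~{total:.1f} million BTC")
```

Summing the four buckets older than 1 year gives the approximately 11.5 million BTCs cited above; the demand-deposit and medium-of-exchange groups come to about 5.0 and 2.2 million BTCs.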

Fig. 2

Workflow of cohort analysis on BTC UTXO data.

Fig. 3

Lifespan distribution of BTC STXOs. The figure shows the log percentage of spent transaction outputs in each lifespan range on each day up to Feb. 2021. For example, as of Feb. 2021, STXOs with lifespans of less than 1 day accounted for 80% of all STXOs, and those with lifespans between 1 day and 1 month accounted for another 15%.
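A minimal sketch of how one day's lifespan distribution in percent, and its logarithm as plotted in Fig. 3, might be computed. It assumes the lifespans (in days) of all outputs spent that day are already available, counts outputs rather than weighting them by BTC value, uses a base-10 logarithm, and uses hypothetical bin edges (1 month taken as 30 days, 1 year as 365 days); none of these choices is confirmed by the source.

```python
import math
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable

# Assumed lifespan bin edges in days (1 month ~ 30 days, 1 year ~ 365 days).
EDGES_DAYS = [1, 30, 365, 2 * 365, 5 * 365, 10 * 365]
LABELS = ["<1d", "1d-1m", "1m-1y", "1y-2y", "2y-5y", "5y-10y", ">10y"]


def lifespan_percentages(lifespans_days: Iterable[float]) -> Dict[str, float]:
    """Share (in %) of one day's STXOs falling in each lifespan bin."""
    counts = Counter(LABELS[bisect_right(EDGES_DAYS, d)] for d in lifespans_days)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}


def log_percentages(pct: Dict[str, float]) -> Dict[str, float]:
    """Base-10 log of each percentage; empty bins map to -inf."""
    return {k: math.log10(v) if v > 0 else float("-inf") for k, v in pct.items()}
```

For a toy day with ten spent outputs, eight of them under 1 day old and two between 1 day and 1 month, `lifespan_percentages` returns 80.0 and 20.0 for those bins and 0.0 elsewhere. A log scale keeps small but nonzero shares, such as long-dormant coins being spent, visible next to the dominant short-lived outputs.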

Fig. 4

Number of BTCs in UTXOs by age. The figure shows the cumulative number of BTCs held in unspent transaction outputs, by age. For example, by Feb. 2021, approximately 200k BTCs were held in UTXOs less than 1 day old and used as a medium of exchange, and approximately 2 million BTCs were held in UTXOs more than 10 years old, which were either lost or held as a store of value.

Our final data consist of two datasets, one characterizing STXOs and one characterizing UTXOs, each smaller than 1 MB. Moreover, cohort analysis keeps the querying and processing needed for future updates to a minimum and enables automated updates. We thus provide a computationally feasible approach to characterizing BTC transactions, which paves the way for future economic studies of Bitcoin. Our methods can be applied to other cryptocurrencies that adopt the UTXO model, including Litecoin, Dash, Zcash, Dogecoin, and Bitcoin Cash.