I'd like to analyze the gambling activities in Bitcoin.
Does anyone have a list of addresses for gambling services such as SatoshiDICE and LuckyBit?
For example, I found addresses of SatoshiDICE here.
My suggestion would be to go and look for a list of popular addresses, i.e., addresses that received and/or sent a lot of transactions. Most gambling sites will use vanity addresses that include part of the site's name in the address, so you might also just search in the addresses for similar patterns.
It's rather easy to build such a list using Rusty Russell's bitcoin-iterate if you have a synced full node:
bitcoin-iterate --output "%os" -q > outputscripts.csv
This will get you a list of all output scripts in confirmed transactions in the blockchain. The output scripts include the pubkey hash that is also encoded in the address.
Let's keep only the P2PKH scripts, which in hex have the form 76a914 <20-byte pubkey hash> 88ac:
grep -E '^76a914.*88ac$' outputscripts.csv > p2pkhoutputs.csv
Just for reference, 90.03% (484715631/538368714) of outputs are to P2PKH scripts, so we should be getting pretty accurate results. Now let's count how often each output script occurs:
sort p2pkhoutputs.csv | uniq -c | sort -g > uniqoutputscripts.csv
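If you would rather do the filtering and counting in one place, the two shell steps above can be sketched in Python. This is only an illustration: the function name `count_p2pkh` and the sample data are made up, and the regular expression is stricter than the grep above (it insists on exactly 40 hex characters for the hash), which is my own choice. For the full blockchain the shell pipeline is likely the better fit, since this version keeps every distinct script in memory.

```python
import re
from collections import Counter

# OP_DUP OP_HASH160 PUSH(20) <40 hex chars> OP_EQUALVERIFY OP_CHECKSIG
P2PKH = re.compile(r"^76a914[0-9a-f]{40}88ac$")

def count_p2pkh(lines):
    """Count how often each P2PKH output script occurs in an iterable of hex strings."""
    counts = Counter()
    for line in lines:
        script = line.strip().lower()
        if P2PKH.match(script):
            counts[script] += 1
    return counts

# Tiny made-up sample standing in for outputscripts.csv
sample = [
    "76a914" + "00" * 20 + "88ac",
    "76a914" + "00" * 20 + "88ac",
    "76a914" + "11" * 20 + "88ac",
    "a914" + "22" * 20 + "87",  # P2SH script, filtered out
]

# Most frequent scripts first
for script, n in count_p2pkh(sample).most_common():
    print(n, script)
```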
And finally, let's convert the scripts to addresses. We'll need to do the base58 encoding, and I chose the Python base58 library:
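from base58 import b58encode_check

def script2address(s):
    # Skip the 3 opcode bytes (76 a9 14); the next 20 bytes are the pubkey hash
    h = bytes.fromhex(s)[3:23]
    # Prepend the version byte 0x00 (mainnet P2PKH)
    h = b'\x00' + h
    # Recent base58 releases return bytes, so decode to get a str
    return b58encode_check(h).decode()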
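To make the encoding step concrete without the third-party dependency, here is a standard-library-only sketch of the same conversion. The names `b58check` and `p2pkh_script_to_address` are mine, not part of any library, and the code assumes mainnet P2PKH (version byte 0x00).

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58check(payload: bytes) -> str:
    """Base58Check-encode payload (version byte + data)."""
    # Checksum: first 4 bytes of double SHA-256
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def p2pkh_script_to_address(script_hex: str) -> str:
    """Convert a hex P2PKH output script to a mainnet address."""
    raw = bytes.fromhex(script_hex)
    if len(raw) != 25 or raw[:3] != b"\x76\xa9\x14" or raw[23:] != b"\x88\xac":
        raise ValueError("not a P2PKH script")
    return b58check(b"\x00" + raw[3:23])

# An all-zero pubkey hash gives the well-known "burn" address
print(p2pkh_script_to_address("76a914" + "00" * 20 + "88ac"))
# 1111111111111111111114oLvT2
```

Validating the script shape before slicing avoids silently turning a non-P2PKH script into a meaningless address.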
For details on how addresses are generated, please refer to the Bitcoin wiki. And here we have the top addresses sorted by incoming transactions:
1880739, 1NxaBCFQwejSZbQfWcYNwgqML5wWoE3rK4
1601154, 1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp
1194169, 1LuckyR1fFHEsXYyx5QK4UFzv3PEAepPMK
1105378, 1dice97ECuByXAvqXpaYzSaQuPVvrtmz6
595846, 1dice9wcMu5hLF4g81u8nioL5mmSHTApw
437631, 1dice7fUkz5h4z2wPc1wLMPWgB5mDwKDx
405960, 1MPxhNkSzeTNTHSZAibMaS8HS1esmUL1ne
395661, 1dice7W2AicHosf5EL3GFDUVga7TgtPFn
383849, 1LuckyY9fRzcJre7aou7ZhWVXktxjjBb9S
As you can see, SatoshiDICE and LuckyBit are very much present in the set. Grepping for the vanity addresses unearths a lot of addresses too.
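The vanity search can be sketched in a few lines of Python. The prefixes `1dice` and `1Lucky` are simply read off the list above, and the helper name `find_vanity` is hypothetical. Note that base58 is case-sensitive, so the match should be too.

```python
def find_vanity(addresses, prefixes=("1dice", "1Lucky")):
    """Return the addresses that start with one of the given prefixes."""
    return [a for a in addresses if a.startswith(tuple(prefixes))]

# Four of the addresses from the list above
top = [
    "1NxaBCFQwejSZbQfWcYNwgqML5wWoE3rK4",
    "1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp",
    "1LuckyR1fFHEsXYyx5QK4UFzv3PEAepPMK",
    "1MPxhNkSzeTNTHSZAibMaS8HS1esmUL1ne",
]

print(find_vanity(top))
```

A prefix match alone does not prove ownership, of course; anyone can generate a vanity address, so treat the hits as candidates to cross-check against the transaction counts.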
I would suggest using the usual chain analysis approach: send money to these services and note the addresses. Then take the transitive, symmetric, etc. closures of those addresses in the blockchain transaction graph to get all addresses in their wallet.
No technique can determine the addresses in a wallet if the user is intelligent enough to mix properly.
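As a rough illustration of the closure step, here is a union-find sketch. It assumes the relation being closed is "appears as an input in the same transaction" (the common-input-ownership heuristic), which is my reading of the suggestion rather than something spelled out above. The function names and the toy transactions are hypothetical.

```python
def cluster_addresses(transactions):
    """Group addresses that are spent together as inputs of a transaction.

    `transactions` is an iterable of lists of input addresses.
    Returns a dict mapping each address to its cluster representative.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for inputs in transactions:
        for addr in inputs:
            find(addr)  # register every address, even single-input ones
        for other in inputs[1:]:
            union(inputs[0], other)

    return {a: find(a) for a in list(parent)}

def wallet_of(seed, transactions):
    """Return all addresses in the same cluster as `seed`."""
    roots = cluster_addresses(transactions)
    if seed not in roots:
        return {seed}
    return {a for a, r in roots.items() if r == roots[seed]}

# Toy data: A and B are co-spent, as are B and C, so A, B, C form one cluster
txs = [["A", "B"], ["B", "C"], ["D", "E"], ["F"]]
print(sorted(wallet_of("A", txs)))
```

Starting from an address you observed by paying the service, `wallet_of` gives the set the heuristic links to it. Mixing breaks exactly this assumption, since inputs in a mixing transaction belong to different owners.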