- Wallet addresses are vertices AKA nodes. All vertices currently use the label
walletso the.hasLabel()predicates seen in many tutorials are irrelevant and do not need to be used. - Transactions are edges and contain properties like
value,block_number, etc. All edges use the labeltransactionso the same rule applies. Theoretically we could distinguishETHtxions fromERC20txions from gas fees from swaps etc. down the road but for now they're justtransaction. - Look in Graph for examples of queries that actually work.
This is known to work with these versions of docker and docker-compose.
$ docker-compose --version
Docker Compose version v2.10.2
$ docker --version
Docker version 20.10.17, build 100c701
To launch docker containers and load graph data:
# Clone repo and cd into the repo directory
git clone https://github.com/michelcrypt4d4mus/ethecycle.git
cd ethecycle
# Generate ssh key pair your containers can use to talk to each other and create some .env files:
scripts/docker/container_file_setup.sh
# Edit local copy of .env to set TXION_DATA_DIR as the location of some txion CSVs
# NOTE: After the first build you may want to set REBUILD_CHAIN_ADDRESS_DB to avoid rebuilding
# the chain addresses DB every time.
vi .env
# When you run this command docker-compose should build everything and leave you in a
# bash shell, at which point you can run 'bpython' to get a python REPL etc.
scripts/docker/python_etl/shell.sh
# Run script to get updated JVM settings for neo4j once you have allocated docker memory:
scripts/docker/neo4j/generate_.neo4j.env_file.sh # Add -h for helpOnce you are in the container shell the ./load_transactions.py script will prep CSVs for Neo4j's bulk loader and then load them (unless you specify --extract-only). This approach uses the neo4j-admin database import tooling (documentation) which is theoretically significantly faster than LOAD CSV at getting data from the disk and into Neo4j.
load_transactions.py takes a directory of CSVS or a single CSV, processes them to add some columns (e.g. token_symbol and blockchain), does decimal conversion where it can, etc., and writes 2 output CSVs for each input CSV (wallets, txns) output/ directory along with two one line CSVs (one each for the wallet and transaction headers).
How to run it:
# Show help:
./load_transactions.py --help
# First time you must run with --drop to overwrite the database called 'neo4j' (community edition limitation):
./load_transactions.py /path/to/transactions.csv --drop
# You can also run it against an entire directory of CSVs:
./load_transactions.py /path/to/transactions/ --drop
# Load only USDT txions:
./load_transactions.py /path/to/transactions.csv --token USDT --drop
# Perform the extraction and transformation but display load command on screen rather than actually execute it:
./load_transactions.py /path/to/transactions.csv --drop --extract-onlyExample output:
IMPORTANT! It usually takes a little while for Neo4j to finish processing the transactions you just loaded and make the database available for connections and queries!
Cannot guarantee these steps work but they probably will work.
- Create a virtual env in the project dir:
python -m venv .venv - Activate the venv:
. .venv/bin/activate
- Some queries can be found in the
queries/folder. - Queries can be run from python via the
Neo4jclass.
Some reasonable guesses as to useful ways to index transactions can be found here.
- Get a shell on the Python ETL container:
scripts/docker/python_etl/shell.sh - Get
bpythonREPL on the Python ETL container:scripts/docker/python_etl/bpython.sh - Get a shell on the test env Python ETL container:
scripts/docker/python_etl/test_shell.sh - Get a shell on the Neo4j container:
scripts/docker/neo4j/shell.sh - Generate
.env.neo4jfile (example) with Neo4j official recommendations:scripts/docker/neo4j/generate_.neo4j.env_file.sh -h - Build images with fresh build of chain address DB:
REBUILD_CHAIN_ADDRESS_DB=freshly_built_address_db docker-compose build(you can also change the value in.envbut this is quicker)
- Display the wallet tags:
show_chain_addresses - Display known tokens:
show_tokens - Connect to the chain address sqlite DB:
chain_address_db - Print a query that can be run on Dune to find new wallet tags:
dune_query - Reimport chain addresses database:
./import_chain_addresses.py -h(note that these won't persist on the image!) - Set the environment variable
DEBUG=truewhen running commands to see various debug ouutput
IMPORTANT: The community edition only allows you to have one database per server and it must be called neo4j.
After starting you can browse to http://localhost:7474/browser/ to run queries. Alternatively (and more 'performantly') Neo4j makes a desktop application.
- Addresses start with
0x(same as etherscan) - All addresses in the DB are lowercased, so make sure to use
toLower()on an address of mixed/upper case. - Occasionally Neo4j from docker messes up the permissions. If that happens it may help to get on the container and run
cd /var/lib/neo4j/data sudo chown -R neo4j:neo4j databases/ sudo chown -R neo4j:neo4j transactions/
- Official Cypher Introduction. Cypher is Neo4j's custom query language.
- Cypher query style guide
- Neo4j 5.1.0 on dockerhub
- Official Neo4j on Docker operations manual (There's also an overview of Neo4j on Docker here as well as a list of configuration options settable by env vars)
- Neo4j Desktop app
- Curated list of apps and plugins for Neo4j
- Neo4J operations manual
- Neo4j admin tools/config
- Article on supernodes and Neo4j
- Bulk load data into Neo4j
- CSV header format docs
- Neo4j ETL Tool Claims to be able to connect to an RDBMS and port data quickly.
- Neo4j LOAD CSV example that creates nodes from a single relatonships file.
- 5 Tricks for Batch Updates
- IIRC you said the txion amounts were already correctly adjusted for decimals? (AKA divided by
10^18for most tokens) - Current unique ID for edge is
transaction_id = f"{self.transaction_hash}-{self.log_index}". Does that make sense? - Do you have a rough estimate as far as blocks per hour and/or blocks per day?
- Identify the largest short term pass through wallets (AKA wallets with large xfers in and out in a short time frame that end up w/0 balances and are not used again)
- More address sources:
- ArangoDB - Second most commonly recommended after Neo4j.
- Apache AGE - Postgres extension. No Tinkerpop support, only OpenCypher.
- ArcadeDB - New fork of OrientDB. Gremlin and OpenCypher support.
- MemGraph - In memory graph DB.
- TigerGraph comes up sometimes
