Contributions by @instagibbs to devel docs

Thanks also (in alphabetical order) to @cbeams, @mikehearn, and
@tgeller, among others.

The last pre-squash commit was: c2b8d562aa107c7b68c60946cea14cdccc5159ad
instagibbs 2014-05-09 22:13:59 -04:00 committed by David Harding
parent 46780d3177
commit 82378ddcb4
4 changed files with 522 additions and 0 deletions


@ -0,0 +1,91 @@
## Operating Modes
{% autocrossref %}
Currently there are two primary methods of validating the block chain as a client: full nodes and SPV clients. Other methods, such as server-trusting methods, are not discussed because they are not recommended.
{% endautocrossref %}
### Full Node
{% autocrossref %}
The first and most secure model is the one followed by Bitcoin Core, also known as a “thick” or “full chain” client. This security model assures the validity of the block chain by downloading and validating blocks from the genesis block all the way to the most recently discovered block. This is known as using the *height* of a particular block to verify the client's view of the network.
For a client to be fooled, an adversary would need to give a complete alternative block chain history that is of greater difficulty than the current “true” chain, which is computationally infeasible in practice because the chain with the most cumulative difficulty is by definition the true chain. After the suggested six confirmations, the ability to fool the client becomes intractable, as only a single honest network node is needed to have the complete state of the block chain.
![Block Height Compared To Block Depth](/img/dev/en-block-height-vs-depth.svg)
{% endautocrossref %}
### Simplified Payment Verification (SPV)
{% autocrossref %}
An alternative approach detailed in the [original Bitcoin paper][bitcoinpdf] is a client that only downloads the headers of blocks during the initial syncing process and then requests transactions from full nodes as needed. This scales linearly with the height of the block chain at only 80 bytes per block header, or up to 4.2MB per year, regardless of total block size.
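{% endautocrossref %}

As a rough check of the figure above, the arithmetic can be sketched in Python. The ten-minute average block interval and the 365-day year are assumptions made for this estimate; they are not stated in the paragraph itself.

```python
# Back-of-the-envelope estimate of yearly block header growth.
HEADER_BYTES = 80            # size of one block header, from the text above
BLOCKS_PER_HOUR = 6          # assumption: one block every ten minutes on average
HOURS_PER_YEAR = 24 * 365    # assumption: a 365-day year

blocks_per_year = BLOCKS_PER_HOUR * HOURS_PER_YEAR
header_bytes_per_year = HEADER_BYTES * blocks_per_year

print(blocks_per_year)                    # 52560
print(header_bytes_per_year / 1_000_000)  # 4.2048 (megabytes)
```

{% autocrossref %}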
As described in the white paper, the Merkle root in the block header along with a Merkle branch can prove to the SPV client that the transaction in question is embedded in a block in the block chain. This does not guarantee validity of the transactions that are embedded. Instead it demonstrates the amount of work required to perform a double-spend attack.
The block's depth in the block chain corresponds to the cumulative difficulty of the work that has been performed to build on top of that particular block. The SPV client knows the Merkle root and associated transaction information, and requests the respective Merkle branch from a full node. Once the Merkle branch has been retrieved, proving the existence of the transaction in the block, the SPV client can then look to block *depth* as a proxy for transaction validity and security. The cost of an attack on a user by a malicious node who inserts an invalid transaction grows with the cumulative difficulty built on top of that block, since the malicious node alone will be mining this forged chain.
{% endautocrossref %}
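
The Merkle branch check described in this section can be sketched as follows. This is a minimal illustration rather than the wire format: the function names are hypothetical, the branch is passed as a plain list of sibling hashes instead of being parsed from a network message, and the leaves are stand-in hashes rather than real transactions. It assumes Bitcoin's double SHA-256 node hashing and its rule of duplicating the last hash on levels with an odd number of entries.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash used for Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_levels(leaves):
    """Return every level of the tree, from the leaves up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:                      # odd count: duplicate the last hash
            level = level + [level[-1]]
        levels.append([dsha256(level[i] + level[i + 1])
                       for i in range(0, len(level), 2)])
    return levels


def merkle_branch(leaves, index):
    """Collect the sibling hashes a full node would hand to an SPV client."""
    branch = []
    for level in merkle_levels(leaves)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        branch.append(level[index ^ 1])         # sibling sits next to our node
        index //= 2
    return branch


def verify_branch(leaf, index, branch, merkle_root):
    """Recompute the root from one leaf and its branch, as the client would."""
    node = leaf
    for sibling in branch:
        if index & 1:                           # our node is the right child
            node = dsha256(sibling + node)
        else:                                   # our node is the left child
            node = dsha256(node + sibling)
        index //= 2
    return node == merkle_root


# Five stand-in transaction hashes (not real transactions).
txids = [dsha256(bytes([n])) for n in range(5)]
root = merkle_levels(txids)[-1][0]
proof = merkle_branch(txids, 4)
print(verify_branch(txids[4], 4, proof, root))
```

The client only needs the header's Merkle root, the transaction hash, its position, and a branch whose length grows with the logarithm of the number of transactions in the block.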
#### Potential SPV Weaknesses
{% autocrossref %}
If implemented naively, an SPV client has a few important weaknesses.
First, while the SPV client cannot be easily fooled into thinking a transaction is in a block when it is not, the reverse is not true. A full node can simply lie by omission, leading an SPV client to believe a transaction has not occurred. This can be considered a form of Denial of Service. One mitigation strategy is to connect to a number of full nodes and send the requests to each node. However, this strategy can be bandwidth intensive, and it can be defeated by network partitioning or Sybil attacks, since identities are essentially free. Care must be taken to ensure the client is not cut off from honest nodes.
Second, the SPV client only requests transactions from full nodes corresponding to keys it owns. If the SPV client downloads all blocks and then discards unneeded ones, this can be extremely bandwidth intensive. If it simply asks full nodes for blocks with specific transactions, this allows full nodes a complete view of the public addresses that correspond to the user. This is a large privacy leak, and allows for tactics such as denial of service for clients, users, or addresses that are disfavored by those running full nodes, as well as trivial linking of funds. A client could simply spam many fake transaction requests, but this creates a large strain on the SPV client, and can end up defeating the purpose of thin clients altogether.
To mitigate the latter issue, Bloom filters have been implemented as a method of obfuscation and compression of block data requests.
{% endautocrossref %}
#### Bloom Filters
{% autocrossref %}
A Bloom filter is a space-efficient probabilistic data structure that is used to test membership of an element. The data structure achieves great data compression at the expense of a prescribed false positive rate.
A Bloom filter starts out as an array of n bits all set to 0. A set of k random hash functions is chosen, each of which outputs<!--noref--> a single integer in the range of 1 to n.
When adding an element to the Bloom filter, the element is hashed k times separately, and for each of the k outputs<!--noref-->, the corresponding Bloom filter bit at that index is set to 1.
<!-- Add picture here from wikipedia to explain the bits -->
Querying of the Bloom filter is done by using the same hash functions as before. If all k bits accessed in the Bloom filter are set to 1, this demonstrates with high probability that the element lies in the set. Clearly, the k indices could have been set to 1 by the addition of a combination of other elements in the domain, but the parameters allow the user to choose the acceptable false positive rate.
Removal of elements can only be done by scrapping the Bloom filter and re-creating it from scratch.
{% endautocrossref %}
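
A toy version of the data structure described above might look like the following sketch. The class and method names are hypothetical, and the k hash functions are simulated by salting SHA-256 with a counter purely for illustration; this is an assumption of the sketch and not the hash function that Bitcoin's Bloom filters use. Indexes here run from 0 to n-1 rather than 1 to n, following Python list conventions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter with n bits and k salted hash functions."""

    def __init__(self, n_bits: int, k_hashes: int):
        self.n_bits = n_bits
        self.k_hashes = k_hashes
        self.bits = [0] * n_bits            # all bits start at 0

    def _indexes(self, element: bytes):
        # Illustrative stand-in for k independent hash functions:
        # SHA-256 over a 4-byte counter followed by the element.
        for salt in range(self.k_hashes):
            digest = hashlib.sha256(salt.to_bytes(4, "big") + element).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, element: bytes) -> None:
        for index in self._indexes(element):
            self.bits[index] = 1

    def might_contain(self, element: bytes) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[index] for index in self._indexes(element))


bloom = BloomFilter(n_bits=1024, k_hashes=3)
bloom.add(b"element one")
print(bloom.might_contain(b"element one"))   # True: no false negatives
```

Because bits are only ever set, never cleared, the structure has no false negatives, which is also why removing an element requires rebuilding the filter.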
#### Application Of Bloom Filters
{% autocrossref %}
Rather than viewing the false positive rate as a liability, SPV clients use it as a tunable parameter that represents the desired trade-off between privacy and bandwidth. An SPV client creates its Bloom filter and sends it to a full node using the message `filterload`, which sets the filter for which transactions are desired. The command `filteradd` allows addition of desired data to the filter without needing to send a totally new Bloom filter, and `filterclear` allows the connection to revert to standard block discovery mechanisms. If the filter has been loaded, then full nodes will send a modified form of blocks, called a merkleblock. The merkleblock is simply the block header with the Merkle branch associated with the set Bloom filter.
An SPV client can add not only transactions as elements to the filter, but also public keys, data from input and output scripts, and more. This enables P2SH transaction finding.
If a user is more privacy-conscious, he can set the Bloom filter to include more false positives, at the expense of extra bandwidth used for transaction discovery. If a user is on a tight bandwidth budget, he can set the false-positive rate to low, knowing that this will allow full nodes a clear view of what transactions are associated with his client.
**Resources:** [BitcoinJ](http://bitcoinj.org), a Java implementation of Bitcoin that is based on the SPV security model and Bloom filters. Used in most Android wallets.
Bloom filters were standardized for use via [BIP37](https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki). Review the BIP for implementation details.
{% endautocrossref %}
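
To make the privacy and bandwidth trade-off concrete, the sketch below sizes a filter for a target false positive rate. The formulas and the two upper limits are written from my reading of BIP37 and should be checked against the BIP before being relied on; the function name is hypothetical.

```python
import math

# Limits as I understand them from BIP37; treat both as assumptions.
MAX_FILTER_BYTES = 36000
MAX_HASH_FUNCS = 50


def bloom_parameters(n_elements: int, false_positive_rate: float):
    """Return (filter size in bytes, number of hash functions)."""
    ln2 = math.log(2)
    size_bits = -n_elements * math.log(false_positive_rate) / (ln2 ** 2)
    size_bytes = max(1, min(int(size_bits / 8), MAX_FILTER_BYTES))
    n_hash_funcs = int(size_bytes * 8 / n_elements * ln2)
    n_hash_funcs = max(1, min(n_hash_funcs, MAX_HASH_FUNCS))
    return size_bytes, n_hash_funcs


# A bandwidth-conscious client (few false positives, less privacy) versus
# a privacy-conscious client (many false positives, more bandwidth).
print(bloom_parameters(100, 0.0001))
print(bloom_parameters(100, 0.1))
```

For the same number of elements, a lower false positive rate yields a larger filter that matches fewer unrelated transactions, so the full node learns more about which transactions the client actually cares about.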
### Future Proposals
{% autocrossref %}
There are future proposals such as Unused Output Tree (UOT) in the block chain to find a more satisfactory middle-ground for clients between needing a complete copy of the block chain, or trusting that a majority of your connected peers are not lying. UOT would enable a very secure client with a finite amount of storage by using a data structure that is authenticated in the block chain. These types of proposals are, however, in very early stages, and will require soft forks in the network.
Until these types of operating modes are implemented, modes should be chosen based on the likely threat model, computing and bandwidth constraints, and liability in bitcoin value.
**Resources:** [Original Thread on UOT](https://bitcointalk.org/index.php?topic=88208.0), [UOT Prefix Tree BIP Proposal](https://github.com/maaku/bips/blob/master/drafts/auth-trie.mediawiki)
{% endautocrossref %}


@ -0,0 +1,78 @@
## P2P Network
{% autocrossref %}
The Bitcoin [network][network]{:#term-network}{:.term} uses simple methods to perform peer discovery and communicate between nodes. The following section applies to both full nodes and SPV clients, with the exception that SPV's Bloom filters take the role of block discovery.
{% endautocrossref %}
### Peer Discovery
{% autocrossref %}
Bitcoin Core maintains a list of [peers][peer]{:#term-peer}{:.term} to connect to on startup. When a full node is started for the first time, it must be bootstrapped to the network. This is done automatically today in Bitcoin Core by a short list of trusted DNS seeds. The option `-dnsseed` can be set to define this behavior, though the default is `1`. DNS requests return a list of IP addresses that can be connected to. From there, the client can start connecting to the Bitcoin network.
Alternatively, bootstrapping can be done by using the option `-seednode=<ip>`, allowing the user to predefine what seed server to connect to, then disconnect after building a peer list. Another method is starting Bitcoin Core with `-connect=<ip>` which disallows the node from connecting to any peers except those specified. Lastly, the argument `-addnode=<ip>` simply allows the user to add a single node to his peer list.
After bootstrapping, nodes send out an `addr` message containing their own IP to peers. Each peer of that node then forwards this message to a couple of their own peers to expand the pool of possible connections.
To see which peers one is connected with (and associated data), use the `getpeerinfo` RPC.
{% endautocrossref %}
### Connecting To Peers
{% autocrossref %}
Connecting to a peer is done by sending a `version` message, which contains your version number, block, and current time to the remote node. Once the message is received by the remote node, it must respond with a `verack` message, which may be followed by its own `version` message if the node desires to peer.
Once connected, the client can send to the remote node `getaddr` and `addr` messages to gather additional peers.
In order to maintain a connection with a peer, nodes by default will send a message to peers before 30 minutes of inactivity. If 90 minutes pass without a message being received by a peer, the client will assume that connection has closed.
{% endautocrossref %}
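
Each of the messages named above travels inside a common envelope. The sketch below builds that envelope for a payload-free `verack` message. The header layout (4-byte network magic, 12-byte null-padded command, 4-byte little-endian payload length, 4-byte checksum taken from the double SHA-256 of the payload) and the mainnet magic bytes are written from my understanding of the Protocol Specification page on the Bitcoin Wiki; the function name is hypothetical.

```python
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")   # mainnet network identifier


def build_message(command: str, payload: bytes = b"") -> bytes:
    """Wrap a payload in the P2P message header."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command names are at most 12 bytes")
    # Checksum: first four bytes of SHA-256(SHA-256(payload)).
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    header = (MAINNET_MAGIC
              + name.ljust(12, b"\x00")          # null-padded command
              + struct.pack("<I", len(payload))  # little-endian length
              + checksum)
    return header + payload


print(build_message("verack").hex())
```

A `version` message would use the same envelope with a non-empty payload; serializing that payload is left out here because its fields are not described in this section.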
### Block Broadcasting
{% autocrossref %}
At the start of a connection with a peer, both nodes send `getblocks` messages containing the hash of the latest known block. If a peer believes they have newer blocks or a longer chain, that peer will send an `inv` message which includes a list of up to 500 hashes of newer blocks, stating that it has the longer chain. The receiving node would then request these blocks using the command `getdata`, and the remote peer would reply via `block`<!--noref--> messages. After all 500 blocks have been processed, the node can request another set with `getblocks`, until the node is caught up with the network. Blocks are only accepted when validated by the receiving node.
New blocks are also discovered as miners publish their found blocks, and these messages are propagated in a similar manner. Through previously established connections, an `inv` message is sent with the hash of the new block, and the receiving node requests the block via the `getdata` message.
{% endautocrossref %}
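
The catch-up loop described in this section can be modelled with a toy simulation. Nothing here touches the network: chains are plain lists of placeholder strings, the function names are hypothetical, and extending the local list stands in for the `getdata` request, the `block` replies, and validation.

```python
MAX_INV_HASHES = 500   # batch size taken from the text above


def inv_after(peer_chain, latest_known_hash):
    """What a peer would announce: up to 500 hashes after our latest block."""
    start = peer_chain.index(latest_known_hash) + 1
    return peer_chain[start:start + MAX_INV_HASHES]


def sync(local_chain, peer_chain):
    """Repeat getblocks -> inv -> getdata until nothing new is announced."""
    rounds = 0
    while True:
        inventory = inv_after(peer_chain, local_chain[-1])
        if not inventory:
            break                          # caught up with this peer
        local_chain.extend(inventory)      # stand-in for getdata + validation
        rounds += 1
    return rounds


peer = ["block-%d" % height for height in range(1201)]   # genesis + 1200 blocks
local = peer[:1]                                          # we only have genesis
print(sync(local, peer))   # 3 rounds: 500 + 500 + 200 blocks
```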
### Transaction Broadcasting
{% autocrossref %}
In order to send a transaction to a peer, an `inv` message is sent. If a `getdata` response message is received, the transaction is sent using `tx`. The peer receiving this transaction also forwards the transaction in the same manner, given that it is a valid transaction. If the transaction is not put into a block for an extended period of time, it will be dropped from mempool, and the client of origin will have to re-broadcast the message.
{% endautocrossref %}
### Misbehaving Nodes
{% autocrossref %}
Take note that for both types of broadcasting, mechanisms are in place to punish misbehaving peers who take up bandwidth and computing resources by sending false information. If a peer gets a banscore above the `-banscore=<n>` threshold, he will be banned for the number of seconds defined by `-bantime=<n>`, which is 86,400 by default (24 hours).
{% endautocrossref %}
### Alerts
{% autocrossref %}
In case of a bug or attack,
the Bitcoin Core developers provide a
[Bitcoin alert service](https://bitcoin.org/en/alerts) with an RSS feed
and users of Bitcoin Core can check the error field of the `getinfo` RPC
results to get currently active alerts for their specific version of
Bitcoin Core.
These messages are aggressively broadcast using the `alert` message, being sent to each peer upon connect for the duration of the alert.
These messages are signed by a specific ECDSA private key that only a small number of developers control.
**Resource:** More details about the structure of messages and a complete list of message types can be found at the [Protocol Specification](https://en.bitcoin.it/wiki/Protocol_specification) page of the Bitcoin Wiki.
{% endautocrossref %}

_includes/guide_wallets.md

@ -0,0 +1,328 @@
## Wallets
{% autocrossref %}
Bitcoin wallets at their core are a collection of private keys. These collections are stored digitally in a file, or can even be physically stored on pieces of paper.
{% endautocrossref %}
### Private Key Formats
{% autocrossref %}
Private keys are what are used to unlock satoshis from a particular address. In Bitcoin, a private key in standard format is simply a 256-bit number, between the values:
0x1 and 0xFFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFE BAAE DCE6 AF48 A03B BFD2 5E8C D036 4140, representing nearly the entire range of 2<sup>256</sup>-1 values. The range is governed by the secp256k1 elliptic curve parameters that Bitcoin uses for ECDSA signatures.
{% endautocrossref %}
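
A range check for candidate private keys can be sketched as below. It assumes the usual secp256k1 convention that a valid key is at least 1 and strictly less than the curve's group order; the function names are hypothetical.

```python
import secrets

# Group order of the secp256k1 curve.
SECP256K1_ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141


def is_valid_private_key(key: int) -> bool:
    """A key must be non-zero and below the group order."""
    return 1 <= key < SECP256K1_ORDER


def generate_private_key() -> int:
    """Draw 256 random bits and retry in the rare case they fall out of range."""
    while True:
        candidate = secrets.randbits(256)
        if is_valid_private_key(candidate):
            return candidate


print(is_valid_private_key(generate_private_key()))   # True
```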
#### Wallet Import Format (WIF)
{% autocrossref %}
In order to make copying of private keys less prone to error, [Wallet Import Format][]{:#term-wallet-import-format}{:.term} may be utilized. WIF uses Base58Check encoding on a private key, greatly decreasing the chance of copying error, much like standard Bitcoin addresses.
1. Take a private key.
2. Add a 0x80 byte in front of it for mainnet addresses or 0xef for testnet addresses.
3. Perform a SHA-256 hash on the extended key.<!--noref-->
4. Perform a SHA-256 hash on the result of the first SHA-256 hash.
5. Take the first four bytes of the second SHA-256 hash; this is the checksum.
6. Add the four checksum bytes from point 5 at the end of the extended key<!--noref--> from point 2.
7. Convert the result from a byte string into a Base58 string using Base58Check encoding.
The process is easily reversible, using the Base58 decoding function, and removing the padding.
{% endautocrossref %}
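
The seven steps above, and their reversal, can be sketched with only the Python standard library. The function names are hypothetical and the Base58 routines are a minimal hand-rolled implementation for illustration; the sketch handles 32-byte uncompressed-style keys only.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded       # each zero byte becomes '1'


def base58_decode(text: str) -> bytes:
    number = 0
    for character in text:
        number = number * 58 + B58_ALPHABET.index(character)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    leading_ones = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_ones + body


def checksum(data: bytes) -> bytes:
    # Steps 3-5: first four bytes of SHA-256(SHA-256(data)).
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def private_key_to_wif(private_key: bytes, testnet: bool = False) -> str:
    extended = (b"\xef" if testnet else b"\x80") + private_key   # step 2
    return base58_encode(extended + checksum(extended))          # steps 6-7


def wif_to_private_key(wif: str) -> bytes:
    raw = base58_decode(wif)
    extended, check = raw[:-4], raw[-4:]
    if checksum(extended) != check:
        raise ValueError("checksum mismatch: the key was probably mistyped")
    return extended[1:]                        # drop the version byte


key = bytes.fromhex(
    "0c28fca386c7a227600b2fe50b7cae11ec86d3bf1fbe471be89827e19d72aa1d")
wif = private_key_to_wif(key)
print(wif)
print(wif_to_private_key(wif) == key)   # True
```

The checksum is what catches copying errors: changing any character of the encoded string almost always makes `wif_to_private_key` raise instead of silently returning a different key.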
#### Mini Private Key Format
{% autocrossref %}
Mini private key format is a method for encoding a private key in 30 characters or fewer, enabling keys to be embedded in a small physical space, such as physical bitcoin tokens, and more damage-resistant QR codes.
1. The first character of mini keys is 'S'.
2. In order to determine if a mini private key is well-formatted, a question mark is added to the private key.
3. The SHA256 hash is calculated. If the first byte produced is `00`, it is well-formatted. This key restriction acts as a typo-checking mechanism. A user brute forces the process using random numbers until a well-formatted mini private key is produced.
4. In order to derive the full private key, the user simply takes a single SHA256 hash of the original mini private key. This process is one-way: it is intractable to compute the mini private key format from the derived key.
Many implementations disallow the character '1' in the mini private key due to its visual similarity to 'l'.
**Resource:** A common tool to create and redeem these keys is the [Casascius Bitcoin Address Utility][casascius
address utility].
{% endautocrossref %}
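
The generate-and-check procedure above can be sketched as follows. The 30-character length, the use of the Base58 alphabet without '1', and the function names are assumptions of this sketch rather than details given in the text.

```python
import hashlib
import secrets

# Assumption: Base58 characters with '1' removed, as many implementations do.
MINI_ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def is_well_formed(mini_key: str) -> bool:
    """Steps 1-3: starts with 'S' and SHA256(key + '?') begins with byte 00."""
    if not mini_key.startswith("S"):
        return False
    digest = hashlib.sha256((mini_key + "?").encode("ascii")).digest()
    return digest[0] == 0


def generate_mini_key(length: int = 30) -> str:
    """Brute force random candidates; about 1 in 256 passes the check."""
    while True:
        candidate = "S" + "".join(
            secrets.choice(MINI_ALPHABET) for _ in range(length - 1))
        if is_well_formed(candidate):
            return candidate


def mini_key_to_private_key(mini_key: str) -> bytes:
    """Step 4: the full private key is a single SHA256 of the mini key."""
    if not is_well_formed(mini_key):
        raise ValueError("not a well-formed mini private key")
    return hashlib.sha256(mini_key.encode("ascii")).digest()


mini = generate_mini_key()
print(mini, mini_key_to_private_key(mini).hex())
```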
### Hierarchical Deterministic Key Creation
<!--
For consistent word ordering:
[normal|hardened|] [master|parent|child|grandchild] [extended|non-extended|] [private|public|chain] [key|code]
-->
{% autocrossref %}
The hierarchical deterministic key creation and transfer protocol ([HD
protocol][]{:#term-hd-protocol}{:.term}) greatly simplifies wallet
backups, eliminates the need for repeated communication between multiple
programs using the same wallet, permits creation of child accounts which
can operate independently, gives each parent account the ability to
monitor or control its children even if the child account is
compromised, and divides each account into full-access and
restricted-access parts so untrusted users or programs can be allowed to
receive or monitor payments without being able to spend them.
The HD protocol takes advantage of the ECDSA public key creation
function, [`point()`][point function]{:#term-point-function}{:.term},
which takes a large integer (the private key) and turns it into a graph
point (the public key):
{% endautocrossref %}
point(private_key) == public_key
{% autocrossref %}
Because of the way `point()` functions, it's possible to create a [child
public key][]{:#term-child-public-key}{:.term} by combining an
existing [(parent) public key][parent public
key]{:#term-parent-public-key}{:.term} with another public key created from any
integer (*i*) value. This child public key is the same public key which
would be created by the `point()` function if you added the *i* value to
the original (parent) private key and then found the remainder of that
sum divided by a global constant used by all Bitcoin software (*G*):
{% endautocrossref %}
point( (parent_private_key + i) % G ) == parent_public_key + point(i)
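
The identity above can be checked numerically with a small pure-Python model of the curve. This is an unoptimized sketch for illustration only and must not be used to handle real keys. The curve constants are the published secp256k1 parameters; the modulus that the text calls *G* is taken here to be the curve's group order and is named `N` in the code, and `ec_add` is a hypothetical helper implementing the `+` on public keys.

```python
# secp256k1 parameters: field prime, group order, and generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
GENERATOR = (
    0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
    0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8,
)


def ec_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                   # a + (-a)
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P   # tangent line
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord line
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def point(private_key: int):
    """Double-and-add scalar multiplication of the generator point."""
    result, addend = None, GENERATOR
    while private_key:
        if private_key & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        private_key >>= 1
    return result


parent_private_key = 0x1234567890ABCDEF
i = 42
from_private_side = point((parent_private_key + i) % N)
from_public_side = ec_add(point(parent_private_key), point(i))
print(from_private_side == from_public_side)   # True
```

The second computation never touches `parent_private_key` directly once `point(parent_private_key)` is known, which is what lets a public-key-only program derive child public keys.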
{% autocrossref %}
This means that two or more independent programs which agree on a
sequence of integers can create a series of unique [child key][]{:#term-child-key}{:.term} pairs from
a single parent key pair without any further communication.
Moreover, the program which distributes new public keys for receiving
payment can do so without any access to the private keys, allowing the
public key distribution program to run on a possibly-insecure platform such as
a public web server.
Child public keys can also create their own child public keys
(grandchild public keys) by repeating the child key derivation
operations:
{% endautocrossref %}
point( (child_private_key + i) % G ) == child_public_key + point(i)
{% autocrossref %}
Whether creating child public keys or further-descended public keys, a
predictable sequence of integer values would be no better than using a
single public key for all transactions, as anyone who knew one child
public key could find all of the other child public keys created from
the same parent public key. Instead, a random seed can be used to
deterministically generate the sequence of integer values so that the
relationship between the child public keys is invisible to anyone
without that seed.
The HD protocol uses a single root seed to create a hierarchy of
child, grandchild, and other descended keys with unlinkable
deterministically-generated integer values. Each child key also gets
a deterministically-generated seed from its parent, called a [chain
code][]{:#term-chain-code}{:.term}, so the compromising of one chain
code doesn't necessarily compromise the integer sequence for the whole
hierarchy, allowing the [master chain
code][]{:#term-master-chain-code}{:.term} to continue being useful
even if, for example, a web-based public key distribution program
gets hacked.
![Overview Of Hierarchical Deterministic Key Derivation](/img/dev/en-hd-overview.svg)
As illustrated above, HD key derivation takes four inputs<!--noref-->:
* The *[parent private key][]{:#term-parent-private-key}{:.term}* and
*parent public key* are regular uncompressed 256-bit ECDSA keys.
* The [parent chain code][]{:#term-parent-chain-code}{:.term} is 256
bits of seemingly-random data.
* The [index][key index]{:#term-key-index}{:.term} number is a 32-bit integer specified by the program.
In the normal form shown in the above illustration, the parent chain
code and the index number are fed into a one-way cryptographic hash
([HMAC-SHA512][]) to produce 512 bits of
deterministically-generated-but-seemingly-random data. The
seemingly-random 256 bits on the righthand side of the hash output are
used as a new child chain code. The seemingly-random 256 bits on the
lefthand side of the hash output are used as the integer value to be combined
with either the parent private key or parent public key to,
respectively, create either a child private key or child public key:
{% endautocrossref %}
point( (parent_private_key + lefthand_hash_output) % G ) == child_public_key
point(child_private_key) == parent_public_key + point(lefthand_hash_output)
{% autocrossref %}
Specifying different index numbers will create different unlinkable
child keys from the same parent keys. Repeating the procedure for the
child keys using the child chain code will create unlinkable grandchild keys.
Because creating child keys requires both a key and a chain code, the
key and chain code together are called the [extended
key][]{:#term-extended-key}{:.term}. An [extended private
key][]{:#term-extended-private-key}{:.term} and its corresponding
[extended public key][]{:#term-extended-public-key}{:.term} have the
same chain code. The (top-level parent) [master private
key][]{:#term-master-private-key}{:.term} and master chain
code are derived from random data,
as illustrated below.
![Creating A Root Extended Key Pair](/img/dev/en-hd-root-keys.svg)
A [root seed][]{:#term-root-seed}{:.term} is created from either 128
bits, 256 bits, or 512 bits of random data. This root seed of as little
as 128 bits is the only data the user needs to back up in order to
derive every key created by a particular wallet program using
particular settings.
(**Warning:** as of this writing, HD wallet programs are not expected to
be fully compatible, so users must only use the same HD wallet program
with the same HD-related settings for a particular root seed.)
The root seed is hashed to create 512 bits of seemingly-random data,
from which the master private key and master chain code are created
(together, the master extended private key). The master public key is
derived from the master private key using `point()`, which, together
with the master chain code, is the master extended public
key. The master extended keys are functionally equivalent to other
extended keys; it is only their location at the top of the hierarchy
which makes them special.
{% endautocrossref %}
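
Splitting the hashed root seed into a master private key and master chain code can be sketched as below. The use of HMAC-SHA512 keyed with the string `Bitcoin seed` follows my reading of BIP32 and is not stated in the text above; the function name is hypothetical, and deriving the master public key is omitted because it needs the `point()` function.

```python
import hashlib
import hmac
import secrets

# Group order of secp256k1, used to reject out-of-range keys.
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141


def master_extended_private_key(root_seed: bytes):
    """Return (master private key, master chain code) for a root seed."""
    if len(root_seed) not in (16, 32, 64):          # 128, 256, or 512 bits
        raise ValueError("root seed must be 128, 256, or 512 bits")
    digest = hmac.new(b"Bitcoin seed", root_seed, hashlib.sha512).digest()
    master_private_key = int.from_bytes(digest[:32], "big")   # lefthand 256 bits
    master_chain_code = digest[32:]                           # righthand 256 bits
    if not 1 <= master_private_key < N:
        raise ValueError("unusable seed; generate a new one")
    return master_private_key, master_chain_code


seed = secrets.token_bytes(16)                      # a fresh 128-bit root seed
private_key, chain_code = master_extended_private_key(seed)
print(hex(private_key), chain_code.hex())
```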
#### Hardened Keys
{% autocrossref %}
Deriving [child extended keys][child extended key]{:#term-child-extended-key}{:.term} from parent extended keys is more nuanced
than described earlier due to the presence of two extended private key
derivation formulas. The normal formula, described above, combines
together only the index number and the parent chain code to create the
child chain code and the integer value which is combined with the parent
private key to create the child private key.
![Creating Child Public Keys From An Extended Private Key](/img/dev/en-hd-private-parent-to-private-child.svg)
The hardened formula, illustrated above, combines together the index
number, the parent chain code, and also the parent private key to create
the data used to generate the child chain code and child private key.
This formula makes it impossible to create child public keys without
knowing the parent private key. In other words, parent extended public
keys can't create hardened child public keys.
Because of that, a [hardened extended private
key][]{:#term-hardened-extended-private-key}{:.term} is much less
useful than a normal extended private key---however, it's more secure
against multi-level key compromise. If an attacker gets a normal parent
chain code, he can brute-force find all 2<sup>31</sup> normal chain
codes deriving from it. If the attacker also obtains a child, grandchild, or
further-descended private key, he can use the chain code to generate all
of the extended private keys descending from that private key.
![Cross-Generational Key Compromise](/img/dev/en-hd-cross-generational-key-compromise.svg)
For this reason, the chain code part of an extended public key should be
better secured than standard public keys and users should be advised
against exporting even non-extended private keys to
possibly-untrustworthy environments.
Hardened extended private keys create a firewall through which
multi-level key derivation compromises cannot happen. Because hardened
child extended public keys cannot generate grandchild chain codes on
their own, the compromise of a parent extended public key cannot be
combined with the compromise of a grandchild private key to create
great-grandchild extended private keys.
The HD protocol uses different index numbers to indicate
whether a normal or hardened key should be generated. Index numbers from
0x00 to 0x7fffffff (0 to 2<sup>31</sup>-1) will generate a normal key; index
numbers from 0x80000000 to 0xffffffff will generate a hardened key. To
make descriptions easy, many developers use the [prime symbol][] to indicate
hardened keys, so the first normal key (0x00) is 0 and the first hardened
key (0x80000000) is 0´.
(Bitcoin developers typically use the ASCII apostrophe rather than
the Unicode prime symbol, a convention we will henceforth follow.)
This compact description is further combined with slashes prefixed by
*m* or *M* to indicate hierarchy and key type, with *m* being a private
key and *M* being a public key. For example, m/0'/0/122' refers to the
123rd hardened private child (by index number) of the first normal child
(by index) of the first hardened child (by index) of the master private
key. The following hierarchy illustrates prime notation and hardened key
firewalls.
![Example HD Wallet Tree Using Prime Notation](/img/dev/en-hd-tree.svg)
The HD protocol also describes a serialization format for extended
public keys and extended private keys. For details, please see the
[wallet section in the developer reference][devref wallets] or BIP32
for the full HD protocol specification.
{% endautocrossref %}
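
The path notation and the hardened formula can be sketched together. The path parser maps the apostrophe notation to 32-bit index numbers, with hardened indexes starting at 0x80000000 as specified in BIP32. The derivation function follows my reading of BIP32's hardened private child derivation (HMAC-SHA512 keyed by the parent chain code over a zero byte, the parent private key, and the index). Normal derivation is left out because it needs elliptic curve point serialization. All function names are hypothetical.

```python
import hashlib
import hmac

HARDENED_OFFSET = 0x80000000   # first hardened index number
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141


def parse_path(path: str):
    """Turn a path such as m/0'/0/122' into a list of index numbers."""
    parts = path.split("/")
    if parts[0] not in ("m", "M"):
        raise ValueError("path must start with m or M")
    indexes = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        number = int(part.rstrip("'"))
        if not 0 <= number < HARDENED_OFFSET:
            raise ValueError("index out of range")
        indexes.append(number + HARDENED_OFFSET if hardened else number)
    return indexes


def derive_hardened_child(parent_private_key: int,
                          parent_chain_code: bytes, index: int):
    """Return (child private key, child chain code) for a hardened index."""
    if index < HARDENED_OFFSET:
        raise ValueError("this sketch only handles hardened index numbers")
    data = (b"\x00" + parent_private_key.to_bytes(32, "big")
            + index.to_bytes(4, "big"))
    digest = hmac.new(parent_chain_code, data, hashlib.sha512).digest()
    lefthand = int.from_bytes(digest[:32], "big")
    child_private_key = (parent_private_key + lefthand) % N
    if lefthand >= N or child_private_key == 0:
        raise ValueError("invalid child; try the next index")
    return child_private_key, digest[32:]


print([hex(index) for index in parse_path("m/0'/0/122'")])
# ['0x80000000', '0x0', '0x8000007a']
```

Because the parent private key itself is fed into the hash, nothing derived from the parent extended public key alone can reproduce `lefthand`, which is the firewall property described in this section.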
#### Storing Root Seeds
{% autocrossref %}
Root seeds in the HD protocol are 128, 256, or 512 bits of random data
which must be backed up precisely. To make it more convenient to use
non-digital backup methods, such as memorization or hand-copying, BIP39
defines a method for creating a 512-bit root seed from a pseudo-sentence
(mnemonic) of common natural-language words which was itself created
from 128 to 256 bits of entropy and optionally protected by a password.
The number of words generated correlates to the amount of entropy used:
| Entropy Bits | Words |
|--------------|--------|
| 128 | 12 |
| 160 | 15 |
| 192 | 18 |
| 224 | 21 |
| 256 | 24 |
The passphrase can be of any length. It is simply appended to the mnemonic
pseudo-sentence, and then both the mnemonic and password are hashed
2,048 times using HMAC-SHA512, resulting in a seemingly-random 512-bit seed. Because any
input<!--noref--> to the hash function creates a seemingly-random 512-bit seed,
there is no fundamental way to prove the user entered the correct
password, possibly allowing the user to protect a seed even when under
duress.
For implementation details, please see BIP39.
{% endautocrossref %}
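
The word-count table and the seed stretching step can be sketched with the standard library. The checksum rule (one checksum bit per 32 bits of entropy, 11 bits per word) and the use of PBKDF2 with HMAC-SHA512, 2,048 rounds, and a salt made of the string `mnemonic` followed by the passphrase reflect my reading of BIP39; consult the BIP for the authoritative procedure. The function names are hypothetical, and the sample mnemonic is only a placeholder string that this sketch does not validate against the word list.

```python
import hashlib
import unicodedata


def word_count(entropy_bits: int) -> int:
    """Number of mnemonic words for a given amount of entropy."""
    if entropy_bits not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128 to 256 bits in steps of 32")
    checksum_bits = entropy_bits // 32
    return (entropy_bits + checksum_bits) // 11    # each word encodes 11 bits


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Stretch a mnemonic and optional passphrase into a 512-bit root seed."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)


for bits in (128, 160, 192, 224, 256):
    print(bits, word_count(bits))

words = "placeholder words standing in for a real mnemonic"
print(mnemonic_to_seed(words).hex() != mnemonic_to_seed(words, "duress").hex())
```

Every passphrase yields some valid-looking 512-bit seed, so a wrong passphrase produces a different wallet rather than an error.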
### Loose-Key Wallets
{% autocrossref %}
Loose-Key wallets, also called "Just a Bunch Of Keys (JBOK)", are a deprecated form of wallet that originated from the Bitcoin Core client wallet. The Bitcoin Core client wallet would create 100 private key/public key pairs automatically via a Pseudo-Random-Number Generator (PRNG) for later use. Once all these keys are consumed or the RPC call `keypoolrefill` is run, another 100 key pairs would be created. This created considerable difficulty<!--noref--> in backing up one's keys, considering backups have to be run manually to save the newly-generated private keys. If a new key pair set is generated, used, and then lost prior to a backup, the stored satoshis are likely lost forever. Many older-style mobile wallets followed a similar format, but only generated a new private key upon user demand.
This wallet type is being actively phased out and discouraged from being used due to the backup hassle.
{% endautocrossref %}

_includes/ref_wallets.md

@ -0,0 +1,25 @@
## Wallets
### Deterministic Wallet Formats
#### Type 1: Single Chain Wallets
{% autocrossref %}
Type 1 deterministic wallets are the simpler of the two types. They
create a single series of keys from a single seed. A primary weakness is
that if the seed is leaked, all funds are compromised, and wallet
sharing is extremely limited.
{% endautocrossref %}
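
One commonly described way to build such a single series is to hash the seed together with a counter. The sketch below illustrates that idea only; the exact construction (SHA-256 over the seed string followed by the counter written in ASCII, counting from 1) is an assumption of this example and not a format defined in this document.

```python
import hashlib


def type1_private_key(seed: str, n: int) -> bytes:
    """Hypothetical Type 1 derivation: SHA256(seed + n) for n = 1, 2, 3, ..."""
    return hashlib.sha256((seed + str(n)).encode("utf-8")).digest()


seed = "correct horse battery staple"    # placeholder seed, never reuse
series = [type1_private_key(seed, n) for n in range(1, 4)]
for key in series:
    print(key.hex())
```

Anyone who learns the seed can regenerate the entire series, which is the weakness noted above.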
#### Type 2: Hierarchical Deterministic (HD) Wallets
{% autocrossref %}
![Overview Of Hierarchical Deterministic Key Derivation](/img/dev/en-hd-overview.svg)
For an overview of HD wallets, please see the [developer guide
section][devguide wallets]. For details, please see BIP32.
{% endautocrossref %}