Here we start our series of publications on databases and filesystems that are either decentralized themselves or work well in combination with blockchain/DLT. Our first publication is an introduction to IPFS.
IPFS stands for InterPlanetary File System. It is a peer-to-peer distributed file system in which the participating peers provide the storage. Its fundamental data structure is the Merkle DAG, a structure similar to the one used by distributed version control systems such as Git. Peer nodes do not need to trust each other in order to store and access data on the IPFS network, because IPFS uses content-addressed storage: every piece of data is identified by the cryptographic hash of its content, so anyone who retrieves it can check it against its address.
IPFS reuses concepts from BitTorrent, distributed hash tables (DHTs), Git, and Self-Certified Filesystems (SFS). All peer nodes are equal, so there is no concept of a master node or governing node. HTTP follows a client-server architecture and locates files by server address, whereas IPFS is decentralized and refers to files by their hashes. One advantage this can bring over HTTP is higher throughput for large amounts of data, since content can be fetched from several peers at once. By default, an IPFS node exposes its API on port 5001; peer-to-peer (swarm) traffic uses port 4001 and the local HTTP gateway listens on port 8080.
According to the IPFS documentation:
“IPFS is a distributed system for storing and accessing files, websites, applications, and data.”
This means you can store any type of data or file in IPFS. After you add data, IPFS returns a hash (in the CID version 0 format it starts with “Qm”), and anyone can then retrieve the data using that hash.
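To make content addressing and the Merkle DAG idea more concrete, here is a minimal Python sketch. It is a toy model and not the real IPFS block format: the tiny chunk size, the helper names (`address`, `put_file`, `get_file`), and the way the root node is serialized are all assumptions made for illustration.

```python
import hashlib

def address(data: bytes) -> str:
    """Content address: the SHA-256 hex digest of the bytes themselves."""
    return hashlib.sha256(data).hexdigest()

def put_file(store: dict, data: bytes, chunk_size: int = 4) -> str:
    """Split data into chunks, store each under its hash, return the root address.

    The root node is simply the newline-joined list of chunk addresses, so the
    root address depends on every chunk (a two-level Merkle DAG).
    """
    links = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = address(chunk)
        store[key] = chunk  # identical chunks land on the same key (deduplication)
        links.append(key)
    root = "\n".join(links).encode()
    root_key = address(root)
    store[root_key] = root
    return root_key

def get_file(store: dict, root_key: str) -> bytes:
    """Follow the links in the root node and reassemble the file."""
    links = store[root_key].decode().split()
    return b"".join(store[key] for key in links)

store = {}
root_a = put_file(store, b"aaaabbbbaaaa")
root_b = put_file(store, b"aaaabbbbaaab")  # differs in the last byte only

print(get_file(store, root_a))
print(root_a != root_b)  # changing one chunk changes the root address
```

Because the two files share the chunks `aaaa` and `bbbb`, those chunks are stored only once, while the single changed chunk is enough to produce a different root address.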
Content Identifier
In IPFS, this hash is technically known as a Content Identifier (CID). A CID is a label that points to content in IPFS. It does not indicate where the content is stored; it is derived from the cryptographic hash of the content itself. The same content yields a different CID depending on the CID version and encoding used.
Version 0: A base58-encoded multihash. It has 46 characters and starts with “Qm”.
Version 1: A self-describing identifier made up of several parts:
- A multibase prefix, specifying the encoding used for the remainder of the CID.
- A CID version identifier, indicating which version of CID this is.
- A multicodec identifier, indicating the format of the target content.
- The multihash of the content.
A version 0 hash is therefore a 46-character multihash that starts with “Qm”. These two characters result from the first two bytes of the multihash, which identify the hash algorithm (SHA-256) and the digest length (32 bytes).
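The “Qm” prefix can be reproduced with a short sketch. The code below builds a SHA-256 multihash (the bytes 0x12 and 0x20 followed by the digest) and encodes it in base58, which is the shape of a version 0 CID. The function names are illustrative, and one assumption matters a lot: a real IPFS node hashes an encoded block (file data wrapped in its own node format), not the raw bytes of a file, so the output here will generally not match what `ipfs add` prints for the same file.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded

def sha256_multihash(block: bytes) -> bytes:
    """Prefix the digest with the hash function code (0x12) and its length (0x20)."""
    digest = hashlib.sha256(block).digest()
    return bytes([0x12, len(digest)]) + digest

def cidv0_style(block: bytes) -> str:
    """Base58 form of the multihash, shaped like a version 0 CID."""
    return base58btc_encode(sha256_multihash(block))

cid = cidv0_style(b"hello world")
print(cid[:2], len(cid))
```

Since every SHA-256 multihash starts with the same two bytes and has the same total length of 34 bytes, the base58 text always begins with “Qm” and is always 46 characters long.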
Hashes
A hash function takes an arbitrary input and returns a fixed-length value. A hash can be represented in different bases (base2, base16, base32, etc.). IPFS uses hashes as part of its content identifiers and supports multiple base representations at the same time through the Multibase protocol.
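As a rough illustration of how one digest can appear in several bases, the sketch below encodes the same SHA-256 digest in base2, base16, and base32, each with its one-character Multibase prefix (`0`, `f`, and `b` respectively). The helper names `to_multibase` and `from_multibase` are made up for this example, and only these three bases are handled.

```python
import base64
import hashlib

def to_multibase(data: bytes, base: str) -> str:
    """Encode bytes in the given base and prepend its Multibase prefix."""
    if base == "base2":
        return "0" + "".join(f"{byte:08b}" for byte in data)
    if base == "base16":
        return "f" + data.hex()
    if base == "base32":
        # Multibase 'b' is lowercase base32 without padding.
        return "b" + base64.b32encode(data).decode().lower().rstrip("=")
    raise ValueError(f"unsupported base: {base}")

def from_multibase(text: str) -> bytes:
    """Read the prefix to find out which base the rest of the text uses."""
    prefix, body = text[0], text[1:]
    if prefix == "0":
        return bytes(int(body[i:i + 8], 2) for i in range(0, len(body), 8))
    if prefix == "f":
        return bytes.fromhex(body)
    if prefix == "b":
        padding = "=" * (-len(body) % 8)
        return base64.b32decode(body.upper() + padding)
    raise ValueError(f"unsupported prefix: {prefix}")

digest = hashlib.sha256(b"hello ipfs").digest()
for base in ("base2", "base16", "base32"):
    text = to_multibase(digest, base)
    print(base, len(text), from_multibase(text) == digest)
```

The prefix is what lets a reader decode the text without being told the base in advance, which is the role the multibase prefix plays in a version 1 CID.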
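Characteristics of cryptographic hashes:
- Deterministic: the same input always produces exactly the same hash.
- Uncorrelated: a small change in the input produces a completely different hash.
- Unique: it is infeasible to find two different inputs that produce the same hash.
- One-way: it is infeasible to work out the input from its hash.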
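Some of these characteristics can be observed directly with SHA-256 from the Python standard library. This is only a small demonstration, not a proof: uniqueness and one-wayness are statements about computational infeasibility and cannot be shown by running a few examples. The helper names are illustrative.

```python
import hashlib

def sha256_hex(message: bytes) -> str:
    return hashlib.sha256(message).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    digest_a = hashlib.sha256(a).digest()
    digest_b = hashlib.sha256(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

# Deterministic: hashing the same input twice gives the same result.
print(sha256_hex(b"ipfs") == sha256_hex(b"ipfs"))

# Fixed length: input size does not affect the length of the output.
print(len(sha256_hex(b"")), len(sha256_hex(b"a" * 10_000)))

# Uncorrelated: a one-character change flips a large share of the output bits.
print(differing_bits(b"ipfs", b"ipfz"), "of 256 bits differ")
```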
Pinning
IPFS gives no guarantee that data will stay stored for a long time, because IPFS nodes treat the data they hold like a cache. To solve this problem, IPFS has the concept of pinning. Pinning a CID tells the IPFS node that the data is important and must not be thrown away. Unpinned data, by contrast, behaves like a cache entry that can be evicted when space is needed.
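The relationship between pinning and garbage collection can be sketched with a toy model of a single node. `ToyNode` and its methods are hypothetical names invented for this example; a real node manages pins with commands such as `ipfs pin add` and runs collection with `ipfs repo gc`, and its internals are considerably more involved.

```python
import hashlib

class ToyNode:
    """A toy block store: unpinned blocks behave like evictable cache entries."""

    def __init__(self):
        self.blocks = {}   # content hash -> bytes
        self.pins = set()  # hashes that garbage collection must keep

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def pin(self, key: str) -> None:
        if key not in self.blocks:
            raise KeyError(f"cannot pin unknown block {key}")
        self.pins.add(key)

    def unpin(self, key: str) -> None:
        self.pins.discard(key)

    def collect_garbage(self) -> list:
        """Delete every block that is not pinned and return the removed hashes."""
        removed = [key for key in self.blocks if key not in self.pins]
        for key in removed:
            del self.blocks[key]
        return removed

node = ToyNode()
important = node.add(b"keep me")
temporary = node.add(b"just passing through")
node.pin(important)

removed = node.collect_garbage()
print(important in node.blocks, temporary in node.blocks)
```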
Mutable File System (MFS)
IPFS includes a Mutable File System (MFS), a built-in tool that lets you treat files as you would in a normal name-based file system. You can perform the usual file system operations such as add, delete, read, and write. There is no need to update hashes and links yourself; MFS takes care of that for you.
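The idea of working with names while the hashes are maintained behind the scenes can be sketched as follows. `ToyMFS` is a hypothetical, flat toy model written for this article, not the MFS implementation; on a real node the corresponding operations live under the `ipfs files` command group (for example `ipfs files write` and `ipfs files ls`).

```python
import hashlib

class ToyMFS:
    """Paths map to content hashes; the root hash is derived from all entries."""

    def __init__(self):
        self.blocks = {}   # content hash -> bytes
        self.entries = {}  # path -> content hash

    def write(self, path: str, data: bytes) -> None:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        self.entries[path] = key  # the caller never touches the hash

    def read(self, path: str) -> bytes:
        return self.blocks[self.entries[path]]

    def rm(self, path: str) -> None:
        del self.entries[path]

    def root_hash(self) -> str:
        """Recomputed from the sorted listing, so any change is reflected here."""
        listing = "\n".join(f"{p} {h}" for p, h in sorted(self.entries.items()))
        return hashlib.sha256(listing.encode()).hexdigest()

mfs = ToyMFS()
mfs.write("/notes.txt", b"version 1")
root_v1 = mfs.root_hash()

mfs.write("/notes.txt", b"version 2")  # same name, new content
root_v2 = mfs.root_hash()

print(mfs.read("/notes.txt"))
print(root_v1 != root_v2)
```

The caller only ever uses the path `/notes.txt`, yet the root hash changes with every write, which is the bookkeeping that MFS hides from the user.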
Inter-Planetary Name System (IPNS)
The Inter-Planetary Name System (IPNS) is a naming system that works somewhat like DNS.
According to the IPNS docs:
“Inter-Planetary Name System (IPNS) is a system for creating and updating mutable links to IPFS content. Since objects in IPFS are content-addressed, their address changes every time their content does. That’s useful for a variety of things, but it makes it hard to get the latest version of something.
A name in IPNS is the hash of a public key. It is associated with a record containing information about the hash it links to that is signed by the corresponding private key. New records can be signed and published at any time.”
Example IPNS address: /ipns/QmSrPmbaUKA3ZodhzPWZnpFgcPMFWF4QsxXbkWfEptTBJd
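The mutable-pointer behaviour described in the quote can be sketched with a toy model. This is a heavily simplified assumption-laden example: `ToyNameSystem` and `ToyRecord` are hypothetical names, the "public key" is just placeholder bytes, the values are placeholder paths, and the signing and verification of records, which is essential to real IPNS, is left out entirely. Only two ideas are shown: the name is derived from the key, and newer records replace older ones.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyRecord:
    value: str     # the content path the name currently points to
    sequence: int  # newer records carry a higher sequence number

class ToyNameSystem:
    def __init__(self):
        self.records = {}  # name -> latest ToyRecord

    @staticmethod
    def name_for(public_key: bytes) -> str:
        """The name is a hash of the public key, so it never changes."""
        return hashlib.sha256(public_key).hexdigest()

    def publish(self, public_key: bytes, value: str) -> str:
        # A real system would check a signature made with the private key here.
        name = self.name_for(public_key)
        previous = self.records.get(name)
        sequence = previous.sequence + 1 if previous else 0
        self.records[name] = ToyRecord(value, sequence)
        return name

    def resolve(self, name: str) -> str:
        return self.records[name].value

names = ToyNameSystem()
key = b"placeholder public key bytes"

name_first = names.publish(key, "/ipfs/<CID of version 1>")
name_second = names.publish(key, "/ipfs/<CID of version 2>")

print(name_first == name_second)  # the name stays the same
print(names.resolve(name_first))  # it now resolves to the newer content
```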
IPFS Cluster
IPFS Cluster provides automatic replication of data across the peer nodes of a cluster, which also gives us a backup of the data. A cluster peer node is started with the ipfs-cluster-service daemon command. Cluster peers have to be able to discover each other; this is done by bootstrapping a new peer with the multiaddress of the first node (the bootstrap node):
ipfs-cluster-service daemon --bootstrap <multiaddress>
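A multiaddress is a self-describing path of protocol and value segments. The sketch below splits one into its parts and assembles the argument list for the bootstrap command without running it. The IP address, the port 9096, and the peer ID are placeholders assumed for illustration, and the parser only handles addresses made of simple protocol/value pairs; some multiaddress protocols carry no value and would need extra handling.

```python
def parse_multiaddress(addr: str) -> list:
    """Split '/proto/value/proto/value...' into (protocol, value) pairs."""
    parts = addr.strip("/").split("/")
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError(f"expected /protocol/value pairs, got {addr!r}")
    return list(zip(parts[0::2], parts[1::2]))

# Placeholder values: replace with the address printed by your bootstrap node.
BOOTSTRAP_ADDR = "/ip4/192.168.1.10/tcp/9096/p2p/QmBootstrapPeerIdPlaceholder"

# Argument list in the form accepted by subprocess.run; it is only printed here.
command = ["ipfs-cluster-service", "daemon", "--bootstrap", BOOTSTRAP_ADDR]

print(parse_multiaddress(BOOTSTRAP_ADDR))
print(" ".join(command))
```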
Why should one use IPFS Cluster?
On its own, IPFS does not propagate data through the network automatically. Files only spread through IPFS when a node requests them. If you run a node and add a file to it, no other node picks up that file by default. If you then request the file (by hash) through the ipfs.io gateway, a copy remains on the gateway node for a certain amount of time. If someone else then requests it from their own IPFS node, they can receive it from the two nodes that already have it. Unless the file is “pinned”, however, nodes delete it when their stores fill up in order to make space for other files. This means that, after some time, garbage collection can remove the file from the network entirely if it has not been accessed for a long time or space is running out.
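This request-driven propagation can be sketched with a toy network. `ToyPeer` and `ToyNetwork` are hypothetical classes for illustration only; real IPFS locates providers through its routing layer and exchanges blocks with a dedicated protocol, and eviction of unpinned copies is not modelled here.

```python
import hashlib

class ToyPeer:
    def __init__(self, name: str):
        self.name = name
        self.blocks = {}  # content hash -> bytes

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

class ToyNetwork:
    def __init__(self, peers: list):
        self.peers = peers

    def providers(self, key: str) -> list:
        """Names of all peers that currently hold the block."""
        return [peer.name for peer in self.peers if key in peer.blocks]

    def fetch(self, requester: ToyPeer, key: str) -> bytes:
        """Copy the block from any peer that has it; the requester keeps a copy."""
        if key not in requester.blocks:
            for peer in self.peers:
                if key in peer.blocks:
                    requester.blocks[key] = peer.blocks[key]
                    break
            else:
                raise KeyError(f"no peer provides {key}")
        return requester.blocks[key]

alice, gateway, bob, carol = (ToyPeer(n) for n in ("alice", "gateway", "bob", "carol"))
network = ToyNetwork([alice, gateway, bob, carol])

key = alice.add(b"my file")
print(network.providers(key))  # only the node the file was added to

network.fetch(gateway, key)    # someone requests it through the gateway
network.fetch(bob, key)        # bob requests it from his own node
print(network.providers(key))  # carol never asked, so she holds no copy
```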
IPFS Cluster is useful in several ways:
- It automatically pins hashes on the IPFS nodes of the cluster.
- Pinned files are exempt from the garbage collection process.
- It provides data replication across the cluster peers.
Research Paper to explain the importance of IPFS
From this research paper, you will understand why one should use IPFS rather than storing data directly on the blockchain. In that scenario, IPFS acts like a decentralized database and saves some gas: https://www.irjet.net/archives/V5/i8/IRJET-V5I8204.pdf
Useful Resources:
IPFS whitepaper by Juan Benet:
https://ipfs.io/ipfs/QmV9tSDx9UiPeWExXEeH6aoDvmihvx6jD5eLb4jbTaKGps
This blog post covers the basics of setting up an IPFS node and IPFS Cluster:
https://medium.com/mvp-workshop/ipfs-publishing-and-ipfs-cluster-cff3a099993a
This is an interesting blog on how to store IPFS hash efficiently using a smart contract:
Highlights: “Storing IPFS hash in byte type variable is more efficient than string type variable in terms of gas consumption and storing IPFS hash in 2 byte32 variables”
The IPFS installation guide is useful for setting up IPFS on a node: