👩🏼‍🎨 Retrievability on Filecoin: State of the Art

| Retrievability Option | Description | Challenges | Guarantee | Trust Assumptions |
|---|---|---|---|---|
| 1. Basic Retrieval with Single SP | Retrieve data from a single SP. | If the SP becomes unavailable, retrieval fails. | No guarantee; depends entirely on SP reliability. | Trust the SP's infrastructure and reliability for retrieval. |
| 2. Retrieval with Data Redundancy (Replication) | Data is stored with multiple SPs for redundancy. | Increases storage cost and complexity. | Higher retrieval reliability, but not a full guarantee. | Trust SPs to serve data; retrieval fails if all replicas are down. |
| 3. Retrieval with Off-Chain SLAs | SPs offer SLAs with performance guarantees (e.g., speed, availability). | Limited to the terms agreed in the SLA; relies on the SP's infrastructure. | Performance guarantees as specified in the SLA, but not 100% retrieval. | Trust the SP to fulfill SLA conditions; dependent on SP infrastructure. |
| 4. Spark Protocol for Optimized Retrieval | Reputation-based system that optimizes retrieval by evaluating SPs. | No guarantee of full file availability; relies on past performance. | Higher likelihood of successful retrieval, but no 100% guarantee. | Trust the historical data, the reputation system, and how it is implemented (checker nodes and SPs do not collude); trust SPs' performance history. |
| 5. Proof of Data Possession (PDP) | Verifies that the SP holds the data. | Doesn't ensure retrieval speed or availability; proves only data possession. | Guarantee of data possession, but no retrieval guarantee. | Possession is proven rather than trusted; the SP must still be trusted to actually serve the data. |
| 6. Off-Chain Backup and Caching | Back up and cache data in centralized or third-party systems. | Increases operational complexity and costs; not decentralized. | Full retrieval guarantee from off-chain backups, as long as those systems stay available. | Trust centralized systems for retrieval; undermines Filecoin's decentralization. |
| 7. CDN Gateway | Use of CDNs with arbitration and penalties to guarantee retrieval. | Potentially adds external costs and protocol complexity; may affect retrieval speed. | Rational retrievability guarantee via CDN arbitration (penalties make withholding data unprofitable). | Honest majority of the CDN committee providing the service. |
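
The gap between options 1 and 2 comes down to client-side failover: with replication, a client can move on to the next SP when one is down or returns bad data. Below is a minimal Python sketch of that loop, not a Filecoin client API. It assumes a hypothetical `fetch(sp_id)` callable that returns bytes or raises `ConnectionError`/`TimeoutError`, uses a plain SHA-256 hex digest as a simplified stand-in for a content identifier (CID), and the SP names are made up:

```python
import hashlib
from typing import Callable, Optional, Sequence

# Hypothetical transport: given an SP identifier, return the payload or raise.
Fetcher = Callable[[str], bytes]


def retrieve_with_failover(
    providers: Sequence[str], fetch: Fetcher, expected_sha256: str
) -> Optional[bytes]:
    """Try each replica holder in turn and return the first payload whose
    SHA-256 digest matches; return None if every replica fails."""
    for sp in providers:
        try:
            data = fetch(sp)
        except (ConnectionError, TimeoutError):
            continue  # SP unreachable: fall back to the next replica
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data  # integrity check passed
        # Digest mismatch: treat this replica as corrupted and move on
    return None


# Simulated replicas: one SP is down, one serves corrupted bytes, one is healthy.
payload = b"hello filecoin"
digest = hashlib.sha256(payload).hexdigest()
replicas = {"sp-a": None, "sp-b": b"corrupted", "sp-c": payload}


def fake_fetch(sp: str) -> bytes:
    data = replicas[sp]
    if data is None:
        raise ConnectionError(f"{sp} is unavailable")
    return data


print(retrieve_with_failover(["sp-a", "sp-b", "sp-c"], fake_fetch, digest))
# prints b'hello filecoin'
```

Calling it with only `["sp-a", "sp-b"]` returns `None`, which is the "all replicas are down" failure mode listed under option 2's trust assumptions.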

✨ Key Metrics

| Category | Metric | Description | Level of Impact |
|---|---|---|---|
| Availability Metrics | Data Availability | Measures whether data is accessible at the time of request. | High |
| | Redundancy (Replica Availability) | Tracks how many data replicas are available; redundant replicas keep retrieval succeeding even when some are unavailable. | High |
| | Time to Data Recovery | Measures the time needed to recover lost or corrupted data. | High |
| Performance Metrics | Retrieval Speed (Throughput) | Indicates the rate at which data can be retrieved, typically measured in MB/s or GB/s. | High |
| | Time to First Byte (TTFB) | Measures the time from the request to the receipt of the first byte of data. | High |
| | Retrieval Latency | Measures the delay between initiating a retrieval request and receiving the full data file. | High |
| Reliability Metrics | Retrieval Success Rate | Tracks the percentage of successful retrieval attempts. | Critical |
| | Uptime and Reliability | Measures the operational time of the storage provider's system and its failure frequency. | Critical |
| | Error Rate | Indicates the frequency of retrieval failures or corrupted data. | Critical |
| | Data Integrity | Ensures that retrieved data is correct, consistent, and unmodified. | High |
| Cost-Related Metrics | Cost Efficiency of Retrieval | Evaluates the total costs incurred during retrieval, including fees and bandwidth. | Moderate |
| | Network Bandwidth Usage | Measures the bandwidth consumed during retrieval, including upload and download. | Moderate |
| Quality Metrics | Quality of Service (QoS) | Measures the overall retrieval experience, including speed, reliability, and packet loss. | High |
| | Data Consistency | Measures how well retrieved data matches the original, ensuring it is up to date and accurate. | Moderate to High |
| | Geographic Proximity | Indicates how close the storage provider or data replica is to the client's location. | Moderate |
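
Several of these metrics can be derived from a simple client-side log of retrieval attempts. The sketch below is illustrative only: the `RetrievalAttempt` record and `summarize` helper are hypothetical, throughput uses 1 MB = 10^6 bytes and averages per-attempt rates, and an attempt counts as successful only if it completed and passed an integrity check, so the error rate covers both failures and corrupted data:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class RetrievalAttempt:
    start: float                 # request sent (seconds)
    first_byte: Optional[float]  # first byte received, None if the attempt failed
    end: Optional[float]         # last byte received, None if the attempt failed
    bytes_received: int
    success: bool                # completed AND passed the integrity check


def summarize(attempts: Sequence[RetrievalAttempt]) -> Dict[str, Optional[float]]:
    """Compute success/error rate, mean TTFB, mean latency, and mean throughput."""
    if not attempts:
        raise ValueError("no attempts to summarize")
    ok = [a for a in attempts if a.success]
    success_rate = len(ok) / len(attempts)
    summary: Dict[str, Optional[float]] = {
        "success_rate": success_rate,
        "error_rate": 1 - success_rate,
        "mean_ttfb_s": None,
        "mean_latency_s": None,
        "mean_throughput_mb_s": None,
    }
    if ok:
        summary["mean_ttfb_s"] = mean(a.first_byte - a.start for a in ok)
        summary["mean_latency_s"] = mean(a.end - a.start for a in ok)
        # Per-attempt throughput in MB/s (1 MB = 10**6 bytes), then averaged
        summary["mean_throughput_mb_s"] = mean(
            a.bytes_received / (a.end - a.start) / 1e6 for a in ok
        )
    return summary


log = [
    RetrievalAttempt(0.0, 0.5, 2.0, 10_000_000, True),
    RetrievalAttempt(0.0, 0.5, 4.0, 10_000_000, True),
    RetrievalAttempt(0.0, 0.5, 3.0, 9_000_000, True),
    RetrievalAttempt(0.0, None, None, 0, False),
]
print(summarize(log))
```

Averaging per-attempt throughput weights small and large files equally; dividing total bytes by total transfer time is a reasonable alternative when large retrievals dominate.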

💳 Payment Options

| Payment Method | Pros | Cons | Challenges |
|---|---|---|---|
| Off-Chain Payments | Familiar, quick, lower fees | Not Web3-native, lacks transparency, centralized | Legal disputes, limited automation |
| On-Chain Payments (FIL) | Decentralized, transparent, smart contract integration | Transaction fees, price volatility, network congestion | Conversion to FIL, scalability issues |
| On-Chain Payments (Stablecoin) | Price stability, secure, smart contract integration | Centralization risks, liquidity, regulatory scrutiny | Adoption and support, regulatory uncertainty |
| On-Chain Payments (ERC-20 Tokens) | Interoperability, variety of tokens, decentralized, dApp integration | Gas fees, cross-chain limitations, volatility | Network congestion, cross-chain compatibility, complexity |

🦾 Payment Strategies

| Payment Strategy | Description | Pros | Cons | Challenges | Level of Trust |
|---|---|---|---|---|---|
| Upfront Payment | Clients pay in advance for guaranteed retrieval access. | Guaranteed payment for the SP; predictable costs for clients. | High initial cost; risk if the SP fails to deliver. | Risk of overpayment; reliance on reputation systems, contracts, or CDNs. | High trust (client trusts the SP's commitment). |
| Pay-to-Retrieve | Clients pay when retrieval occurs, based on volume and speed. | Flexible, low upfront costs. | Revenue uncertainty for the SP; potentially higher retrieval costs. | Risk of failure to fulfill requests; need for reputation systems, contracts, or CDNs. | Moderate to high trust (client trusts the SP to deliver). |
| Periodic Payment | Clients make recurring payments for unlimited retrievals. | Predictable, steady costs; guaranteed availability. | Risk of overpayment; reduced SP incentive to perform, since payment does not depend on performance. | Misaligned incentives; need for performance-based bonuses. | Moderate trust (trust in long-term reliability). |
| Retrievability Tickets | Clients buy tickets in advance for specific retrievals. | Flexible, predictable costs. | Limited flexibility; unused tickets may expire. | Ticket expiration; dynamic pricing needed. | Low to moderate trust (client trusts that tickets are honored and can stop buying them if they are not). |
| Hybrid Strategy | Combination of upfront payment and pay-as-you-go. | Balances flexibility and security. | Complexity; risk of overpayment. | Clear terms required; need for smart contract enforcement. | Moderate trust (client trusts both the upfront and variable components). |
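
To make the trade-offs between strategies concrete, the sketch below compares what a client would spend under each one for a given usage pattern. Every price is a hypothetical placeholder, the `Prices` and `client_costs` names are illustrative, and the model deliberately ignores SP-side risk and trust, which the table treats separately:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Prices:
    """Hypothetical price sheet; all amounts in one arbitrary unit (e.g. FIL)."""
    upfront_fee: float    # Upfront: one-off prepayment for the whole term
    per_gb: float         # Pay-to-Retrieve: price per GB actually retrieved
    period_fee: float     # Periodic: flat fee per billing period
    ticket_price: float   # Tickets: price of one prepaid retrieval ticket
    hybrid_base: float    # Hybrid: upfront component
    hybrid_per_gb: float  # Hybrid: pay-as-you-go component per GB


def client_costs(
    p: Prices, retrievals: int, gb_each: float, periods: int, tickets_bought: int
) -> Dict[str, float]:
    """Total client spend per strategy for a given usage pattern."""
    gb_total = retrievals * gb_each
    # One ticket per retrieval; unused prepaid tickets expire (sunk cost),
    # and any shortfall is topped up at the same ticket price.
    tickets_paid = max(tickets_bought, retrievals)
    return {
        "upfront": p.upfront_fee,
        "pay_to_retrieve": p.per_gb * gb_total,
        "periodic": p.period_fee * periods,
        "tickets": p.ticket_price * tickets_paid,
        "hybrid": p.hybrid_base + p.hybrid_per_gb * gb_total,
    }


prices = Prices(upfront_fee=100, per_gb=1.5, period_fee=35, ticket_price=5,
                hybrid_base=40, hybrid_per_gb=1.0)
costs = client_costs(prices, retrievals=20, gb_each=2.5, periods=3, tickets_bought=25)
print(costs)
print("cheapest for this usage:", min(costs, key=costs.get))
```

With the same prices but only 5 retrievals, pay-to-retrieve drops to 18.75 while the upfront and periodic totals are unchanged, which is the "risk of overpayment" noted for those strategies.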

💾 Retrieval Services: SP Selection Strategies

| Deal-Making Process | Description | Pros | Cons |
|---|---|---|---|
| Direct Negotiation | The client and SP engage directly to establish terms for the retrieval deal. The client has full control over selection, negotiating factors such as cost, location, and performance guarantees. | Full control over the selection process | |

| SP Selection Mechanism | Description | Pros | Cons | Trust Issues |
|---|---|---|---|---|
| Reputation-Based System | SPs are selected based on historical performance metrics (e.g., retrieval success rates, uptime). Clients rely on reputation scores (e.g., Spark protocol). | Helps ensure reliable and fast retrievals; mitigates the risk of choosing unreliable SPs. | Reputation scores may fluctuate, leading to unpredictability; relies on honest data, which could be manipulated. | Reputation can be tampered with unless verified via cryptographic proofs or trusted oracles. |
| Auction-Based Selection | SPs bid to handle the client's retrieval request, and the client selects the best offer (e.g., lowest price, best service). | Competitive pricing, potentially lowering costs; flexibility in selecting based on price and service quality. | Risk of a race to the bottom, where SPs cut corners to offer lower prices. | |
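
The two mechanisms can be combined: a reputation threshold filters out unreliable SPs before the auction, which is one way to blunt the race-to-the-bottom risk. A minimal sketch under stated assumptions: `Bid` and `select_sp` are hypothetical, the reputation score stands in for something like a historical retrieval success rate, and the 0.9 threshold is arbitrary:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Bid:
    sp_id: str            # hypothetical SP identifier
    price_per_gb: float   # price offered in the auction
    reputation: float     # e.g. historical retrieval success rate in [0, 1]


def select_sp(bids: Sequence[Bid], min_reputation: float = 0.9) -> Optional[Bid]:
    """Reputation-gated reverse auction: drop SPs below the reputation
    threshold, then pick the cheapest remaining bid (ties -> higher reputation)."""
    eligible = [b for b in bids if b.reputation >= min_reputation]
    if not eligible:
        return None
    return min(eligible, key=lambda b: (b.price_per_gb, -b.reputation))


bids = [
    Bid("sp-1", price_per_gb=0.5, reputation=0.60),  # cheapest, but unreliable
    Bid("sp-2", price_per_gb=0.8, reputation=0.95),
    Bid("sp-3", price_per_gb=0.8, reputation=0.99),
    Bid("sp-4", price_per_gb=1.2, reputation=0.999),
]
print(select_sp(bids).sp_id)                      # prints sp-3
print(select_sp(bids, min_reputation=0.0).sp_id)  # prints sp-1
```

Lowering `min_reputation` to 0 turns this into a pure lowest-price auction and selects the unreliable `sp-1`, illustrating the cons listed for auction-based selection.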

🧮 Retrieval Services: Client Selection Strategies

| Deal-Making Process | Description | Pros | Cons | Trade-Offs/Considerations |
|---|---|---|---|---|
| Direct Negotiation | The SP and client negotiate terms directly. | Full control over terms (price, performance, guarantees). | Extended negotiation periods; potential for misunderstandings; time and effort required from the SP. | Flexibility vs. potential delays and risks. |
| Automated or Delegated Deal-Making | Intermediaries or automated systems (e.g., smart contracts, CDNs, auction systems) handle the process on behalf of the SP. | Reduces manual workload; optimizes terms via real-time data and market forces; helps secure competitive deals. | Reduced control for the SP; reliance on third-party systems; potential additional costs. | Efficiency and optimization vs. control and reliance on third parties. |

| Client Selection Mechanism | Description | Pros | Cons | Trust Issues |
|---|---|---|---|---|
| First-Come, First-Served (FCFS) | Requests are processed in the order they are received. | Simple, transparent, no prioritization needed. | Inefficient during peak demand; no differentiation between clients. | Assumes clients are paying for the retrievability service. |
| Reputation-Based Client Selection | Priority is given to clients with better reputation scores based on past performance and payment history. | Reduces risk by selecting reliable clients. | New clients may be disadvantaged; reputation systems can be manipulated. | Relies on the security and reliability of the reputation system. |
| Long-Term Contracts / SLAs | Long-term contracts guaranteeing retrieval performance in exchange for steady payments. | Predictable revenue, stable client relationships. | Limited flexibility; risk of underutilized capacity. | Requires trust in the contract terms and their enforcement by both parties. |
| Payment-Driven Client Selection | Clients are prioritized by how much they are willing to pay for retrieval. | Maximizes revenue by focusing on higher-paying clients. | Excludes clients with lower budgets. | Requires trust in the fairness of the payment model; clients must be paying for retrievability. |
| Automated Algorithmic Client Selection | Clients are selected algorithmically based on price, reputation, and past performance. | Efficient, objective, and consistent selection. | Lacks flexibility if client needs change after setup. | Trust in the fairness and transparency of the algorithm. |
| Redundancy-Based Client Selection | Clients are prioritized based on how widely their data is replicated across multiple providers, to ensure availability. | Increases reliability; reduces the risk of retrieval failure. | Higher storage costs and coordination required. | Replication doesn't guarantee actual retrievability; requires trust in the overall system. |
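
Most of these client selection mechanisms reduce to an ordering rule over pending requests plus a capacity limit. The sketch below models FCFS, payment-driven, reputation-based, and a simple algorithmic policy; the `Request` fields, the assumption that bids are normalized to [0, 1], and the equal 0.5/0.5 weighting are all hypothetical choices for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Request:
    client_id: str
    arrival: float     # arrival time in seconds
    bid: float         # offered payment, assumed normalized to [0, 1]
    reputation: float  # payment-history score in [0, 1]


# Sort keys per policy: a smaller key is served earlier; arrival breaks ties.
POLICIES: Dict[str, Callable[[Request], Tuple[float, ...]]] = {
    "fcfs": lambda r: (r.arrival,),
    "payment": lambda r: (-r.bid, r.arrival),
    "reputation": lambda r: (-r.reputation, r.arrival),
    # Hypothetical equal weighting of price and reputation
    "algorithmic": lambda r: (-(0.5 * r.bid + 0.5 * r.reputation), r.arrival),
}


def schedule(requests: Sequence[Request], policy: str, capacity: int) -> List[str]:
    """Return the IDs of the clients served, given the SP's capacity."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    ordered = sorted(requests, key=POLICIES[policy])
    return [r.client_id for r in ordered[:capacity]]


queue = [
    Request("a", arrival=0.0, bid=0.5, reputation=0.99),
    Request("b", arrival=1.0, bid=1.0, reputation=0.70),
    Request("c", arrival=2.0, bid=0.8, reputation=0.95),
]
for name in POLICIES:
    print(name, schedule(queue, name, capacity=2))
```

With capacity 2, FCFS serves `a` and `b`, payment-driven serves `b` and `c`, reputation-based serves `a` and `c`, and the weighted policy serves `c` and `b`, so the same queue yields different winners depending on the policy.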