CLOUD STORAGE SECURITY
CS838, December 3, 2012

BACKGROUND
- Safety & liveness
  - Safety: nothing bad happens
  - Liveness: something good eventually happens
- Consistency (strongest to weakest) [http://www.mimuw.edu.pl/~ms209495/talks/2011/depot/foil05.html]
  - Sequential consistency: all nodes see all writes in the same order
  - Causal consistency: writes that are causally related are seen by all nodes in the same order; concurrent writes (i.e., writes that are not causally related) may be seen in a different order by each node
  - Fork consistency: a correct server has a total order of writes; a faulty server may show different versions to different clients
  - Fork-join consistency: a correct server behaves as in fork consistency; fork resolution (join) appears as a concurrent write
  - Fork-join causal consistency: causal consistency + fork-join consistency
- Prior work on reducing trust assumptions
  - Quorum and replicated state machine protocols: tolerate failures by a fraction of servers; sacrifice safety and liveness when faults exceed a threshold
  - Fork-based systems: maintain safety without trusting the server; sacrifice liveness when the server is unreachable, or a faulty server may permanently partition correct clients

DEPOT OVERVIEW
* What is the primary goal of Depot?
  - Ensure data availability and durability even with an untrusted storage service provider
- Correctness assumptions
  - Safety: a client only needs to trust itself
  - Liveness and availability
    - Puts: a client can always update, and any subset of connected, correct clients can always share updates
    - Gets: reads may be served by any node (even other clients)

DEPOT OPERATION
- Information maintained by nodes
  - Logical clock
    - Incremented when the node performs a local write
    - Advanced when the node receives an update from another node
  - Version vector
    - Contains an entry for each node in the system whose value is the highest logical clock observed for any update from that node
  - Log
    - List of all updates in causal order (i.e., ordered by logical clock, with node id used to break ties)
  - Checkpoint
    - Reflects the current state of the system
- Node N calls write(key, value)
  - Node N increments its logical clock
  - Node N constructs the update message DependencyVersionVector, SignedByN{key, Hash(value), LogicalClock@N, Hash(history)}
    - Hash(history) encodes the history on which the update depends
    - DependencyVersionVector indicates the version vector the history hash covers
    - SignedByN: node N signs the update with its private key
- A correct node C accepts update u if it meets 4 conditions:
  1) u is properly signed
  2) u must be newer than any update C has already received from the signing node
  3) C's version vector must include u's DependencyVersionVector
  4) u's history hash must match a hash computed by C across every node's last update at time DependencyVersionVector
  - Conditions 3 & 4 ensure that before accepting update u, C has already received all updates on which u depends (see the sketch after the forking discussion below)
- A faulty node F can create forking updates
  - Two updates u@F and u'@F such that neither update's history includes the other's
  - F sends each update to a different node: u to N and u' to N'
  - Node N creates update u2 depending on u and sends it to N'
  - N' cannot detect the problem based only on the DependencyVersionVector, since it would think it had already received the update u on which u2 depends
  - However, the Hash(history) computed by N' will not match the Hash(history) in u2
  - Without Hash(history), N' would accept u2, violating causality and preventing the system from ever reaching eventual consistency
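The following is a minimal sketch (in Python) of the per-node state and the four acceptance checks above. The names (Update, Node, verify_signature, history_hash_at) and the use of SHA-256 are illustrative assumptions rather than Depot's actual implementation; the sketch is only meant to show how conditions 3 and 4 catch the fork scenario described above.

    import hashlib
    from dataclasses import dataclass

    def verify_signature(u):
        # Placeholder for condition 1; a real node checks the signature
        # against the signing node's public key.
        return True

    @dataclass
    class Update:
        node_id: str
        key: str
        value_hash: str
        logical_clock: int         # LogicalClock@N
        dep_version_vector: dict   # DependencyVersionVector: node_id -> logical clock
        history_hash: str          # Hash(history) the update claims to depend on
        signature: bytes = b""     # SignedByN over the fields above

    class Node:
        def __init__(self, node_id):
            self.node_id = node_id
            self.version_vector = {}   # node_id -> highest logical clock observed
            self.log = []              # accepted updates in causal order

        def history_hash_at(self, dep_vv):
            # Hash over each node's last update at the time described by dep_vv,
            # i.e., the update from nid whose logical clock equals dep_vv[nid].
            h = hashlib.sha256()
            for nid in sorted(dep_vv):
                for u in self.log:
                    if u.node_id == nid and u.logical_clock == dep_vv[nid]:
                        h.update(f"{nid}:{u.logical_clock}:{u.value_hash}".encode())
                        break
            return h.hexdigest()

        def accept(self, u):
            # 1) u is properly signed
            if not verify_signature(u):
                return False
            # 2) u is newer than any update already received from the signing node
            if u.logical_clock <= self.version_vector.get(u.node_id, 0):
                return False
            # 3) our version vector includes u's DependencyVersionVector
            for nid, clock in u.dep_version_vector.items():
                if self.version_vector.get(nid, 0) < clock:
                    return False
            # 4) u's history hash matches the hash we compute over that history;
            #    after a fork, N' fails this check for u2 even though check 3 passes
            if u.history_hash != self.history_hash_at(u.dep_version_vector):
                return False
            # Accept: advance local state
            self.version_vector[u.node_id] = u.logical_clock
            self.log.append(u)
            return True

In this sketch, a write by node N would bump its logical clock, compute the history hash over its current version vector with history_hash_at, and sign the resulting Update before sending it to other nodes.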
- Joining a fork
  - N and N' find the latest VersionVector with a common history
  - N sends its updates beginning from this point to N'
  - The VersionVector entry for the faulty node F, which issued the forking updates, is expanded to contain 3 values (see the sketch at the end of these notes):
    - Pre-fork entry for node F
    - Post-fork entry with the logical clock of u
    - Post-fork entry with the logical clock of u'
  - The join appears as a concurrent update
  - Unclear if the entries can be compressed again at some point

DISCUSSION
* Is the overhead of Depot acceptable?
* Is all the complexity introduced by Depot really necessary?
* What other security concerns are associated with storage beyond data availability and durability?
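The following is a minimal sketch (in Python) of the expanded version-vector entry for the forking node F described under "Joining a fork" above. The names (ForkedEntry, includes) are illustrative assumptions; the point is that the two post-fork branches are covered independently, like entries for two separate virtual nodes, which is why the join appears as a concurrent update.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ForkedEntry:
        pre_fork: int                             # F's highest logical clock before the fork
        post_fork_u: Optional[int] = None         # F's clock along the branch started by u
        post_fork_u_prime: Optional[int] = None   # F's clock along the branch started by u'

    def includes(mine: ForkedEntry, dep: ForkedEntry) -> bool:
        # Component-wise check used when comparing version vectors (condition 3 above):
        # each post-fork branch must be covered on its own, as if it were a separate node.
        def covers(a, b):
            return b is None or (a is not None and a >= b)
        return (mine.pre_fork >= dep.pre_fork
                and covers(mine.post_fork_u, dep.post_fork_u)
                and covers(mine.post_fork_u_prime, dep.post_fork_u_prime))

Whether the three values can later be collapsed back into a single entry is the open question noted above.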