# Practical Techniques for Searches on Encrypted Data——Literature Notes

## Section one: abstract & introduction

### Part one: Motivation:

in order to solve **How to support searching functionality without any loss if data confidentiality.**

#### Why searches on encrypted data?

- Searching on encrypted e-mails on servers
- Searching on encrypted files on file servers
- Searching on encrypted database

#### Why is this hard?

- perform computations on encrypted data is often hard
- security and functionality

### part two: methods:

five different schemes based on **probabilistic searching**(all positions with erroneous positions($ l/2^m $), $l$ means the length of document）

### part three: the two main parts：

- cryptographic schemes for the problem of
**searching on encrypted data** **proofs of security**for the resulting crypto systems.

### part four: Provable security:

- provide
**provably secure**, untrusted server cannot get any information from the data storage - provide
**query isolation**for searchers, means limited information(searching results) for untrusted server - provide
**controlled searching**, cannot search other information without authorization - provide
**Hidden queries**, does not reveal the search words.

### part five: Efficiency

- Low computation overhead
- Low space and communication overhead
- Low management overhead

#### Other information about the algorithm:

- O(n) for a document of length n, fast and simple
- efficient and practical

## section two: Background and Definition

### part one: two types of approaches:

- build up an index
- perform a sequential scan without index

comparation:

- index may be faster when the documents are large, but storing and updating the index can be substantial overhead.
- the index-based schemes require less sophisticated construction and which are more suitable for mostly-read-only data.

### part two: Definition

notion | description |
---|---|

$R$-breaks | attacks algorithm with resource specified by $R$ |

$R$-secure | no algorithm can $R$-breaks it |

Advantage of $A$ | distinguishing probability of $A$ |

$A$ | an arbitrary algorithm: $\{0,1\}^n \rightarrow \{0,1\}$ |

$G$:(t,e)-secure | A pseudorandom generator: $\mathcal{K}_G \rightarrow S$ |

$F$:(t,q,e)-secure | A pseudorandom function:$\mathcal{K}_F \times \mathcal{X} \rightarrow \mathcal{Y}$ |

$E$:(t,q,e)-secure | A pseudorandom permutation:$\mathcal{K}_E \times \mathcal{Z} \rightarrow \mathcal{Z}$ |

notion | Adv A | ||
---|---|---|---|

$A$ | $Adv A = | Pr[A(X) = 1]-Pr[A(Y) = 1] | $ |

$G$ | $Adv A = | Pr[(A(G(U_{mathcal{K}_G}))) = 1]-Pr[A(U_S) = 1] | $ |

$F$ | $Adv A = | Pr[A^{F_k} =1]-Pr[A^R = 1] | $ |

$E$ | $Adv A = | Pr[A^{E_k,E_k^{-1}} = 1] - Pr[A^{pi,pi^{-1}} = 1] | $ |

## Section three： Solution with sequential scan

### Part one: Scheme I - The Basic Scheme

step one: generates a sequence of pseudorandom values $S_1,...,S_{\mathcal{l}}$ via the pseudorandom generator $G$. and each $S$ is $n - m$ bits long.

step two: then, take $S_i$ to set $T_i := <S_i,F_{k_i}(S_i)>$, the length of $T_i$ is the same as the length of a word($n$) , and let $L_i = S_i$ and $R_i = F_{k_i}(S_i) $

step three: output the ciphertext: $C_i := W_i \oplus T_i$

the Key value: $k_i$ is stored by Alice, not the server. so only Alice can generate the pseudorandom bits to encrypt and decrypt.

Encryption and Decryption as follows.

#### Problems with Basic Scheme

- Queries are not hidden, server learn word
- Query isolation is not satisfied, server learns K and search for arbitrary word.

In other words, Alice must reveal all the $k_i$(thus potentially revealing the entire document), or Alice must know in advance which locations $W$ may appear at.

### Part two： Scheme II - Controlled Searching

Instead of $ R_i \leftarrow F_k(L_i)$, just generate $ S_i$ where $K_i = F_k^{'}(W_i)$.

$T_i := <R_i,F_{k_i}(R_i)>, where K_i = F_k{'}(W_i)$ .

So there an additional pseudorandom function: $ f : \mathcal{K}_F \times \{0,1\}^* \rightarrow \mathcal{K}_F$, using this function to choose keys as $k_i := f_{k^{'}}(W_i)$.

And there are many different ways to generate $k_i$:

- $k_i := f_{k^{'}}(<C,W>)$, where $C$ is chapter.
set $k_i := f_l(<0,C>)$ and $l := f_{k^{'}}(<1,W_i>)$,

- for each chapter, to reveal $k_i := f_{f_{f^{'}}(<1,W>)}(<0,C>)$
- for all chapter, to reveal $l := f_{k^{'}}(<1,W_i>)$

### Part three: Scheme III - Hidden Searches

pre-encrypt each word W of the clear text separately using a deterministic encryption algorithm $E_{k^{"}}$

About the $E_{k^{"}}$:

- not allowed to use any randomness.
- the computation $E_{k^{"}}(x)$ may depend only on $x$ and not the position $i$

Hidden Queries schemes and improved Security(Change K):

### Part four: Scheme IV - Final Scheme

in order to fix a problem that Alice maybe bot decrypt without knowing $E_{k^{"}}(W_i)$ or last $m$ bits of it.

there is a simple fix:

- split the pre-encrypted word $X_i := E_{k^{"}}(W_i) $ into two parts$<L_i,R_i>$
- Instead of generating $k_i := X_i$, generate $k_i := E_{k^{"}}(L_i)$

Encryption and decryption as follows:

## Section four: Variable length words encryption scheme

in order to deal with the fact that English words differ in length: