Georgia Tech: Networking & Telecommunications
 Group


Lucent's Extensions to Cryptopan

David Stott (email:lastname at lucent) and his deparment have made some extention to Crypto-PAn for one of his research project in Lucent. Below are some of his interesting thoughts in the emails we extchanged regarding Crytpo-PAn. He  also provided the source code. The code makes calls to OpenSSL and assumes openssl include files and library are in their default locations, where the compiler can find them.

Orginal notes from David

1. I observed that the anonymized addresses often shared many bits with theoriginal address.  We I printed the XOR of the two addresses (e.g., the 'result' variable), there were long sequences of 1s or 0s.  Digging a little further, I found that the input to the encryption algorithm (PRF) often did not change between successive iterations.  More specifically, the probability that rin_input was the same between two iterations is .5 and the probability that the PRF returns the same MSB given different inputs is .5.  As a result, the probability that the PRF returns the same MSB in successive iterations is .25. There are a few simple fixes to this. 

The key to removing the strings of 1s or 0s is to note (a) that the only real restriction on the padding is that you use the same bits of the padding*in the same iteration* of the anonymization and de-anonymization functions and (b) you will get runs of 1s or 0s if you only modify 1 bit of the padding each iteration.  The reason that the algorithm currently has runs is that the probability that rin_input changes between iterations is .5, not 1.0.  When rin_input doesn't change, you get the same answer from the PRF.  I choose to wrap the overflow bits from  the pad back into the pad (e.g., if the pad is being shifted left, I set the LSB to the old MSB).  Another solution is to choose a different bit position from rin_output each iteration (thus if rin_input doesn't change every iteration, you still get fair random bit).

(Notes from Jinliang: The comment is related to the practical ways of using AES as a pseudo-random generator. I will try to incorporate this second solution he suggested in a new version of Cryto-PAn. It will introduce better randomness in the resulting anonymized addresses.)

2. I had to play with the byte order a few times in the code for it to be endian-safe (just in case someone would anonymize on one machine and
de-anonymize on another, with a different byte-order).

(In the source code),  for the endian fixes, I set things up so that the addresses are in network byte order for the caller.  All the bit operations need to be in host byte order. The input to the encryption function (rin_input) goes to network byte order, and the encrypted result (rin_output) gets swapped back to host byte order.

3. There were a few places where I simplified the code, without affecting the algorithm.

4. We have a case where we may want to store the addresses so that they are anonymized without preserving the prefix structure in such a way that we can deliver the trace to (a) Alice, with no keys, who uses the non-prefix preserved (random) address, (b) Bob, who has a key to convert it to prefix-preserved addresses, (c) Charlie, with two keys, who can obtain the original addresses or the same prefix-preserved addresses that Bob can see. 

It turns out that there is a simple variation of your algorithm that provides this property. The solution is simply to run the algorithm a second time on the data (using a second key), but reversing the order of the bits.  That is, in the first iteration, we run Crypt-PAn as normal to randomize the high bits, and in the second iteration, we randomize the low bits.

In our setup, we have the data being collected at one (or multiple) hosts and then copied to a host in our lab, acting as a data repository.  The collection hosts creates a non-PA mapping and ships the two keys (encrypted under the repository's public RSA key) and the data to the repository.  From a single copy of the data at the repository, we can construct any of the three versions of the addresses (original, PA, or non-PA).  Similarly, we can allow anyone (with permission to see at least the non-PA) to have non-PA version and send the appropriate keys to allow individual users to obtain either the PA or original addresses. The simplest solution for this to swap the order of the bits before calling the usual anonymization and after the deanonymization function.

(In the source code), the part that implements the extension is the short 'reverse_bits' function and the a pair of new functions to call the original PA function (with the address reversed).  There could also be functions (a) that takes the original address and two keys to compute the both transformations, (b) that takes two keys and computes the inverse double transformation, and (c) that takes one key goes from a double transformed address to a PA address.  Each of these functions is about two lines long.

5. The code I have is in C++ and makes calls to OpenSSL.  A while ago I profiled the OpenSSL AES implementation against the Rijndael.cpp code and found OpenSSL was noticeably faster (and I can run other algorithms by changing a single string). The attached code assume openssl include files and library are in their default locations, where the compiler can find them

Back to Cryptopan