NFS Traffic Anonymizer

This page provides a description of the NFS traffic anonymizer.

The NFS benchmarking project page is here: http://www.gelato.unsw.edu.au/IA64wiki/NFSBenchmarking

I can be reached at <shehjart AT gelato DOT NO SPAM unsw DOT edu GREEBLIES DOT au>

News

Intro

The NFS traffic anonymizer is an extension to the Wireshark/TShark NFS dissector. It works by zeroing out fields in the NFS payload that could violate the privacy of users. For READ and WRITE payloads, it zeroes out the contents completely, leaving behind only metadata such as offset and length. Filenames and paths, which occur in requests such as LOOKUP, CREATE and MKDIR, are replaced by randomly generated filenames and paths. Some NFS implementations encode data into the NFS filehandles, so the filehandles are also anonymized by replacing them with generally unintelligible numbers.

Most filesystem metadata is anonymized, except for some fields that are required to replicate the file system hierarchy. Note that the timestamps recording the time of capture are not anonymized either.
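
As a rough illustration of the two techniques described above, the shell sketch below zeroes a payload while keeping its length and generates a random replacement name. This is not the anonymizer's actual code, which lives in the C dissector packet-nfs.c. The function names zero_payload and random_name are made up for this example, and both the length-preserving behaviour and the hexadecimal name format are assumptions.

```shell
#!/bin/sh
# Conceptual sketch only; the real work is done in C inside packet-nfs.c.

# Emit <len> zero bytes: the length (metadata) survives, the contents do not.
# Usage: zero_payload <len>
zero_payload() {
    dd if=/dev/zero bs=1 count="$1" 2>/dev/null
}

# Emit a random name of <len> hexadecimal characters.
# od prints two hex digits per random byte, so cut trims to <len>.
# Usage: random_name <len>
random_name() {
    od -An -v -N "$1" -tx1 /dev/urandom | tr -d ' \t\n' | cut -c "1-$1"
}
```

For example, zero_payload 4096 > block.bin would write a 4096-byte file containing only zero bytes, and random_name 12 would print a 12-character stand-in for a filename.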

Releases

The releases are currently available in the following formats:

Source

The source files for the anonymizer differ depending on the Wireshark revision you'll be building it with. For now, the source is available against just one Wireshark revision, though I am pretty sure that the anonymizer will work with later revisions too.

Building

  1. Check out the Wireshark source:
     $ svn co -rREV http://anonsvn.wireshark.org/wireshark/trunk/ wireshark
    Replace REV with one of the Wireshark revision numbers in the table above.
  2. Check out the NFS anonymizer source:
     $ svn co -rREV https://nfsreplay.svn.sourceforge.net/svnroot/nfsreplay/trunk/misc/nfsanon ./nfsanon
    Replace REV with the corresponding nfsreplay revision number in the table above.
  3. Next, copy the nfsanon/packet-nfs.c file into the epan/dissectors/ subdirectory of the Wireshark source tree.

  4. Then run the following commands from the top of the Wireshark source tree:
     $ ./autogen.sh
     $ CFLAGS=-D_GNU_SOURCE\ -D_FILE_OFFSET_BITS=64 ./configure --enable-warnings-as-errors=no --enable-wireshark=no --without-zlib
     $ make
    Notes:
    • The Wireshark GUI is not built (--enable-wireshark=no) because we just need TShark.
    • zlib needs to be disabled because we want the whole trace in a single file and zlib does not have Large File Support. A zlib-compressed trace file cannot be larger than 2G on Linux.
    • _GNU_SOURCE and _FILE_OFFSET_BITS=64 must be defined so that TShark can read and write files larger than 2G on 32-bit machines.
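
The four steps above can be collected into a small helper. The sketch below only prints the commands for a given pair of revision numbers, so they can be reviewed before anything is run. The function name nfsanon_build_steps is hypothetical, and the cp and cd lines are my reading of what steps 3 and 4 imply rather than commands taken from this page.

```shell
#!/bin/sh
# Print the build commands for the given revisions; nothing is executed.
# Usage: nfsanon_build_steps <wireshark-rev> <nfsreplay-rev>
# Both revision numbers are supplied by the caller; none are assumed here.
nfsanon_build_steps() {
    cat <<EOF
svn co -r${1} http://anonsvn.wireshark.org/wireshark/trunk/ wireshark
svn co -r${2} https://nfsreplay.svn.sourceforge.net/svnroot/nfsreplay/trunk/misc/nfsanon ./nfsanon
cp nfsanon/packet-nfs.c wireshark/epan/dissectors/
cd wireshark
./autogen.sh
CFLAGS='-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64' ./configure --enable-warnings-as-errors=no --enable-wireshark=no --without-zlib
make
EOF
}
```

One way to use it is to save the output to a file, check it, and then run that file with sh -e so that the build stops at the first failing command.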

Usage

  1. Let's first capture traffic from the network. I prefer to capture first and anonymize in a second run. This avoids losing packets to the processing that anonymization requires, and it also means the capture can be done on a system where TShark is not available.
     $ tcpdump -i <ifname> -w <dumpfile> -s 0

    Use the command above to capture traffic from interface <ifname> and dump it into <dumpfile>. The snap length (-s) is set to its maximum with the value 0 so that whole packets are captured, because RPC fragments can start in the middle of a TCP segment. For more info on RPC-over-TCP traffic capture, see RPCOverTCPCapture. I recommend tcpdump mainly because it is lightweight compared to TShark, and also because, unlike TShark, it can write large files on Linux by default. So if you're looking at trace files larger than 2G, use tcpdump. To filter out any non-NFS traffic while capturing, use a filter as shown below:

     $ tcpdump -w <dumpfile> -s 65535 tcp port 2049
    NFS generally operates over TCP port 2049, so we can use that as a filter.
  2. Next, we'll anonymize the trace in <dumpfile> using the tshark binary we built earlier, so change into the directory that contains it. As mentioned on the RPCOverTCPCapture page, we need to set the following three reassembly preferences to ensure TShark sees all the RPC fragments.

     $ ./tshark -r <dumpfile> -o "tcp.desegment_tcp_streams:TRUE" -o "rpc.defragment_rpc_over_tcp:TRUE" -o "rpc.desegment_rpc_over_tcp:TRUE" -o "tcp.check_checksum:FALSE" -o "nfs.anon_dump_file:<ANONDUMPFILE>" > /dev/null

    In the above command, replace <dumpfile> with the path to the previously captured traffic dump, and replace <ANONDUMPFILE> with the name of the file into which the anonymized trace should be written. We disable TCP checksum checks because TShark does not pass packets that fail them to upper-layer protocol dissectors such as packet-nfs.c. Since some legitimate NFS messages in a trace do fail such checks, leaving the checks enabled would prevent complete state information from being collected at the NFS layer.

  3. If the original filenames and paths in the NFS messages must be kept intact, i.e. the trace is to be dumped without anonymizing filenames and paths, append one more preference to the previous command line as shown below:
     $ ./tshark -r <dumpfile> -o "tcp.desegment_tcp_streams:TRUE" -o "rpc.defragment_rpc_over_tcp:TRUE" -o "rpc.desegment_rpc_over_tcp:TRUE" -o "tcp.check_checksum:FALSE" -o "nfs.anon_dump_file:<ANONDUMPFILE>" -o "nfs.disable_filepath_anon:TRUE" > /dev/null
    This will anonymize all data except the filenames and paths. Leave out the redirection to /dev/null if the packet info needs to be displayed on the terminal.
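
Since the command lines in steps 2 and 3 differ by a single preference, they can be generated from one place. The sketch below prints the TShark arguments one per line; the function name nfsanon_args and its optional third argument are hypothetical conveniences, while the preference names are the ones used in the commands above.

```shell
#!/bin/sh
# Print, one per line, the TShark arguments used in steps 2 and 3.
# Usage: nfsanon_args <dumpfile> <anon-dumpfile> [yes]
# Pass "yes" as the third argument to keep filenames and paths intact.
nfsanon_args() {
    printf '%s\n' \
        -r "$1" \
        -o "tcp.desegment_tcp_streams:TRUE" \
        -o "rpc.defragment_rpc_over_tcp:TRUE" \
        -o "rpc.desegment_rpc_over_tcp:TRUE" \
        -o "tcp.check_checksum:FALSE" \
        -o "nfs.anon_dump_file:$2"
    if [ "${3:-no}" = "yes" ]; then
        printf '%s\n' -o "nfs.disable_filepath_anon:TRUE"
    fi
}
```

In bash 4 or later, the lines can be loaded into an array with mapfile -t args < <(nfsanon_args trace.pcap anon.pcap) and passed on as ./tshark "${args[@]}" > /dev/null. The file names here are placeholders, and names containing newlines would not survive the one-per-line format.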

Support

Use the nfsreplay mailing lists for support and discussion.

Frequently Asked Questions

IA64wiki: NFSTrafficAnonymizer (last edited 2009-12-10 03:14:02 by localhost)
