NFS Traffic Anonymizer
This page provides a description of the NFS traffic anonymizer.
The NFS benchmarking project page is here: http://www.gelato.unsw.edu.au/IA64wiki/NFSBenchmarking
I can be reached at <shehjart AT gelato DOT NO SPAM unsw DOT edu GREEBLIES DOT au>
Jun 27, 2007: Added flags for building the anonymizer and TShark with Large File Support on Linux.
The NFS traffic anonymizer is an extension to the Wireshark/TShark NFS dissector. It works by zeroing out fields in the NFS payload that could violate the privacy of the users. For READ and WRITE payloads, it completely zeroes out the contents, leaving behind only metadata such as offset and length. Fields such as filenames and paths, which occur in requests like LOOKUP, CREATE and MKDIR, are replaced by randomly generated filenames and paths. Some NFS implementations encode data into the NFS filehandles, so the filehandles are also anonymized by replacing them with generally unintelligible numbers.
Most filesystem metadata gets anonymized, except for some fields which are required for file system hierarchy replication. Note that the timestamps associated with the time of capture are not anonymized either.
NOTE: Some build flag changes have made this release outdated and I will update it ASAP. In the meantime, please build from source. - June 27, 2007
The releases are currently available in the following formats:
RPM AMD x86_64: wireshark-0.99.6-2.x86_64-nfsanon-r21520_19.rpm
The source files for the anonymizer differ depending on the wireshark revision you'll be building it with. For now, the source is available against just one Wireshark revision, though I am pretty sure that the anonymizer will work with later revisions too.
nfsreplay svn revision
Enables large file support
- Check out the Wireshark source from:
svn co -rREV http://anonsvn.wireshark.org/wireshark/trunk/ wireshark
and replace REV with one of the Wireshark revision numbers in the table above.
- Check out NFS anonymizer source:
svn co -rREV https://nfsreplay.svn.sourceforge.net/svnroot/nfsreplay/trunk/misc/nfsanon ./nfsanon
and replace REV with the corresponding nfsreplay revision number in the table above.
Next, copy the nfsanon/packet-nfs.c file into the Wireshark source tree, into the subdirectory epan/dissectors/.
- Then run the following scripts:
$ ./autogen.sh
$ CFLAGS=-D_GNU_SOURCE\ -D_FILE_OFFSET_BITS=64 ./configure --enable-warnings-as-errors=no --enable-wireshark=no --without-zlib
$ make
Notes:
- We can do without Wireshark since we just need TShark.
- zlib needs to be disabled because we want the whole trace in a single file and zlib does not have Large File Support. A zlib-compressed trace file cannot be larger than 2G on Linux.
- _GNU_SOURCE and _FILE_OFFSET_BITS=64 need to be defined so that TShark can read and write files larger than 2G on 32-bit machines.
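The 2G limit in the notes above is the largest file offset that fits in a signed 32-bit off_t, that is, 2^31 - 1 bytes. The following is a minimal sketch, assuming a POSIX shell, of how one might check whether a trace is large enough to need a TShark built with Large File Support. The helper name exceeds_2g and the file name trace.pcap are hypothetical and not part of the anonymizer.

```shell
# Largest file offset representable in a signed 32-bit off_t.
LIMIT_32BIT=2147483647   # 2^31 - 1 bytes

# exceeds_2g SIZE_IN_BYTES
# Hypothetical helper: succeeds if a trace of that size needs
# Large File Support to be read or written.
exceeds_2g() {
    [ "$1" -gt "$LIMIT_32BIT" ]
}

# Example use on an existing trace (wc -c reads the file to count its bytes):
# size=$(wc -c < trace.pcap)
# if exceeds_2g "$size"; then echo "needs an LFS-enabled TShark"; fi
```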
- Let's first capture traffic from the network. I prefer to capture first and anonymize in a second run. This avoids losing packets to the processing that anonymization requires, and it also shows that the procedure works even if TShark is not available on the capture system.
$ tcpdump -i <ifname> -w <dumpfile> -s 0
Use the command above to capture traffic from interface <ifname> and dump it into <dumpfile>. The snap length (-s) is set to the maximum by using the value 0, because RPC fragments can start in the middle of a TCP segment. For more info on RPC-over-TCP traffic capture, see RPCOverTCPCapture. I recommend tcpdump mainly because it is lightweight compared to TShark and because, unlike TShark, it can write large files on Linux by default. So if you expect trace files larger than 2G, use tcpdump. To filter out non-NFS traffic while capturing, use a filter as shown below:
$ tcpdump tcp port 2049 -w <dumpfile> -s 65535
NFS generally operates over TCP port 2049 so we can use that as a filter.
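The two capture variants above can be combined in a small helper. This is only a sketch: the function name capture_cmd and the sample values eth0 and nfs.pcap are hypothetical. It prints the tcpdump command line instead of running it, so the command can be inspected first (capturing usually requires root). It assumes interface and file names without spaces.

```shell
# capture_cmd IFACE DUMPFILE [nfs-only]
# Hypothetical helper: prints a tcpdump command line for capturing
# full packets, optionally restricted to NFS traffic.
capture_cmd() {
    iface=$1
    dumpfile=$2
    # -s 0 requests the maximum snap length so whole packets are kept.
    cmd="tcpdump -i $iface -w $dumpfile -s 0"
    if [ "${3:-}" = "nfs-only" ]; then
        # The filter expression is placed last, after all options.
        cmd="$cmd tcp port 2049"
    fi
    printf '%s\n' "$cmd"
}

capture_cmd eth0 nfs.pcap
capture_cmd eth0 nfs.pcap nfs-only
```

The printed line can then be reviewed and run by hand on the capture system.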
Next, we'll anonymize the trace in dumpfile. This uses the tshark binary we built earlier, so go into that directory. As mentioned on the RPCOverTCPCapture page, the three desegmentation and defragmentation preferences shown in the command must be set to ensure TShark sees all the RPC fragments.
$ ./tshark -r <dumpfile> -o "tcp.desegment_tcp_streams:TRUE" -o "rpc.defragment_rpc_over_tcp:TRUE" -o "rpc.desegment_rpc_over_tcp:TRUE" -o "tcp.check_checksum:FALSE" -o "nfs.anon_dump_file:<ANONDUMPFILE>" > /dev/null
In the above command, replace <dumpfile> with the path to the previously captured traffic dump. Also replace <ANONDUMPFILE> with the filename into which the anonymized trace should be written. We disable checksum checks in TCP because TShark does not pass packets which fail checksum checks to upper layer protocol dissectors like packet-nfs.c. Since some legitimate NFS messages in a trace do fail such checks, leaving the checks enabled would prevent collection of complete state information at the NFS layer.
- If the original filenames and paths in the NFS messages are required intact, i.e. dumping the trace without anonymizing the filenames and paths, append an argument to the previous command line as shown below:
$ ./tshark -r <dumpfile> -o "tcp.desegment_tcp_streams:TRUE" -o "rpc.defragment_rpc_over_tcp:TRUE" -o "rpc.desegment_rpc_over_tcp:TRUE" -o "tcp.check_checksum:FALSE" -o "nfs.anon_dump_file:<ANONDUMPFILE>" -o "nfs.disable_filepath_anon:TRUE" > /dev/null
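Since both command lines differ only in the last preference, they can be wrapped in one function. The following is a minimal sketch, assuming a POSIX shell. The function name anonymize, the TSHARK variable and the keep-paths keyword are hypothetical conveniences; the -o preferences are exactly the ones used in the commands above.

```shell
# Path to the TShark binary built with the anonymizer (assumed location).
TSHARK=${TSHARK:-./tshark}

# anonymize DUMPFILE ANONDUMPFILE [keep-paths]
# Hypothetical wrapper around the two command lines shown above.
anonymize() {
    dumpfile=$1
    anonfile=$2
    keep=${3:-}
    # Replace the positional parameters with the TShark argument list.
    set -- -r "$dumpfile" \
        -o "tcp.desegment_tcp_streams:TRUE" \
        -o "rpc.defragment_rpc_over_tcp:TRUE" \
        -o "rpc.desegment_rpc_over_tcp:TRUE" \
        -o "tcp.check_checksum:FALSE" \
        -o "nfs.anon_dump_file:$anonfile"
    if [ "$keep" = "keep-paths" ]; then
        # Leave the original filenames and paths intact.
        set -- "$@" -o "nfs.disable_filepath_anon:TRUE"
    fi
    # The dissection output is discarded; only the dump file is wanted.
    "$TSHARK" "$@" > /dev/null
}

# Example use:
# anonymize capture.pcap capture-anon.pcap
# anonymize capture.pcap capture-named.pcap keep-paths
```

Building the argument list with set -- keeps each option as a separate word, so file names containing spaces are passed through unchanged.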
Use the nfsreplay mailing lists for support and discussion.