User Manual

WARNING: This document describes a project in active development, so it may undergo heavy and frequent changes. Please check for updates. The software version described here is 1.0.0 beta.

TIE: overview

TIE (Traffic Identification Engine) is a platform for traffic identification supporting multiple classification techniques. TIE is open source and runs on Linux, FreeBSD, and Mac OS X.

Definitions

Classification Plugin

TIE manages an expandable set of classification techniques, each implemented as a dynamically loadable software module called a classification plugin (or classifier for short).

Fingerprints

Each classification plugin attempts classification against a set of fingerprints contained in a plugin-specific fingerprint file. This file is usually already available, but for some techniques the plugin itself can generate a new one during learning mode.

Sessions

TIE decomposes network traffic into sessions. Each session is separately classified (i.e. assigned to an application).

TIE supports multiple types of session (to be chosen at startup), with the simplest one corresponding to the common definition of flow.

The session type is chosen at startup and can be one of: flow, biflow, host.

  • flow
    Traffic is decomposed into flows. A traffic flow is identified by the tuple <source_host, source_port, destination_host, destination_port, transport_protocol> and an inactivity timeout (default is 60s).
  • biflow
    Traffic is decomposed into biflows. A biflow is identified by the tuple <source_host, source_port, destination_host, destination_port, transport_protocol>, where source and destination can be swapped, and an inactivity timeout (default is 60s). This is to consider traffic flowing in both directions as generated by a single application.
  • host (CURRENTLY UNDER DEVELOPMENT)
    Traffic is decomposed by host: all packets generated by a single host are analyzed.
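
The difference between the flow and biflow definitions above can be illustrated with a minimal Python sketch. It is not TIE's implementation: the names Packet, flow_key, biflow_key and is_new_session are hypothetical, and only the tuple fields and the 60 s default timeout come from the text.

```python
from typing import NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet summary holding the five fields of the tuple."""
    src_host: str
    src_port: int
    dst_host: str
    dst_port: int
    proto: int  # transport protocol number, e.g. 6 = TCP, 17 = UDP


def flow_key(p: Packet) -> Tuple:
    """Direction-sensitive key: A->B and B->A belong to different flows."""
    return (p.src_host, p.src_port, p.dst_host, p.dst_port, p.proto)


def biflow_key(p: Packet) -> Tuple:
    """Direction-insensitive key: the two endpoints are sorted, so swapping
    source and destination yields the same key."""
    endpoint_a = (p.src_host, p.src_port)
    endpoint_b = (p.dst_host, p.dst_port)
    low, high = sorted((endpoint_a, endpoint_b))
    return (low, high, p.proto)


def is_new_session(last_seen: float, now: float, timeout: float = 60.0) -> bool:
    """True when the gap since the last packet exceeds the inactivity timeout,
    so a packet with an already-known key would start a new session."""
    return (now - last_seen) > timeout
```

With these definitions, a packet and its reply map to two different flow keys but to the same biflow key, which is what lets traffic in both directions be attributed to a single application.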

Operating Mode

When classifying traffic, TIE can operate in three different modes: Offline, Realtime, and Cyclic.

Offline (default)

  • TIE analyzes live traffic or a traffic trace. Information regarding the classification of a session is generated only when the session ends or at the end of program execution.
  • This operating mode is typically used by researchers testing classification techniques, when there are no timing requirements on classification output and the user is interested in information covering the entire session lifetime.

Realtime

  • TIE analyzes live traffic. Information regarding the classification of a session is generated as soon as it is available.
  • Timing constraints are strict. Typical usage is policy enforcement on classified traffic (QoS, admission, billing, firewalling, etc.).

Cyclic

  • TIE generates a new output file at regular intervals (e.g. every 5 minutes). Each output file contains information regarding sessions that generated traffic during the corresponding interval.
  • Typical usage is for building reporting graphs and web pages (see "Web Reporting HowTo")
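
As a rough illustration of how sessions relate to cyclic intervals, the sketch below computes which intervals a session overlaps. It assumes contiguous intervals of fixed length starting at the beginning of the trace, and it approximates "generated traffic during the interval" by the span between the first and last packet of the session. The function name and parameters are hypothetical and do not describe TIE's internal logic.

```python
import math
from typing import List


def intervals_touched(t_start: float, t_last: float,
                      trace_begin: float, interval: float = 300.0) -> List[int]:
    """Return the 0-based indices of the intervals spanned by a session,
    assuming contiguous fixed-length intervals that start at trace_begin."""
    first = math.floor((t_start - trace_begin) / interval)
    last = math.floor((t_last - trace_begin) / interval)
    return list(range(first, last + 1))
```

Under these assumptions, a session active from second 10 to second 650 of a trace, with 300 s intervals, would be reported in the first three output files.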

Pre-filtering

To restrict the operations performed by TIE to a subset of the input traffic, you can specify filters in tcpdump/bpf syntax at the end of the command line (as in tcpdump). Packets that do not match the filter are discarded “a priori”, so TIE does not take them into account in its processing.

See command line options for details.
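
When TIE is launched from a script, the filter placement described above can be sketched as follows. This is only an illustration: build_tie_command is a hypothetical helper, and the only facts taken from this manual are the executable name, the -m and -r options, and the rule that the filter goes last.

```python
import shlex
from typing import List, Optional


def build_tie_command(options: List[str],
                      bpf_filter: Optional[str] = None) -> List[str]:
    """Return an argv list: 'tie', then its options, then the filter terms
    placed at the very end of the command line."""
    argv = ["tie"] + list(options)
    if bpf_filter:
        # Split the filter expression into separate arguments, as a shell would.
        argv += shlex.split(bpf_filter)
    return argv


cmd = build_tie_command(["-m", "o", "-r", "traffic.pcap"], "tcp port 80")
print(" ".join(cmd))  # tie -m o -r traffic.pcap tcp port 80
```

The resulting list could then be passed to subprocess.run(cmd) to actually start TIE.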

Executables

TIE consists of an executable (tie) and a set of classifier plugins. Plugins are built as shared objects that are dynamically loaded at runtime from the plugins/ directory.

Each plugin has its own configuration files for plugin-specific options and fingerprints 1).

Command line options

All command line options are summarized below.

See the “EXAMPLES” file for some typical examples of TIE usage.

Main options
  -m mode     set the working mode. mode can be:
              → 'o' for offline mode (default)
              → 'c' for cyclic mode
              → 'r' for realtime mode
  -q type     set the session type. type can be:
              → 'h' for host
              → 'f' for flow
              → 'b' for biflow (default)
  -t num      set the session timeout (in seconds)
  -i if       read packets from interface 'if'
  -r file     read traffic from file in pcap format
  -l trsld    enable fingerprint collection using 'trsld' as threshold value (disables classification)
  -k          enable classification when the -l option is specified
  -s num      set snaplen when capturing traffic

Feature related options 2)
  -b num      store the first num packet sizes of each session
  -p num      store the first num packet sizes of each session
  -P num      store payload contents of the first packet in each direction (only num bytes per packet)
  -S num      store num bytes of payload stream per session
  -I num      save the first num IPTs of each session

Filtering options
  -C num      skip the first num packets
  -c num      stop after num packets
  -D 0-6      consider only packets from a specific day of the week
  -F path     use the BPF file specified in path to filter out packets
  -T string   set the time range to analyze; string is a time range in the form 'hh:mm-HH:MM'
  -Z num      set a custom timezone offset
  -f          disable filter
  Tcpdump/bpf style filters can be specified at the end of the command line
  (e.g. for HTTP launch: ./tie [options] tcp port 80 )

Other options
  -a num      set cyclic mode interval duration in seconds
  -d path     output directory
  -e file     classification results input file name
  -E file     classification results output file name
  -L suffix   set the suffix to append to the file containing the training result
  -h          print help and exit
  -H path     read host table from file
  -M num      perform periodic dump of data and garbage collection every num packets (default 10k pkts)
  -o ip port  enable classification notifications toward a remote host
  -O          use a persistent connection to a remote host
  -n          write classification output using labels instead of IDs
  -w path     dump traffic to the file specified in path
  -W num      dump packet contents to a pcap file containing up to L4 headers plus 'num' bytes of payload
  -x          enable TCP heuristics (watching SYN/FIN flags)

For the full list of options, run

 tie -h 

or see the README file.

TIE Output format

There is a single output format, but the semantics of some fields depend on the operating mode TIE is running in (see Operating Mode).

id 5-tuple timestart timeend* pkt-up* pkt-dw* bytes-up* bytes-dw* app_id app_subid confidence

Meaning of field labels
  id          flow identifier
  5-tuple     L4 protocol, source and destination addresses, source and destination ports
  timestart   timestamp of the start of the session
  timeend*    timestamp of the end of the session
  pkt-up*     number of upstream packets
  pkt-dw*     number of downstream packets
  bytes-up*   number of upstream bytes
  bytes-dw*   number of downstream bytes
  app_id      ID of the application resulting from classification
  app_subid   SubID of the application resulting from classification
  confidence  confidence level of the classification process

Note that in the example output file the columns are named and ordered as shown in its "#id" header line, which differs from the order listed here.

Starred fields have the following semantics, depending on the operating mode:

  • Offline: values refer to the overall session
  • Realtime: values refer to the time of classification
  • Cyclic: values refer to the current interval

Example output file

# tie output version: 1.0 (text format)
# generated by: ./tie -P 20 -t 125 

# 2 plug-ins enabled: nbyte port 

# begin trace interval: 1221921072
# trace interval duration: 300 s

#id     src_ip          dst_ip          proto   sport   dport   dwpkts  uppkts  dwbytes upbytes t_start                 t_last                  app_id  sub_id  confidence
7       10.0.0.55       10.0.0.129      17      4672    4672    1       1       19      48      1221921072.799580       1221921072.892036       40      0       25
5       10.0.0.55       10.0.0.209      17      4672    4672    1       1       19      225     1221921072.799178       1221921073.033699       40      0       25
12      10.0.0.54       10.0.0.80       17      33332   53      1       1       124     34      1221921073.257437       1221921073.274311       5       0       50
47      10.0.0.55       10.0.0.151      17      4672    4672    1       1       19      48      1221921074.989144       1221921075.108251       40      0       25
51      10.0.0.55       10.0.0.57       17      4672    4672    1       1       169     35      1221921075.039750       1221921075.254110       0       0       0
40      10.0.0.55       10.0.0.125      6       2094    4662    1       1       92      108     1221921074.984972       1221921075.299248       127     0       50
248     10.0.0.54       10.0.0.67       6       38629   1863    1       1       8       5       1221921088.905082       1221921089.114479       57      0       50
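
For post-processing, an output file like the one above can be read with a few lines of Python. The following is a minimal sketch, not an official parser: the column names are copied from the "#id" header line of the example, parse_tie_output is a hypothetical helper, and it assumes numeric application IDs (that is, output written without the -n option).

```python
from typing import Dict, Iterable, List

# Column names as they appear in the "#id ..." header line of the example.
COLUMNS = ["id", "src_ip", "dst_ip", "proto", "sport", "dport",
           "dwpkts", "uppkts", "dwbytes", "upbytes",
           "t_start", "t_last", "app_id", "sub_id", "confidence"]
TEXT_COLUMNS = {"src_ip", "dst_ip"}
FLOAT_COLUMNS = {"t_start", "t_last"}


def parse_tie_output(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse whitespace-separated session records, skipping blank lines
    and '#' comment lines."""
    sessions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # not a record in the layout assumed here
        row: Dict[str, object] = {}
        for name, value in zip(COLUMNS, fields):
            if name in TEXT_COLUMNS:
                row[name] = value
            elif name in FLOAT_COLUMNS:
                row[name] = float(value)
            else:
                row[name] = int(value)
        sessions.append(row)
    return sessions
```

A typical call would be parse_tie_output(open(path)), where path is the output file to read; each returned dictionary then holds one session record.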

Sample use

Here are some examples showing how to launch tie for different purposes.

IMPORTANT: Some classifier plugins disable themselves (reporting “some requisites not satisfied”) if the options they need are not specified on the command line. If this happens, check the plugin-specific documentation.

Classification

  • Capturing live traffic from a physical interface in realtime mode:
 tie -m r -i eth0 
  • Reading traffic from a libpcap dump file in offline mode:
 tie -m o -r traffic.pcap 

Ground Truth Creation

To create a ground truth file based on deep packet inspection, enable a payload-based classification plugin (e.g. l7filter) by editing the bin/plugins/enabled_plugins file. Once the plugin is enabled, classification can be launched with the following options:

 tie -m o -r traffic.pcap -S 2048 -E traffic.gt

where the “-S” option specifies how many payload bytes to collect for each session and the “-E” option specifies the ground truth file name.

NOTE: if the -S option is not specified, the l7filter plugin will disable itself because it cannot obtain the features it needs.

Evaluation of Classification through pre-classified data (e.g. ground-truth)

To evaluate a classification technique implemented by a plugin, you need a pre-classification file created beforehand with some other technique (see Ground Truth Creation). Then enable only the classification plugin to be tested (by editing bin/plugins/enabled_plugins) and launch TIE with the following options:

 tie -m o -r traffic.pcap -E plugin_test.out 

When execution has finished, the resulting confusion matrix can be displayed with the stats.pl script as follows:

 stats.pl -p traffic.gt plugin_test.out 
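
For readers unfamiliar with the term, a confusion matrix counts how often each ground-truth application was assigned each predicted application. The sketch below shows the idea in Python. It does not describe how stats.pl works: the function names are hypothetical, and it assumes each session has already been reduced to a key (for example its id or 5-tuple) mapped to an app_id in both files.

```python
from collections import Counter
from typing import Dict, Hashable


def confusion_matrix(truth: Dict[Hashable, int],
                     predicted: Dict[Hashable, int]) -> Counter:
    """Count (true_app_id, predicted_app_id) pairs over the sessions
    that appear in both mappings."""
    matrix: Counter = Counter()
    for key, true_app in truth.items():
        if key in predicted:
            matrix[(true_app, predicted[key])] += 1
    return matrix


def accuracy(matrix: Counter) -> float:
    """Fraction of compared sessions whose predicted app_id matches the truth."""
    total = sum(matrix.values())
    correct = sum(n for (true_app, pred_app), n in matrix.items()
                  if true_app == pred_app)
    return correct / total if total else 0.0
```

Entries on the diagonal (same true and predicted app_id) are correct classifications; everything else is a misclassification of one application as another.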

Training

Some classification plugins may need a training phase. To train a classification plugin under TIE, you need a ground truth file (see Ground Truth Creation). Then enable only the classification plugin to be trained (by editing bin/plugins/enabled_plugins) and launch TIE with the following options:

 tie -m o -r training_trace.pcap -e logs/training_trace.gt -l 20 

where the “-e” option specifies the ground truth file to use and the “-l” option enables training (disabling classification) with a threshold value to be used by the plugin to select sessions (with per-plugin defined semantics: please refer to specific plugin documentation for details).

1)
(to be completed: indicate in this note the public availability of the fingerprint files, with the link)
2)
some of these may be mandatory to run specific plugins; see the plugin documentation for reference
Last modified: 2015/04/23 14:37
CC Attribution-Noncommercial-Share Alike 4.0 International
Copyright 2008-2012 COMICS Research Group, Computer Science Department, University of Naples "Federico II"