Traffic classification
Traffic classification is an automated process that categorises computer network traffic into a number of traffic classes according to various parameters (for example, port number or protocol).[1] Each resulting traffic class can be treated differently in order to differentiate the service provided to the data generator or consumer.
Typical uses
    
Packets are classified so that the network scheduler can process them differently. Once a traffic flow using a particular protocol has been classified, a predetermined policy can be applied to it and to other flows, either to guarantee a certain quality (as with VoIP or media streaming services[2]) or to provide best-effort delivery. Classification may be applied at the ingress point (the point at which traffic enters the network, typically an edge device) with a granularity that allows traffic management mechanisms to separate traffic into individual flows and to queue, police and shape them differently.[3]
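As a rough illustration of separating traffic into individual flows and applying a predetermined policy, the following Python sketch groups packets by their 5-tuple and looks up a treatment per flow. The `Packet` type, the `POLICY` table, and the rule that UDP port 5060 indicates VoIP signalling are assumptions made for this example only, not part of any particular product.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "tcp" or "udp"
    size: int      # bytes


# Hypothetical policy table: traffic class -> treatment at the ingress point.
POLICY = {
    "voip": "priority-queue",
    "default": "best-effort",
}


def flow_key(pkt):
    """Identify a flow by its 5-tuple."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)


def classify(pkt):
    """Toy rule (an assumption): UDP to port 5060 is treated as VoIP."""
    if pkt.protocol == "udp" and pkt.dst_port == 5060:
        return "voip"
    return "default"


def group_into_flows(packets):
    """Separate a packet sequence into individual flows."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


def treatment_for(flow_packets):
    """Classify a flow by its first packet and return the configured policy."""
    return POLICY[classify(flow_packets[0])]
```

Classifying on the first packet of a flow keeps the per-packet cost low, at the price of trusting that the rest of the flow behaves like its first packet.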
Classification methods
    
Classification is achieved by various means.
Port numbers
    
- Fast
- Low resource consumption
- Supported by many network devices
- Does not inspect the application-layer payload, so it does not compromise the users' privacy
- Useful only for applications and services that use fixed port numbers
- Easy to evade by changing the port number on the system
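The port-based approach can be sketched as a simple table lookup. The table below is a small, illustrative subset of well-known port assignments; the function name and the "unknown" label are assumptions for this example.

```python
# Illustrative subset of well-known (protocol, port) assignments.
WELL_KNOWN_PORTS = {
    ("tcp", 22): "ssh",
    ("tcp", 25): "smtp",
    ("tcp", 80): "http",
    ("tcp", 443): "https",
    ("udp", 53): "dns",
}


def classify_by_port(protocol, src_port, dst_port):
    """Return a label based on port numbers only; the payload is never read."""
    # Check the destination port first, then the source port (reply direction).
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get((protocol, port))
        if label is not None:
            return label
    return "unknown"
```

A web server moved to a non-standard port, for example `classify_by_port("tcp", 51000, 8080)`, falls through to "unknown", which illustrates why this method is easy to evade.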
Deep Packet Inspection
    
- Inspects the actual payload of the packet
- Detects applications and services regardless of the port number on which they operate
- Slow
- Requires a lot of processing power
- Signatures must be kept up to date, because applications change very frequently
- Encryption makes this method impossible in many cases
Matching bit patterns of data to those of known protocols is a simple, widely used technique. For example, the handshaking phase of the BitTorrent protocol can be matched by checking whether a packet's payload begins with a byte of value 19 (0x13) followed by the 19-byte string 'BitTorrent protocol'.[4]
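A minimal Python sketch of this signature check follows. The function and constant names are chosen for illustration; the check covers only the unencrypted handshake prefix described above.

```python
BT_PSTR = b"BitTorrent protocol"                        # 19-byte protocol string
BT_HANDSHAKE_PREFIX = bytes([len(BT_PSTR)]) + BT_PSTR   # length byte 19, then the string


def looks_like_bittorrent_handshake(payload: bytes) -> bool:
    """Return True if the payload starts with the BitTorrent handshake prefix."""
    return payload.startswith(BT_HANDSHAKE_PREFIX)
```

Note that the first byte is the numeric value 19, not the ASCII characters "1" and "9", so a payload beginning with the text "19BitTorrent protocol" does not match.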
A comprehensive comparison of various network traffic classifiers, which depend on Deep Packet Inspection (PACE, OpenDPI, 4 different configurations of L7-filter, NDPI, Libprotoident, and Cisco NBAR), is shown in the Independent Comparison of Popular DPI Tools for Traffic Classification.[5]
Statistical classification
    
- Relies on statistical analysis of attributes such as byte frequencies, packet sizes and packet inter-arrival times.[6]
- Very often uses machine learning algorithms, such as K-Means, Naive Bayes Filter, C4.5, C5.0, J48, or Random Forest
- Fast technique (compared to deep packet inspection classification)
- Can detect the class of previously unknown applications
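To make the idea concrete, the sketch below extracts two of the attributes listed above (packet sizes and packet inter-arrival times) and classifies a flow with a nearest-centroid rule. Nearest-centroid is used here only as a simple stand-in for the machine learning algorithms named in the list; the function names and the choice of features are assumptions for this example.

```python
import math
from statistics import mean


def flow_features(sizes, timestamps):
    """Return (mean packet size in bytes, mean inter-arrival time in seconds)."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return (mean(sizes), mean(gaps) if gaps else 0.0)


def train_centroids(labelled_flows):
    """Map {label: [feature_tuple, ...]} to {label: centroid_tuple}."""
    return {
        label: tuple(mean(column) for column in zip(*features))
        for label, features in labelled_flows.items()
    }


def classify(features, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    # Note: features are used unscaled here; a real classifier would
    # normalise them so that packet size does not dominate the distance.
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

Because only sizes and timing are used, the payload is never inspected, so the same approach can be applied to flows whose contents are not readable.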
Encrypted traffic classification
    
Traffic today is more complex and more secure, so encrypted traffic has to be classified in a different way from the classic approach (based on IP traffic analysis by probes in the core network). One way to achieve this is to use traffic descriptors from connection traces in the radio interface to perform the classification.[7]
The same classification problem is also present in multimedia traffic. Methods based on neural networks, support vector machines, statistics, and nearest neighbors have generally been shown to work well for this kind of traffic classification, although in some specific cases some methods are better than others; for example, neural networks work better when the whole observation set is taken into account.[8]
Implementation
    
Both the Linux network scheduler and Netfilter contain logic to identify and mark or classify network packets.
Typical traffic classes
    
Operators often distinguish three broad types of network traffic: Sensitive, Best-Effort, and Undesired.
Sensitive traffic
    
Sensitive traffic is traffic that the operator expects to deliver on time. This includes VoIP, online gaming, video conferencing, and web browsing. Traffic management schemes are typically tailored in such a way that the quality of service of these selected uses is guaranteed, or at least prioritized over other classes of traffic. This can be accomplished by not shaping this traffic class, or by prioritizing sensitive traffic above other classes.
Best-effort traffic
    
Best-effort traffic is all other kinds of non-detrimental traffic. This is traffic that the ISP deems not sensitive to Quality of Service metrics (jitter, packet loss, latency). Typical examples are peer-to-peer and email applications.[9] Traffic management schemes are generally tailored so that best-effort traffic gets what is left after sensitive traffic.
Undesired traffic
    
This category is generally limited to the delivery of spam and traffic created by worms, botnets, and other malicious attacks. In some networks, this definition can include such traffic as non-local VoIP (for example, Skype) or video streaming services to protect the market for the 'in-house' services of the same type. In these cases, traffic classification mechanisms identify this traffic, allowing the network operator to either block this traffic entirely, or severely hamper its operation.
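One way to picture how the three classes might be treated is a strict-priority scheduler: sensitive packets are always served first, best-effort packets get what is left, and undesired packets are dropped. The class and method names below are hypothetical, and dropping undesired traffic outright is only one of the policies described above.

```python
from collections import deque


class ClassBasedScheduler:
    """Strict-priority scheduler over sensitive and best-effort queues."""

    def __init__(self):
        self.queues = {"sensitive": deque(), "best-effort": deque()}
        self.dropped = 0

    def enqueue(self, packet, traffic_class):
        if traffic_class == "undesired":
            self.dropped += 1  # assumed policy: block this traffic entirely
            return
        # Unrecognised classes fall back to best-effort treatment.
        queue = self.queues.get(traffic_class, self.queues["best-effort"])
        queue.append(packet)

    def dequeue(self):
        """Return the next packet to send, or None if all queues are empty."""
        for name in ("sensitive", "best-effort"):
            if self.queues[name]:
                return self.queues[name].popleft()
        return None
```

With strict priority, best-effort traffic can be starved while sensitive traffic is continuously present; operators who want to avoid that would typically use a weighted scheme instead.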
File sharing
    
Peer-to-peer file sharing applications are often designed to use any and all available bandwidth, which impacts QoS-sensitive applications (like online gaming) that use comparatively small amounts of bandwidth. P2P programs can also suffer from download strategy inefficiencies, namely downloading files from any available peer, regardless of link cost. The applications use ICMP and regular HTTP traffic to discover servers and download directories of available files.
In 2002, Sandvine Incorporated determined, through traffic analysis, that P2P traffic accounted for up to 60% of traffic on most networks.[10] This shows, in contrast to previous studies and forecasts, that P2P has become mainstream.
P2P protocols can be, and often are, designed so that the resulting packets are harder to identify (to avoid detection by traffic classifiers), and with enough robustness that they do not depend on specific QoS properties in the network (in-order packet delivery, jitter, etc.); typically this is achieved through increased buffering and reliable transport, with the user experiencing increased download time as a result. The encrypted BitTorrent protocol, for example, relies on obfuscation and randomized packet sizes to avoid identification.[11] File sharing traffic can be appropriately classified as Best-Effort traffic. At peak times, when sensitive traffic is at its height, download speeds will decrease. However, since P2P downloads are often background activities, this affects the subscriber experience little, so long as download speeds increase to their full potential when all other subscribers hang up their VoIP phones. Exceptions are real-time P2P VoIP and P2P video streaming services, which need permanent QoS and use excessive overhead and parity traffic to enforce this as far as possible.
Some P2P applications[12] can be configured to act as self-limiting sources, serving as a traffic shaper configured to the user's (as opposed to the network operator's) traffic specification.
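A self-limiting source of this kind can be sketched with a token bucket, a common rate-limiting technique; whether any given P2P client uses exactly this mechanism is not stated in the sources, so the class below is an illustrative assumption. Time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Token-bucket limiter configured with the user's own rate and burst size."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, size_bytes, now):
        """Return True if size_bytes may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

This sketch only decides whether a packet conforms; a shaper would hold non-conforming packets until enough tokens accumulate, whereas a policer would drop them.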
Some vendors advocate managing clients rather than specific protocols, particularly for ISPs. By managing per-client (that is, per customer), if the client chooses to use their fair share of the bandwidth running P2P applications, they can do so, but if their application is abusive, they only clog their own bandwidth and cannot affect the bandwidth used by other customers.
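The notion of a per-client fair share can be illustrated with a max-min fair allocation, in which clients that want less than an equal share keep their demand and the remainder is split among the others. Using max-min fairness here is an assumption for the example; the vendors' actual schemes are not described in the source.

```python
def max_min_fair_share(capacity, demands):
    """Allocate capacity among clients; demands is {client: demanded_rate}."""
    allocation = {}
    remaining = dict(demands)
    left = capacity
    while remaining:
        share = left / len(remaining)
        # Clients asking for no more than an equal share are fully satisfied.
        satisfied = {c: d for c, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than the equal share: cap them all.
            for client in remaining:
                allocation[client] = share
            break
        for client, demand in satisfied.items():
            allocation[client] = demand
            left -= demand
            del remaining[client]
    return allocation
```

For a link of capacity 100 and demands of 10, 40 and 200, the heavy client is capped at 50 while the two light clients receive their full demand, so an abusive application only exhausts its own customer's share.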
References
    
- IETF RFC 2475 "An Architecture for Differentiated Services" section 2.3.1 - IETF definition of classifier.
- SIN 450 Issue 1.2 May 2007 Suppliers' Information Note For The BT Network BT Wholesale - BT IPstream Advanced Services - End User Speed Control and Downstream Quality of Service - Service Description
- Ferguson P., Huston G., Quality of Service: Delivering QoS on the Internet and in Corporate Networks, John Wiley & Sons, Inc., 1998. ISBN 0-471-24358-2.
- BitTorrent Protocol
- Tomasz Bujlow; Valentín Carela-Español; Pere Barlet-Ros. "Independent Comparison of Popular DPI Tools for Traffic Classification". In press (Computer Networks). Retrieved 2014-11-10.
- E. Hjelmvik and W. John, “Statistical Protocol IDentification with SPID: Preliminary Results”, in Proceedings of SNCNW, 2009
- Gijón, Carolina (2020). "Encrypted Traffic Classification Based on Unsupervised Learning in Cellular Radio Access Networks". IEEE Access. 8: 167252–167263. doi:10.1109/ACCESS.2020.3022980. S2CID 221913926.
- Canovas, Alejandro (2018). "Multimedia Data Flow Traffic Classification Using Intelligent Models Based on Traffic Patterns". IEEE Network. 32 (6): 100–107. doi:10.1109/MNET.2018.1800121. hdl:10251/116174. S2CID 54437310.
- The spam problem has actually led some network operators to implement Traffic shaping on SMTP traffic. See Tarpit (networking)
- Leydon, John. "P2P swamps broadband networks". The Register. The Register article which refers to Sandvine report - access to the actual report requires registration with Sandvine
- Identifying the Message Stream Encryption (MSE) protocol
- "Optimize uTorrent Speeds Jatex Weblog". Example for client side P2P traffic limiting