Abstract
An increasing number of websites are making use of HTTPS encryption to enhance security and privacy for their users. However, HTTPS encryption makes it very difficult to identify the service over HTTPS flows, which poses challenges to network security management. In this paper we present DTA-HOC, a novel DNS-based two-level association HTTPS traffic online service identification method for large-scale networks, which correlates HTTPS flows with DNS flows using big data stream processing and association technologies to label the service in an HTTPS flow with a specific associated domain name. DTA-HOC has been specifically designed to address three practical challenges in the service identification process: domain name ambiguity, domain name query invisibility, and data association time window size contradictions. Several experiments on datasets collected from a 10-Gbps campus network are conducted alongside offline and online testing. Results show that DTA-HOC can achieve an average online association rate on HTTPS traffic of 83% and a generic accuracy of 86.16%. Its processing time for one minute of data is less than 20 seconds. These results indicate that DTA-HOC is an efficient method for online identification of services in HTTPS flows for large-scale networks. Moreover, our proposed method can contribute to the identification of other applications which make a Domain Name System (DNS) communication before establishing a connection.