These days i’m messing around with an application that index thousands of documents per day and perform hundreds of queries per hour, so query performance is crucial. The main aim is to provide detection of URLs and IP addresses (want to play a bit? take a look to a previous post) but full-text searching capabilities is also desired althought less used, so i have given a try to improve performance and, specifically, query times, and here is my tests results.


What is moloch?

As his own website says: “Moloch is an open source, large scale IPv4 packet capturing (PCAP), indexing and database system. A simple web interface is provided for PCAP browsing, searching, and exporting. APIs are exposed that allow PCAP data and JSON-formatted session data to be downloaded directly.” it will be very useful as a network forensic tool to analyze captured traffic (moloch can also index previously captured pcap files as we will see) in case of a security incident or detecting some suspicious behaviour like, for example, some kind of alert in our IDS.

Thanks of indexing pcaps with elasticsearch, moloch provide us with the ability to perform almost real-time searches among dozens or hundreds of captured GB network traffic being able to apply several filtering options on the way. It isn’t as complete as Wireshark filtering system for example but will save us tons of work when dealing with some filtering and visualization as well as Moloch will provide us with some features Wireshark lacks, like filtering by country or AS.

I’m sure to not be the only who would have loved to rely on moloch when analyzing dozens of GB with tshark and wireshark, particularly each time you apply a filter to show some kind of data…


Most of us, when conducting OSINT tasks or gathering information for preparing a pentest, draw on Google hacking techniques like site:company.acme filetype:pdf “for internal use only” or something similar to search for potential sensitive information uploaded by mistake. Other times, a customer ask us to know if they have leaked in a negligence this kind of sensitive information and we proceed to make some google hacking fu.
But, what happens if we don’t want to make this queries against Google and, furthermore, follow links from search that could potentially leak referers? Sure we could download documents and review them manually in local but it’s boring and time consuming. Here is where Apache Solr comes into play for processing documents and create index of them to give us almost real time searching capabilities.