The value of IP and Domain information
Starting with the end in mind: make the data you have consumable (and thus actionable), make sense of the data, drive risk decisions, share data with trusted partners, and shift from routine monitoring to internal threat intelligence. Start with what you can easily work with, like IP addresses and domains.
Some debate is raging within enclaves of the internet about the value and accuracy of the APT1 report. The criticism is aimed at the tacit link between the victims and the numerous sources from a single region of the world, Shanghai, China. The APT1 report shared the indicators of compromise, allowing me and others to compare indicators, signatures, etc., and evaluate the conclusions.
I set out to evaluate and compare file names, hashes, IP addresses, domains, and all the other atomic indicators from the report against data I had collected. The overlap confirmed quite a bit of what I already knew. How many orgs could go and compare notes at that level? Not enough.
Anytime a new report is issued with unmasked indicators, each of us should evaluate the findings. Sharing and tracking starts with internally sourced threat intelligence, and I would argue that every organization needs the capability, starting with a simple tracking system. Any atomic indicator, such as a hash, IP address, domain, or filename, has a half-life of sorts. As its effectiveness decreases, it is less likely that you will continue to see it in any specific remote edge of the internet. If you mined the APT1 report for indicators, most are useless by now.
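The half-life idea can be put into a simple tracking system as a decaying weight. What follows is only a minimal sketch: the function name `indicator_weight` and the 30-day half-life are assumptions chosen for illustration, not values taken from the APT1 report or from any tool.

```python
from datetime import datetime, timedelta

def indicator_weight(last_seen, now, half_life_days=30.0):
    """Exponential decay: the weight halves every half_life_days.

    half_life_days=30 is an illustrative assumption, not a measured value.
    """
    age_days = max((now - last_seen).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)

now = datetime(2014, 4, 1)
# An indicator seen yesterday still carries nearly full weight.
fresh = indicator_weight(now - timedelta(days=1), now)
# An indicator last seen more than a year ago is close to zero.
stale = indicator_weight(now - timedelta(days=400), now)
```

A weight like this lets an old list fade out of matching gradually instead of being trusted forever or deleted outright.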
The net effect of the APT1 report is higher salaries for those with TI on their resumes, and more business for Mandiant. It made a bunch of security vendors shift positions and consider how to capitalize on threats. How does one capitalize on any large list of IPs and domains like the one found in the APT1 report? If the thinking is to toss it all into a SIEM, you might as well stop reading here.
Threats are associated with IP addresses and domains, but focusing on IPs and domains alone is pointless because the threat will move and leave you with a stale list, ultimately wasting time on false positives. At what point does an IP address stop being a threat? The domains, IP addresses, and AS numbers are part of threat intelligence. As a simple example, any point of concentration of IPs or domains tells you that nearby IPs and domains are worthy of examination during the fleeting time the space is being used.
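One way to look for a point of concentration is to group known threat addresses by network prefix and flag any prefix that holds more than one of them. This is a rough sketch: the /24 prefix length, the `min_hits` threshold, and the function names are assumptions, and the addresses come from documentation ranges rather than real threat data.

```python
import ipaddress
from collections import defaultdict

def hot_networks(threat_ips, min_hits=2, prefix_len=24):
    """Return the networks that contain at least min_hits known threat IPs."""
    clusters = defaultdict(list)
    for ip in threat_ips:
        # strict=False masks off the host bits, e.g. 203.0.113.5/24 -> 203.0.113.0/24
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        clusters[net].append(ip)
    return [net for net, ips in clusters.items() if len(ips) >= min_hits]

def worth_examining(ip, networks):
    """True if an observed address sits inside one of the flagged networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# Documentation-range addresses standing in for tagged threats.
threats = ["203.0.113.5", "203.0.113.77", "198.51.100.9"]
hot = hot_networks(threats)
```

Anything your sensors see inside a flagged network is a candidate for a closer look while that address space is still in use.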
Why internal threat intelligence?
Internal threat intelligence was initially leveraged by government and the defense industrial base, at least the smarter ones. Then it was the telecoms and large internet service providers, and now the energy and financial sectors are making a play for top talent to consume internal TI. How far does it go? Can we ‘rent’ the skills from TI brokers or commission specialized reports?
Anton Chuvakin explored the difficulties and objectives of internally sourced threat intelligence in a recent blog post, and it is worth the read: http://blogs.gartner.com/anton-chuvakin/2014/03/20/on-internally-sourced-threat-intelligence My short take is that internal threat intelligence evaluates the same information sourced from incidents as anyone doing monitoring. Internal threat intelligence takes a depth and breadth analytical approach with the available information. Internal threat intelligence is the split between detection and response, providing threat suppression.
Internal threat intelligence is the hedge against the immediate threat landscape and what is over the horizon. In my personal view, threat intelligence is not grounded in large sets of IPs and domains with poor reputations, but in having context, history, and the narrative of objectives and actors. One cannot reach that stratum without a solid foundation to collect and analyze the local information, compare against rational and trusted resources, postulate and test hypotheses, and eventually point fingers. The APT1 report was a confirmation of findings, not a revelation.
I think internal threat intelligence will be a required part of monitoring. All analysts should seek to track IP addresses and domains locally. It is not enough to consume external indicators alone. It is not enough to purchase a set of sensors and plug in data looking for matches. Monitoring won’t go away, but it will become more automated, making it easier to match events, provide entity-specific information, use large data sources to evaluate relationships, measure impact, and finally, drive the incident response.
Linking events, actions, and incidents through to actors will continue to be in demand, with the cost being reduced by emerging platforms that synthesize internal information and rational findings from the outside world. Fusion is the by-product of the most useful and reliable sources, measured to bring value in understanding threats. TI is predictive in nature and about the only way to divest from a Maginot line (popular commentary made by the RSAC ‘C’ level speaking collective, yet useful in my thinking as well).
I was introduced to passive DNS as a useful analytical tool a few years ago, when someone I once worked with wrote his own variant using Python and MySQL. What I learned is that the value of tracking just the right amount of information exceeds the value of tracking all of the information, and that using only public DNS information is futile. The instant utility of passive DNS for each enterprise seemed evident to me; however, this particular implementation suffered performance issues that became unbearable in just a few days. Ultimately, it was scrapped after a month or so.
Before I go further: exact matching of IPs and domains is nearly futile and won’t compare to the value of computed and some behavioral indicators. Understanding why a stream between hosts contains 'system32' with a file write operation of a ‘rar’ file or batch file has a better chance of detecting a pivot, for example. If you intend to track, choose to do it well. The idea that you could shop and explore a known IP address or domain and get a sense of when the query was first made, how many times it was made, and what neighboring domains and addresses were doing is valuable.
Passive DNS is not new, but I don’t think its general utility to analysts is well understood. In my view, SecurityOnion should have the capability. Tracking means a composite of domains and IP addresses within your own environment, instantly searchable. At higher levels, it is a potential threat detection system with room to innovate. Florian Weimer introduced passive DNS many years ago (paper: http://www.enyo.de/fw/software/dnslogger/first2005-paper.pdf) and it is well worth the read. Plenty of services exist in the commercial space for analyzing your DNS records for potential threats, like OpenDNS’s Umbrella and Damballa. With these services, you get the value of analytics at scale; each has invested in exploring and improving detection, and the most virulent malicious actions are noticed and suppressed with speed.
However, targeted attacks may not have enough concurrency to get noticed. That is to say, a single domain that does not fall into the set of characteristics an SVM would flag, or that is not newly registered, falls below the noticeable threshold used in large-scale detection. This is where your own passive DNS tracking comes in handy; it could be complementary to any network security monitoring service, network monitoring, and especially locally sourced threat intelligence.
Passive DNS version 2
Like my former coworker, I wrote scripts to collect and analyze DNS information, but I made some design decisions that require a bit of explaining up front. Before getting into the specifics, it is important to know that passive DNS, or pDNS, is not logging. That is to say, each record is not being appended to a file. However, each DNS query and response is being tracked. If a domain and IP pair has never been seen before, a new record is created; if it is already in the database, the count is incremented and only the date field is overwritten, reducing the amount of data stored.
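The create-or-increment behavior can be sketched in a few lines. This is not the pDNS2 code: a plain dictionary stands in for the database, and the function name `observe`, the key layout, and the field names are assumptions made for illustration.

```python
def observe(store, domain, ip, ttl, seen):
    """Track one DNS answer: create the record once, then only bump it."""
    # Hypothetical key layout: one record per (domain, ip) pair.
    key = (domain.lower().rstrip("."), ip)
    record = store.get(key)
    if record is None:
        # Never seen before: create a new record.
        store[key] = {"count": 1, "ttl": ttl, "date": seen}
    else:
        # Already tracked: increment the count and overwrite only the date.
        record["count"] += 1
        record["date"] = seen
    return store[key]

store = {}
observe(store, "www.example.com.", "192.0.2.10", 300, "2014-03-01")
observe(store, "WWW.example.com", "192.0.2.10", 300, "2014-03-02")
observe(store, "www.example.com", "192.0.2.11", 300, "2014-03-02")
```

Three answers leave only two records behind, which is the point: storage grows with the number of distinct pairs seen, not with the volume of DNS traffic.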
MySQL was replaced with Redis for speed. In my homespun version, the same fields parsed from any DNS query and response are present, including keys for threatening IP addresses and domains. Time and time again, analysts visit the web, dig and look up names, and use reputation systems to enumerate all the bad in the world. It is useful and important to validate against the outside world and get a sense of risk, but a step closer to home is far more useful first. The worst part about this sort of query is the lack of accounting. An analyst will most likely make the same query time and again, or several analysts may make the same query, and only confirmed threats get tracked. Lost is the idea of possible threats and sites that are trusted.
In my own variation of pDNS (I call it pDNS2), the basic properties allow analysts to find the most useful information quickly:
· Seek all domains that end, start, or contain a particular word
· Seek specific TTL or very low TTLs
· List all the known threatening IPs or domains
· Contrast threat information against other IPs and domains
· Export the threatening IP and Domains
· Return a count of specific subdomains for a given domain example
· Count the top ‘hits’ for domains in order
· Query a range of IP addresses
· Find locally resolved IP addresses for parked domains, like 127.0.0.1
· Locate all the domains that point to a single IP
· Locate all the IPs associated with any domain
· Tag a Domain or IP address with a notion of trust, threat, or interest
· Search by Date
· Count how many new domains show up each day
· Return the Euclidean distance from a queried IP to a tagged threat
· Find the most unanswered queries by count
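A few of the queries in the list above can be sketched against an in-memory list of records. None of this is the pDNS2 query API: the record layout, the function names, and the sample data are assumptions, and treating the four octets of an IPv4 address as a vector is just one possible reading of the Euclidean distance item.

```python
import math

# Hypothetical records; domains and addresses are documentation examples.
records = [
    {"domain": "mail.example.com", "ip": "192.0.2.10", "ttl": 3600, "tag": "trust"},
    {"domain": "login-example.net", "ip": "203.0.113.7", "ttl": 60, "tag": "threat"},
    {"domain": "cdn.example.org", "ip": "203.0.113.7", "ttl": 300, "tag": None},
]

def domains_matching(records, word, mode="contains"):
    """Domains that start with, end with, or contain a particular word."""
    tests = {"starts": str.startswith, "ends": str.endswith, "contains": str.__contains__}
    match = tests[mode]
    return sorted({r["domain"] for r in records if match(r["domain"], word)})

def low_ttl(records, max_ttl=300):
    """Domains answered with a TTL at or below max_ttl seconds."""
    return sorted({r["domain"] for r in records if r["ttl"] <= max_ttl})

def domains_for_ip(records, ip):
    """All the domains that point to a single IP."""
    return sorted({r["domain"] for r in records if r["ip"] == ip})

def octet_distance(ip_a, ip_b):
    """Euclidean distance between two IPv4 addresses taken as 4-octet vectors."""
    a = [int(octet) for octet in ip_a.split(".")]
    b = [int(octet) for octet in ip_b.split(".")]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_threat(records, ip):
    """Closest tagged threat as (distance, threat_ip), or None if nothing is tagged."""
    threats = {r["ip"] for r in records if r["tag"] == "threat"}
    return min(((octet_distance(ip, t), t) for t in threats), default=None)
```

For example, `domains_for_ip(records, "203.0.113.7")` returns both domains sharing that address, and `nearest_threat(records, "203.0.113.10")` reports how close a queried address sits to the tagged threat.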
For the most part, the code is useful as a means to explore and track, initiating a homegrown threat intelligence effort. You can find the code on GitHub here: https://github.com/bez0r/pDNS2 (specifically, the query tool ‘pdns2_query_api.py’ was released in support of this post)
With the basics of passive DNS covered, a separate analytical script was developed to explore specific information and calculate a concept of risk. New domains are checked against a corpus of known good and bad using simple Bayesian ML and another does a random forest walk (concept was from a talk in 2013 by EndGame Systems). In the Bayesian example below, a check of unknowns was pulled from old ‘conficker’ domains to get a sense of how well it works (source: http://blogs.technet.com/b/msrc/archive/2009/02/12/conficker-domain-information.aspx)
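For readers who have not seen the Bayesian approach, here is a minimal naive Bayes sketch over character bigrams of the leftmost label. It is not the analytical script described above: the class name, the add-one smoothing, and the tiny training corpus are all assumptions, and the ‘bad’ names are made-up random strings rather than real conficker domains.

```python
import math
from collections import Counter

def bigrams(domain):
    """Character bigrams of the leftmost label, with start/end markers."""
    padded = "^" + domain.lower().split(".")[0] + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

class BigramBayes:
    """Naive Bayes over character bigrams with add-one smoothing."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.docs = Counter()

    def train(self, domain, label):
        self.counts[label].update(bigrams(domain))
        self.docs[label] += 1

    def log_score(self, domain, label):
        counts = self.counts[label]
        total = sum(counts.values())
        # Vocabulary size across both classes, plus one slot for unseen bigrams.
        vocab = len(set(self.counts["good"]) | set(self.counts["bad"])) + 1
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for gram in bigrams(domain):
            score += math.log((counts[gram] + 1) / (total + vocab))
        return score

    def classify(self, domain):
        return max(("good", "bad"), key=lambda label: self.log_score(domain, label))

model = BigramBayes()
for name in ["google.com", "wikipedia.org", "microsoft.com", "amazon.com"]:
    model.train(name, "good")
# Made-up random-looking names standing in for algorithmically generated domains.
for name in ["qxkjzvtw.net", "zvqxjkpw.org", "wqzxkvjt.info", "kjxqzvwp.biz"]:
    model.train(name, "bad")
```

With a realistic corpus the same structure applies; the quality of the result depends almost entirely on how representative the known good and known bad lists are.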
Queries can assist an analyst in finding out the most likely domains a site is trying to squat or mimic. Any local domain suspected as a potential threat can be submitted to any of the top reputation sites with returned results used to help score the potential threat.
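One simple way to surface the domain a suspect name is trying to mimic is edit distance against names you already trust. The sketch below assumes a small trusted list and a distance cutoff of 2; both are illustrative choices, and the function names are hypothetical rather than part of pDNS2.

```python
def edit_distance(a, b):
    """Levenshtein distance using a rolling row of the dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # deletion
                cur[j - 1] + 1,                    # insertion
                prev[j - 1] + (char_a != char_b),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]

def likely_targets(suspect, trusted, max_distance=2):
    """Trusted domains within max_distance edits of the suspect, closest first."""
    scored = ((edit_distance(suspect, name), name) for name in trusted)
    # Distance 0 is the trusted domain itself, so it is not a squat.
    return sorted((d, name) for d, name in scored if 0 < d <= max_distance)

trusted = ["example.com", "example.org", "intranet.example"]
hits = likely_targets("examp1e.com", trusted)
```

Here `hits` points at example.com as the probable target of the look-alike name.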
Additional static sources from Project Sonar (https://community.rapid7.com/community/infosec/sonar/blog/2013/09/26/welcome-to-project-sonar) were included to support domain queries, so a query returned information about the reverse IPv4, SSL certs, and the usual data around regions and registration. Scoring threats by known properties, such as how recently a domain was registered, a low TTL, and resource record types like TXT that can be used as a command and control channel, lets analysts start with the most probable threats. One little script goes out and scrapes sites for IPs and domains, and another script imports and exports STIX files.
Over time, the pDNS2 scripts were tied to each sensor so a simple right click would provide context. The pDNS2 tools started out as a way to quickly make sense of the IP and domain space, and later became a support system for local threat intelligence and a driver of analytics. After all this, I converted the basic scripts into glorious functions and tossed the entire tool set into an IPython Notebook where analysts can save and share notebooks.
You won’t find any ‘ground truth’ data or research above, but you should be thinking about elevating monitoring and using the information already at your fingertips to capitalize on ‘internal threat intelligence’. I did not attend RSAC this year, but the undercurrent of talks about a wide range of hub-and-spoke information-sharing collectives as a service is intriguing. You can share in the trading of useful information, or you can buy a service that will do it on your behalf. Most organizations want to consume indicators, yet lack the ability to organize the right information, especially if the information is not directly involved in incidents or is too sensitive to share. This is where Mandiant comes in for the win, sharing IOCs while offering a veil of protection for some victims.
I contend that pDNS2 is trivial to initiate and to get into a workflow, but it does not stand alone. Another interesting tool, ‘malcom’ (https://github.com/jipegit/malcom), overlaps with it, does a far better job at presentation, and incorporates several feeds against internal ‘live’ sensors.
This post argues that internal threat intelligence is worth the effort, that tracking data such as IPs and domains is not futile, and that now is the time for analysts and monitors to effectively become internal TI.
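Scoring by known properties can start out as simple additive points. The weights and thresholds below (30 days for a ‘new’ registration, 300 seconds for a ‘low’ TTL) are assumptions picked for illustration, not values from pDNS2, and the sample observations are invented.

```python
from datetime import date

def risk_score(registered, ttl, rrtype, today, new_days=30, low_ttl=300):
    """Additive score from three properties; all weights are illustrative."""
    score = 0
    if (today - registered).days <= new_days:
        score += 2  # newly registered domain
    if ttl <= low_ttl:
        score += 1  # low TTL
    if rrtype.upper() == "TXT":
        score += 1  # record type that can carry a command and control channel
    return score

today = date(2014, 4, 1)
# Hypothetical observations: (domain, registration date, TTL, record type).
observed = [
    ("old-and-quiet.example", date(2009, 6, 1), 3600, "A"),
    ("fresh-and-noisy.example", date(2014, 3, 25), 60, "TXT"),
    ("fresh-only.example", date(2014, 3, 20), 3600, "A"),
]
# Highest score first, so analysts start with the most probable threats.
ranked = sorted(
    observed,
    key=lambda row: risk_score(row[1], row[2], row[3], today),
    reverse=True,
)
```

The ordering matters more than the absolute numbers: the goal is a queue that puts the most probable threats in front of an analyst first.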