Reader's tip of the day: ratios vs. raw counts
Today, I'd like to present another of our tips of the day (see the whole series here). This one was provided by one of our faithful readers, Dai Morgan, in response to my log analysis stories from last month. Here is an excerpt from the e-mail we received:
---------------------------
I've recently been dealing with a harvesting incident and needed to identify IP addresses which were running scripts against a web site. If you just look at the top talkers, then big customers and gateways can look as big as the bad guys. After some work, I found it useful to look at the ratio of hits to URLs. Normal users hit a wide variety of pages, but the scripts just churn round and round on the same URLs.
Using Perl, it's easy to pull the source IP address and the URL as you loop through the web server logs. To analyse this data, load it into a hash of hashes that keeps a count of URLs per IP address:
$hash{$ip}{$url}++;
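As a concrete sketch of that log-file loop (the sample log lines, the regex for Apache combined-format logs, and the extra `%hits` counter are my own assumptions for illustration, not part of Dai's original script):

```perl
use strict;
use warnings;

# Hypothetical sample lines in Apache combined log format
my @log = (
    '192.0.2.10 - - [01/Oct/2006:12:00:00 -0400] "GET /index.html HTTP/1.0" 200 512',
    '192.0.2.10 - - [01/Oct/2006:12:00:01 -0400] "GET /about.html HTTP/1.0" 200 512',
    '192.0.2.99 - - [01/Oct/2006:12:00:02 -0400] "GET /cgi-bin/harvest.pl HTTP/1.0" 200 128',
    '192.0.2.99 - - [01/Oct/2006:12:00:03 -0400] "GET /cgi-bin/harvest.pl HTTP/1.0" 200 128',
);

my (%hash, %hits);
for my $line (@log) {    # in the real script this would be: while (<LOGFILE>)
    # Source IP is the first field; the URL is the second token of the quoted request
    next unless $line =~ /^(\S+) .*?"(?:GET|POST|HEAD) (\S+)/;
    my ($ip, $url) = ($1, $2);
    $hash{$ip}{$url}++;    # per-IP, per-URL count (the hash of hashes)
    $hits{$ip}++;          # per-IP total hits, for the ratio later
}
```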
When you've finished the log file loop, start another loop through the hash; you can get the URL count as follows:
my $url_count = keys %{ $hash{$ip} };
Then it is just a matter of dividing the number of hits by the URL count. The bad guys have a higher ratio than normal users. Each site will have slightly different characteristics, so some degree of local tuning will be required. It also helps to strip out any in-URL tokens (session IDs and the like), either in the Perl or externally via a 'grep -v'. (or sed/awk, JAC)
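Putting those pieces together, here is a hedged sketch of the final reporting loop. The pre-populated `%hash` data and the threshold of 5 are invented for illustration; any real threshold needs the local tuning Dai describes:

```perl
use strict;
use warnings;

# Hypothetical per-IP counters, as built by the log-file loop:
# a normal user spreading hits across pages vs. a script hammering one URL
my %hash = (
    '192.0.2.10' => { '/a.html' => 1, '/b.html' => 1, '/c.html' => 1 },
    '192.0.2.99' => { '/cgi-bin/harvest.pl' => 30 },
);

my %ratio;
for my $ip (keys %hash) {
    my $url_count = keys %{ $hash{$ip} };      # distinct URLs for this IP
    my $hits = 0;
    $hits += $_ for values %{ $hash{$ip} };    # total hits for this IP
    $ratio{$ip} = $hits / $url_count;          # scripts score high here
}

# Report IPs above a locally tuned threshold (5 is an arbitrary example)
my @suspects = sort grep { $ratio{$_} > 5 } keys %ratio;
```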
I think this technique has other applications, for example looking at sign-on successes and failures. It's also possible to produce summary data from IDS logs.
I thought it was an excellent observation that the ratio, rather than the raw number of hits, might provide some very useful data. Dai has provided the script and you can see it here. Dai's explanation and usage docs are here and here, respectively. The explanation doc goes into a lot of detail on what the Perl is actually doing, which is quite educational if you aren't a Perl guru. Dai, thanks for sharing the tip and script with our readers.
---------------------------
Jim Clausing, jclausing --at-- isc dot sans dot org
SANS ISC presentation in Brazil
This is for the Brazilian security community. I will give a talk about the SANS ISC and current security threats at the Colaris (Conferência Latino Americana de Resposta a Incidentes de Segurança) conference in Brazil next week, as part of the FIRST Technical Colloquium. The Colaris portion is open to the general public and registration is possible until Oct. 4th. I will be speaking on the second day (Oct 10th), and it will be a pleasure to talk to those who want to meet me there. For English information, click here.
Back to green, but the exploits are still running wild
Folks, as is our policy here at the Internet Storm Center, once we feel we've raised awareness of an issue by raising infocon to yellow, we move it back to green (otherwise, with the constant release of exploits of unpatched vulnerabilities, infocon would stay at a heightened level and become as meaningless as the DHS terrorist threat level). Normally, we do this after 24 hours, but in this case, since we didn't raise infocon until Saturday, we felt we should wait until most folks had made it back to work on Monday before going back. That doesn't mean that there is no more risk. Quite to the contrary, until the vulnerabilities are patched, the risk remains high because we know there are many variants of the exploit in the wild as I type this. There were even Metasploit modules released over the weekend, so it doesn't take much talent at this point to create a new exploit. However, we feel that things have leveled off somewhat. We've published pointers to the workarounds in Saturday's story, so there isn't much more that we can do at this point other than remain vigilant.