Wednesday, August 1, 2012

[Quote][shell] Parsing access_log with shell

Source: http://www.intuitive.com/wicked/84-exploring-apache-access_log-shell-script.shtml

A typical line in an access_log looks like the following:
63.203.109.38 - - [02/Sep/2003:09:51:09 -0700] "GET /index.php HTTP/1.1" 301 248 "http://test.com/xxx.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
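The sample can be fed straight to awk to show how it numbers the whitespace-separated fields, which is what `$4`, `$7`, `$10` and `$11` refer to in the script. This is a minimal sketch. The variable name `line` is chosen only for this demo.

```shell
#!/bin/sh
# The sample log line from above, kept on one line as it appears in the log.
line='63.203.109.38 - - [02/Sep/2003:09:51:09 -0700] "GET /index.php HTTP/1.1" 301 248 "http://test.com/xxx.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"'

# awk splits on runs of blanks. The bracketed timestamp therefore becomes
# two fields ($4 and $5), the quoted request becomes three ($6-$8), and
# the browser string starts at $12 and is split further.
printf '%s\n' "$line" | awk '{
    print "$1  host:     " $1
    print "$4  date:     " $4
    print "$5  tz:       " $5
    print "$6  method:   " $6
    print "$7  URL:      " $7
    print "$9  status:   " $9
    print "$10 bytes:    " $10
    print "$11 referrer: " $11
}'
```

The quotes and brackets stay attached to the fields: `$4` is `[02/Sep/2003:09:51:09` and `$11` is `"http://test.com/xxx.php"`. For this reason the script strips the `[` with sed.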
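Table 1 shows what each column contains. Strictly speaking, a line like the one above, which ends with the referrer and browser fields, is in Apache's combined log format. That format is the common log format plus columns 10 and 11.

The last column of the table gives the awk field numbers that the script below uses. From column 5 on, the awk field number is one higher than the column number. This is because awk splits on blanks, so the bracketed date and time zone become two fields ($4 and $5).

Table 1: Common Log File Layout
Column  Value                                           awk field(s)
1       IP address of the host accessing the server     $1
2-3     Remote identity (identd) and authenticated      $2, $3
        user name; usually "-"
4       Date and time zone offset of the request        $4, $5
5       Method invoked                                  $6
6       URL requested                                   $7
7       Protocol used                                   $8
8       Result (HTTP status) code                       $9
9       Number of bytes transferred                     $10
10      Referrer                                        $11
11      Browser identification string                   $12 onward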
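Columns 10 and 11 are wrapped in double quotes, and the browser string contains spaces, so splitting on blanks scatters column 11 over several awk fields. One alternative is to split on the `"` character instead, which keeps each quoted column whole. This is a sketch and is not what the script below does. The helper name `log_fields` and the variable `sample` are hypothetical.

```shell
#!/bin/sh
# log_fields (hypothetical helper): print the request, status, bytes,
# referrer and browser string of each log line by splitting on quotes.
# With -F'"', a combined-format line splits as:
#   $1 = host, ident, user and [date tz]   $2 = request
#   $3 = status and bytes                  $4 = referrer
#   $5 = the blank between them            $6 = browser string
# Assumption: no literal quote inside the request or browser string
# (Apache logs one as \"), which would shift the fields.
log_fields() {
    awk -F'"' '{
        split($3, sb, " ")          # sb[1] = status code, sb[2] = bytes
        print "request:  " $2
        print "status:   " sb[1]
        print "bytes:    " sb[2]
        print "referrer: " $4
        print "browser:  " $6
    }' "$@"                          # reads stdin when no file is given
}

sample='63.203.109.38 - - [02/Sep/2003:09:51:09 -0700] "GET /index.php HTTP/1.1" 301 248 "http://test.com/xxx.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"'
printf '%s\n' "$sample" | log_fields
```

`log_fields /var/log/apache2/access.log` would do the same for every line of a real log.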

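Save the following as calculate_log.sh:

#!/bin/sh
# webaccess - analyze an Apache-format access_log file, extracting
# useful and interesting statistics

bytes_in_mb=1048576    # 1024 * 1024 bytes = 1 MB
host="self.com"        # your own domain; self-referrals are not counted

if [ $# -eq 0 ] || [ ! -f "$1" ] ; then
  echo "Usage: $(basename "$0") logfile" >&2
  exit 1
fi

# Field $4 looks like "[02/Sep/2003:09:51:09"; strip the leading bracket.
firstdate="$(head -n 1 "$1" | awk '{print $4}' | sed 's/\[//')"
lastdate="$(tail -n 1 "$1" | awk '{print $4}' | sed 's/\[//')"

echo "Results of analyzing log file $1"
echo ""
echo "  Start date: $(echo "$firstdate" | sed 's/:/ at /')"
echo "    End date: $(echo "$lastdate" | sed 's/:/ at /')"

# Every line is one request.
hits="$(wc -l < "$1" | sed 's/[^[:digit:]]//g')"
echo "        Hits: $hits (total accesses)"

# Pageviews: requests whose URL ($7) is not a .txt file or an image.
# The dot is escaped so that it matches a literal "." only.
pages="$(awk '{print $7}' "$1" | grep -ivE '\.(txt|gif|jpg|png)' | wc -l | sed 's/[^[:digit:]]//g')"
echo "   Pageviews: $pages (hits minus graphics)"

# Field $10 is the byte count; a "-" (no body sent) counts as 0.
totalbytes="$(awk '{sum+=$10} END {print sum+0}' "$1")"
totalmb="$(awk -v b="$totalbytes" -v m="$bytes_in_mb" 'BEGIN {printf "%.1f", b/m}')"
echo " Transferred: $totalbytes bytes ($totalmb MB)"

# now let's scrape the log file for some useful data:
echo ""
echo "The ten most popular pages were:"
awk '{print $7}' "$1" | grep -ivE '\.(gif|jpg|png)' | \
  sed 's/\/$//g' | sort | \
  uniq -c | sort -rn | head -n 10

# Variant: print $1 instead of $7 to rank the ten most frequent
# client IP addresses:
# awk '{print $1}' "$1" | sort | \
#   uniq -c | sort -rn | head -n 10

echo ""
echo "The ten most common referrer URLs were:"
# Drop empty referrers ("-") and links from our own site. The inner
# quotes must be escaped because the pattern is a double-quoted string.
awk '{print $11}' "$1" | \
  grep -vE "(^\"-\"$|/www\.$host|/$host)" | \
  sort | uniq -c | sort -rn | head -n 10

echo ""
exit 0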
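In the referrer filter, the quotes around `-` must be escaped as `\"-\"`. The whole pattern sits inside a double-quoted shell string, so unescaped quotes simply end the string and reopen it. Printing both versions shows what grep actually receives. `host` is the same placeholder domain as in the script. The variable names `unescaped` and `escaped` exist only for this demo.

```shell
#!/bin/sh
host="self.com"

# Unescaped: the inner quotes close and reopen the shell string, so grep
# gets ^-$, which never matches a referrer field that is literally "-".
unescaped="(^"-"$|/www.$host|/$host)"
# Escaped: the quotes stay part of the regex, so "-" is matched.
escaped="(^\"-\"$|/www\.$host|/$host)"

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"

# Test both patterns against the referrer of a direct visit, as
# awk '{print $11}' returns it.
for pattern in "$unescaped" "$escaped"; do
    # grep -c prints how many lines matched (0 or 1 here);
    # "|| true" because grep exits with status 1 when nothing matches.
    printf '%s\n' '"-"' | grep -cE -- "$pattern" || true
done
```

The first pattern reports 0 matches, so `grep -v` lets every `"-"` through into the referrer ranking. The escaped pattern reports 1.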
Make it executable, then run it:
# chmod +x calculate_log.sh
# ./calculate_log.sh /var/log/apache2/access.log
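If no Apache log is at hand, a few synthetic lines in the same format are enough to try the script. The path `/tmp/sample_access.log` and the three entries below are made up for illustration.

```shell
#!/bin/sh
# Write three made-up combined-format entries to a hypothetical test file.
log=/tmp/sample_access.log
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Aug/2012:10:00:00 +0800] "GET /index.php HTTP/1.1" 200 1000 "-" "Mozilla/5.0"
10.0.0.2 - - [01/Aug/2012:10:05:00 +0800] "GET /logo.png HTTP/1.1" 200 5000 "http://example.org/links.html" "Mozilla/5.0"
10.0.0.1 - - [01/Aug/2012:10:10:00 +0800] "GET /index.php HTTP/1.1" 200 1000 "http://self.com/" "Mozilla/5.0"
EOF

# Quick cross-checks of two numbers the script reports:
wc -l < "$log"                                # hits (one request per line)
awk '{sum += $10} END {print sum}' "$log"     # total bytes transferred
```

`./calculate_log.sh /tmp/sample_access.log` should then report 3 hits and 2 pageviews, because the .png request is not counted. It should also report 7000 bytes transferred.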
