Tuesday, June 19, 2012

Extract all URLs from a file or website

It is often useful to extract all the URLs from a webpage. There are many ways to do this; below is one example, a gawk script that prints the value of every href attribute:
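#!/usr/bin/env bash
# Reads HTML on stdin and prints the target of each double-quoted href attribute.
gawk 'BEGIN{
  RS=""          # paragraph mode: records end at blank lines, fields split on any whitespace
  IGNORECASE=1   # gawk extension: also match HREF, Href, ...
}
{
  for(o=1;o<=NF;o++){
    if ( $o ~ /href=\042/){          # \042 is the octal escape for a double quote
      gsub(/.*href=\042/,"",$o)      # strip everything up to and including href="
      gsub(/\042.*/,"",$o)           # strip the closing quote and whatever follows
      print $o
    }
  }
}'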
Piping a webpage into this script prints each link it finds, one per line.
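
Since there are many ways to do this, here is a second one for comparison. It is a minimal sketch that uses grep and sed instead of gawk, so it is a different technique from the script above. The function name extract_urls and the inline sample page are assumptions made for illustration, and like the gawk version it only handles double-quoted href attributes.

```shell
#!/usr/bin/env bash
# extract_urls: hypothetical helper name, not part of the original script.
# Reads HTML on stdin and prints the value of every double-quoted href attribute.
extract_urls() {
  # -o prints only the matching text, one match per line; -i ignores case (HREF, Href, ...)
  grep -oi 'href="[^"]*"' |
    # remove everything up to the opening quote, then the closing quote
    sed -e 's/^[^"]*"//' -e 's/"$//'
}

# Sample input: a small inline page instead of a real download
printf '%s\n' '<p><a href="http://example.com/a">A</a><A HREF="/b">B</A></p>' | extract_urls
```

Run as is, this prints http://example.com/a and /b on separate lines. Because grep -o reports each match separately, two links with no whitespace between them are both found. A real page could be fed in the same way, for instance with curl -s followed by the page address and a pipe into extract_urls.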
