Web Link List Example In Ruby

last modified: December 30, 2005

WebLinkListExample in RubyLanguage using RegularExpression matching (because I'm not aware of anything in the standard library for parsing HTML):

#!/usr/bin/env ruby

require 'net/http'
require 'uri'

# usage:   linklister.rb url
# example: linklister.rb http://www.google.com/

url = ARGV[0] ? ARGV[0] : 'http://www.google.com/'
parsed_url = URI.parse url
page = Net::HTTP.get parsed_url.host, parsed_url.path

nested_hrefs = page.scan /href\s*=\s*\"([^\"]*)|href\s*=\s*\'([^\']*)|href\s*=\s*([^\s\"\'>]+)/i

puts "*** hrefs in #{url}, ***"
nested_hrefs.flatten.each do |href|
    puts href if href
end

Output on 2005-12-30:

*** hrefs in http://www.google.com/ ***
/url?sa=p&pref=ig&pval=2&q=http://www.google.com/ig%3Fhl%3Den
https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=en
/imghp?hl=en&tab=wi&ie=UTF-8
http://groups.google.com/grphp?hl=en&tab=wg&ie=UTF-8
http://news.google.com/nwshp?hl=en&tab=wn&ie=UTF-8
http://froogle.google.com/frghp?hl=en&tab=wf&ie=UTF-8
/lochp?hl=en&tab=wl&ie=UTF-8
/intl/en/options/
/advanced_search?hl=en
/preferences?hl=en
/language_tools?hl=en
/ads/
/services/
/intl/en/about.html

-- ElizabethWiethoff


CategoryRuby


Loading...