Absolutize Relative Links Using PHP and preg_replace_callback

I needed a simple PHP script that replaces relative hrefs in scraped web pages with absolute URLs, so I wrote one myself. It uses the preg_replace_callback function, which passes each match to a callback as a single array.

<?php
$domain = "http://seanbehan.com";
$pattern = "/\bhref=[\"'](.*?)[\"']/i"; // no "|" inside the character class: it would match a literal pipe
$string = file_get_contents($domain);

// Prepends $domain to relative links; returns the match unchanged if it is already absolute.
function replace_href($match){
  global $domain;
  if(substr($match[1], 0, 7)!=="http://" && substr($match[1],0,8)!=="https://"){
    // ltrim() avoids a doubled or missing slash between the domain and the path
    return "href='".$domain."/".ltrim($match[1], "/")."'";
  } else {
    return $match[0];
  }
}
print preg_replace_callback($pattern, "replace_href", $string);
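
The script above treats every non-http(s) href as a path on the domain root, which mangles things like #anchors, mailto: links and protocol-relative URLs. Below is a rough sketch of a more careful variant. The function names absolutize_href and absolutize_links are my own, hypothetical helpers, not part of PHP, and the sketch assumes the base URL has a scheme and a host. It does not collapse "../" segments or treat query-only hrefs specially.

```php
<?php
// Hypothetical helper: resolve a single href against a base URL.
// Assumes $base contains at least a scheme and a host.
function absolutize_href(string $href, string $base): string {
    // Leave empty values, fragments, and anything with a scheme
    // (http:, https:, mailto:, javascript:, ...) untouched.
    if ($href === '' || $href[0] === '#' || preg_match('/^[a-z][a-z0-9+.-]*:/i', $href)) {
        return $href;
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    // Protocol-relative link: reuse the scheme of the base URL.
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href;
    }
    // Root-relative link: attach directly to the origin.
    if ($href[0] === '/') {
        return $origin . $href;
    }
    // Document-relative link: attach to the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Hypothetical helper: rewrite every quoted href attribute in an HTML string.
function absolutize_links(string $html, string $base): string {
    // Group 1 captures the quote character so it can be reused as-is;
    // group 2 captures the href value.
    $pattern = '/\bhref=(["\'])(.*?)\1/i';
    return preg_replace_callback($pattern, function (array $m) use ($base) {
        return 'href=' . $m[1] . absolutize_href($m[2], $base) . $m[1];
    }, $html);
}
```

Passing the base URL through a closure avoids the global variable, and keeping the original quote character means the rest of the markup is left as it was. Usage would look something like print absolutize_links(file_get_contents($domain), $domain);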

1 Comment

  1. a little shorter still…
    <?php
    $url = parse_url($_GET['url']);
    function absolutize($string){
    global $url;
    $absolute = substr($string[1],0,7) == "http://" ? true : false;
    if($absolute){ return $string[0]; }
    return "href='".$url['scheme']."://".$url['host'].$string[1]."'";
    }
    $contents = file_get_contents($_GET['url']);
    $contents = preg_replace_callback("/\bhref=[\"|'](.*?)[\"|']/", "absolutize", $contents);
    print $contents;


