Posts Tagged ‘regex’

Absolutize Relative Links Using PHP and Preg_Replace_Callback

Posted 13 Jan 2010 — by admin
Category php

I was looking for a simple PHP script that rewrites the relative hrefs in a scraped web page as absolute URLs, and ended up writing one myself. I used the preg_replace_callback function so that each match is passed to a callback as a single array.

<?php
$domain = "http://seanbehan.com";
$pattern = "/\bhref=[\"'](.*?)[\"']/";
$string = file_get_contents($domain);

// Prepends $domain to relative links; links that are already absolute are returned as they are
function replace_href($match){
  global $domain;
  if(substr($match[1], 0, 7)!=="http://" && substr($match[1],0,8)!=="https://"){
    return "href='".$domain.$match[1]."'";
  } else {
    return "href='".$match[1]."'";
  }
}
print preg_replace_callback($pattern, "replace_href", $string);
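
The callback above joins the domain and the href as plain strings, which suits root-relative paths such as /about. For hrefs like post.html, a URL resolver is a safer choice. Here is a minimal sketch of that idea in Ruby (the language of the Rails post further down), using the standard library's URI.join. The base URL and the sample HTML are assumptions made up for the example.

```ruby
require 'uri'

# Hypothetical base URL and HTML fragment, for illustration only.
base = "http://seanbehan.com/blog/"
html = %q(<a href="/about">About</a> <a href='post.html'>Post</a> <a href="https://example.com/">Ext</a>)

# Same idea as the PHP pattern: an href attribute in single or double quotes.
pattern = /\bhref=["'](.*?)["']/

absolute = html.gsub(pattern) do
  # URI.join resolves a relative reference against the base and
  # leaves an already-absolute URL unchanged.
  "href='#{URI.join(base, $1)}'"
end

puts absolute
```

Resolving against the base also handles the case where the page lives in a subdirectory, which plain concatenation does not.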

Regular Expression for finding absolute URLs

Posted 15 Sep 2009 — by admin
Category Programming

A regular expression for finding absolute URLs in a block of text, such as a log file:

/(https?:\/\/\S+)/
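
As a rough illustration of how such a pattern might be applied, here is a minimal Ruby sketch that scans a made-up log excerpt. The sample text and variable names are assumptions. The pattern used here matches the scheme followed by every non-whitespace character, so it accepts https links as well and does not capture the trailing whitespace.

```ruby
# Hypothetical log excerpt; the lines and URLs are made up for illustration.
log = <<~LOG
  GET http://example.com/index.html 200
  referrer https://example.org/search?q=regex 304
  no links on this line
LOG

# Match the scheme, then every non-whitespace character that follows it.
url_pattern = %r{https?://\S+}

# With no capture groups, scan returns an array of the matched strings.
urls = log.scan(url_pattern)
puts urls
```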

Email Obfuscation and Extraction from Text with Rails

Posted 10 Jul 2009 — by admin
Category Ruby on Rails

There is a helper method for handling the obfuscation of email addresses in Rails.

mail_to "me@domain.com", "My email", :encode => "hex"
 # => renders a "My email" link whose mailto: href is hex-encoded
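
To give a feel for what hex encoding does to an address, here is a minimal plain-Ruby sketch. It is an approximation, assuming that hex encoding percent-escapes each word character of the address for use in the mailto: href. It is not the Rails source, and hex_encode is a hypothetical name.

```ruby
# Hypothetical helper: percent-escape every word character as %xx,
# leaving characters such as "@" and "." untouched.
def hex_encode(address)
  address.gsub(/\w/) { |char| format('%%%02x', char.ord) }
end

encoded = hex_encode("me@domain.com")
puts "mailto:#{encoded}"
```

A browser decodes the escaped href back to the plain address when the link is clicked, while a naive scraper looking for something@something in the page source will not find it.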

If you then want to pick out an email address (or all email addresses) from a block of text and obfuscate it, here is the code. I created a helper method called “emailitize” and put it in the ApplicationHelper module in helpers/application_helper.rb:

module ApplicationHelper
  # Takes a string and returns it with email addresses encoded and hyperlinked
  def emailitize(text)
    text.gsub(/([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})/i) {|m|
        mail_to(m, m.gsub("@", "[at]"), :encode=>:hex)
    }
  end
end

It’s important to remember that you need to pass a block to the gsub method. You can’t pass the replacement as a second argument instead, like this:

text.gsub( /([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})/i, mail_to('\\1@\\2', '\\1@\\2', :encode=>:hex) )

This runs, but the encoding fails: mail_to is evaluated once, before gsub is called, so it encodes the literal string '\\1@\\2' rather than each matched address.
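
The difference can be reproduced in plain Ruby without Rails. In this minimal sketch, shout is a hypothetical stand-in for mail_to, and the sample text is made up. The non-block form calls shout once, before gsub runs, so it only ever sees the backreference placeholder. The block form calls it once per match with the real address.

```ruby
email_pattern = /([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})/i

# Hypothetical stand-in for mail_to: any method that transforms its argument.
def shout(s)
  s.upcase
end

text = "write to me@domain.com"

# shout runs once, on the placeholder string \1@\2 (upcase leaves it unchanged);
# only afterwards does gsub substitute the backreferences.
eager = text.gsub(email_pattern, shout('\\1@\\2'))

# The block runs once per match, so shout receives the actual address.
lazy = text.gsub(email_pattern) { |m| shout(m) }

puts eager # write to me@domain.com
puts lazy  # write to ME@DOMAIN.COM
```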

You can then use this helper in your views:

<%= emailitize @job.how_to_apply %>

More information is available in the Rails and Ruby docs:

http://api.rubyonrails.org/classes/ActionView/Helpers/UrlHelper.html#M001887

http://ruby-doc.org/core/classes/String.html#M000817

Parse for Links with Prototype JS

Posted 25 Mar 2009 — by admin
Category Programming

Parsing for links with the Prototype JavaScript library is easy. Here is the pattern for finding links:

/(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/

To apply it, loop through the containers that might contain links:

document.observe("dom:loaded", function(){
  var posts = $$("div#posts");
  for(var i = 0; i < posts.length; i++){
    var link_regex = /(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/;
    var parsed_string = posts[i].innerHTML.gsub(link_regex, '<a href="#{0}" target="_blank">#{0}</a>');
    posts[i].innerHTML = parsed_string;
  }
});