php: absolute links file_get_contents php preg_replace_callback regex relative links
by bseanvt
Absolutize Relative Links Using PHP and Preg_Replace_Callback
I was looking for a simple PHP script that replaces the relative hrefs in a scraped web page with absolute URLs, and ended up writing one myself. It uses the preg_replace_callback function so that each match is passed to the callback as a single array argument.
<?php
$domain = "http://seanbehan.com";
$pattern = "/\bhref=[\"'](.*?)[\"']/";
$string = file_get_contents($domain);
// Prepends $domain to relative links; leaves links that are already absolute as they are
function replace_href($match){
global $domain;
if(substr($match[1], 0, 7)!=="http://" && substr($match[1],0,8)!=="https://"){
return "href='".$domain.$match[1]."'";
} else {
return "href='".$match[1]."'";
}
}
print preg_replace_callback($pattern, "replace_href", $string);
Regular Expression for finding absolute URLs
A regular expression for finding absolute URLs in a block of text, such as a log file. Note that it matches only http: URLs that are followed by whitespace.
/(http:(.*?)\s)/
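As a rough illustration of how such a pattern might be applied to log text, here is a minimal Ruby sketch. The pattern is a variation (an assumption, not the expression above): it also accepts https and stops at the first whitespace character without capturing it. The name extract_urls and the sample log lines are made up for the example.

```ruby
# Variation on the pattern above (an assumption, not the original expression):
# accepts http or https and matches up to, but not including, the next whitespace.
URL_PATTERN = %r{https?://\S+}

# Hypothetical helper: returns every absolute URL found in the given text.
def extract_urls(text)
  text.scan(URL_PATTERN)
end

# Made-up log lines, for illustration only.
log = <<~LOG
  127.0.0.1 GET http://seanbehan.com/index.html 200
  127.0.0.1 GET https://example.com/about 404
  no url on this line
LOG

puts extract_urls(log)
```

Because the pattern has no capture groups, scan returns a flat array of the matched strings, one per URL.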
Ruby on Rails: email hyperlinking obfuscation parsing recipes regex regular expressions security
by bseanvt
Email Obfuscation and Extraction from Text with Rails
There is a helper method for handling the obfuscation of email addresses in Rails.
mail_to "me@domain.com", "My email", :encode => "hex" # => a link labelled "My email" with the address hex-encoded
If you then want to extract an email address (or all email addresses) from a block of text, here is the code. I created a helper method called "emailitize" and put it in the ApplicationHelper module in helpers/application_helper.rb
module ApplicationHelper
# Takes a string and returns the same string with email addresses encoded and hyperlinked
def emailitize(text)
text.gsub(/([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})/i) {|m|
mail_to(m, m.gsub("@", "[at]"), :encode=>:hex)
}
end
end
It's important to pass a block to the gsub method. You can't pass the mail_to call as the replacement argument, like this:
text.gsub( /([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})/i, mail_to('\\1@\\2', '\\1@\\2', :encode=>:hex) )
This runs, but the encoding fails: mail_to is evaluated once, before gsub is called, so it encodes the literal string '\\1@\\2' rather than each matched address.
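The difference can be seen outside Rails with a small Ruby sketch. Here hex_encode is a hypothetical stand-in for the encoding that mail_to performs (it is not the Rails helper), and the sample sentence is made up.

```ruby
EMAIL_PATTERN = /([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})/i

# Hypothetical stand-in for mail_to's encoding step (not the Rails helper):
# turns every byte of the string into two hex digits.
def hex_encode(str)
  str.unpack1("H*")
end

text = "Write to me@domain.com today"

# Block form: hex_encode runs once per match and receives the matched address.
with_block = text.gsub(EMAIL_PATTERN) { |m| hex_encode(m) }

# Argument form: hex_encode runs once, before gsub is called, and receives
# the literal five-character string \1@\2, so the backreferences are lost.
with_argument = text.gsub(EMAIL_PATTERN, hex_encode('\1@\2'))

puts with_block
puts with_argument
```

In the argument form the backreferences are already hex digits by the time gsub sees them, so every address is replaced with the same fixed string.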
You can then use this helper in your views:
<%= emailitize @job.how_to_apply %>
More information is available in the Rails and Ruby docs:
http://api.rubyonrails.org/classes/ActionView/Helpers/UrlHelper.html#M001887
http://ruby-doc.org/core/classes/String.html#M000817
Programming: interface javascript js parse prototype regex view
by bseanvt
Parse for Links with Prototype JS
Parsing for links with the Prototype JavaScript library is easy. Here is the pattern for finding links:
/(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^ =%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/
To apply it, loop through the containers that might contain links:
document.observe("dom:loaded", function(){
var posts = $$("div#posts");
for(var i = 0; i < posts.length; i++){
var link_regex = /(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^ =%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/;
var parsed_string = posts[i].innerHTML.gsub(link_regex, '<a href="#{0}" target="_blank">#{0}</a>');
posts[i].innerHTML = parsed_string;
}
});