How to Extract all Images from a Webpage with Ruby
Here is a little Ruby snippet that will download all the pictures from a webpage. Rather than using XPath, we first reduce the source code to everything inside of quotes. Some websites use JSON within a script tag to lazy load images an...
Written by Sean Behan on 08/01/2018
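Here is a minimal sketch of the quote-scanning idea, assuming that image URLs show up as quoted strings ending in a common image extension. The helper names (`quoted_strings`, `image_urls`), the extension list and the sample HTML are hypothetical illustrations, not the post's code. The sketch only covers extraction. Each resulting URL could then be fetched with `Net::HTTP` or `URI.open` from `open-uri` and written to disk.

```ruby
require "uri"

# Assumed list of file extensions treated as images (hypothetical).
IMAGE_EXTENSIONS = %w[jpg jpeg png gif webp svg].freeze
IMAGE_PATTERN = /\.(?:#{IMAGE_EXTENSIONS.join("|")})(?:\?.*)?\z/i

# Step 1: reduce the page source to everything inside double or single
# quotes. This covers HTML attributes and JSON strings in <script> tags.
def quoted_strings(html)
  html.scan(/"([^"]*)"|'([^']*)'/).flatten.compact
end

# Step 2: keep strings ending in an image extension (optionally followed
# by a query string) and resolve them against the page URL.
def image_urls(html, page_url)
  # JSON inside <script> tags often escapes "/" as "\/", so undo that.
  candidates = quoted_strings(html).map { |s| s.gsub('\/', "/") }
  candidates = candidates.select { |s| s.match?(IMAGE_PATTERN) }
  urls = candidates.filter_map do |s|
    URI.join(page_url, s).to_s
  rescue URI::Error
    nil # skip strings that are not valid URIs
  end
  urls.uniq
end

html = <<~HTML
  <img src="/images/logo.png" alt="Logo">
  <script>var data = {"hero": "https:\\/\\/cdn.example.com\\/hero.jpg"};</script>
  <a href='about.html'>About</a>
HTML

puts image_urls(html, "https://example.com/blog/")
```

This prints `https://example.com/images/logo.png` and `https://cdn.example.com/hero.jpg`. The lazy-loaded JSON URL is picked up even though it never appears in an `src` attribute, which is the advantage of scanning quotes over an XPath query.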
Regex for Extracting URLs in Plain Text
Here is a regex for extracting URLs from text, specifically URLs that are not already hyperlinked and are not source attributes of images or iframes. This example is in PHP. I was trying to format a WordPress page to auto-hyperlink URLs but preserve embedded ima...
Written by Sean Behan on 04/14/2017
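The post's example is PHP. As a rough Ruby illustration of the same goal, one option is a negative lookbehind that rejects URLs directly preceded by a quote, an equals sign or a closing `>`. Those are the URLs already sitting in an `href`/`src` attribute or used as link text. The pattern and the `autolink` helper are my own sketch, not the regex from the post.

```ruby
# Matches http(s) URLs that are NOT directly preceded by a quote, an
# equals sign or ">", i.e. URLs not already inside an attribute (href,
# src) and not already the text of an existing link.
BARE_URL = %r{(?<![="'>])\bhttps?://[^\s<>"']+}i

# Wrap each bare URL in an anchor tag; everything else is left alone.
def autolink(text)
  text.gsub(BARE_URL) { |url| %(<a href="#{url}">#{url}</a>) }
end

html = 'See http://example.com and <img src="http://example.com/a.png"> ' \
       'or <a href="http://example.com/b">http://example.com/b</a>.'
puts autolink(html)
```

Only the first URL gets wrapped. The image source and the existing link are untouched. A limitation of this approach is that a bare URL the author wrapped in quotes in plain prose is also skipped.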
Matching email addresses in JavaScript
const regex = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/img
"hello sean@example.com how are you? do you know bob@example.com?".match(regex)
// => ["sean@example.com", "bob@example.com"]
Written by Sean Behan on 03/24/2017
How to Create a Slug in Python with the Re Module
There are a few third-party modules that do this sort of thing, but there is a neat solution using out-of-the-box Python functionality. You don't have to install any dependencies if you use the `re` module.
import re
text = ' asdfladf ljklasfj 2324...
Written by Sean Behan on 03/02/2017
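The Python code in the excerpt is cut off. For comparison, here is the same regex-only approach in Ruby with no gems required. The `slugify` name and its exact rules are assumptions rather than the post's code. It lowercases the text, drops punctuation, collapses whitespace and hyphen runs into single hyphens, and trims hyphens at the ends.

```ruby
# Hypothetical helper: turn arbitrary text into a URL slug using only
# core String methods and regular expressions.
def slugify(text)
  slug = text.downcase.strip
  slug = slug.gsub(/[^a-z0-9\s-]/, "") # drop punctuation and other symbols
  slug = slug.gsub(/[\s-]+/, "-")      # collapse runs of spaces/hyphens
  slug.gsub(/\A-+|-+\z/, "")           # trim hyphens at either end
end

puts slugify("  Hello, World! 2017 -- Ruby & Python ")
```

This prints `hello-world-2017-ruby-python`. Note that accented and other non-ASCII letters are simply removed by the first `gsub` rather than transliterated.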
A Ruby Regex for Removing Links and Images from Text
r = /https?:\/\/[\S]+/i
your_string.gsub(r, '')
Here's the Rubular regex to play around with yourself: http://rubular.com/r/SRKkYrW4IJ
Written by Sean Behan on 11/14/2013
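A small usage sketch that wraps the regex above in a helper. The `strip_links` name and the whitespace cleanup are additions for illustration, not part of the original snippet.

```ruby
# The pattern from the post: "http://" or "https://" followed by any
# run of non-whitespace characters.
LINK_PATTERN = /https?:\/\/[\S]+/i

def strip_links(text)
  # gsub returns a new string; squeeze(" ") collapses the double spaces
  # left behind where a link was removed, and strip trims the ends.
  text.gsub(LINK_PATTERN, "").squeeze(" ").strip
end

puts strip_links("Check this out https://example.com/cat.jpg it is great")
```

This prints `Check this out it is great`. Because `[\S]+` is greedy, any punctuation glued to the end of a link (such as a closing period) is removed along with it. `squeeze(" ")` also collapses any intentional runs of spaces elsewhere in the text.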
Email Regex
Regular Expression that Matches Email Addresses: /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/
Written by Sean Behan on 06/17/2012
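In Ruby, the pattern above can be used with `String#scan` to pull every address out of a block of text. The sample text is made up for illustration. It also shows a limitation of `{2,4}`: addresses whose top-level domain is longer than four letters, such as `.museum`, are not matched at all.

```ruby
# The email pattern from above, unchanged.
EMAIL_PATTERN = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/

text = "Contact sean@example.com or bob.smith@mail.example.org; " \
       "info@example.museum is missed"

# With no capture groups in the pattern, scan returns the full matches.
emails = text.scan(EMAIL_PATTERN)
puts emails
```

This prints `sean@example.com` and `bob.smith@mail.example.org`.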
Absolutize Relative Links Using PHP and Preg_Replace_Callback
I was in the market for a simple PHP script to replace the hrefs in scraped web pages with their absolute paths, so I wrote one myself. I used the preg_replace_callback function so that I could pass the parsed results as a single variable.
<?php $domain =...
Written by Sean Behan on 06/17/2012
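The closest Ruby counterpart to `preg_replace_callback` is `String#gsub` with a block. The block runs once per match and its return value replaces the matched text. The following is a hedged sketch, not a port of the truncated PHP above. The `absolutize_links` name and the href pattern are assumptions. Resolution is delegated to `URI.join` from the standard library.

```ruby
require "uri"

# href="..." or href='...'; \1 makes the closing quote match the opening one.
HREF_PATTERN = /href=(["'])(.*?)\1/i

def absolutize_links(html, base_url)
  # Like preg_replace_callback: the block runs once per match and its
  # return value replaces the matched text.
  html.gsub(HREF_PATTERN) do
    original = Regexp.last_match(0)
    quote = Regexp.last_match(1)
    href = Regexp.last_match(2)
    begin
      # URI.join resolves relative paths and leaves absolute URLs as they are.
      "href=#{quote}#{URI.join(base_url, href)}#{quote}"
    rescue URI::Error
      original # keep hrefs that cannot be parsed
    end
  end
end

html = %(<a href="/about">About</a> <a href='contact.html'>Contact</a> ) +
       %(<a href="https://other.com/x">X</a>)
puts absolutize_links(html, "https://example.com/blog/post.html")
```

The root-relative `/about` becomes `https://example.com/about`, the document-relative `contact.html` becomes `https://example.com/blog/contact.html`, and the already absolute link is left alone. Passing the page's full URL as the base, rather than just the domain, is what makes document-relative paths resolve correctly.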
Regular Expression for finding absolute URLs
Regular expression for finding absolute URLs in a bunch of text, such as a log file:
/(http:(.*?)\s)/
Written by Sean Behan on 06/17/2012
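When applied with `String#scan`, this pattern returns its capture groups rather than plain strings, so a little post-processing is needed. The first group also includes the trailing whitespace. The log lines below are invented for the example.

```ruby
# The pattern from the post: "http:" followed lazily by anything up to
# the first whitespace character. Group 1 is the URL plus that
# whitespace; group 2 is everything after "http:".
URL_IN_LOG = /(http:(.*?)\s)/

log = <<~LOG
  127.0.0.1 GET http://example.com/index.html 200
  127.0.0.1 GET http://example.com/about 404
  127.0.0.1 GET https://example.com/secure 200
LOG

# scan returns one [group1, group2] pair per match; keep group 1 and
# drop the whitespace it captured.
urls = log.scan(URL_IN_LOG).map { |full, _rest| full.strip }
puts urls
```

Only the first two URLs are found. The pattern requires the literal text `http:`, so the `https` line is skipped. A variant such as `/https?:\S+/` would catch both schemes without capturing whitespace, though that is a swapped-in pattern rather than the one above.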
Email Obfuscation and Extraction from Text with Rails
Rails has a helper method, mail_to, for obfuscating email addresses.
mail_to "me@domain.com", "My email", :encode => "hex"
# => a link reading "My email" whose mailto: address is hex-encoded
If you then want to extract an email address (or all email addresses) from a block of text, here is the...
Written by Sean Behan on 06/17/2012
Parse for Links with Prototype JS
Parsing for links with the Prototype JavaScript library is easy. Here is the pattern for finding links:
/(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/
And to implement it, you can loop through your con...
Written by Sean Behan on 06/17/2012