Friday, December 08, 2006

 

What to do with dirty data in a Syndication feed

Hello all,

Yes, it has been some time since I posted on this blog. I have a bunch of other blogs that have gotten more attention of late, and finding appropriate stuff to post here is sometimes tricky. Anyway, on to the meat of this one:

So I've been working with feeds of late and found myself in need of a way to handle the case where a feed entry has no summary but does have the main content. Some of the time this content contains HTML, and I hope I don't need to tell you what a pain that can be when trying to create a summary. So what did I do, you ask? I used a simple regular expression to strip the tags and then took a substring of the resulting string.

Here is the code (Java, BTW):

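public String process(String content){

// Strip anything that looks like a tag. [^>]* also matches line breaks and
// avoids the deep recursion that (.|\n)*? can cause on long input.
String result = content.replaceAll("<[^>]*>", "").trim();

// Truncate to 100 characters, adding "..." only when something was cut off.
return (result.length() > 100) ? result.substring(0, 100) + "..." : result;

}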
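If you want the summary to look a little tidier, here is a rough variation on the same idea. It is only a sketch: the class name FeedSummarizer, the method name summarize, and the maxLength parameter are made up for illustration and are not part of any feed library. It assumes you would rather replace each tag with a space (so text from adjacent paragraphs does not run together), collapse the leftover whitespace, and cut at the last space before the limit so the summary does not end in the middle of a word. It still does not decode entities such as &amp;amp;, and a regular expression is no substitute for a real HTML parser on badly broken markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the names here are assumptions.
final class FeedSummarizer {

    // Anything that looks like a tag. [^>] also matches line breaks.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Runs of whitespace left behind once the tags are gone.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String summarize(String content, int maxLength) {
        if (content == null) {
            return "";
        }

        // Replace tags with a space so "<p>one</p><p>two</p>" becomes "one two".
        String text = TAG.matcher(content).replaceAll(" ");
        text = WHITESPACE.matcher(text).replaceAll(" ").trim();

        // Short enough already: return it without an ellipsis.
        if (text.length() <= maxLength) {
            return text;
        }

        // Back up to the last space at or before the limit.
        int cut = text.lastIndexOf(' ', maxLength);
        if (cut <= 0) {
            // No space to break on, so fall back to a hard cut.
            cut = maxLength;
        }
        return text.substring(0, cut) + "...";
    }
}
```

Calling FeedSummarizer.summarize(entryContent, 100) gives roughly the same result as the method above, where entryContent stands for whatever string your feed library hands you. The trade-off of swapping tags for spaces is that inline markup in the middle of a word, such as wor&lt;b&gt;ld&lt;/b&gt;, ends up split in two.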
