Friday, December 08, 2006
What to do with dirty data in a syndication feed
Hello all,
Yes, it has been some time since I posted on this blog. I have a bunch of other blogs that have gotten more attention of late, and finding appropriate stuff to post here is sometimes tricky. Anyway, on to the meat of this one:
So I've been working with feeds of late and found myself needing a way to handle the case where a feed entry has no summary but does have main content. Some of the time this content contains HTML, and I hope I don't need to tell you what a pain that can be when you are trying to create a summary from it. So what did I do, you ask? I used a simple regular expression to strip the tags and then took a substring (the first 100 characters) of the resulting string.
Here is the code (Java, BTW):
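public String process(String content) {
    // Strip anything that looks like an HTML tag; [^>] also matches newlines,
    // so tags that span several lines are removed too.
    String result = content.replaceAll("<[^>]*>", "").trim();
    // Keep at most the first 100 characters and append an ellipsis.
    return result.substring(0, Math.min(100, result.length())) + "...";
}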
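The method above always appends "..." and can cut a word in half at the 100-character mark. If that bothers you, here is a minimal sketch of a variant that handles null input, collapses leftover whitespace, cuts at the last space before the limit, and only adds the ellipsis when something was actually dropped. The class name FeedSummary, the method name summarize and the maxLength parameter are my own inventions for illustration, not part of any feed library, and the regex is still only a rough tag stripper, not a real HTML parser.

```java
// Hypothetical helper class; the names here are illustrative only.
final class FeedSummary {

    // Builds a plain-text summary of at most maxLength characters (plus "..."
    // when the text had to be shortened). Assumes maxLength is not negative.
    static String summarize(String content, int maxLength) {
        if (content == null) {
            return "";
        }
        // Replace tags with a space so "<p>One</p><p>Two</p>" does not become "OneTwo",
        // then collapse runs of whitespace into single spaces.
        String text = content.replaceAll("<[^>]*>", " ")
                             .replaceAll("\\s+", " ")
                             .trim();
        if (text.length() <= maxLength) {
            return text; // short enough: no truncation, no ellipsis
        }
        String cut = text.substring(0, maxLength);
        // Back up to the last space so the summary does not end mid-word.
        int lastSpace = cut.lastIndexOf(' ');
        if (lastSpace > 0) {
            cut = cut.substring(0, lastSpace);
        }
        return cut + "...";
    }

    public static void main(String[] args) {
        System.out.println(summarize("<p>Hello <b>world</b></p>", 100));      // Hello world
        System.out.println(summarize("<p>The quick brown fox jumps</p>", 12)); // The quick...
    }
}
```

Replacing each tag with a space is a trade-off: it keeps block-level elements from running together, but inline markup in the middle of a word (say, wor<b>ld</b>) will come out as two words.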