[tei-council] identifying tickets

Tue Apr 12 23:19:12 EDT 2011

On the suspicion that I wasn't the only one who has found tracking down 
tickets over the last two days a bit of a pain, I have written a script 
that converts the SourceForge RSS feeds into TEI:

http://dl.dropbox.com/u/4479425/sourceforgeRSS2TEI/sourceforgeRSS2TEI.xsl

Current results are at:

http://dl.dropbox.com/u/4479425/sourceforgeRSS2TEI/bugs.xml
http://dl.dropbox.com/u/4479425/sourceforgeRSS2TEI/features.xml
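For anyone curious about the general shape of the conversion, here is a rough sketch of the same idea in Python rather than XSLT. It is not the stylesheet linked above. The choice of columns (the RSS 2.0 item children title, link and pubDate), the TEI table layout and the function names are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# RSS 2.0 <item> children copied into table cells (assumed, not taken from the XSLT).
COLUMNS = ("title", "link", "pubDate")


def tei(tag):
    """Return a Clark-notation name in the TEI namespace, e.g. '{...}row'."""
    return f"{{{TEI_NS}}}{tag}"


def rss_to_tei_table(rss_text):
    """Map each RSS <item> to a TEI <row>, one <cell> per column (hypothetical helper)."""
    channel = ET.fromstring(rss_text).find("channel")
    table = ET.Element(tei("table"))
    # role="label" marks the first row as column headings in TEI.
    header = ET.SubElement(table, tei("row"), role="label")
    for name in COLUMNS:
        ET.SubElement(header, tei("cell")).text = name
    for item in channel.iter("item"):
        row = ET.SubElement(table, tei("row"))
        for name in COLUMNS:
            # findtext returns the default when the child element is missing.
            ET.SubElement(row, tei("cell")).text = item.findtext(name, default="")
    return table


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel><title>Bugs</title>
<item><title>Example bug</title><link>http://example.org/1</link>
<pubDate>Tue, 12 Apr 2011 23:19:12 -0400</pubDate></item>
</channel></rss>"""
    # Serialise TEI as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", TEI_NS)
    print(ET.tostring(rss_to_tei_table(sample), encoding="unicode"))
```

A table keeps the "one ticket per row" view that the wiki would have given, while staying in TEI.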

I seriously considered generating wiki markup instead, so it could be 
cut-n-pasted into the wiki (where by-column sorting functionality and 
fancy layout options are 'free'), but figured that TEI could be 
re-purposed by more people. I'm open to suggestions as to where this 
script and the resulting data should be exposed publicly.

Feedback is welcome, as always.

Current known issues:

* Double escaping of <, > and & (that's how they arrive in the RSS, and I 
didn't want to start shaving that yak)
* Limit of 100 bugs + 100 features (can use multiple requests + xinclude 
to get more if needed?)
* Comments appear to be unavailable via the API (and I'm not scraping the HTML)
* The title field may be truncated if it contains a <
* Dates aren't in an easily sortable format
* The header is impoverished, as is usual for such documents
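Two of these, the double escaping and the dates, look like they could be handled as a post-processing step. Below is a rough Python sketch that assumes the values have already been pulled out of the RSS as strings. It is not part of the XSLT, and the function names are made up for illustration. Normalising to UTC matters because ISO 8601 strings carrying different offsets don't sort chronologically.

```python
import html
from datetime import timezone
from email.utils import parsedate_to_datetime


def undo_double_escaping(text):
    """Decode the leftover entity layer, e.g. '&lt;b&gt;' -> '<b>' (hypothetical helper)."""
    return html.unescape(text)


def rss_date_to_iso(pub_date):
    """Turn an RFC 822 pubDate into an ISO 8601 UTC string that sorts correctly."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        # A "-0000" offset ("zone unknown") yields a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat()


if __name__ == "__main__":
    print(undo_double_escaping("x &lt; y &amp;&amp; y &gt; z"))  # x < y && y > z
    print(rss_date_to_iso("Tue, 12 Apr 2011 23:19:12 -0400"))  # 2011-04-13T03:19:12+00:00
```

Sorting the raw pubDate strings orders by weekday name. Sorting the converted strings orders by time.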

How I use the script:

wget --output-document bugs.rss "http://sourceforge.net/api/artifact/index/tracker-id/644062/limit/1000/rss"
wget --output-document features.rss "http://sourceforge.net/api/artifact/index/tracker-id/644065/limit/1000/rss"
xsltproc ./sourceforgeRSS2TEI.xsl ./bugs.rss | xmllint --format - > bugs.xml
xsltproc ./sourceforgeRSS2TEI.xsl ./features.rss | xmllint --format - > ./features.xml
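The two outputs could also be stitched into a single file, along the lines of the XInclude idea in the known-issues list. Below is a rough Python sketch that assumes each output file has a TEI root element. The wrapper's header text and the function name are placeholders, not anything the script produces.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XI = "http://www.w3.org/2001/XInclude"


def q(ns, tag):
    """Clark-notation qualified name, e.g. '{ns}tag'."""
    return f"{{{ns}}}{tag}"


def xinclude_corpus(title, hrefs):
    """Serialise a teiCorpus that pulls in each href via xi:include (hypothetical helper)."""
    ET.register_namespace("", TEI)
    ET.register_namespace("xi", XI)
    corpus = ET.Element(q(TEI, "teiCorpus"))
    # Minimal header: fileDesc requires titleStmt, publicationStmt and sourceDesc.
    file_desc = ET.SubElement(ET.SubElement(corpus, q(TEI, "teiHeader")), q(TEI, "fileDesc"))
    title_stmt = ET.SubElement(file_desc, q(TEI, "titleStmt"))
    ET.SubElement(title_stmt, q(TEI, "title")).text = title
    for section, note in (("publicationStmt", "Unpublished working file."),
                          ("sourceDesc", "Generated from SourceForge tracker RSS feeds.")):
        ET.SubElement(ET.SubElement(file_desc, q(TEI, section)), q(TEI, "p")).text = note
    for href in hrefs:
        ET.SubElement(corpus, q(XI, "include"), href=href)
    return ET.tostring(corpus, encoding="unicode")


if __name__ == "__main__":
    print(xinclude_corpus("TEI tickets", ["bugs.xml", "features.xml"]))
```

With the output saved as tickets.xml next to bugs.xml and features.xml, running `xmllint --xinclude --format tickets.xml` should expand the includes into one document.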

cheers
stuart
-- 
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/