[tei-council] TEI Newsfeed problem

James Cummings James.Cummings at it.ox.ac.uk
Mon Dec 31 06:24:18 EST 2012


Hiya David and Council

My inbox is being spammed by the TEI Newsfeed because it keeps 
updating.  For some reason, when we wget the Atom feed, sometimes 
the links in it have been changed into <a> elements and sometimes 
they haven't.  Weird, eh?

e.g. in 2012-12-31T05:30:03.atom.xml we have:

===
You can see the details of the vote from here: 
https://opavote.appspot.com/results/678015/0</p>
===

whereas in  2012-12-31T05:45:03.atom.xml we have:

===
You can see the details of the vote from here: <a 
href="https://opavote.appspot.com/results/678015/0" 
rel="nofollow">https://opavote.appspot.com/results/678015/0</a></p>
===

Nothing has really changed in the meantime; WordPress just seems 
to serialize its HTML in one of the two ways at random.

The tests that wget-news.sh does are:

1) If identical then stop
2) If not well-formed then error
3) If it doesn't have at least one <entry in it, then error
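
For illustration only, those three checks could be sketched roughly 
like this. It is not the actual script: check_feed and its return 
codes are made up here, and it assumes xmllint (from libxml2) is 
installed for the well-formedness check.

```shell
#!/bin/bash
# Sketch only: check_feed is a hypothetical name, not taken from wget-news.sh.
# Usage: check_feed CURRENT_FILE NEWLY_FETCHED_FILE
# Return codes: 0 = changed and usable, 1 = identical (stop), 2 = error.
check_feed() {
    # 1) Identical to the current copy: stop.
    if cmp -s "$1" "$2"; then
        return 1
    fi
    # 2) Not well-formed XML: error (assumes xmllint is available).
    if ! xmllint --noout "$2" 2>/dev/null; then
        return 2
    fi
    # 3) No "<entry" anywhere in the new file: error.
    if ! grep -q '<entry' "$2"; then
        return 2
    fi
    return 0
}
```

None of these three checks can tell a real update from a change 
that only affects how the links are serialized.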

We used to test whether one was bigger than the other, but that 
won't work now since we don't see all articles.

The wget-news.sh script is in SVN in the TEIC/newsfeed/ directory:
http://tei.svn.sourceforge.net/viewvc/tei/trunk/TEIC/newsfeed/wget-news.sh?revision=11283&content-type=text%2Fplain

Can anyone think of a bash shell script test that we could do to 
see if this is a genuinely changed/updated file or not? Is there a 
WordPress setting we can change? Is there some way to wget it 
while telling it we do/don't want this? Any ideas appreciated.
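
One possibility, as an untested sketch rather than a fix: strip the 
<a ...> and </a> tags from both copies before comparing them, so 
that a file which differs only in link markup counts as unchanged. 
The names strip_links and same_modulo_links are invented for this 
example, and it assumes each tag sits on a single line of the feed.

```shell
#!/bin/bash
# Sketch only: both function names are hypothetical, not part of wget-news.sh.

# Print the file with <a ...> start tags and </a> end tags removed,
# keeping the link text. sed works line by line, so this assumes a tag
# is never split across lines in the fetched feed.
strip_links() {
    sed -E 's#<a( [^>]*)?>##g; s#</a>##g' "$1"
}

# Exit 0 when the two files differ at most in that link markup,
# 1 when they differ in some other way, 2 if a temp file can't be made.
same_modulo_links() {
    old_norm=$(mktemp) || return 2
    new_norm=$(mktemp) || { rm -f "$old_norm"; return 2; }
    strip_links "$1" > "$old_norm"
    strip_links "$2" > "$new_norm"
    cmp -s "$old_norm" "$new_norm"
    status=$?
    rm -f "$old_norm" "$new_norm"
    return "$status"
}
```

The script could then treat a successful same_modulo_links on the 
current and newly fetched files the same way as the "identical" 
case. The cost is that an edit which only adds or removes a link 
would also be ignored.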

It doesn't happen every 15 minutes: sometimes the links are 
expanded (or not) the same way as the previous time it grabbed the 
feed. But it happened more than 40 times on 2012-12-30. I could 
easily enough turn off the emailing of the updated Atom file to 
me, though getting it helps me know if there are any sudden 
problems.

However, currently every time the newsfeed updates, it makes a 
dateTime-stamped backup copy in 
/lv1/projects/tei/web/include/news-old/ on the tei-c.org machine. 
At 224K each, the files from 2012-12-30 took up 10 MB. While 
we're not going to run out of space soon (14G currently available 
on /lv1), it does seem like a problem we should solve.

Any suggestions appreciated.

-James

-- 
Dr James Cummings, James.Cummings at it.ox.ac.uk
Research Support, IT Services, University of Oxford

