[tei-council] TEI Newsfeed problem
James Cummings
James.Cummings at it.ox.ac.uk
Mon Dec 31 06:24:18 EST 2012
Hiya David and Council
My inbox is being spammed by the TEI Newsfeed because it keeps
updating. For some reason, when we wget the Atom feed the links
in it have sometimes been turned into <a> elements and sometimes
not. Weird eh?
e.g. in 2012-12-31T05:30:03.atom.xml we have:
===
You can see the details of the vote from here:
https://opavote.appspot.com/results/678015/0</p>
===
whereas in 2012-12-31T05:45:03.atom.xml we have:
===
You can see the details of the vote from here: <a
href="https://opavote.appspot.com/results/678015/0"
rel="nofollow">https://opavote.appspot.com/results/678015/0</a></p>
===
Nothing has really changed in the meantime; WordPress just seems
to serialize its HTML in one of the two ways at random.
The tests that wget-news.sh does are:
1) If identical then stop
2) If not well-formed then error
3) If it doesn't have at least one <entry in it, then error
Previously we used to test if one was bigger than the other, but
that now won't work since we don't see all articles.
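As a rough sketch only (this is not the actual wget-news.sh), the
three tests above could be written in bash along these lines. The
function name check_feed, its exit codes, and the use of xmllint
for the well-formedness check are all assumptions for illustration.

```shell
#!/bin/bash
# check_feed OLD NEW  (hypothetical helper, not taken from wget-news.sh)
# Return status: 0 = NEW is a usable update
#                1 = NEW is identical to OLD, nothing to do
#                2 = NEW is not well-formed XML
#                3 = NEW has no "<entry" in it
check_feed() {
    local old=$1 new=$2

    # 1) If identical then stop
    if cmp -s "$old" "$new"; then
        return 1
    fi

    # 2) If not well-formed then error
    #    (assumes xmllint is installed; a missing xmllint also lands here)
    if ! xmllint --noout "$new" 2>/dev/null; then
        return 2
    fi

    # 3) If it doesn't have at least one <entry in it, then error
    if ! grep -q '<entry' "$new"; then
        return 3
    fi

    return 0
}
```

Note that test 1 is a byte-for-byte comparison, which is why a
purely cosmetic change in the serialization counts as an update.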
The wget-news.sh script is in SVN in the TEIC/newsfeed/ directory:
http://tei.svn.sourceforge.net/viewvc/tei/trunk/TEIC/newsfeed/wget-news.sh?revision=11283&content-type=text%2Fplain
Can anyone think of a bash shell script test that we could do to
see if this is a real changed/updated file or not? Is there a
wordpress setting we can change? Is there some way to wget it
while telling it we do/don't want this? Any ideas appreciated.
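One possible bash test, as a minimal sketch: normalise both files
before comparing them, by collapsing whitespace and dropping the
<a ...> and </a> tags, so that a linked and an unlinked URL look
the same. The function names normalise_feed and same_feed are made
up for illustration. It assumes the anchors appear as literal
markup in the feed (as in the excerpts above) rather than escaped
as &lt;a ...&gt;, and it would also hide a real edit that did
nothing except add or remove a link.

```shell
#!/bin/bash
# normalise_feed FILE  (hypothetical helper)
# Print FILE with runs of whitespace collapsed to one space and all
# <a ...> / </a> tags removed. The pattern requires a space or ">"
# right after "<a", so Atom's <author> element is left alone.
normalise_feed() {
    tr -s '[:space:]' ' ' < "$1" |
        sed -E 's/<a( [^>]*)?>//g; s/<\/a>//g'
}

# same_feed OLD NEW  (hypothetical helper)
# Succeeds (status 0) if the two files differ only in whitespace
# or in whether URLs have been wrapped in <a> elements.
same_feed() {
    [ "$(normalise_feed "$1")" = "$(normalise_feed "$2")" ]
}
```

In the script this could replace the "identical" test, e.g.
`if same_feed "$old" "$new"; then exit 0; fi`, where $old and $new
stand for the previous and freshly fetched copies.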
It doesn't happen every 15 minutes: sometimes the links come back
the same way (expanded or not) as in the previous fetch. But it
happened more than 40 times on 2012-12-30. I could easily enough
turn off the emails telling me the Atom file has updated, though
getting them helps me know if there are any sudden problems.
However, currently every time the newsfeed updates it makes a
dateTime-stamped backup copy in:
/lv1/projects/tei/web/include/news-old/ on the tei-c.org machine.
At 224K each, the files from 2012-12-30 took up about 10 MB. While
we're not going to run out of space soon (14G currently available
on /lv1), it does seem like a problem we should solve.
Any suggestions appreciated.
-James
--
Dr James Cummings, James.Cummings at it.ox.ac.uk
Research Support, IT Services, University of Oxford