[tei-council] TEI Newsfeed problem

David Sewell dsewell at virginia.edu
Mon Dec 31 10:20:55 EST 2012


As Kevin had a query to me about the newsfeed and I think I may have a 
solution to James's problem, I'm going to respond offline to the two of 
them so as not to clutter the list with technical details.

David

On Mon, 31 Dec 2012, James Cummings wrote:

> Hiya David and Council
>
> My email has been being spammed by the TEI Newsfeed because it keeps 
> updating.  This is because, for some reason, when we wget the Atom feed 
> sometimes it has changed the links to be <a> elements, and sometimes it 
> hasn't.  Weird eh?
>
> e.g. in 2012-12-31T05:30:03.atom.xml we have:
>
> ===
> You can see the details of the vote from here: 
> https://opavote.appspot.com/results/678015/0</p>
> ===
>
> whereas in  2012-12-31T05:45:03.atom.xml we have:
>
> ===
> You can see the details of the vote from here: <a 
> href="https://opavote.appspot.com/results/678015/0" 
> rel="nofollow">https://opavote.appspot.com/results/678015/0</a></p>
> ===
>
> Nothing has really changed in the meantime; WordPress just seems to 
> serialize its HTML in one of the two ways at random.
>
> The tests that wget-news.sh does are:
>
> 1) If identical then stop
> 2) If not well-formed then error
> 3) If it doesn't have at least one <entry in it, then error
>
> Previously we used to test if one was bigger than the other, but that now 
> won't work since we don't see all articles.
>
> The wget-news.sh script is in SVN in the TEIC/newsfeed/ directory:
> http://tei.svn.sourceforge.net/viewvc/tei/trunk/TEIC/newsfeed/wget-news.sh?revision=11283&content-type=text%2Fplain
>
> Can anyone think of a bash shell script test that we could do to see if this 
> is a real changed/updated file or not? Is there a wordpress setting we can 
> change? Is there some way to wget it while telling it we do/don't want this? 
> Any ideas appreciated.
>
> It isn't doing it every 15 minutes ... sometimes the expansion or not is the 
> same as the previous time it grabbed it, but it did it more than 40 times on 
> 2012-12-30. Now I could turn off the emailing of the updated Atom file to me 
> easily enough -- though getting this helps me know if there are any sudden 
> problems.
>
> However, currently every time the newsfeed updates it makes a 
> dateTime-stamped backup copy in: /lv1/projects/tei/web/include/news-old/ on 
> the tei-c.org machine.   At 224K each, the files from 2012-12-30 took up 10 
> meg. While we're not going to run out of space soon (14G currently available 
> on /lv1), it does seem like a problem we should solve.
>
> Any suggestions appreciated.
>
> -James
>
>
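
One possible answer to the bash test James asks about, offered only as a rough sketch: strip the <a ...> and </a> tags from both copies of the feed, collapse whitespace, and compare what is left. The function names normalize_feed and same_apart_from_links are hypothetical and are not part of wget-news.sh. The sketch assumes the tags appear literally in the file, as in the quoted samples; if the feed stores them escaped (&lt;a ...&gt;), the sed patterns would need adjusting.

```shell
#!/bin/bash
# Hypothetical helpers, NOT part of the existing wget-news.sh.

# Print a feed with <a ...> and </a> tags removed and all runs of
# whitespace collapsed to one space, so that an auto-linked URL and a
# plain URL produce the same text. "<a " requires a space after the a,
# so Atom elements such as <author> are left alone.
normalize_feed() {
    tr -s '[:space:]' ' ' < "$1" \
        | sed -e 's/<a [^>]*>//g' -e 's/<\/a>//g' \
        | tr -s ' '
}

# Succeed (exit status 0) if the two files differ only in link markup
# or whitespace; fail otherwise.
same_apart_from_links() {
    [ "$(normalize_feed "$1")" = "$(normalize_feed "$2")" ]
}
```

The script could then call something like same_apart_from_links "$OLD" "$NEW" && exit 0 right after its "identical" test, where $OLD and $NEW stand in for whatever variables hold the two file names. The trade-off is that a change affecting only a link's href, and not its visible text, would go unnoticed.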

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/

