path: root/textproc/py-feedparser
Age | Commit message | Author | Files | Lines
2011-03-16 | update to 5.0.1 | drochner | 2 | -6/+9
changes: fixes for issues:
- invalid text in XML declaration causes sanitizer to crash
- sanitization can be bypassed by malformed XML comments
- sanitizer doesn't strip unsafe URI schemes
- add test target
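The third sanitizer fix concerns URI schemes: a sanitizer that only filters element and attribute names will still pass something like `href="javascript:..."`. Below is a minimal sketch of a scheme allowlist check using only the standard library. `ACCEPTABLE_URI_SCHEMES` and `is_safe_uri()` are hypothetical names for illustration, not feedparser's code, and the check assumes the attribute value has already been entity-decoded.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; feedparser's real list differs.
ACCEPTABLE_URI_SCHEMES = frozenset({"http", "https", "ftp", "mailto"})


def is_safe_uri(uri: str) -> bool:
    """Return True if uri is relative or uses an allowlisted scheme.

    Assumes HTML entities in the attribute value were already decoded.
    """
    # Browsers discard tabs/newlines anywhere in a URL and leading spaces or
    # control characters; removing every such character is a conservative
    # superset that defeats tricks like "java\tscript:".
    cleaned = "".join(ch for ch in uri if ord(ch) > 0x20 and ch != "\x7f")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False  # unparseable (e.g. malformed IPv6 host): reject
    # An empty scheme means a relative reference.
    return scheme == "" or scheme in ACCEPTABLE_URI_SCHEMES
```

Treating relative references as safe assumes they will later be resolved against an http(s) base URI; rejecting anything unparseable errs on the side of safety.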
2011-01-28 | Update to 5.0. From the changelog: | schmonz | 2 | -9/+10
* Improved MathML support
* Support microformats (rel-tag, rel-enclosure, xfn, hcard)
* Support IRIs
* Allow safe CSS through sanitization
* Allow safe HTML5 through sanitization
* Support SVG
* Support inline XML entity declarations
* Support unescaped quotes and angle brackets in attributes
* Support additional date formats
* Added the request_headers argument to parse()
* Added the response_headers argument to parse()
* Support multiple entry, feed, and source authors
* Officially make Python 2.4 the earliest supported version
* Support Python 3
* Bug fixes, bug fixes, bug fixes
2009-08-05 | Update to 4.2-pre-294. From the commit log: | schmonz | 2 | -7/+7
* Handle Jacques Distler's nested svg/mathml
* Render correctly when item description contains <code> with <br />
* Add "controls" attribute for video
* Strip "autoplay" attribute from video
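The two video items are attribute-allowlist decisions: `controls` becomes an acceptable attribute, while `autoplay` is stripped. A minimal allowlist sanitizer built on the standard library's `html.parser.HTMLParser` shows the mechanism. The tiny `ACCEPTABLE_ELEMENTS` / `ACCEPTABLE_ATTRIBUTES` sets and the `sanitize()` helper are illustrative assumptions, far smaller than feedparser's real lists, and this is not feedparser's implementation.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately tiny allowlists for illustration only.
ACCEPTABLE_ELEMENTS = {"a", "p", "br", "em", "strong", "video"}
ACCEPTABLE_ATTRIBUTES = {"href", "title", "src", "controls"}
VOID_ELEMENTS = {"br"}
# Elements whose text content is dropped along with the tags themselves.
UNACCEPTABLE_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.suppress_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in UNACCEPTABLE_WITH_CONTENT:
            self.suppress_depth += 1
            return
        if self.suppress_depth or tag not in ACCEPTABLE_ELEMENTS:
            return
        kept = []
        for name, value in attrs:
            if name not in ACCEPTABLE_ATTRIBUTES:
                continue  # e.g. autoplay, onclick
            if value is None:
                kept.append(f" {name}")  # boolean attribute such as controls
            else:
                kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br>; void elements never get a closing tag.
        self.handle_starttag(tag, attrs)
        if tag not in VOID_ELEMENTS:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        if tag in UNACCEPTABLE_WITH_CONTENT:
            self.suppress_depth = max(0, self.suppress_depth - 1)
            return
        if self.suppress_depth or tag not in ACCEPTABLE_ELEMENTS or tag in VOID_ELEMENTS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.suppress_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Allowlisting attribute names does not vet attribute values: an allowed `href` could still carry a `javascript:` URI, so a complete sanitizer must check URI schemes as well.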
2008-08-07 | Update to a prerelease 4.2 snapshot, as 4.1 no longer copes particularly well with many feeds and there's no indication that a release is imminent. | schmonz | 2 | -11/+8
From the changelog:
* Support for parsing microformats, including rel=enclosure, rel=tag, XFN, and hCard.
* Updated the whitelist of acceptable HTML elements and attributes based on the latest draft of the HTML 5 specification.
* Support for CSS Sanitization. (Previous versions of Universal Feed Parser simply stripped all inline styles.) Many thanks to Sam Ruby for implementing this, despite my insistence that it was impossible.
* Support for SVG Sanitization.
* Support for MathML Sanitization. Many thanks to Jacques Distler for patiently debugging this feature.
* IRI support for every element that can contain a URI.
* Ability to disable relative URI resolution.
* Command-line arguments and alternate serializers, for manipulating Universal Feed Parser from shell scripts or other non-Python sources.
* More robust parsing of author email addresses, misencoded win-1252 content, rel=self links, and better detection of HTML content in elements with ambiguous content types.
2008-06-12 | Add DESTDIR support. | joerg | 1 | -1/+3
2008-04-25 | Update PYTHON_VERSIONS_COMPATIBLE | joerg | 1 | -2/+1
- assume that Python 2.4 and 2.5 are compatible and allow checking for fallout.
- remove PYTHON_VERSIONS_COMPATIBLE settings that are obsoleted by the 2.3+ default. Modify the others to deal with the removals.
2006-02-05 | Recursive revision bump / recommended bump for gettext ABI change. | joerg | 1 | -1/+2
2006-01-24 | Update to 4.1. From the changelog: | schmonz | 2 | -6/+6
* removed socket timeout
* added support for chardet library
2005-12-28 | Update to 4.0.2. From the changelog: | schmonz | 2 | -7/+7
* bug fixes for Python 2.1 compatibility
* cleared _debug flag
2005-12-27 | Update to 4.0. From the changelog: | schmonz | 3 | -12/+13
* Support for Atom 1.0.
* Support for iTunes extensions.
* Support for dc:contributor.
* Universal Feed Parser now captures the feed's namespaces. See Namespace Handling for details.
* Lots of things have been renamed to match Atom 1.0 terminology. issued is now entries[i].published, modified is now entries[i].updated, and url is now href everywhere. You can still access these elements with the old names, so you shouldn't need to change any existing code, but don't be surprised if you can't find them during debugging.
* category and categories have been replaced by tags; see feed.tags and entries[i].tags. The old names still work.
* mode is gone from all detail and content dictionaries. It was never terribly useful, since Universal Feed Parser unescapes content automatically.
* entries[i].source is now a dictionary of feed metadata as per section 4.2.11 of RFC 4287. Universal Feed Parser no longer supports RSS 2.0's source element.
* Content in unknown namespaces is no longer discarded (bug 993305).
* Lots of other bug fixes.
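The renaming item describes a backward-compatible alias layer: code that asks for `issued`, `modified`, or `url` still gets the value now stored under `published`, `updated`, or `href`. A minimal sketch of such a lookup, assuming a plain `dict` subclass; `AliasDict` and `LEGACY_ALIASES` are hypothetical names, not feedparser's actual `FeedParserDict`.

```python
# Hypothetical alias table built from the renames listed in the 4.0 changelog.
LEGACY_ALIASES = {"issued": "published", "modified": "updated", "url": "href"}


class AliasDict(dict):
    """dict that answers legacy key names with values stored under new names."""

    def _resolve(self, key):
        # Use the key as given if present; otherwise map a legacy name to its
        # replacement (unknown keys pass through unchanged).
        if dict.__contains__(self, key):
            return key
        return LEGACY_ALIASES.get(key, key)

    def __getitem__(self, key):
        return dict.__getitem__(self, self._resolve(key))

    def __contains__(self, key):
        return dict.__contains__(self, self._resolve(key))

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __getattr__(self, name):
        # Allow entry.published / entry.issued style attribute access.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None
```

Because aliases are resolved only on lookup, `keys()` (and anything that prints the dictionary) shows just the Atom 1.0 names, which is the debugging surprise the changelog warns about.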
2005-04-11 | Remove USE_BUILDLINK3 and NO_BUILDLINK; these are no longer used. | tv | 1 | -2/+1
2005-02-24 | Add RMD160 digests to the SHA1 ones. | agc | 1 | -1/+2
2005-01-27 | Accept Python 2.4. | schmonz | 1 | -2/+2
2005-01-23 | Build Python with thread support by default and turn the existing python*-pth packages into meta-packages which will install the non-pth packages. | recht | 1 | -2/+2
Bump PKGREVISIONs on the non-pth versions to propagate the thread change, but leave the *-pth versions untouched to not affect existing installations. Sync all PYTHON_VERSIONS_AFFECTED lines in package Makefiles.
2004-08-28 | Fix PLIST. | schmonz | 1 | -1/+2
2004-07-22 | add python as category | recht | 1 | -2/+2
ok'd a while back at pkgsrcCon by agc and wiz
2004-07-17 | Update to 3.3. | schmonz | 2 | -5/+5
Changes in 3.2:
* use cjkcodecs and iconv_codec if available
* always convert feed to UTF-8 before passing to XML parser
* completely revamped logic for determining character encoding and attempting XML parsing (much faster)
* increased default timeout to 20 seconds
* test for presence of Location header on redirects
* added tests for many alternate character encodings
* support various EBCDIC encodings
* support UTF-16BE and UTF-16LE with or without a BOM
* support UTF-8 with a BOM
* support UTF-32BE and UTF-32LE with or without a BOM
* fixed crashing bug if no XML parsers are available
* added support for "Content-encoding: deflate"
* send blank "Accept-encoding: " header if neither gzip nor zlib modules are available
Changes in 3.3:
* optimize EBCDIC to ASCII conversion
* fix obscure problem tracking xml:base and xml:lang if element declares it, child doesn't, first grandchild redeclares it, and second grandchild doesn't
* refactored date parsing
* defined public registerDateHandler so callers can add support for additional date formats at runtime
* added support for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1)
* added zopeCompatibilityHack() which turns FeedParserDict into a regular dictionary, required for Zope compatibility, and also makes command-line debugging easier because pprint module formats real dictionaries better than dictionary-like objects
* added NonXMLContentType exception, which is stored in bozo_exception when a feed is served with a non-XML media type such as "text/plain"
* respect Content-Language as default language if no xml:lang is present
* cloud dict is now FeedParserDict
* generator dict is now FeedParserDict
* better tracking of xml:lang, including support for xml:lang="" to unset the current language
* recognize RSS 1.0 feeds even when RSS 1.0 namespace is not the default namespace
* don't overwrite final status on redirects (scenarios: redirecting to a URL that returns 304, redirecting to a URL that redirects to another URL with a different type of redirect)
* add support for HTTP 303 redirects
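Release 3.3 made `registerDateHandler` public so callers can teach the parser new date formats. feedparser's documentation describes a date handler as a function that takes the date string and returns a 9-tuple in GMT, or `None` if it does not recognise the format so that other handlers can be tried. The sketch below is such a handler for a hypothetical `YYYY/MM/DD HH:MM` format assumed to be in UTC; with feedparser installed it would be registered via `feedparser.registerDateHandler(parse_slash_date)`.

```python
import re
import time
from datetime import datetime, timezone
from typing import Optional

# Hypothetical format for illustration: "2004/07/17 13:45", assumed to be UTC.
_SLASH_DATE = re.compile(r"^(\d{4})/(\d{2})/(\d{2})\s+(\d{2}):(\d{2})$")


def parse_slash_date(date_string: str) -> Optional[time.struct_time]:
    """Date handler: return a UTC 9-tuple, or None if the string doesn't match."""
    match = _SLASH_DATE.match(date_string.strip())
    if match is None:
        return None  # not our format; let other handlers try
    year, month, day, hour, minute = (int(g) for g in match.groups())
    try:
        parsed = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    except ValueError:
        return None  # matched the pattern but is not a real date (e.g. month 13)
    # utctimetuple() yields a time.struct_time with weekday and yearday filled in.
    return parsed.utctimetuple()
```

Returning `None` instead of raising keeps one unrecognised format from aborting the search through the other registered handlers.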
2004-06-30 | Update to 3.1. From the changelog: | schmonz | 2 | -5/+5
* added and passed tests for converting HTML entities to Unicode equivalents in ill-formed feeds (aaronsw)
* added and passed tests for converting character entities to Unicode equivalents in ill-formed feeds (aaronsw)
* test for valid parsers when setting XML_AVAILABLE
* make version and encoding available when server returns a 304
* add handlers parameter to pass arbitrary urllib2 handlers (like digest auth or proxy support)
* add code to parse username/password out of url and send as basic authentication
* expose downloading-related exceptions in bozo_exception (aaronsw)
* added __contains__ method to FeedParserDict (aaronsw)
* added publisher_detail (aaronsw)
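One 3.1 item parses a username and password out of the feed URL and sends them as HTTP Basic authentication. The sketch below shows that step with the standard library only: it strips the `user:password@` part from the URL and builds the `Authorization` header value (Base64 of `user:password`, as in RFC 7617). `split_credentials()` is a hypothetical helper, not part of feedparser's API.

```python
import base64
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url: str) -> Tuple[str, Optional[str]]:
    """Return (url without credentials, Basic auth header value or None)."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, None  # no "user@" in the URL; nothing to do
    # Userinfo is percent-encoded inside URLs, so decode it first.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    # Keep host and port exactly as written, minus the "user:password@" prefix.
    netloc = parts.netloc.rpartition("@")[2]
    clean_url = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return clean_url, "Basic " + token
```

Removing the credentials from the URL before fetching keeps them out of request lines and logs; they travel only in the header.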
2004-06-27 | Import Universal Feed Parser 3.0.1. | schmonz | 4 | -0/+42
Universal Feed Parser is a Python module for downloading and parsing syndicated feeds. It can handle RSS 0.90, Netscape RSS 0.91, Userland RSS 0.91, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom, and CDF feeds. To use Universal Feed Parser, you will need Python 2.1 or later. Universal Feed Parser is not meant to run standalone; it is a module for you to use as part of a larger Python program. Universal Feed Parser is easy to use; the module is self-contained in a single file, feedparser.py, and it has only one public function, parse. parse takes a number of arguments, but only one is required, and it can be a URL, a local filename, or a raw string containing feed data in any format.
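The import message notes that `parse`'s one required argument may be a URL, a local filename, or a raw string of feed data. A rough sketch of telling these apart from a single string, assuming that anything with a network-style scheme is a URL and anything naming an existing file is a path; `classify_source()` and `URL_SCHEMES` are hypothetical and not how feedparser actually decides.

```python
import os
from urllib.parse import urlsplit

# Hypothetical set of schemes treated as "fetch this"; an assumption for the sketch.
URL_SCHEMES = {"http", "https", "ftp", "file"}


def classify_source(source: str) -> str:
    """Guess whether source is a URL, a local file path, or raw feed data."""
    try:
        scheme = urlsplit(source).scheme
    except ValueError:
        scheme = ""
    if scheme in URL_SCHEMES:
        return "url"
    # os.path.isfile() returns False (rather than raising) for strings that
    # cannot be valid paths, such as long XML documents.
    if os.path.isfile(source):
        return "file"
    return "string"
```

Treating a non-existent path as raw data is a choice of this sketch: a mistyped filename then surfaces as a parse problem rather than a file error.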