TODO for the XML parser and stuff:
==================================
this tends to be outdated :-\ ...
- use case: using XInclude to load, for example, a description.
  order document + product base -(XSLT)-> quote with XIncludes
                                                   |
  HTML output with description of parts <---(XSLT)--
- fix the C code prototype to bring back doc/libxml-undocumented.txt
- Computation of base when HTTP redirect occurs, might affect HTTP
- listing all attributes in a node.
- Correct standalone checking/emitting (hard)
  see XML 1.0 section 2.9, Standalone Document Declaration
- Better checking of external parsed entities TAG 1234
- Go through the errata and do the cleanup.
  http://www.w3.org/XML/xml-19980210-errata ... started ...
- jamesh suggestion: SAX-like functions to save a document, i.e. call a
  function to open a new element with given attributes, write character
  data, close the last element, etc.
  + inverted SAX, initial patch in April 2002 archives.
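A minimal sketch of the "inverted SAX" idea, assuming a fixed-size output buffer and a stack of open element names. MiniWriter and the mw_* functions are hypothetical names for illustration, not libxml2 API; the attribute array follows the NULL-terminated name/value convention of SAX startElement.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "inverted SAX" writer: the caller pushes events and the
 * writer serializes them. Not libxml2 API. */
#define MW_BUF   1024
#define MW_DEPTH 32

typedef struct {
    char        out[MW_BUF];      /* serialized document, NUL-terminated */
    size_t      len;              /* bytes used in out */
    const char *stack[MW_DEPTH];  /* open element names (caller keeps them alive) */
    int         depth;
} MiniWriter;

/* Append s; a piece that would overflow the buffer is dropped whole. */
static void mw_put(MiniWriter *w, const char *s) {
    size_t n = strlen(s);
    if (w->len + n < MW_BUF) {
        memcpy(w->out + w->len, s, n);
        w->len += n;
        w->out[w->len] = '\0';
    }
}

/* Escape markup characters; '"' is only escaped inside attribute values. */
static void mw_put_escaped(MiniWriter *w, const char *s, int in_attr) {
    char one[2] = { 0, 0 };
    for (; *s; s++) {
        switch (*s) {
        case '&': mw_put(w, "&amp;"); break;
        case '<': mw_put(w, "&lt;");  break;
        case '>': mw_put(w, "&gt;");  break;
        case '"': mw_put(w, in_attr ? "&quot;" : "\""); break;
        default:  one[0] = *s; mw_put(w, one); break;
        }
    }
}

void mw_init(MiniWriter *w) { w->len = 0; w->out[0] = '\0'; w->depth = 0; }

/* Open an element; attrs is a NULL-terminated name/value array.
 * Returns 0, or -1 if nesting is too deep. */
int mw_start_element(MiniWriter *w, const char *name, const char **attrs) {
    if (w->depth == MW_DEPTH) return -1;
    mw_put(w, "<");
    mw_put(w, name);
    for (; attrs != NULL && attrs[0] != NULL; attrs += 2) {
        mw_put(w, " ");
        mw_put(w, attrs[0]);
        mw_put(w, "=\"");
        mw_put_escaped(w, attrs[1], 1);
        mw_put(w, "\"");
    }
    mw_put(w, ">");
    w->stack[w->depth++] = name;
    return 0;
}

/* Write character data inside the current element. */
void mw_characters(MiniWriter *w, const char *text) { mw_put_escaped(w, text, 0); }

/* Close the most recently opened element. Returns 0, or -1 if none is open. */
int mw_end_element(MiniWriter *w) {
    if (w->depth == 0) return -1;
    mw_put(w, "</");
    mw_put(w, w->stack[--w->depth]);
    mw_put(w, ">");
    return 0;
}
```

Keeping the open-element stack inside the writer is what lets "close last element" take no arguments. For reference, libxml2 later shipped the xmlTextWriter API (xmlwriter.h, e.g. xmlTextWriterStartElement/xmlTextWriterEndElement) along these lines.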
- htmlParseDoc has an encoding parameter which is not used.
  Function htmlCreateDocParserCtxt ignores it.
- fix realloc() usage.
- compliance with XML Namespaces checking, see section 6 of
  http://www.w3.org/TR/REC-xml-names/
- Tighten the UTF-8 conformance (Martin Duerst):
  http://www.w3.org/2001/06/utf-8-test/.
  The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
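A sketch of what "stricter" UTF-8 checking could mean, assuming the RFC 3629 definition: reject stray continuation bytes, truncated sequences, overlong forms, UTF-16 surrogates and code points above U+10FFFF. The function name is hypothetical; the W3C test files above would be the natural input to run it against.

```c
#include <stddef.h>

/* Hypothetical strict UTF-8 check (RFC 3629). Returns 1 if buf[0..len) is
 * well formed, 0 otherwise. */
int utf8_is_strict(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t n;                                  /* continuation bytes expected */
        if (c < 0x80) { i++; continue; }           /* ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;                             /* continuation byte as lead, or 0xF8..0xFF */
        if (len - i <= n) return 0;                /* sequence cut off by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* smallest value each length may encode; anything lower is overlong */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += n + 1;
    }
    return 1;
}
```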
- move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
  global.c. Bjorn noted that the following files depend on parser.o solely
  because of these string functions: entities.o, global.o, hash.o, tree.o,
- Optimization of tag strings allocation ?
- maintain coherency of namespace when doing cut'n paste operations
  => the functions are coded, but need testing
- function to rebuild the ID table
- functions to rebuild the DTD hash tables (after DTD changes).
- Fix output of <tst val="x
y"/> (an attribute value containing a literal newline)
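One plausible fix, sketched under the assumption that the problem is the literal newline being written back out: a parser normalizes a literal newline in an attribute value to a space (XML 1.0 section 3.3.3) but keeps a character reference, so the serializer should emit val="x&#10;y". The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute-value escaper for output between double quotes.
 * Newline, CR and tab become character references so they survive
 * attribute-value normalization when the document is parsed again.
 * Returns the output length, or -1 if out (cap bytes, including the
 * terminator) is too small. */
int escape_attr_value(const char *in, char *out, size_t cap) {
    size_t o = 0;
    if (cap == 0) return -1;
    for (; *in; in++) {
        char one[2] = { *in, '\0' };
        const char *rep = one;                 /* default: copy the byte */
        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\n': rep = "&#10;";  break;
        case '\r': rep = "&#13;";  break;
        case '\t': rep = "&#9;";   break;
        default:   break;
        }
        size_t n = strlen(rep);
        if (o + n + 1 > cap) return -1;       /* keep room for the NUL */
        memcpy(out + o, rep, n);
        o += n;
    }
    out[o] = '\0';
    return (int)o;
}
```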
- Tools to produce man pages from the SGML docs.
- Add XPointer recognition/API
- Add XLink recognition/API
  => started adding an xlink.[ch] with a unified API for XML and HTML.
  => Really need to be done <grin/>
  => this is a somewhat ugly mix of HTML and XML, adding a specific
     routine in the comment parsing code of HTML and plugging the XML
     parsing one in there should not be too hard. The key point is to get
     XSL to transform all this to something decent ...
- extend the shell with:
  - mv (yum, yum, but it's harder because directories are ordered in
    our case, mvup and mvdown would be required)
- Add HTML validation using the XHTML DTD
  - problem: do we want to keep and maintain the code for handling
    DTD/System ID cache directly in libxml ?
- Add a DTD cache prefilled with XHTML DTDs and entities and a program to
  manage them -> like the /usr/bin/install-catalog from SGML;
  the right place seems to be $datadir/xmldtds.
  Maybe this is better left to user apps.
- Add output to XHTML in case of HTML documents.
- Implement OASIS XML Catalog support
  http://www.oasis-open.org/committees/entity/
- Get the OASIS testsuite to a more friendly result, check all the results
  once stable. The check-xml-test-suite.py script does this.
  => attributes addressing troubles
  => defaulted attributes handling
  done as XSLT got debugged
- bug reported by Michael Meallin on validation problems
  => Actually means I need to add support (and warn) for non-deterministic
- Handle undefined namespaces in entity contents better ... at least
  int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
  => done, it's actually xmlRemoveProp, xmlUnsetProp, xmlUnsetNsProp
- HTML: handling of script and style data elements needs special code in
  the parser and saving functions (handling of < > " ' ...):
  http://www.w3.org/TR/html4/types.html#type-script
  Attributes are not a problem since entities are accepted there.
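A sketch of the parser side, assuming the lenient rule most browsers follow: script/style content is raw text that ends at the first "</" followed by the element name, compared ASCII case-insensitively. HTML 4.01 is stricter on paper (the first "</" ends the content). The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* ASCII-only case folding, independent of the current locale. */
static char ascii_lower(char c) { return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c; }

/* Length of the raw content of a script/style element starting at text:
 * the offset of the first "</" followed by name (ASCII case-insensitive),
 * or len if there is none. Inside, < > " ' and & are plain characters:
 * nothing is parsed as markup or decoded as an entity. */
size_t raw_text_length(const char *text, size_t len, const char *name) {
    size_t nlen = strlen(name);
    for (size_t i = 0; i + 2 + nlen <= len; i++) {
        if (text[i] != '<' || text[i + 1] != '/') continue;
        size_t k = 0;
        while (k < nlen && ascii_lower(text[i + 2 + k]) == ascii_lower(name[k]))
            k++;
        if (k == nlen) return i;
    }
    return len;
}
```

On the saving side, the same content would be written back verbatim, without escaping.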
  xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
- problem when parsing hrefs with & with the HTML parser (IRC ac)
- If the internal encoding is not UTF-8, saving to a given encoding doesn't
  work => fix to force UTF-8 encoding ...
  done, added documentation too
- Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
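A sketch of both directions of an ASCII encoder. Because ASCII is a strict subset of UTF-8 and every byte of a non-ASCII code point's UTF-8 form is >= 0x80, one byte test serves both ways. The names and the size_t in/out length convention are assumptions for illustration; libxml2's own converters may use a different signature.

```c
#include <stddef.h>

/* Copy bytes while they are ASCII (< 0x80). On return *inlen and *outlen
 * hold the bytes consumed and produced. If out is shorter than in, the
 * call stops early and returns 0; the caller resumes from *inlen.
 * Returns -1 at the first non-ASCII byte. */
static int copy_ascii(unsigned char *out, size_t *outlen,
                      const unsigned char *in, size_t *inlen) {
    size_t max = *inlen < *outlen ? *inlen : *outlen;
    size_t i;
    int rc = 0;
    for (i = 0; i < max; i++) {
        if (in[i] >= 0x80) { rc = -1; break; }
        out[i] = in[i];
    }
    *inlen = i;
    *outlen = i;
    return rc;
}

/* Hypothetical input decoder: ASCII bytes are already valid UTF-8. */
int ascii_to_utf8(unsigned char *out, size_t *outlen,
                  const unsigned char *in, size_t *inlen) {
    return copy_ascii(out, outlen, in, inlen);
}

/* Hypothetical output encoder: any code point >= U+0080 cannot be encoded. */
int utf8_to_ascii(unsigned char *out, size_t *outlen,
                  const unsigned char *in, size_t *inlen) {
    return copy_ascii(out, outlen, in, inlen);
}
```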
- Issue a warning when using non-absolute namespace URIs.
- the HTML parser should add <head> and <body> if they don't exist
  started, not finished.
  Done, the automatic closing is added and 3 testcases were inserted
- Command to force the parser to stop parsing and ignore the rest of the file.
  xmlStopParser() should allow this, mostly untested
- support for HTML empty attributes like <hr noshade>
- plugged iconv() in for support of a large set of encodings.
- xmlSwitchToEncoding() rewrite done
- URI checking (no fragments) rfc2396.txt
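A small sketch of two such checks, assuming they target system identifiers: XML 1.0 section 4.2.2 says a fragment identifier (beginning with '#') must not be part of a system identifier, and RFC 2396 restricts a scheme to alpha *( alpha | digit | "+" | "-" | "." ). This is not a full RFC 2396 parser, assumes the C locale for isalpha/isalnum, and the function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical system-identifier check:
 * 1. no fragment identifier, i.e. no '#';
 * 2. if it has a scheme (text before a ':' that precedes any '/' or '?'),
 *    the scheme matches alpha *( alpha | digit | "+" | "-" | "." ).
 * Returns 1 if both pass, 0 otherwise. */
int system_id_ok(const char *uri) {
    if (strchr(uri, '#') != NULL)
        return 0;
    size_t n = strcspn(uri, ":/?");
    if (uri[n] != ':')
        return 1;                        /* relative reference: no scheme */
    if (n == 0 || !isalpha((unsigned char)uri[0]))
        return 0;
    for (size_t i = 1; i < n; i++) {
        unsigned char c = (unsigned char)uri[i];
        if (!isalnum(c) && c != '+' && c != '-' && c != '.')
            return 0;
    }
    return 1;
}
```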
- Added a clean mechanism for overloading or adding input methods:
  xmlRegisterInputCallbacks()
- dynamically adapt the alloc entry point to use g_malloc()/g_free()
  if the programmer wants it:
  - use xmlMemSetup() to reset the routines used.
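A self-contained sketch of the pattern behind xmlMemSetup(), which in libxml2 takes the free, malloc, realloc and strdup functions in that order: library code allocates only through global function pointers that the application may swap before first use. Everything below (lib_*, counting_*) is hypothetical and does not call libxml2.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void  (*FreeFn)(void *);
typedef void *(*MallocFn)(size_t);
typedef void *(*ReallocFn)(void *, size_t);

/* Hypothetical library-wide entry points, defaulting to the C runtime. */
static FreeFn    lib_free    = free;
static MallocFn  lib_malloc  = malloc;
static ReallocFn lib_realloc = realloc;

/* Swap all entry points together so a block is never freed by a different
 * allocator than the one that made it. Returns 0, or -1 on a NULL. */
int lib_mem_setup(FreeFn f, MallocFn m, ReallocFn r) {
    if (f == NULL || m == NULL || r == NULL) return -1;
    lib_free = f;
    lib_malloc = m;
    lib_realloc = r;
    return 0;
}

/* Library routines allocate and release through the pluggable pointers. */
char *lib_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = lib_malloc(n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}
void lib_release(void *p) { lib_free(p); }

/* Example replacement set: counts live blocks, e.g. to spot leaks.
 * realloc(p, 0) is not special-cased in this sketch. */
long live_blocks = 0;
void *counting_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_blocks++;
    return p;
}
void counting_free(void *p) {
    if (p != NULL) live_blocks--;
    free(p);
}
void *counting_realloc(void *p, size_t n) {
    void *q = realloc(p, n);
    if (p == NULL && q != NULL) live_blocks++;  /* realloc(NULL, n) acts as malloc */
    return q;
}
```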
- Check attribute normalization, especially in xmlGetProp()
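A sketch of the XML 1.0 section 3.3.3 rule that such a check would compare against, assuming the value has already had its references expanded and its line ends normalized. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-place attribute-value normalization (XML 1.0 3.3.3).
 * Assumes line-end handling (CR LF -> LF) already happened and references
 * were expanded earlier (a referenced &#10; must stay a newline, which this
 * function cannot tell apart from a literal one).
 * - every #x20, #x9, #xA, #xD becomes a space;
 * - if is_cdata is 0 (ID, NMTOKENS, ...), leading and trailing spaces are
 *   dropped and runs of spaces collapse to one. */
void normalize_attr(char *v, int is_cdata) {
    char *w = v;
    for (const char *r = v; *r; r++) {
        char c = *r;
        if (c == '\t' || c == '\n' || c == '\r') c = ' ';
        if (!is_cdata && c == ' ' && (w == v || w[-1] == ' '))
            continue;                     /* leading or repeated space */
        *w++ = c;
    }
    if (!is_cdata && w > v && w[-1] == ' ')
        w--;                              /* at most one trailing space remains */
    *w = '\0';
}
```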
- Validity checking problems for NOTATION attributes
- Validity checking problems for ENTITY ENTITIES attributes
- Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
- URI module: validation, base, etc ... see uri.[ch]
- turn tester into a generic program xmllint installed with libxml
- extend validity checks to go through entities content instead of
  just labelling them PCDATA
- Save DTDs using the children list instead of dumping the tables,
  order is preserved as well as comments and PIs
- Wrote a notice of changes required to go from 1.x to 2.x
- make sure that all SAX callbacks are disabled if a WF error is detected
- checking/handling of newline normalization
  http://localhost/www.xml.com/axml/target.html#sec-line-ends
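A sketch of the XML 1.0 section 2.11 end-of-line rule: every CR LF pair and every CR not followed by LF becomes a single LF. It works on a complete buffer; a push parser would also need to remember a CR that ends one chunk. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-place end-of-line normalization of buf[0..len).
 * Returns the new length, which is never larger than len. */
size_t normalize_newlines(char *buf, size_t len) {
    size_t w = 0;
    for (size_t r = 0; r < len; r++) {
        if (buf[r] == '\r') {
            buf[w++] = '\n';
            if (r + 1 < len && buf[r + 1] == '\n')
                r++;              /* swallow the LF of a CR LF pair */
        } else {
            buf[w++] = buf[r];
        }
    }
    return w;
}
```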
- correct checking of '&' '%' on entities content.
- checking of PE/Nesting on entities declaration
- checking/handling of xml:space
  - handling done, not well tested
- Language identification code, productions [33] to [38]
  => done, the check has been added and reports WFness errors
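A sketch of a checker for those productions as they read in the first and second editions of XML 1.0 (later editions replaced them with a reference to BCP 47). The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical xml:lang checker for XML 1.0 productions [33]-[38]:
 *   LanguageID ::= Langcode ('-' Subcode)*
 *   Langcode   ::= ISO639Code | IanaCode | UserCode
 *   ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])
 *   IanaCode   ::= ('i' | 'I') '-' ([a-z] | [A-Z])+
 *   UserCode   ::= ('x' | 'X') '-' ([a-z] | [A-Z])+
 *   Subcode    ::= ([a-z] | [A-Z])+                                   */
static int is_ascii_alpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

/* Consume a run of ASCII letters; return how many were consumed. */
static int letters(const char **p) {
    int n = 0;
    while (is_ascii_alpha(**p)) { (*p)++; n++; }
    return n;
}

int is_language_id(const char *s) {
    const char *p = s;
    if ((p[0] == 'i' || p[0] == 'I' || p[0] == 'x' || p[0] == 'X') && p[1] == '-') {
        p += 2;
        if (letters(&p) == 0) return 0;    /* IanaCode or UserCode */
    } else {
        if (!is_ascii_alpha(p[0]) || !is_ascii_alpha(p[1])) return 0;
        p += 2;                            /* ISO639Code: exactly two letters */
    }
    while (*p == '-') {
        p++;
        if (letters(&p) == 0) return 0;    /* Subcode */
    }
    return *p == '\0';
}
```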
- Conditional sections in DTDs [61] to [65]
  => should this crap really be implemented ???
  => Yep, the OASIS testsuite uses them
- Allow parsed entities defined in the internal subset to override
  the ones defined in the external subset (DTD customization).
  => This means that the entity content should be computed only at
     use time, i.e. keep the original string only at parse time and expand
     only when referenced from the external subset :-(
     Needed for complete use of most DTDs from Eve Maler
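A sketch of the binding rule this relies on: XML 1.0 section 4.2 says that if an entity is declared more than once, the first declaration encountered is binding, and the internal subset is read before the external one. The table stores the unexpanded literal, so expansion can be deferred to use time. EntityTable and its functions are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTS 16

/* Hypothetical entity table; values are kept unexpanded. */
typedef struct { const char *name; const char *value; } Entity;
typedef struct { Entity ents[MAX_ENTS]; int count; } EntityTable;

/* Returns 1 if stored, 0 if ignored as a redeclaration, -1 if full. */
int declare_entity(EntityTable *t, const char *name, const char *value) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->ents[i].name, name) == 0)
            return 0;               /* first binding wins; later one ignored */
    if (t->count == MAX_ENTS) return -1;
    t->ents[t->count].name = name;
    t->ents[t->count].value = value;
    t->count++;
    return 1;
}

const char *lookup_entity(const EntityTable *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->ents[i].name, name) == 0)
            return t->ents[i].value;
    return NULL;
}
```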
- Add regression tests for all WFC errors
  => did some in test/WFC
  => added OASIS testsuite routines
     http://xmlsoft.org/conf/result.html
- I18N: http://wap.trondheim.com/vaer/index.phtml is not XML but is accepted
  by the XML parser, UTF-8 should be checked when there is no "encoding"
- Support for UTF-8 and UTF-16 encoding
  => added some conversion routines provided by Martin Duerst
     patched them, got fixes from @@@
     I plan to keep everything internally as UTF-8 (or ISO-Latin-X);
     this is slightly more costly but more compact, and the efficiency of
     recent processors is cache related. The key to good performance is
     keeping the data set small, so I will.
  => the new progressive reading routines call the detection code,
     which is enabled; tested the ISO->UTF-8 stuff
- External entities loading:
  - allow override by client code
  - make sure it is called for all external entities referenced
  Done, client code should use xmlSetExternalEntityLoader() to set
  the default loading routine. It will be called each time an external
  entity resolution is triggered.
- maintain ID coherency when removing/changing attributes
  The function used to deallocate attributes now checks for it being an
  ID and removes it from the table.
- push mode parsing, i.e. a non-blocking state-based parser
  done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
  and xmlParseChunk() and their HTML counterparts.
  The tester program now has a --push option to select that parser
  front-end. Duplicated the tests to use both and check that results are similar.
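A toy illustration of why push parsing needs a state machine, not a use of the real API: in libxml2 the caller creates a context with xmlCreatePushParserCtxt() and feeds data with xmlParseChunk(ctxt, chunk, size, terminate). Here all state lives in a hypothetical PushCtxt, so a '<' and its tag name may arrive in different chunks. It only counts tags and is not an XML parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical push-mode tag counter. Comments, PIs, CDATA sections,
 * '>' inside quoted attribute values and malformed input such as "<>"
 * are not handled. */
typedef enum { IN_TEXT, AFTER_LT, IN_TAG } PushState;

typedef struct {
    PushState state;
    int  is_end;              /* current tag began with "</" */
    char prev;                /* last character seen inside the tag */
    int  starts, ends, empties;
} PushCtxt;

void push_init(PushCtxt *c) { memset(c, 0, sizeof *c); c->state = IN_TEXT; }

/* Consume one chunk; may be called any number of times. */
void push_chunk(PushCtxt *c, const char *chunk, size_t size) {
    for (size_t i = 0; i < size; i++) {
        char ch = chunk[i];
        switch (c->state) {
        case IN_TEXT:
            if (ch == '<') c->state = AFTER_LT;
            break;
        case AFTER_LT:                    /* first character after '<' */
            c->is_end = (ch == '/');
            c->prev = ch;
            c->state = IN_TAG;
            break;
        case IN_TAG:
            if (ch == '>') {
                if (c->is_end) c->ends++;
                else if (c->prev == '/') c->empties++;
                else c->starts++;
                c->state = IN_TEXT;
            } else {
                c->prev = ch;
            }
            break;
        }
    }
}
```

Feeding "<a", "><b/" and "></a>" as three chunks yields one start tag, one empty-element tag and one end tag, exactly as if the whole string had arrived at once.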
- Most of XPath, still see some troubles and occasional memleaks.
- an XML shell, allowing one to traverse/manipulate an XML document with
  a shell-like interface, and using XPath for the naming syntax
  - use of readline and history added when available
  - the shell interface has been cleanly separated and moved to debugXML.c
- HTML parser, should be fairly stable now
- API to search the lang of an attribute
- Collect IDs at parsing and maintain a table.
  PBM: maintain the table coherency
  PBM: how to detect ID types in absence of DTD !
- Use it for XPath ID support
- Add validity checking
  Should be finished now !
- Add regression tests with entity substitutions
- External Parsed entities, either XML or external Subset [78] and [79]
  parsing the xmllang DTD now works, so it should be sufficient for
- progressive reading. The entity support is a first step toward
  abstraction of an input stream. A large part of the context is still
  located on the stack, moving to a state machine and putting everything
  in the parsing context should provide an adequate solution.
  => Rather than progressive parsing, give more power to the SAX-like
     interface. Currently the DOM-like representation is built but
  => it should be possible to define that only as a set of SAX callbacks
     and remove the tree creation from the parser code.
- DOM support, instead of using a proprietary in-memory
  format for the document representation, the parser should
  call a DOM API to actually build the resulting document.
  Then the parser becomes independent of the in-memory
  representation of the document. Even better, using RPCs
  the parser can actually build the document in another
  => Work started, now the internal representation is by default
     very near a direct DOM implementation. The DOM glue is implemented
     as a separate module. See the GNOME gdome module.
- C++ support : John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for Comments (bad, should be in ASAP, they are parsed
  but not stored), should be configurable.
- Improve the support of entities on save (+SAX).