Convenience method for web scraping. Requires arsd.http2 to be included in the build.
This is just something I'm toying with. Right now, you use opIndex to pass in CSS selectors. It returns a struct that forwards calls to all elements it holds and returns itself, so you can chain calls.
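The forwarding-and-chaining idea can be sketched in standalone D with opDispatch. This is a minimal illustration under assumptions: Node and NodeCollection are hypothetical stand-ins, not the library's actual Element or collection types, and the sketch does not show opIndex or selector matching.

```d
// Hypothetical stand-in for a DOM element; not the library's Element class.
class Node {
    string[string] attrs;
    void setAttribute(string name, string value) { attrs[name] = value; }
}

// Hypothetical collection: calls to methods it does not define itself are
// caught by opDispatch, forwarded to every held node, and the collection is
// returned so calls can be chained.
struct NodeCollection {
    Node[] nodes;

    NodeCollection opDispatch(string method, Args...)(Args args)
        if (__traits(hasMember, Node, method))
    {
        foreach (n; nodes)
            mixin("n." ~ method ~ "(args);"); // same call on each node
        return this;
    }
}
```

With two nodes in a collection, `coll.setAttribute("class", "x").setAttribute("id", "y")` sets both attributes on both nodes.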
If you're using this for some other kind of XML, you can set the content type here.
Implements the FileResource interface, which is useful for sending the document via HTTP automatically.
Implements the FileResource interface; it calls toString.
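A minimal sketch of how a settable content type and a toString-based getData fit together. The interface below is modeled only on the description above and is an assumption; the library's real FileResource may declare more members, and MiniDocument and mimeType are hypothetical names.

```d
import std.string : representation;

// Hypothetical interface modeled on the description above; the real
// FileResource interface may differ.
interface FileResource {
    @property string contentType() const;
    immutable(ubyte)[] getData() const;
}

// Hypothetical document: the content type is settable (e.g. for other kinds
// of XML), and getData serializes the document by calling toString.
class MiniDocument : FileResource {
    string markup;
    string mimeType = "text/html; charset=utf-8";

    this(string markup) { this.markup = markup; }

    override string toString() const { return markup; }

    @property string contentType() const { return mimeType; }

    // The bytes sent over HTTP are just the UTF-8 of toString().
    immutable(ubyte)[] getData() const { return toString().representation; }
}
```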
Concatenates any consecutive text nodes.
This sets delegates for the parseSaw* callbacks that add those constructs to the DOM tree when the parser sees them. (Note: this overwrites anything else you set, and anything you set afterward will overwrite this.) Call this before calling parse(). This also preserves the prolog and doctype from the original file, if there was one.
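The text-node concatenation mentioned above can be illustrated with a standalone D sketch. Node and normalize here are hypothetical stand-ins, not the library's types; the sketch only shows the merging rule: each run of adjacent text nodes becomes one, recursively.

```d
// Hypothetical minimal node: a text node (tag is null) or an element.
class Node {
    string tag;      // null for text nodes
    string text;     // used only by text nodes
    Node[] children;

    this(string tag, Node[] children = null) { this.tag = tag; this.children = children; }
    static Node textNode(string s) { auto n = new Node(null); n.text = s; return n; }
    bool isText() const { return tag is null; }
}

// Merge each run of adjacent text nodes into the first node of the run,
// recursing into element children.
void normalize(Node n) {
    Node[] merged;
    foreach (child; n.children) {
        if (child.isText && merged.length && merged[$ - 1].isText) {
            merged[$ - 1].text ~= child.text; // extend the previous text node
        } else {
            if (!child.isText) normalize(child);
            merged ~= child;
        }
    }
    n.children = merged;
}
```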
If the parser sees an HTML comment, it calls this callback: <!-- comment --> calls parseSawComment(" comment "). Return true if you want the node appended to the document.
If the parser sees <% asp code... %>, it calls this callback. It is passed "% asp code... %" or "%= asp code .. %". Return true if you want the node appended to the document.
If the parser sees <?php php code... ?>, it calls this callback. It is passed "?php php code... ?" or "?= php code .. ?". Note: dom.d cannot identify the other PHP short format, <? code ?>. Return true if you want the node appended to the document.
If the parser sees a <?xxx> that is not PHP or ASP, it calls this function with the contents: <?SOMETHING foo> calls parseSawQuestionInstruction("?SOMETHING foo"). Unlike the PHP/ASP callbacks, this ends on the first > it sees, without requiring ?>. Return true if you want the node appended to the document.
If the parser sees a <! that is not CDATA or a comment (CDATA is handled automatically and comments call parseSawComment), it calls this function with the contents: <!SOMETHING foo> calls parseSawBangInstruction("SOMETHING foo"). Return true if you want the node appended to the document.
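To make the argument formats above concrete, here is a standalone D sketch that maps an already-delimited special tag to the callback name and the string it would receive. It is a hypothetical helper, not part of the library: it assumes the input is one complete tag starting with < and ending with >, and it does not model how the real parser finds the end of each construct.

```d
import std.algorithm.searching : startsWith;

// Which callback would fire, and with what argument (hypothetical).
struct SpecialTag { string callback; string argument; }

// Assumes `tag` is a complete tag such as "<!-- x -->".
SpecialTag classify(string tag) {
    auto inner = tag[1 .. $ - 1];                  // strip the outer < and >
    if (inner.startsWith("!--"))                   // <!-- c -->  ->  " c "
        return SpecialTag("parseSawComment", inner[3 .. $ - 2]);
    if (inner.startsWith("![CDATA["))              // handled by the parser itself
        return SpecialTag(null, null);
    if (inner.startsWith("%"))                     // <% code %>  ->  "% code %"
        return SpecialTag("parseSawAspCode", inner);
    if (inner.startsWith("?php") || inner.startsWith("?="))
        return SpecialTag("parseSawPhpCode", inner); // "?php code ?"
    if (inner.startsWith("?"))                     // <?X foo>    ->  "?X foo"
        return SpecialTag("parseSawQuestionInstruction", inner);
    if (inner.startsWith("!"))                     // <!X foo>    ->  "X foo"
        return SpecialTag("parseSawBangInstruction", inner[1 .. $]);
    return SpecialTag(null, null);                 // an ordinary tag
}
```

Note the asymmetry the documentation describes: the question-mark form keeps its leading "?", while the bang form drops its leading "!".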
Given the kind of garbage you find on the Internet, try to make sense of it. Equivalent to document.parse(data, false, false, null); (Case-insensitive, non-strict, determine character encoding from the data.) NOTE: this makes no attempt at added security.
Parses well-formed UTF-8, case-sensitive, XML or XHTML. Will throw exceptions on things like unclosed tags.
Parses well-formed UTF-8 in loose mode (by default). Tries to correct tag soup, but does NOT try to correct bad character encodings.
Take XMLish data and try to make the DOM tree out of it.
Gets the <title> element's innerText, if one exists.
Sets the title of the page, creating a <title> element if needed.
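The getter/setter pair above can be sketched on a tiny hypothetical tree. Node, findTag, getTitle, and setTitle are illustrative names, not the library's API; the sketch assumes a new <title> goes into <head> when one exists, otherwise into the root.

```d
// Hypothetical minimal element node; a <title> keeps its text in `text`.
class Node {
    string tag;
    string text;
    Node[] children;
    this(string tag, Node[] children = null) { this.tag = tag; this.children = children; }
}

// Depth-first search for the first descendant with the given tag, or null.
Node findTag(Node n, string tag) {
    foreach (c; n.children) {
        if (c.tag == tag) return c;
        if (auto found = findTag(c, tag)) return found;
    }
    return null;
}

// Getter: the title text, or null if there is no <title>.
string getTitle(Node root) {
    auto title = findTag(root, "title");
    return title is null ? null : title.text;
}

// Setter: reuse an existing <title>; otherwise create one.
void setTitle(Node root, string t) {
    auto title = findTag(root, "title");
    if (title is null) {
        title = new Node("title");
        auto parent = findTag(root, "head");
        if (parent is null) parent = root;   // assumption: fall back to root
        parent.children ~= title;
    }
    title.text = t;
}
```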
These functions all forward to the root element. See the documentation in the Element class.
FIXME: btw, this could just be a lazy range.
This returns the <body> element, if there is one. (It is named differently than in JavaScript, where it is called 'body', because body is a keyword in D.)
This uses an unusual rule: it matches [name=] if the name has no colon and [property=] if it has one.
Sets a meta tag in the document header. It is somewhat hacky so that it works easily for both Facebook Open Graph and traditional HTML meta tags.
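The colon rule for meta tags can be shown in a few lines of standalone D. metaAttributeFor and metaSelector are hypothetical helpers written for illustration, not library functions; the quoting style of the generated selector is likewise an assumption.

```d
import std.algorithm.searching : canFind;

// Names containing a colon (e.g. Open Graph "og:title") use the property
// attribute; plain names (e.g. "description") use the name attribute.
string metaAttributeFor(string name) {
    return name.canFind(':') ? "property" : "name";
}

// Builds a CSS attribute selector that would match the corresponding meta tag.
string metaSelector(string name) {
    return `meta[` ~ metaAttributeFor(name) ~ `="` ~ name ~ `"]`;
}
```

So `metaSelector("description")` yields `meta[name="description"]`, while `metaSelector("og:title")` yields `meta[property="og:title"]`.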
Writes the document out with indentation whitespace for easier eyeball debugging.
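A minimal sketch of indentation-based pretty printing on a hypothetical tree: one node per line, each child indented one level deeper. The Node type, the pretty function, and its indentWith parameter are illustrative assumptions, not the library's implementation, and attributes are omitted for brevity.

```d
import std.array : replicate;

// Hypothetical minimal node: a text node (tag is null) or an element.
class Node {
    string tag;
    string text;
    Node[] children;
    this(string tag, Node[] children = null) { this.tag = tag; this.children = children; }
    static Node textNode(string s) { auto n = new Node(null); n.text = s; return n; }
}

// Each node on its own line, prefixed by `level` copies of indentWith.
string pretty(const Node n, size_t level = 0, string indentWith = "\t") {
    auto pad = indentWith.replicate(level);
    if (n.tag is null)
        return pad ~ n.text ~ "\n";
    string result = pad ~ "<" ~ n.tag ~ ">\n";
    foreach (child; n.children)
        result ~= pretty(child, level + 1, indentWith);
    return result ~ pad ~ "</" ~ n.tag ~ ">\n";
}
```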
If these were kept, this is the content that appeared before the root element, such as <?xml version ?> declarations and <!DOCTYPE>s.
Content after the root element, only stored in non-strict mode and not used in toString, but available in case you want it.
Specializes Document for handling generic XML. (Always uses strict mode and the XML MIME type and file header.)