Document

The main document interface, including an HTML parser.

class Document : FileResource {}

Constructors

this
this(string data, bool caseSensitive, bool strict)

.

this
this()

Creates an empty document. It has *nothing* in it at all.
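
A minimal sketch of both constructors, assuming the module is imported as arsd.dom. The helper names titleOf and parseLater are hypothetical and exist only for illustration; all three constructor arguments are passed explicitly because the signature above does not show defaults.

```d
import arsd.dom;

// Hypothetical helper: parse markup with the three-argument constructor
// and return the document title.
string titleOf(string markup) {
    // caseSensitive = false, strict = false: loose, HTML-style parsing
    auto document = new Document(markup, false, false);
    return document.title;
}

// Hypothetical helper: start from the empty constructor and parse afterwards.
Document parseLater(string markup) {
    auto document = new Document(); // nothing in it yet
    document.parseGarbage(markup);  // builds the tree from the markup
    return document;
}
```

The second form is the one to use when something has to be configured on the document, such as the parseSaw* delegates, before any parsing happens.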

Members

Functions

clear
void clear()

.

createElement
Element createElement(string name)

.

createForm
Form createForm()

.

createFragment
Element createFragment()

.

createTextNode
Element createTextNode(string content)

.

dispatchMutationEvent
void dispatchMutationEvent(DomMutationEvent e)

Undocumented in source. Be warned that the author may not have intended to support it.

enableAddingSpecialTagsToDom
void enableAddingSpecialTagsToDom()

This will set delegates for parseSaw* that add those things to the DOM tree when the parser sees them. Note that this overwrites any parseSaw* delegates you set earlier, and setting them afterwards will overwrite these. Call this before calling parse(). It will also preserve the prolog and doctype from the original file, if there was one.
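
The required call order can be sketched as follows. The helper name parseKeepingSpecialTags is hypothetical, and the sketch assumes that parseGarbage goes through the same parseSaw* delegates as parse(), which the description of parseGarbage suggests but does not state outright.

```d
import arsd.dom;

// Hypothetical helper: parse while keeping comments, the doctype and
// similar special tags in the DOM tree.
Document parseKeepingSpecialTags(string markup) {
    auto document = new Document();
    // Must come before parsing; it replaces any parseSaw* delegates set earlier.
    document.enableAddingSpecialTagsToDom();
    document.parseGarbage(markup);
    return document;
}
```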

findFirst
Element findFirst(bool delegate(Element) doesItMatch)

.

forms
Form[] forms()

.

getData
immutable(ubyte)[] getData()

Implements the FileResource interface; it calls toString.

getElementById
Element getElementById(string id)
getElementsByClassName
Element[] getElementsByClassName(string tag)
getElementsBySelector
Element[] getElementsBySelector(string selector)
getElementsByTagName
Element[] getElementsByTagName(string tag)

These functions all forward to the root element. See the documentation in the Element class.
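
A short sketch of the forwarding lookups, using only the signatures listed above. The helper names countLinks and textById are hypothetical, and the null check assumes getElementById returns null when no element has the id; the Element documentation is the place to confirm that.

```d
import arsd.dom;

// Hypothetical helper: count the <a> elements in the document.
size_t countLinks(Document document) {
    return document.getElementsByTagName("a").length;
}

// Hypothetical helper: text of the element with the given id,
// or null if there is no such element (assumed behavior, see lead-in).
string textById(Document document, string id) {
    auto element = document.getElementById(id);
    return element is null ? null : element.innerText;
}
```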

getFirstElementByTagName
Element getFirstElementByTagName(string tag)

Returns the first element with the given tag name. FIXME from the source: this could just be a lazy range.

getMeta
string getMeta(string name)

Looks up a meta tag by name. The attribute it matches depends on the name: [name=] if the name contains no colon and [property=] if it does.

handleDataEncoding
Utf8Stream handleDataEncoding(string rawdata, string dataEncoding, bool strict)

Undocumented in source. Be warned that the author may not have intended to support it.

mainBody
Element mainBody()

This returns the <body> element, if there is one. (It is named differently than in JavaScript, where it is called 'body', because body is a keyword in D.)

opIndex
ElementCollection opIndex(string selector)

This is just something I'm toying with. Right now, you use opIndex to put in css selectors. It returns a struct that forwards calls to all elements it holds, and returns itself so you can chain it.

optionSelector
MaybeNullElement!SomeElementType optionSelector(string selector, string file, size_t line)

Undocumented in source. Be warned that the author may not have intended to support it.

parse
void parse(string rawdata, bool caseSensitive, bool strict, string dataEncoding)

Take XMLish data and try to make the DOM tree out of it.

parseGarbage
void parseGarbage(string data)

Given the kind of garbage you find on the Internet, try to make sense of it. Equivalent to document.parse(data, false, false, null); (Case-insensitive, non-strict, determine character encoding from the data.) NOTE: this makes no attempt at added security.

parseStream
void parseStream(Utf8Stream data, bool caseSensitive, bool strict)

Undocumented in source. Be warned that the author may not have intended to support it.

parseStrict
void parseStrict(string data)

Parses well-formed, case-sensitive UTF-8 XML or XHTML. Will throw exceptions on things like unclosed tags.
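
One way to use the strict parser is as a well-formedness check, sketched below. The helper name isWellFormed is hypothetical. The specific exception type is not documented here, so the sketch catches the base Exception class rather than assuming a name.

```d
import arsd.dom;

// Hypothetical helper: true if the strict parser accepts the data.
bool isWellFormed(string data) {
    auto document = new Document();
    try {
        document.parseStrict(data);
        return true;
    } catch (Exception e) {
        // parseStrict throws on problems such as unclosed tags
        return false;
    }
}
```

For input that is expected to be messy, parseGarbage is the counterpart: it tries to repair the markup instead of rejecting it.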

parseUtf8
void parseUtf8(string data, bool caseSensitive, bool strict)

Parses well-formed UTF-8 in loose mode (by default). Tries to correct tag soup, but does NOT try to correct bad character encodings.

querySelector
Element querySelector(string selector)
querySelectorAll
Element[] querySelectorAll(string selector)
requireElementById
SomeElementType requireElementById(string id, string file, size_t line)
requireSelector
SomeElementType requireSelector(string selector, string file, size_t line)

These functions all forward to the root element. See the documentation in the Element class.
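
A brief sketch of the selector-based lookups. The helper names linkTargets and firstText are hypothetical, and the null check assumes querySelector returns null when nothing matches, which should be confirmed against the Element documentation. getAttribute and innerText are Element members.

```d
import arsd.dom;

// Hypothetical helper: collect the href attribute of every link that has one.
string[] linkTargets(Document document) {
    string[] targets;
    foreach (element; document.querySelectorAll("a[href]"))
        targets ~= element.getAttribute("href");
    return targets;
}

// Hypothetical helper: text of the first match, or a fallback when
// nothing matches (assumes querySelector returns null in that case).
string firstText(Document document, string selector, string fallback) {
    auto element = document.querySelector(selector);
    return element is null ? fallback : element.innerText;
}
```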

setMeta
void setMeta(string name, string value)

Sets a meta tag in the document header. It is kinda hacky so that it works easily for both Facebook Open Graph and traditional HTML meta tags.
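
A hedged sketch of setMeta and getMeta used together. The helper name roundTripMeta is hypothetical, and the sketch assumes the document already contains a <head> element and that getMeta returns the value previously stored by setMeta.

```d
import arsd.dom;

// Hypothetical helper: set a meta value and read it back.
// A name without a colon (such as "description") is treated as a
// traditional meta name; a name with a colon (such as "og:title")
// is treated as a property.
string roundTripMeta(Document document, string name, string value) {
    document.setMeta(name, value);
    return document.getMeta(name);
}
```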

setProlog
void setProlog(string d)

.

toPrettyString
string toPrettyString(bool insertComments)

Writes it out with whitespace for easier eyeball debugging.

toString
string toString()

.

Properties

contentType
string contentType [@property setter]

If you're using this for some other kind of XML, you can set the content type here.

contentType
string contentType [@property getter]

Implements the FileResource interface; useful for sending via HTTP automatically.

prolog
string prolog [@property getter]

Undocumented in source. Be warned that the author may not have intended to support it.

title
string title [@property getter]

Gets the <title> element's innerText, if one exists.

title
string title [@property setter]

Sets the title of the page, creating a <title> element if needed.

Static functions

fromUrl
Document fromUrl(string url)

Convenience method for web scraping. Requires arsd.http2 to be included in the build.

Variables

_contentType
string _contentType;

Undocumented in source.

eventObservers
void delegate(DomMutationEvent)[] eventObservers;

Undocumented in source.

loose
bool loose;

.

parseSawAspCode
bool delegate(string) parseSawAspCode;

If the parser sees <% asp code... %>, it will call this callback. It will be passed "% asp code... %" or "%= asp code .. %". Return true if you want the node appended to the document.

parseSawBangInstruction
bool delegate(string) parseSawBangInstruction;

If the parser sees a <! that is not CDATA or a comment (CDATA is handled automatically and comments call parseSawComment), it calls this function with the contents. <!SOMETHING foo> calls parseSawBangInstruction("SOMETHING foo"). Return true if you want the node appended to the document.

parseSawComment
bool delegate(string) parseSawComment;

If the parser sees an HTML comment, it will call this callback. <!-- comment --> will call parseSawComment(" comment "). Return true if you want the node appended to the document.
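
The callback mechanism can be sketched as a comment collector. The helper name collectComments is hypothetical, and the sketch assumes parseGarbage invokes the delegate the same way parse() does. The delegate must be set before parsing.

```d
import arsd.dom;

// Hypothetical helper: gather the text of every comment seen during parsing.
string[] collectComments(string markup) {
    string[] found;
    auto document = new Document();
    document.parseSawComment = (string contents) {
        found ~= contents; // contents excludes the <!-- and --> markers
        return false;      // false: do not append the comment node to the document
    };
    document.parseGarbage(markup);
    return found;
}
```

Returning true from the delegate instead would keep each comment in the tree, which is what enableAddingSpecialTagsToDom arranges for all of the parseSaw* callbacks at once.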

parseSawPhpCode
bool delegate(string) parseSawPhpCode;

If the parser sees <?php php code... ?>, it will call this callback. It will be passed "?php php code... ?" or "?= php code .. ?". Note: dom.d cannot identify the other php <? code ?> short format. Return true if you want the node appended to the document.

parseSawQuestionInstruction
bool delegate(string) parseSawQuestionInstruction;

If the parser sees a <?xxx> that is not php or asp, it calls this function with the contents. <?SOMETHING foo> calls parseSawQuestionInstruction("?SOMETHING foo"). Unlike the php/asp ones, this ends on the first > it sees, without requiring ?>. Return true if you want the node appended to the document.

piecesAfterRoot
Element[] piecesAfterRoot;

Stuff that appeared after the root element. It is only stored in non-strict mode and is not used in toString, but it is available in case you want it.

piecesBeforeRoot
Element[] piecesBeforeRoot;

If these were kept, this is stuff that appeared before the root element, such as <?xml version ?> declarations and <!DOCTYPE>s.

root
Element root;

.

Inherited Members

From FileResource

contentType
string contentType [@property getter]

The content type of the file, e.g. "text/html; charset=utf-8" or "image/png".

getData
immutable(ubyte)[] getData()

The data.
