XML Storage

BlobCity offers exhaustive support for XML storage. It allows storage and retrieval of XML, along with running queries on XML as if they were collected into a tabular structure.

A schema inference engine processes the XML document or object being submitted to BlobCity and maps to a tabular schema for appropriate indexing and search-ability. The XML elements are converted to column names, and the values encapsulated within the elements associate with the value of the particular column. Nesting of XML element is permitted with the nested tags forming a column name column name by concatenating the tags separated by a ‘.’ (dot).

Let us consider starting with an empty collection and then insert our first XML document into it. An empty collection is shown in figure below.

Empty table with a single _id (primary key) column

_id
 

Consider an insert of the following XML document into our table.

<?xml version="1.0" encoding="UTF-8"?>
<user>
    <name>John</name>
    <age>45</age>
</user>

The collection after taking the insert of the above XML appears as shown below.

Collection with the 1st XML record inserted

_id user.name user.age
xxx John 45

The insert operation results in two columns getting automatically created within the collection. The columns are named user.name and user.age respectively. Although not visible within a collection schema, the name and age columns are actually placed within the user column. The user column will not show up in a collection schema, but it can be used to specify the columns to be selected in a select query, but cannot be used inside conditional where clauses or for any other purpose.

The response of all SQL queries is defaulted to a JSON array and is consistent with the response received against JSON data present within the collection. The XML is automatically mapped to equivalent JSON when running a SQL query. The database however stores the XML, and it can be fetched in its original form if explicitly specified. However the record cannot be retrieved in XML form as part of the SQL search query.

SELECT * FROM ds1.users
[
  {"user": {"name": "John", "age": 45}}
]

The response is a JSON array with each record being a JSON object entry within the array. A major difference between the working of the JSON and working of the XML document is that the name and age value actually get nested within the user key as was present in the original document. This may happen to be the case for most XML documents or objects you use. A standard practice with XML encapsulates the enters within a root tag, which results in the nesting.

The root tag and the XML format specification tag is not mandatory for BlobCity to infer the XML document. Let us see the table with an insert without the xml element at the beginning of the document.

<user>
    <name>Tom</name>
    <age>26</age>
</user>

Collection with a 2nd XML record inserted

_id user.name user.age
xxx John 45
xxx Tom 26

The insert is successful with the collection getting an additional record. The xxx is the placeholder for auto-defined values, with the actual record containing a unique value that is auto generated by the database.

To explicitly specify the _id value for any element, the following XML structure needs to be used.

<?xml version="1.0" encoding="UTF-8"?>
<_id>1000</_id>
<user>
    <name>Mary</name>
    <age>30</age>
</user>

Collection with 3rd XML record having an explicitly specified _id value

_id user.name user.age
xxx John 45
xxx Tom 26
1000 Mary 30

Nested XML’s

<?xml version="1.0" encoding="UTF-8"?>
<user>
<name>Stacy</name>
    <age>43</age>
    <addr>
        <line1>line1</line1>
        <line2></line2>
        <state>IL</state>
        <zip>60002</zip>
    </addr>
</user>

Collection post insertion of a nested XML

_id user.name user.age user.addr user.addr.line1 user.addr.line2 user.addr.state user.addr.zip
xxx John 45          
xxx Tom 26          
1000 Mary 30          
xxx Stacy 43 line1 IL 60002    

Four new columns are added into the table. These columns are added under a sub-column of user.addr. While user.addr is not an actual column that store data, while user.addr.line1, user.addr.line2, user.addr.state, user.addr.zip are actual columns, the column user.addr can be queried only in select queries to select all of the nested columns.

The created columns will remain in the schema unless explicitly deleted, even if the record for Stacy is deleted.

No root tag

While XML documents by general convention have a root tag, the same can be skipped when inserting an XML into BlobCity. The database supports interpretation of XML documents that have an XML like structure without a root tag present. The root tag in the XML that we used so far has been the <user></user> tag.

<?xml version="1.0" encoding="UTF-8"?>
<name>Stacy</name>
<age>43</age>
<addr>
    <line1>line1</line1>
    <line2></line2>
    <state>IL</state>
    <zip>60002</zip>
</addr>
_id name age addr.line1 addr.line2 addr.state addr.zip
xxx Stacy 43 line1   IL 60002

When the above XML is inserted a new empty table, the columns created are not under the user root tag as were previously. It is important to note that the structure of this table is exactly same as the table created by inserting an equivalent JSON object.

Arrays in XML

When XML documents have a tag that is repeated, depending on the position of the tag, the inference engine consider occurrence of the tag as separate records, or a collection field within a single record. If the root tag is repeated, and no other tag occurs at the root level, then each root tag and its contents are treated as a separate record. If sub-tags are repeated, or a tag at the root level is repeated, but there are other tags other than the repeated tag at the root level, then contents of each repeated tag are collected into the single column as a collection field.