Metah.X: An XML Metaprogramming Language

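Metah.X (MX for short) implements the semantics of XML Schema 1.0 in a syntax of its own design, and implements a Schema-lized Document Object Model (SDOM) in C#. The compiler translates MX code into C# code that uses SDOM, which maps the semantics of XML Schema onto C# and so unleashes the full power of XML Schema. Only a C# version exists today, but a Java version, or a version for any other language, is entirely feasible.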

Beyond Markup

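Of the two keywords markup and infoset, which better captures the essence of XML? Markup is syntax and infoset is semantics, and that semantics is itself standardized as the XML Information Set. So-called XML 1.0, the text full of angle brackets, is just one way of presenting (serializing) an infoset, albeit by far the most common one. It is entirely possible to design another serialization of the infoset, such as Microsoft's .NET Binary Format: XML Data Structure. Yet, largely because the W3C has been slow to act, there is still no widely adopted "Binary XML" serialization standard.
Here is a brief look at the semantics of XML. XML data (also called an XML instance or an XML document) is a tree built from elements, attributes, and characters. The full name of an element or attribute consists of a namespace URI and a local name. Consider the following XML data: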
<E1 xmlns="http://ns1">
  <E2 A1="123">abc</E2>
  <E3 xmlns="">
    <p:E4 xmlns:p="http://ns2" p:A1="true" A1="123" />
    def
  </E3>
</E1>
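In this example, "E1", "E2", "E3", "E4", and "A1" are local names. A namespace URI can be declared in two ways, by default declaration or by prefix, and from these declarations the effective namespace URI of every element and attribute follows. Here the effective namespace URI of E1 and E2 is "http://ns1"; that of E2's A1, of E3, and of E4's second A1 is empty; and that of E4 and E4's first A1 is "http://ns2". (A default declaration applies to elements only, so an unprefixed attribute is never in a namespace.) An element or attribute whose effective namespace URI is non-empty is called (namespace) qualified; otherwise it is unqualified. Writing full names in the form {NamespaceUri}LocalName, E1's full name is {http://ns1}E1, E3's is {}E3, and the two A1 attributes of E4 are {http://ns2}A1 and {}A1.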
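The name resolution described above can be checked mechanically. The following is a minimal sketch and not part of Metah.X: it parses the sample document with LINQ to XML (System.Xml.Linq) and lists every element and attribute in the {NamespaceUri}LocalName form. The names FullNameDemo, Format, and FullNames are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FullNameDemo {
    // The sample document from the text.
    public const string Sample = @"<E1 xmlns=""http://ns1"">
  <E2 A1=""123"">abc</E2>
  <E3 xmlns="""">
    <p:E4 xmlns:p=""http://ns2"" p:A1=""true"" A1=""123"" />
    def
  </E3>
</E1>";

    // Formats an XName as {NamespaceUri}LocalName.
    public static string Format(XName name) {
        return "{" + name.NamespaceName + "}" + name.LocalName;
    }

    // Lists the full name of every element, followed by the full names of
    // its attributes (prefixed with '@'). Namespace declarations such as
    // xmlns="..." are syntax, not attributes of the infoset, so they are skipped.
    public static List<string> FullNames(string xml) {
        var result = new List<string>();
        foreach (var element in XDocument.Parse(xml).Descendants()) {
            result.Add(Format(element.Name));
            foreach (var attribute in element.Attributes().Where(x => !x.IsNamespaceDeclaration))
                result.Add("  @" + Format(attribute.Name));
        }
        return result;
    }

    public static void Main() {
        foreach (var line in FullNames(Sample)) Console.WriteLine(line);
    }
}
```

The printed names should agree with the ones worked out by hand above, including the unqualified {}A1 on E2, which does not pick up the default namespace of its element.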
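An element can have zero or more attributes. Their full names must be unique, they are unordered, and an attribute's value can only be characters. The children (or content) of an element take one of four forms: text-only, element-only, mixed, and empty. If the children consist only of characters, the element is text-only, like E2 above. The XML whitespace characters are the four characters ' ', '\n', '\r', and '\t'; if the children consist only of child elements and zero or more XML whitespace characters, the element is element-only, like E1. If the children include child elements and arbitrary characters, the element is mixed, like E3. If the element has no children, it is empty, like E4.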
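The four content forms can be told apart with a few lines of code. This is a rough sketch using LINQ to XML, not anything from Metah.X; ContentKind, ContentKindDemo, Classify, and KindOf are hypothetical names, and the rules are exactly the ones stated in the paragraph above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public enum ContentKind { Empty, TextOnly, ElementOnly, Mixed }

public static class ContentKindDemo {
    // The sample document from the text.
    public const string Sample = @"<E1 xmlns=""http://ns1"">
  <E2 A1=""123"">abc</E2>
  <E3 xmlns="""">
    <p:E4 xmlns:p=""http://ns2"" p:A1=""true"" A1=""123"" />
    def
  </E3>
</E1>";

    // The four XML whitespace characters.
    static bool IsXmlWhitespace(char c) {
        return c == ' ' || c == '\n' || c == '\r' || c == '\t';
    }

    // Classifies the children of an element.
    public static ContentKind Classify(XElement element) {
        bool hasElements = element.Elements().Any();
        // All character data directly inside the element, concatenated.
        string text = string.Concat(element.Nodes().OfType<XText>().Select(t => t.Value));
        if (!hasElements)
            return text.Length == 0 ? ContentKind.Empty : ContentKind.TextOnly;
        // Child elements plus only whitespace is element-only; anything else is mixed.
        return text.All(IsXmlWhitespace) ? ContentKind.ElementOnly : ContentKind.Mixed;
    }

    // Convenience helper: classifies the first element with the given local name.
    public static ContentKind KindOf(string xml, string localName) {
        // PreserveWhitespace keeps the indentation as text nodes, so the
        // whitespace rule is really exercised.
        var doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
        return Classify(doc.Descendants().First(e => e.Name.LocalName == localName));
    }

    public static void Main() {
        foreach (var name in new[] { "E1", "E2", "E3", "E4" })
            Console.WriteLine(name + ": " + KindOf(Sample, name));
    }
}
```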

Typing

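XML is a meta-description language for building domain-specific vocabularies. Think of XML as an inexhaustible supply of atoms from which you can "pile up" anything in the world. Usually, though, we want specific things rather than things in general; in other words, we need to give things types. Person, egg, and stone are things that have been typed. There are two ways to type things: nominal typing and structural typing (I do not know how academia defines these terms precisely, so allow me to borrow the cool-sounding "structural typing"). An intuitive picture of nominal typing is baking cakes with a mold. The vast majority of object-oriented programming languages use nominal typing: the mold is the class and the cake is the object. Structural typing has real-life examples too. A Sichuan saying goes "buy duck eggs by measuring them against a hoop": someone too dull to tell chicken, duck, and goose eggs apart is given a hoop that a duck egg just fits through. Whatever passes the hoop's check is a duck egg, no matter what it is. The hoop types things as duck eggs, so we can call it a duck-egg validator.
We usually want typed XML data rather than arbitrary XML data. An XML Schema defines the structural rules and the meaning that XML data must conform to, and an XML validator is the application of an XML Schema. If a piece of XML data passes an XML validator, we can say the data has been typed as the type that the XML Schema defines.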
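To make the hoop concrete, here is a small sketch that uses the XSD validator built into .NET (XmlSchemaSet plus the XDocument.Validate extension method). It does not use MX. The schema, the Egg element, the DiameterCm attribute, and the limit of 6 are all invented for this example; HoopDemo and Check are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class HoopDemo {
    // The "hoop": an <Egg> passes only if it has a DiameterCm attribute
    // whose value is a decimal no greater than 6.
    const string Xsd = @"<schema xmlns=""http://www.w3.org/2001/XMLSchema"">
  <element name=""Egg"">
    <complexType>
      <attribute name=""DiameterCm"" use=""required"">
        <simpleType>
          <restriction base=""decimal"">
            <maxInclusive value=""6""></maxInclusive>
          </restriction>
        </simpleType>
      </attribute>
    </complexType>
  </element>
</schema>";

    // Returns the messages reported by the validator.
    // An empty list means the data passed through the hoop.
    public static List<string> Check(string xml) {
        var schemas = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(Xsd)))
            schemas.Add(null, reader); // null: take the target namespace from the schema itself
        var problems = new List<string>();
        XDocument.Parse(xml).Validate(schemas, (sender, e) => problems.Add(e.Message));
        return problems;
    }

    public static void Main() {
        foreach (var xml in new[] { @"<Egg DiameterCm=""5.5""/>", @"<Egg DiameterCm=""7.2""/>" }) {
            var problems = Check(xml);
            Console.WriteLine(xml + " -> " + (problems.Count == 0 ? "is a duck egg" : "rejected"));
            foreach (var problem in problems) Console.WriteLine("  " + problem);
        }
    }
}
```

The validator never asks what the data "is"; it only checks whether the data fits, which is the structural view of typing described above.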

Syntax and Semantics

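Syntax and semantics are everywhere. Just like markup and infoset, an XSD file is syntax, while the XML Schema 1.0 specification describes semantics. One semantics can be expressed by many syntaxes, and MX expresses the semantics of XML Schema 1.0 in a user-friendly syntax of its own. The following MX code: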
//FirstLook.mxcs
alias "http://schemas.example.com/projecta" as nsa;

xnamespace {nsa} [namespace: Example.ProjectA] {
    type String10 restrict String
        facets{
            lengthrange: 1..10;
        };
    ;
    type String20 restrict String
        facets{
            lengthrange: 1..20;
        };
    ;
    type String40 restrict String
        facets{
            lengthrange: 1..40;
        };
    ;
    type Int32List list Int32;
    type Email restrict String40
        facets{
            patterns: @"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}";
        };
    ;
    type Phone extend String20
        attributes{
            attribute PhoneType[?; default: "Unknown"] as PhoneType;
        };
    ;
    type PhoneType restrict String
        facets{
            enums: Unknown = "Unknown", Work = "Work", Home = "Home"
        };
    ;
    type Phones
        children{
            element Phone[+; membername: Phones] as Phone;
        };
    ;
    type Address
        children{
            choice{
                element Normal as NormalAddress;
                element Geography as GeographyAddress;
            };
        };
    ;
    type NormalAddress
        attributes{
            attribute Country as String20;
            attribute State[?] as String20;
            attribute City as String20;
            attribute Address as String40;
            attribute ZipCode as String10;
        };
    ;
    type GeographyAddress
        attributes{
            attribute Longitude as SpatialNumber;
            attribute Latitude as SpatialNumber;
        };
    ;
    type SpatialNumber restrict Decimal
        facets{
            digits: 8..5;
        };
    ;
    type Customer
        attributes{
            attribute Id[?] as Int32;
            attribute Name as String10;
            attribute Email as Email;
            attribute RegistrationDate[?] as DateTime;
            attribute OrderIds[?] as Int32List;
        };
        children{
            element Phones as Phones;
            element Address as Address;
        };
    ;
    element Customer as Customer;
}
and the following XSD:
<?xml version="1.0" encoding="utf-8"?>
<!--FirstLook.xsd-->
<schema targetNamespace="http://schemas.example.com/projecta" elementFormDefault="qualified"
    xmlns:tns="http://schemas.example.com/projecta" xmlns="http://www.w3.org/2001/XMLSchema">
    <simpleType name="String10">
        <restriction base="string">
            <minLength value="1"></minLength>
            <maxLength value="10"></maxLength>
        </restriction>
    </simpleType>
    <simpleType name="String20">
        <restriction base="string">
            <minLength value="1"></minLength>
            <maxLength value="20"></maxLength>
        </restriction>
    </simpleType>
    <simpleType name="String40">
        <restriction base="string">
            <minLength value="1"></minLength>
            <maxLength value="40"></maxLength>
        </restriction>
    </simpleType>
    <simpleType name="Int32List">
        <list itemType="int"></list>
    </simpleType>
    <simpleType name="Email">
        <restriction base="tns:String40">
            <pattern value="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}"></pattern>
        </restriction>
    </simpleType>
    <complexType name="Phone">
        <simpleContent>
            <extension base="tns:String20">
                <attribute name="PhoneType" use="optional" default="Unknown" type="tns:PhoneType"></attribute>
            </extension>
        </simpleContent>
    </complexType>
    <simpleType name="PhoneType">
        <restriction base="string">
            <enumeration value="Unknown"></enumeration>
            <enumeration value="Work"></enumeration>
            <enumeration value="Home"></enumeration>
        </restriction>
    </simpleType>
    <complexType name="Phones">
        <sequence>
            <element name="Phone" minOccurs="1" maxOccurs="unbounded" type="tns:Phone"></element>
        </sequence>
    </complexType>
    <complexType name="Address">
        <choice>
            <element name="Normal" type="tns:NormalAddress"></element>
            <element name="Geography" type="tns:GeographyAddress"></element>
        </choice>
    </complexType>
    <complexType name="NormalAddress">
        <attribute name="Country" use="required" type="tns:String20"></attribute>
        <attribute name="State" use="optional" type="tns:String20"></attribute>
        <attribute name="City" use="required" type="tns:String20"></attribute>
        <attribute name="Address" use="required" type="tns:String40"></attribute>
        <attribute name="ZipCode" use="required" type="tns:String10"></attribute>
    </complexType>
    <complexType name="GeographyAddress">
        <attribute name="Longitude" use="required" type="tns:SpatialNumber"></attribute>
        <attribute name="Latitude" use="required" type="tns:SpatialNumber"></attribute>
    </complexType>
    <simpleType name="SpatialNumber">
        <restriction base="decimal">
            <totalDigits value="8"></totalDigits>
            <fractionDigits value="5"></fractionDigits>
        </restriction>
    </simpleType>
    <complexType name="Customer">
        <sequence>
            <element name="Phones" type="tns:Phones"></element>
            <element name="Address" type="tns:Address"></element>
        </sequence>
        <attribute name="Id" use="optional" type="int"></attribute>
        <attribute name="Name" use="required" type="tns:String10"></attribute>
        <attribute name="Email" use="required" type="tns:Email"></attribute>
        <attribute name="RegistrationDate" use="optional" type="dateTime"></attribute>
        <attribute name="OrderIds" use="optional" type="tns:Int32List"></attribute>
    </complexType>
    <element name="Customer" type="tns:Customer"></element>
</schema>
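express the same semantics. I believe many people have a poor impression of XML Schema and keep their distance because it seems so complicated. It is fairer to say that the semantics of XML Schema is good while its syntax (XSD) is terrible. Even if you are unfamiliar with XML Schema, learning its semantics through MX is easy. I believe MX will change your preconceptions about XML Schema and broaden your understanding of it, provided you read this article patiently to the end.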

Programming

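Another reason XML Schema is not very popular is that it floats in the air and never touches the ground. What is the "ground"? We are all programmers, and the ground is the programming languages we live with every day: C#, Java, C++, and so on. Today XSD is used for little more than checking whether XML is valid when a DOM loads it. I suspect many programmers have never used the PSVI (Post-Schema-Validation Infoset), and those who have did not gain much from it. Programming is one world and XML Schema is another; the two are isolated and heterogeneous, and we need to bring XML Schema into programming. Consider how XML is programmed today: mainly through the Document Object Model (DOM). The semantics of XML is projected onto the DOM; for example, the elements, attributes, and text of XML correspond to the DOM's element nodes, attribute nodes, and text nodes. These nodes form a tree data structure that users can create, maintain, query, serialize, and deserialize. In the same way, we can design our own object model and project the semantics of XML Schema onto it. I call it the Schema-lized Document Object Model (SDOM). The only thing SDOM shares with DOM is its purpose, which is to create, maintain, query, serialize, and deserialize XML data; the implementations are completely different. Put another way, DOM is untyped (just as XML data is arbitrary), while SDOM is typed. Here is a brief overview of SDOM: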
//Metah.X.cs
//Metah.X runtime file (Schema-lized Document Object Model(SDOM))
namespace Metah.X {
    public abstract class Object {
        public Object Parent { get; }
        public T GetAncestor<T>(bool @try = true, bool testSelf = false) where T : class;
        public virtual Location? Location { get; set; }
        public virtual Object DeepClone();
        public T DeepClone<T>() where T : Object;
        public bool TryValidate(Context context); 
        protected virtual bool TryValidating(Context context, bool fromValidate);
        protected virtual bool TryValidated(Context context, bool success);
        //...
    }
    public struct Location {
        public string SourceUri { get; }
        public int Line { get; }//1-based
        public int Column { get; }//1-based
        //...
    }
    public class Context {
        public List<Diagnostic> Diagnostics { get; }
        public virtual void Reset();
        //...
    }
    public class Diagnostic {
        public Object ObjectSource { get; }
        public Location? Location { get; }
        public DiagnosticSeverity Severity { get; }
        public int RawCode { get; }
        public DiagnosticCode Code { get; }
        public string Message { get; }
        //...
    }
    public enum DiagnosticSeverity { Error = 0, Warning, Info }
    public enum DiagnosticCode {
        InvalidObjectClrType = 1,
        SimpleTypeValueRequired,
        //...
        Extended = 1000,
    }
    public abstract class Type : Object { ... }
    public class SimpleType : Type, IEquatable<SimpleType> {
        public object Value { get; set; }
        public object SetValue(object value, bool direct = false);
        public static object CloneValue(object value);
        public static bool ValueEquals(object x, object y);
        public static int GetValueHashCode(object value);
        public static bool ValuesEquals(IReadOnlyList<object> x, IReadOnlyList<object> y);
        public static int GetValuesHashCode(IReadOnlyList<object> values);
        public static IEqualityComparer<object> ValueEqualityComparer { get; }
        public static IEqualityComparer<IReadOnlyList<object>> ValuesEqualityComparer { get; }
        //...
    }
    public abstract class AtomicSimpleType : SimpleType { ... }
    public class String : AtomicSimpleType {
        new public string Value { get; set; }
        //...
    }
    public class Decimal : AtomicSimpleType {
        new public decimal? Value { get; set; }
        //...
    }
    public class Integer : Decimal { ... }
    public class NonPositiveInteger : Integer { ... }
    public class NegativeInteger : NonPositiveInteger { ... }
    public class NonNegativeInteger : Integer { ... }
    public class PositiveInteger : NonNegativeInteger { ... }
    public class Int64 : Integer {
        new public long? Value { get; set; }
        //...
    }
    public class Int32 : Int64 {
        new public int? Value { get; set; }
        //...
    }
    public class Int16 : Int32 {
        new public short? Value { get; set; }
        //...
    }
    public class SByte : Int16 {
        new public sbyte? Value { get; set; }
        //...
    }
    public class UInt64 : NonNegativeInteger {
        new public ulong? Value { get; set; }
        //...
    }
    public class UInt32 : UInt64 {
        new public uint? Value { get; set; }
        //...
    }
    public class UInt16 : UInt32 {
        new public ushort? Value { get; set; }
        //...
    }
    public class Byte : UInt16 {
        new public byte? Value { get; set; }
        //...
    }
    public class Boolean : AtomicSimpleType {
        new public bool? Value { get; set; }
        //...
    }
    public class Single : AtomicSimpleType {
        new public float? Value { get; set; }
        //...
    }
    public class Double : AtomicSimpleType {
        new public double? Value { get; set; }
        //...
    }
    public abstract class Binary : AtomicSimpleType {
        new public byte[] Value { get; set; }
        //...
    }
    public class Base64Binary : Binary {
        //...
    }
    public class HexBinary : Binary {
        //...
    }
    public class TimeSpan : AtomicSimpleType {
        new public System.TimeSpan? Value { get; set; }
        //...
    }
    public abstract class DateTimeBase : AtomicSimpleType {
        new public System.DateTime? Value { get; set; }
        //...
    }
    public class DateTime : DateTimeBase {
        //...
    }
    //more atomic simple types...
    public abstract class ListedSimpleType<T> : SimpleType, IList<T>, IReadOnlyList<T> {
        new public ListedSimpleTypeValue Value { get; set; }
        //IList<T> members...
        //...
    }
    public sealed class ListedSimpleTypeValue : IList<object>, IReadOnlyList<object>, IEquatable<ListedSimpleTypeValue>, ICloneable {
        //IList<object> members...
        //...
    }
    public abstract class UnitedSimpleType : SimpleType {
        new public UnitedSimpleTypeValue Value { get; set; }
        public object NetValue { get; set; }
        //...
    }
    public sealed class UnitedSimpleTypeValue : IEquatable<UnitedSimpleTypeValue>, ICloneable {
        public object Value { get; set; }
        public object SetValue(object value, bool direct = false);
        //...
    }
    public class ComplexType : Type {
        public AttributeSet AttributeSet { get; set; }
        public T EnsureAttributeSet<T>(bool @try = false) where T : AttributeSet;
        //
        public SimpleType SimpleChild { get; set; }
        public T EnsureSimpleChild<T>(bool @try = false) where T : SimpleType;
        public object Value { get; set; }
        //
        public ChildContainer ComplexChild { get; set; }
        public T EnsureComplexChild<T>(bool @try = false) where T : ChildContainer;
        //...
    }
    public interface IEntityObject {//Attribute and Element impls this interface
        System.Xml.Linq.XName Name { get; }
        Type Type { get; }
        object Value { get; }
    }
    public class Attribute : Object, IEntityObject {
        protected Attribute();
        public Attribute(XName name);
        public Attribute ReferentialAttribute { get; set; }
        public XName Name { get; }
        public SimpleType Type { get; set; }
        public T EnsureType<T>(bool @try = false) where T : SimpleType;
        public object Value { get; set; }
        public bool TrySetToDefaultValue(bool force = false);
        //...
    }
    public class AttributeSet : Object, ICollection<Attribute>, IReadOnlyCollection<Attribute> {
        //ICollection<Attribute> members...
        public bool Contains(XName name);
        public Attribute TryGet(XName name);
        public void AddRange(IEnumerable<Attribute> attributes);
        public void AddOrSet(Attribute attribute);
        public bool Remove(XName name);
        public IEnumerable<T> Attributes<T>(Func<T, bool> filter = null) where T : Attribute;
        public IEnumerable<T> Attributes<T>(XName name) where T : Attribute;
        public IEnumerable<Attribute> WildcardAttributes { get; }
        public int TryAddDefaultAttributes(bool force = false);
        //...
    }
    public abstract class Child : Object {
        public IEnumerable<T> ElementAncestors<T>(Func<Element, bool> filter = null) where T : Element;
        public IEnumerable<T> ElementAncestors<T>(XName name) where T : Element;
        public virtual int ChildOrder { get; }
        public virtual int SpecifiedOrder { get; set; }
        //...
    }
    public abstract class ContentChild : Child { ... }
    public class Element : ContentChild, IEntityObject {
        public Element ReferentialElement { get; set; }
        public XName Name { get; set; }
        public bool IsNull { get; set; }
        public Type Type { get; set; }
        public T EnsureType<T>(bool @try = false) where T : Type;
        public SimpleType SimpleType { get; }
        public ComplexType ComplexType { get; }
        //
        public AttributeSet AttributeSet { get; set; }
        public T EnsureAttributeSet<T>(bool @try = false) where T : AttributeSet;
        public IEnumerable<T> Attributes<T>(Func<T, bool> filter = null) where T : Attribute;
        public IEnumerable<T> Attributes<T>(XName name) where T : Attribute;
        //
        public SimpleType SimpleChild { get; set; }
        public T EnsureSimpleChild<T>(bool @try = false) where T : SimpleType;
        //
        public ChildContainer ComplexChild { get; set; }
        public T EnsureComplexChild<T>(bool @try = false) where T : ChildContainer;
        public IEnumerable<T> ContentChildren<T>(Func<T, bool> filter = null) where T : ContentChild;
        public IEnumerable<T> ContentDescendants<T>(Func<T, bool> filter = null) where T : ContentChild;
        public IEnumerable<T> SelfAndContentDescendants<T>(Func<T, bool> filter = null) where T : ContentChild;
        public IEnumerable<T> ElementChildren<T>(Func<T, bool> filter = null) where T : Element;
        public IEnumerable<T> ElementDescendants<T>(Func<T, bool> filter = null) where T : Element;
        public IEnumerable<T> SelfAndElementDescendants<T>(Func<T, bool> filter = null) where T : Element;
        public IEnumerable<T> ElementChildren<T>(XName name) where T : Element;
        public IEnumerable<T> ElementDescendants<T>(XName name) where T : Element;
        public IEnumerable<T> SelfAndElementDescendants<T>(XName name) where T : Element;
        public IEnumerable<T> SelfAndElementAncestors<T>(Func<Element, bool> filter = null) where T : Element;
        public IEnumerable<T> SelfAndElementAncestors<T>(XName name) where T : Element;
        //
        public object Value { get; set; }
        public bool TrySetToDefaultValue(bool force = false);
        //
        public IReadOnlyDictionary<XName, IdentityConstraint> IdentityConstraints { get; }
        public IReadOnlyDictionary<object, Id> Ids { get; }
        public IReadOnlyList<IIdRefObject> IdRefs { get; }
        //
        public void Save(XmlWriter writer);
        //...
    }
    public sealed class IdentityConstraint {
        public Element ContainingElement { get; }
        public XName Name { get; }
        public IdentityConstraintKind Kind { get; }
        public bool IsSingleValue { get; }
        public IReadOnlyDictionary<object, IdentityValue> Keys { get; }
        public IReadOnlyDictionary<IReadOnlyList<object>, IdentityValues> MultipleValueKeys { get; }
        public IReadOnlyList<KeyRefIdentityValue> KeyRefs { get; }
        public IReadOnlyList<KeyRefIdentityValues> MultipleValueKeyRefs { get; }
        public IdentityConstraint ReferentialConstraint { get; }
        //...
    }
    public enum IdentityConstraintKind { Key, Unique, KeyRef }
    public struct IdentityValue {
        public object Value { get; }
        public Element IdentityElement { get; }
        public IEntityObject ValueEntityObject { get; }
        //...
    }
    public struct IdentityValues {
        public IReadOnlyList<object> Values { get; }
        public Element IdentityElement { get; }
        public IReadOnlyList<IEntityObject> ValueEntityObjects { get; }
        //...
    }
    public struct KeyRefIdentityValue {
        public IdentityValue ReferenceIdentityValue { get; }
        public IdentityValue ReferentialIdentityValue { get; }
        //...
    }
    public struct KeyRefIdentityValues {
        public IdentityValues ReferenceIdentityValues { get; }
        public IdentityValues ReferentialIdentityValues { get; }
        //...
    }
    public sealed class Text : ContentChild {
        public string Value { get; set; }
        //...
    }
    public class ChildContainer : Child, IList<Child>, IReadOnlyList<Child> {
        //IList<Child> members...
        public bool TryGetIndexOf(int order, out int index);
        public int IndexAfter(int order);
        public bool Contains(int order);
        public void AddRange(IEnumerable<Child> children);
        public void AddOrSet(Child child);
        public Child TryGet(int order);
        public bool Remove(int order);
        public void SortChildren(bool recursive = true);
        public IEnumerable<T> ContentChildren<T>(Func<T, bool> filter = null) where T : ContentChild;
        public IEnumerable<T> ContentDescendants<T>(Func<T, bool> filter = null) where T : ContentChild;
        public IEnumerable<T> ElementChildren<T>(Func<T, bool> filter = null) where T : Element;
        public IEnumerable<T> ElementDescendants<T>(Func<T, bool> filter = null) where T : Element;
        public IEnumerable<T> ElementChildren<T>(XName name) where T : Element;
        public IEnumerable<T> ElementDescendants<T>(XName name) where T : Element;
        //...
    }
    public abstract class ChildList<T> : ChildContainer, IList<T>, IReadOnlyList<T> where T : Child {
        //IList<T> members...
        //...
    }
}
namespace Metah.X.Extensions {
    public static class ExtensionMethods {
        public static IEnumerable<T> ElementAncestors<T>(this IEnumerable<Child> source, Func<Element, bool> filter = null) where T : Element;
        public static IEnumerable<T> ElementAncestors<T>(this IEnumerable<Child> source, XName name) where T : Element;
        public static IEnumerable<T> SelfAndElementAncestors<T>(this IEnumerable<Element> source, Func<Element, bool> filter = null) where T : Element;
        public static IEnumerable<T> SelfAndElementAncestors<T>(this IEnumerable<Element> source, XName name) where T : Element;
        public static IEnumerable<T> Attributes<T>(this IEnumerable<Element> source, Func<T, bool> filter = null) where T : Attribute;
        public static IEnumerable<T> Attributes<T>(this IEnumerable<Element> source, XName name) where T : Attribute;
        public static IEnumerable<T> ContentChildren<T>(this IEnumerable<Element> source, Func<T, bool> filter = null) where T : ContentChild;
        public static IEnumerable<T> ContentDescendants<T>(this IEnumerable<Element> source, Func<T, bool> filter = null) where T : ContentChild;
        public static IEnumerable<T> SelfAndContentDescendants<T>(this IEnumerable<Element> source, Func<T, bool> filter = null) where T : ContentChild;
        public static IEnumerable<T> ElementChildren<T>(this IEnumerable<Element> source, Func<T, bool> filter = null) where T : Element;
        public static IEnumerable<T> ElementDescendants<T>(this IEnumerable<Element> source, Func<T, bool> filter = null) where T : Element;
        public static IEnumerable<T> SelfAndElementDescendants<T>(this IEnumerable<Element> source, Func<T, bool> filter = null) where T : Element;
        public static IEnumerable<T> ElementChildren<T>(this IEnumerable<Element> source, XName name) where T : Element;
        public static IEnumerable<T> ElementDescendants<T>(this IEnumerable<Element> source, XName name) where T : Element;
        public static IEnumerable<T> SelfAndElementDescendants<T>(this IEnumerable<Element> source, XName name) where T : Element;
        //...
    }
}
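I very much agree with the saying that writing code is expressing yourself and reading code is understanding others. I will not claim to be easy to understand, only that my thinking is not too muddled. Skim the code above to get a feel for it.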
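SDOM is also called the MX runtime. It lives in Metah.X.cs, and that single file is all there is. SDOM is itself a DOM, and a DOM is a tree data structure made of nodes. Object (in this article Object means Metah.X.Object by default; System.Object is referred to by the C# keyword object) is the base class of all nodes. Object.Parent returns the parent node; every node in the tree except the root has a non-null Parent. As readers who have implemented a tree data structure may know, such a structure usually needs automatic deep cloning. Deep cloning is done by Object.DeepClone(): when a node whose Parent is not null is added to another node, it is deep-cloned and the clone is added instead. This happens automatically, so users normally do not need to call DeepClone() by hand.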
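The clone-on-add rule is easy to picture with a toy tree. The sketch below is not the SDOM implementation; TreeNode and its members are hypothetical and only illustrate the rule that a node which already has a parent is cloned before being attached somewhere else.

```csharp
using System;
using System.Collections.Generic;

public class TreeNode {
    private readonly List<TreeNode> _children = new List<TreeNode>();
    public string Name { get; private set; }
    public TreeNode Parent { get; private set; }
    public IReadOnlyList<TreeNode> Children { get { return _children; } }

    public TreeNode(string name) { Name = name; }

    // Returns a parentless copy of this node and of everything below it.
    public TreeNode DeepClone() {
        var clone = new TreeNode(Name);
        foreach (var child in _children) clone.Add(child.DeepClone());
        return clone;
    }

    // Attaches a child and returns the node that was actually attached.
    // A node that already belongs to a tree is deep-cloned first, so a node
    // never ends up with two parents.
    public TreeNode Add(TreeNode child) {
        if (child.Parent != null) child = child.DeepClone();
        child.Parent = this;
        _children.Add(child);
        return child;
    }
}

public static class DeepCloneDemo {
    public static void Main() {
        var phone = new TreeNode("Phone");
        var customerA = new TreeNode("CustomerA");
        var customerB = new TreeNode("CustomerB");
        var inA = customerA.Add(phone); // phone had no parent: attached as-is
        var inB = customerB.Add(phone); // phone now belongs to CustomerA: a clone is attached
        Console.WriteLine(ReferenceEquals(inA, phone)); // True
        Console.WriteLine(ReferenceEquals(inB, phone)); // False
        Console.WriteLine(phone.Parent.Name);           // CustomerA
    }
}
```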
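Object.TryValidate() is one of the most important methods; it checks whether a node is valid. When TryValidate() is called on a node, the node validates itself and also calls TryValidate() on its child nodes. A piece of validation result information is called a diagnostic, and each node adds its diagnostics to the Context.Diagnostics list. TryValidate() returns false if any error diagnostic exists, and true otherwise.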
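The control flow just described (validate self, recurse into children, collect diagnostics in a shared context, fail only on errors) can be sketched as follows. This is a toy model under my own assumptions, not SDOM code: ValidatingNode, ValidationContext, and the two validation rules are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Severity { Error, Warning, Info }

public sealed class Diagnostic {
    public Severity Severity { get; private set; }
    public string Message { get; private set; }
    public Diagnostic(Severity severity, string message) { Severity = severity; Message = message; }
}

public sealed class ValidationContext {
    public List<Diagnostic> Diagnostics { get; private set; }
    public ValidationContext() { Diagnostics = new List<Diagnostic>(); }
}

public class ValidatingNode {
    public string Name { get; private set; }
    public string Text { get; set; }
    public List<ValidatingNode> Children { get; private set; }

    public ValidatingNode(string name, string text = null) {
        Name = name;
        Text = text;
        Children = new List<ValidatingNode>();
    }

    // Validates this node and then every child. There is no short-circuiting,
    // so one call collects all diagnostics of the subtree.
    // Returns false only if this call added at least one error diagnostic.
    public bool TryValidate(ValidationContext context) {
        int start = context.Diagnostics.Count;
        ValidateSelf(context);
        foreach (var child in Children) child.TryValidate(context);
        return !context.Diagnostics.Skip(start).Any(d => d.Severity == Severity.Error);
    }

    // Toy rules: a leaf must have text (error); text with leading or
    // trailing whitespace is merely suspicious (warning).
    protected virtual void ValidateSelf(ValidationContext context) {
        if (Children.Count > 0) return;
        if (string.IsNullOrEmpty(Text))
            context.Diagnostics.Add(new Diagnostic(Severity.Error, Name + ": a value is required"));
        else if (Text != Text.Trim())
            context.Diagnostics.Add(new Diagnostic(Severity.Warning, Name + ": surrounding whitespace"));
    }
}

public static class ValidateDemo {
    public static ValidatingNode BuildSample() {
        var customer = new ValidatingNode("Customer");
        customer.Children.Add(new ValidatingNode("Name", "Tank"));
        customer.Children.Add(new ValidatingNode("Email"));          // missing value: error
        customer.Children.Add(new ValidatingNode("Phone", " 123 ")); // warning only
        return customer;
    }

    public static void Main() {
        var context = new ValidationContext();
        Console.WriteLine("valid: " + BuildSample().TryValidate(context));
        foreach (var d in context.Diagnostics) Console.WriteLine(d.Severity + " " + d.Message);
    }
}
```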
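Type (in this article Type means Metah.X.Type by default; System.Type will be named explicitly) is the base class of SimpleType and ComplexType. A simple type is the type of data that can be represented by characters alone; it is used for attribute values and for the text-only children of elements. A complex type can be used only for elements. ComplexType.AttributeSet holds zero or more attributes. At most one of ComplexType.SimpleChild and ComplexType.ComplexChild can be assigned: if SimpleChild is assigned, the element's children are text-only; if ComplexChild is assigned, the element's children are element-only or mixed; if neither is assigned, the element's children are empty.
Attributes and elements are collectively called entities, and both implement IEntityObject. An entity is essentially a name-value pair. The name is the full name, represented by System.Xml.Linq.XName, and the value is represented by a Type. An attribute's value can only be a SimpleType, whereas an element's value can be either a ComplexType or a SimpleType; in the latter case the element cannot have attributes and its children are text-only.
Types are abstract and entities are concrete; XML Schema separates the abstract from the concrete. That is all you need to know for now; the details come later.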

Metaprogramming

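Recall FirstLook.mxcs from earlier. What does the MX compiler do when it compiles that file? It checks the syntax and analyzes the semantics, of course, but what then? It generates C# code that uses SDOM. Welcome to the world of metaprogramming!
Take the String40 type. It restricts the system String type so that the length of the value must be between 1 and 40 characters. The compiler generates this C# code for it: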
namespace Example.ProjectA {
    public partial class String40 : global::Metah.X.String {
        new public static readonly global::Metah.X.AtomicSimpleTypeInfo ThisInfo = ...;
        public override global::Metah.X.ObjectInfo ObjectInfo { get { return ThisInfo; } }
        //...
    }
}
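What information does Object.TryValidate() actually validate against? Metadata. Metadata carries the semantic information, and the names of metadata classes end with Info. TryValidate() obtains the concrete metadata through the ObjectInfo virtual property and validates against it. In the example above, ThisInfo holds metadata such as the length range of 1 to 40.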
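The Email type restricts String40. Its metadata contains not only the information from the parent class's metadata but also the regular expression: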
namespace Example.ProjectA {
    public partial class Email : String40 {
        new public static readonly global::Metah.X.AtomicSimpleTypeInfo ThisInfo = ...;
        public override global::Metah.X.ObjectInfo ObjectInfo { get { return ThisInfo; } }
        //...
    }
}
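Since the generated listings elide the metadata itself (the "..." after ThisInfo), here is one way such metadata-driven validation could look. It is a sketch under my own assumptions and not the real AtomicSimpleTypeInfo: StringFacetInfo and TypedString are hypothetical, and the String40 and Email classes below are simplified stand-ins that only borrow the names. What it shows is the chain: each type has a static info object that links to its base type's info, and validation walks that chain.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical metadata for a restricted string type.
public sealed class StringFacetInfo {
    public StringFacetInfo BaseInfo { get; private set; }
    public int? MinLength { get; private set; }
    public int? MaxLength { get; private set; }
    public string Pattern { get; private set; }

    public StringFacetInfo(StringFacetInfo baseInfo, int? minLength = null,
                           int? maxLength = null, string pattern = null) {
        BaseInfo = baseInfo;
        MinLength = minLength;
        MaxLength = maxLength;
        Pattern = pattern;
    }

    // A value is valid if it satisfies the base type's facets and this type's own.
    public bool IsValid(string value) {
        if (value == null) return false;
        if (BaseInfo != null && !BaseInfo.IsValid(value)) return false;
        if (MinLength.HasValue && value.Length < MinLength.Value) return false;
        if (MaxLength.HasValue && value.Length > MaxLength.Value) return false;
        // An XSD pattern must match the whole value, so anchor the .NET regex.
        if (Pattern != null && !Regex.IsMatch(value, @"\A(?:" + Pattern + @")\z")) return false;
        return true;
    }
}

public class TypedString {
    public static readonly StringFacetInfo ThisInfo = new StringFacetInfo(null);
    // The virtual property lets the base-class TryValidate() find the
    // metadata of the most derived type.
    public virtual StringFacetInfo Info { get { return ThisInfo; } }
    public string Value { get; set; }
    public bool TryValidate() { return Info.IsValid(Value); }
}

public class String40 : TypedString {
    new public static readonly StringFacetInfo ThisInfo =
        new StringFacetInfo(TypedString.ThisInfo, 1, 40);
    public override StringFacetInfo Info { get { return ThisInfo; } }
}

public class Email : String40 {
    new public static readonly StringFacetInfo ThisInfo =
        new StringFacetInfo(String40.ThisInfo, pattern: @"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}");
    public override StringFacetInfo Info { get { return ThisInfo; } }
}

public static class FacetDemo {
    public static void Main() {
        Console.WriteLine(new Email { Value = "tank@example.com" }.TryValidate());
        Console.WriteLine(new Email { Value = "not an email" }.TryValidate());
        // Matches the pattern but is longer than 40, so the inherited facet rejects it.
        Console.WriteLine(new Email { Value = new string('a', 40) + "@b.com" }.TryValidate());
    }
}
```

One simplification to be aware of: string.Length counts UTF-16 code units, which differs from the character count XML Schema uses for values outside the Basic Multilingual Plane.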
The PhoneType type restricts String and specifies that its value must be one of Unknown, Work, or Home. The compiler generates code like this:
namespace Example.ProjectA {
    public partial class PhoneType : global::Metah.X.String {
        public static readonly string @Unknown = @"Unknown";
        public static readonly string @Work = @"Work";
        public static readonly string @Home = @"Home";
        //...
    }
}
The Int32List type is a list of Int32. The compiler generates code like this:
namespace Example.ProjectA {
    public partial class Int32List : global::Metah.X.ListedSimpleType<int?> {
        //...
    }
}
All four types above are simple types: the first three are atomic simple types and the last is a listed simple type.
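In XML Schema, the textual form of a list simple type is its items separated by XML whitespace, so an OrderIds attribute of type Int32List would be written like OrderIds="1 2 3". The sketch below shows that round trip with System.Xml.XmlConvert. It is independent of SDOM, and Int32ListDemo, Parse, and Format are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class Int32ListDemo {
    static readonly char[] XmlWhitespace = { ' ', '\t', '\r', '\n' };

    // Splits the text on XML whitespace and converts each item.
    // XmlConvert.ToInt32 throws FormatException or OverflowException for bad items.
    public static List<int> Parse(string text) {
        return text.Split(XmlWhitespace, StringSplitOptions.RemoveEmptyEntries)
                   .Select(item => XmlConvert.ToInt32(item))
                   .ToList();
    }

    // Writes the items back, separated by single spaces.
    public static string Format(IEnumerable<int> items) {
        return string.Join(" ", items.Select(i => XmlConvert.ToString(i)));
    }

    public static void Main() {
        var ids = Parse(" 1  2\n3 ");
        Console.WriteLine(ids.Count);   // 3
        Console.WriteLine(Format(ids)); // 1 2 3
    }
}
```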
Now consider the Phone type. It extends the String20 simple type by adding an optional attribute, PhoneType, which makes it a simple child complex type. The XML data has this shape:
<ElementName PhoneType="Work">1234567-789</ElementName>
The compiler generates code like this:
namespace Example.ProjectA {
    public partial class Phone : global::Metah.X.ComplexType {
        public partial class AttributeSetClass : global::Metah.X.AttributeSet {
            public partial class PhoneType_Class : global::Metah.X.Attribute {
                new public global::Example.ProjectA.PhoneType Type { get; set; }
                new public global::Example.ProjectA.PhoneType EnsureType();
                new public string Value { get; set; }
                //...
            }
            public global::Example.ProjectA.Phone.AttributeSetClass.PhoneType_Class PhoneType { get; set; }
            public global::Example.ProjectA.Phone.AttributeSetClass.PhoneType_Class Ensure_PhoneType(bool @try = false);
            public string PhoneType_Value { get; set; }
            //...
        }
        new public global::Example.ProjectA.Phone.AttributeSetClass AttributeSet { get; set; }
        public global::Example.ProjectA.Phone.AttributeSetClass EnsureAttributeSet();
        new public global::Example.ProjectA.String20 SimpleChild { get; set; }
        public global::Example.ProjectA.String20 EnsureSimpleChild();
        new public string Value { get; set; }
        //...
    }
}
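As you can see, SDOM is general, and the code the compiler generates specializes it. In the attribute set, the compiler generates for each attribute a property named after its local name and a property named <LocalName>_Value. As for EnsureXXX(): if the XXX property is not null, it returns the property's value; otherwise it creates a new object, assigns it to the property, and returns it. You can assign to a Value property directly, and that property will call EnsureXXX() automatically.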
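The EnsureXXX() and <LocalName>_Value conventions amount to lazy creation. The following sketch imitates them with plain classes; it is not the generated code. PhoneTypeAttr, PhoneAttrs, and PhoneSketch are hypothetical stand-ins, and the getter behavior (returning null when the attribute object does not exist) is my assumption.

```csharp
using System;

public class PhoneTypeAttr {
    public string Value { get; set; }
}

public class PhoneAttrs {
    public PhoneTypeAttr PhoneType { get; set; }

    // Returns the existing attribute object, creating it on first use.
    public PhoneTypeAttr Ensure_PhoneType() {
        if (PhoneType == null) PhoneType = new PhoneTypeAttr();
        return PhoneType;
    }

    // Shortcut to the attribute's value.
    // The setter creates the attribute object on demand through Ensure_PhoneType().
    public string PhoneType_Value {
        get { return PhoneType == null ? null : PhoneType.Value; }
        set { Ensure_PhoneType().Value = value; }
    }
}

public class PhoneSketch {
    public PhoneAttrs AttributeSet { get; set; }

    public PhoneAttrs EnsureAttributeSet() {
        if (AttributeSet == null) AttributeSet = new PhoneAttrs();
        return AttributeSet;
    }
}

public static class EnsureDemo {
    public static void Main() {
        var phone = new PhoneSketch();
        Console.WriteLine(phone.AttributeSet == null);             // True: nothing created yet
        phone.EnsureAttributeSet().PhoneType_Value = "Work";       // creates both objects
        Console.WriteLine(phone.AttributeSet.PhoneType.Value);     // Work
        // Calling Ensure again returns the same object instead of replacing it.
        Console.WriteLine(ReferenceEquals(phone.EnsureAttributeSet(), phone.AttributeSet)); // True
    }
}
```

The benefit of the convention is that call sites can chain through several levels of the tree without null checks, which is what the constructors in Program.cs rely on.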
The Phones type is a complex child complex type with no attributes. The XML data has this shape:
<ElementName>
    <Phone PhoneType="Work">1234567-789</Phone>
    <Phone>7654321</Phone>
</ElementName>
The compiler generates code like this:
namespace Example.ProjectA {
    public partial class Phones : global::Metah.X.ComplexType {
        public partial class ComplexChildClass : global::Metah.X.ChildContainer {
            public partial class Phones_Class : global::Metah.X.ChildList<global::Example.ProjectA.Phones.ComplexChildClass.Phones_Class.ItemClass> {
                public partial class ItemClass : global::Metah.X.Element {
                    new public global::Example.ProjectA.Phone Type { get; set; }
                    new public global::Example.ProjectA.Phone EnsureType(bool @try = false);
                    new public global::Example.ProjectA.Phone.AttributeSetClass AttributeSet { get; set; }
                    new public global::Example.ProjectA.Phone.AttributeSetClass EnsureAttributeSet(bool @try = false);
                    new public global::Example.ProjectA.String20 SimpleChild { get; set; }
                    new public global::Example.ProjectA.String20 EnsureSimpleChild(bool @try = false);
                    new public string Value { get; set; }
                    //...
                }
                public global::Example.ProjectA.Phones.ComplexChildClass.Phones_Class.ItemClass CreateItem();
                public global::Example.ProjectA.Phones.ComplexChildClass.Phones_Class.ItemClass CreateAndAddItem();
                //...
            }
            public global::Example.ProjectA.Phones.ComplexChildClass.Phones_Class Phones { get; set; }
            public global::Example.ProjectA.Phones.ComplexChildClass.Phones_Class Ensure_Phones(bool @try = false);
            //...
        }
        new public global::Example.ProjectA.Phones.ComplexChildClass ComplexChild { get; set; }
        public global::Example.ProjectA.Phones.ComplexChildClass EnsureComplexChild();
        //...
    }
}
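Rather mechanical, dull code. One point is worth mentioning: because Phones may contain multiple Phone elements, the compiler generates a child list.
The Customer type is a complex child complex type with both attributes and child elements, and it is handled in the same way.
For the Customer element, the compiler generates this code: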
namespace Example.ProjectA {
    public partial class Customer_ElementClass : global::Metah.X.Element {
        new public global::Example.ProjectA.Customer Type { get; set; }
        new public global::Example.ProjectA.Customer EnsureType(bool @try = false);
        new public global::Example.ProjectA.Customer.AttributeSetClass AttributeSet { get; set; }
        new public global::Example.ProjectA.Customer.AttributeSetClass EnsureAttributeSet(bool @try = false);
        new public global::Example.ProjectA.Customer.ComplexChildClass ComplexChild { get; set; }
        new public global::Example.ProjectA.Customer.ComplexChildClass EnsureComplexChild(bool @try = false);
        public static bool TryLoadAndValidate(global::System.Xml.XmlReader reader, global::Metah.X.Context context, out global::Example.ProjectA.Customer_ElementClass result);
        //...
    }
}
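An element defined directly in a namespace is a global element. To avoid name clashes, the generated class name gets the _ElementClass suffix, and the compiler also generates a static TryLoadAndValidate() method for every global element. TryLoadAndValidate() and Metah.X.Element.Save() form a pair whose meaning is self-evident.
Do not be intimidated by the generated code; using it is simple and straightforward: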
//Program.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;//for XmlReader & XmlWriter
using System.Xml.Linq;//for XName & XNamespace
using X = Metah.X;
using Metah.X.Extensions;

namespace Example.ProjectA {
    partial class Phone {
        private Phone() { }
        public Phone(string value, string phoneType = null) {
            Value = value;
            if (phoneType != null) EnsureAttributeSet().PhoneType_Value = phoneType;
        }
    }
    partial class Phones {
        private Phones() { }
        public Phones(params Phone[] phones) {
            var phoneList = EnsureComplexChild().Ensure_Phones();
            foreach (var phone in phones)
                phoneList.CreateAndAddItem().Type = phone;
        }
    }
    partial class Address {
        private Address() { }
        public Address(string country, string state, string city, string address, string zipCode) {
            var attset = EnsureComplexChild().Ensure_Choice().Ensure_Normal().EnsureAttributeSet();
            attset.Country_Value = country;
            if (state != null) attset.State_Value = state;
            attset.City_Value = city;
            attset.Address_Value = address;
            attset.ZipCode_Value = zipCode;
        }
        public Address(decimal longitude, decimal latitude) {
            var attset = EnsureComplexChild().Ensure_Choice().Ensure_Geography().EnsureAttributeSet();
            attset.Longitude_Value = longitude;
            attset.Latitude_Value = latitude;
        }
        public override string ToString() {
            var normal = ComplexChild.Choice.Normal;
            if (normal != null) {
                var nattset = normal.AttributeSet;
                return nattset.Country_Value + ", " + nattset.City_Value + ", " + nattset.Address_Value;
            }
            var gattset = ComplexChild.Choice.Geography.AttributeSet;
            return "(" + gattset.Longitude_Value + ", " + gattset.Latitude_Value + ")";
        }
    }
    partial class Customer {
        private Customer() { }
        public Customer(string name, string email, Phones phones, Address address) {
            var attset = EnsureAttributeSet();
            attset.Name_Value = name;
            attset.Email_Value = email;
            attset.RegistrationDate_Value = DateTime.Now;
            var cc = EnsureComplexChild();
            cc.Ensure_Phones().Type = phones;
            cc.Ensure_Address().Type = address;
        }
        public override string ToString() {
            var attset = AttributeSet;
            var cc = ComplexChild;
            return string.Format("Name: {0}, Email: {1}, Address: {2}", attset.Name_Value, attset.Email_Value, cc.Address.Type);
        }
    }
    class Program {
        static void Main(string[] args) {
            var customer = new Customer("Tank", "someone@example.com",
                new Phones(new Phone("1234567", PhoneType.Home), new Phone("7654321")),
                new Address("China", "Sichuan", "Suining", "somewhere", "629000"));
            var ctx = new X.Context();
            if (!customer.TryValidate(ctx)) {
                Display(ctx);
                return;
            }
            Console.WriteLine(customer);
            customer.EnsureComplexChild().Ensure_Address().Type = new Address(105.123M, 30.345M);
            Console.WriteLine(customer);
            var customerElement = new Customer_ElementClass { Type = customer };
            using (var writer = XmlWriter.Create(@"d:\customer.xml", new XmlWriterSettings { Indent = true }))
                customerElement.Save(writer);
            ctx.Reset();
            using (var reader = XmlReader.Create(@"d:\customer.xml")) {
                Customer_ElementClass customerElement2;
                if (!Customer_ElementClass.TryLoadAndValidate(reader, ctx, out customerElement2)) {
                    Display(ctx);
                    return;
                }
                Console.WriteLine(customerElement2.Type);
            }
        }
        static void Display(X.Context ctx) {
            foreach (var diag in ctx.Diagnostics)
                Console.WriteLine(diag);
        }
    }
}
Below is the content of d:\customer.xml:
<?xml version="1.0" encoding="utf-8"?>
<e0:Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="e0:Customer" Name="Tank" Email="someone@example.com" RegistrationDate="2014-03-03T04:00:10.8105625Z" xmlns:e0="http://schemas.example.com/projecta">
  <e0:Phones xsi:type="e0:Phones">
    <e0:Phone xsi:type="e0:Phone" PhoneType="Home">1234567</e0:Phone>
    <e0:Phone xsi:type="e0:Phone" PhoneType="Unknown">7654321</e0:Phone>
  </e0:Phones>
  <e0:Address xsi:type="e0:Address">
    <e0:Geography xsi:type="e0:GeographyAddress" Longitude="105.123" Latitude="30.345" />
  </e0:Address>
</e0:Customer>
Set a breakpoint at ctx.Reset();. When execution stops there, edit customer.xml and remove the @ sign from the email value, then continue. The program reports this error:
file:///d:/customer.xml(2, 147): Error 17: Canonical string 'someoneexample.com' not match with pattern '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}'

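Thanks to C#'s partial modifier, generated code and handwritten code are combined. As you can see, the generated code makes heavy use of C# nested classes; in practice nesting easily goes five or six levels deep or more, and writing supplementary code for such nested classes in a .cs file becomes extremely tedious. Since we are designing a language and implementing a compiler, we should hold ourselves to a high standard. We can do better.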
//SecondLook.mxcs
alias "http://schemas.example.com/projectb" as nsb;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;//for XmlReader & XmlWriter
using System.Xml.Linq;//for XName & XNamespace
using X = Metah.X;
using Metah.X.Extensions;

xnamespace {nsb} [namespace: Example.ProjectB] {
    type String10 restrict String
        facets{
            lengthrange: 1..10;
        };
    ;
    type String20 restrict String
        facets{
            lengthrange: 1..20;
        };
    ;
    type String40 restrict String
        facets{
            lengthrange: 1..40;
        };
    ;
    type Int32List list Int32;
    type Email restrict String40
        facets{
            patterns: @"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}";
        };
    ;
    type Phone extend String20
        attributes{
            attribute PhoneType[?; default: "Unknown"] as PhoneType;
        };
        ##{
            private Phone() { }
            public Phone(string value, string phoneType = null) {
                Value = value;
                if (phoneType != null) EnsureAttributeSet().PhoneType_Value = phoneType;
            }
        }
    ;
    type PhoneType restrict String
        facets{
            enums: Unknown = "Unknown", Work = "Work", Home = "Home"
        };
    ;
    type Phones
        children{
            element Phone[+; membername: Phones] as Phone;
        };
        ##{
            private Phones() { }
            public Phones(params Phone[] phones) {
                var phoneList = EnsureComplexChild().Ensure_Phones();
                foreach (var phone in phones)
                    phoneList.CreateAndAddItem().Type = phone;
            }
        }
    ;
    type Address
        children{
            choice{
                element Normal as NormalAddress;
                element Geography as GeographyAddress;
            }
            ##{
                protected override bool TryValidating(X.Context context, bool fromValidate){
                    var success = base.TryValidating(context, fromValidate);
                    if(success){
                        var myCtx = (MyContext)context;
                        if(myCtx.SomeCondition && Geography != null){
                            myCtx.Diagnostics.Add(new X.Diagnostic(this, null, X.DiagnosticSeverity.Error,
                                (int)MyDiagnosticCode.GeographyAddressNotAllowed, "Geography address not allowed, use normal address instead"));
                            success = false;
                        }
                    }
                    return success;
                }
            }
            ;
        };
        ##{
            private Address() { }
            public Address(string country, string state, string city, string address, string zipCode) {
                var attset = EnsureComplexChild().Ensure_Choice().Ensure_Normal().EnsureAttributeSet();
                attset.Country_Value = country;
                if (state != null) attset.State_Value = state;
                attset.City_Value = city;
                attset.Address_Value = address;
                attset.ZipCode_Value = zipCode;
            }
            public Address(decimal longitude, decimal latitude) {
                var attset = EnsureComplexChild().Ensure_Choice().Ensure_Geography().EnsureAttributeSet();
                attset.Longitude_Value = longitude;
                attset.Latitude_Value = latitude;
            }
            public override string ToString() {
                var normal = ComplexChild.Choice.Normal;
                if (normal != null) {
                    var nattset = normal.AttributeSet;
                    return nattset.Country_Value + ", " + nattset.City_Value + ", " + nattset.Address_Value;
                }
                var gattset = ComplexChild.Choice.Geography.AttributeSet;
                return "(" + gattset.Longitude_Value + ", " + gattset.Latitude_Value + ")";
            }
        }
    ;
    type NormalAddress
        attributes{
            attribute Country as String20;
            attribute State[?] as String20;
            attribute City as String20;
            attribute Address as String40;
            attribute ZipCode as String10;
        };
    ;
    type GeographyAddress
        attributes{
            attribute Longitude as SpatialNumber;
            attribute Latitude as SpatialNumber;
        };
    ;
    type SpatialNumber restrict Decimal
        facets{
            digits: 8..5;
        };
    ;
    type Customer
        attributes{
            attribute Id[?] as Int32;
            attribute Name as String10;
            attribute Email as Email;
            attribute RegistrationDate[?] as DateTime;
            attribute OrderIds[?] as Int32List;
        };
        children{
            element Phones as Phones;
            element Address as Address;
        };
        ##{
            private Customer() { }
            public Customer(string name, string email, Phones phones, Address address) {
                var attset = EnsureAttributeSet();
                attset.Name_Value = name;
                attset.Email_Value = email;
                attset.RegistrationDate_Value = DateTime.Now;
                var cc = EnsureComplexChild();
                cc.Ensure_Phones().Type = phones;
                cc.Ensure_Address().Type = address;
            }
            public override string ToString() {
                var attset = AttributeSet;
                var cc = ComplexChild;
                return string.Format("Name: {0}, Email: {1}, Address: {2}", attset.Name_Value, attset.Email_Value, cc.Address.Type);
            }
        }
    ;
    element Customer as Customer;
    //
    //
    public class MyContext : X.Context{
        public bool SomeCondition{get; set;}
    }
    public enum MyDiagnosticCode {
        GeographyAddressNotAllowed = X.DiagnosticCode.Extended,
    }
    public static class Test{
        public static void Run(){
            var customer = new Customer("Tank", "someone@example.com",
                new Phones(new Phone("1234567", PhoneType.Home), new Phone("7654321")),
                new Address(105.123M, 30.345M));
            var ctx = new MyContext{SomeCondition = true};
            if (!customer.TryValidate(ctx)) {
                Display(ctx);
                return;
            }
            Console.WriteLine(customer);
        }
        static void Display(X.Context ctx) {
            foreach (var diag in ctx.Diagnostics)
                Console.WriteLine(diag);
        }
    }
}
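A superficial syntactic trick: MX syntax is a superset of C# syntax, and the compiler merges user code (introduced by ##) with the generated code into the final class. The example above overrides Object.TryValidating() to perform custom pre-validation; you can also override Object.TryValidated() for custom post-validation. Custom diagnostic code values must be greater than or equal to X.DiagnosticCode.Extended (1000).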

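Interoperability

This is a colorful world, and "colorful" is the poetic way of saying "heterogeneous". The biggest problem heterogeneity brings is interoperability, and XML and XSD, as platform-independent standards, serve as the solution. As mentioned earlier, MX is merely another way to express the semantics of XML Schema 1.0, so it should be a platform-independent format too. For programming convenience, however, a programming language (C# here) is embedded directly into it, which costs it that platform independence. The problem is solved with two kinds of files, schema-only and schema+code. A schema-only file contains only schema code; it is platform-independent and its file extension is .mx. MX currently exists only for C#; if a Java version or versions for other languages appear, they must all follow one common schema-only syntax and semantics. In a schema+code file, each version mixes its own programming language with the schema. The extension of C# schema+code files is .mxcs; for a Java version it would (presumably) be .mxj.
Consider this example: to implement a web service in the broad sense (a Web API), the developers first write the schema-only ThirdLook.mx: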
//ThirdLook.mx
alias "http://schemas.example.com/projectc" as nsc;

xnamespace {nsc} {//note: no C# namespace annotation
    //same content as FirstLook.mxcs
}
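ThirdLook.mx contains nothing specific to any language; think of it as a variant of XSD. It describes the rules and meaning that XML data exchanged with the service must follow. ThirdLook.mx is then published in some way, for example in an SDK or on the web. The service is developed in C# (OK, because MX currently exists only for C#), and its application logic is implemented in a schema+code file: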
//ThirdLook.mxcs
alias "http://schemas.example.com/projectc" as nsc;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;//for XmlReader & XmlWriter
using System.Xml.Linq;//for XName & XNamespace
using X = Metah.X;
using Metah.X.Extensions;

xnamespace {nsc} [namespace: Example.ProjectC] {
    type Phone
        ##{
            private Phone() { }
            public Phone(string value, string phoneType = null) {
                Value = value;
                if (phoneType != null) EnsureAttributeSet().PhoneType_Value = phoneType;
            }
        }
    ;
    type Address
        children{
            choice { }
            ##{
                protected override bool TryValidating(X.Context context, bool fromValidate){
                    //...
                    return base.TryValidating(context, fromValidate);
                }
            }
            ;
        };
        ##{
            private Address() { }
            public Address(string country, string state, string city, string address, string zipCode) {
                //...
            }
            public Address(decimal longitude, decimal latitude) {
                //...
            }
            public override string ToString() {
                //...
                return base.ToString();
            }
        }
    ;
    //...
}
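ThirdLook is the interesting one: ThirdLook.mx contains only the schema and ThirdLook.mxcs contains mostly code. Feed both to the C# MX compiler, and it first merges the multiple physical syntax objects into one logical syntax object, then performs semantic analysis and code generation. The result is equivalent to SecondLook.mxcs.
Suppose a client is developed in Java. Its developers first obtain ThirdLook.mx, then implement the client's application logic in ThirdLook.mxj, then feed both to the Java MX compiler... and so the story goes. Note that XML Schema/MX is concerned only with data; how the service is designed (REST style or SOAP style, for example) is outside the scope of XML Schema/MX. Perhaps one day there will be a Metah.S: A Service Metaprogramming Language.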

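Using MX

You need Visual Studio 2013 Professional or a higher edition. MX is such an imaginative language that it is well worth downloading the 90-day trial of Visual Studio 2013.
Implementing a language usually involves three levels:
  • A command-line compiler that can be invoked from the command line or from scripts
  • Integration with a build system (such as MSBuild)
  • Integration with an IDE (such as Visual Studio): syntax coloring, IntelliSense, debugging support, and so on
MX currently has no command-line compiler. Implementing one would not be hard, but I don't think it would be very useful. MX is integrated with MSBuild. Its Visual Studio integration is minimal so far: only rough syntax coloring, with no IntelliSense or debugging support. For convenience, MX is currently deployed through VSIX (a Visual Studio extension mechanism). One problem is that an unattended build server obviously won't have Visual Studio installed; future versions of MX will allow the MSBuild integration and the VS integration to be deployed separately.
Download Metah.0.5.vsix (for VS 2013) and double-click it to install; you may need to restart Visual Studio.
Open the New Project dialog; Metah.X appears under the Visual C# templates: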
NewProject.png
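There are only two templates, Console Application and Class Library; use them if they are what you need. An MX project is an extension of a C# project, and the following steps make any C# project support MX: right-click the project node in the Solution Explorer window and choose Unload Project, then right-click again and choose Edit .csproj, and add the following code at the end of the .csproj file: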
<PropertyGroup>
    <MetahXEmbedRuntime>true</MetahXEmbedRuntime>
</PropertyGroup>
<Import Project="$([System.IO.Directory]::GetFiles($([System.IO.Path]::Combine($([System.Environment]::GetFolderPath(SpecialFolder.LocalApplicationData)), `Microsoft\VisualStudio\12.0\Extensions`)), `Metah.X.targets`, System.IO.SearchOption.AllDirectories))" />
EditCSProj.png
Finally, Reload Project. Done.
Open the Add New Item dialog; under Metah.X there are two templates, Schema+Code and Schema-Only:
AddNewItem.png
Open SOURCE CODE\Examples\X\HelloMX\HelloMX.sln and Build Solution. Then select the project node in the Solution Explorer window and click the "Show All Files" button to see the following files:
ShowAllFiles.png
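  • Metah.X.cs: the MX runtime source file (SDOM); it is this single file
  • .mx.cs and .mxcs.cs: code files generated by the MX compiler
By default, the MX runtime is included as a source file and compiled together with the target project. Alternatively, you can reference the MX runtime as an assembly: the MX runtime assembly is C:\Users\<UserName>\AppData\Local\Microsoft\VisualStudio\12.0\Extensions\<RandomName>\Metah.X.dll. After adding the assembly reference, edit the .csproj file and set <MetahXEmbedRuntime>false</MetahXEmbedRuntime>.
SOURCE CODE\Examples\X\Example.EBusiness is an example close to a real-world application. It demonstrates most features of the MX language through the familiar customer-order-detail-product scenario, and it also simulates the interaction between a service and a client: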
EBusiness.png
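EBusiness.mx is the schema-only file shared by the service and the client; EBusinessService.mxcs and EBusinessClient.mxcs contain their respective code logic. Reading them is recommended.
If you like to dig deep, I suggest that after finishing this article you spend some time reading Metah.X.cs and the generated code. Before the source code, there are no secrets.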

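Meta-compilation

By modifying the .csproj file, the MX compiler is injected into the MSBuild "pipeline". I call MX's compilation process meta-compilation. The MX compiler takes the following inputs:
  • .mx (schema-only) files and .mxcs (schema+code) files
  • .cs files (all .cs files in the project are fed to the MX compiler)
  • Assembly references and preprocessor symbols
The MX compiler first parses the .mx and .mxcs files to build the syntax tree, then performs semantic analysis to build the semantic tree, then generates C# code (.mx.cs and .mxcs.cs), and then compiles the generated .cs files together with the input .cs files. This is not a full compilation: it produces no .dll or .exe and only analyzes the C# semantics in memory. Recall the EnsureXXX() methods mentioned earlier: if the XXX property is null, the MX runtime creates an object through reflection and assigns it to XXX. Likewise, the static TryLoadAndValidate() method reads data from an XmlReader and creates objects through reflection. This requires every class to have a parameterless constructor; otherwise an exception is thrown at run time. By checking the C# semantics, the compiler knows whether a class has a parameterless constructor and can report the error at compile time. In addition, .mxcs files contain C# code; if that code has semantic errors, they can be located directly in the .mxcs file rather than in the generated .mxcs.cs. If MX compilation completes successfully, the generated C# files and (optionally) Metah.X.cs become inputs to the regular compilation, which produces the .dll or .exe.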
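To make the parameterless-constructor requirement concrete, here is a minimal sketch of the run-time side of the story using plain .NET reflection. It is not the MX compiler's check (which, as described above, works on C# semantics at compile time) and not the MX runtime's code; the class names ConstructorCheck, WithPrivateCtor and WithoutParameterlessCtor are hypothetical and exist only for illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical sample classes standing in for generated/handwritten classes.
public class WithPrivateCtor { private WithPrivateCtor() { } }
public class WithoutParameterlessCtor { public WithoutParameterlessCtor(int value) { } }

public static class ConstructorCheck {
    // True if the type declares an instance constructor without parameters,
    // whether public or non-public.
    public static bool HasParameterlessConstructor(Type type) {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        return type.GetConstructor(flags, null, Type.EmptyTypes, null) != null;
    }

    // Creates an instance the way a reflection-based runtime could;
    // throws MissingMethodException when no parameterless constructor exists.
    public static object Create(Type type) {
        return Activator.CreateInstance(type, true); // true: allow non-public constructors
    }
}
```

A private parameterless constructor, as used in the partial classes earlier in this article, is enough for reflection-based creation; a class that only offers constructors with parameters fails at run time, which is the failure the compile-time check is meant to catch early.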

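MX in Detail

Schema+code and schema-only support the following lexical elements:
  • Single-line and multi-line comments: same as C#
  • Preprocessing directives: same as C#; #define, #undef, #if, #elif, #else, #endif, #region and #endregion are supported
  • Keywords: MX adds these reserved keywords: alias, attribute, attributes, choice, element, import, seq, type, unordered, xnamespace. An identifier that equals one of them must be prefixed with @
  • IdentifierToken: same as C#, in regular form a and verbatim form @a. (Currently only A-Za-z0-9_ is supported, that is, a letter or underscore followed by zero or more letters, digits, or underscores; Unicode will be supported later)
  • StringLiteralToken: same as C#, in regular form "abc" and verbatim form @"abc"
  • NumericLiteralToken: one or more digits (0-9)
  • Other conventional, self-explanatory lexical elements
The grammar of schema+code is a superset of the C# grammar: remove the MX grammar and you get the C# grammar. It is also a superset of the schema-only grammar: remove the C# grammar and you get the schema-only grammar. Only the schema+code grammar is listed below. Where the grammar speaks for itself, there may be no accompanying text; if in doubt, read the grammar closely, and if still in doubt, ask. Concepts in this article are MX concepts by default; C# concepts are identified explicitly.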

Compilation Unit

Grammar:
CompilationUnit: C#ExternAliasDirective* (UriAliasing | C#UsingDirective)* C#GlobalAttributeList* (Namespace | C#NamespaceDeclaration | C#TypeDeclaration)*
UriAliasing: 'alias' UriLiteral 'as' UriAlias ';'
UriLiteral: StringLiteralToken
UriAlias: IdentifierToken

Uri aliasing gives a URI an alias. URIs are long and may be used many times, so referring to them by alias is convenient and less error-prone. Example:
alias "http://schemas.example.com/projecta" as nsa;
alias "http://schemas.example.com/projectb" as nsb;
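The compiler strips XML whitespace chars (' ', '\n', '\r', '\t') from both ends of the URI literal; if the result is an empty string, it is the empty URI. It is strongly recommended that URI literals contain no XML whitespace chars; if they must, escape them with %. URI aliases must not be duplicated.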
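The trimming rule for URI literals can be sketched in a few lines of C#. This is only an illustration of the rule as stated above, not the compiler's code; the class name UriLiteralSketch and its members are hypothetical.

```csharp
using System;

public static class UriLiteralSketch {
    // The four XML whitespace chars named in the text.
    static readonly char[] XmlWhitespace = { ' ', '\n', '\r', '\t' };

    // Strips XML whitespace chars from both ends of a URI literal.
    public static string Normalize(string literal) {
        return literal.Trim(XmlWhitespace);
    }

    // A literal that is empty after trimming denotes the empty URI.
    public static bool IsEmptyUri(string literal) {
        return Normalize(literal).Length == 0;
    }
}
```

Only leading and trailing whitespace is affected; whitespace inside the literal is left alone, which is why the text recommends escaping it with %.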

Namespace

Grammar:
Namespace: 'xnamespace' UriOrAlias NamespaceAnnotations? '{' C#ExternAliasDirective* (NamespaceImport | C#UsingDirective)* NamespaceMember* '}' ';'?
UriOrAlias: '{' (UriLiteral | UriAlias)? '}'
NamespaceAnnotations: '[' (NamespaceAnnotation (';' NamespaceAnnotation)* ';'?)? ']'
NamespaceAnnotation: ElementQualification | AttributeQualification | DerivationProhibition | InstanceProhibition | CodeNamespace
ElementQualification: 'element' ':' Qualification
AttributeQualification: 'attribute' ':' Qualification
Qualification: 'qualified' | 'unqualified'
DerivationProhibition: 'derivationprohibition' ':' DerivationProhibitionItems
DerivationProhibitionItems: DerivationProhibitionItem (',' DerivationProhibitionItem)* ','?
DerivationProhibitionItem: 'none' | 'extend' | 'restrict' | 'list' | 'unite' | 'all'
InstanceProhibition: 'instanceprohibition' ':' InstanceProhibitionItems
InstanceProhibitionItems: InstanceProhibitionItem (',' InstanceProhibitionItem)* ','?
InstanceProhibitionItem: 'none' | 'extend' | 'restrict' | 'substitute' | 'all'
CodeNamespace: 'namespace' ':' C#NamespaceName
NamespaceImport: 'import' UriOrAlias ('as' NamespaceAlias)? ';'
NamespaceAlias: IdentifierToken
Name: IdentifierToken
QualifiableName: (NamespaceAlias ':')? Name
NamespaceMember: Type | GlobalAttribute | GlobalAttributeSet | GlobalElement | GlobalChildStruct | C#NamespaceMember

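A URI-or-alias is delimited by braces; its content can be a URI literal or a URI alias (resolving the alias yields the URI value), and if there is no content it is the empty URI. A namespace's URI is specified by a URI-or-alias and must not be empty. Annotations are delimited by square brackets and separated by semicolons; the last semicolon is optional. The code namespace annotation specifies the C# namespace name; the other annotations are covered later. A namespace import imports another namespace into this one. A namespace cannot be imported more than once; the namespace alias is optional, must be unique, and must not be "sys". Example: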
alias "http://schemas.example.com/projecta" as nsa;
alias "http://schemas.example.com/projectb" as nsb;
alias "http://schemas.example.com/projectc" as nsc;
xnamespace {nsa}[namespace: Example.ProjectA] {  }
xnamespace {nsb}[namespace: Example.ProjectB] {
    import {nsa};
}
xnamespace {nsc}[namespace: Example.ProjectC] {
    import {nsa} as pa;
    import {nsb} as pb;
}
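The system namespace is built in and contains the built-in types. Its URI is "http://www.w3.org/2001/XMLSchema", and it is always implicitly imported into every user-defined namespace with the alias "sys".

Global Objects and References

Types, attributes, attribute sets, elements and child structs are collectively called objects. An object declared directly in a namespace is a global object; otherwise it is a local object. A global object must be named and can be referenced by name; a local object cannot be referenced. The five kinds of global objects each have their own name space, so within one namespace a type, an attribute, an attribute set, an element and a child struct can all be named Foo.
To reference a global object in another namespace, first import that namespace, then reference the object with a qualifiable name. A qualifiable name has two forms, qualified and unqualified. Both specify the name of the referenced global object; the former also specifies the alias of the namespace that contains it. To resolve a qualified qualifiable name, the compiler finds the imported namespace by its alias and then looks in that namespace for a global object of the required kind with the given name. To resolve an unqualified qualifiable name, it first looks in the current namespace for a global object of the required kind with the given name; if one is found, resolution succeeds. Otherwise it looks in all imported namespaces: if exactly one is found, resolution succeeds; if more than one is found, the name is ambiguous. Example: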
alias "http://schemas.example.com/projecta" as nsa;
alias "http://schemas.example.com/projectb" as nsb;
xnamespace {nsa} {
    type T1 restrict Int32;//unqualified qualifiable name
    type T2 restrict sys:String;//qualified qualifiable name
}
xnamespace {nsb} {
    import {nsa} as pa;
    type T1 restrict pa:T1;
    type TB restrict T2;
    type TC restrict T1;
}
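In the example above, what follows the restrict keyword is the qualifiable name of the global type to be restricted. To resolve the qualifiable name "Int32", the compiler first looks for a global type named Int32 in the current namespace; there is none, so it searches all imported namespaces and finds exactly one, sys:Int32, and resolution succeeds. Likewise, the qualifiable name "T1" in {nsb}.TC resolves to {nsb}.T1, not pa:T1. In the next example the compiler cannot resolve "T1" in {nsc}.TA because two candidates are found, making the name ambiguous: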
alias "http://schemas.example.com/projecta" as nsa;
alias "http://schemas.example.com/projectb" as nsb;
alias "http://schemas.example.com/projectc" as nsc;
xnamespace {nsa} {
    type T1 ...;
}
xnamespace {nsb} {
    type T1 ...;
}
xnamespace {nsc} {
    import {nsa} as pa;
    import {nsb} as pb;
    type TA restrict T1;
}
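The lookup order for unqualified names described above (current namespace first, then all imports, with an error on multiple hits) can be sketched as follows. This is a simplified model that only tracks type names; SchemaNamespace and NameResolver are hypothetical names, and the real compiler's data structures are not shown in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of a namespace: a URI, its global type names, its imports.
public sealed class SchemaNamespace {
    public readonly string Uri;
    public readonly HashSet<string> TypeNames = new HashSet<string>();
    public readonly List<SchemaNamespace> Imports = new List<SchemaNamespace>();
    public SchemaNamespace(string uri, params string[] typeNames) {
        Uri = uri;
        foreach (var name in typeNames) TypeNames.Add(name);
    }
}

public static class NameResolver {
    // Resolves an unqualified type name and returns the URI of the namespace that owns it.
    public static string ResolveUnqualified(SchemaNamespace current, string name) {
        // Step 1: a declaration in the current namespace always wins.
        if (current.TypeNames.Contains(name)) return current.Uri;
        // Step 2: search every imported namespace; exactly one hit is required.
        string found = null;
        foreach (var import in current.Imports) {
            if (!import.TypeNames.Contains(name)) continue;
            if (found != null) throw new InvalidOperationException("Ambiguous name: " + name);
            found = import.Uri;
        }
        if (found == null) throw new KeyNotFoundException("Unknown name: " + name);
        return found;
    }
}
```

With namespaces modeled after the examples above, "T1" inside {nsb} resolves to {nsb} itself, "Int32" resolves to the system namespace, and "T1" inside {nsc} fails as ambiguous because both {nsa} and {nsb} declare it.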

Physical Objects, Logical Objects, and Object Merging

Consider the following code:
alias "http://schemas.example.com/projecta" as nsa;
xnamespace {nsa} {
    type T1 restrict Int32;
}
xnamespace {nsa} {
    type T2 restrict Int32;
}
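Two physical namespaces are declared here. Because they have the same URI, the compiler merges them into one logical namespace containing the two objects T1 and T2. In other words, namespaces are mergeable, just like C# namespaces. C# has another such feature: if the declarations of a class, struct, or interface carry the partial modifier, the C# compiler merges multiple physical declarations with the same name into one logical declaration.
MX objects are always mergeable. Merging works like this: first, namespaces with the same URI are merged; then global objects of the same kind (type with type, element with element) and the same name within the namespace are merged. The details of object merging are covered later. Example: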
//some.mx
alias "http://schemas.example.com/projecta" as nsa;
xnamespace {nsa}[derivationprohibition: all] {
    type T1 restrict Int32
        facets{ valuerange: [100..999] };
    ;
}
//some.mxcs
using System;
xnamespace { "http://schemas.example.com/projecta" }[namespace: Example.ProjectA] {
    type T1
        ##{
            public void SomeMethod(){ Console.WriteLine(Value); }
        }
    ;
}
For the example above, the compiler produces this merged logical result:
xnamespace { "http://schemas.example.com/projecta" }[namespace: Example.ProjectA; derivationprohibition: all] {
    type T1 restrict Int32
        facets{ valuerange: [100..999] };
        ##{
            public void SomeMethod(){ Console.WriteLine(Value); }
        }
    ;
}
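The essence of object merging is combination: you contribute information A, I contribute information B, and together we have contributed A and B; if we both contribute information C, the two values must be equal. The following code does not compile because the C# namespace names differ: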
xnamespace {nsa}[namespace: Example.ProjectA; derivationprohibition: all] { }
xnamespace {nsa}[namespace: Example.ProjectB; derivationprohibition: all] { }
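The merging rule for annotations (take the union, and require equal values where both sides speak) can be sketched with dictionaries. This is a hedged illustration of the rule only; AnnotationMerger is a hypothetical name, and modeling annotations as string pairs is an assumption made for brevity.

```csharp
using System;
using System.Collections.Generic;

public static class AnnotationMerger {
    // Merges two annotation sets. Keys present on one side only are copied;
    // keys present on both sides must carry equal values, otherwise merging fails.
    public static Dictionary<string, string> Merge(
        IDictionary<string, string> first, IDictionary<string, string> second) {
        var result = new Dictionary<string, string>(first);
        foreach (var pair in second) {
            string existing;
            if (result.TryGetValue(pair.Key, out existing)) {
                if (existing != pair.Value)
                    throw new InvalidOperationException(
                        "Conflicting values for '" + pair.Key + "': " + existing + " vs " + pair.Value);
            } else {
                result.Add(pair.Key, pair.Value);
            }
        }
        return result;
    }
}
```

Merging { derivationprohibition: all } with { namespace: Example.ProjectA } yields both annotations, while merging two sets whose namespace values differ throws, mirroring the compile error in the example above.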

Type

Grammar:
TypeOrRef: Type | QualifiableName
Type: 'type' Name? TypeAnnotations? (TypeList | TypeUnion | TypeDirectness | TypeExtension | TypeRestriction)? CodeBlock? ';' 
TypeAnnotations: '[' (TypeAnnotation (';' TypeAnnotation)* ';'?)? ']'
TypeAnnotation: Abstract | Mixed | DerivationProhibition | InstanceProhibition 
Abstract: 'abstract'
Mixed: 'mixed'
TypeList: 'list' TypeOrRef
TypeUnion: 'unite' '{' TypeUnionMember+ '}'
TypeUnionMember: 'member' Name ('as' TypeOrRef)? ';'
TypeDirectness: RootAttributeSet RootChildStruct? | RootChildStruct
TypeExtension: 'extend' TypeOrRef RootAttributeSet? RootChildStruct?
TypeRestriction: 'restrict' TypeOrRef RootAttributeSet? (FacetSet | RootChildStruct)?
CodeBlock: '##' ('as' IdentifierToken)? C#AttributeList* (':' C#InterfaceName (',' C#InterfaceName)*)? ('{' C#ClassMember* '}')?
Literal: StringLiteralToken | 'true' | 'false' | ('+' | '-')? C#NumericLiteralToken | UriOrAlias IdentifierToken?
FacetSet: 'facets' '{' Facets? '}' CodeBlock? ';'
Facets: Facet (';' Facet)* ';'?
Facet: LengthRangeFacet | DigitsFacet | ValueRangeFacet | EnumerationsFacet | PatternsFacet | WhitespaceFacet
FacetAnnotations: '[' ('fixed' ';'?)? ']'
LengthRangeFacet: 'lengthrange' ':' (NumericFacet '..' NumericFacet? | '..' NumericFacet)
DigitsFacet: 'digits' ':' (NumericFacet '..' NumericFacet? | '..' NumericFacet)
NumericFacet: NumericLiteralToken FacetAnnotations?
ValueRangeFacet: 'valuerange' ':' (LowerValueFacet '..' UpperValueFacet? | '..' UpperValueFacet)
LowerValueFacet: ('[' | '(') Literal FacetAnnotations?
UpperValueFacet: Literal FacetAnnotations? (']' | ')')
EnumerationsFacet: 'enums' ':' EnumerationsFacetItems
EnumerationsFacetItems: EnumerationsFacetItem (',' EnumerationsFacetItem)* ','?
EnumerationsFacetItem: (IdentifierToken '=')? Literal
PatternsFacet: 'patterns' ':' StringLiteralTokens
StringLiteralTokens: StringLiteralToken (',' StringLiteralToken)* ','?
WhitespaceFacet: 'whitespace' ':' ('preserve' | 'replace' | 'collapse') FacetAnnotations?

A type declared directly in a namespace is a global type; otherwise it is a local type. A global type must be named; a local type must not be named.
Type hierarchy:
Type(anyType)
  |-<ComplexType>
  |-SimpleType(anySimpleType)
     |-<UnitedSimpleType>
     |-<ListedSimpleType>
     |  |-IdRefs(IDREFS)
     |  |-NameTokens(NMTOKENS)
     |  |-Entities(ENTITIES)
     |-<AtomicSimpleType>
        |-String(string)
        |  |-NormalizedString(normalizedString)
        |    |-Token(token)
        |      |-Language(language)
        |      |-NameToken(NMTOKEN)
        |      |-Name
        |        |-NonColonizedName(NCName)
        |          |-Id(ID)
        |          |-IdRef(IDREF)
        |          |-Entity(ENTITY)
        |-Decimal(decimal)
        |  |-Integer(integer)
        |    |-NonPositiveInteger(nonPositiveInteger)
        |    |  |-NegativeInteger(negativeInteger)
        |    |-NonNegativeInteger(nonNegativeInteger)
        |    |  |-PositiveInteger(positiveInteger)
        |    |  |-UInt64(unsignedLong)
        |    |    |-UInt32(unsignedInt)
        |    |      |-UInt16(unsignedShort)
        |    |        |-Byte(unsignedByte)
        |    |-Int64(long)
        |      |-Int32(int)
        |        |-Int16(short)
        |          |-SByte(byte)
        |-Single(float)
        |-Double(double)
        |-Boolean(boolean)
        |-Uri(anyURI)
        |-FullName(QName)
        |-Base64Binary(base64Binary)
        |-HexBinary(hexBinary)
        |-TimeSpan(duration)
        |-DateTime(dateTime)
        |-Date(date)
        |-Time(time)
        |-YearMonth(gYearMonth)
        |-Year(gYear)
        |-MonthDay(gMonthDay)
        |-Month(gMonth)
        |-Day(gDay)

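In the diagram above, node names are type names. They are all global types in the system namespace. Nodes in angle brackets are abstract: they exist only to build the hierarchy and cannot be referenced. The names in parentheses are the corresponding XSD names.
Type is the root and represents arbitrary XML data. A simple type is a data type that can be expressed only as text; a complex type is a data type that can be expressed by an element. Semantically there are two kinds of derivation between types, restriction and extension. In derivation by restriction, the derived type narrows (or leaves unchanged) the value range of the base type or constrains the base type's members. Derivation by extension applies to user-defined complex types and works like inheritance in object-oriented languages: the derived type first inherits the members of the base type and then adds its own. All derivations in the diagram above are by restriction.
An atomic simple type is a simple type that cannot be split. Most of its derived types need no explanation; some exist only for compatibility with XML Schema, so ignore them if you don't need them.
A listed simple type is a list of a simple type. Its item type cannot be a listed simple type, that is, lists of lists are not allowed. Example: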
type Int32List list Int32;
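A united simple type consists of one or more member simple types. Member names must be unique, and members whose type is a reference must be declared before members whose type is a local type. During validation, the XML data is validated against the members in declaration order; validation stops at the first member that succeeds, and the resulting type of the united simple type is that member's type. If the XML data passes none of the member types, validation fails. Example: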
type Length unite {
    member Number as UInt64;
    member Max as
        type restrict String facets { enums: "Max" };;
    ;
};
In the example above, if the XML data is a number, the type of Length is UInt64; if the XML data is Max, the type of Length is the local type of the second member.
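The "first member that validates wins" rule can be sketched like this. The sketch models each member as a name plus a predicate; UnionMember and UnionValidation are hypothetical names, and the UInt64 check is a simplification (digits only) rather than the full XML Schema lexical rules.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical model of a union member: a name and a predicate that validates text.
public sealed class UnionMember {
    public readonly string Name;
    public readonly Func<string, bool> Accepts;
    public UnionMember(string name, Func<string, bool> accepts) { Name = name; Accepts = accepts; }
}

public static class UnionValidation {
    // Tries the members in declaration order; returns the name of the first one
    // that accepts the text, or null if none does (validation failure).
    public static string SelectMember(string text, IEnumerable<UnionMember> members) {
        foreach (var member in members)
            if (member.Accepts(text)) return member.Name;
        return null;
    }

    // Members mirroring the Length example: Number (UInt64) first, then Max (enum "Max").
    public static List<UnionMember> LengthMembers() {
        return new List<UnionMember> {
            new UnionMember("Number", IsUInt64),
            new UnionMember("Max", text => text == "Max"),
        };
    }

    // Simplified UInt64 check: decimal digits only, within the ulong range.
    static bool IsUInt64(string text) {
        ulong ignored;
        return ulong.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out ignored);
    }
}
```

Because members are tried in order, declaration order matters whenever two member types accept overlapping text.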
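Because a simple type can be expressed as text, every type other than String and its derived types has a text format. Floating-point types, date/time types and others may have several text formats, and XML Schema defines one canonical format among them. In the canonical format, XML whitespace chars before and after the text are removed. See the table below:
Simple type | Text format | Canonical text format
Decimal and derived types | conventional notation | leading + and leading zeros removed
Single and Double | conventional notation, plus NaN, INF, -INF | "0.0###############E0", plus NaN, INF, -INF
Boolean | true, false, 1, 0 | true, false
Uri | the string value | the string value
FullName | "{URI?}LocalName", e.g. "{http://example.com}T1", "{}T1" | same as the text format
Base64Binary | conventional notation | conventional notation
HexBinary | conventional notation | uppercase A-F
TimeSpan | "PnYnMnDTnHnMnS", where n is a number, Y/M/D are years, months and days, T is the date/time separator, and H/M/S are hours, minutes and seconds, e.g. "P1Y2M4DT6H37M54S" | same as the text format
DateTime | "yyyy-MM-ddTHH:mm:ss.FFFFFFF", "yyyy-MM-ddTHH:mm:ss.FFFFFFFZ", "yyyy-MM-ddTHH:mm:ss.FFFFFFFzzz" | the second format
Date | "yyyy-MM-dd", "yyyy-MM-ddZ", "yyyy-MM-ddzzz" | the second format
Time | "HH:mm:ss.FFFFFFF", "HH:mm:ss.FFFFFFFZ", "HH:mm:ss.FFFFFFFzzz" | the second format
YearMonth | "yyyy-MM", "yyyy-MMZ", "yyyy-MMzzz" | the second format
Year | "yyyy", "yyyyZ", "yyyyzzz" | the second format
MonthDay | "--MM-dd", "--MM-ddZ", "--MM-ddzzz" | the second format
Month | "--MM", "--MMZ", "--MMzzz" | the second format
Day | "---dd", "---ddZ", "---ddzzz" | the second format
ListedSimpleType | items separated by XML whitespace chars | items separated by ' '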
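As an illustration of what "canonical text" means in practice, here is a small sketch for three rows of the table: Boolean, HexBinary and listed simple types. It is written from the table alone and is not taken from Metah.X.cs; the class CanonicalText and its method names are hypothetical.

```csharp
using System;

public static class CanonicalText {
    static readonly char[] XmlWhitespace = { ' ', '\n', '\r', '\t' };

    // Boolean: "true", "false", "1", "0" are accepted; the canonical form is "true" or "false".
    public static string OfBoolean(string text) {
        switch (text.Trim(XmlWhitespace)) {
            case "true": case "1": return "true";
            case "false": case "0": return "false";
            default: throw new FormatException("Not a Boolean: " + text);
        }
    }

    // HexBinary: the canonical form uses uppercase A-F.
    public static string OfHexBinary(string text) {
        return text.Trim(XmlWhitespace).ToUpperInvariant();
    }

    // Listed simple type: items may be separated by any XML whitespace chars;
    // the canonical form separates them with a single ' '.
    public static string OfList(string text) {
        return string.Join(" ", text.Split(XmlWhitespace, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

The sketch does not validate the hex digits or the list items themselves; it only shows the normalization step from an accepted text format to the canonical one.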

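An MX literal can express a value of any simple type. Every type can use a string literal that follows the text format in the table above. Values of the following types can also be written directly: Boolean (true, false), Decimal and its derived types, Single, Double, Uri, and FullName. A Uri example: {nsa}, which is equivalent to "http://schemas.example.com/projecta". A FullName example: {nsa}T1, which is equivalent to "{http://schemas.example.com/projecta}T1".
All derivations in the type hierarchy above are by restriction. These restrictions are built into the system; they are implied and cannot be written down. The restrictions from String to its derived types and from Decimal to its derived types, however, can be expressed, using facets. The following facets exist:
  • Length Range: the range of lengths. The value before ".." is the minimum length and the value after it is the maximum length; either one may be omitted. If the minimum equals the maximum, the length is fixed. Applies to String and its derived types (number of characters), Base64Binary and HexBinary (number of bytes), and listed simple types (number of items).
  • Digits: the total number of digits of a fixed-point number or integer, and the number of fraction digits of a fixed-point number. The value before ".." is the total digits and the value after it is the fraction digits; either one may be omitted. Fraction digits apply only to Decimal; total digits apply to Decimal and its derived types. The derivation from Decimal to Integer can be expressed like this: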
type Integer restrict Decimal
    facets{
        digits: ..0;
    };
;
Another example:
type Money restrict Decimal
    facets{
        digits: 19..2;
    };
;
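  • Value Range: the range of values. The value before ".." is the minimum and the value after it is the maximum; either one may be omitted. Values are written as literals; "[" or "]" means inclusive and "(" or ")" means exclusive. Applies to Decimal and its derived types, Single, Double, TimeSpan, and the date/time types. Example: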
type T1 restrict Int32
    facets{
        valuerange: [100..1000);
    };
;
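The example above restricts the Int32 value to be greater than or equal to 100 and less than 1000. The derivation from Integer to Int64 can be expressed like this: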
type Int64 restrict Integer
    facets{
        valuerange: [-9223372036854775808..9223372036854775807];
    };
;
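  • Enumerations: a set of enumerated values, with item values written as literals. Applies to all types. During validation the data value must equal one of the item values. Example: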
type Color restrict String
    facets{
        enums: Red = "Red", "Green", "Blue";
    };
;
If an item is given a name, the compiler generates a static readonly field with the same name in the generated class. The code generated for the example above is:
public partial class Color : global::Metah.X.String {
    public static readonly string @Red = @"Red";
    //...
}
The following example simulates the bitwise-or enumerations of programming languages:
type SomeFlags restrict Int32
    facets{
        enums: Flag1 = 1, Flag2 = 2, Flag3 = 4, Flag4 = 8, All = 15;
    };
;
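  • Patterns: regular expressions. The facet consists of one or more sub-patterns, which are combined by alternation (or) into the resulting pattern. Applies to all types. During validation, the regular expression is checked against the canonical text of the value. Example: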
type T1 restrict String
    facets{
        patterns: "abc", "def";
    };
;
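The resulting pattern in the example above is "(abc)|(def)".
  • Whitespace: strictly speaking, this is an action rather than a rule. It applies to String and NormalizedString, and performs one of the following actions on the XML whitespace chars in the text: preserve, which keeps whitespace unchanged; replace, which replaces '\n', '\r' and '\t' with ' '; collapse, which first applies replace, then reduces each run of consecutive spaces to one, and finally removes leading and trailing spaces. NormalizedString, Token and Language are derived like this: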
type NormalizedString restrict String
    facets{
        whitespace: replace;
    };
;
type Token restrict NormalizedString
    facets{
        whitespace: collapse;
    };
;
type Language restrict Token
    facets{
        patterns: @"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*";
    };
;
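The three whitespace actions from the facet list above can be sketched in plain C#. This is a minimal illustration of preserve, replace and collapse as described in the text, not the SDOM implementation; WhitespaceAction and WhitespaceFacet are hypothetical names.

```csharp
using System;
using System.Text;

public enum WhitespaceAction { Preserve, Replace, Collapse }

public static class WhitespaceFacet {
    public static string Apply(string text, WhitespaceAction action) {
        // preserve: leave the text untouched.
        if (action == WhitespaceAction.Preserve) return text;

        // replace: turn '\n', '\r' and '\t' into ' '.
        string replaced = text.Replace('\n', ' ').Replace('\r', ' ').Replace('\t', ' ');
        if (action == WhitespaceAction.Replace) return replaced;

        // collapse: after replace, squeeze runs of spaces into one space...
        var builder = new StringBuilder(replaced.Length);
        bool previousWasSpace = false;
        foreach (char c in replaced) {
            if (c == ' ') {
                if (!previousWasSpace) builder.Append(c);
                previousWasSpace = true;
            } else {
                builder.Append(c);
                previousWasSpace = false;
            }
        }
        // ...and finally drop leading and trailing spaces.
        return builder.ToString().Trim(' ');
    }
}
```

Each action is at least as strong as the previous one: collapse applied to already collapsed text changes nothing, which is consistent with Token restricting NormalizedString.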
A simple type can be derived by restriction from another simple type; the facet set is optional. Example:
type Int32List2 restrict Int32List;
type Int32List3 restrict Int32List2
    facets{
        lengthrange: 1..;
    };
;
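If the derived type declares no facet set, it inherits the base type's facet set. If the derived type does not declare a particular facet, that facet is inherited from the base type. If both declare a facet, the derived type can only impose a constraint stronger than or equal to the base type's. For lengthrange and valuerange this means a narrower length or value range. Example: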
type T1 restrict String
    facets{
        lengthrange: 10..40;
    };
;
type T2 restrict T1
    facets{
        lengthrange: ..20;//min length is inherited from the base type, giving 10..20
    };
;
type T3 restrict Int32
    facets{
        valuerange: [100..1000);
    };
;
type T4 restrict T3
    facets{
        valuerange: (100..;//the upper bound is inherited from the base type, giving (100..1000)
    };
;
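How a derived lengthrange inherits missing bounds and is checked against the base range can be sketched as follows. LengthRange and RestrictWith are hypothetical names used only for this illustration; the sketch covers lengthrange only and is not the compiler's actual algorithm.

```csharp
using System;

// Hypothetical type for illustration; not part of Metah.X.
public readonly struct LengthRange {
    public int? Min { get; }
    public int? Max { get; }
    public LengthRange(int? min, int? max) { Min = min; Max = max; }

    // Computes the effective range of a derived type from this (base) range.
    public LengthRange RestrictWith(LengthRange derived) {
        // A bound the derived type leaves out is inherited from the base type.
        int? min = derived.Min ?? Min;
        int? max = derived.Max ?? Max;
        // A declared bound must be equal to or stronger than the base bound.
        if (Min.HasValue && min < Min)
            throw new ArgumentException("min length is weaker than the base type's");
        if (Max.HasValue && max > Max)
            throw new ArgumentException("max length is weaker than the base type's");
        if (min.HasValue && max.HasValue && min > max)
            throw new ArgumentException("min length is greater than max length");
        return new LengthRange(min, max);
    }

    public override string ToString() { return Min + ".." + Max; }
}

public static class LengthRangeDemo {
    public static void Main() {
        var t1 = new LengthRange(10, 40);                    // lengthrange: 10..40
        var t2 = t1.RestrictWith(new LengthRange(null, 20)); // lengthrange: ..20
        Console.WriteLine(t2);                               // prints "10..20"
    }
}
```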
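In addition, the total digits and fraction digits must be less than or equal to the base type's values; enumerated values must be among the base type's enumerated values; whitespace processing strength increases in the order preserve, replace, collapse; and any new patterns may be specified, in which case the text of the value must satisfy both the base type's and the derived type's patterns, which is itself a stronger constraint.
If a fixed annotation is added to a facet, derived types cannot impose a stronger constraint on it: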
type T1 restrict String
    facets{
        lengthrange: 10[fixed]..40[fixed];
    };
;
type T2 restrict Int32
    facets{
        valuerange: [100[fixed]..1000[fixed]);
    };
;
type Money restrict Decimal
    facets{
        digits: 19[fixed]..2[fixed];
    };
;
type T4 restrict String
    facets{
        whitespace: replace[fixed];
    };
;
The fixed annotation cannot be added to enums or patterns.
Recall the SDOM code listed earlier: Metah.X.SimpleType.Value is of type object, and the derived classes specialize the type of Value, as shown in the table below:
Simple type                      CLR/C# type of the Value property
ListedSimpleType                 Metah.X.ListedSimpleTypeValue
UnitedSimpleType                 Metah.X.UnitedSimpleTypeValue
String and derived types         string
Decimal and all integer types    decimal?
Int64                            long?
Int32                            int?
Int16                            short?
SByte                            sbyte?
UInt64                           ulong?
UInt32                           uint?
UInt16                           ushort?
Byte                             byte?
Single                           float?
Double                           double?
Boolean                          bool?
Uri                              System.Xml.Linq.XNamespace
FullName                         Metah.X.FullNameValue
Base64Binary and HexBinary       byte[]
TimeSpan                         System.TimeSpan?
All date/time types              System.DateTime?

The compiler generates a class for each user-defined simple type. For a listed simple type, the generated class inherits from ListedSimpleType<T>:
public partial class Int32List : global::Metah.X.ListedSimpleType<int?> { ... }
Note that the type argument of ListedSimpleType<T> is the type of the item type's Value (int?), not the item type's class (Metah.X.Int32).
For a united simple type, the generated class inherits from UnitedSimpleType:
public partial class Length : global::Metah.X.UnitedSimpleType {
    public ulong? Number { get; set; }
    public string Max { get; set; }
}
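The compiler generates a property of the same name for each member. From the program's point of view, a union type selects one of the candidate member types: assigning a value to the Number property selects it, and the Max property is automatically set to null, and vice versa.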
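The select-one behaviour of a union can be imitated in a few lines of ordinary C#. The LengthSketch class below is a hypothetical stand-in written for this illustration; it is not the code the compiler generates and does not derive from UnitedSimpleType.

```csharp
using System;

// Hypothetical stand-in for a generated union class; not actual compiler output.
public class LengthSketch {
    private ulong? _number;
    private string _max;

    // Assigning a non-null value selects this member and clears the other one.
    public ulong? Number {
        get { return _number; }
        set { _number = value; if (value != null) _max = null; }
    }

    public string Max {
        get { return _max; }
        set { _max = value; if (value != null) _number = null; }
    }
}

public static class LengthSketchDemo {
    public static void Main() {
        var length = new LengthSketch { Number = 5 };
        length.Max = "unbounded";                      // selects Max
        Console.WriteLine(length.Number.HasValue);     // prints "False"
        Console.WriteLine(length.Max);                 // prints "unbounded"
    }
}
```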
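For a user-defined simple type derived by restriction, the generated class inherits from the base type's class.
Complex types are divided into simple child complex types and complex child complex types. For the former, the children of the element data are text-only; for the latter, they are element-only or mixed. A complex type has zero or more attributes.
A simple child complex type is extended from a simple type, or extended or restricted from another simple child complex type: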
type T1 extend Int32;//T1 is a simple child complex type
type T2 extend T1
    attributes{
        attribute A1 as Boolean;
    };
;
type T3 extend Decimal
    attributes{
        attribute A1 as Int64;
    };
;
type T4 restrict T3
    facets{
        valuerange: [20..;
    };
;
A complex child complex type can be declared directly, or extended or restricted from another complex child complex type:
type T1//declared directly
    attributes{
        attribute A1 as Boolean;
    };
;
type T2//declared directly
    attributes{
        attribute A1[?] as Boolean;
        attribute A2 as Int64;
    };
    children{
        element E1 as T1;
        element E2 as Int64;
    };
;
type T3 extend T1
    children{
        element E1 as String;
    };
;
type T4 restrict T2
    attributes{
        attribute A2 as Int32;
    };
    children{
        element E1 as T1;
        element E2 as Int32;
    };
;
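The base type of a complex type must be a reference to a global type; it cannot be a local type. For extension, if the derived type declares no attribute set or children, they are inherited from the base type. For restriction, if the derived type declares no attribute set, facet set, or children, the base type's are left unconstrained.
"Derivation" actually refers to two things, one syntactic and one semantic. Semantically there are only two kinds of derivation, restriction and extension; syntactically there are four: restriction, extension, list, and union. The syntactic and semantic derivations may or may not coincide; only the differences are listed here: a listed simple type is syntactically derived by list from its item type and semantically derived by restriction from sys:SimpleType; a united simple type is syntactically derived by union from all its member types and semantically derived by restriction from sys:SimpleType; a simple child complex type extended from a simple type is syntactically derived by extension from that simple type and semantically derived by restriction from sys:Type; a directly declared complex child complex type has no syntactic derivation and is semantically derived by restriction from sys:Type.
Like C#'s sealed modifier, a derivation prohibition annotation can be specified on a global type to forbid syntactic derivation from it. Its value is one of, or a bitwise-or combination of, the six values none, extend, restrict, list, unite, and all. Because none is 0, combining it with other values has no effect; because all covers every value, combining it with other values makes them redundant. If the type specifies a derivation prohibition annotation, that value is used; otherwise the value specified on the namespace is used; if the namespace does not specify one either, the default is none. Example: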
type T1[derivationprohibition: restrict, extend, list, unite] restrict Int32;
type T2 restrict T1;//ERROR: Restriction derivation prohibited
type T3 list T1;//ERROR: List derivation prohibited
type T4 unite {
    member M as T1;//ERROR: Union derivation prohibited
};
type T5 extend T1;//ERROR: Extension derivation prohibited
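An abstract annotation can be specified on a global complex type; it means the same as the abstract keyword in programming languages, and the generated class is abstract. If a complex child complex type specifies the mixed annotation, the children of the element data are mixed; otherwise they are element-only. The instance prohibition annotation is described later.
For a complex type, the class generated by the compiler inherits either from the base type's class or from Metah.X.ComplexType. The generated attribute set nested class inherits either from the base type's attribute set nested class or from Metah.X.AttributeSet. For the following MX: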
type T1
    attributes{};
;
type T2 extend T1
    attributes{};
;
type T3 restrict T2
    attributes{};
;
the compiler generates the following code:
public partial class T1 : global::Metah.X.ComplexType {
    public partial class AttributeSetClass : global::Metah.X.AttributeSet { ... }
    new public AttributeSetClass AttributeSet { get; set; }
    public AttributeSetClass EnsureAttributeSet();
}
public partial class T2 : T1 {
    new public partial class AttributeSetClass : T1.AttributeSetClass { ... }
    new public AttributeSetClass AttributeSet { get; set; }
    new public AttributeSetClass EnsureAttributeSet();
}
public partial class T3 : T2 {
    new public partial class AttributeSetClass : T2.AttributeSetClass { ... }
    new public AttributeSetClass AttributeSet { get; set; }
    new public AttributeSetClass EnsureAttributeSet();
}
For a simple child complex type, the compiler may generate a simple child nested class. For the following MX:
type T1 extend Int32;
type T2 restrict T1
    facets{};
;
the compiler generates the following code:
public partial class T1 : global::Metah.X.ComplexType {
    new public global::Metah.X.Int32 SimpleChild { get; set; }
    public global::Metah.X.Int32 EnsureSimpleChild();
    new public int? Value { get; set; }
}
public partial class T2 : T1 {
    public partial class SimpleChildClass : global::Metah.X.Int32 { ... }
    new public SimpleChildClass SimpleChild { get; set; }
    new public SimpleChildClass EnsureSimpleChild();
    new public int? Value { get; set; }
}
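A value can be assigned directly to the Value property, which automatically calls EnsureSimpleChild(). How is a code block specified for T2.SimpleChildClass? Looking back at the grammar, a code block can also be specified in the facet set; it exists precisely for this case. In all other cases, the code block is specified in the type.
For a complex child complex type, the generated complex child nested class inherits either from the base type's complex child nested class or from Metah.X.ChildContainer. For the following MX: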
type T1
    children{};
;
type T2 restrict T1
    children{};
;
type T3 extend T2
    children{};
;
the compiler generates the following code:
public partial class T1 : global::Metah.X.ComplexType {
    public partial class ComplexChildClass : global::Metah.X.ChildContainer { ... }
    new public ComplexChildClass ComplexChild { get; set; }
    public ComplexChildClass EnsureComplexChild();
}
public partial class T2 : T1 {
    new public partial class ComplexChildClass : T1.ComplexChildClass { ... }
    new public ComplexChildClass ComplexChild { get; set; }
    new public ComplexChildClass EnsureComplexChild();
}
public partial class T3 : T2 {
    new public partial class ComplexChildClass : T2.ComplexChildClass { ... }
    new public ComplexChildClass ComplexChild { get; set; }
    new public ComplexChildClass EnsureComplexChild();
}

Attribute

Grammar:
RootAttributeSet: 'attributes' '{' AttributeSetMember* AttributesWildcard? '}' CodeBlock? ';'
GlobalAttributeSet: 'attributes' Name '{' AttributeSetMember* AttributesWildcard? '}' ';'
AttributeSetMember: LocalAttribute | AttributeRef | AttributeSetRef
AttributeSetRef: 'attributesref' QualifiableName? AttributeSetRefAnnotations? ';'
AttributeSetRefAnnotations: '[' (AttributeSetRefAnnotation (';' AttributeSetRefAnnotation)* ';'?)? ']'
AttributeSetRefAnnotation: MemberName
LocalAttribute: 'attribute' Name? LocalAttributeAnnotations? ('as' TypeOrRef)? CodeBlock? ';'
LocalAttributeAnnotations: '[' (LocalAttributeAnnotation (';' LocalAttributeAnnotation)* ';'?)? ']'
LocalAttributeAnnotation: Qualification | Optional | MemberName | DefaultOrFixedValue
GlobalAttribute: 'attribute' Name GlobalAttributeAnnotations? ('as' TypeOrRef)? CodeBlock? ';'
GlobalAttributeAnnotations: '[' (GlobalAttributeAnnotation (';' GlobalAttributeAnnotation)* ';'?)? ']'
GlobalAttributeAnnotation: DefaultOrFixedValue
AttributeRef: 'attributeref' QualifiableName? AttributeRefAnnotations? CodeBlock? ';'
AttributeRefAnnotations: '[' (AttributeRefAnnotation (';' AttributeRefAnnotation)* ';'?)? ']'
AttributeRefAnnotation: Optional | MemberName | DefaultOrFixedValue
AttributesWildcard: 'wildcard' Wildcard? AttributesWildcardAnnotations? ';'
AttributesWildcardAnnotations: '[' (AttributesWildcardAnnotation (';' AttributesWildcardAnnotation)* ';'?)? ']'
AttributesWildcardAnnotation: MemberName
Wildcard: WildcardUris ValidationMode
WildcardUris: WildcardUri (',' WildcardUri)* ','?
WildcardUri: 'any' | 'other' | 'this' | 'unqualified' | UriOrAlias
ValidationMode: 'skipvalidate' | 'tryvalidate' | 'mustvalidate'
Optional: '?'
MemberName: 'membername' ':' IdentifierToken
DefaultOrFixedValue: ('default' | 'fixed') ':' Literal

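The namespace uri of a global attribute's full name is always the uri of the enclosing namespace, that is, a global attribute is always qualified. The namespace uri of a local attribute's full name is determined by the qualification annotation: if the attribute specifies one, its value is used; otherwise the value specified on the enclosing namespace is used; if the namespace does not specify one, the default is unqualified. If the qualification value is qualified, the namespace uri of the local attribute's full name is the uri of the enclosing namespace; otherwise it is the empty uri. An attribute ref references a global attribute and inherits all of its characteristics, such as full name and type.
A global attribute set is declared in a namespace; it is only a syntactic macro and is referenced through an attribute set ref. A root attribute set is declared in a complex type. Semantically, only the root attribute set and its "flattened" members exist: zero or more member attributes (local attributes and attribute refs) and an optional attributes wildcard. The full names of the member attributes must be unique. During validation, attribute data is matched against the member attribute declarations by full name; if nothing matches, the validator tries to match the namespace uri of the full name against the attributes wildcard. A wildcard consists of one or more wildcard uris, with the following meanings:
  • any: any uri (non-empty or empty)
  • other: a non-empty uri not equal to this namespace's uri
  • this: this namespace's uri
  • unqualified: the empty uri
  • Uri or alias: the specified uri
Combining other with unqualified is illegal. The validation modes have the following meanings:
  • skipvalidate: do not validate the attribute data matched by the wildcard
  • tryvalidate: if the matched attribute data is qualified, look up a global attribute declaration with the same full name, and if one is found, validate the attribute data against it
  • mustvalidate: same as tryvalidate, but validation fails if no global attribute declaration is found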
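The wildcard uri rules listed above amount to a simple matching function. The sketch below is a minimal illustration under stated assumptions: WildcardSketch and Matches are hypothetical names, wildcard uris are passed as plain strings (with aliases already resolved), and an empty string stands for the empty uri. It does not model validation modes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Metah.X.
public static class WildcardSketch {
    // uri: namespace uri of the data's full name ("" means unqualified)
    // thisNs: uri of the namespace that declares the wildcard
    public static bool Matches(string uri, string thisNs, IEnumerable<string> wildcardUris) {
        return wildcardUris.Any(w => {
            switch (w) {
                case "any": return true;                           // any uri, empty or not
                case "other": return uri != "" && uri != thisNs;   // non-empty and not this namespace
                case "this": return uri == thisNs;                 // this namespace's uri
                case "unqualified": return uri == "";              // the empty uri
                default: return uri == w;                          // an explicitly listed uri
            }
        });
    }

    public static void Main() {
        const string nsa = "http://schemas.example.com/projecta";
        Console.WriteLine(Matches("http://ns1", nsa, new[] { "other" }));        // prints "True"
        Console.WriteLine(Matches(nsa, nsa, new[] { "other" }));                 // prints "False"
        Console.WriteLine(Matches("", nsa, new[] { "this", "unqualified" }));    // prints "True"
    }
}
```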
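A member attribute can declare the optional annotation to indicate that it is optional; otherwise it is required. An attribute can also declare a default or fixed value annotation: during validation, if the attribute data is absent, the validator automatically generates it with the declared value. If the value is declared as fixed and the attribute data is present, its value must equal the declared value.
An attribute set member can declare a member name annotation. If declared, its value is used; otherwise the default is obtained as follows:
  • Local attribute: its own name
  • Attribute ref: the name of the global attribute
  • Attribute set ref: the name of the global attribute set
  • AttributesWildcard: "Wildcard"
The member names of attribute set members must be unique. Recall the earlier discussion of merging multiple physical objects into one logical object: attribute set members are merged by member name.
Example: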
alias "http://schemas.example.com/projecta" as nsa;
xnamespace {nsa} [namespace: Example.ProjectA] {
    attribute GA1 as Int32;
    attributes AS1{
        attribute A1[?] as Int32;
        attributeref GA1;
    };
    type T1
        attributes{
            attribute A1[qualified; membername: QA1] as Int32;
            attributesref AS1;
            wildcard other tryvalidate;
        };
    ;
}
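For extension of a complex type, the derived type's attribute set consists of the members inherited from the base type's attribute set plus its own members. For restriction of a complex type, the derived type's attribute set restricts the base type's attribute set: what the derived type declares is the result after restricting the base type's attribute set. A derived type's attribute is associated with the base type's attribute by full name. The following restrictions are possible:
  • If the base type's attribute is optional, the derived type's attribute can be declared required (by removing the optional annotation), or the attribute can be dropped by not declaring it
  • The type of the derived type's attribute equals, or is semantically derived by restriction from, the type of the base type's attribute
  • A default value can be restricted to a fixed value
  • The uris of the attributes wildcard are equal to or narrower than those of the base type's attributes wildcard, and the validation mode is equal or stricter (strictness: mustvalidate > tryvalidate > skipvalidate); the base type's attributes wildcard can also be dropped by not declaring it
  • New attributes can be added to the derived type's attribute set, provided their uri matches the uris of the base type's attributes wildcard
The uris of a wildcard can be restricted as follows:
  • Restrict any to some other value
  • Restrict other to a specific non-empty uri not equal to this namespace's uri
Example: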
alias "http://schemas.example.com/projecta" as nsa;
xnamespace {nsa} [namespace: Example.ProjectA] {
    attribute GA1 as Int32;
    attributes AS1{
        attribute A1[?] as Int32;
        attributeref GA1;
    };
    type T1
        attributes{
            attribute A1[qualified; membername: QA1] as Int32;
            attributesref AS1;
            wildcard other tryvalidate;
        };
    ;
    type T2 restrict T1
        attributes{
            attribute A1[qualified; membername: QA1] as Int32;
            attribute GA1[qualified] as Int16;
            wildcard {"http://ns1"} mustvalidate;
        };
    ;
}
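As can be seen, restriction is performed at the semantic level: in T2 a local qualified attribute GA1 restricts the attribute ref GA1 of T1 because the two share the same full name.
For the example above, the compiler generates code like this: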
namespace Example.ProjectA {
    public partial class GA1_AttributeClass : global::Metah.X.Attribute {
        new public global::Metah.X.Int32 Type { get; set; }
        new public global::Metah.X.Int32 EnsureType();
        new public int? Value { get; set; }
        public static readonly global::System.Xml.Linq.XName ThisName = global::System.Xml.Linq.XName.Get(@"GA1", @"http://schemas.example.com/projecta");
        //...
    }
    public partial class T1 : global::Metah.X.ComplexType {
        public partial class AttributeSetClass : global::Metah.X.AttributeSet {
            public partial class QA1_Class : global::Metah.X.Attribute {
                new public global::Metah.X.Int32 Type { get; set; }
                new public global::Metah.X.Int32 EnsureType();
                new public int? Value { get; set; }
                public static readonly global::System.Xml.Linq.XName ThisName = global::System.Xml.Linq.XName.Get(@"A1", @"http://schemas.example.com/projecta");
                //...
            }
            public global::Example.ProjectA.T1.AttributeSetClass.QA1_Class QA1 { get; set; }
            public global::Example.ProjectA.T1.AttributeSetClass.QA1_Class Ensure_QA1(bool @try = false);
            public int? QA1_Value { get; set; }
            //
            public partial class A1_Class : global::Metah.X.Attribute {
                new public global::Metah.X.Int32 Type { get; set; }
                new public global::Metah.X.Int32 EnsureType();
                new public int? Value { get; set; }
                public static readonly global::System.Xml.Linq.XName ThisName = global::System.Xml.Linq.XName.Get(@"A1", @"");
                //...
            }
            public global::Example.ProjectA.T1.AttributeSetClass.A1_Class A1 { get; set; }
            public global::Example.ProjectA.T1.AttributeSetClass.A1_Class Ensure_A1(bool @try = false);
            public int? A1_Value { get; set; }
            //
            public partial class GA1_Class : global::Metah.X.Attribute {
                new public global::Example.ProjectA.GA1_AttributeClass ReferentialAttribute { get; set; }
                new public global::Metah.X.Int32 Type { get; set; }
                new public global::Metah.X.Int32 EnsureType();
                new public int? Value { get; set; }
                public static readonly global::System.Xml.Linq.XName ThisName = global::System.Xml.Linq.XName.Get(@"GA1", @"http://schemas.example.com/projecta");
                //...
            }
            public global::Example.ProjectA.T1.AttributeSetClass.GA1_Class GA1 { get; set; }
            public global::Example.ProjectA.T1.AttributeSetClass.GA1_Class Ensure_GA1(bool @try = false);
            public int? GA1_Value { get; set; }
            //...
        }
        //...
    }
    public partial class T2 : global::Example.ProjectA.T1 {
        new public partial class AttributeSetClass : global::Example.ProjectA.T1.AttributeSetClass {
            new public partial class QA1_Class : global::Example.ProjectA.T1.AttributeSetClass.QA1_Class {
                //...
            }
            new public global::Example.ProjectA.T2.AttributeSetClass.QA1_Class QA1 { get; set; }
            new public global::Example.ProjectA.T2.AttributeSetClass.QA1_Class Ensure_QA1(bool @try = false);
            new public int? QA1_Value { get; set; }
            //
            new public partial class GA1_Class : global::Example.ProjectA.T1.AttributeSetClass.GA1_Class {
                new public global::Metah.X.Int16 Type { get; set; }
                new public global::Metah.X.Int16 EnsureType();
                new public short? Value { get; set; }
                //...
            }
            new public global::Example.ProjectA.T2.AttributeSetClass.GA1_Class GA1 { get; set; }
            new public global::Example.ProjectA.T2.AttributeSetClass.GA1_Class Ensure_GA1(bool @try = false);
            new public short? GA1_Value { get; set; }
            //...
        }
        //...
    }
}
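As can be seen, for a global attribute the compiler generates a class named <Name>_AttributeClass in the namespace. For a local attribute or attribute ref, the compiler generates, inside AttributeSetClass, a nested class named <MemberName>_Class, a property named <MemberName>, an Ensure_<MemberName>() method, and a value property named <MemberName>_Value. A value can be assigned directly to the value property, which automatically calls Ensure_<MemberName>(). In addition, for an attribute ref a property named ReferentialAttribute is generated, whose type is the global attribute's class; this reflects the fact that an attribute ref is a reference to a global attribute. Using the generated code is straightforward: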
    type T1
        ##{
            public void Set(){
                var attset = EnsureAttributeSet();
                attset.QA1_Value = 123;
                var a1 = attset.Ensure_A1(true);
                if(a1 != null) a1.Value = 456;
                attset.GA1_Value = 789;
                //Alternatively: attset.Ensure_GA1().ReferentialAttribute = new GA1_AttributeClass { Value = 789 };
                attset.Add(new X.Attribute(XName.Get("{http://ns1}SomeAtt")) { Value = DateTime.Now });//wildcard attribute
            }
        }
    ;

Child

Grammar:
RootChildStruct: 'children' '{' ChildStructMember* '}' CodeBlock? ';'
LocalChildStruct: ('seq' | 'choice' | 'unordered') ChildAnnotations? '{' ChildStructMember* '}' CodeBlock? ('*' CodeBlock)? ';'
GlobalChildStruct: ('seq' | 'choice' | 'unordered') Name '{' ChildStructMember* '}' ';'
ChildStructMember: LocalChildStruct | ChildStructRef | LocalElement | ElementRef | ElementWildcard
ChildStructRef: 'childstructref' QualifiableName? ChildAnnotations? CodeBlock? ('*' CodeBlock)? ';'
LocalElement: 'element' Name? LocalElementAnnotations? ('as' TypeOrRef)? IdentityConstraint* CodeBlock? ('*' CodeBlock)? ';'
LocalElementAnnotations: '[' (LocalElementAnnotation (';' LocalElementAnnotation)* ';'?)? ']'
LocalElementAnnotation: Occurrence | MemberName | Qualification | DefaultOrFixedValue | Nullable | InstanceProhibition
GlobalElement: 'element' Name GlobalElementAnnotations? ('as' TypeOrRef)? IdentityConstraint* CodeBlock? ';'
GlobalElementAnnotations: '[' (GlobalElementAnnotation (';' GlobalElementAnnotation)* ';'?)? ']'
GlobalElementAnnotation: DefaultOrFixedValue | Nullable | InstanceProhibition | Abstract | Substitution | DerivationProhibition
ElementRef: 'elementref' QualifiableName? ChildAnnotations? CodeBlock? ('*' CodeBlock)? ';'
ElementWildcard: 'wildcard' Wildcard? ChildAnnotations? CodeBlock? ('*' CodeBlock)? ';'
ChildAnnotations: '[' (ChildAnnotation (';' ChildAnnotation)* ';'?)? ']'
ChildAnnotation: Occurrence | MemberName
Occurrence: NumericLiteralToken '..' NumericLiteralToken? | '?' | '*' | '+'
Nullable: 'nullable'
Substitution: 'substitute' ':' QualifiableName
IdentityConstraint: ((('key' | 'unique') Name) | ('keyref' Name KeyRefAnnotations? 'ref' QualifiableName)) 'as' PathExpression '=>' PathExpressions ';'
KeyRefAnnotations: '[' ('splitlistvalue' ';'?)? ']'
PathExpressions: PathExpression (',' PathExpression)* ','?
PathExpression: Path ('|' Path)*
Path: Step ('/' Step)*
Step: ('.' '**'?) | '**' | ('@'? ('*' | (UriOrAlias (IdentifierToken | '*'))))

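A global child struct is declared in a namespace; it is only a syntactic macro and is referenced through a child struct ref. Global and local child structs come in three kinds, sequence, choice, and unordered, which determine how their members are arranged: in a sequence, members appear in declaration order; in a choice, one of the members is selected; in an unordered struct, members may appear in any order. The members of an unordered struct can only be local elements and element refs whose max occurrence is 1, and the unordered struct, or the child struct ref that references it, must be the only direct member of the root child struct. A root child struct is declared in a complex type; semantically it is a sequence child struct.
A child struct member can specify an occurrence annotation: the minimum is given to the left of '..' and the maximum to the right. The maximum cannot be 0 and must be greater than or equal to the minimum; if the maximum is omitted, it is unbounded. The shorthand forms ? (0..1), * (0..), and + (1..) can be used. The default is 1..1.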
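The occurrence rules can be summarized by a small parser. This is a sketch for illustration only: OccurrenceSketch and Parse are hypothetical names, the input is the annotation text as a plain string, and a null maximum is used here to mean unbounded.

```csharp
using System;

// Hypothetical helper for illustration; not part of the MX compiler.
public static class OccurrenceSketch {
    // Returns (Min, Max); Max == null means unbounded.
    public static (int Min, int? Max) Parse(string text) {
        switch (text) {
            case "?": return (0, 1);     // shorthand for 0..1
            case "*": return (0, null);  // shorthand for 0..
            case "+": return (1, null);  // shorthand for 1..
        }
        int sep = text.IndexOf("..", StringComparison.Ordinal);
        if (sep < 0) throw new FormatException("expected min..max, ?, * or +");
        int min = int.Parse(text.Substring(0, sep));
        string maxText = text.Substring(sep + 2);
        int? max = maxText.Length == 0 ? (int?)null : int.Parse(maxText);
        // The maximum cannot be 0 and must not be less than the minimum.
        if (max.HasValue && (max.Value == 0 || max.Value < min))
            throw new FormatException("invalid max occurrence");
        return (min, max);
    }

    public static void Main() {
        var a = Parse("2..5");
        Console.WriteLine(a.Min + ".." + a.Max);  // prints "2..5"
        var b = Parse("+");
        Console.WriteLine(b.Min + ".." + b.Max);  // prints "1.."
    }
}
```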
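The namespace uri of a global element's full name is always the uri of the enclosing namespace, that is, a global element is always qualified. The namespace uri of a local element's full name is determined by the qualification annotation: if the element specifies one, its value is used; otherwise the value specified on the enclosing namespace is used; if the namespace does not specify one, the default is qualified. If the qualification value is qualified, the namespace uri of the local element's full name is the uri of the enclosing namespace; otherwise it is the empty uri. An element ref references a global element and inherits all of its characteristics, such as full name and type.
During validation, element data is matched against member element declarations by full name, or against element wildcard declarations by namespace uri. Consider the following example: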
xnamespace {nsa}{
    type T1
        children{
            seq[+]{
                element E1[*] as String;
                element E2[?] as String;
                choice{
                    element E3 as String;
                    element E4 as String;
                };
            };
        };
    ;
}
and the following XML data fragment:
<!--Assume the prefix p has been bound to the appropriate uri-->
<p:E1 />
<p:E1 />
<p:E2 />
<p:E3 />
<p:E3 />
<p:E4 />
<p:E2 />
<p:E3 />
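A schema declaration is structured, while XML data is "flat". The validator can be pictured as an LL(1)-style parser: it looks at the full name or uri of the current element data to decide which element declaration or element wildcard declaration it matches, and from that which struct declaration it matches. An obvious possibility is that the declarations are ambiguous, for example: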
element E1[?] as String;
element E1 as String;
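For the data <p:E1 />, the validator cannot tell whether to match the first or the second declaration. XML Schema rules out this kind of ambiguity with the Unique Particle Attribution (UPA) constraint.
A choice declaration can also be ambiguous: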
choice{
    element E1 as String;
    seq{
        element E2[?] as String;
        element E1 as String;
    };
};
For the data <p:E1 />, the validator cannot tell which branch to choose.
An element wildcard can also cause ambiguity:
element E1[?] as String;
wildcard any tryvalidate;
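For the data <p:E1 />, the validator cannot tell whether to match the first or the second declaration. Whether a declaration is ambiguous is usually easy to see by reading through it once, and the compiler also checks for ambiguity and reports an error.
If an element declares the nullable annotation, the element data can carry the xsi:nil="true" attribute (the prefix xsi stands for "http://www.w3.org/2001/XMLSchema-instance"), in which case the children of the element data must be empty, although it can still have attributes. The meaning of xsi:nil is up to the application's interpretation. Example: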
element E1[nullable] as Int32;
XML data:
<e0:E1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" xmlns:e0="http://schemas.example.com/projecta" />
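An element's type can be any type, either a global type ref or a local type. If the type is a global type ref, the element data can carry the xsi:type attribute to state its type explicitly; that type must equal, or be semantically derived from, the declared type. As mentioned earlier, a global complex type can be declared abstract; if an element's declared type is an abstract global complex type, the element data must carry an xsi:type attribute naming a concrete derived type. Example: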
xnamespace {nsa} {
    type T1[abstract] extend Int32;
    type T2 extend T1;
    element GE1 as T1;
}
XML data:
<e0:GE1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="e0:T2" xmlns:e0="http://schemas.example.com/projecta">123</e0:GE1>
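The declared type of GE1 is T1; because T1 is abstract, the data must name a concrete derived type, here T2.
Types can be derived, and elements can be "derived" too; this is called substitution and applies only to global elements. Example: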
xnamespace {nsa} {
    element GE1 as Int32;
    element GE2[substitute: GE1] as Int16;
    element GE3 as
        type
            children{
                elementref GE1;
            };
        ;
    ;
}
Because GE1 can be substituted by GE2, the following data is valid:
<e0:GE3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:e0="http://schemas.example.com/projecta">
    <e0:GE2 xsi:type="xs:short">123</e0:GE2>
</e0:GE3>
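Note the child element of GE3: the declaration is (a reference to) GE1, but the data contains GE2.
The substituting element's type must equal, or be semantically derived from, the substituted element's type. Just as a global complex type can be declared abstract, so can a global element, in which case it must be substituted: if GE1 in the example above were given the abstract annotation, it (its name) could not appear in XML data.
The valid values of a global element's derivation prohibition annotation are none, extend, restrict, and all. If the element does not declare it, the value declared on the enclosing namespace is used; if the namespace does not declare it either, the default is none. It prohibits the given kinds of semantic derivation for the types of direct substituting elements. Example: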
type T1 extend Int32;
type T2 extend T1;
type T3 restrict T2;
element GE1[derivationprohibition: restrict] as T1;
element GE2[substitute: GE1] as T3;//ERROR: Restriction derivation prohibited
element GE3[substitute: GE1] as T2;
element GE4[substitute: GE3] as T3;//OK
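Note that only the type derivation of direct substituting elements is prohibited.
A global complex type can specify the instance prohibition annotation; valid values are none, extend, restrict, and all. If the type does not declare it, the value declared on the enclosing namespace is used; if the namespace does not declare it either, the default is none. It restricts which values xsi:type can take in XML data. Example: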
type T1[instanceprohibition: restrict] extend Int32;
type T2 extend T1;
type T3 restrict T2;
element GE1 as T1;
The type of GE1 is T1. If T1's instance prohibition were none, xsi:type could name any derived type; but in the example above it is restrict, so the following XML data fails validation:
<e0:GE1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="e0:T3" xmlns:e0="http://schemas.example.com/projecta">123</e0:GE1>
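Global and local elements can also specify the instance prohibition annotation; valid values are none, extend, restrict, substitute, and all. If the element does not declare it, the value declared on the enclosing namespace is used; if the namespace does not declare it either, the default is none. The substitute value applies only to global elements and prohibits element substitution in XML data. Example: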
xnamespace {nsa} {
    element GE1[instanceprohibition: substitute] as Int32;
    element GE2[substitute: GE1] as Int16;
    element GE3 as
        type
            children{
                elementref GE1;
            };
        ;
    ;
}
The following data fails validation:
<e0:GE3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:e0="http://schemas.example.com/projecta">
    <e0:GE2 xsi:type="xs:short">123</e0:GE2>
</e0:GE3>
The extend and restrict values apply to both global and local elements and restrict which values their xsi:type can take. Example:
type T1 extend Int32;
type T2 extend T1;
element GE1[instanceprohibition: extend] as T1;
In the example above, T1 does not specify instance prohibition, but the element does, so the following data fails validation:
<e0:GE1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="e0:T2" xmlns:e0="http://schemas.example.com/projecta">123</e0:GE1>
If an element's type is a complex child complex type that declares the mixed annotation, the element's children in XML data can be interleaved with arbitrary text. Example:
element GE1 as
    type[mixed]
        children{
            choice[*]{
                element E1 as Int32;
                element E2 as Int32;
            };
        };
    ;
;
The following XML data is valid:
<e0:GE1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:e0="http://schemas.example.com/projecta">text0<e0:E1 xsi:type="xs:int">123</e0:E1>text1<e0:E2 xsi:type="xs:int">123</e0:E2>text2<e0:E1 xsi:type="xs:int">123</e0:E1></e0:GE1>
The following XML data is also valid:
<e0:GE1 xmlns:e0="http://schemas.example.com/projecta">text</e0:GE1>
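If an element's type is a simple type or simple child complex type, a default or fixed value can be specified for the element. In XML data, if the element's children are empty, the validator automatically adds the declared value; if the children are not empty and the value is declared fixed, they must equal the declared value. Example: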
element GE1[default: 123] as Int32;
For the following XML data:
<e0:GE1 xmlns:e0="http://schemas.example.com/projecta" />
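the validator automatically adds the default value 123 to the children of the element data.
If the min occurrence of an element or element wildcard is 0, it is called (effective) optional. A child struct is called effective optional if it satisfies either of the following two conditions:
  • Its min occurrence is 0 or it has no members
  • For sequence and unordered, all members are effective optional; for choice, at least one member is effective optional
If an element's type is a complex child complex type whose complex child is effective optional, the element's children can be empty in XML data. If an element's type is a mixed complex child complex type whose complex child is effective optional, a default or fixed value can be specified for the element; the value can only be a string literal and gives the default or fixed value of its text children. Example: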
element GE1[default: "some text"] as
    type[mixed]
        children{
            choice{
                element E1[?] as Int32;
                element E2 as Int32;
            };
        };
    ;
;
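In the example above, the type of element GE1 is a local mixed complex child complex type whose complex child is children (the root child struct). Semantically this is a sequence child struct containing a single member, the choice. Because the choice's member E1 is optional, the choice and hence children are both (effective) optional, so the children of GE1 element data can be empty.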
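The definition of effective optional is recursive, which a short sketch makes concrete. The Particle, ElementParticle, and StructParticle classes below are hypothetical model types invented for this illustration; they are not Metah.X classes, and the sketch only encodes the two conditions stated above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model types for illustration; not part of Metah.X.
public enum StructKind { Seq, Choice, Unordered }

public abstract class Particle {
    public int MinOccurs = 1;
    public abstract bool IsEffectiveOptional();
}

public sealed class ElementParticle : Particle {
    // Elements and element wildcards are optional exactly when min occurrence is 0.
    public override bool IsEffectiveOptional() { return MinOccurs == 0; }
}

public sealed class StructParticle : Particle {
    public StructKind Kind;
    public List<Particle> Members = new List<Particle>();

    public override bool IsEffectiveOptional() {
        // Condition 1: min occurrence is 0 or there are no members.
        if (MinOccurs == 0 || Members.Count == 0) return true;
        // Condition 2: at least one member for choice, all members for seq/unordered.
        return Kind == StructKind.Choice
            ? Members.Any(m => m.IsEffectiveOptional())
            : Members.All(m => m.IsEffectiveOptional());
    }
}

public static class EffectiveOptionalDemo {
    public static void Main() {
        // Models children{ choice{ element E1[?]; element E2; }; }
        var choice = new StructParticle { Kind = StructKind.Choice };
        choice.Members.Add(new ElementParticle { MinOccurs = 0 }); // E1[?]
        choice.Members.Add(new ElementParticle());                 // E2
        var children = new StructParticle { Kind = StructKind.Seq };
        children.Members.Add(choice);
        Console.WriteLine(children.IsEffectiveOptional());         // prints "True"
    }
}
```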
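A child struct member can specify the member name annotation. If specified, that value is used; otherwise the default is obtained as follows:
  • seq: "Seq"
  • choice: "Choice"
  • unordered: "Unordered"
  • Local element: its own name
  • Element ref: the name of the global element
  • Child struct ref: the name of the global child struct
  • ElementWildcard: "Wildcard"
The member names of all direct members of a child struct must be unique. Recall the earlier discussion of merging multiple physical objects into one logical object: child struct members are merged by member name, and the merging is recursive. Example: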
type T1
    children{
        element E1 as Int32;
        seq{
            element E1 as Int32;
        };
        element E1[unqualified; membername: UE1] as Int32;
        seq[+; membername: Seq2]{
            element E1 as Int32;
            element E2 as Int32;
        };
    };
;
type T1
    children{
        element E1
            ##{
                //C# class members...
            }
        ;
        element[membername: UE1]
            ##{
                //C# class members...
            }
        ;
        seq[membername: Seq2]{}
            ##{
                //C# class members...
            }
        ;
    };
;
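For extension of a complex child complex type, the derived type's child struct consists of the members inherited from the base type's child struct followed by its own members, because the root child struct is semantically a sequence child struct. For restriction of a complex child complex type, the derived type's child struct restricts the base type's child struct: what the derived type declares is the result after restricting the base type's child struct. A derived type's member is associated with the base type's member by member name. The following restrictions are possible:
  • An equal or narrower occurrence; if the base type's member is effective optional, it can be dropped by not declaring it
  • Omitting members of the base type's choice struct; if the base type's choice struct is effective optional, all members can be omitted, otherwise at least one member must be declared
  • The type of the derived type's element equals, or is semantically derived by restriction from, the type of the base type's element
  • The uris of an element wildcard are equal or narrower, and the validation mode is equal or stricter
  • An element wildcard can be restricted to an element, provided the element's uri matches the wildcard's uris
  • A default value can be restricted to a fixed value
  • A mixed type can be restricted to a non-mixed type by removing the mixed annotation
Example: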
element GE1 as Int32;
type T1
    children{
        element E1[?] as Int32;
        elementref GE1;
        wildcard this tryvalidate[membername: W1];
    };
;
type T2 extend T1
    children{
        seq[*]{
            element E2 as Int32;
            element E3[?] as Int32;
        };
    };
;
type T3 restrict T2
    children{
        element GE1 as Int16;
        element W1 as Int32;
        seq[+]{
            element E2 as Int32;
        };
    };
;
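As can be seen, restriction is performed at the semantic level.
An element can specify one or more identity constraints: key, unique, and keyref, which correspond to primary key, unique key, and foreign key in relational databases. Each must be named, and the name must be unique within the namespace. A keyref must also name the key or unique it references, and it can only reference a key or unique declared on the self or descendant elements in the XML data; otherwise validation reports an error. A path expression is a simplified version of an XPath expression and is used to query XML data. Consider the following XML data: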
<a:GE1 xmlns:a="http://schemas.example.com/projecta" xmlns:b="http://schemas.example.com/projectb">
  <b:E1>
    <a:E2>
      <a:E3>0</a:E3>
    </a:E2>
    <a:E2>
      <a:E3>1</a:E3>
    </a:E2>
    <a:E2>
      <a:E3>2</a:E3>
    </a:E2>
  </b:E1>
  <a:E1 A1="0">
    <E2>0</E2>
  </a:E1>
  <a:E1 A1="1">
    <E2>1</E2>
  </a:E1>
  <a:E1 A1="1">
    <E2>1</E2>
  </a:E1>
</a:GE1>
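A path is like a file path in an operating system: it consists of one or more steps separated by /. A step is usually the full name of a child element or attribute. With a:GE1 as the base point, the following path selects all a:E3 elements: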
{nsb}E1/{nsa}E2/{nsa}E3
The uri or alias is given first, followed by the local name. The following selects all E2 elements whose uri is empty:
{nsa}E1/{}E2
To select an attribute, prefix the step with @. Naturally, an attribute step can only be the last step of a path. The following selects all A1 attributes:
{nsa}E1/@{}A1
A step can be *, meaning a child element or attribute with any full name. The following selects the grandchild elements a:E3 of b:E1, whatever the full name of the child element in between:
{nsb}E1/*/{nsa}E3
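If the local name of a step is *, it selects child elements or attributes with the given uri and any local name. The following selects the grandchild elements a:E3 of b:E1, provided the uri of the child element in between is {nsa}: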
{nsb}E1/{nsa}*/{nsa}E3
A step of . means the self element, ** means the descendant elements, and .** means the self and descendant elements. The following selects all descendant elements whose uri is {nsa}:
.**/{nsa}*
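A path expression consists of one or more paths separated by |. The following selects all child elements with uri {nsa} under the descendant elements, together with the attributes with an empty uri on b:E1 itself and on its descendant elements: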
**/{nsa}* | {nsb}E1/.**/@{}*
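Looking back at the grammar, the as keyword is followed by the identity path expression, and => is followed by one or more value path expressions. The "base point" of the identity path expression is the element that declares the identity constraint; it can only return elements, each of which is called an identity element. The "base point" of a value path expression is the identity element, and each value path expression can return at most one element or attribute, called the value element or attribute. For key, a value path expression must return an element or attribute; for unique and keyref, it may return nothing. The value of a value element or attribute must be of a simple type, which means the element must be of a simple type or simple child complex type; otherwise the value is null. For key, every value must be non-null; for unique and keyref, if all value path expressions return nothing, or all values are null, the identity element is ignored.
To summarize: the identity path expression selects zero or more identity elements, and the value path expressions then select one or more values from each identity element. For key, every value must be non-null; for unique or keyref, at least one value must be non-null. For key or unique, the value groups must be unique. For keyref, first, the number of values in a group must equal the number of values in a group of the referenced key or unique, and second, each value group must equal one of the referenced value groups. Example:
element GE1 as SomeType
    key K1 as {nsb}E1/{nsa}E2/{nsa}E3 => .;
    keyref KR1 ref K1 as {nsa}E1 => @{}A1;
;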
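What K1 and KR1 check can be approximated with LINQ to XML. This is a rough sketch under stated assumptions, not the SDOM validator: it uses its own small XML sample, hand-translates the two path expressions into Elements() calls, and compares values as plain strings rather than as typed simple values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustration only; the real validator works on SDOM objects, not on XElement.
public static class IdentityConstraintSketch {
    public static void Main() {
        XNamespace a = "http://schemas.example.com/projecta";
        XNamespace b = "http://schemas.example.com/projectb";
        var root = XElement.Parse(
            @"<a:GE1 xmlns:a=""http://schemas.example.com/projecta"" xmlns:b=""http://schemas.example.com/projectb"">
                <b:E1>
                  <a:E2><a:E3>0</a:E3></a:E2>
                  <a:E2><a:E3>1</a:E3></a:E2>
                </b:E1>
                <a:E1 A1=""0"" />
                <a:E1 A1=""1"" />
                <a:E1 A1=""1"" />
              </a:GE1>");

        // key K1: identity path {nsb}E1/{nsa}E2/{nsa}E3, value path "."
        var keyValues = root.Elements(b + "E1").Elements(a + "E2").Elements(a + "E3")
            .Select(e => e.Value).ToList();
        bool keyIsUnique = keyValues.Distinct().Count() == keyValues.Count;

        // keyref KR1: identity path {nsa}E1, value path @{}A1; each value must exist in K1
        var keySet = new HashSet<string>(keyValues);
        bool refsResolve = root.Elements(a + "E1")
            .Select(e => (string)e.Attribute("A1"))
            .Where(v => v != null)
            .All(v => keySet.Contains(v));

        Console.WriteLine(keyIsUnique);  // prints "True"
        Console.WriteLine(refsResolve);  // prints "True"
    }
}
```

Duplicate keyref values (A1="1" twice) are accepted here, since only key and unique require the value groups to be unique.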
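For the compiler-generated code, the principles are the same as for attributes above; only what I consider the key points are listed here:
  • The class of a global substituting element inherits from the class of the substituted element; if a global element specifies the abstract annotation, the generated class is abstract too
  • An element ref generates a ReferentialElement property whose type is the global element's class, reflecting the fact that it is a reference to the global element
  • If the max occurrence of a child struct member is greater than 1, the compiler generates a list class, which includes a self-explanatory CreateAndAddItem() method. Looking back at the grammar, the code block preceded by an asterisk is for the list class, and the code block without an asterisk is for the item class
  • For a choice struct member, as with a united simple type, assigning a value to one member property selects it, and the other member properties are automatically set to null
  • The Example.EBusiness sample demonstrates how to use identity constraints in code
  • The following example demonstrates programming against an unordered child struct: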
type T1
    children{
        unordered{
            element E1 as String;
            element E2 as String;
            element E3[?] as String;
        };
    };
    ##{
        public void Set(){
            var unordered = EnsureComplexChild().Ensure_Unordered();
            unordered.Ensure_E1().Value = "e1";
            unordered.Ensure_E3().Value = "e3";
            unordered.Ensure_E2().Value = "e2";
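            // At this point the members are in declaration order: E1, E2, E3
            unordered.E1.SpecifiedOrder = 3; // the larger the value, the later the member is placed
            unordered.E2.SpecifiedOrder = 1;
            unordered.E3.SpecifiedOrder = 2;
            unordered.SortChildren();
            // At this point the member order is: E2, E3, E1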
        }

    }
;
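The effect of SpecifiedOrder in the example above can be pictured as an ascending sort on that value. The sketch below is a stand-alone model with hypothetical names (MemberSketch, UnorderedSketch). It assumes that members with equal values keep their relative order, which the text does not state.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a member of an unordered child struct.
public sealed class MemberSketch {
    public string Name { get; }
    public int SpecifiedOrder { get; set; }
    public MemberSketch(string name) { Name = name; }
}

public static class UnorderedSketch {
    // Returns the member names in ascending SpecifiedOrder.
    // OrderBy is a stable sort, so ties keep their incoming order (an assumption).
    public static List<string> SortedNames(IEnumerable<MemberSketch> members) =>
        members.OrderBy(m => m.SpecifiedOrder).Select(m => m.Name).ToList();
}
```

With E1, E2 and E3 given the values 3, 1 and 2, SortedNames yields E2, E3, E1, the same order as in the comment at the end of Set().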

Odds and Ends

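Because the compiler-generated code makes heavy use of C# nested classes, the nesting can easily become very deep, which makes referring to a nested class extremely inconvenient. You can give a generated class an alias in its code block, for example: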
xnamespace {"http://schemas.example.com/projecta"} [namespace: Example.ProjectA] {
    type T1
        children{
            element E1[*] as
                type
                    children{
                        element E2 as Int32
                            ## as Class1
                        ;
                    };
                ;
            ;
        };
    ;
}
The compiler will generate code like this:
namespace Example.ProjectA {
    using Class1 = global::Example.ProjectA.T1.ComplexChildClass.E1_Class.ItemClass.TypeClass.ComplexChildClass.E2_Class;
    //...
}
The compiler also generates a static AsThis() method for each class:
private static <ClassFullName> AsThis(object o);
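As mentioned earlier, SDOM nodes are deep-cloned automatically. If you add mutable reference-type fields to a node, you need to implement deep cloning for them yourself, for example: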
type T1 restrict Int32
    ##{
        public List<string> StringList { get; private set; }
        public override X.Object DeepClone(){
            var obj = AsThis(base.DeepClone());
            if(StringList != null) obj.StringList = new List<string>(StringList);
            return obj;
        }
    }
;
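Why the extra copy matters can be seen in a small stand-alone sketch. NodeSketch is a hypothetical stand-in, not the real X.Object, and the cast stands where the example above calls AsThis(). A member-wise copy shares the same List instance with the original, while the deep clone gets its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an SDOM node that carries a mutable reference-type member.
public class NodeSketch {
    public List<string> StringList { get; set; }

    // Shallow copy: the clone points at the same List instance as the original.
    public NodeSketch ShallowClone() => (NodeSketch)MemberwiseClone();

    // Deep copy: the clone receives its own List with the same items.
    public virtual NodeSketch DeepClone() {
        var clone = (NodeSketch)MemberwiseClone();
        if (StringList != null) clone.StringList = new List<string>(StringList);
        return clone;
    }
}
```

After cloning, adding an item to the original's StringList also shows up in the shallow clone but not in the deep clone, which is the kind of aliasing the DeepClone() override above avoids.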
You can override the following two methods of X.Object to implement custom pre-validation and post-validation:
protected virtual bool TryValidating(X.Context context, bool fromValidate);
protected virtual bool TryValidated(X.Context context, bool success);
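They are always called as a pair. If the fromValidate parameter is true, the method was triggered by Object.TryValidate(); otherwise it was triggered by the <GlobalElement>.TryLoadAndValidate() static method. In the former case the node may be invalid. In the latter case the MX runtime has already read and validated the data from the XmlReader and assigned it to the node's properties. For logic errors in the application, that is, bugs, these two methods should throw an exception. For errors in the user's data, they should add diagnostics to the context. If a method returns false, it should add an error diagnostic. If it returns true, it should not add an error diagnostic, but it may add warning or info diagnostics. Example: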
    ##{
        protected override bool TryValidating(X.Context context, bool fromValidate){
            var success = base.TryValidating(context, fromValidate);
            if(success){
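                // Perform your own pre-validation here and assign the result to success.
            }
            return success;
        }
        protected override bool TryValidated(X.Context context, bool success){
            success = base.TryValidated(context, success);
            if(success){
                // Perform your own post-validation here and assign the result to success.
            }
            // or log the error information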
            return success;
        }
    }
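The contract described above (error diagnostic when returning false, at most warning or info diagnostics when returning true) can be illustrated with a self-contained model. Everything here is hypothetical: ContextSketch, Diagnostic, Severity, ObjectSketch and QuantityNode are stand-ins invented for this sketch and do not reflect the real X.Context or X.Object API.

```csharp
using System;
using System.Collections.Generic;

public enum Severity { Error, Warning, Info }

public sealed class Diagnostic {
    public Severity Level;
    public string Message;
}

// Hypothetical stand-in for X.Context: it only collects diagnostics.
public sealed class ContextSketch {
    public readonly List<Diagnostic> Diagnostics = new List<Diagnostic>();
    public void Add(Severity level, string message) =>
        Diagnostics.Add(new Diagnostic { Level = level, Message = message });
}

// Hypothetical stand-in for X.Object: the two hooks are always called as a pair.
public class ObjectSketch {
    protected virtual bool TryValidating(ContextSketch context, bool fromValidate) => true;
    protected virtual bool TryValidated(ContextSketch context, bool success) => success;

    public bool TryValidate(ContextSketch context) {
        bool success = TryValidating(context, true);
        // A real runtime would validate the node's content at this point.
        return TryValidated(context, success);
    }
}

public sealed class QuantityNode : ObjectSketch {
    public int Quantity;

    protected override bool TryValidated(ContextSketch context, bool success) {
        success = base.TryValidated(context, success);
        if (success && Quantity <= 0) {
            // User data error: add an error diagnostic and return false.
            context.Add(Severity.Error, "Quantity must be positive.");
            success = false;
        } else if (success && Quantity > 1000) {
            // Still valid: only a warning is added and true is returned.
            context.Add(Severity.Warning, "Unusually large quantity.");
        }
        return success;
    }
}
```

A node with Quantity 0 fails validation and leaves one error diagnostic in the context, while a node with Quantity 5000 passes and leaves one warning.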

Any questions and suggestions are welcome.

Last edited Dec 8, 2014 at 2:22 PM by Knat, version 4