Getting started
===============

Installation
------------

`vtjson` is available via pip:

.. code-block:: console

   $ pip install vtjson

Tutorial
--------

.. testsetup:: *

   from vtjson import email, ge, intersect, make_type, regex, safe_cast, skip_first, url, validate

Here is a simple schema:

.. testcode::

   book_schema = {
       "title": str,
       "authors": [str, ...],
       "editor?": str,
       "year": int,
   }

The following conventions were used:

* As in TypeScript, a (string) key ending in `?` represents an optional key.
  The corresponding schema (the item the key points to) is only used for
  validation when the key is present in the object being validated. A key can
  also be made optional by wrapping it as :py:func:`vtjson.optional_key`.
* If the last entry of a list or tuple is `...` (ellipsis), the next-to-last
  entry is repeated zero or more times. In this way generic types can be
  created. For example, the schema `[str, ...]` represents a list of strings.

Let's try to validate some book objects:

.. testcode::

   good_book = {
       "title": "Gone with the Wind",
       "authors": ["Margaret Mitchell"],
       "year": 1936,
   }

   bad_book = {
       "title": "Gone with the Wind",
       "authors": ["Margaret Mitchell"],
       "year": "1936",
   }

   validate(book_schema, good_book, name="good_book")
   validate(book_schema, bad_book, name="bad_book")

As expected, `vtjson` raises an exception for the second object:

.. testoutput::

   Traceback (most recent call last):
       ...
       raise ValidationError(message)
   vtjson.vtjson.ValidationError: bad_book['year'] (value:'1936') is not of type 'int'

We can turn the `book_schema` into a genuine Python type.

.. testcode::

   Book = make_type(book_schema)

   print(f"Is good_book an instance of Book? {isinstance(good_book, Book)}!")
   print(f"Is bad_book an instance of Book? {isinstance(bad_book, Book)}!")

.. testoutput::

   Is good_book an instance of Book? True!
   Is bad_book an instance of Book? False!

We may also rewrite the `book_schema` as a valid Python type annotation.

.. testcode::

   from typing import NotRequired, TypedDict

   class book_schema(TypedDict):
       title: str
       authors: list[str]
       editor: NotRequired[str]
       year: int

Attempting to validate the bad book raises an exception similar to the earlier one:

.. testcode::

   validate(book_schema, bad_book, name="bad_book")

.. testoutput::

   Traceback (most recent call last):
       ...
       raise ValidationError(message)
   vtjson.vtjson.ValidationError: bad_book is not of type 'book_schema': bad_book['year'] (value:'1936') is not of type 'int'

:py:func:`vtjson.safe_cast` works exactly like `cast`, except that it also verifies at run time that the given object matches the given schema.

.. testcode::

   book2 = safe_cast(book_schema, good_book)
   book3 = safe_cast(book_schema, bad_book)

The exception raised is similar.

.. testoutput::

   Traceback (most recent call last):
       ...
       raise ValidationError(message)
   vtjson.vtjson.ValidationError: object is not of type 'book_schema': object['year'] (value:'1936') is not of type 'int'

Schemas can of course be more complicated. In particular, they can be nested:

.. testcode::

   person_schema = {
       "name": regex("[a-zA-Z. ]*"),
       "email?": email,
       "website?": url,
   }

   book_schema = {
       "title": str,
       "authors": [person_schema, ...],
       "editor?": person_schema,
       "year": intersect(int, ge(1900)),
   }

:py:class:`regex`, :py:class:`email` and :py:class:`url` are built-in schemas. See :ref:`builtins`. :py:class:`intersect` is a `wrapper`. See :ref:`wrappers`. :py:class:`ge` is a `modifier`. See :ref:`modifiers`. The schema

.. testcode::

   intersect(int, ge(1900))

represents an integer greater than or equal to 1900. Let's validate an object that does not fit the schema.

.. testcode::

   bad_book = {
       "title": "Gone with the Wind",
       "authors": [{"name": "Margaret Mitchell", "email": "margaret@gmailcom"}],
       "year": "1936",
   }

   validate(book_schema, bad_book, name="bad_book")

.. testoutput::

   Traceback (most recent call last):
       ...
       raise ValidationError(message)
   vtjson.vtjson.ValidationError: bad_book['authors'][0]['email'] (value:'margaret@gmailcom') is not of type 'email': The part after the @-sign is not valid. It should have a period.

As before, we can rewrite the new `book_schema` as a valid type annotation:

.. testcode::

   from typing import Annotated, NotRequired, TypedDict

   class person_schema(TypedDict):
       name: Annotated[str, regex("[a-zA-Z. ]*")]
       email: NotRequired[Annotated[str, email]]
       website: NotRequired[Annotated[str, url]]

   class book_schema(TypedDict):
       title: str
       authors: list[person_schema]
       editor: NotRequired[person_schema]
       year: Annotated[int, ge(1900)]

Many constraints expressible in `vtjson` schemas cannot be expressed in the language of type annotations. That's where `typing.Annotated` comes in. Consider the following example:

.. testcode::

   Annotated[str, email]

Type checkers such as `mypy` only see the `str` part of this schema, but `vtjson` sees everything. For more information see :ref:`type_annotations`.

There is a small caveat here: :py:class:`email` in fact already checks that the object is a string. So, as further explained in :ref:`type_annotations`, it is more efficient to write:

.. testcode::

   Annotated[str, email, skip_first]

Here it makes little difference, but the gain in efficiency may be important for larger schemas.

Let's check that validation also works with type annotations:

.. testcode::

   validate(book_schema, bad_book, name="bad_book")

.. testoutput::

   Traceback (most recent call last):
       ...
       raise ValidationError(message)
   vtjson.vtjson.ValidationError: bad_book is not of type 'book_schema': bad_book['authors'][0] is not of type 'person_schema': bad_book['authors'][0]['email'] (value:'margaret@gmailcom') is not of type 'email': The part after the @-sign is not valid. It should have a period.

Real world examples
-------------------

.. _example1:

Example 1
^^^^^^^^^

Below we give the schema of a recent version of the run object in the MongoDB database underlying the Fishtest web application https://tests.stockfishchess.org/tests. For the latest version see https://raw.githubusercontent.com/official-stockfish/fishtest/master/server/fishtest/schemas.py. See :ref:`example2` for a version of this example that is compatible with Python type annotations.

.. code-block:: python

   import copy
   import math
   from datetime import datetime, timezone

   from bson.objectid import ObjectId

   from vtjson import (
       at_most_one_of,
       div,
       fields,
       ge,
       glob,
       gt,
       ifthen,
       intersect,
       ip_address,
       keys,
       lax,
       one_of,
       quote,
       regex,
       set_name,
       union,
       url,
   )

   username = regex(r"[!-~][ -~]{0,30}[!-~]", name="username")
   net_name = regex("nn-[a-f0-9]{12}.nnue", name="net_name")
   tc = regex(r"([1-9]\d*/)?\d+(\.\d+)?(\+\d+(\.\d+)?)?", name="tc")
   str_int = regex(r"[1-9]\d*", name="str_int")
   sha = regex(r"[a-f0-9]{40}", name="sha")
   country_code = regex(r"[A-Z][A-Z]", name="country_code")
   run_id = set_name(ObjectId.is_valid, "run_id")
   uuid = regex(r"[0-9a-zA-Z]{2,}(-[a-f0-9]{4}){3}-[a-f0-9]{12}", name="uuid")
   epd_file = glob("*.epd", name="epd_file")
   pgn_file = glob("*.pgn", name="pgn_file")
   even = div(2, name="even")
   datetime_utc = intersect(datetime, fields({"tzinfo": timezone.utc}))

   uint = intersect(int, ge(0))
   suint = intersect(int, gt(0))
   ufloat = intersect(float, ge(0))
   sufloat = intersect(float, gt(0))


   def valid_results(R):
       l, d, w = R["losses"], R["draws"], R["wins"]
       R = R["pentanomial"]
       return (
           l + d + w == 2 * sum(R)
           and w - l == 2 * R[4] + R[3] - R[1] - 2 * R[0]
           and R[3] + 2 * R[2] + R[1] >= d >= R[3] + R[1]
       )


   zero_results = {
       "wins": 0,
       "draws": 0,
       "losses": 0,
       "crashes": 0,
       "time_losses": 0,
       "pentanomial": 5 * [0],
   }

   if_bad_then_zero_stats_and_not_active = ifthen(
       keys("bad"), lax({"active": False, "stats": quote(zero_results)})
   )


   def final_results_must_match(run):
       rr = copy.deepcopy(zero_results)
       for t in run["tasks"]:
           r = t["stats"]
           for k in r:
               if k != "pentanomial":
                   rr[k] += r[k]
               else:
                   for i, p in enumerate(r["pentanomial"]):
                       rr[k][i] += p
       if rr != run["results"]:
           raise Exception(
               f"The final results {run['results']} do not match the computed results {rr}"
           )
       else:
           return True


   def cores_must_match(run):
       cores = 0
       for t in run["tasks"]:
           if t["active"]:
               cores += t["worker_info"]["concurrency"]
       if cores != run["cores"]:
           raise Exception(
               f"Cores mismatch. Cores from tasks: {cores}. Cores from "
               f"run: {run['cores']}"
           )

       return True


   def workers_must_match(run):
       workers = 0
       for t in run["tasks"]:
           if t["active"]:
               workers += 1
       if workers != run["workers"]:
           raise Exception(
               f"Workers mismatch. Workers from tasks: {workers}. Workers from "
               f"run: {run['workers']}"
           )

       return True


   valid_aggregated_data = intersect(
       final_results_must_match,
       cores_must_match,
       workers_must_match,
   )

   worker_info_schema = {
       "uname": str,
       "architecture": [str, str],
       "concurrency": suint,
       "max_memory": uint,
       "min_threads": suint,
       "username": str,
       "version": uint,
       "python_version": [uint, uint, uint],
       "gcc_version": [uint, uint, uint],
       "compiler": union("clang++", "g++"),
       "unique_key": uuid,
       "modified": bool,
       "ARCH": str,
       "nps": ufloat,
       "near_github_api_limit": bool,
       "remote_addr": ip_address,
       "country_code": union(country_code, "?"),
   }

   results_schema = intersect(
       {
           "wins": uint,
           "losses": uint,
           "draws": uint,
           "crashes": uint,
           "time_losses": uint,
           "pentanomial": [uint, uint, uint, uint, uint],
       },
       valid_results,
   )

   runs_schema = intersect(
       {
           "_id?": ObjectId,
           "version": uint,
           "start_time": datetime_utc,
           "last_updated": datetime_utc,
           "tc_base": ufloat,
           "base_same_as_master": bool,
           "rescheduled_from?": run_id,
           "approved": bool,
           "approver": union(username, ""),
           "finished": bool,
           "deleted": bool,
           "failed": bool,
           "is_green": bool,
           "is_yellow": bool,
           "workers": uint,
           "cores": uint,
           "results": results_schema,
           "results_info?": {
               "style": str,
               "info": [str, ...],
           },
           "args": intersect(
               {
                   "base_tag": str,
                   "new_tag": str,
                   "base_nets": [net_name, ...],
                   "new_nets": [net_name, ...],
                   "num_games": intersect(uint, even),
                   "tc": tc,
                   "new_tc": tc,
                   "book": union(epd_file, pgn_file),
                   "book_depth": str_int,
                   "threads": suint,
                   "resolved_base": sha,
                   "resolved_new": sha,
                   "master_sha": sha,
                   "official_master_sha": sha,
                   "msg_base": str,
                   "msg_new": str,
                   "base_options": str,
                   "new_options": str,
                   "info": str,
                   "base_signature": str_int,
                   "new_signature": str_int,
                   "username": username,
                   "tests_repo": url,
                   "auto_purge": bool,
                   "throughput": ufloat,
                   "itp": ufloat,
                   "priority": float,
                   "adjudication": bool,
                   "sprt?": intersect(
                       {
                           "alpha": 0.05,
                           "beta": 0.05,
                           "elo0": float,
                           "elo1": float,
                           "elo_model": "normalized",
                           "state": union("", "accepted", "rejected"),
                           "llr": float,
                           "batch_size": suint,
                           "lower_bound": -math.log(19),
                           "upper_bound": math.log(19),
                           "lost_samples?": uint,
                           "illegal_update?": uint,
                           "overshoot?": {
                               "last_update": uint,
                               "skipped_updates": uint,
                               "ref0": float,
                               "m0": float,
                               "sq0": ufloat,
                               "ref1": float,
                               "m1": float,
                               "sq1": ufloat,
                           },
                       },
                       one_of("overshoot", "lost_samples"),
                   ),
                   "spsa?": {
                       "A": ufloat,
                       "alpha": ufloat,
                       "gamma": ufloat,
                       "raw_params": str,
                       "iter": uint,
                       "num_iter": uint,
                       "params": [
                           {
                               "name": str,
                               "start": float,
                               "min": float,
                               "max": float,
                               "c_end": sufloat,
                               "r_end": ufloat,
                               "c": sufloat,
                               "a_end": ufloat,
                               "a": ufloat,
                               "theta": float,
                           },
                           ...,
                       ],
                       "param_history?": [
                           [
                               {"theta": float, "R": ufloat, "c": ufloat},
                               ...,
                           ],
                           ...,
                       ],
                   },
               },
               at_most_one_of("sprt", "spsa"),
           ),
           "tasks": [
               intersect(
                   {
                       "num_games": intersect(uint, even),
                       "active": bool,
                       "last_updated": datetime_utc,
                       "start": uint,
                       "residual?": float,
                       "residual_color?": str,
                       "bad?": True,
                       "stats": results_schema,
                       "worker_info": worker_info_schema,
                   },
                   if_bad_then_zero_stats_and_not_active,
               ),
               ...,
           ],
           "bad_tasks?": [
               {
                   "num_games": intersect(uint, even),
                   "active": False,
                   "last_updated": datetime_utc,
                   "start": uint,
                   "residual": float,
                   "residual_color": str,
                   "bad": True,
                   "task_id": uint,
                   "stats": results_schema,
                   "worker_info": worker_info_schema,
               },
               ...,
           ],
       },
       lax(ifthen({"approved": True}, {"approver": username}, {"approver": ""})),
       lax(ifthen({"is_green": True}, {"is_yellow": False})),
       lax(ifthen({"is_yellow": True}, {"is_green": False})),
       lax(ifthen({"failed": True}, {"finished": True})),
       lax(ifthen({"deleted": True}, {"finished": True})),
       lax(ifthen({"finished": True}, {"workers": 0, "cores": 0})),
       lax(ifthen({"finished": True}, {"tasks": [{"active": False}, ...]})),
       valid_aggregated_data,
   )

.. _example2:

Example 2
^^^^^^^^^

This is a rewrite of :ref:`example1` that is compatible with Python type annotations.

.. code-block:: python

   import copy
   import math
   from datetime import datetime, timezone
   from typing import Annotated, Literal, NotRequired, TypedDict

   from bson.objectid import ObjectId

   from vtjson import (
       at_most_one_of,
       div,
       fields,
       ge,
       glob,
       gt,
       ifthen,
       intersect,
       ip_address,
       keys,
       lax,
       one_of,
       quote,
       regex,
       skip_first,
       url,
   )

   username = Annotated[str, regex(r"[!-~][ -~]{0,30}[!-~]", name="username"), skip_first]
   net_name = Annotated[str, regex("nn-[a-f0-9]{12}.nnue", name="net_name"), skip_first]
   tc = Annotated[
       str, regex(r"([1-9]\d*/)?\d+(\.\d+)?(\+\d+(\.\d+)?)?", name="tc"), skip_first
   ]
   str_int = Annotated[str, regex(r"[1-9]\d*", name="str_int"), skip_first]
   sha = Annotated[str, regex(r"[a-f0-9]{40}", name="sha"), skip_first]
   country_code = Annotated[str, regex(r"[A-Z][A-Z]", name="country_code"), skip_first]
   run_id = Annotated[str, ObjectId.is_valid]
   uuid = Annotated[
       str,
       regex(r"[0-9a-zA-Z]{2,}(-[a-f0-9]{4}){3}-[a-f0-9]{12}", name="uuid"),
       skip_first,
   ]
   epd_file = Annotated[str, glob("*.epd", name="epd_file"), skip_first]
   pgn_file = Annotated[str, glob("*.pgn", name="pgn_file"), skip_first]
   even = Annotated[int, div(2, name="even"), skip_first]
   datetime_utc = Annotated[datetime, fields({"tzinfo": timezone.utc})]

   uint = Annotated[int, ge(0)]
   suint = Annotated[int, gt(0)]
   ufloat = Annotated[float, ge(0)]
   sufloat = Annotated[float, gt(0)]


   class results_type(TypedDict):
       wins: uint
       losses: uint
       draws: uint
       crashes: uint
       time_losses: uint
       pentanomial: Annotated[list[int], [uint, uint, uint, uint, uint], skip_first]


   def valid_results(R: results_type) -> bool:
       l, d, w = R["losses"], R["draws"], R["wins"]
       Rp = R["pentanomial"]
       return (
           l + d + w == 2 * sum(Rp)
           and w - l == 2 * Rp[4] + Rp[3] - Rp[1] - 2 * Rp[0]
           and Rp[3] + 2 * Rp[2] + Rp[1] >= d >= Rp[3] + Rp[1]
       )


   results_schema = Annotated[
       results_type,
       valid_results,
   ]


   class worker_info_schema(TypedDict):
       uname: str
       architecture: Annotated[list[str], [str, str], skip_first]
       concurrency: suint
       max_memory: uint
       min_threads: suint
       username: str
       version: uint
       python_version: Annotated[list[int], [uint, uint, uint], skip_first]
       gcc_version: Annotated[list[int], [uint, uint, uint], skip_first]
       compiler: Literal["clang++", "g++"]
       unique_key: uuid
       modified: bool
       ARCH: str
       nps: ufloat
       near_github_api_limit: bool
       remote_addr: Annotated[str, ip_address]
       country_code: country_code | Literal["?"]


   class overshoot_type(TypedDict):
       last_update: uint
       skipped_updates: uint
       ref0: float
       m0: float
       sq0: ufloat
       ref1: float
       m1: float
       sq1: ufloat


   class sprt_type(TypedDict):
       alpha: Annotated[float, 0.05, skip_first]
       beta: Annotated[float, 0.05, skip_first]
       elo0: float
       elo1: float
       elo_model: Literal["normalized"]
       state: Literal["", "accepted", "rejected"]
       llr: float
       batch_size: suint
       lower_bound: Annotated[float, -math.log(19), skip_first]
       upper_bound: Annotated[float, math.log(19), skip_first]
       lost_samples: NotRequired[uint]
       illegal_update: NotRequired[uint]
       overshoot: NotRequired[overshoot_type]


   sprt_schema = Annotated[
       sprt_type,
       one_of("overshoot", "lost_samples"),
   ]


   class param_schema(TypedDict):
       name: str
       start: float
       min: float
       max: float
       c_end: sufloat
       r_end: ufloat
       c: sufloat
       a_end: ufloat
       a: ufloat
       theta: float


   class param_history_schema(TypedDict):
       theta: float
       R: ufloat
       c: ufloat


   class spsa_schema(TypedDict):
       A: ufloat
       alpha: ufloat
       gamma: ufloat
       raw_params: str
       iter: uint
       num_iter: uint
       params: list[param_schema]
       param_history: NotRequired[list[list[param_history_schema]]]


   class args_type(TypedDict):
       base_tag: str
       new_tag: str
       base_nets: list[net_name]
       new_nets: list[net_name]
       num_games: Annotated[uint, even]
       tc: tc
       new_tc: tc
       book: epd_file | pgn_file
       book_depth: str_int
       threads: suint
       resolved_base: sha
       resolved_new: sha
       master_sha: sha
       official_master_sha: sha
       msg_base: str
       msg_new: str
       base_options: str
       new_options: str
       info: str
       base_signature: str_int
       new_signature: str_int
       username: username
       tests_repo: Annotated[str, url, skip_first]
       auto_purge: bool
       throughput: ufloat
       itp: ufloat
       priority: float
       adjudication: bool
       sprt: NotRequired[sprt_schema]
       spsa: NotRequired[spsa_schema]


   args_schema = Annotated[
       args_type,
       at_most_one_of("sprt", "spsa"),
   ]


   class task_type(TypedDict):
       num_games: Annotated[uint, even]
       active: bool
       last_updated: datetime_utc
       start: uint
       residual: NotRequired[float]
       residual_color: NotRequired[str]
       bad: NotRequired[Literal[True]]
       stats: results_schema
       worker_info: worker_info_schema


   zero_results: results_type = {
       "wins": 0,
       "draws": 0,
       "losses": 0,
       "crashes": 0,
       "time_losses": 0,
       "pentanomial": 5 * [0],
   }

   if_bad_then_zero_stats_and_not_active = ifthen(
       keys("bad"), lax({"active": False, "stats": quote(zero_results)})
   )

   task_schema = Annotated[
       task_type,
       if_bad_then_zero_stats_and_not_active,
   ]


   class bad_task_schema(TypedDict):
       num_games: Annotated[uint, even]
       active: Literal[False]
       last_updated: datetime_utc
       start: uint
       residual: float
       residual_color: str
       bad: Literal[True]
       task_id: uint
       stats: results_schema
       worker_info: worker_info_schema


   class results_info_schema(TypedDict):
       style: str
       info: list[str]


   class runs_type(TypedDict):
       _id: NotRequired[ObjectId]
       version: uint
       start_time: datetime_utc
       last_updated: datetime_utc
       tc_base: ufloat
       base_same_as_master: bool
       rescheduled_from: NotRequired[run_id]
       approved: bool
       approver: username | Literal[""]
       finished: bool
       deleted: bool
       failed: bool
       is_green: bool
       is_yellow: bool
       workers: uint
       cores: uint
       results: results_schema
       results_info: NotRequired[results_info_schema]
       args: args_schema
       tasks: list[task_schema]
       bad_tasks: NotRequired[list[bad_task_schema]]


   def final_results_must_match(run: runs_type) -> bool:
       rr = copy.deepcopy(zero_results)
       for t in run["tasks"]:
           r = t["stats"]
           # mypy does not support variable keys for
           # TypedDict
           rr["wins"] += r["wins"]
           rr["losses"] += r["losses"]
           rr["draws"] += r["draws"]
           rr["crashes"] += r["crashes"]
           rr["time_losses"] += r["time_losses"]
           for i, p in enumerate(r["pentanomial"]):
               rr["pentanomial"][i] += p
       if rr != run["results"]:
           raise Exception(
               f"The final results {run['results']} do not match the computed results {rr}"
           )
       else:
           return True


   def cores_must_match(run: runs_type) -> bool:
       cores = 0
       for t in run["tasks"]:
           if t["active"]:
               cores += t["worker_info"]["concurrency"]
       if cores != run["cores"]:
           raise Exception(
               f"Cores mismatch. Cores from tasks: {cores}. Cores from "
               f"run: {run['cores']}"
           )

       return True


   def workers_must_match(run: runs_type) -> bool:
       workers = 0
       for t in run["tasks"]:
           if t["active"]:
               workers += 1
       if workers != run["workers"]:
           raise Exception(
               f"Workers mismatch. Workers from tasks: {workers}. Workers from "
               f"run: {run['workers']}"
           )

       return True


   valid_aggregated_data = intersect(
       final_results_must_match,
       cores_must_match,
       workers_must_match,
   )

   runs_schema = Annotated[
       runs_type,
       lax(ifthen({"approved": True}, {"approver": username}, {"approver": ""})),
       lax(ifthen({"is_green": True}, {"is_yellow": False})),
       lax(ifthen({"is_yellow": True}, {"is_green": False})),
       lax(ifthen({"failed": True}, {"finished": True})),
       lax(ifthen({"deleted": True}, {"finished": True})),
       lax(ifthen({"finished": True}, {"workers": 0, "cores": 0})),
       lax(ifthen({"finished": True}, {"tasks": [{"active": False}, ...]})),
       valid_aggregated_data,
   ]
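
The `valid_results` check shared by both examples is plain Python, so it can be tried without `vtjson` or a database. The sketch below assumes the usual reading of the pentanomial, namely that entry `i` counts the game pairs that scored `i` half points. The helper `tally_pairs`, the table `HALF_POINTS` and the sample pairs are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Half points per game outcome (assumed encoding, for illustration only).
HALF_POINTS = {"W": 2, "D": 1, "L": 0}


def tally_pairs(pairs):
    """Hypothetical helper: build a results dict from a list of game pairs."""
    outcomes = Counter(game for pair in pairs for game in pair)
    pentanomial = [0] * 5
    for first, second in pairs:
        # A pair scoring i half points is counted in pentanomial[i].
        pentanomial[HALF_POINTS[first] + HALF_POINTS[second]] += 1
    return {
        "wins": outcomes["W"],
        "draws": outcomes["D"],
        "losses": outcomes["L"],
        "pentanomial": pentanomial,
    }


def valid_results(R):
    """Same three consistency conditions as in the examples above."""
    l, d, w = R["losses"], R["draws"], R["wins"]
    Rp = R["pentanomial"]
    return (
        l + d + w == 2 * sum(Rp)  # every pair consists of two games
        and w - l == 2 * Rp[4] + Rp[3] - Rp[1] - 2 * Rp[0]  # net score agrees
        and Rp[3] + 2 * Rp[2] + Rp[1] >= d >= Rp[3] + Rp[1]  # bounds on draws
    )


pairs = [("W", "W"), ("W", "D"), ("D", "D"), ("W", "L"), ("L", "D"), ("L", "L")]
good = tally_pairs(pairs)
bad = dict(good, wins=good["wins"] + 1, losses=good["losses"] - 1)

print(good["pentanomial"])  # [1, 1, 2, 1, 1]
print(valid_results(good))  # True
print(valid_results(bad))   # False
```

In `bad` the total number of games is unchanged, so only the second condition fails. This is the kind of inconsistency that a schema built from per-key type checks alone would not detect, which is why the examples attach `valid_results` to the results schema.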