Skip to main content
  1. Articles/

Implementing Object in C++!

·1759 words·9 mins· ·
ykiko
Author
ykiko
Table of Contents
Reflection - This article is part of a series.
Part 3: This Article

Static vs Dynamic
#

The terms static typing and dynamic typing are familiar to many. The key distinction lies in the timing of type checking. What does this mean?

Consider the following C++ code:

std::string s = "123";
int a = s + 1;

We know that string cannot be directly added to int, so this should result in a TypeError. C++ checks for type errors at compile time, so this code will trigger a compile-time error.

Now, consider the corresponding Python code:

s = "123"
a = s + 1

Python, on the other hand, checks for errors at runtime, so the above code will actually produce a runtime error.

It’s important to emphasize the meanings of compile time and runtime in this context. These terms are often encountered, but their meanings can vary depending on the context. Here, we define them as follows:

  • Compile time: Refers to the period when code is being compiled into target code, before the program is executed.

    • For AOT (Ahead-Of-Time) compiled languages like C++, this is the process of compiling C++ into machine code.
    • For JIT (Just-In-Time) compiled languages like C#/Java, this generally refers to the process of compiling source code into IR (Intermediate Representation).
    • For transpiled languages like TypeScript, this is the process of compiling TypeScript into JavaScript.
  • Runtime: Refers to the period when the program is actually running, such as when machine code is executed on the CPU or bytecode is executed on a virtual machine.

Thus, C++, Java, C#, and TypeScript are considered statically typed languages. Python, although it also compiles source code into bytecode, does not perform type checking during this stage, so Python is considered a dynamically typed language.

However, this distinction is not absolute. The boundary between static and dynamic languages is not clear-cut. While C++, Java, C#, and TypeScript are statically typed, they provide various methods to bypass static type checking, such as pointer in C++, Object in Java/C#, and Any in TypeScript. Conversely, dynamically typed languages are increasingly incorporating static type checking, such as Python’s type hints and JavaScript’s TypeScript. Both paradigms are borrowing features from each other.

Currently, C++ only provides std::any for type erasure, but it is often not flexible enough. We desire more advanced features, such as accessing members by field names, calling functions by function names, and creating class instances by type names. The goal of this article is to construct a dynamic type in C++ similar to Java/C#’s Object.

Meta Types
#

Here, we do not adopt the intrusive design (inheritance) used in Java/C#’s Object. Instead, we use a non-intrusive design called fat pointer. A fat pointer is essentially a structure containing a pointer to the actual data and a pointer to type information. In the case of inheritance, the virtual table pointer resides in the object’s header.

class Any {
    Type* type;    // type info, similar to vtable
    void* data;    // pointer to the data
    uint8_t flag;  // special flag

public:
    Any() : type(nullptr), data(nullptr), flag(0) {}

    Any(Type* type, void* data) : type(type), data(data), flag(0B00000001) {}

    Any(const Any& other);
    Any(Any&& other);
    ~Any();

    template <typename T>
    Any(T&& value);  // box value to Any

    template <typename T>
    T& cast();  // unbox Any to value

    Type* GetType() const { return type; }  // get type info

    Any invoke(std::string_view name, std::span<Any> args);  // call method

    void foreach(const std::function<void(std::string_view, Any&)>& fn);  // iterate fields
};

The member functions will be implemented in subsequent sections. First, let’s consider what is stored in the Type type.

Meta Information
#

struct Type {
    std::string_view name;       // type name
    void (*destroy)(void*);      // destructor
    void* (*copy)(const void*);  // copy constructor
    void* (*move)(void*);        // move constructor

    using Field = std::pair<Type*, std::size_t>;           // type and offset
    using Method = Any (*)(void*, std::span<Any>);         // method
    std::unordered_map<std::string_view, Field> fields;    // field info
    std::unordered_map<std::string_view, Method> methods;  // method info
};

The content here is straightforward. We store the type name, destructor, move constructor, copy constructor, field information, and method information in Type. Field information includes the field type and name, while method information includes the method name and function address. If further expansion is desired, information about parent classes and overloaded functions can also be included. Since this is just an example, we will not consider them for now.

Function Type Erasure
#

To store member functions of different types in the same container, we must perform type erasure on the functions. All types of functions are erased to the type Any(*)(void*, std::span<Any>). Here, Any is the type we defined earlier, void* represents the this pointer, and std::span<Any> is the function’s parameter list. Now, we need to consider how to perform this function type erasure.

Take the member function say as an example:

struct Person {
    std::string_view name;
    std::size_t age;

    void say(std::string_view msg) { std::cout << name << " say: " << msg << std::endl; }
};

First, for convenience, let’s implement Any’s cast:

template <typename T>
Type* type_of();  // type_of<T> returns type info of T

template <typename T>
T& Any::cast() {
    if(type != type_of<T>()) {
        throw std::runtime_error{"type mismatch"};
    }
    return *static_cast<T*>(data);
}

Using the feature in C++ that allows non-capturing lambda to implicitly convert to function pointers, we can easily implement this erasure.

auto f = +[](void* object, std::span<Any> args) {
    auto& self = *static_cast<Person*>(object);
    self.say(args[0].cast<std::string_view>());
    return Any{};
};

The principle is simple: write a wrapper function to perform type conversion and forward the call. However, writing such forwarding code for each member function is cumbersome. We can consider using template metaprogramming to generate the code automatically, simplifying the type erasure process.

template <typename T>
struct member_fn_traits;

template <typename R, typename C, typename... Args>
struct member_fn_traits<R (C::*)(Args...)> {
    using return_type = R;
    using class_type = C;
    using args_type = std::tuple<Args...>;
};

template <auto ptr>
auto* type_ensure() {
    using traits = member_fn_traits<decltype(ptr)>;
    using class_type = typename traits::class_type;
    using result_type = typename traits::return_type;
    using args_type = typename traits::args_type;

    return +[](void* object, std::span<Any> args) -> Any {
        auto self = static_cast<class_type*>(object);
        return [=]<std::size_t... Is>(std::index_sequence<Is...>) {
            if constexpr(std::is_void_v<result_type>) {
                (self->*ptr)(args[Is].cast<std::tuple_element_t<Is, args_type>>()...);
                return Any{};
            } else {
                return Any{(self->*ptr)(args[Is].cast<std::tuple_element_t<Is, args_type>>()...)};
            }
        }(std::make_index_sequence<std::tuple_size_v<args_type>>{});
    };
}

I won’t explain the code here. If you don’t understand, it’s okay. Essentially, it automates the process of type erasure for member functions using template metaprogramming. Just know how to use it. Using it is very simple. Here, &Person::say is the pointer-to-member syntax. If you’re not familiar, refer to C++ Member Pointers Explained.

auto f = type_ensure<&Person::say>();
// decltype(f) => Any (*)(void*, std::span<Any>)

Type Information Registration
#

In fact, we need to generate a corresponding Type structure for each type to store its information, so that it can be accessed correctly. This functionality is handled by the type_of function mentioned earlier.

template <typename T>
Type* type_of() {
    static Type type;
    type.name = typeid(T).name();
    type.destroy = [](void* obj) { delete static_cast<T*>(obj); };
    type.copy = [](const void* obj) { return (void*)(new T(*static_cast<const T*>(obj))); };
    type.move = [](void* obj) { return (void*)(new T(std::move(*static_cast<T*>(obj)))); };
    return &type;
}

template <>
Type* type_of<Person>() {
    static Type type;
    type.name = "Person";
    type.destroy = [](void* obj) { delete static_cast<Person*>(obj); };
    type.copy = [](const void* obj) {
        return (void*)(new Person(*static_cast<const Person*>(obj)));
    };
    type.move = [](void* obj) {
        return (void*)(new Person(std::move(*static_cast<Person*>(obj))));
    };
    type.fields.insert({"name", {type_of<std::string_view>(), offsetof(Person, name)}});
    type.fields.insert({"age", {type_of<std::size_t>(), offsetof(Person, age)}});
    type.methods.insert({"say", type_ensure<&Person::say>()});
    return &type;
};

We provide a default implementation so that built-in basic types can automatically register some information. Then, we can specialize for custom types. Now, with this meta information, we can complete the implementation of Any’s member functions.

Complete Implementation of Any
#

Any::Any(const Any& other) {
    type = other.type;
    data = type->copy(other.data);
    flag = 0;
}

Any::Any(Any&& other) {
    type = other.type;
    data = type->move(other.data);
    flag = 0;
}

template <typename T>
Any::Any(T&& value) {
    type = type_of<std::decay_t<T>>();
    data = new std::decay_t<T>(std::forward<T>(value));
    flag = 0;
}

Any::~Any() {
    if(!(flag & 0B00000001) && data && type) {
        type->destroy(data);
    }
}

void Any::foreach(const std::function<void(std::string_view, Any&)>& fn) {
    for(auto& [name, field]: type->fields) {
        Any any = Any{field.first, static_cast<char*>(data) + field.second};
        fn(name, any);
    }
}

Any Any::invoke(std::string_view name, std::span<Any> args) {
    auto it = type->methods.find(name);
    if(it == type->methods.end()) {
        throw std::runtime_error{"method not found"};
    }
    return it->second(data, args);
}

The implementation of foreach is to iterate over all Fields, get the offset and type, and then wrap it into an Any type. Note that this is just a simple wrapper; since we set the flag, this wrapper will not cause multiple destructions. invoke is to find the corresponding function from the member function list and call it.

Example Code
#

int main() {
    Any person = Person{"Tom", 18};
    std::vector<Any> args = {std::string_view{"Hello"}};
    person.invoke("say", args);
    // => Tom say: Hello

    auto f = [](std::string_view name, Any& value) {
        if(value.GetType() == type_of<std::string_view>()) {
            std::cout << name << " = " << value.cast<std::string_view>() << std::endl;
        } else if(value.GetType() == type_of<std::size_t>()) {
            std::cout << name << " = " << value.cast<std::size_t>() << std::endl;
        }
    };

    person.foreach(f);
    // name = Tom
    // age = 18
    return 0;
}

The complete code is available on Github. With this, we have implemented a highly dynamic, non-intrusive Any.

Extensions and Optimizations
#

This article provides a very simple introduction to the principles, and the scenarios considered are also very simple. For example, inheritance and function overloading are not considered, and there are several areas where runtime efficiency can be optimized. Nevertheless, the functionality I’ve implemented may still be more than you need. The main point of this article is to express that for a performance-focused language like C++, there are indeed scenarios where such dynamic features are needed. However, efficiency and generality are often contradictory. At the language level, because generality must be considered, efficiency often falls short. For example, RTTI and dynamic_cast are often complained about, but fortunately, compilers provide options to disable them. Similarly, my implementation may not fully fit your scenario, but understanding this not-too-difficult principle allows you to implement a version more suitable for your needs.

Possible extensions:

  • Support modifying members by name.
  • Add a global map to record information for all types, enabling the creation of class instances by class name.
  • ...

Possible optimizations:

  • Reduce the number of new calls, or implement an object pool.
  • Currently, too much meta information is stored; you can tailor it according to your needs.

Additionally, a current pain point is that all this meta information must be written manually, making it difficult to maintain. If the class definition is modified, the registration code must also be modified, or errors will occur. A practical solution is to use a code generator to automatically generate this mechanical code. For more on how to do this, refer to other articles in this series.

Reflection - This article is part of a series.
Part 3: This Article