Apr 02

First off: this is real. It’s not an April Fools’ Day joke.

Today I published the first feature preview version of my personal project, called queryperf++. It is a framework for measuring the performance of DNS server implementations, consisting of a customizable backend library and an executable measurement program.

It’s open source software distributed under a BSD-compatible license. The latest tarball is available here, and the development repository is published on GitHub.

The (current) major features of queryperf++ include:

  • It supports both UDP and TCP as the transport protocol. In particular, it can be used to measure server performance on AXFR or IXFR over TCP.
  • It allows multiple threads to send test queries to the tested server in parallel. This helps produce more reliable results for very high-performance server implementations, or in test scenarios that require more complicated tasks on the querier side.
  • It’s designed to be modular and extendable. The main test tasks, such as generating queries and handling queries and responses, are implemented as a separate library, and some parts of it can be dynamically extended using user-defined C++ classes.
  • It supports IPv4 (not only IPv6) for legacy environments :-)

There are already several tools for this purpose, such as queryperf, which is now distributed as a contributed tool for ISC BIND 9 (obviously queryperf++ was named after it), and DNSPerf and ResPerf provided by Nominum. They are pretty useful, and in fact I’ve been using them in my daily work. Yet I haven’t been fully satisfied with the existing tools because of the following issues:

  • Scalability: as the performance of DNS server implementations has improved, especially by exploiting multiple CPU cores, the performance of the measurement tool itself has become a limiting factor.
  • Lack of TCP support: while the vast majority of DNS transactions are carried over UDP, TCP is also used for some important tasks such as zone transfers. But existing tools can only measure performance for UDP queries.
  • Difficulty in extension: existing tools are generally written as a single monolithic program, often with low-level protocol logic hardcoded in the main code (e.g., that the first 2 octets of the wire-format data are the QID). So it’s difficult to extend or customize the tools for newer demands, such as supporting request types other than ordinary queries.
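
To make the QID example in the last bullet concrete, here is a minimal sketch of that piece of protocol logic pulled out into its own function instead of being buried in a main loop. The name extractQid is hypothetical and is not taken from queryperf, DNSPerf or queryperf++; the only protocol fact it relies on is that a DNS message starts with a 16-bit ID in network byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper for illustration only (not part of any existing tool).
// Returns the 16-bit query ID stored, in network byte order, in the first
// two octets of a wire-format DNS message.
inline std::uint16_t
extractQid(const std::uint8_t* data, std::size_t len) {
    if (data == nullptr || len < 2) {
        throw std::invalid_argument("DNS message shorter than 2 octets");
    }
    // Combine the high-order octet and the low-order octet.
    return static_cast<std::uint16_t>((data[0] << 8) | data[1]);
}
```

With the assumption isolated like this, a tool that needs to match responses on something other than the QID only has to replace one function rather than hunt through the main code.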

As such, queryperf++ naturally fills these gaps, as summarized at the beginning of this article. (To tell the truth, until now I hadn’t realized that the latest version of DNSPerf, released just last month, is now multi-threaded. I must admit I was disappointed that some of my efforts may now be moot. But the other issues still seem to be open.)

There’s also a secret agenda of mine: I want ISC BIND 10 (the project I’m currently working on as my official job) to be used by a lot more people. One of the major goals of BIND 10 is to provide a general platform for third party developers to build their own DNS related tools. queryperf++ internally uses BIND 10’s C++ DNS library, libdns++, and is intended to be one such “third party” application. Hopefully the unique features of queryperf++ will interest others and help entice them to try BIND 10 (at the very least, they need to build and install BIND 10’s libdns++ to use queryperf++ :-).

Scalability and Performance

To see whether queryperf++ is really helpful in terms of scalability, I conducted some simple experiments, comparing queryperf and queryperf++ under the same measurement setup.

I configured BIND 9.9.0 named and the BIND 10 authoritative server to act as a root server (using a sample of real root zone data) on a multi-core machine (with at least 4 cores available), and measured the maximum query rate each could handle without dropping queries, by repeating the same query: www.example.com/A. The queries did not include an EDNS0 OPT RR, so the response was a 505-octet message containing 13 NS RRs in the authority section and 14 RRs (a mixture of AAAA and A) in the additional section. I measured the maximum QPS while changing the number of CPU cores the server used from 1 to 4. This is controlled by the number of worker threads for BIND 9, and by the number of process instances of the authoritative server for BIND 10.

The measurement tools ran on a separate machine with two CPU cores, directly connected to the server host via a GbE link. I used two threads for queryperf++, and doubled the number of outstanding queries (from 20 to 40) for queryperf to make the conditions as comparable as possible (and, in fact, increasing the number of outstanding queries did seem to affect the result).

The following graph summarizes the measurement results.

I wish I could also note that BIND 10 is generally faster than BIND 9, but that’s a different topic :-). The important point in this context is the result for BIND 10 with 4 processes. In the other cases the results were generally comparable whether they were measured with queryperf or queryperf++, but in this specific case there was a clear gap. At this point queryperf was consuming nearly 100% of CPU time, so it had quite likely hit its own performance limit. If we were careless, we could misread the result and conclude that BIND 10 has a scalability issue at around 4 cores or more; as queryperf++ shows, it was actually a limitation of the measurement tool.

Extendability

As for extendability, I’ll show one simple example using the library part of queryperf++. The following 13-line program can work as a complete measurement tool that sends an AXFR query for the root zone over TCP to [::1]:53 for 30 seconds, and shows the performance in average qps for the period (of course several header files need to be included, too).

int
main() {
    std::stringstream input_stream(". AXFR");
    Queryperf::Dispatcher disp(input_stream);
    disp.setProtocol(IPPROTO_TCP);
    disp.run();
    const boost::posix_time::time_duration duration =
        disp.getEndTime() - disp.getStartTime();
    const double qps = (static_cast<double>(disp.getQueriesCompleted()) /
                        (duration.total_microseconds() / 1000000.0));
    std::cout << "AXFR queries per second: " << qps << " qps" << std::endl;
    return (0);
}

This is a boring example and doesn't do anything that cannot be done by the queryperf++ program, but the point is that such a customized tool can be built without modifying the source code. It should also be possible to develop more meaningful derivatives using the queryperf++ library interfaces.
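
The rate arithmetic in the example can also be tried without installing queryperf++ or BIND 10. The following is only a sketch using the standard library rather than Boost, and the helper name averageQps is made up for illustration. It converts the elapsed time to fractional seconds before dividing, so a run of 1.5 seconds is not rounded down to 1 second.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper for illustration only (not part of queryperf++).
// Computes the average number of completed queries per second over the
// elapsed period, using floating point so that fractional seconds count.
inline double
averageQps(std::uint64_t completed, std::chrono::microseconds elapsed) {
    if (elapsed.count() <= 0) {
        return 0.0;             // avoid dividing by zero for an empty run
    }
    // Convert microseconds to seconds as a double (e.g. 1500000us -> 1.5s).
    const double seconds = std::chrono::duration<double>(elapsed).count();
    return static_cast<double>(completed) / seconds;
}
```

For instance, 45000 completed queries in 1.5 seconds averages 30000 qps; dividing by a truncated whole number of seconds instead would report 45000.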
