Category: http/2

  • Back to the Future with Cross-Context Dispatch

    Cross-Context Dispatch reintroduced to Jetty-12

    With the release of Jetty 12.0.8, we’re excited to announce the (re)implementation of a somewhat maligned and deprecated feature: Cross-Context Dispatch. This feature, while having been part of the Servlet specification for many years, has seen varied levels of use and support. Its re-introduction in Jetty 12.0.8, however, marks a significant step forward in our commitment to supporting the diverse needs of our users, especially those with complex legacy and modern web applications.

    Understanding Cross-Context Dispatch

    Cross-Context Dispatch allows a web application to forward requests to or include responses from another web application within the same Jetty server. Although it has been available as part of the Servlet specification for an extended period, it was deemed optional with Servlet 6.0 of EE10, reflecting its status as a somewhat niche feature.
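    As a sketch of how such a dispatch looks in application code, using the standard Servlet API (the context path and target URI here are hypothetical, and the server must permit cross-context access for this to work):

```java
// Inside a servlet of the dispatching web application.
// getContext() returns null unless the server allows cross-context access.
ServletContext other = getServletContext().getContext("/legacy-app");
if (other != null)
{
    RequestDispatcher dispatcher = other.getRequestDispatcher("/report");
    dispatcher.forward(request, response); // or dispatcher.include(request, response)
}
```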

    Initially, Jetty 12 moved away from supporting Cross-Context Dispatch, driven by a desire to simplify the server architecture amidst substantial changes, including support for multiple environments (EE8, EE9, and EE10). These updates mean Jetty can now deploy web applications using either the javax namespace (EE8) or the jakarta namespace (EE9 and EE10), all running on the latest optimized Jetty core implementations of HTTP/1.1, HTTP/2, and HTTP/3.

    Reintroducing Cross-Context Dispatch

    The decision to reintegrate Cross-Context Dispatch in Jetty 12.0.8 was influenced significantly by the needs of our commercial clients, some of whom still leverage this feature in their legacy applications. Our commitment to supporting our clients’ requirements, including the need to maintain and extend legacy systems, remains a top priority.

    One of the standout features of the newly implemented Cross-Context Dispatch is its ability to bridge applications across different environments. This means a web application based on the javax namespace (EE8) can now dispatch requests to, or include responses from, a web application based on the jakarta namespace (EE9 or EE10). This functionality opens up new pathways for integrating legacy applications with newer, modern systems.

    Looking Ahead

    The reintroduction of Cross-Context Dispatch in Jetty 12.0.8 is more than just a nod to legacy systems; it can be used as a bridge to the future of Java web development. By allowing for seamless interactions between applications across different Servlet environments, Jetty-12 opens the possibility of incremental migration away from legacy web applications.

  • UnixDomain Support in Jetty

    UnixDomain sockets support was added in Jetty 9.4.0, back in 2015, based on the JNR UnixSocket library.

    The support for UnixDomain sockets with JNR was experimental, and has remained so until now.

    In Jetty 10.0.7/11.0.7 we re-implemented support for UnixDomain sockets based on JEP 380, which shipped with Java 16.

    We have kept the source compatibility at Java 11 and used a little bit of Java reflection to access the new APIs introduced by JEP 380, so that Jetty 10/11 can still be built with Java 11.
    However, if you run Jetty 10.0.7/11.0.7 or later with Java 16 or later, then you will be able to use UnixDomain sockets based on JEP 380.
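    As a minimal illustration of the JEP 380 API itself (plain JDK, independent of Jetty; the socket path is arbitrary), the following binds a Unix domain server socket and echoes one message over it:

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UnixDomainEcho
{
    // Binds a Unix domain server socket, connects to it, and echoes one message.
    public static String echo(String message) throws IOException
    {
        Path socketPath = Path.of(System.getProperty("java.io.tmpdir"), "jep380-demo.sock");
        Files.deleteIfExists(socketPath);
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX))
        {
            server.bind(address);
            try (SocketChannel client = SocketChannel.open(address);
                 SocketChannel accepted = server.accept())
            {
                client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                ByteBuffer buffer = ByteBuffer.allocate(128);
                accepted.read(buffer);
                buffer.flip();
                return StandardCharsets.UTF_8.decode(buffer).toString();
            }
        }
        finally
        {
            Files.deleteIfExists(socketPath);
        }
    }

    public static void main(String[] args) throws IOException
    {
        System.out.println(echo("ping")); // prints "ping"
    }
}
```

    Note that the only differences from TCP are the address type and the protocol family passed to ServerSocketChannel.open(); everything above the channel layer is unchanged, which is why any TCP-based protocol can be carried this way.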

    The UnixDomain implementation from Java 16 is very stable, so we have switched our own website to use it.
    The page that you are reading right now has been requested by your browser and processed on the server by Jetty using Jetty’s HttpClient to send the request via UnixDomain sockets to our local WordPress.

    We have therefore deprecated the old Jetty modules based on JNR in favor of the new Jetty modules based on JEP 380.

    Note that since UnixDomain sockets are an alternative to TCP network sockets, any TCP-based protocol can be carried via UnixDomain sockets: HTTP/1.1, HTTP/2 and FastCGI.

    We have improved the documentation to detail how to use the new APIs introduced to support JEP 380, for the client and for the server.
    If you are configuring Jetty behind a load balancer (or Apache HTTPD or Nginx) you can now use UnixDomain sockets to communicate from the load balancer to Jetty, as explained in this section of the documentation.

    Enjoy!

  • Introducing Jetty Load Generator

    The Jetty Project just released the Jetty Load Generator, a Java 11+ library, supporting both HTTP/1.1 and HTTP/2, for load-testing any HTTP server.
    The project was born in 2016, with specific requirements. At the time, very few load-test tools had support for HTTP/2, but Jetty’s HttpClient did. Furthermore, few tools supported web page-like resources, which were important to model in order to compare the multiplexed behavior of HTTP/2 (up to ~100 concurrent streams on a single connection) against that of HTTP/1.1 (6-8 connections). Lastly, we were more interested in measuring quality of service than throughput.
    The Jetty Load Generator generates requests asynchronously, at a specified rate, independently from the responses. This is the Jetty Load Generator core design principle: we wanted the request generation to be constant, and measure response times independently from the request generation. In this way, the Jetty Load Generator can impose a specific load on the server, independently of the network round-trip and independently of the server-side processing time. Adding more load generators (on the same machine if it has spare capacity, or using additional machines) will allow the load against the server to increase linearly.
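    That core principle can be sketched with plain JDK executors (this is illustrative only, not the Load Generator's actual API): requests fire on a fixed schedule, while responses are handled on separate threads, so a slow server cannot slow the sender down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConstantRateSketch
{
    /** Sends 'total' simulated requests at 'ratePerSecond'; returns how many completed. */
    public static int run(int ratePerSecond, int total) throws InterruptedException
    {
        ScheduledExecutorService sender = Executors.newSingleThreadScheduledExecutor();
        ExecutorService responses = Executors.newCachedThreadPool();
        AtomicInteger sent = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(total);

        long periodNanos = 1_000_000_000L / ratePerSecond;
        sender.scheduleAtFixedRate(() ->
        {
            if (sent.incrementAndGet() <= total)
            {
                // The "response" is processed asynchronously: a slow server
                // does not delay the next scheduled send.
                responses.execute(() ->
                {
                    try { Thread.sleep(50); } catch (InterruptedException ignored) {}
                    completed.incrementAndGet();
                    done.countDown();
                });
            }
        }, 0, periodNanos, TimeUnit.NANOSECONDS);

        done.await(10, TimeUnit.SECONDS);
        sender.shutdownNow();
        responses.shutdownNow();
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException
    {
        System.out.println(run(20, 40));
    }
}
```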
    Using this core principle, you can set up the load testing by having N load generator loaders that impose the load on the server, and 1 load generator probe that imposes a very light load and measures response times.
    For example, you can have 4 loaders that impose 20 requests/s each, for a total of 80 requests/s seen by the server. With this load on the server, what would be the experience, in terms of response times, of additional users that make requests to the server? This is exactly what the probe measures.
    If the load on the server is increased to 160 requests/s, what would the probe experience? The same response times? Worse? And what are the probe response times if the load on the server is increased to 240 requests/s?
    Rather than trying to measure some form of throughput (“what is the max number of requests/s the server can sustain?”), the Jetty Load Generator measures the quality of service seen by the probe, as the load on the server increases. This is, in practice, what matters most for HTTP servers: knowing that, when your server has a load of 1024 requests/s, an additional user can still see response times that are acceptable. And knowing how the quality of service changes as the load increases.
    The Jetty Load Generator builds on top of Jetty’s HttpClient features, and offers:

    • A builder-style Java API, to embed the load generator into your own code and to have full access to all events emitted by the load generator
    • A command-line tool, similar to Apache’s ab or wrk2, with histogram reporting, for ease of use, scripting, and integration with CI servers.

    Download the latest command-line tool uber-jar from: https://repo1.maven.org/maven2/org/mortbay/jetty/loadgenerator/jetty-load-generator-starter/

    $ cd /tmp
    $ curl -O https://repo1.maven.org/maven2/org/mortbay/jetty/loadgenerator/jetty-load-generator-starter/1.0.2/jetty-load-generator-starter-1.0.2-uber.jar
    

    Use the --help option to display the available command line options:

    $ java -jar jetty-load-generator-starter-1.0.2-uber.jar --help
    

    Then run it, for example:

    $ java -jar jetty-load-generator-starter-1.0.2-uber.jar --scheme https --host your_server --port 443 --resource-rate 1 --iterations 60 --display-stats
    

    You will obtain an output similar to the following:

    ----------------------------------------------------
    -------------  Load Generator Report  --------------
    ----------------------------------------------------
    https://your_server:443 over http/1.1
    resource tree     : 1 resource(s)
    begin date time   : 2021-02-02 15:38:39 CET
    complete date time: 2021-02-02 15:39:39 CET
    recording time    : 59.657 s
    average cpu load  : 3.034/1200
    histogram:
    @                     _  37 ms (0, 0.00%)
    @                     _  75 ms (0, 0.00%)
    @                     _  113 ms (0, 0.00%)
    @                     _  150 ms (0, 0.00%)
    @                     _  188 ms (0, 0.00%)
    @                     _  226 ms (0, 0.00%)
    @                     _  263 ms (0, 0.00%)
    @                     _  301 ms (0, 0.00%)
                       @  _  339 ms (46, 76.67%) ^50%
       @                  _  376 ms (7, 11.67%) ^85%
      @                   _  414 ms (5, 8.33%) ^95%
    @                     _  452 ms (1, 1.67%)
    @                     _  489 ms (0, 0.00%)
    @                     _  527 ms (0, 0.00%)
    @                     _  565 ms (0, 0.00%)
    @                     _  602 ms (0, 0.00%)
    @                     _  640 ms (0, 0.00%)
    @                     _  678 ms (0, 0.00%)
    @                     _  715 ms (0, 0.00%)
    @                     _  753 ms (1, 1.67%) ^99% ^99.9%
    response times: 60 samples | min/avg/50th%/99th%/max = 303/335/318/753/753 ms
    request rate (requests/s)  : 1.011
    send rate (bytes/s)        : 189.916
    response rate (responses/s): 1.006
    receive rate (bytes/s)     : 41245.797
    failures          : 0
    response 1xx group: 0
    response 2xx group: 60
    response 3xx group: 0
    response 4xx group: 0
    response 5xx group: 0
    ----------------------------------------------------
    

    Use the Jetty Load Generator for your load testing, and report comments and issues at https://github.com/jetty-project/jetty-load-generator. Enjoy!

  • Reactive HttpClient 1.1.5, 2.0.0 and 3.0.0

    Following the releases of Eclipse Jetty 10.0.0 and 11.0.0, the Reactive HttpClient project — introduced back in 2017 — has released versions 1.1.5, 2.0.0 and 3.0.0.

    Reactive HttpClient 1.1.x Series

    The Reactive HttpClient 1.1.x series, of which the latest release is 1.1.5, requires at least Java 8 and is based on Jetty 9.4.x.
    This version will be maintained as long as Jetty 9.4.x is maintained, likely many more years, to allow migration away from Java 8.

    Reactive HttpClient 2.0.x Series

    The Reactive HttpClient 2.0.x series, starting with the newly released 2.0.0, requires at least Java 11 and is based on Jetty 10.0.x.
    The Reactive HttpClient 2.0.x series is incompatible with the 1.1.x series, since the Jetty HttpClient APIs changed between Jetty 9.4.x and Jetty 10.0.x.
    This means that projects such as Spring WebFlux, at the time of this writing, are not compatible with the 2.0.x series of Reactive HttpClient.

    Reactive HttpClient 3.0.x Series

    The Reactive HttpClient 3.0.x series, starting with the newly released 3.0.0, requires at least Java 11 and is based on Jetty 11.0.x.
    In turn, Jetty 11.0.x is based on the Jakarta EE 9 Specifications, which means jakarta.servlet and not javax.servlet.
    The Reactive HttpClient 3.0.x series is fundamentally identical to the 2.0.x series, apart from the Jetty dependency.
    While the HttpClient APIs do not change between Jetty 10 and Jetty 11, if you are using Jakarta EE 9 it will be more convenient to use the Reactive HttpClient 3.0.x series.
    For example when using Reactive HttpClient to call a third party service from within a REST service, it will be natural to use Reactive HttpClient 2.0.x if you use javax.ws.rs, and Reactive HttpClient 3.0.x if you use jakarta.ws.rs.
    Enjoy the new releases and tell us which series you use by adding a comment here!
    For further information, refer to the project page on GitHub.

  • Eat What You Kill without Starvation!

    Jetty 9 introduced the Eat-What-You-Kill[n]The EatWhatYouKill strategy is named after a hunting proverb in the sense that one should only kill to eat. The use of this phrase is not an endorsement of hunting nor killing of wildlife for food or sport.[/n] execution strategy to apply mechanically sympathetic techniques to the scheduling of threads in the producer-consumer pattern that are used for core capabilities in the server. The initial implementations proved vulnerable to thread starvation and Jetty-9.3 introduced dual scheduling strategies to keep the server running, which in turn suffered from lock contention on machines with more than 16 cores.  The Jetty-9.4 release now contains the latest incarnation of the Eat-What-You-Kill scheduling strategy which provides mechanical sympathy without the risk of thread starvation in a single strategy.  This blog is an update of the original post with the latest refinements.

    Parallel Mechanical Sympathy

    Parallel computing is a “false friend” for many web applications. The textbooks will tell you that parallelism is about decomposing large tasks into smaller ones that can be executed simultaneously by different computing engines to complete the task faster. While this is true, the issue is that for web application containers there is no agreement on what the “large task” that needs to be decomposed actually is.

    From the application’s point of view, the large task to be solved is how to render a complex page for a user, combining multiple requests and resources, using many services for authentication and perhaps RESTful access to a data model on multiple back-end servers. For the application, parallelism can improve the quality of service of rendering a single page by spreading the decomposed tasks over all the available CPUs of the server.

    However, a web application container has a different large task to solve: how to provide service to hundreds or thousands, maybe even hundreds of thousands, of simultaneous users. Unfortunately, for the container, the way it would optimally allocate its decomposed tasks to CPUs is completely opposite to how the application would like its decomposed tasks to be executed.

    Consider a server with 4 CPUs serving 4 users, each of which has 4 tasks. The application’s ideal view of parallel decomposition looks like:

    Labels UxTy represent Task y for User x. Tasks for the same user are coloured alike.

    This view suggests that each user’s combined task will be executed in minimum time. However, some users must wait for prior users’ tasks to complete before their execution can start, so average latency is higher.

    Furthermore, we know from Mechanical Sympathy that such ideal execution is rarely possible, especially if there is data shared between tasks. Each CPU needs time to load its cache and registers with data before it can be acted on. If that data is specific to the problem each user is trying to solve, then the real view of the parallel execution looks more like the following, the orange blocks indicating the time taken to load the CPU cache with user and task related data:

    Labels UxTy represent Task y for User x. Tasks for the same user are coloured alike. Orange blocks represent cache load time.

    So from the container’s point of view, the last thing it wants is the data from one user’s large problem spread over all its CPUs, because that means that when it executes the next task, it will have a cold cache that must be reloaded with the data of the next user. Furthermore, executing tasks for the same user on different CPUs risks Parallel Slowdown, where the cost of mutual exclusion, synchronisation and communication between CPUs can increase the total time needed to execute the tasks beyond that of serial execution. If the tasks are fully mutually excluded on user data (unlikely, but a bounding case), then the execution could look like:

    For optimal execution from the container’s point of view, it is far better if tasks from each user, which use common data, are kept on the same CPU, so the cache only needs to be loaded once and there is no mutual exclusion on user data:

    While this style of execution does not achieve the minimal latency and throughput of the idealised application view, in reality it is the fairest and most optimal execution, with all users receiving similar quality of service and the optimal average latency.

    In summary, when scheduling the execution of parallel tasks, it is best to keep tasks that share data on the same CPU so that they may benefit from a hot cache (the original blog contains some micro benchmark results that quantifies the benefit).

    Produce Consume (PC)

    In order to facilitate the decomposition of large problems into smaller ones, the Jetty container uses the Producer-Consumer pattern:

    • The NIO Selector produces IO events that need to be consumed by reading, parsing and handling the data.
    • A multiplexed HTTP/2 connection produces Frames that need to be consumed by calling the Servlet Container. Note that the producer of HTTP/2 frames is itself a consumer of IO events!

    The producer-consumer pattern adds another way that tasks can be related by data. Not only might they be for the same user, but consuming a task will share the data that results from producing the task. A simple implementation can achieve this by using only a single CPU to both produce and consume the tasks:

    while (true)
    {
      Runnable task = _producer.produce();
      if (task == null)
        break;
      task.run();
    }

    The resulting execution pattern has good mechanical sympathy characteristics:

    Labels UxPy represent Produce Task y for User x; labels UxCy represent Consume Task y for User x. Tasks for the same user are coloured in similar tones. Orange blocks are cache load times.

    Here all the produced tasks are immediately consumed on the same CPU with a hot cache! Cache load times are minimised, but the cost is that the server will suffer from Head of Line (HOL) Blocking, where the serial execution of tasks from a queue forces tasks to wait for the completion of unrelated tasks. In this case task U1C0 need not wait for U0C0, and U2C0 need not wait for U1C1 or U0C1, etc. There is no parallel execution, and thus this is not an optimal usage of the server resources.

    Produce Execute Consume (PEC)

    To solve the HOL blocking problem, multiple CPUs must be used so that produced tasks can be executed in parallel, and even if one is slow or blocks, the other CPUs can progress the other tasks. To achieve this, a typical solution is to have one Thread, executing on one CPU, that only produces tasks, which are then placed in a queue of tasks to be executed by Threads running on other CPUs. Typically the task queue is abstracted into an Executor:

    while (true)
    {
        Runnable task = _producer.produce();
        if (task == null)
            break;
        _executor.execute(task);
    }

    This strategy could be considered the canonical solution to the producer-consumer problem, where producers are separated from consumers by a queue; it is at the heart of architectures such as SEDA. This strategy solves the head-of-line blocking issue well, since all produced tasks can complete independently in different Threads on different CPUs:

    This represents a good improvement in throughput and average latency over the simple Produce Consume solution, but the cost is that every consumed task is executed on a different Thread (and thus likely a different CPU) from the one that produced it. While this may appear a small cost for avoiding HOL blocking, our experience is that the resulting CPU cache misses significantly reduced the performance of early Jetty 9 releases.

    Eat What You Kill (EWYK) AKA Execute Produce Consume (EPC)

    To achieve good mechanical sympathy and avoid HOL blocking, Jetty has developed the Execute Produce Consume strategy, which we have nicknamed Eat What You Kill (EWYK) after the expression that a hunter should only kill an animal they intend to eat. Applied to the producer-consumer problem, this policy says that a thread should only produce (kill) a task if it intends to consume (eat) it. A task queue is still used to achieve parallel execution, but it is the producer that is dispatched rather than the produced task:

        while (true)
        {
            Runnable task = _producer.produce();
            if (task == null)
                break;
            _executor.execute(this); // dispatch production
            task.run(); // consume the task ourselves
        }

    The result is that a task is consumed by the same Thread, and thus likely the same CPU, that produced it, so that consumption is always done with a hot cache:

    Moreover, because any thread that completes consuming a task will immediately attempt to produce another task, there is the possibility of a single Thread/CPU executing multiple produce/consume cycles for the same user. The result is improved average latency and reduced total CPU time.

    Starvation!

    Unfortunately, a pure implementation of EWYK suffers from a fatal flaw! Since any thread producing a task will go on to consume that task, it is possible for all threads/CPUs to be consuming at once. This was initially seen as a feature, as it exerted good back pressure on the network: a busy server used all its resources consuming existing tasks rather than producing new tasks. However, in an application server consuming a task may be a blocking process that waits for more data/frames to be produced. Unfortunately, if every thread/CPU ends up consuming such a blocking task, then there are no threads left available to produce the tasks to unblock them. Deadlock!

    A real example of this occurred with HTTP/2, when every Thread from the pool was blocked in an HTTP/2 request because it had used up its flow control window. The windows could be expanded by flow control frames from the other end, but there were no threads available to process the flow control frames!

    Thus the EWYK execution strategy used in Jetty is now adaptive, and it can use the most appropriate of the three strategies outlined above, ensuring there is always at least one thread/CPU producing so that starvation does not occur. To be adaptive, Jetty uses two mechanisms:

    • Tasks that are produced can be interrogated via the Invocable interface to determine whether they are non-blocking, blocking, or can be run in either mode. NON_BLOCKING or EITHER tasks can be directly consumed by the PC model.
    • The thread pools used by Jetty implement the TryExecutor interface, whose method boolean tryExecute(Runnable task) allows the scheduler to know whether a thread was available to continue producing, and thus whether the EWYK/EPC mode can be used; otherwise the task must be passed to an executor to be consumed in PEC mode. To implement this semantic, Jetty maintains a dynamically sized pool of reserved threads that can respond to tryExecute(Runnable) calls.

    Thus the simple Produce Consume (PC) model is used for non-blocking tasks; for blocking tasks the EWYK (aka Execute Produce Consume, EPC) mode is used if a reserved thread is available, otherwise the SEDA-style Produce Execute Consume (PEC) model is used.
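    The tryExecute() semantic can be sketched with a plain JDK SynchronousQueue (a hypothetical simplification, not Jetty's actual ReservedThreadExecutor): the hand-off succeeds only when a reserved thread is already idle and waiting for work.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical sketch of the TryExecutor semantic: a fixed set of "reserved"
// threads wait on a SynchronousQueue; tryExecute() succeeds only if one of
// them is currently idle and waiting for work.
public class ReservedTryExecutor
{
    private final SynchronousQueue<Runnable> handoff = new SynchronousQueue<>();

    public ReservedTryExecutor(int reservedThreads)
    {
        for (int i = 0; i < reservedThreads; i++)
        {
            Thread thread = new Thread(() ->
            {
                try
                {
                    while (true)
                        handoff.take().run(); // idle (reserved) until handed a task
                }
                catch (InterruptedException stopped)
                {
                    // shutting down
                }
            });
            thread.setDaemon(true);
            thread.start();
        }
    }

    /** Runs the task on a reserved thread if one is idle, otherwise returns false. */
    public boolean tryExecute(Runnable task)
    {
        // offer() on a SynchronousQueue succeeds only if a consumer is
        // already blocked in take(): no idle reserved thread, no handoff.
        return handoff.offer(task);
    }

    public static void main(String[] args) throws InterruptedException
    {
        ReservedTryExecutor executor = new ReservedTryExecutor(1);
        Thread.sleep(100); // let the reserved thread reach take()
        System.out.println(executor.tryExecute(() -> {}));
    }
}
```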

    The adaptive EWYK strategy can be written as:

        while (true)
        {
            Runnable task = _producer.produce();
            if (task == null)
                break;
            if (Invocable.getInvocationType(task)==NON_BLOCKING)
                task.run();                     // Produce Consume
            else if (executor.tryExecute(this)) // recruit a new producer?
                task.run();                     // Execute Produce Consume (EWYK!)
            else
                executor.execute(task);         // Produce Execute Consume
        }
    

    Chained Execution Strategies

    As stated above, in the Jetty use-case it is common for the execution strategy used by the IO layer to call tasks that are themselves an execution strategy for producing and consuming HTTP/2 frames.  Thus EWYK strategies can be chained and by knowing some information about the mode in which the prior  strategy has executed them the strategies can be even more adaptive.

    The adaptable chainable EWYK strategy is outlined here:

      while (true) {
        Runnable task = _producer.produce();
        if (task == null)
          break;
        if (thisThreadIsNonBlocking())
        {
          switch (Invocable.getInvocationType(task))
          {
            case NON_BLOCKING:
              task.run();                 // Produce Consume
              break;
            case BLOCKING:
              _executor.execute(task);    // Produce Execute Consume
              break;
            case EITHER:
              executeAsNonBlocking(task); // Produce Consume
              break;
          }
        }
        else
        {
          switch (Invocable.getInvocationType(task))
          {
            case NON_BLOCKING:
              task.run();                   // Produce Consume
              break;
            case BLOCKING:
              if (_executor.tryExecute(this))
                task.run();                 // Execute Produce Consume (EWYK!)
              else
                _executor.execute(task);    // Produce Execute Consume
              break;
            case EITHER:
              if (_executor.tryExecute(this))
                task.run();                 // Execute Produce Consume (EWYK!)
              else
                executeAsNonBlocking(task); // Produce Consume
              break;
          }
        }
      }

    An example of how the chaining works is that the HTTP/2 task declares itself as invocable EITHER in blocking or non-blocking mode. If the IO strategy is operating in PEC mode, then the HTTP/2 task is in its own thread and free to block, so it can itself use EWYK and potentially execute a blocking task that it produced.

    However, if the IO strategy has no reserved threads, it cannot risk queuing an important flow control frame in a job queue. Instead it can execute the HTTP/2 task as a non-blocking task in PC mode. So even if the last available thread was running the IO strategy, it can use PC mode to execute HTTP/2 tasks in non-blocking mode. The HTTP/2 strategy is then always able to handle flow control frames, since they are non-blocking tasks run as PC, while all other frames that may block are queued with PEC.

    Conclusion

    The EWYK execution strategy has been implemented in Jetty to improve performance through mechanical sympathy, whilst avoiding the issues of Head of Line blocking, Thread Starvation and Parallel Slowdown.   The team at Webtide continue to work with our clients and users to analyse and innovate better solutions to serve high performance real world applications.

  • Jetty, Cookies and RFC6265 Compliance

    Starting with patch 9.4.3, Jetty will be fully compliant with RFC6265, which introduces changes to cookie handling that may have a significant impact for some users.
    Up until now, Jetty has supported Version=1 cookies defined in RFC2109 (and continued in RFC2965), which allow special/reserved characters (control, separator, et al) to be enclosed within double quotes when declared in a Set-Cookie response header:
    Example:

    Set-Cookie: foo="bar;baz";Version=1;Path="/secur"
    

    This header was produced by adding the cookie to the HTTP response using the following calls:

    Cookie cookie = new Cookie("foo", "bar;baz");
    cookie.setPath("/secur");
    response.addCookie(cookie);

    This allowed normally non-permitted characters (such as the ; separator found in the example above) to be used as part of a cookie value. With the introduction of RFC6265 (replacing the now-obsolete RFC2965 and RFC2109), this use of double quotes to enclose special characters is no longer possible.
    This change was made as a reaction to the strict RFC6265 validation rules present in Chrome/Chromium.
    As such, users are now required to encode their cookie values to use these characters.
    Utilizing javax.servlet.http.Cookie, this can be done as:

    Cookie cookie = new Cookie("foo", URLEncoder.encode("bar;baz", "utf-8"));

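    The encoding round-trips cleanly; a quick check with the JDK, using the value from the example above:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class CookieValueEncoding
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String raw = "bar;baz";
        String encoded = URLEncoder.encode(raw, "UTF-8");    // no reserved chars remain
        String decoded = URLDecoder.decode(encoded, "UTF-8"); // original value restored
        System.out.println(encoded);              // prints "bar%3Bbaz"
        System.out.println(decoded.equals(raw));  // prints "true"
    }
}
```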
    Starting with Jetty 9.4.3, we will now validate all cookie names and values when they are added to the HttpServletResponse via the addCookie(Cookie) method. If there is something amiss, Jetty will throw an IllegalArgumentException with the details.
    Of note, this new addCookie(Cookie) validation is applied via the ServerConnector, and works on HTTP/1.0, HTTP/1.1, and HTTP/2.
    Additionally, Jetty has added a CookieCompliance property to the HttpConfiguration object which can be utilized to define which cookie policy the ServerConnectors will adhere to. By default, this will be set to RFC6265.
    In the standard Jetty Distribution, this can be found in the server’s jetty.xml as:

    <Set name="cookieCompliance">
      <Call class="org.eclipse.jetty.http.CookieCompliance" name="valueOf">
        <Arg><Property name="jetty.httpConfig.cookieCompliance" default="RFC6265"/></Arg>
      </Call>
    </Set>

    Or if you are utilizing the module system in the Jetty distribution, you can set the jetty.httpConfig.cookieCompliance property in the appropriate start INI for your ${jetty.base} (such as ${jetty.base}/start.ini or ${jetty.base}/start.d/server.ini):

    ## Cookie compliance mode of: RFC6265
    # jetty.httpConfig.cookieCompliance=RFC6265

    Or, for older Version=1 Cookies, use:

    ## Cookie compliance mode of: RFC2965
    # jetty.httpConfig.cookieCompliance=RFC2965


  • Thread Starvation with Eat What You Kill

    This is going to be a blog of mixed metaphors as I try to explain how we avoid thread starvation when we use Jetty’s eat-what-you-kill[n]The EatWhatYouKill strategy is named after a hunting proverb in the sense that one should only kill to eat. The use of this phrase is not an endorsement of hunting nor killing of wildlife for food or sport.[/n] scheduling strategy.
    Jetty has several instances of a computing pattern called ProduceConsume, where a task is run that produces other tasks that need to be consumed. An example of a Producer is the HTTP/1.1 Connection, where the Producer task looks for IO activity on any connection. Each IO event detected is a Consumer task which will read and handle the IO event (typically an HTTP request). In Java NIO terms, the Producer in this example is running the NIO Selector and the Consumers are handling the HTTP protocol and the application’s Servlets. Note that the split between producing and consuming can be rather arbitrary, and we have tried to have the HTTP protocol as part of the Producer, but as we have previously blogged, that split has poor mechanical sympathy. So the key aspect of the Producer-Consumer pattern for Jetty is that we use it when the produced tasks can be executed in any order or in parallel: HTTP requests from different connections or HTTP/2 frames from different streams.

    Eat What You Kill

    Mechanical Sympathy not only affects where the split is between producing and consuming, but also how the Producer task and Consumer tasks should be executed (typically by a thread pool) and such considerations can have a dramatic effect on server performance. For example, if one thread produced a task then it is likely that the CPU’s cache is now hot with all the data relating to that task, and so it is best that the same CPU consumes that task using the hot cache. This could be achieved with complex core locking mechanism, but it is far more straight-forward to consume the task using the same thread.
Jetty has an ExecutionStrategy called Eat-What-You-Kill (EWYK) that has excellent mechanical sympathy properties. We have previously explained this strategy in detail, but in summary it follows the hunter’s ethic that one should only kill (produce) something that you intend to eat (consume). This strategy allows a thread to run the producing task only if it is immediately able to run any consumer task that is produced (using the hot CPU cache). To allow other consumer tasks to run in parallel, another thread (if available) is dispatched to do more producing and consuming.
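A greatly simplified, single-threaded sketch of the idea (hypothetical names, not Jetty’s real ExecuteProduceConsume implementation) shows the essential move: the producing thread consumes its own task on the hot cache, after first dispatching another producer so production can continue in parallel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executor;

// A simplified sketch of Eat-What-You-Kill (illustrative, not Jetty's real
// implementation): the thread that produces a task consumes it itself,
// keeping the CPU cache hot, but first asks the executor for another
// thread to keep producing in parallel.
public class EatWhatYouKill {
    interface Producer { Runnable produce(); }

    private final Producer producer;
    private final Executor executor;

    EatWhatYouKill(Producer producer, Executor executor) {
        this.producer = producer;
        this.executor = executor;
    }

    void run() {
        Runnable task = producer.produce();
        if (task == null)
            return;                  // nothing killed, nothing to eat
        executor.execute(this::run); // let another thread keep producing
        task.run();                  // eat what we killed: consume here, hot cache
    }

    // Deterministic demo: the "thread pool" is simulated by a job queue
    // that we drain ourselves, so no real threads are needed.
    static List<String> demo() {
        List<String> handled = new ArrayList<>();
        Queue<Runnable> events = new ArrayDeque<>();
        events.add(() -> handled.add("task-1"));
        events.add(() -> handled.add("task-2"));

        Queue<Runnable> pool = new ArrayDeque<>();
        new EatWhatYouKill(events::poll, pool::add).run();
        Runnable job;
        while ((job = pool.poll()) != null)
            job.run(); // pretend pool threads picking up dispatched jobs
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [task-1, task-2]
    }
}
```

The real strategy is considerably more subtle (it tracks whether a pending producer is already dispatched, and falls back when no threads are available), but the consume-on-the-producing-thread step is the heart of it.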

    Thread Starvation

EWYK is an excellent execution strategy that has given Jetty significantly better throughput and reduced latency. That said, it is susceptible to thread starvation when it bites off more than it can chew.
The issue is that EWYK works by using the same thread that produced a task to immediately consume it, and it is possible (even likely) that the consumer task will block, as it often calls application code which may do blocking IO or wait for some other event. To ensure this does not block the entire server, EWYK dispatches another task to the thread pool to do more producing.
The problem is that if the thread pool is empty (because all its threads are in blocking application code), then the last non-blocked producing thread may produce a task which it then runs and blocks in. A task to do more producing will have been dispatched to the thread pool, but as it was generated from the last available thread, that producing task will be waiting in the job queue for a thread to become available. All the threads are blocked, and it may be that they are all blocked on IO operations that will only be unblocked if some data is read/written. Unless something calls the NIO Selector, the read/write will not be seen. Since the Selector is called by the Producer task, which is waiting in the queue, and the queue is stalled because all the threads are blocked waiting for the Selector, the server is now deadlocked by thread starvation!

    Always two there are!

Jetty’s clever solution to this problem is to not only run our EWYK execution strategy, but to also run the alternative ProduceExecuteConsume strategy, where one thread does all the producing and always dispatches any produced tasks to the thread pool. Because this is not mechanically sympathetic, we run that producer task at low priority. This effectively reserves one thread from the thread pool to always be a producer, but because it is low priority it will seldom run unless the server is idle – or completely stalled due to thread starvation. This means that Jetty always has a thread available to Produce, so there is always a thread available to run the NIO Selector, and any IO events that will unblock blocked threads will be detected. This needs one more trick to work: the producing task must be able to tell if a detected IO task is non-blocking (i.e. a wakeup of a blocked read or write), in which case it executes the task itself rather than submitting it to any execution strategy. Jetty uses the InvocationType interface to tag such tasks and thus avoid thread starvation.
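To illustrate the tagging idea, here is a hedged sketch modeled loosely on Jetty’s Invocable.InvocationType (the names and shapes below are simplified, not the real API): the low-priority reserved producer runs a produced task inline only when the task advertises that it cannot block, and hands everything else to the pool.

```java
import java.util.concurrent.Executor;

// Sketch of non-blocking task tagging, loosely modeled on Jetty's
// Invocable.InvocationType (names simplified, not the real API).
// The reserved low-priority producer runs a task directly only when
// it is known not to block; otherwise the task goes to the pool.
public class ReservedProducer {
    enum InvocationType { BLOCKING, NON_BLOCKING }

    interface Task extends Runnable {
        InvocationType getInvocationType();
    }

    static String dispatch(Task task, Executor pool) {
        if (task.getInvocationType() == InvocationType.NON_BLOCKING) {
            task.run();     // e.g. just wakes up a blocked read/write: safe inline
            return "ran-inline";
        }
        pool.execute(task); // may call blocking application code
        return "dispatched";
    }

    static Task task(InvocationType type, Runnable body) {
        return new Task() {
            public InvocationType getInvocationType() { return type; }
            public void run() { body.run(); }
        };
    }

    public static void main(String[] args) {
        Executor pool = r -> { /* stand-in for a real thread pool */ };
        System.out.println(dispatch(task(InvocationType.NON_BLOCKING, () -> {}), pool)); // ran-inline
        System.out.println(dispatch(task(InvocationType.BLOCKING, () -> {}), pool));     // dispatched
    }
}
```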
This is a great solution when a thread can be dedicated to always Producing (e.g. NIO Selecting). However, Jetty has other Produce-Consume patterns that cannot each be given a dedicated thread. HTTP/2 Connections are consumers of IO events, but are themselves producers of parsed HTTP/2 frames, which may be handled in parallel due to the multiplexed nature of HTTP/2. So each HTTP/2 connection is itself a Produce-Consume pattern, but we cannot allocate a Producer thread to each connection, as a server may have many tens of thousands of connections!
Yet, to avoid thread starvation, we must also always be able to call the Producer task for HTTP/2, as it may parse HTTP/2 flow control frames that are necessary to unblock the IO being done by application threads that are blocked and holding all the available threads from the pool.
Even if there is a thread reserved as the Producer/Selector by a connector, it may detect IO on an HTTP/2 connection and use the last thread from the thread pool to Consume that IO. If that thread produces an HTTP/2 frame and the EWYK strategy is used, then the last thread may Consume that frame and it too may block in application code. So even if the reserved thread detects more IO, there are no more available threads to consume it!
So the solution in HTTP/2 is similar to the approach with the Connector. Each HTTP/2 connection has two execution strategies: EWYK, which is used when the calling thread (the Connector’s consumer) is allowed to block, and the traditional ProduceExecuteConsume strategy, which is used when the calling thread is not allowed to block. The HTTP/2 Connection then advertises itself to the Connector with an InvocationType of EITHER. If the Connector is running normally, an EWYK strategy will be used and the HTTP/2 Connection will do the same. However, if the Connector is running the low priority ProduceExecuteConsume strategy, it invokes the HTTP/2 connection as non-blocking. This tells the HTTP/2 Connection that, when it is acting as a Consumer of the Connector’s task, it must not block, so it uses its own ProduceExecuteConsume strategy, as it knows the Producer will only parse the HTTP/2 frame and not perform the Consume task itself (which may block).
The final part is that the HTTP/2 frame Producer can look at the frames produced. If they are frames that will not block when handled (i.e. flow control), they are handled by the Producer itself and not submitted to any strategy to be Consumed. Thus, even if the Server is on its last thread, flow control frames will be detected, parsed and handled, unblocking other threads and avoiding starvation!
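That last decision can be sketched as a simple predicate (an illustrative, simplified sketch, not Jetty’s actual frame handling): the frame producer checks the parsed frame type and handles flow control frames such as WINDOW_UPDATE inline, since handling them never calls application code.

```java
// Sketch (hypothetical, simplified) of the producer's frame triage:
// flow control frames are handled inline by the producer, because
// handling them cannot block even on the server's last thread; other
// frames are handed to an execution strategy to be consumed.
public class FrameTriage {
    enum FrameType { HEADERS, DATA, WINDOW_UPDATE }

    /** True if the producer can safely handle this frame inline. */
    static boolean handledByProducer(FrameType type) {
        // WINDOW_UPDATE (flow control) never reaches application code,
        // so handling it inline cannot block.
        return type == FrameType.WINDOW_UPDATE;
    }

    public static void main(String[] args) {
        System.out.println(handledByProducer(FrameType.WINDOW_UPDATE)); // true
        System.out.println(handledByProducer(FrameType.HEADERS));       // false
    }
}
```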

  • HTTP/2 at JAX

    I was invited to speak at the JAX conference in Mainz about HTTP/2.
Jetty has always been a front-runner when it comes to web protocols: first with WebSocket, then with SPDY, and finally with HTTP/2.
    We believe that HTTP/2 is going to make the web much better, and we try to spread the word at conferences.
    The JAX conference was great, and despite most of the sessions being in German, I had the chance to network with various speakers – it is always great to be able to speak to top notch people over breakfast or dinner, or while waiting for the next session.
Below you can find Oracle’s Yolande Poirier interviewing me on video about HTTP/2, and the JAX textual interview on the same topic.
    Enjoy !

  • HTTP/2 with HAProxy and Jetty

    HTTP/2 is now the official RFC 7540, and it’s about time to deploy your website on HTTP/2, to get the numerous benefits that HTTP/2 brings.
    A very typical deployment is to have Apache (or Nginx) working as a reverse proxy to a Servlet Container such as Jetty or Tomcat.
This configuration cannot be used for HTTP/2, because Apache does not yet support HTTP/2 (nor does Nginx).
We want to propose an alternative deployment that replaces Apache (or Nginx) with HAProxy, so that we can leverage Jetty 9.3.0’s HTTP/2 support and retain most, if not all, of the features that Apache (or Nginx) provided as a reverse proxy.
For those who don’t know HAProxy: it’s a very fast load balancer and proxy that powers quite a number of the world’s most visited sites (see here).
What you will get is very efficient TLS offloading (performed by HAProxy via OpenSSL) and Jetty’s HTTP/2 support, including HTTP/2 Push.
    The setup to make HAProxy + Jetty + HTTP/2 work is fairly simple, and documented in detail here.
Don’t wait years to update your website to HTTP/2: whether you run a JEE web application or a PHP application like WordPress, HAProxy and Jetty can speed up your website considerably, and many studies have shown that faster websites result in more business.
Browsers like Firefox and Chrome already support HTTP/2, so potentially more than half of the world’s users can access your website with HTTP/2.
    Contact us if you want to know more about HTTP/2 and how we can help you to speed up your website.

  • Introduction to HTTP2 in Jetty

Jetty 9.3 supports HTTP/2 as defined by RFC 7540, and it is extremely simple to enable and get started using this new protocol, which is available in most current browsers.

    Getting started with Jetty 9.3

Before we can run HTTP/2, we need to set up Jetty for HTTP/1.1 (strictly speaking this is not required, but it makes for an easy narrative):

    $ cd /tmp
    $ wget http://repo1.maven.org/maven2/org/eclipse/jetty/jetty-distribution/9.3.0.RC1/jetty-distribution-9.3.0.RC1.tar.gz
    $ tar xfz jetty-distribution-9.3.0.RC1.tar.gz
    $ export JETTY_HOME=/tmp/jetty-distribution-9.3.0.RC1
    $ mkdir demo
    $ cd demo
    $ java -jar $JETTY_HOME/start.jar --add-to-startd=http,https,deploy
    $ cp $JETTY_HOME/demo-base/webapps/async-rest.war webapps/ROOT.war
    $ java -jar $JETTY_HOME/start.jar

    The result of these commands is to:

    • Download the RC1 release of Jetty 9.3 and unpack it to the /tmp directory
    • Create a demo directory and set it up as a jetty base.
    • Enable the HTTP and HTTPS connectors
    • Deploy a demo web application
    • Start the server!

Now you are running Jetty and you can see the demo application deployed by pointing your browser at http://localhost:8080 or https://localhost:8443 (you may have to accept the self-signed SSL certificate)!

    In the console output, I’ll draw your attention to the following two INFO lines that should have been logged:

    Started ServerConnector@490ab905{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
    Started ServerConnector@69955f9a{SSL,[ssl, http/1.1]}{0.0.0.0:8443}

These lines indicate that the server is listening on ports 8080 and 8443, and they list the default and optional protocols that are supported on each of those connectors. So you can see that port 8080 supports HTTP/1.1 (which by specification also supports HTTP/1.0) and port 8443 supports SSL plus HTTP/1.1 (which is HTTPS!).

    Enabling HTTP/2

    Now you can stop the Jetty server by hitting CTRL+C on the terminal, and the following command is all that is needed to enable HTTP/2 on both of these ports and to start the server:

    $ java -jar $JETTY_HOME/start.jar --add-to-startd=http2,http2c
    $ java -jar $JETTY_HOME/start.jar

    This does not create/enable new connectors/ports, but adds the HTTP/2 protocol to the supported protocols of the existing connectors on ports 8080 and 8443.

To access the demo web application with HTTP/2 you will need to point a recent browser to https://localhost:8443/. You can verify whether your browser supports HTTP/2 here, or add extensions to your browser to display an icon in the address bar (see this extension for Firefox). Firefox also sets a fake response header: X-Firefox-Spdy: h2.

    How does it work?

    If you now look at the console logs you will see that additional protocols have been added to both existing connectors on 8080 and 8443:

    Started ServerConnector@4bec1f0c{HTTP/1.1,[http/1.1, h2c, h2c-17, h2c-14]}{0.0.0.0:8080}
    Started ServerConnector@5bc63d63{SSL,[ssl, alpn, h2, h2-17, h2-14, http/1.1]}{0.0.0.0:8443}

The name ‘h2’ is the official abbreviation for HTTP/2 over TLS, and ‘h2c’ is the abbreviation for unencrypted HTTP/2 (they really wanted to save every byte in the protocol!). So you can see that port 8080 is now listening by default for HTTP/1.1, but can also talk h2c (and draft versions of it). Port 8443 now by default talks SSL, then uses ALPN to negotiate a protocol from ‘h2’, ‘h2-17’, ‘h2-14’ or ‘http/1.1’, in that priority order.

    When you point your browser at https://localhost:8443/ it will establish a TLS connection and then use the ALPN extension to negotiate the next protocol.  If both the client and server speak the same version of HTTP/2, then it will be selected, otherwise the connection falls back to HTTP/1.1.

For port 8080, the use of ‘h2c’ is a little more complex. Firstly there is the problem of finding a client that speaks plain text HTTP/2, as none of the common browsers will use the protocol on plain text connections. The curl utility does support h2c, as does the Jetty HTTP/2 client.

The default protocol on port 8080 is still HTTP/1.1, so the initial connection will be expected to speak that protocol. To use the HTTP/2 protocol, a client may send an HTTP/1.1 request that carries an Upgrade header, which the server may accept, upgrading to any of the other protocols listed against the connector (e.g. ‘h2c’, ‘h2c-17’, etc.) by sending a 101 Switching Protocols response! If the server does not wish to accept the upgrade, it can respond to the HTTP/1.1 request and continue normally.

However, clients are also allowed to assume that a known server does speak HTTP/2, and can connect to port 8080 and immediately start talking HTTP/2. Luckily, the protocol has been designed with a preamble that looks a bit like an HTTP/1.1 request:

    PRI * HTTP/2.0
    SM

Jetty’s HTTP/1.1 implementation is able to detect that preamble and, if the connector also supports ‘h2c’, the connection is upgraded without the need for a 101 Switching Protocols response!
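For reference, the full connection preface defined by RFC 7540 section 3.5 is the byte sequence "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n". A sketch of detecting it might look like the following (illustrative only; Jetty’s real detection lives inside its HTTP/1.1 parser):

```java
import java.nio.charset.StandardCharsets;

// Detecting the HTTP/2 "prior knowledge" connection preface (RFC 7540,
// section 3.5). This is an illustrative sketch, not Jetty's actual code.
public class PrefaceDetector {
    static final byte[] PREFACE =
        "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    /** True if the first bytes read from a connection are the h2c preface. */
    static boolean isHttp2Preface(byte[] bytes) {
        if (bytes.length < PREFACE.length)
            return false;
        for (int i = 0; i < PREFACE.length; i++)
            if (bytes[i] != PREFACE[i])
                return false;
        return true;
    }

    public static void main(String[] args) {
        byte[] h2c = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] http1 = "GET / HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isHttp2Preface(h2c));   // true
        System.out.println(isHttp2Preface(http1)); // false
    }
}
```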

    HTTP/2 Configuration

Configuration of HTTP/2 can be considered in the following parts:

Properties (start.d)  Configuration File ($JETTY_HOME/etc)  Purpose
ssl.ini               jetty-ssl.xml                         Connector configuration (eg port) common to HTTPS and HTTP/2
ssl.ini               jetty-ssl-context.xml                 Keystore configuration common to HTTPS and HTTP/2
https.ini             jetty-https.xml                       HTTPS protocol configuration
http2.ini             jetty-http2.xml                       HTTP/2 protocol configuration