╭────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code research preview! │
│                                            │
│   /help for help                           │
│                                            │
│   cwd: /home/kenton/src/workerd            │
╰────────────────────────────────────────────╯
> We're adding a new feature to Durable Objects where each object may have a container attached. This is already implemented in this codebase. In production we've seen the error "actor still
initializing when getHandler() called" show up, but at first glance this error shouldn't be possible. Can you help me figure out what's going on?
● I'll help you investigate this error. Let's search the codebase for this error message and trace its origin.
● Search(pattern: "actor still initializing when getHandler()")…
⎿ Found 1 file
● Let's examine this file to understand the context of the error:
● Read(file_path: "src/workerd/io/worker.c++")…
⎿ // Copyright (c) 2017-2022 Cloudflare, Inc.
// Licensed under the Apache 2.0 license found in the LICENSE file or at:
// https://opensource.org/licenses/Apache-2.0
... (+4289 lines)
● Now let's check the actor-state.c++ file to understand how this error could happen:
● Read(file_path: "src/workerd/api/actor-state.c++")…
⎿ // Copyright (c) 2017-2022 Cloudflare, Inc.
// Licensed under the Apache 2.0 license found in the LICENSE file or at:
// https://opensource.org/licenses/Apache-2.0
... (+1066 lines)
● Let's also check the io-context.h file:
● Read(file_path: "src/workerd/io/io-context.h", limit: 500)…
⎿ // Copyright (c) 2017-2022 Cloudflare, Inc.
// Licensed under the Apache 2.0 license found in the LICENSE file or at:
// https://opensource.org/licenses/Apache-2.0
... (+497 lines)
● Based on my analysis, here's what's happening with the error "actor still initializing when getHandler() called":
The issue involves the Durable Objects container feature and the actor initialization process. Let's break it down:
1. In the Worker::Actor::getHandler() function (around line 3772-3794 in worker.c++), there's a case that checks if the actor's class instance is in the Initializing state:
KJ_CASE_ONEOF(_, Impl::Initializing) {
// This shouldn't be possible because ensureConstructed() would have initiated the
// construction task which would have taken an input lock as well as the isolate lock,
// which should have prevented any other code from executing on the actor until they
// were released.
KJ_FAIL_ASSERT("actor still initializing when getHandler() called");
}
2. The comment indicates that this state shouldn't be possible because the constructor should hold locks that prevent other code from executing on the actor.
3. The issue likely occurs when a container is attached to a Durable Object. In Worker::Actor::ensureConstructedImpl (around line 3456-3510), the function does additional checking for
containers:
bool containerRunning = false;
KJ_IF_SOME(c, impl->container) {
// We need to do an RPC to check if the container is running.
auto status = co_await c.statusRequest(capnp::MessageSize{4, 0}).send();
containerRunning = status.getRunning();
}
4. This code makes an RPC call to check if the container is running, which is likely where the race condition happens.
The problem appears to be a race condition in the initialization process:
1. When a Durable Object is created with a container attached, ensureConstructed() marks the actor as Initializing and then calls ensureConstructedImpl().
2. ensureConstructedImpl() makes an asynchronous RPC to check the container status before finishing initialization.
3. During this asynchronous RPC, another request might try to access the actor and call getHandler(), finding the actor still in the Initializing state.
The comment in the code suggests this should be prevented by the input lock and isolate lock, but it seems the container RPC call is somehow allowing other code to execute on the actor before
initialization completes.
A potential fix would be to ensure the input lock is properly acquired and maintained throughout the entire initialization process, including during the container status RPC call. You might need
to examine how the co_await in ensureConstructedImpl() potentially releases the locks.
> Hmm, but the input lock appears to be taken correctly at the beginning of `ensureConstructedImpl()`, before performing the RPC. Why isn't this blocking the other code correctly? I have the
following stack trace for the offending `getHandler()` call, if it helps:
workerd/io/worker.c++:3784:7 workerd::Worker::Actor::getHandler()
workerd/io/worker.c++:1970:5 workerd::Worker::Lock::getExportedHandler(...)
workerd/io/worker-entrypoint.c++:330:14 workerd::(...)::WorkerEntrypoint::request(...)::$_1::operator()(...)
workerd/io/_virtual_includes/io/workerd/io/io-context.h:1052:20 decltype (...) workerd::IoContext::run<T>(...)::{lambda(...)#1}::operator()::RunnableImpl::run(...)
T = workerd::(anonymous namespace)::WorkerEntrypoint::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&)::$_1
workerd/io/io-context.c++:1110:16 workerd::IoContext::runImpl(...)::$_0::operator()(...) const
kj/_virtual_includes/kj/kj/function.h:142:14 kj::Function<T>::Impl<U>::operator()(...)
T = void (workerd::Worker::Lock&)
U = workerd::IoContext::runImpl(workerd::IoContext::Runnable&, bool, workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, bool)::$_0
kj/_virtual_includes/kj/kj/function.h:119:12 kj::Function<T>::operator()(...)
T = void (workerd::Worker::Lock&)
workerd/io/io-context.c++:1027:5 workerd::IoContext::runInContextScope(...)::$_0::operator()(...) const::{lambda()#2}::operator()() const::{lambda(...)#1}::operator()(...) const
workerd/io/io-context.c++:1027:5 workerd::IoContext::runInContextScope(...)::$_0::operator()(...) const::{lambda()#2}::operator()() const
workerd/jsg/_virtual_includes/jsg-core/workerd/jsg/jsg.h:2603:14 auto workerd::jsg::Lock::withinHandleScope<T>(...)
T = workerd::IoContext::runInContextScope(workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, kj::Function<void (workerd::Worker::Lock&)>)::$_0::operator()(workerd::Worker::Lock&)
const::{lambda()#2}
workerd/io/io-context.c++:1027:5 workerd::IoContext::runInContextScope(...)::$_0::operator()(...) const
workerd/io/worker.h:683:12 auto workerd::Worker::runInLockScope<T>(...) const::{lambda(...)#1}::operator()(...) const
T = workerd::IoContext::runInContextScope(workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, kj::Function<void (workerd::Worker::Lock&)>)::$_0
workerd/jsg/_virtual_includes/jsg-core/workerd/jsg/jsg.h:2780:12 auto workerd::jsg::V8StackScope::runInV8StackImpl<T>(...)
T = auto workerd::Worker::runInLockScope<workerd::IoContext::runInContextScope(workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, kj::Function<void
(workerd::Worker::Lock&)>)::$_0>(workerd::Worker::LockType, workerd::IoContext::runInContextScope(workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, kj::Function<void
(workerd::Worker::Lock&)>)::$_0) const::{lambda(workerd::jsg::V8StackScope&)#1}
workerd/jsg/_virtual_includes/jsg-core/workerd/jsg/jsg.h:2792:10 auto workerd::jsg::runInV8Stack<T>(...)
T = auto workerd::Worker::runInLockScope<workerd::IoContext::runInContextScope(workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, kj::Function<void
(workerd::Worker::Lock&)>)::$_0>(workerd::Worker::LockType, workerd::IoContext::runInContextScope(workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, kj::Function<void
(workerd::Worker::Lock&)>)::$_0) const::{lambda(workerd::jsg::V8StackScope&)#1}
workerd/io/worker.h:681:10 auto workerd::Worker::runInLockScope<T>(...) const
T = workerd::IoContext::runInContextScope(workerd::Worker::LockType, kj::Maybe<workerd::InputGate::Lock>, kj::Function<void (workerd::Worker::Lock&)>)::$_0
workerd/io/io-context.c++:1020:11 workerd::IoContext::runInContextScope(...)
workerd/io/io-context.c++:1058:3 workerd::IoContext::runImpl(...)
workerd/io/_virtual_includes/io/workerd/io/io-context.h:1057:7 decltype (...) workerd::IoContext::run<T>(...)::{lambda(...)#1}::operator()
T = workerd::(anonymous namespace)::WorkerEntrypoint::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&)::$_1
kj/_virtual_includes/kj-async/kj/async-prelude.h:152:12 kj::Promise<T> kj::_::MaybeVoidCaller<U>::apply<V>(...)
T = workerd::api::DeferredProxy<void>
U = workerd::Worker::AsyncLock, kj::Promise<workerd::api::DeferredProxy<void> >
V = decltype ((reducePromiseType)((kj::_::ReturnType_<workerd::(anonymous namespace)::WorkerEntrypoint::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&,
kj::HttpService::Response&)::$_1, workerd::Worker::Lock&>::Type*)(nullptr), false)) workerd::IoContext::run<workerd::(anonymous namespace)::WorkerEntrypoint::request(kj::HttpMethod,
kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&)::$_1>(workerd::(anonymous namespace)::WorkerEntrypoint::request(kj::HttpMethod, kj::StringPtr,
kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&)::$_1&&, kj::Maybe<workerd::InputGate::Lock>)::{lambda(workerd::Worker::AsyncLock)#1}
kj/_virtual_includes/kj-async/kj/async-inl.h:780:31 kj::_::SimpleTransformPromiseNode<T>::getImpl(...)
T = workerd::Worker::AsyncLock, decltype ((reducePromiseType)((kj::_::ReturnType_<workerd::(anonymous namespace)::WorkerEntrypoint::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders
const&, kj::AsyncInputStream&, kj::HttpService::Response&)::$_1, workerd::Worker::Lock&>::Type*)(nullptr), false)) workerd::IoContext::run<workerd::(anonymous
namespace)::WorkerEntrypoint::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&)::$_1>(workerd::(anonymous
namespace)::WorkerEntrypoint::request(kj::HttpMethod, kj::StringPtr, kj::HttpHeaders const&, kj::AsyncInputStream&, kj::HttpService::Response&)::$_1&&,
kj::Maybe<workerd::InputGate::Lock>)::{lambda(workerd::Worker::AsyncLock)#1}
kj/async.c++:2471:3 kj::_::TransformPromiseNodeBase::get(...)::$_0::operator()() const
kj/_virtual_includes/kj/kj/exception.h:366:5 kj::Maybe<T> kj::runCatchingExceptions<U>(...)
T = kj::Exception
U = kj::_::TransformPromiseNodeBase::get(kj::_::ExceptionOrValue&)::$_0
kj/async.c++:2471:3 kj::_::TransformPromiseNodeBase::get(...)
workerd/io/worker-entrypoint.c++:332:0 workerd::(...)::WorkerEntrypoint::request(...)::$_2::operator()(...) const
workerd/io/worker-entrypoint.c++:338:0 workerd::(...)::WorkerEntrypoint::request(...)::$_3::operator()(...)
● Looking at the stack trace, I think I see what's happening. The input lock is indeed taken at the beginning of ensureConstructedImpl(), but the issue is with how coroutines work with
co_await.
Here's the key insight: When co_await is used in ensureConstructedImpl() to check the container status, it suspends the coroutine and potentially releases the input lock. This would allow
another request to execute while the container status check is in progress.
Looking at this part:
InputGate::Lock inputLock = co_await impl->inputGate.wait();
bool containerRunning = false;
KJ_IF_SOME(c, impl->container) {
// We need to do an RPC to check if the container is running.
auto status = co_await c.statusRequest(capnp::MessageSize{4, 0}).send();
containerRunning = status.getRunning();
}
The first co_await correctly acquires the input lock, but when the second co_await for the container status RPC happens, the coroutine is suspended. During this suspension, the input lock is
still held logically by the coroutine, but another request might be able to enter if it doesn't respect the input gate correctly.
Looking at the stack trace:
1. The error occurs in a request coming through WorkerEntrypoint::request
2. It's trying to get a handler through Worker::Lock::getExportedHandler
3. Which then calls Worker::Actor::getHandler()
The fix for this would likely be to ensure that all entry points to the Durable Object properly respect the input gate, especially when making HTTP or RPC requests. The WorkerEntrypoint code
should be checking if the actor is in the Initializing state and either waiting for initialization to complete or properly respecting the input gate locks.
Alternatively, the coroutine in ensureConstructedImpl might need to be restructured to ensure the input lock is properly maintained throughout the entire initialization process, even across
asynchronous operations like the container RPC.
The key problem is that while the input lock is conceptually held by the initializing coroutine, there appears to be a path through WorkerEntrypoint::request that doesn't properly check or
respect this lock.
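To make the "held logically" point concrete, here's a minimal portable sketch (no real coroutines or KJ types; all names are illustrative): a coroutine's locals live in its heap-allocated frame, which survives suspension, so an RAII lock owned by the frame is not released at a co_await, only when the coroutine body finishes and the frame is destroyed.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records the ordering of lock lifetime events for inspection.
static std::vector<std::string> events;

struct InputLock {
  InputLock() { events.push_back("lock acquired"); }
  ~InputLock() { events.push_back("lock released"); }
};

// Stand-in for the coroutine frame of a function like ensureConstructedImpl():
// the frame owns the lock for the coroutine's entire lifetime.
struct InitFrame {
  InputLock lock;          // held from acquisition until the body ends
  void resumeAfterRpc() {  // models the code after the container-status RPC
    events.push_back("container status received");
  }
};

std::vector<std::string> runInitScenario() {
  events.clear();
  auto frame = std::make_unique<InitFrame>();  // lock taken; "RPC" in flight
  events.push_back("suspended at co_await");   // frame (and lock) still alive
  frame->resumeAfterRpc();                     // RPC completes; body resumes
  frame.reset();                               // body ends; lock released now
  return events;
}
```

The key observation is that "suspended at co_await" happens between acquisition and release: suspension alone never destroys the lock.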
> Hmm, but `getExportedHandler()` is a method on `Worker::Lock`. To get a lock in the first place, you have to call `IoContext::run()` to enter the isolate. But this step should be respecting the
input lock, that is, it should wait until `ensureConstructedImpl()`'s `inputLock` has been destroyed, so that it can take its own lock, before proceeding.
● You're absolutely right - that's the key insight. This is a deeper issue than I initially thought.
Based on your explanation, let's reconsider what might be happening. Since WorkerEntrypoint::request goes through IoContext::run(), which should indeed be respecting the input lock, the fact
that we're seeing "actor still initializing" suggests the following possibilities:
1. Race condition in the state change: The actor state is being changed to Initializing before actually acquiring the input lock. Looking at the code:
void Worker::Actor::ensureConstructed(IoContext& context) {
KJ_IF_SOME(info, impl->classInstance.tryGet<ActorClassInfo*>()) {
context.addWaitUntil(ensureConstructedImpl(context, *info));
impl->classInstance = Impl::Initializing();
}
}
Notice how the class instance is set to Initializing() immediately after adding ensureConstructedImpl() to the wait-until tasks, but before the task has actually executed and acquired the
input lock. This creates a window where the actor is marked as "initializing" but the input lock hasn't been taken yet.
2. Input lock release during coroutine: As discussed, the co_await for container status might be releasing the input lock temporarily, even though conceptually it shouldn't.
3. Lock-free access to the actor state: The error is occurring when calling getHandler() which checks the actor's state. It's possible there's a code path that's accessing the actor state
without properly acquiring the needed locks or waiting for initialization to complete.
The most likely fix would be to move the state change to Initializing inside the ensureConstructedImpl() function, after the input lock has been acquired:
kj::Promise<void> Worker::Actor::ensureConstructedImpl(IoContext& context, ActorClassInfo& info) {
InputGate::Lock inputLock = co_await impl->inputGate.wait();
// Set the state to initializing only after acquiring the lock
impl->classInstance = Impl::Initializing();
bool containerRunning = false;
// Rest of the function...
}
This would ensure the state transition and lock acquisition happen atomically, closing the window where the actor could be marked as initializing but not yet have the input lock.
This is a subtle concurrency issue where the order of operations is critical. The error message is correct - it should indeed be impossible for this state to be reached, but the sequencing of
state changes and lock acquisition appears to be the culprit.
> Hmm... Regarding point 1, there actually isn't a race condition, for a subtle reason: although `inputGate.wait()` completes asynchronously, it *synchronously* adds the caller to the wait queue.
So anything else that tries to take the input lock starting after that point will have to wait for `ensureConstructedImpl()` to acquire and subsequently release its own lock. Setting
`impl->classInstance = Impl::Initializing()` synchronously is actually what we want here, as setting it asynchronously would create a different race condition: if a second call to
`ensureConstructed()` arrived before the first call had finished acquiring its lock, then the second call would initiate a second concurrent attempt to initialize the actor, probably leading to
corruption. This is quite subtle -- could you add a comment explaining this for the benefit of future readers?
● You're absolutely right about that subtlety - it's a great insight. Here's a comment I'd recommend adding to explain this behavior:
void Worker::Actor::ensureConstructed(IoContext& context) {
KJ_IF_SOME(info, impl->classInstance.tryGet<ActorClassInfo*>()) {
// IMPORTANT: We need to set the state to "Initializing" synchronously, before
// ensureConstructedImpl() actually executes and acquires the input lock.
// This prevents multiple concurrent initialization attempts if multiple calls to
// ensureConstructed() arrive back-to-back.
//
// This doesn't create a race condition with getHandler() because InputGate::wait()
// synchronously adds the caller to the wait queue, even though it completes
// asynchronously. Any call to getHandler() that arrives after this point will
// have to wait for the input lock, which is only acquired and released by
// ensureConstructedImpl() when it completes initialization.
//
// So the "actor still initializing" error in getHandler() should be impossible
// unless a code path is bypassing the input lock mechanism.
context.addWaitUntil(ensureConstructedImpl(context, *info));
impl->classInstance = Impl::Initializing();
}
}
This comment clearly explains:
1. Why we set the state synchronously
2. The potential race condition we're avoiding
3. Why this isn't a race condition with getHandler()
4. When the error message seen might actually occur
This will help future developers understand this subtle concurrency behavior and make it easier to debug if similar issues arise in the future.
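If it helps future readers further, the wait-queue property the comment relies on can be demonstrated with a tiny standalone sketch (hypothetical, not the real InputGate API): wait() synchronously enqueues the caller even though the lock is only granted later, so arrival order determines execution order.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Minimal sketch of an input gate: wait() synchronously records the waiter,
// then invokes its callback only once the lock is actually free.
class MiniInputGate {
public:
  void wait(std::function<void()> onAcquired) {
    waiters.push(std::move(onAcquired));  // synchronous enqueue: order is fixed here
    pump();
  }
  void release() {
    assert(held);
    held = false;
    pump();
  }

private:
  void pump() {
    if (!held && !waiters.empty()) {
      held = true;
      auto cb = std::move(waiters.front());
      waiters.pop();
      cb();
    }
  }
  bool held = false;
  std::queue<std::function<void()>> waiters;
};

// A second caller that calls wait() after the first cannot run before it,
// even though the first holder only finishes "asynchronously" (via release()).
std::vector<int> runGateScenario() {
  std::vector<int> order;
  MiniInputGate gate;
  gate.wait([&] { order.push_back(1); });  // first caller acquires immediately
  gate.wait([&] { order.push_back(2); });  // second caller queued behind it
  gate.release();                          // first caller done; second runs
  return order;
}
```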
> That looks great. Please go ahead and add the comment.
● Read(file_path: "src/workerd/io/worker.c++", offset: 3446, limit: 20)…
⎿ } else {
impl->classInstance = Impl::NoClass();
}
... (+17 lines)
● Update(file_path: src/workerd/io/worker.c++)…
⎿ Updated src/workerd/io/worker.c++ with 13 additions
3450
3451 void Worker::Actor::ensureConstructed(IoContext& context) {
3452 KJ_IF_SOME(info, impl->classInstance.tryGet<ActorClassInfo*>()) {
3453 // IMPORTANT: We need to set the state to "Initializing" synchronously, before
3454 // ensureConstructedImpl() actually executes and acquires the input lock.
3455 // This prevents multiple concurrent initialization attempts if multiple calls to
3456 // ensureConstructed() arrive back-to-back.
3457 //
3458 // This doesn't create a race condition with getHandler() because InputGate::wait()
3459 // synchronously adds the caller to the wait queue, even though it completes
3460 // asynchronously. Any call to getHandler() that arrives after this point will
3461 // have to wait for the input lock, which is only acquired and released by
3462 // ensureConstructedImpl() when it completes initialization.
3463 //
3464 // So the "actor still initializing" error in getHandler() should be impossible
3465 // unless a code path is bypassing the input lock mechanism.
3466 context.addWaitUntil(ensureConstructedImpl(context, *info));
3467 impl->classInstance = Impl::Initializing();
3468 }
● I've added the detailed comment that explains the subtlety of the concurrency mechanism here. The comment clarifies:
1. The importance of setting the state to "Initializing" synchronously
2. How this prevents multiple concurrent initialization attempts
3. Why this doesn't create a race condition with getHandler()
4. That InputGate::wait() synchronously adds the caller to the wait queue
5. How the "actor still initializing" error should be impossible unless something is bypassing the input lock
This should help future developers understand this subtle interaction and provide context for investigating if similar issues occur.
Given this information, the original error you're seeing likely indicates that there's indeed something bypassing the input lock mechanism when accessing the actor. The stack trace suggests
it could be related to how WorkerEntrypoint::request interacts with the actor system, possibly for container-enabled Durable Objects.
> Going back to your point 2: input locks are designed to be held asynchronously, so it's not a problem to hold them across a `co_await`. The lock will only be released when the `inputLock`
object is actually destroyed. I wonder if there's some way that could be happening here, without completing the initialization?
● That's a key insight about input locks. If they're designed to be held asynchronously across co_await, then the lock should be maintained throughout the coroutine unless explicitly released.
Let's consider what could cause the input lock to be released prematurely without completing initialization:
1. Exception handling: If an exception occurs in the container RPC but is caught and handled somewhere else in the stack, it might trigger early destruction of inputLock without properly
completing the initialization or restoring the actor state.
2. Cancellation: If the promise returned by ensureConstructedImpl() is canceled before completion (maybe due to timeout or context shutdown), the coroutine might be terminated abruptly,
releasing the input lock without changing the actor state back from Initializing.
3. Task ordering: The context.addWaitUntil(ensureConstructedImpl(...)) might be getting canceled or deprioritized due to container interactions.
From the stack trace, we can see that the error is originating from:
workerd/io/worker-entrypoint.c++:330:14 workerd::(...)::WorkerEntrypoint::request(...)
Let's look at the ensureConstructedImpl code again, specifically focusing on how the RPC to check container status is handled:
kj::Promise<void> Worker::Actor::ensureConstructedImpl(IoContext& context, ActorClassInfo& info) {
InputGate::Lock inputLock = co_await impl->inputGate.wait();
bool containerRunning = false;
KJ_IF_SOME(c, impl->container) {
// We need to do an RPC to check if the container is running.
auto status = co_await c.statusRequest(capnp::MessageSize{4, 0}).send();
containerRunning = status.getRunning();
}
co_await context
.run(
[this, &info, containerRunning](Worker::Lock& lock) {
// ... [constructor execution logic] ...
impl->classInstance = kj::mv(handler);
}, kj::mv(inputLock))
.catch_([this](kj::Exception&& e) {
// ... [error handling logic] ...
impl->classInstance = kj::mv(e);
});
}
The catch_ block handles errors from context.run(), but if an exception occurs in the RPC call before that point, the coroutine would be terminated without changing the actor state from
Initializing.
This suggests a possible fix:
1. Add a KJ_ON_SCOPE_FAILURE block at the start of ensureConstructedImpl() that sets the actor state back to the original ActorClassInfo if an exception escapes:
kj::Promise<void> Worker::Actor::ensureConstructedImpl(IoContext& context, ActorClassInfo& info) {
// If an exception escapes this scope, make sure we restore the actor state
KJ_ON_SCOPE_FAILURE({
auto lockedSelf = context.getActorOrThrow();
// We need to check if it's still in Initializing state to avoid overwriting
// a potentially successful initialization from another thread
KJ_IF_SOME(_, lockedSelf.impl->classInstance.tryGet<Impl::Initializing>()) {
lockedSelf.impl->classInstance = &info;
}
});
InputGate::Lock inputLock = co_await impl->inputGate.wait();
// ... rest of the function
}
The issue might be that an exception or cancellation is causing the coroutine to terminate without properly handling the actor state, leaving it stuck in the Initializing state. The fix would
be to ensure that any premature termination of the coroutine properly restores the actor state.
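For illustration, here's a portable sketch of the scope-failure idea in plain C++ (not the real KJ_ON_SCOPE_FAILURE macro, which fires on exception propagation; this version is approximated with an explicit disarm on the success path, and all names are hypothetical):

```cpp
#include <stdexcept>
#include <string>

// Runs a cleanup callback unless the scope completed successfully and
// disarmed the guard -- so an exception leaves the state restored.
template <typename F>
class ScopeFailureGuard {
public:
  explicit ScopeFailureGuard(F f): func(std::move(f)) {}
  ~ScopeFailureGuard() { if (armed) func(); }
  void disarm() { armed = false; }  // call once initialization succeeds

private:
  F func;
  bool armed = true;
};

// Models initialization: state goes to "initializing", and the guard restores
// it if the (hypothetical) container RPC throws before completion, so the
// actor is never left stuck in the initializing state.
std::string initialize(bool rpcFails) {
  std::string state = "uninitialized";
  try {
    state = "initializing";
    auto restore = [&state] { state = "uninitialized"; };
    ScopeFailureGuard<decltype(restore)> guard(restore);
    if (rpcFails) throw std::runtime_error("container RPC failed");
    state = "ready";
    guard.disarm();  // initialization completed; keep the new state
  } catch (...) {
    // In the real code the exception would be recorded for the caller; the
    // guard has already restored the state by the time we get here.
  }
  return state;
}
```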
> Great insight: We have a catch block below, but it doesn't properly cover the new container-related RPC. Instead of your solution, though, I think what I'd suggest is that we convert the
`.catch_()` into a try/catch block that wraps everything after the input lock is acquired. Note one weird quirk about try/catch in KJ: You could use `catch (...)`, and then call
`kj::getCaughtExceptionAsKj()` to get the `kj::Exception` that was caught (KJ has various handling for other exception types). Can you make this change?
✻ Thinking…
You make a great point about converting the .catch_() block into a try/catch block that covers everything after the input lock acquisition, including the container RPC. This would ensure that any
exceptions from the container RPC are properly handled and the actor state is correctly updated.
Let's look at how to implement this change. Here's what we need to do:
1. Replace the .catch_() with a try/catch block that wraps everything after the input lock acquisition
2. Use catch (...) and kj::getCaughtExceptionAsKj() to handle all exception types
3. Make sure the actor state is correctly updated in the catch block
The current code looks something like:
kj::Promise<void> Worker::Actor::ensureConstructedImpl(IoContext& context, ActorClassInfo& info) {
InputGate::Lock inputLock = co_await impl->inputGate.wait();
bool containerRunning = false;
KJ_IF_SOME(c, impl->container) {
// We need to do an RPC to check if the container is running.
auto status = co_await c.statusRequest(capnp::MessageSize{4, 0}).send();
containerRunning = status.getRunning();
}
co_await context
.run(
[this, &info, containerRunning](Worker::Lock& lock) {
// ... [constructor execution logic] ...
impl->classInstance = kj::mv(handler);
}, kj::mv(inputLock))
.catch_([this](kj::Exception&& e) {
// ... [error handling logic] ...
impl->classInstance = kj::mv(e);
});
}
I'll need to modify the code to look like:
kj::Promise<void> Worker::Actor::ensureConstructedImpl(IoContext& context, ActorClassInfo& info) {
InputGate::Lock inputLock = co_await impl->inputGate.wait();
try {
bool containerRunning = false;
KJ_IF_SOME(c, impl->container) {
// We need to do an RPC to check if the container is running.
auto status = co_await c.statusRequest(capnp::MessageSize{4, 0}).send();
containerRunning = status.getRunning();
}
co_await context.run(
[this, &info, containerRunning](Worker::Lock& lock) {
// ... [constructor execution logic] ...
impl->classInstance = kj::mv(handler);
}, kj::mv(inputLock));
} catch (...) {
// Get the KJ exception
auto e = kj::getCaughtExceptionAsKj();
auto msg = e.getDescription();
if (!msg.startsWith("broken."_kj) && !msg.startsWith("remote.broken."_kj)) {
// If we already set up a brokenness reason, we shouldn't override it.
auto description = jsg::annotateBroken(msg, "broken.constructorFailed");
e.setDescription(kj::mv(description));
}
impl->constructorFailedPaf.fulfiller->reject(kj::cp(e));
impl->classInstance = kj::mv(e);
}
}
This will ensure that all exceptions, including those from the container RPC, are properly caught and handled.
Now, let's see what the actual code looks like so I can make the correct changes:
● I'll implement that change to ensure any exceptions from the container RPC are properly caught and handled.
● Read(file_path: "src/workerd/io/worker.c++", offset: 3457, limit: 55)…
⎿ //
// This doesn't create a race condition with getHandler() because InputGate::wait()
// synchronously adds the caller to the wait queue, even though it completes
... (+52 lines)
● Read(file_path: "src/workerd/io/worker.c++", offset: 3510, limit: 20)…
⎿ }, kj::mv(inputLock))
.catch_([this](kj::Exception&& e) {
auto msg = e.getDescription();
... (+17 lines)
● Now let's modify the code to ensure the container RPC is properly wrapped in the try/catch block:
● Update(file_path: src/workerd/io/worker.c++)…
⎿ Updated src/workerd/io/worker.c++ with 39 additions and 37 removals
3471 kj::Promise<void> Worker::Actor::ensureConstructedImpl(IoContext& context, ActorClassInfo& info) {
3472 InputGate::Lock inputLock = co_await impl->inputGate.wait();
3473
3474 bool containerRunning = false;
3475 KJ_IF_SOME(c, impl->container) {
3476 // We need to do an RPC to check if the container is running.
3477 // TODO(perf): It would be nice if we could have started this RPC earlier, e.g. in parallel
3478 // with starting the script, and also if we could save the status across hibernations. But
3479 // that would require some refactoring, and this RPC should (eventally) be local, so it's
3480 // not a huge deal.
3481 auto status = co_await c.statusRequest(capnp::MessageSize{4, 0}).send();
3482 containerRunning = status.getRunning();
3483 }
3474 try {
3475 bool containerRunning = false;
3476 KJ_IF_SOME(c, impl->container) {
3477 // We need to do an RPC to check if the container is running.
3478 // TODO(perf): It would be nice if we could have started this RPC earlier, e.g. in parallel
3479 // with starting the script, and also if we could save the status across hibernations. But
3480 // that would require some refactoring, and this RPC should (eventally) be local, so it's
3481 // not a huge deal.
3482 auto status = co_await c.statusRequest(capnp::MessageSize{4, 0}).send();
3483 containerRunning = status.getRunning();
3484 }
3485
3486 co_await context
3487 .run(
3488 [this, &info, containerRunning](Worker::Lock& lock) {
3489 jsg::Lock& js = lock;
3486 co_await context
3487 .run(
3488 [this, &info, containerRunning](Worker::Lock& lock) {
3489 jsg::Lock& js = lock;
3490
3491 kj::Maybe<jsg::Ref<api::DurableObjectStorage>> storage;
3492 KJ_IF_SOME(c, impl->actorCache) {
3493 storage = impl->makeStorage(lock, worker->getIsolate().getApi(), *c);
3494 }
3495 auto handler = info.cls(lock,
3496 jsg::alloc<api::DurableObjectState>(cloneId(),
3497 jsg::JsRef<jsg::JsValue>(
3498 js, KJ_ASSERT_NONNULL(lock.getWorker().impl->ctxExports).addRef(js)),
3499 kj::mv(storage), kj::mv(impl->container), containerRunning),
3500 KJ_ASSERT_NONNULL(lock.getWorker().impl->env).addRef(js));
3491 kj::Maybe<jsg::Ref<api::DurableObjectStorage>> storage;
3492 KJ_IF_SOME(c, impl->actorCache) {
3493 storage = impl->makeStorage(lock, worker->getIsolate().getApi(), *c);
3494 }
3495 auto handler = info.cls(lock,
3496 jsg::alloc<api::DurableObjectState>(cloneId(),
3497 jsg::JsRef<jsg::JsValue>(
3498 js, KJ_ASSERT_NONNULL(lock.getWorker().impl->ctxExports).addRef(js)),
3499 kj::mv(storage), kj::mv(impl->container), containerRunning),
3500 KJ_ASSERT_NONNULL(lock.getWorker().impl->env).addRef(js));
3501
3502 // HACK: We set handler.env to undefined because we already passed the real env into the
3503 // constructor, and we want the handler methods to act like they take just one parameter.
3504 // We do the same for handler.ctx, as ExecutionContext related tasks are performed
3505 // on the actor's state field instead.
3506 handler.env = js.v8Ref(js.v8Undefined());
3507 handler.ctx = kj::none;
3508 handler.missingSuperclass = info.missingSuperclass;
3502 // HACK: We set handler.env to undefined because we already passed the real env into the
3503 // constructor, and we want the handler methods to act like they take just one parameter.
3504 // We do the same for handler.ctx, as ExecutionContext related tasks are performed
3505 // on the actor's state field instead.
3506 handler.env = js.v8Ref(js.v8Undefined());
3507 handler.ctx = kj::none;
3508 handler.missingSuperclass = info.missingSuperclass;
3509
3510 impl->classInstance = kj::mv(handler);
3511 }, kj::mv(inputLock))
3512 .catch_([this](kj::Exception&& e) {
3510 impl->classInstance = kj::mv(handler);
3511 }, kj::mv(inputLock));
3512 } catch (...) {
3513 // Get the KJ exception
3514 auto e = kj::getCaughtExceptionAsKj();
3515
3516 auto msg = e.getDescription();
3517
3517 if (!msg.startsWith("broken."_kj) && !msg.startsWith("remote.broken."_kj)) {
3518 // If we already set up a brokenness reason, we shouldn't override it.
3519
3519 auto description = jsg::annotateBroken(msg, "broken.constructorFailed");
3520 e.setDescription(kj::mv(description));
3521 }
3522
3523 impl->constructorFailedPaf.fulfiller->reject(kj::cp(e));
3524 impl->classInstance = kj::mv(e);
3525 });
3525 }
3526 }
3527
3528 Worker::Actor::~Actor() noexcept(false) {
● I've made the changes to wrap the entire function body after acquiring the input lock in a try/catch block. This ensures that:
1. Any exceptions from the container RPC check will be properly caught and handled
2. The actor state will be correctly updated to the error state if any part of the initialization fails, including the container RPC
3. The proper error message formatting and rejection of the constructor promise will happen
The key difference from the previous implementation is that now the catch block will handle exceptions from:
- The container RPC status check
- The actual actor class constructor execution
- Any other operations that might happen during initialization
This should prevent the actor from getting stuck in the "Initializing" state if an exception occurs during the container status check, which would explain the "actor still initializing when
getHandler() called" error that was observed.
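The shape of that change, one try/catch covering everything after the lock with a catch (...) that converts the in-flight exception, can be sketched in plain C++ (std::exception and std::current_exception stand in for kj::Exception and kj::getCaughtExceptionAsKj(); all names here are illustrative, not workerd's):

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Converts whatever exception is currently in flight to a description,
// loosely analogous to what kj::getCaughtExceptionAsKj() does for KJ.
std::string describeCaught() {
  try {
    std::rethrow_exception(std::current_exception());
  } catch (const std::exception& e) {
    return e.what();
  } catch (...) {
    return "unknown exception";
  }
}

// One catch (...) covers both the (hypothetical) container status check and
// the constructor, and a brokenness prefix is added only if the message
// doesn't already carry one -- mirroring the pattern in the diff above.
std::string initActor(bool containerFails, bool ctorFails) {
  try {
    if (containerFails) throw std::runtime_error("broken.container: status RPC failed");
    if (ctorFails) throw std::runtime_error("constructor threw");
    return "ok";
  } catch (...) {
    std::string msg = describeCaught();
    if (msg.rfind("broken.", 0) != 0) {
      // No brokenness reason set yet; annotate as a constructor failure.
      msg = "broken.constructorFailed: " + msg;
    }
    return msg;
  }
}
```

Either failure path now lands in the same handler, so the actor's state can always be moved out of Initializing.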
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ > │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Press Ctrl-D again to exit Context low (31% remaining) · Run /compact to compact & continue
Total cost: $2.76
Total duration (API): 3m 22.7s
Total duration (wall): 58m 34.4s
Total code changes: 52 lines added, 37 lines removed