Troubleshooting & Support · February 19, 2026 · 9 min read

FiveM Discord refund bot readiness checks with health probes and canary cases

Build readiness checks, health probes, and canary cases so your FiveM Discord refund bot fails safely and recovers fast.

Refund automation in a FiveM community usually touches the most sensitive parts of your Discord: tickets, staff roles, logs, and sometimes payout or inventory workflows. When the bot is “up” but cannot actually do its job (missing permissions, broken webhooks, rate-limited APIs, or a stalled worker), you get silent failures and frustrated players. Readiness checks, health probes, and canary cases give you early warning and a safe way to validate changes before they impact real refund requests.

“Reliability is a community feature: if staff can’t trust the tools, players won’t trust the process.”
Common ops principle for community moderation teams

Define what “ready” means for a refund bot

A process can be online while still being unready. In Discord terms, your bot might be connected to the gateway but unable to create a ticket thread, post to a log channel, or assign a “Refund Approved” role. Start by defining readiness as “the bot can complete the minimum refund workflow end-to-end in this guild.” That definition should map to concrete checks you can run automatically.

  • Guild access: the bot can fetch the target guild and required channels by ID (ticket channel/category, log channel, staff review channel).
  • Permissions: the bot has View Channel, Send Messages, Embed Links, Attach Files (if you upload evidence), Create Private Threads and Send Messages in Threads (if tickets are threads), Manage Threads (if the bot locks or archives them), and Manage Roles (only if you assign refund-related roles).
  • Slash commands: core commands (for example /refund create, /refund status, /refund close) are registered and responding within an acceptable time.
  • Storage: the bot can read/write its datastore (SQLite/Postgres/Redis) and can acquire any required locks for job workers.
  • Outbound integrations: webhooks for logs/audit posts are valid; any external API calls (payment provider, panel API) return expected status codes.
  • Rate limits: the bot can send at least one message to the log channel without hitting a global or per-route rate limit. (You can’t “avoid” rate limits, but you can detect when you’re already in trouble.)
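One way to turn the checklist above into something a probe can run is a small registry of named checks that returns a single report. The sketch below is illustrative only: `CheckResult`, `run_readiness`, and both check functions are hypothetical names, the datastore check uses an in-memory SQLite database as a stand-in, and the channel check is a stub that simulates a deleted channel rather than calling Discord.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""  # remediation hint for staff when the check fails


def check_datastore() -> CheckResult:
    # Lightweight round trip; in-memory SQLite stands in for the real datastore.
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute("SELECT 1").fetchone()
        return CheckResult("datastore", row == (1,))
    finally:
        conn.close()


def check_log_channel() -> CheckResult:
    # Hypothetical stub: a real check would fetch the channel by ID
    # through your Discord library. Here it simulates a deleted channel.
    raise LookupError("channel ID not found")


def run_readiness(checks: Dict[str, Callable[[], CheckResult]]) -> dict:
    """Run every check; a check that raises counts as a failure, not a crash."""
    results: List[CheckResult] = []
    for name, check in checks.items():
        try:
            results.append(check())
        except Exception as exc:
            results.append(CheckResult(name, False, f"check raised: {exc!r}"))
    failed = [r for r in results if not r.ok]
    return {"ready": not failed, "failed": [f"{r.name}: {r.detail}" for r in failed]}


report = run_readiness({"datastore": check_datastore, "log_channel": check_log_channel})
print(report)
```

Catching exceptions inside the runner keeps a broken probe from crashing the bot, and the per-check detail string gives staff something concrete to act on.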

Practical tip: split liveness vs readiness

Use two different probes. Liveness answers “should the process be restarted?” (event loop alive, no deadlocks). Readiness answers “should this instance receive real refund traffic?” (permissions, channels, database, command handlers). Mixing them causes restart loops when Discord permissions change or a channel is deleted.

Implement health probes that match Discord and FiveM realities

A good probe is fast, deterministic, and safe. Avoid probes that spam channels or create real tickets. Instead, use read-only checks where possible, and confine any write checks to a dedicated private channel if you must validate posting. If you run your bot in Docker, expose HTTP endpoints like /healthz (liveness) and /readyz (readiness). If you run it on a VPS without a reverse proxy, you can still expose a local port and have systemd or a watchdog script curl it.
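As a rough illustration of the two endpoints, the sketch below uses only Python's standard library. The `STATE` dictionary, `route`, `ProbeHandler`, and `make_probe_server` are hypothetical names, and it assumes the bot updates `STATE` from its own checks elsewhere.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared state; the bot would update this from its own checks.
STATE = {"loop_alive": True, "ready": False,
         "failed": ["log_channel: missing Embed Links"]}


def route(path: str, state: dict) -> tuple:
    """Map a probe path to (HTTP status, JSON-serializable body)."""
    if path == "/healthz":  # liveness: restart the process only if this fails
        return (200 if state["loop_alive"] else 503), {"alive": state["loop_alive"]}
    if path == "/readyz":   # readiness: hold back refund traffic if this fails
        body = {"ready": state["ready"], "failed": state["failed"]}
        return (200 if state["ready"] else 503), body
    return 404, {"error": "not found"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path, STATE)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep routine probe hits out of stderr


def make_probe_server(port: int = 8080) -> HTTPServer:
    # Bound to localhost so only a local watchdog or in-container check can reach it.
    return HTTPServer(("127.0.0.1", port), ProbeHandler)
```

Calling `make_probe_server().serve_forever()` in a background thread would let a watchdog run `curl -f http://127.0.0.1:8080/readyz`. With the sample state, /healthz answers 200 while /readyz answers 503, which is the case where a restart would not help.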

For Discord-specific readiness, validate the exact objects your workflow uses. Example: your refund flow might create a private thread under #refund-tickets, ping @Refund Staff, and log an embed to #refund-logs. Your readiness probe should fetch those channel IDs, confirm type (text channel vs forum vs category), and confirm permissions for the bot member in that channel. If you rely on interaction responses, also confirm your interaction handler is registered and not throwing errors on a trivial “ping” command.

  1. Liveness probe: check the bot process can run a short event-loop task, and that the gateway connection is in a healthy state (connected, not reconnecting continuously).
  2. Readiness probe: fetch required guild and channels by ID; fail if any are missing or are of an incompatible type (for example, a forum channel where you expect a text channel).
  3. Permission probe: compute effective permissions for the bot in each required channel (Send Messages, View Channel, Embed Links, Manage Threads, Manage Messages if you clean up).
  4. Write probe (optional): post a single-line message to a private #bot-probes channel and delete it immediately, or rotate messages with a TTL to avoid clutter.
  5. Dependency probe: run a lightweight query against your datastore (SELECT 1) and verify the job queue has an active worker if you process refunds asynchronously.
  6. Alerting: if readiness fails for more than N minutes, notify a staff-only channel (for example #tech-alerts) via a separate webhook or a secondary bot token.
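The “more than N minutes” rule in the alerting step can be sketched as a small tracker that fires once per outage instead of on every failed probe. `ReadinessAlerter` and its parameters are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from typing import Optional


class ReadinessAlerter:
    """Fire one alert when readiness has stayed red for `threshold_s` seconds."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s
        self.red_since: Optional[float] = None
        self.alerted = False

    def observe(self, ready: bool, now: float) -> bool:
        """Record one probe result; return True when an alert should be sent."""
        if ready:
            # Recovery resets the outage so the next one can alert again.
            self.red_since = None
            self.alerted = False
            return False
        if self.red_since is None:
            self.red_since = now
        if not self.alerted and now - self.red_since >= self.threshold_s:
            self.alerted = True
            return True
        return False


alerter = ReadinessAlerter(threshold_s=300)
# (seconds since start, probe result)
timeline = [(0, False), (120, False), (300, False), (360, False), (420, True), (480, False)]
fired = [t for t, ready in timeline if alerter.observe(ready, now=t)]
print(fired)  # [300]
```

Only the probe at 300 seconds triggers an alert. The failure at 480 seconds starts a new outage window and would alert again only if it also lasted five minutes.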

Practical tip: probe permissions using real role setups

Create a “Bot Test” role and a “Refund Staff” role in your staging guild. Put the bot under the same role hierarchy and channel overwrites you use in production. A probe that passes in an admin-everywhere test server can still fail in production where @everyone is denied View Channel on ticket categories.
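To see why an admin-everywhere test server hides this problem, it helps to compute effective permissions the way Discord documents it: start from the guild-level permissions, apply the @everyone overwrite, then the combined role overwrites, then the member overwrite. The sketch below uses the documented permission bit values; the function names and the sample overwrites are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Bit flags as listed in the Discord API permissions documentation.
ADMINISTRATOR = 1 << 3
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
EMBED_LINKS = 1 << 14
ALL_PERMISSIONS = (1 << 64) - 1

Overwrite = Tuple[int, int]  # (allow bits, deny bits)


def channel_permissions(base: int,
                        everyone_ow: Overwrite = (0, 0),
                        role_ows: Iterable[Overwrite] = (),
                        member_ow: Overwrite = (0, 0)) -> int:
    """Apply channel overwrites to guild-level permissions in Discord's documented order."""
    if base & ADMINISTRATOR:
        return ALL_PERMISSIONS  # administrators bypass channel overwrites
    allow, deny = everyone_ow
    perms = (base & ~deny) | allow
    role_allow = role_deny = 0
    for allow, deny in role_ows:  # overwrites for roles the bot actually holds
        role_allow |= allow
        role_deny |= deny
    perms = (perms & ~role_deny) | role_allow
    allow, deny = member_ow
    return (perms & ~deny) | allow


def missing(perms: int, required: Dict[str, int]) -> List[str]:
    return [name for name, bit in required.items() if not perms & bit]


REQUIRED = {"View Channel": VIEW_CHANNEL, "Send Messages": SEND_MESSAGES,
            "Embed Links": EMBED_LINKS}
base = VIEW_CHANNEL | SEND_MESSAGES | EMBED_LINKS

# Ticket category denies View Channel to @everyone; the bot has no overwrite of its own.
locked = channel_permissions(base, everyone_ow=(0, VIEW_CHANNEL))
# Same category after adding an allow overwrite for the bot's role.
fixed = channel_permissions(base, everyone_ow=(0, VIEW_CHANNEL), role_ows=[(VIEW_CHANNEL, 0)])
print(missing(locked, REQUIRED))  # ['View Channel']
print(missing(fixed, REQUIRED))   # []
```

With an administrator bot the function returns every permission and the probe can never fail, which is exactly the false confidence the tip warns about.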

Design canary cases that validate the refund workflow end-to-end

Health probes tell you the bot is positioned to work; canary cases tell you it actually works. A canary case is a controlled, low-impact test that exercises the same code paths as a real refund request. For refund bots, the safest approach is to run canaries in a staging Discord guild that mirrors production roles, permission overwrites, and ticket structures. If you must canary in production, isolate it to a staff-only category and label everything clearly (for example, prefix threads with “CANARY”).

A practical canary scenario: a test user triggers /refund create with a fake transaction ID, the bot opens a ticket thread, posts a checklist embed, pings @Refund Staff, and writes an audit entry to #refund-logs. Then a staff account runs /refund approve, the bot updates status, applies a “Refund Approved” role (if you use roles for tracking), and closes the ticket. You don’t need to simulate money movement; you need to validate Discord operations, state transitions, and logging.

  • Ticket creation: thread/forum post created under the correct parent channel, with correct auto-archive settings and naming convention.
  • Visibility: only the requester and staff can view the ticket (verify channel overwrites or private thread membership).
  • Staff ping: @Refund Staff mention succeeds (the role is mentionable, or the bot has the Mention @everyone, @here, and All Roles permission, and the message's allowed mentions include the role).
  • Logs: an embed with case ID, user ID, and action is posted to #refund-logs via webhook or bot message.
  • State integrity: the case moves through expected statuses (OPEN → REVIEW → APPROVED/DENIED → CLOSED) without skipping steps.
  • Cleanup: ticket is locked/archived, and any temporary roles or tags are removed.
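The state integrity check in the list above can be expressed as a transition table that the canary asserts against the recorded case history. This is a minimal sketch: the status names come from the flow described here, while `ALLOWED` and `first_invalid_transition` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Allowed next statuses for each status in the refund workflow.
ALLOWED = {
    "OPEN": {"REVIEW"},
    "REVIEW": {"APPROVED", "DENIED"},
    "APPROVED": {"CLOSED"},
    "DENIED": {"CLOSED"},
    "CLOSED": set(),
}


def first_invalid_transition(history: List[str]) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return the first (from, to) pair that breaks the workflow, or None if the history is valid."""
    if not history or history[0] != "OPEN":
        # Every case must start in OPEN.
        return (None, history[0] if history else None)
    for prev, nxt in zip(history, history[1:]):
        if nxt not in ALLOWED.get(prev, set()):
            return (prev, nxt)
    return None


print(first_invalid_transition(["OPEN", "REVIEW", "APPROVED", "CLOSED"]))  # None
print(first_invalid_transition(["OPEN", "APPROVED", "CLOSED"]))  # ('OPEN', 'APPROVED')
```

A canary run would read the status history for its test case from the datastore and fail if this function returns anything other than None.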

Operational safeguards: permissions, rate limits, and failure modes

Most refund bot incidents come from operational drift: a channel gets deleted, a category overwrite changes, a role moves above the bot in the hierarchy, or a webhook is regenerated. Bake safeguards into your readiness checks and your runtime behavior so failures are visible and contained.

Permissions are the first line. If your bot assigns roles like “Refund Pending” or “Chargeback Watch,” its highest role must sit above those roles in the Discord role hierarchy, and it needs Manage Roles. If you use threads, it needs Create Public Threads or Create Private Threads as appropriate, Send Messages in Threads, and Manage Threads to lock or archive tickets. For ticket systems built on forum channels, validate that your bot can create posts (Discord shows the Send Messages permission as Create Posts on forum channels) and that it can apply tags if your workflow uses them; moderator-only tags require Manage Threads.

Rate limiting is the second line. Refund workflows can burst when a server restarts, a wipe happens, or a payment provider has an outage. Add backoff and queueing for outbound Discord calls, and log when you hit 429s. Your readiness probe should not fail just because you are rate-limited in the moment, but it should surface “degraded” status so staff knows responses may be delayed.
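A minimal sketch of that behaviour, under a few assumptions: retries honour the `retry_after` value Discord returns with a 429 when one is available, fall back to capped exponential backoff with jitter otherwise, and readiness reports “degraded” once recent 429s cross a threshold. The function names, the default timings, and the threshold of three are all illustrative.

```python
import random
from typing import Callable, Optional


def backoff_delay(attempt: int,
                  retry_after: Optional[float] = None,
                  base: float = 0.5,
                  cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the API said how long to wait; honour it
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()  # full jitter spreads out retries after a burst


def readiness_status(checks_ok: bool, recent_429s: int, degraded_at: int = 3) -> str:
    """Rate limiting alone never makes the bot unready, only degraded."""
    if not checks_ok:
        return "not_ready"
    return "degraded" if recent_429s >= degraded_at else "ready"


print(backoff_delay(0, retry_after=1.25))  # 1.25
print(readiness_status(True, recent_429s=5))  # degraded
```

Keeping “degraded” separate from “not_ready” means a restart-driven burst delays responses without pushing the bot into its fallback path.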

Finally, define safe failure modes. If readiness fails, the bot should refuse to open new refund tickets and instead respond with a clear message directing users to a manual fallback (for example, a pinned “Refund Request Form” in #support). This prevents half-created tickets that staff can’t see. Systems like LD Refund System typically work best when you treat automation as an assistant, not the only path, so keep a documented manual process for edge cases.
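In code, the safe failure mode amounts to checking readiness before any Discord side effect happens. The handler below is a hypothetical sketch: `handle_refund_create`, the fallback wording, and `fake_open_ticket` are stand-ins for whatever your command framework and ticket logic actually look like.

```python
from typing import Callable

FALLBACK_MESSAGE = (
    "Refund tickets are temporarily unavailable. "
    "Please use the pinned Refund Request Form in #support."
)


def handle_refund_create(user_id: str,
                         is_ready: Callable[[], bool],
                         open_ticket: Callable[[str], str]) -> str:
    """Gate ticket creation on readiness so no half-created tickets are left behind."""
    if not is_ready():
        return FALLBACK_MESSAGE  # refuse before touching Discord at all
    return f"Ticket opened: {open_ticket(user_id)}"


opened = []


def fake_open_ticket(user_id: str) -> str:
    # Stand-in for the real ticket creation; records that it was called.
    opened.append(user_id)
    return f"refund-{user_id}"


print(handle_refund_create("1001", lambda: False, fake_open_ticket))
print(handle_refund_create("1001", lambda: True, fake_open_ticket))
```

The first call returns the fallback message and never reaches the ticket code; only the second call opens a ticket.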

Logging, alerting, and audit trails staff can trust

Refund disputes are sensitive. Your logs should help you answer: who requested the refund, what evidence was provided, what staff member acted, what decision was made, and when. Put operational logs and audit logs in different places. Operational logs are for debugging (timeouts, permission errors). Audit logs are for staff accountability (case status changes).

In Discord, a common pattern is: #bot-ops for errors and warnings, #refund-logs for immutable case events, and #tech-alerts for paging staff when readiness stays red. Use structured embeds with fields like Case ID, Discord User ID, FiveM identifiers (license:, discord:, steam:), Staff Actor, and Action. If you also log to disk, rotate logs and avoid storing sensitive payment data; store references (transaction hash or provider ID) instead of full details.
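For the structured embed, a plain dictionary in Discord's embed format (title, timestamp, and a list of name/value fields) is enough to send through a webhook or a bot message. The builder below is a sketch: `audit_embed` and its parameters are hypothetical, the sample identifiers are made up, and only a provider reference is stored rather than payment details.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

FIELD_VALUE_LIMIT = 1024  # Discord's documented maximum length for an embed field value


def audit_embed(case_id: str, user_id: str, identifiers: List[str],
                staff_actor: str, action: str, provider_ref: str,
                now: Optional[datetime] = None) -> Dict:
    """Build one audit event as a Discord embed object."""
    now = now or datetime.now(timezone.utc)
    raw_fields = [
        ("Case ID", case_id),
        ("Discord User ID", user_id),
        ("FiveM identifiers", "\n".join(identifiers) or "none"),
        ("Staff Actor", staff_actor),
        ("Action", action),
        ("Provider reference", provider_ref),  # a reference only, never card or payment data
    ]
    return {
        "title": f"Refund case {case_id}: {action}",
        "timestamp": now.isoformat(),
        "fields": [
            {"name": name, "value": value[:FIELD_VALUE_LIMIT], "inline": True}
            for name, value in raw_fields
        ],
    }


embed = audit_embed(
    case_id="R-0042",
    user_id="1001",
    identifiers=["license:abc123", "discord:1001"],  # made-up sample values
    staff_actor="2002",
    action="APPROVED",
    provider_ref="txn_example",
    now=datetime(2026, 2, 19, 12, 0, tzinfo=timezone.utc),
)
print(embed["title"])
```

Logging user and staff IDs rather than display names keeps the trail stable when someone changes their nickname.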

Alerting should be actionable. Don’t just post “ready=false.” Post the failing check and the remediation hint: “Missing permission: Embed Links in #refund-logs” or “Channel ID not found: 123… (was #refund-tickets deleted?)”. If you run multiple instances, include instance name and version so you can correlate with deployments.

Rollouts and troubleshooting playbook (what to do when checks fail)

Treat bot changes like you treat FiveM resource updates: stage, canary, then roll out. Tag releases, keep a changelog, and make sure staff knows what “normal” looks like. A small permissions change can break ticket creation just as easily as a bad resource manifest can break a server start.

  1. If readiness fails, freeze new cases: disable /refund create or route it to a manual fallback message.
  2. Check Discord drift first: confirm channel IDs, role hierarchy, and overwrites for the ticket category and log channel.
  3. Validate webhooks: ensure the webhook URL in config is current and the channel still exists; rotate secrets if leaked.
  4. Inspect rate limits and API errors: look for 429s, missing intents, or interaction timeouts in #bot-ops.
  5. Run the canary suite in staging: confirm ticket creation, staff ping, log embed, status transitions, and closure.
  6. Only then redeploy: roll back to the last known good version if canaries fail after a change.

If you’re using a packaged refund workflow like LD Refund System, apply the same discipline: keep staging and production configs separate, run readiness probes against the exact guild/channel IDs in each environment, and schedule a weekly canary to catch permission drift before players do. The goal is not more automation; it’s fewer surprises.

Tags: Troubleshooting & Support, FiveM, Discord, Bots, Observability, Operations
