Company - Inero Software - Software Consulting

Keycloak Deployment Auditing – General Scope and Guidelines

Andrzej Chybicki — Mon, 29 Dec 2025 10:08:56 +0000


Practical lessons from auditing multi-realm, multi-client Keycloak environments in medium and large organizations

1. Introduction

In medium and large enterprises, Keycloak deployments rarely follow a simple “one realm – one application” pattern. In reality, such environments typically consist of multiple realms reflecting organizational structures, environments, or business domains, alongside dozens or even hundreds of client applications.

These clients often include web frontends, backend services, machine-to-machine integrations, and legacy systems, all maintained by different teams with varying levels of IAM expertise. As a result, Identity and Access Management quickly becomes a shared responsibility rather than a centrally controlled component.

“A Keycloak audit is not about verifying settings in the admin console — it is about understanding how identity, applications, and security decisions interact at scale.”

The primary goal of a Keycloak deployment audit is therefore not to “find flaws in Keycloak itself”, but to assess whether the entire authentication and authorization ecosystem is secure, coherent, and aligned with modern OAuth 2.1 and OpenID Connect best practices.

From our experience auditing complex enterprise IAM landscapes, a comprehensive Keycloak security audit focuses on three complementary objectives:

- evaluating the configuration of the Keycloak Authorization Server,
- reviewing how client applications integrate with Keycloak,
- identifying security risks emerging from the interaction between both sides.

This holistic approach is essential, as many real-world security issues do not stem from a single misconfiguration, but from subtle inconsistencies across multiple realms, clients, and applications.

OVERALL RISK SEVERITY (ORS) MODEL

Impact \ Likelihood	LOW	MEDIUM	HIGH
HIGH	Medium	High	Critical
MEDIUM	Low	Medium	High
LOW	Note	Low	Medium

To prioritize findings in a meaningful and actionable way, audit results are typically classified using a risk-based approach inspired by OWASP methodologies. Each finding is evaluated as a combination of:

- likelihood of exploitation,
- potential impact on confidentiality, integrity, and availability.

This allows organizations to distinguish between:

- critical risks with immediate business impact,
- medium and low risks related to configuration hardening and attack surface reduction,
- best-practice recommendations aimed at long-term security maturity.
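When there are many findings, it helps to encode the ORS matrix directly in the audit tooling so that ratings stay consistent. The following is a minimal Python sketch. The function names and sample findings are hypothetical, and the ratings are copied from the cells of the ORS matrix.

```python
# Overall Risk Severity (ORS) lookup: ORS_MATRIX[impact][likelihood],
# with cell values copied from the ORS model matrix.
ORS_MATRIX = {
    "HIGH":   {"LOW": "Medium", "MEDIUM": "High",   "HIGH": "Critical"},
    "MEDIUM": {"LOW": "Low",    "MEDIUM": "Medium", "HIGH": "High"},
    "LOW":    {"LOW": "Note",   "MEDIUM": "Low",    "HIGH": "Medium"},
}

# Order used to sort a findings report from most to least severe.
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low", "Note"]


def overall_risk_severity(likelihood: str, impact: str) -> str:
    """Return the ORS rating for a finding from its likelihood and impact levels."""
    try:
        return ORS_MATRIX[impact.upper()][likelihood.upper()]
    except KeyError:
        raise ValueError(f"unknown level: likelihood={likelihood!r}, impact={impact!r}") from None


def prioritize(findings):
    """Sort (title, likelihood, impact) tuples by ORS rating, most severe first."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(overall_risk_severity(f[1], f[2])))


# Hypothetical example findings.
report = prioritize([
    ("Unused service account left enabled", "LOW", "MEDIUM"),
    ("API endpoint accepts unvalidated tokens", "HIGH", "HIGH"),
])
```

Keeping the matrix as data rather than as nested `if` statements means the audit report and the methodology document cannot drift apart.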

Keycloak-side audit – known patterns, real-world consequences

Configuration aspects of Keycloak itself are well documented and widely discussed in official documentation and community guidelines. Nevertheless, real-world audits of large-scale deployments consistently reveal recurring issues such as:

- lack of regular realm key rotation,
- missing client secret rotation,
- overly permissive redirect URIs and web origins,
- unused but enabled service accounts,
- globally enabled “full scope allowed” settings,
- deprecated direct access grants left active,
- missing or inconsistent enforcement of PKCE.

While these topics are well known, they are worth revisiting from an operational perspective. In large, multi-realm Keycloak deployments, even seemingly minor configuration oversights can accumulate and significantly increase the overall attack surface.
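Several of the recurring Keycloak-side issues listed above can be screened automatically from a realm export. The sketch below checks a subset of them. The field names (`redirectUris`, `webOrigins`, `fullScopeAllowed`, `directAccessGrantsEnabled`, `standardFlowEnabled` and the `pkce.code.challenge.method` client attribute) reflect Keycloak's client representation as we understand it, so verify them against the export format of your Keycloak version. Key rotation, secret rotation and unused service accounts depend on runtime data, and this sketch does not cover them.

```python
def audit_clients(realm_export: dict) -> list:
    """Return (clientId, issue) pairs for common client misconfigurations in a realm export.

    Missing boolean fields are treated as False; a full export normally includes them.
    """
    issues = []
    for client in realm_export.get("clients", []):
        cid = client.get("clientId", "<unknown>")
        if not client.get("enabled", True):
            continue  # disabled clients cannot be used to obtain tokens
        # Wildcards in redirect URIs or web origins widen the attack surface.
        if any("*" in uri for uri in client.get("redirectUris", [])):
            issues.append((cid, "wildcard redirect URI"))
        if "*" in client.get("webOrigins", []):
            issues.append((cid, "wildcard web origin"))
        if client.get("fullScopeAllowed", False):
            issues.append((cid, "full scope allowed"))
        if client.get("directAccessGrantsEnabled", False):
            issues.append((cid, "direct access grants enabled"))
        # PKCE enforcement is stored as a client attribute in the export.
        pkce = client.get("attributes", {}).get("pkce.code.challenge.method")
        if client.get("standardFlowEnabled", False) and pkce != "S256":
            issues.append((cid, "PKCE (S256) not enforced"))
    return issues


# Hypothetical excerpt of a realm export.
export = {"clients": [
    {"clientId": "legacy-portal", "enabled": True,
     "redirectUris": ["https://portal.example.com/*"], "webOrigins": ["*"],
     "fullScopeAllowed": True, "directAccessGrantsEnabled": True,
     "standardFlowEnabled": True, "attributes": {}},
    {"clientId": "spa", "enabled": True,
     "redirectUris": ["https://spa.example.com/callback"],
     "webOrigins": ["https://spa.example.com"], "fullScopeAllowed": False,
     "directAccessGrantsEnabled": False, "standardFlowEnabled": True,
     "attributes": {"pkce.code.challenge.method": "S256"}},
]}
issues = audit_clients(export)
```

In practice, you would load the export file with `json.load` and rate each reported issue with the ORS model. Built-in clients (for example `admin-cli`) will also appear in the output and need to be triaged separately.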

2. Client-side audit – where the highest risks emerge

From a security standpoint, the most sensitive and often underestimated part of a Keycloak deployment is the client application layer. Even a well-configured Authorization Server cannot compensate for insecure client-side implementations.

“In real-world Keycloak deployments, the most critical security risks rarely originate in the IAM platform itself — they emerge at the client application layer.”

In practice, the most severe findings during Keycloak audits are almost always related to how applications consume tokens, validate authentication state, and handle sensitive data after a successful login.

Missing token validation in client applications

One of the most critical issues observed in enterprise environments is incomplete or missing access token validation on the application side. This includes scenarios where:

- endpoints do not verify authentication at all,
- token signatures or claims are not fully validated,
- authorization checks are inconsistently applied across APIs.

Such vulnerabilities effectively bypass Keycloak entirely, allowing attackers to interact directly with application endpoints without compromising the IAM platform itself.
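The following sketch shows the claim checks an API should perform on every request. It assumes the token signature has already been verified with a maintained JOSE/JWT library against the realm's public keys, which Keycloak publishes at `/realms/{realm}/protocol/openid-connect/certs`. Only the claim checks are shown here. The issuer URL, audience name and leeway are placeholders. In Keycloak, the API's identifier typically appears in `aud` only when an audience mapper is configured. Role- or scope-based authorization checks still have to be applied on top of these checks.

```python
import time
from typing import Optional


class TokenValidationError(Exception):
    """Raised when an access token's claims are not acceptable for this API."""


def validate_access_token_claims(claims: dict, issuer: str, audience: str,
                                 now: Optional[float] = None, leeway: int = 30) -> dict:
    """Check iss, aud, exp and nbf on a payload whose signature is already verified."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise TokenValidationError("unexpected issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise TokenValidationError("token was not issued for this API")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + leeway:
        raise TokenValidationError("token expired or has no exp claim")
    nbf = claims.get("nbf")
    if isinstance(nbf, (int, float)) and now + leeway < nbf:
        raise TokenValidationError("token not yet valid")
    return claims


# Placeholder issuer in Keycloak's /realms/{realm} form and a hypothetical API audience.
ISSUER = "https://sso.example.com/realms/acme"
sample = {"iss": ISSUER, "aud": ["orders-api", "account"], "exp": 2000, "nbf": 1000}
```

Rejecting a token with an exception, instead of returning a boolean, makes it harder for a calling endpoint to ignore the result.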

Insecure token storage and handling

Another high-impact issue involves improper handling of access tokens within client applications. Common anti-patterns include:

- storing tokens in cookies without Secure or HttpOnly flags,
- persisting tokens in local or session storage,
- sharing tokens across application components in a durable form.

In browser-based applications, these practices dramatically increase exposure to XSS attacks and session hijacking. From an architectural perspective, this is an application design flaw rather than a Keycloak configuration issue.
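One way to keep durable tokens out of the browser is the backend-for-frontend pattern. Tokens stay on the server, and the browser holds only an opaque session identifier in a hardened cookie. The sketch below builds such a cookie with Python's standard `http.cookies` module (the `samesite` attribute requires Python 3.8+). The cookie name and lifetime are assumptions.

```python
import secrets
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str, max_age: int = 1800) -> str:
    """Build a Set-Cookie value for an opaque, server-side session identifier."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    morsel = cookie["session"]
    morsel["path"] = "/"
    morsel["max-age"] = max_age
    morsel["secure"] = True      # only sent over HTTPS
    morsel["httponly"] = True    # not readable from JavaScript, so XSS cannot read it
    morsel["samesite"] = "Lax"   # not sent on cross-site subrequests
    return morsel.OutputString()


# The session id maps to tokens stored server-side; it is not itself a token.
header_value = session_cookie_header(secrets.token_urlsafe(32))
```

`SameSite=Lax` still lets the top-level redirect back from Keycloak carry the cookie. `Strict` can break that step of the login flow.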

Token transmission via URLs

Despite being widely discouraged, access tokens are still occasionally transmitted through URL query parameters or redirects, especially in legacy systems. This practice poses a severe security risk, as tokens may be exposed through:

- browser history,
- server and proxy logs,
- monitoring and analytics tools,
- third-party integrations.

In multi-application IAM environments, such leakage can have cascading effects across multiple systems.
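During an audit, you can scan proxy and access logs for tokens that have leaked into URLs, and check client code for the safer alternative: sending tokens in the `Authorization` header. The Python sketch below covers both tasks. The list of parameter names is an assumption covering common OAuth 2.0/OIDC names, and the URLs are placeholders. The short-lived authorization `code` is expected in the redirect query by design and is not flagged.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.request import Request

# Common OAuth 2.0 / OIDC parameter names that carry tokens (assumed list).
TOKEN_PARAMS = {"access_token", "id_token", "refresh_token"}


def token_params_in_url(url: str) -> list:
    """Names of token-carrying parameters found in a URL's query string or fragment."""
    parts = urlsplit(url)
    found = set()
    for component in (parts.query, parts.fragment):
        found |= TOKEN_PARAMS & set(parse_qs(component))
    return sorted(found)


def authorized_request(url: str, access_token: str) -> Request:
    """Build an API request that sends the token in the Authorization header."""
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

Running `token_params_in_url` over each logged request line gives a quick inventory of the applications that still pass tokens in URLs.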

Incomplete PKCE or nonce support

Some client applications technically use the Authorization Code Flow, but fail to:

- properly implement PKCE,
- validate nonce values,
- or explicitly enforce secure defaults in client libraries.

In complex deployments with numerous redirect paths and client types, this significantly increases the risk of authorization code injection attacks, even when Keycloak itself is correctly configured.
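On the client side, PKCE and nonce handling take only a few lines; the risk lies in leaving them out. The sketch below follows RFC 7636 (S256 method):

- The `code_verifier` is kept in the user's session.
- `code_challenge` and `code_challenge_method=S256` go into the authorization request.
- The verifier is sent with the token request.
- The nonce check compares the value stored before the redirect with the `nonce` claim of the verified ID token.

Function names are illustrative, and in production a maintained OIDC client library should perform these steps.

```python
import base64
import hashlib
import hmac
import secrets


def new_code_verifier() -> str:
    """Random code_verifier: 64 random bytes -> 86 base64url chars (RFC 7636 allows 43-128)."""
    return secrets.token_urlsafe(64)


def code_challenge_s256(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def new_nonce() -> str:
    """Random nonce to send in the authorization request and keep in the session."""
    return secrets.token_urlsafe(32)


def nonce_matches(expected_nonce: str, id_token_claims: dict) -> bool:
    """True if the verified ID token carries the nonce stored before the redirect."""
    received = id_token_claims.get("nonce")
    if not isinstance(received, str):
        return False
    # Constant-time comparison on bytes (compare_digest rejects non-ASCII str input).
    return hmac.compare_digest(expected_nonce.encode("utf-8"), received.encode("utf-8"))
```

On the Keycloak side, you can set the client's PKCE method to S256 so that authorization requests without a code challenge are rejected.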

Missing security headers and improper cookie configuration

Finally, many audited applications lack basic web security hardening measures such as:

- Content-Security-Policy (CSP),
- HTTP Strict Transport Security (HSTS),
- properly configured SameSite cookie attributes.

These controls are not managed by Keycloak, yet they play a crucial role in protecting authentication flows and user sessions at the application level.
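A quick way to check these controls during an audit is to inspect each application's response headers. The sketch below uses example baseline values that are assumptions, not recommendations for any particular application; a real CSP has to be tailored to the application's scripts and assets. It also simplifies headers to a dict, which can hold only one `Set-Cookie` value.

```python
# Example baseline values (assumptions): tighten or relax per application.
BASELINE_HEADERS = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'",
}


def missing_security_controls(headers: dict) -> list:
    """List baseline headers and cookie attributes absent from a response."""
    present = {name.lower(): value for name, value in headers.items()}
    missing = [name for name in BASELINE_HEADERS if name.lower() not in present]
    cookie = present.get("set-cookie", "")
    if cookie and "samesite=" not in cookie.lower():
        missing.append("SameSite attribute on Set-Cookie")
    return missing


def add_baseline_headers(headers: dict) -> dict:
    """Return a copy of the headers with any missing baseline header filled in."""
    merged = dict(headers)
    for name, value in BASELINE_HEADERS.items():
        if not any(existing.lower() == name.lower() for existing in merged):
            merged[name] = value
    return merged
```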

Summary

Auditing a Keycloak deployment in an enterprise environment requires looking far beyond realm and client configuration screens. While proper Keycloak hardening is essential, the highest security risks typically arise from insecure client-side implementations and architectural decisions.

“Keycloak can be hardened perfectly, yet the overall security posture will always be defined by the weakest client application integrated with it.”

Based on practical audit experience in large, multi-realm Keycloak environments:

- the most critical vulnerabilities emerge at the intersection of Keycloak and client applications,
- correct IAM configuration does not mitigate insecure application behavior,
- many high-impact issues can be resolved without changes to Keycloak itself, by improving application architecture and integration patterns.

A well-executed Keycloak security audit helps organizations reduce attack surface, standardize IAM integrations, and safely scale their identity infrastructure across teams, environments, and business units.

In large organizations, Keycloak effectively becomes the backbone of digital identity — and its real security strength is determined by the weakest link in the surrounding application ecosystem.

The article Keycloak Deployment Auditing – General Scope and Guidelines originally appeared on the Inero Software - Software Consulting website.

Implementing an AI-Powered Telephony Service Center with ElevenLabs & LiveAPI

Andrzej Chybicki — Mon, 17 Nov 2025 11:18:27 +0000


Over the past year, advancements in real-time AI models and high‑fidelity speech synthesis have accelerated the development of AI-driven telephony systems. At Inero, we’ve had the opportunity to integrate modern telephony solutions with LiveAPI technology and ElevenLabs’ voice engine to create a human‑like, responsive, GDPR‑compliant communication experience for a major corporate client.

This article combines two perspectives: a high-level overview of LiveAPI and ElevenLabs technology, and a behind‑the‑scenes look at our practical engineering experience while delivering a real-world AI telephony solution.

1. What Makes LiveAPI and ElevenLabs a Powerful Combination?

LiveAPI solutions such as OpenAI Realtime API and Google Gemini Live API shift the paradigm from static prompts to streaming, interactive communication. These systems support real‑time audio input, low‑latency responses, natural interrupt handling, and multimodal context.

ElevenLabs complements this with industry‑leading voice synthesis. Its realistic, expressive voices and advanced prosody control enable AI agents that sound convincingly human. For telephony environments, this matters — clients expect clarity, confidence, and a pleasant conversational tone.

Figure: How user audio flows through LiveAPI and ElevenLabs TTS to create real-time voice responses.

2. Why GDPR Compliance Shapes the Choice of API in Europe

For European organisations, GDPR compliance is not optional — it defines which AI vendors can be used in production. Although both OpenAI and Google offer real-time APIs, enterprises operating in the EU often restrict use to providers ensuring transparent, EU‑aligned data governance. In practice, this means that Gemini Live API was the viable choice for our implementation, while OpenAI was excluded despite strong technical capabilities.

3. Our Practical Experience Integrating Telephony with LiveAPI and ElevenLabs

Below we outline the key lessons, challenges, and engineering decisions from our implementation.

3.1 Project Context

Our client — a large corporate organisation — required a system capable of handling outbound and inbound calls automatically, while maintaining a tone and responsiveness extremely close to human interaction. The goal was not a simple IVR or menu system, but a natural, fully conversational experience driven by real‑time AI.

3.2 Technology Stack and Constraints

We evaluated both OpenAI and Gemini Live APIs to compare latency, contextual reasoning and streaming quality. However, due to GDPR compliance requirements, the production system was designed around Gemini Live API. ElevenLabs provided the speech synthesis layer, offering high realism and consistent quality across telephony channels.

Figure: Processing pipeline from telephony audio through LiveAPI to ElevenLabs TTS in an AI-powered voice system.

3.3 Key Engineering Challenges

Beyond typical engineering concerns like audio quality, session stability, and call routing, the most demanding challenge was not purely technical — it was understanding how real users communicate over the phone. Subtle behaviors such as interruptions, hesitation, changing tone, or switching context required careful analysis and extensive testing.

We also dealt with several micro‑issues, such as premature call termination, incorrect end‑of‑utterance detection, and managing the timing between user speech and AI responses.
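End-of-utterance detection is a trade-off between responsiveness and cutting callers off mid-thought. Real-time voice APIs generally ship their own voice activity detection. The sketch below is a deliberately simplified, energy-based stand-in, not the approach used in this project. It exposes the two tuning knobs involved: the speech energy threshold and the trailing-silence window. All parameter values are assumptions.

```python
import math
import struct
from dataclasses import dataclass


def frame_rms(pcm16: bytes) -> float:
    """RMS level of a little-endian 16-bit PCM frame, normalised to 0.0-1.0."""
    n = len(pcm16) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm16[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n) / 32768


@dataclass
class Endpointer:
    """Energy-based end-of-utterance detector fed with one RMS value per frame."""
    frame_ms: int = 20               # duration of one audio frame
    energy_threshold: float = 0.02   # RMS level treated as speech (assumed; tune per line)
    silence_ms: int = 700            # trailing silence that closes an utterance (assumed)
    in_speech: bool = False
    silent_for_ms: int = 0

    def push(self, rms: float) -> bool:
        """Consume one frame; return True exactly when an utterance has just ended."""
        if rms >= self.energy_threshold:
            self.in_speech = True
            self.silent_for_ms = 0
            return False
        if not self.in_speech:
            return False  # caller has not started speaking yet
        self.silent_for_ms += self.frame_ms
        if self.silent_for_ms >= self.silence_ms:
            # Pause long enough: hand the turn to the AI agent.
            self.in_speech = False
            self.silent_for_ms = 0
            return True
        return False  # short hesitation: keep listening
```

A shorter `silence_ms` makes the agent answer faster but misreads hesitations as finished turns. A longer one avoids interrupting callers at the cost of noticeable response latency.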

3.4 What We Built Ourselves

AI models are inherently non‑deterministic and cannot be fully controlled like classic software components. To ensure predictable and business‑aligned outcomes, we developed backend modules responsible for:
- Conversation flow supervision
- Session state tracking
- Monitoring and logging voice interactions
- Handling edge cases and ambiguous user inputs

ElevenLabs’ tooling, especially the Hard Disk service, significantly supported our workflow, but the orchestration layer was built entirely by Inero.
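A supervision layer of this kind can be pictured as a deterministic state machine wrapped around the model. The model may propose the next step, but the backend accepts only transitions the business flow allows. Repeated ambiguous input leads to escalation rather than an endless loop. The states, the transitions and the rule "escalate after more than two unclear turns" below are hypothetical, chosen for illustration. This is not Inero's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class CallState(Enum):
    GREETING = auto()
    IDENTIFY = auto()
    HANDLE_REQUEST = auto()
    CONFIRM = auto()
    ESCALATE = auto()
    ENDED = auto()


# Hypothetical allowed transitions; anything else proposed by the model is ignored.
TRANSITIONS = {
    CallState.GREETING: {CallState.IDENTIFY, CallState.ESCALATE, CallState.ENDED},
    CallState.IDENTIFY: {CallState.HANDLE_REQUEST, CallState.ESCALATE, CallState.ENDED},
    CallState.HANDLE_REQUEST: {CallState.CONFIRM, CallState.ESCALATE, CallState.ENDED},
    CallState.CONFIRM: {CallState.HANDLE_REQUEST, CallState.ESCALATE, CallState.ENDED},
    CallState.ESCALATE: {CallState.ENDED},
    CallState.ENDED: set(),
}


@dataclass
class CallSession:
    call_id: str
    state: CallState = CallState.GREETING
    history: list = field(default_factory=list)  # accepted (from, to) transitions
    max_unclear_turns: int = 2
    unclear_turns: int = 0

    def transition(self, proposed: CallState) -> CallState:
        """Apply a state change proposed by the model only if the flow allows it."""
        if proposed in TRANSITIONS[self.state]:
            self.history.append((self.state, proposed))
            self.state = proposed
        return self.state

    def record_unclear_input(self) -> CallState:
        """Count an ambiguous user turn; escalate once the limit is exceeded."""
        self.unclear_turns += 1
        if self.unclear_turns > self.max_unclear_turns:
            return self.transition(CallState.ESCALATE)
        return self.state
```

The accepted-transition history doubles as an audit trail for monitoring and logging.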

3.5 What We Learned

The most important insight: designing a telephony AI system requires deep understanding of the user’s context, combined with the business objectives of the project. Quick prototyping and iterative PoC testing were essential — allowing us to validate conversational patterns early, reveal unexpected user behavior, and refine the interaction design.

Ultimately, success depended on aligning the AI’s conversational style with how real customers naturally speak, pause, and respond during a phone call.

4. GDPR Considerations in AI Telephony

All audio handling, session storage, and logging were designed according to GDPR principles: strict data minimisation, no training on user audio, encrypted transmission, and optional anonymisation of transcriptions. Where possible, processing was routed through EU‑aligned infrastructure.
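Optional anonymisation of transcriptions can start with simple pattern-based redaction before transcripts are stored or logged. The sketch below is illustrative only and is not the mechanism used in the project described here. Its regular expressions catch common email and phone-number formats. They will miss names, addresses and other personal data, which in practice call for dedicated PII-detection tooling.

```python
import re

# Illustrative patterns only; they cover common formats, not all personal data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d \-]{7,}\d")  # at least 9 digits/separators, digit at both ends


def anonymise_transcript(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)  # emails first, so their digits are not read as phones
    return PHONE.sub("[PHONE]", text)
```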

Conclusion

Implementing an AI‑driven telephony service center requires more than connecting APIs — it requires understanding users, managing nuanced conversational flows, and ensuring full compliance with EU regulations. Our experience shows that LiveAPI technologies combined with ElevenLabs can deliver highly human‑like, responsive, and scalable communication channels for enterprise clients.

The article Implementing an AI-Powered Telephony Service Center with ElevenLabs & LiveAPI originally appeared on the Inero Software - Software Consulting website.

Secure Email Delivery in Keycloak 26.2 Using XOAUTH2

Andrzej Chybicki — Mon, 15 Sep 2025 10:48:22 +0000


Email is one of the oldest and most fundamental services on the internet, used for notifications, password resets, verifications, and more. Over time we've seen major improvements (encryption via TLS, then STARTTLS), and now many providers are moving away from basic password authentication in favor of modern token-based schemes like XOAUTH2. With Keycloak 26.2, this evolution has arrived: Keycloak now supports XOAUTH2 for outgoing SMTP mail, adding greater security and compatibility with providers that have deprecated legacy authentication.

1. What is XOAUTH2, and Why It Matters

XOAUTH2 is a means of authenticating to an SMTP (or other email-sending) server using an OAuth2 access token rather than a username and password. Key benefits include:

- Improved Security: Tokens can be more tightly controlled, with limited scope and lifetime.
- Compliance with Modern Providers: Many providers are disabling basic auth.
- Centralised and Auditable Auth: Easier management and rotation. Each client’s access can be revoked independently of other clients’ operations.
- Reduced Risk of Credential Leakage: No raw passwords stored or transmitted.
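On the wire, XOAUTH2 replaces the password with a SASL initial response that carries the bearer token. Keycloak 26.2 builds this response internally. The Python sketch below reproduces the exchange, which can help you check outside Keycloak that a token obtained for the mailbox is actually accepted. The `user=...\x01auth=Bearer ...\x01\x01` format follows the XOAUTH2 mechanism as documented by Google and Microsoft. Host, port and function names are assumptions for illustration.

```python
import smtplib
import ssl
from email.message import EmailMessage


def xoauth2_initial_response(user: str, access_token: str) -> str:
    """Raw SASL XOAUTH2 initial response; smtplib base64-encodes it before sending."""
    return f"user={user}\x01auth=Bearer {access_token}\x01\x01"


def send_with_xoauth2(host: str, port: int, user: str, access_token: str,
                      message: EmailMessage) -> None:
    """Authenticate with XOAUTH2 over STARTTLS and send one message."""
    def auth_callback(challenge=None):
        # First call (no challenge): send the token. If the server rejects it, it answers
        # with a 334 challenge containing error details; reply empty to finish the exchange.
        return xoauth2_initial_response(user, access_token) if challenge is None else ""

    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=ssl.create_default_context())  # verify the server certificate
        smtp.ehlo()
        smtp.auth("XOAUTH2", auth_callback)
        smtp.send_message(message)
```

For Exchange Online, the client submission endpoint is typically `smtp.office365.com` on port 587 with STARTTLS.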

2. How XOAUTH2 is Implemented in Keycloak 26.2

With version 26.2, Keycloak adds native support for XOAUTH2 when sending emails via SMTP. This means administrators can move away from static username and password credentials and instead configure Keycloak to obtain an OAuth2 access token at runtime.

In the Admin Console under Realm → Realm Settings → Email, you can now switch the Authentication Type from Password to Token (XOAUTH2). Once enabled, additional fields appear where you provide:

- Client ID and Client Secret from your identity provider (e.g., Azure AD).
- The OAuth2 Token Endpoint used to request an access token.
- Optional Scopes, depending on your provider (for Microsoft 365: https://outlook.office365.com/.default).
- A From address / SMTP username, which may still be required by the mail server.

Keycloak then handles the process of requesting and refreshing tokens using the Client Credentials Grant flow. You can use the “Test connection” button to verify that the configuration is correct and that emails can be sent successfully.
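The Client Credentials Grant that Keycloak performs before sending mail is a plain form-encoded POST to the token endpoint. The sketch below shows the shape of that request in Python. The Microsoft token URL pattern and the scope correspond to the Microsoft 365 example above. `TENANT_ID` and the client values are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def client_credentials_request(token_endpoint: str, client_id: str,
                               client_secret: str, scope: str) -> Request:
    """Build the form-encoded POST for the OAuth 2.0 Client Credentials Grant."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(token_endpoint, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def fetch_access_token(request: Request, timeout: float = 10.0) -> str:
    """Send the request and return the access_token field of the JSON response."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]


# Placeholders: replace TENANT_ID and the client values with your own app registration.
req = client_credentials_request(
    "https://login.microsoftonline.com/TENANT_ID/oauth2/v2.0/token",
    client_id="my-client-id",
    client_secret="my-client-secret",
    scope="https://outlook.office365.com/.default",
)
```

If `fetch_access_token` fails here, the problem lies in the identity provider configuration rather than in Keycloak or SMTP, which narrows down the troubleshooting.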

This approach aligns Keycloak with modern security standards and prepares deployments for providers that are phasing out legacy authentication.

Note: The Enable Debug SMTP option (visible at the bottom of the form) activates extended logging for outgoing email. When enabled, Keycloak produces detailed debug output of the SMTP communication, which can be very useful for diagnosing integration issues such as authentication failures, token retrieval problems, or TLS misconfigurations. It is recommended to use this setting only in testing or troubleshooting scenarios, as it may expose sensitive information in the logs.


3. Why This Matters for Microsoft Azure / Office 365 Users

Microsoft has announced the retirement of Basic Authentication for SMTP AUTH (Client Submission) in Exchange Online. Starting March 1, 2026, Microsoft will begin phasing out Basic Auth, and by April 30, 2026, it will be completely disabled. This change directly impacts Keycloak deployments where outgoing emails are sent via Office 365 / Exchange Online SMTP.

If your Keycloak instance is still configured with a username and password for SMTP, it will stop working once Basic Auth is retired. The solution is to run Keycloak 26.2 or later and switch the email configuration to XOAUTH2.

By adopting XOAUTH2, you ensure:

- Continued compatibility with Microsoft email services
- Stronger security and compliance
- Reduced risk compared to static credentials

4. Beyond XOAUTH2?

There's even more going on in modern email delivery. Many email delivery platforms are moving away from the traditional SMTP protocol toward API-based approaches (e.g. Mailjet, SendGrid or Mailgun). This gives integrators more flexibility and allows platform providers to offer additional features. API-based email sending is not yet supported by Keycloak out of the box, but this support can be added via custom extensions. Contact us if you are interested in integrating Keycloak with API-based email delivery platforms.

Conclusion

The addition of XOAUTH2 support in Keycloak 26.2 is more than just a feature upgrade — it’s an essential step for organizations that rely on Office 365, Gmail, or other providers who are deprecating legacy authentication. By adopting XOAUTH2 today, you can future-proof your Keycloak deployment, comply with provider requirements, and improve overall email security.

The article Secure Email Delivery in Keycloak 26.2 Using XOAUTH2 originally appeared on the Inero Software - Software Consulting website.

Keycloak or SaaS IdP? A Tech Leader’s Guide to Making the Right IAM Choice

Andrzej Chybicki — Thu, 24 Jul 2025 07:45:53 +0000

Introduction

Shipping single sign‑on quickly is tempting. Stakeholders push for a smooth login experience, developers want to move on to core features, and security teams are eager to tick the “MFA enabled” box. The trouble is that identity and access management (IAM) decisions outlive launch days. Once you choose a platform, you inherit its operational model, cost structure and compliance implications for years.

In this blog post we outline a few crucial issues technical leads should weigh when evaluating Keycloak, an open‑source IAM platform that has become a go‑to choice in many Java and cloud‑native environments. Rather than a hands‑on tutorial, you'll get a decision framework that starts with business realities. We'll walk through seven questions that determine whether Keycloak fits your context. For each, you'll see why it matters, how to assess it, the red flags to watch for, and a concrete deliverable to capture the outcome.

By the end, you'll fill in a short scorecard and see whether your organization leans toward Keycloak, a commercial SaaS IdP (Auth0, Okta, Azure AD B2C, etc.), or a hybrid path. If you want a sanity check, we offer a free 45‑minute Keycloak readiness consultation: no slides, just practical advice.

Where Keycloak Lives in Your Stack

Keycloak usually sits between your user‑facing applications and the identity sources they rely on. Applications delegate authentication and authorization to Keycloak. Keycloak can manage users internally or federate with LDAP/Active Directory. It also exposes logs and metrics that feed your SIEM and monitoring stack. Even if this picture seems obvious to engineers, spelling it out helps align legal, compliance and product stakeholders on who owns what.

Keycloak in a Nutshell (and Two Misconceptions)

Keycloak is an open‑source IAM server supporting OIDC, SAML, MFA, theming and an extension model (SPIs). Originally developed by Red Hat, it now has a large and active community.

Misconception #1: “Open source = free to run”. The software is free, but production IAM also needs infrastructure, monitoring, upgrades and people. Misconception #2: “It’s just for developers”. In reality, without governance and processes, any IAM platform becomes a liability.

Seven Questions to Frame the Decision

Treat these questions as a workshop agenda, not a checklist. Bring security, operations, product and finance to the same table. The goal is to leave each session with an artifact that informs budgeting, architecture and planning.

1. Compliance & Risk: Do You Need Full Control Over IAM?

Regulatory frameworks such as NIS2 or GDPR, and sector standards like PCI DSS or HIPAA, often demand demonstrable control over identities, audit trails and incident response. If auditors expect you to produce detailed logs or prove exactly who changed what, a black‑box SaaS can create friction.

List the controls and evidence you must provide. Do you need to host IAM in a specific region? How quickly must you produce logs? Are you required to approve every policy change?

If many answers point to tight control, Keycloak’s self‑hosted nature becomes an advantage.

The biggest red flag is deferring compliance: “we’ll pass audits later”. Another is that nobody owns IAM data retention or log policies.

Deliverable: a compliance checklist mapped to IAM features and governance processes.

2. Integration Map: How Many Apps and Protocols Today—and in Two Years?

Integration effort, not software choice, usually drives project cost and timeline. Keycloak handles OIDC/SAML/LDAP well, but legacy systems and partner constraints can complicate the picture.

Inventory every application that authenticates users. Classify by protocol, business criticality and migration difficulty. Project changes over the next 24 months: new products, acquisitions, vendor switches.

Red flags: no authoritative app inventory; underestimating testing for each integration.

Deliverable: a prioritized integration backlog with rough sizing.

3. Team & Operations Capacity: Can You Secure and Run It 24/7 (or Outsource)?

IAM outages stop business. Someone must patch, monitor, respond to incidents and plan upgrades. Decide whether your DevOps/SecOps team can own this or you’ll offload parts to a partner.

Assess on‑call capacity, automation maturity and security expertise. Define SLAs and RTO/RPO. Consider managed services for routine ops while retaining architectural control.

Red flags: a single overworked DevOps, lack of monitoring/alerting, no upgrade plan.

Deliverable: a RACI matrix for IAM operations and an initial ops budget.

4. Customization Needs: Themes, Extensions and Advanced Authorization

Keycloak’s extensibility is a major advantage: custom authenticators, advanced policies (ABAC), branded login flows, phishing‑resistant UX. If differentiation or strict UX/security is a requirement, that flexibility matters.

Gauge UX/theming demands, multilingual support, accessibility, device trust, passkeys and fine‑grained authorization. Each of these requirements favors an extensible platform.

Red flags: assuming the default theme is enough; ignoring SPI development complexity.

Deliverable: a customization backlog with effort estimates and ownership.

5. Scalability & High Availability: What Are Your Peak Loads and DR Needs?

If login fails, revenue stops. HA/DR design impacts infrastructure cost and complexity. You need clarity on peaks, acceptable downtime and recovery objectives.

Estimate peak concurrent logins (launch days, campaigns). Define RTO/RPO. Choose between VM clusters or Kubernetes with an operator. Decide on multi‑region strategies.

Red flags: “we’ll scale later”, skipping DR tests entirely.

Deliverable: an HA/DR architecture option matrix with pros, cons and cost.

6. Budget & TCO: What Does Three Years Really Cost vs SaaS?

Keycloak costs = infra + people + consulting. SaaS costs = subscriptions + add‑ons + overage fees.

Only a 3‑year TCO model reveals the truth.

Build a spreadsheet covering infra, backups, monitoring, labor, upgrades. Do the same for SaaS: MAU fees, advanced features, support tiers. Stress‑test both with growth scenarios.
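A TCO comparison can start as a few lines of code before it grows into a spreadsheet. The sketch below makes two simplifying assumptions:

- Keycloak costs (infra, people, consulting) are flat per year.
- The SaaS plan consists of a yearly subscription that covers a fixed number of monthly active users (MAU), plus add-ons and a per-MAU overage fee.

Every figure in the example is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeycloakCosts:
    infra_per_year: float       # hosting, backups, monitoring
    people_per_year: float      # share of DevOps/SecOps time
    consulting_per_year: float  # external support, upgrade assistance


@dataclass
class SaasCosts:
    subscription_per_year: float
    addons_per_year: float          # e.g. advanced MFA, higher support tier
    included_mau: int               # MAU covered by the subscription
    overage_per_mau_month: float    # fee for each MAU above the included amount


def tco(mau_start: float, yearly_growth: float, kc: KeycloakCosts,
        saas: SaasCosts, years: int = 3) -> dict:
    """Cumulative cost of both options over `years`, growing MAU once per year."""
    totals = {"keycloak": 0.0, "saas": 0.0}
    mau = mau_start
    for _ in range(years):
        totals["keycloak"] += kc.infra_per_year + kc.people_per_year + kc.consulting_per_year
        overage_mau = max(0.0, mau - saas.included_mau)
        totals["saas"] += (saas.subscription_per_year + saas.addons_per_year
                           + overage_mau * saas.overage_per_mau_month * 12)
        mau *= 1 + yearly_growth
    return totals


kc = KeycloakCosts(infra_per_year=20_000, people_per_year=40_000, consulting_per_year=5_000)
saas = SaasCosts(subscription_per_year=50_000, addons_per_year=10_000,
                 included_mau=15_000, overage_per_mau_month=0.10)
moderate = tco(10_000, 0.5, kc, saas)
aggressive = tco(10_000, 1.0, kc, saas)
```

With these made-up inputs, SaaS comes out cheaper at 50% yearly growth (189,000 vs 195,000). At 100% growth it reaches 216,000, because overage fees compound. Stress-testing means rerunning the same function with different growth rates.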

Red flags: ignoring people costs, assuming maintenance is free, overlooking SaaS overage triggers.

Deliverable: a TCO calculator you can keep updating as data changes.

7. Vendor Lock‑In & Roadmap Control: How Much Flexibility Do You Need?

Open source gives you architectural freedom. SaaS gives you speed but ties you to someone else’s roadmap and pricing. Sometimes that’s fine; sometimes it’s a risk.

Map likely IAM needs for 2–3 years. How critical is it to add custom flows quickly or hold back an upgrade? Could pricing shifts hurt you?

Red flag: “we’ll never need to extend”. Organizations evolve and regulations shift.

Deliverable: a risk matrix—flexibility/control vs speed/convenience—plotting Keycloak, SaaS and Hybrid for your case.

A Visual Decision Flow

If your team prefers a diagram to spark discussion, start with the simplified flow below. It nudges you toward Keycloak, SaaS or Hybrid based on the dominant answers. Use it as an icebreaker, not as a final verdict.

Quantify It: The Scorecard

To make debates objective, translate the seven questions into numbers. Give each one a score from 1 to 5 (5 means a strong push toward Keycloak). Totals near the high end suggest Keycloak or Hybrid; lower totals suggest SaaS. More important than the number is the conversation it forces: why did we give compliance a 5 but ops capacity a 2?
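The scorecard arithmetic is simple, but writing it down forces the team to agree on cut-off points. The article does not fix thresholds. The values below are hypothetical and should be set in the workshop: a total of 28 or more leans Keycloak, under 18 leans SaaS, and anything in between leans Hybrid, out of a possible 7 to 35.

```python
QUESTIONS = (
    "Compliance & Risk",
    "Integration Map",
    "Team & Operations Capacity",
    "Customization Needs",
    "Scalability & High Availability",
    "Budget & TCO",
    "Vendor Lock-In & Roadmap Control",
)


def scorecard(scores: dict, keycloak_from: int = 28, saas_below: int = 18):
    """Total the 1-5 scores and return (total, leaning, questions scored 1-2)."""
    for q in QUESTIONS:
        if q not in scores:
            raise ValueError(f"missing score for {q!r}")
        if not 1 <= scores[q] <= 5:
            raise ValueError(f"score for {q!r} must be between 1 and 5")
    total = sum(scores[q] for q in QUESTIONS)
    if total >= keycloak_from:
        leaning = "Keycloak"
    elif total >= saas_below:
        leaning = "Hybrid"
    else:
        leaning = "SaaS"
    # Low scores are the discussion starters: why does this answer pull toward SaaS?
    weak_spots = [q for q in QUESTIONS if scores[q] <= 2]
    return total, leaning, weak_spots
```

For example, scoring compliance 5, operations capacity 2 and everything else 3 gives `(22, "Hybrid", ["Team & Operations Capacity"])`. That weak spot is exactly the conversation the scorecard is meant to start.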

Question	Score (1–5)	Notes	Leaning
Compliance & Risk: Do You Need Full Control Over IAM?
Integration Map: How Many Apps and Protocols Today—and in Two Years?
Team & Operations Capacity: Can You Secure and Run It 24/7 (or Outsource)?
Customization Needs: Themes, Extensions and Advanced Authorization
Scalability & High Availability: What Are Your Peak Loads and DR Needs?
Budget & TCO: What Does Three Years Really Cost vs SaaS?
Vendor Lock‑In & Roadmap Control: How Much Flexibility Do You Need?
Total / Recommendation			Keycloak / SaaS / Hybrid

From Decision to Deployment: A Pragmatic Pipeline

Assuming Keycloak is the direction, you still need a process to avoid chaos. We recommend a pipeline that mirrors proven delivery patterns: Discovery → Assessment → Architecture → PoC → Pilot → Production → Operate. Each phase ends with a clear artifact and go/no‑go gate.

Discovery clarifies drivers, constraints and stakeholders. Without this, technical work drifts. Assessment inventories integrations and compliance needs, and identifies risks and skill gaps. Architecture produces the reference design, HA/DR plan and governance model. PoC attacks the riskiest assumptions first, often a tricky integration or compliance requirement. Pilot rolls out to a subset of apps/users to validate processes, comms and support.

Production rollout happens in phases with rollback strategies (blue/green, canary).

Operate means continuous monitoring, patching, upgrades and cost optimization—often where a partner can help your team breathe.

Next Steps

If your scorecard favors Keycloak, schedule a Discovery & Governance workshop to align stakeholders, draft a high‑level architecture and turn assumptions into a roadmap. If you’re unsure, run a PoC targeting the top two risks. And if SaaS seems better today, design an exit strategy anyway—lock‑in is fine when it’s deliberate, not accidental.

Ready to Validate Your Choice?

Book a free 45‑minute Keycloak Readiness Consultation. We’ll go through the seven questions together, fill out the scorecard and outline concrete next steps—whether that’s an internal PoC, a hybrid approach or a full advisory engagement.

FAQ

Is Keycloak free to use in production?

Yes. The software is open source, but production‑grade IAM still requires infrastructure, operations and security work. Some organizations use managed Keycloak or a consulting partner to offload that burden.

How long does a typical Keycloak deployment take?

A focused PoC can be done in weeks. Larger rollouts with dozens of integrations and strict compliance tend to span several months from assessment to stable production.

Can Keycloak meet NIS2/GDPR requirements?

Technically yes—Keycloak offers detailed logging, fine‑grained policies and MFA, and can be hosted where you need it. Compliance still depends on governance and evidence, not just tool capabilities.

The article “Keycloak or SaaS IdP? A Tech Leader’s Guide to Making the Right IAM Choice” originally appeared on the Inero Software - Software Consulting website.

Is Your Company Ready for New Technology? How to Evaluate Technological Readiness

Marta Kuprasz — Mon, 26 May 2025 09:01:34 +0000

In the publication “Intelligent Agents in AI Really Can Work Alone. Here’s How.” by Gartner, the authors predict that by 2028, 33% of business applications will use agentic artificial intelligence, compared to less than 1% in 2024. This development will enable autonomous decision-making for up to 15% of daily operational tasks. Is your company prepared to take advantage of these changes?

Technological readiness is a broad concept that encompasses IT infrastructure, team capabilities, the maturity of business processes, organizational readiness for change, and compliance with legal and security policies. Companies planning to implement ERP systems, identity and access management (IAM) tools, or AI-based solutions need a strong technical and operational foundation to succeed.

How to Assess an Enterprise’s Technological Readiness?

To accurately determine an enterprise’s technological readiness, it’s worth conducting an assessment across several key areas. This process is similar to an audit, helping to answer a crucial question: is the company prepared to successfully implement and make use of a new technology?

IT Infrastructure

The assessment of IT infrastructure should begin with a clear definition of the requirements of the new technology. These requirements determine which resources will be necessary — in terms of performance, architecture, security, and availability.

Only once you have a clear technological specification can you reliably assess whether your current IT environment is capable of meeting those requirements. If you’re considering introducing AI-based tools into your organization and want to understand the exact costs associated with deploying and maintaining a large language model, be sure to check out our latest analysis.

LLM Implementation and Maintenance Costs for Businesses: A Detailed Breakdown

Managing and Accessing Data

The assessment of data management should begin by identifying what data will be used by the new technology, in what format, how frequently, and from which sources. It’s the system’s requirements that define what data is needed and in what form it must be available.

Only based on this can you determine whether the data within the organization is ready for use. It’s essential to verify whether there are technical means for retrieving data from sources (e.g., APIs, data exchange files), whether the data has a consistent structure, and whether it meets minimum quality standards. Inconsistencies, duplicates, incomplete records, or unstructured data may require an additional processing stage before the data can be effectively used.
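A first pass at these quality checks can be automated. The sketch below assumes records have already been retrieved (for example, through an API or a data exchange file) as a list of Python dictionaries. It counts duplicate records by a key field and records with missing required fields. The field names (`id`, `name`, `tax_id`) are hypothetical examples.

```python
from collections import Counter
from typing import Dict, Iterable, List

def data_quality_report(records: List[Dict], key: str,
                        required_fields: Iterable[str]) -> Dict[str, int]:
    """Count duplicates (by `key`) and incomplete records (empty required fields)."""
    required = list(required_fields)
    key_counts = Counter(r.get(key) for r in records)
    # Every occurrence beyond the first one of a key counts as a duplicate.
    duplicates = sum(count - 1 for count in key_counts.values() if count > 1)
    incomplete = sum(
        1 for r in records
        if any(r.get(field) in (None, "") for field in required)
    )
    return {"total": len(records), "duplicates": duplicates, "incomplete": incomplete}

rows = [
    {"id": 1, "name": "Acme", "tax_id": "PL123"},
    {"id": 2, "name": "Beta", "tax_id": ""},
    {"id": 1, "name": "Acme", "tax_id": "PL123"},
]
print(data_quality_report(rows, key="id", required_fields=["name", "tax_id"]))
# {'total': 3, 'duplicates': 1, 'incomplete': 1}
```

Numbers like these help decide whether an additional cleaning stage is needed before the new system can use the data.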

Organizational accessibility is equally important — data must be available not only technically, but also in accordance with internal policies and regulations. For sensitive data, it’s crucial to ensure that access complies with security policies and legal frameworks such as the GDPR.

If the planned technology involves integration of multiple sources, real-time analysis, or processing of large volumes of data, it may be necessary to prepare a dedicated integration layer or reorganize the company’s data management approach.

Team Preparation and Training

Assessing team readiness should start with an analysis of the competencies required to operate the new technology. Depending on the solution, this may involve both technical skills (e.g., system configuration, data analysis) and operational knowledge (e.g., understanding business processes, interpreting system outputs).

Only once roles and responsibilities in the new environment are clearly defined can you assess whether the team has the necessary qualifications or if additional training is needed. This might involve upskilling through training sessions, involving external experts, or securing temporary support from the technology provider.

Special attention should be given to those responsible for maintaining and developing the system — they need early access to information about the architecture, data model, failure scenarios, and access controls. Without this, the new technology risks becoming a “black box,” increasing the likelihood of operational errors and making future improvements more difficult.

Team preparation should not be a one-time effort. It’s important to plan for post-implementation activities such as mentoring, internal documentation, and continuous development of skills in areas supported by the new technology.

Measuring Business Readiness

Assessing technological readiness should be treated as a process. The most effective approach is to create a roadmap of preparatory actions and then measure progress based on clearly defined stages and evaluation criteria.

The roadmap should include key areas such as IT infrastructure, data availability, team readiness, system integration, change management, and regulatory compliance. For each of these areas, it’s important to define the target requirements as well as assess the current state. This approach not only helps estimate the overall level of readiness but also identifies specific obstacles and weak points that may hinder the implementation process.
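Comparing target requirements with the current state can be as simple as scoring each area on a common scale. The sketch below assumes a hypothetical 1–5 maturity scale (the article does not prescribe one). It computes an overall readiness ratio and lists the areas with a gap, largest gap first.

```python
from typing import Dict, List, Tuple

AREAS = ["IT infrastructure", "data availability", "team readiness",
         "system integration", "change management", "regulatory compliance"]

def readiness(target: Dict[str, int], current: Dict[str, int]) -> Tuple[float, List[str]]:
    """Return (share of required maturity reached, areas with gaps sorted by size)."""
    gaps = {a: max(target[a] - current[a], 0) for a in AREAS}
    # Exceeding the target in one area does not compensate for gaps elsewhere.
    reached = sum(min(current[a], target[a]) for a in AREAS)
    overall = reached / sum(target[a] for a in AREAS)
    blockers = sorted((a for a in AREAS if gaps[a] > 0), key=lambda a: -gaps[a])
    return overall, blockers

target = {a: 4 for a in AREAS}
current = {"IT infrastructure": 4, "data availability": 2, "team readiness": 3,
           "system integration": 4, "change management": 2, "regulatory compliance": 4}
overall, blockers = readiness(target, current)
print(f"{overall:.0%}", blockers)
# 79% ['data availability', 'change management', 'team readiness']
```

Re-running the same calculation at each roadmap stage gives a simple way to track progress toward operational readiness.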

Step-by-step evaluation — based on the schedule and tasks assigned to specific teams — allows for ongoing verification of whether the company is moving toward operational readiness. This approach helps minimize the risk of unexpected delays and costs, as potential issues can be detected early, before entering the actual implementation phase.

Technological readiness is not a single end result. It’s the sum of many elements — technical, organizational, and competency-related — all of which should be assessed in the context of the specific implementation and its requirements.

How We Work

When working with clients on the implementation of new applications and systems, we always begin by discussing the business context and the organization’s actual needs. We don’t recommend off-the-shelf solutions without prior analysis — instead, we help identify which technologies have the potential to truly improve processes, and which may only lead to unnecessary costs and complications.

Drawing on our experience from IT projects across various industries, we provide step-by-step guidance — from the planning stage, through readiness assessment, to proper implementation and stabilization. We make sure the technology fits the organization’s capabilities and genuinely supports its operational development, rather than becoming an additional burden.

A well-planned implementation doesn’t end with launching an application — it ends with achieving the intended business outcomes.

The article “Is Your Company Ready for New Technology? How to Evaluate Technological Readiness” originally appeared on the Inero Software - Software Consulting website.

LLM Implementation and Maintenance Costs for Businesses: A Detailed Breakdown

Martyna Mul — Wed, 14 May 2025 06:44:35 +0000

When considering the introduction of artificial intelligence into your company, it’s important to understand the costs involved in implementing and maintaining your own LLM. Expenses go beyond just paying for model usage (e.g., token-based API fees) and include a range of factors — from infrastructure to security. Below, we discuss the types of costs associated with using dedicated LLMs and present example calculations for popular models (such as GPT-4, Claude, Mistral, LLaMA, etc.), including business use case scenarios.

More and more companies are considering the use of large language models (LLMs) in their own products and processes. These “dedicated” models can act as intelligent assistants—answering customer questions, analyzing documents, generating reports, and much more. You can read more about it here.

Types of Costs When Using LLMs

Before starting the implementation, it’s important to understand all the components that contribute to the total cost of using a dedicated model.

Infrastructure:

If you’re using models via a cloud API (OpenAI, Anthropic, Google), you only pay for the tokens used. The infrastructure cost is “hidden” on the provider’s side.

If you choose to self-host a model such as Mistral or LLaMA, you’ll need to maintain a GPU server—either locally or in the cloud. For example, renting an instance with an A100 GPU typically costs $1–2 per hour, which amounts to $750–1,500 per month if the server runs continuously. While such an investment can handle a high volume of queries, it may be underutilized at a smaller scale.
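The monthly figure follows directly from the hourly rate. The sketch below assumes an average month of 730 hours (24 h × 365 days / 12), which gives slightly lower values than the rounded $750–1,500 quoted above. The `utilization` parameter is an illustrative addition for instances that are not running all month.

```python
HOURS_PER_MONTH = 730  # average month: 24 h x 365 days / 12

def monthly_gpu_cost(hourly_rate_usd: float, utilization: float = 1.0) -> float:
    """Rental cost of one GPU instance; utilization is the share of the month it runs."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return hourly_rate_usd * HOURS_PER_MONTH * utilization

for rate in (1.0, 2.0):
    print(f"${rate:.2f}/h -> ${monthly_gpu_cost(rate):,.0f}/month")
# $1.00/h -> $730/month
# $2.00/h -> $1,460/month
```

The cost is fixed regardless of how many queries the server handles, which is why a continuously running instance can be underutilized at small scale.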

Licensing and Model Fees

Commercial models come with licensing or subscription fees. For example, when using the GPT-4 API from OpenAI or Claude from Anthropic, you pay per token used according to the provider’s pricing (we outline token costs in detail later on). On the other hand, open-source models like LLaMA or Mistral are available for free, with no licensing or token fees. Meta, for instance, released LLaMA 2 under a community license that permits commercial use, subject to certain conditions. However, “free” doesn’t mean zero cost: you’ll still pay for the infrastructure and electricity needed to run the model (as mentioned earlier). It’s also important to check license restrictions, since some open models have specific usage conditions (e.g., restrictions on certain industries).

Model Adaptation and Customization

For an LLM to perform well in a specific company setting, it often requires customization—such as additional training (fine-tuning) on company-specific data or at least the preparation of tailored prompts (known as prompt engineering). This adaptation process can generate significant costs:

- Model Fine-Tuning: Training a model on your own dataset requires computing power (typically GPUs running for many hours) and expert knowledge. For larger models, this can cost anywhere from several thousand to tens of thousands of dollars—factoring in both infrastructure expenses and specialist time. Even fine-tuning a smaller model (e.g., GPT-3.5) via OpenAI’s API can incur significant costs, as it involves processing hundreds of thousands or even millions of tokens during training—billed according to the provider’s token pricing.

- Prompt Engineering: As an alternative or complement to training, you can craft tailored prompts and instructions for the model. While writing prompts itself doesn’t require paid resources, iteratively testing and refining multiple versions consumes tokens (which adds cost when using a cloud-based model) and takes up team time. This can be viewed as either an operational cost or a competence-related expense—specialist time is needed to optimize the model’s behavior for your specific use case.

Operational Costs

After deploying the model, ongoing operational costs come into play. These include monitoring the model’s performance, maintaining efficiency, logging results, applying updates, and fixing potential issues. If you’re using an API, the main operational cost will be the monthly bill for consumed tokens, along with any premium subscription fees (some providers offer subscription plans with usage limits or preferred pricing). If the model is hosted locally, operational costs typically include:

- Electricity consumption – GPU-based models can consume significant amounts of power, leading to substantial monthly energy costs.
- System administration – Time spent by administrators on server maintenance, backups, and updating software components (e.g., AI libraries).
- Infrastructure scaling – As demand grows, additional machines or cloud instances may be needed, resulting in further expenses.
- High availability – If the LLM assistant needs to operate 24/7 without downtime, you may need to invest in redundant resources (e.g., backup servers) or enter into an SLA agreement with your cloud provider.

Team Expertise

Implementing an LLM requires the right expertise within the IT/Data team. If your company lacks AI experience, it may be necessary to train existing employees or hire new specialists—such as an ML engineer or MLOps expert—which adds recruitment or training costs. Alternatively, some companies choose to work with external consultants or service providers to deploy the model. This also incurs costs, usually one-time project fees, which can be significant. It’s also important to account for the time your team spends integrating the model with existing systems (e.g., connecting it to a database or user-facing application). This is a labor cost that’s often overlooked in smaller projects but can have a major impact in practice.

The categories above show that the total cost of owning a dedicated LLM-based solution goes far beyond just the fee for accessing the model. It’s important to consider all these factors before making a decision. In the next section, we’ll look at specific numbers: how much a single prompt costs for various popular models, and what it would take to maintain a simple LLM assistant in two example business scenarios.

Cost of a Single Prompt in Popular LLM Models

Language models are typically billed based on the number of tokens processed. A token is a small piece of text—it may represent a single word or part of a word (for example, 1,000 tokens roughly equals 750 words of continuous text). API providers list prices per 1,000 or 1 million tokens.
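For quick budgeting, the 1,000 tokens ≈ 750 words rule of thumb can be turned into a rough estimator. The sketch below counts whitespace-separated words, so it is only an approximation; actual billing uses each provider’s own tokenizer (for OpenAI models, the `tiktoken` library gives exact counts).

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: 1,000 tokens ~ 750 English words

def estimate_tokens(text: str) -> int:
    """Very rough token count based on whitespace-separated words."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def estimate_words(tokens: int) -> int:
    """Approximate number of words that fit in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)

print(estimate_words(1000))            # 750
print(estimate_tokens("word " * 750))  # 1000
```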

Below is a comparison of the approximate cost to process 1,000 tokens using selected popular LLM models:

LLM Model Comparison

LLM Model	Access / License	Cost per 1000 tokens	Notes
GPT-3.5 Turbo (OpenAI)	Cloud API (chat model available, e.g., in ChatGPT)	$0.0015 (input) $0.0020 (output)	Very low cost; 16k-token context; good response quality
GPT-4 (8k)	Cloud API (OpenAI)	$0.03 (input) $0.06 (output)	High quality; high cost
GPT-4 Turbo (128k)	Cloud API (OpenAI)	$0.01 (input) $0.03 (output)	Reliable large context (up to 128k tokens); much cheaper than GPT-4 (8k), though still several times the price of GPT-3.5 Turbo
Claude Instant v1.2	Cloud API (Anthropic)	$0.0008 (input) $0.0024 (output)	Fast, lower-cost Claude model (equivalent to GPT-3.5)
Claude 2 (100k)	Cloud API (Anthropic)	$0.008 (input) $0.024 (output)	High-quality model by Anthropic; context up to 100k tokens
Mistral 7B	Open source (free model)	Token cost: $0	Requires self-hosting; alternative to GPT-3.5 with low hardware requirements (runs on a single 24GB GPU)
LLaMA 2 13B	Open source (free model)	Token cost: $0	Self-hosting required Needs stronger hardware (e.g., 2× 24GB GPU) than 7B, but still accessible for many companies
LLaMA 2 70B	Open source (free model)	Token cost: $0	Requires self-hosting Requires expensive infrastructure (e.g., 8× 80GB GPUs) At this scale, costs may match or even exceed GPT-4

Legend: How Token Costs Are Calculated

- Input tokens – tokens in the user’s prompt (the text sent to the model).
- Output tokens – tokens generated by the model in its response (the completion).

For most commercial providers, the cost is charged separately for input and output tokens. For example:

GPT-4 (8k):

- 1,000 input tokens: $0.03
- 1,000 output tokens: $0.06

If a dialogue contains a total of 1,000 tokens (e.g., 500 input + 500 output), the cost is approximately $0.045.

For simplicity, you can assume that a full interaction of 1,000 input tokens plus 1,000 output tokens costs about $0.09.

By comparison:

- GPT-3.5 Turbo – a comparable full interaction (1,000 input + 1,000 output tokens) costs only about $0.0035 (i.e., 0.35 cents).
- Open-source models (e.g., Mistral, LLaMA) – token costs are $0, since the models run locally. You only pay for infrastructure-related costs (power consumption, server uptime, etc.).
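The per-interaction figures above follow from billing input and output tokens separately. The sketch below reproduces them using the rates from the worked example ($0.03 input / $0.06 output per 1,000 tokens) and the GPT-3.5 Turbo rates ($0.0015 / $0.002). The function name is illustrative, and real prices change over time, so check the provider’s current price list.

```python
def interaction_cost(input_tokens: int, output_tokens: int,
                     input_price_per_1k: float, output_price_per_1k: float) -> float:
    """API cost of one prompt/response pair, with input and output billed separately."""
    return (input_tokens / 1000) * input_price_per_1k \
        + (output_tokens / 1000) * output_price_per_1k

WORKED_EXAMPLE = (0.03, 0.06)   # USD per 1,000 input / output tokens
GPT35_TURBO = (0.0015, 0.002)

print(round(interaction_cost(500, 500, *WORKED_EXAMPLE), 4))    # 0.045
print(round(interaction_cost(1000, 1000, *WORKED_EXAMPLE), 4))  # 0.09
print(round(interaction_cost(1000, 1000, *GPT35_TURBO), 4))     # 0.0035
```

Because output tokens are usually priced higher, responses that run longer than prompts shift the cost upward even at the same total token count.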

Open-source models (such as Mistral, LLaMA, etc.) are attractive because they come with no fees for the model itself—you can generate any number of tokens without paying the model provider a cent. However, to run these models, you need to maintain your own infrastructure. At a small scale, the cost of renting a machine for a single query may actually exceed the cost of an individual API call to a model like GPT. On the other hand, at a large scale—with many queries per day—open-source solutions can become significantly more cost-effective. In summary, cost-effectiveness depends on the use case, which we’ll explore in the next section.

Example Costs of Implementing an LLM Assistant (100 Queries per Day)

Let’s now consider a practical scenario: your company wants to implement a simple LLM-based virtual assistant that performs one of the following tasks:

- Document analysis – e.g., the assistant reads offers or contracts and extracts key information such as clauses, deadlines, and amounts.
- Customer inquiry handling – e.g., the assistant replies to customer emails with questions about pricing, product availability, technical support, etc.

Let’s assume that:

- The assistant will handle approximately 100 interactions per day.
- Each interaction consists of a prompt and a response, totaling around 2,000 tokens (e.g., 1,000 tokens in the prompt—roughly 750 words or several paragraphs—and 1,000 tokens in the response, or about 750 generated words). This token size covers fairly complex queries and detailed replies.
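- On a monthly basis, the assistant will process around 6 million tokens (100 interactions per day × 30 days = 3,000 interactions; 3,000 interactions × 2,000 tokens = 6,000,000 tokens, split evenly into 3 million input and 3 million output tokens).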

We want to compare the monthly operating costs of such an assistant depending on the choice of model and deployment approach. We’ll present two variants:

- API Variant (Closed Model): We use a commercial model via an API (e.g., OpenAI GPT or Anthropic Claude). We don’t maintain our own servers—costs are limited to token usage, billed according to the provider’s pricing.
- Self-Hosted Variant (Open-Source Model): We use an open-source model (e.g., Mistral or LLaMA) deployed on our own servers. Costs include infrastructure needed to support approximately 100 queries per day—such as cloud GPU instance rental or hardware amortization, plus electricity.

Below is a table comparing estimated monthly costs for several example models under both deployment variants, assuming 6 million tokens per month:

Monthly LLM Cost Comparison

Model (variant)	Estimated Monthly Cost	Comment
GPT-3.5 Turbo (API)	approx. $10.50 (USD)	Very low cost for this quality level. Estimate: 3M input tokens × $0.0015/1k = $4.50, plus 3M output tokens × $0.002/1k = $6.00 → ~$10.50/month total.
GPT-4 (8k) (API)	approx. $270	Much higher cost for better quality. Estimate: 3M input tokens × $0.03/1k = $90, plus 3M output tokens × $0.06/1k = $180 → ~$270/month.
GPT-4 Turbo (128k) (API)	approx. $120	Much cheaper than GPT-4 (8k) thanks to lower per-token pricing, but still over ten times the cost of GPT-3.5 Turbo. Estimate: 3M × $0.01/1k = $30, plus 3M × $0.03/1k = $90 → ~$120/month. May even deliver better quality than GPT-4 (8k).
Claude Instant (API)	approx. $10	Comparable to GPT-3.5 in cost. Estimate: 3M × $0.0008/1k = $2.40, plus 3M × $0.0024/1k = $7.20 → ~$9.60/month (plus potential flat fees).
Claude 2 (API)	approx. $100	Cheaper than GPT-4 (8k), but still several times more expensive than GPT-3.5. Estimate: 3M × $0.008/1k = $24, plus 3M × $0.024/1k = $72 → ~$96/month.
Mistral 7B (open source, self-hosted, 1x GPU)	approx. $300	Cost mainly for maintaining server/GPU. Assumption: 1x 24GB GPU instance – model generates ~30–60 tokens/sec, power usage 100–150W. Actual cost depends on location and usage (electricity + server = ~$300–400/month).
LLaMA 2 70B (open source, self-hosted, multi-GPU)	approx. $1,000+	High cost due to powerful GPU requirements. Typically requires at least 8×80GB GPUs (~$10k–12k hardware + high power consumption). Costs vary based on setup model (on-prem / cloud / GPU provider).
Local model (e.g., LLaMA 13B, GPTQ, Mistral 7B – CPU)	approx. $300–500	Cost includes operation of local server. May be slower than GPT-3.5, but offers more privacy and control. For CPU instance (e.g., 12 cores, 64 GB RAM), monthly cost is mainly for electricity and maintenance.
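The API estimates can be recomputed from the scenario assumptions (100 interactions per day, 30 days, 1,000 input and 1,000 output tokens each) and the per-1,000-token prices from the pricing table. For GPT-4 (8k), the sketch uses $0.03 / $0.06, the rates that reproduce the $270 estimate. With these inputs it yields about $10.50 for GPT-3.5 Turbo, $270 for GPT-4 (8k), $120 for GPT-4 Turbo, $9.60 for Claude Instant and $96 for Claude 2. The constants are assumptions taken from the scenario, not provider guarantees.

```python
QUERIES_PER_DAY = 100
DAYS_PER_MONTH = 30
INPUT_TOKENS = 1_000    # per interaction
OUTPUT_TOKENS = 1_000   # per interaction

# (input, output) price in USD per 1,000 tokens
PRICES = {
    "GPT-3.5 Turbo": (0.0015, 0.002),
    "GPT-4 (8k)": (0.03, 0.06),
    "GPT-4 Turbo (128k)": (0.01, 0.03),
    "Claude Instant v1.2": (0.0008, 0.0024),
    "Claude 2": (0.008, 0.024),
}

def monthly_api_cost(input_price: float, output_price: float) -> float:
    interactions = QUERIES_PER_DAY * DAYS_PER_MONTH          # 3,000 per month
    input_k = interactions * INPUT_TOKENS / 1000             # 3,000 x 1k = 3M tokens
    output_k = interactions * OUTPUT_TOKENS / 1000
    return input_k * input_price + output_k * output_price

for model, (inp, out) in PRICES.items():
    print(f"{model:<20} ${monthly_api_cost(inp, out):>8,.2f}")
```

Changing `QUERIES_PER_DAY` shows how linearly the API bill scales with traffic, which matters for the large-scale case discussed next.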

From the above comparison, several key takeaways can be drawn:

Small-scale usage (100 queries/day) favors API solutions

With relatively low query volume, using a commercial API (OpenAI, Anthropic) is highly cost-effective—especially with lower-priced models like GPT-3.5 or Claude Instant, where monthly costs can be as low as a few dozen dollars. For higher-end models, monthly costs may rise to several hundred dollars. Still, at this scale, running your own GPU server at $300+ per month would be less economical than relying on cloud-based APIs.

Large-scale usage (thousands of queries) changes the equation

If your assistant becomes successful and the number of queries increases by 10x or even 100x, the monthly API bill could grow to thousands or even tens of thousands of dollars. In such cases, investing in an open-source, self-hosted model starts to make financial sense. With a high enough query volume, the per-request cost of running the model locally becomes lower than the API cost—since the purchased or rented hardware is being used more efficiently. In extreme cases of massive scale, some organizations may even consider training their own model from scratch—but this is typically reserved for the largest players with very substantial budgets.
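The tipping point can be estimated with a simple break-even calculation: a self-hosted server costs roughly the same each month, while the API bill grows with every interaction. The sketch below assumes a self-hosted Mistral 7B server at about $300/month (the figure from the comparison table) and 1,000 input + 1,000 output tokens per interaction. It deliberately ignores throughput limits, staff time and scaling costs, so treat the result as a lower bound on the volume needed. `Fraction` keeps the arithmetic exact.

```python
import math
from fractions import Fraction

def api_cost_per_interaction(in_tokens: int, out_tokens: int,
                             in_price_per_1k: str, out_price_per_1k: str) -> Fraction:
    """Exact per-interaction API cost; prices are given as decimal strings."""
    return (Fraction(in_tokens, 1000) * Fraction(in_price_per_1k)
            + Fraction(out_tokens, 1000) * Fraction(out_price_per_1k))

def break_even_interactions(self_hosted_monthly: str, per_interaction: Fraction) -> int:
    """Monthly interactions at which the API bill reaches the fixed self-hosting cost."""
    return math.ceil(Fraction(self_hosted_monthly) / per_interaction)

per = api_cost_per_interaction(1000, 1000, "0.01", "0.03")   # GPT-4 Turbo rates: $0.04
n = break_even_interactions("300", per)
print(n, "interactions/month, i.e.", n // 30, "per day")      # 7500 ... 250
```

Against GPT-3.5 Turbo pricing ($0.0035 per interaction), the same $300 server only pays off above roughly 85,700 interactions a month, which illustrates why the answer depends so strongly on which API model you compare with.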

Use Case Matters (Quality vs. Cost Efficiency)

Choosing the right model shouldn’t be based solely on cost—it also depends on the quality of output required for your use case. In a document analysis scenario, precision in extracting information is the top priority. A lower-cost or open-source model may be sufficient here, especially if fine-tuned to the task. A model with 7B–13B parameters can offer adequate performance at a much lower cost. Moreover, when processing sensitive documents (e.g., contracts), running the model locally ensures that the content never leaves your organization—an invaluable benefit from a legal and data privacy standpoint. On the other hand, in customer inquiry handling, where natural language quality, politeness, and contextual understanding are critical, GPT-4 can significantly outperform smaller models. In this case, a company may find it worthwhile to pay more for superior customer experience.

Hidden Costs Around the Project

It’s important to note that the above calculations cover only the technical costs—such as token usage or infrastructure. In practice, there are also “soft” costs to consider, including staff time for preparing the implementation, integrating the model with systems like a CRM or knowledge base, testing, and ongoing iterations and improvements. For example, if the assistant needs to retrieve data from a company’s internal document repository, those documents often need to be organized or cleaned before they can be effectively used by the model.

Cost Example: AI Assistant for Analyzing Emails and PDF Documents

Here we also present the cost breakdown of our assistant based on Google’s Gemini model, which we described in an earlier article. Its task is to automatically analyze incoming emails to identify insurance policies and extract key data from attached PDF documents, such as policy number, insured party address, or payment confirmation.

Average Token Count per Email:

- Input: 3,500 tokens
- Output: 220 tokens

Analyzing 100 emails with attachments using the Gemini 2.0 Flash model costs approximately $1.50.

Summary

Can We Afford Our Own “ChatGPT” in the Company? As we’ve seen, the answer is: it depends—primarily on the scale of usage and quality requirements. The key lies in selecting a model and deployment method that aligns with your specific needs. An iterative approach is often the most practical: start with a lower-cost model or API, evaluate the results, and scale up to a more powerful model or self-hosted solution as the project matures. Regardless of the path you choose, careful planning and cost monitoring across all categories is essential. We hope this comparison helps you make informed decisions and prepare a realistic budget for implementing a dedicated LLM in your organization.

If you’re considering implementing an assistant in your company, it’s worth finding answers to the following questions:

- Do I need high-quality responses (e.g., GPT-4), or is an approximate answer sufficient (e.g., Claude Haiku, Gemini Flash)?
- Am I processing sensitive data (e.g., customer documents)?
- Do I have an IT team capable of hosting a model in-house?
- What is the expected number of queries per day/month?
- Is it more cost-effective to maintain my own infrastructure, or should I pay for API access?

For small to medium-scale applications, the cost of using a dedicated LLM can be quite reasonable. Thanks to cloud-based services, it’s possible to get started for just a few dozen dollars per month with models like GPT-3.5 or Claude Instant—an excellent option for experimentation and early prototypes. If you need top-tier performance, such as what GPT-4 offers, you’ll need to account for higher costs. However, even a few hundred dollars per month can be justified if the business value is significant—for example, by automating tasks that would otherwise require many hours of manual work.

On the other hand, for large companies planning intensive AI use, API costs grow in step with usage and can quickly become substantial, making it worth considering open-source options and greater investment in in-house infrastructure. Open models like LLaMA or Mistral offer freedom from per-token fees, but shift the cost burden to hardware and staffing. They become cost-effective when operating at scale or when full control over data is a top priority.

Looking to Bring AI Tools into Your Company?

We offer comprehensive technology support in the field of artificial intelligence and AI agents. Tell us about your idea!

The article “LLM Implementation and Maintenance Costs for Businesses: A Detailed Breakdown” originally appeared on the Inero Software - Software Consulting website.

Chatbot, Agent or AI Assistant? Find Out Which Solution Is Best for Your Business

Marta Kuprasz — Thu, 08 May 2025 08:57:21 +0000

Artificial intelligence and Large Language Models are buzzwords heard in nearly every industry. Many companies are wondering how to use them safely and which solution will be the most effective. There are plenty of options—and they’re often hard to tell apart. In this article, we break them down in a clear and easy-to-understand way.

AI can take on many roles in a company—as a chatbot, assistant, agent, data analysis tool, content generator, or knowledge search engine. So how can you choose the solution that best fits your employees’ needs? It helps to understand what each option has to offer.

Chatbot – answers questions, provides explanations, and handles requests

This is the most common use of AI in areas such as customer service and sales. An AI chatbot based on a large language model, such as ChatGPT, can hold natural conversations, understand the context of inquiries, and deliver accurate answers—24/7, in multiple languages, and without human involvement.

These solutions are typically implemented on websites, in messaging platforms (like Messenger or WhatsApp), or within helpdesk systems, where they assist with answering questions, tracking orders, or providing product information. As a result, they significantly automate customer service, reduce operational costs, and improve customer satisfaction ratings.

For the purposes of this article, we define a chatbot as an AI interface primarily intended for external users—in other words, it operates “outside the company.” This definition distinguishes it from AI agents, which perform more complex tasks within internal processes by integrating with systems, databases, or APIs.


AI Agent – a tool designed to carry out specific tasks

Unlike a chatbot, which interacts with external users, an AI agent operates within the organization and supports employees by automating specific business processes. It’s not a one-size-fits-all tool—it’s built with a clearly defined purpose in mind, such as document processing, data analysis, or integration with ERP systems.

Thanks to large language models like Gemini or Claude, an AI agent can understand context, make decisions, and trigger specific actions—without human input. It can run in the background, process data from multiple sources, manage files, or handle email inboxes. Each AI agent is tailored to the company’s individual needs and specific tasks. Only then can it offer real value instead of becoming just another generic tool.

Want to see how this works in practice?

Check out our case study “Meet your personal AI agent – a case study for a freight forwarding company”, where we describe how we built an agent integrated with an email inbox.

AI Assistant – supports users in daily work by operating contextually and “in the background”

Unlike a chatbot that answers questions or an agent that automates a specific process, an AI assistant is a tool that works alongside employees in real time—it understands context, suggests next steps, and makes tasks easier within familiar applications.

It’s typically integrated into a specific work environment, such as a word processor, spreadsheet, CRM, or project management tool. The assistant doesn’t replace the user—it actively supports them in making decisions, writing, analyzing data, or planning.

AI assistants like GitHub Copilot, Notion AI, or Google’s Workspace assistant show how this technology can genuinely boost team productivity and reduce time spent on routine tasks. From a business perspective, a well-designed assistant can improve work quality, reduce errors, and make onboarding new employees easier.

Other Business Applications of Large Language Models

The possibilities go far beyond chatbots, assistants, or agents. These models can take on specialized roles, supporting tasks such as document processing, data analysis, or content creation. They’re increasingly used to automatically summarize reports, extract information from unstructured sources (like emails, PDFs, or scanned forms), or answer natural-language questions based on internal documentation.

LLMs can also assist marketing teams by generating suggestions for ad copy, product descriptions, or sales messages tailored to the company’s style. In analytics departments, they provide faster access to data—generating database queries, interpreting results, and presenting insights in a way that’s easy for non-technical users to understand. These applications often don’t require building a new tool from scratch, but rather integrating the AI model into existing company systems. This way, the technology supports specific tasks—right where it’s needed.

AI Models and Data Security

Business owners and managers still approach AI tools with caution, mainly because they’re unsure how to ensure the security and confidentiality of processed data. We’ve explored these topics in previous publications that are worth reviewing.

In the article “AI User Privacy: An Analysis of Platform Policies”, we outlined the data privacy and model training policies followed by major AI providers such as OpenAI, Google Gemini, Microsoft’s Azure OpenAI, and Anthropic’s Claude.

For those considering an on-premise solution, we recommend the blog post “Top Lightweight LLMs for Local Deployment”. There, we reviewed several top open-source lightweight LLMs and explained how to run them on a local Windows machine, even with limited GPU resources.

Choosing the right AI tool for your company depends primarily on the goal it’s meant to achieve. A chatbot works best where quick and accessible customer service is key. An AI agent can automate repetitive internal processes and improve information flow between systems. An AI assistant provides day-to-day support for employees—offering suggestions, summaries, or preparing data for further use.

Large language models also allow integration with existing processes—without the need to build a dedicated tool from scratch. However, implementing AI-based technology requires a well-thought-out decision, taking into account both efficiency and data security. If you’re looking to adopt AI in your company and need an experienced partner to guide you through the process, get in touch with us.

Bring AI into Your Business

We provide professional consulting and end-to-end implementation of tools based on large language models.

The post Chatbot, Agent or AI Assistant? Find Out Which Solution Is Best for Your Business appeared first on Inero Software - Software Consulting.

AI User Privacy: An Analysis of Platform Policies

Martyna Mul — Wed, 30 Apr 2025 08:35:35 +0000

Ever wondered where your data goes when you interact with AI cloud platforms, or whether it is used to train future models? In this article, we’ll break down the data privacy policies of top AI platforms. You will also learn what to do to ensure your data is not used to train Large Language Models (LLMs).

Major AI cloud providers have become increasingly transparent about their data usage policies – especially when it comes to training models. While most platforms, particularly those offering enterprise-level services, do not use your inputs and outputs for training by default, the fine print matters. Understanding how these services handle your data – and how you can maintain control – is essential.

In this article, we’ll break down the data privacy and model training policies of top AI platforms, including OpenAI, Google Gemini, Microsoft’s Azure OpenAI and Anthropic’s Claude. You’ll learn:

- How AI platforms use your data and whether your data is used to train models by default
- How to opt out of having your data used for model training, if needed
- Where your data is stored (data residency), and
- What compliance measures (like GDPR) apply

Adopting AI isn’t just about prompt engineering or model performance. It’s also about knowing where your data goes—and how to ensure it stays under your control.

Here’s what you need to know:

OpenAI – Data Usage and Privacy

OpenAI treats your data differently based on how you interact with its services:

ChatGPT App (Web/Mobile)

When you chat with ChatGPT, your conversations may be used to train AI models – unless you manually opt out. To prevent your data from being used:

- Go to Settings → Data Controls → Improve the model for everyone and toggle it off.
- Even with the opt-out, OpenAI stores chats for 30 days for abuse monitoring before deletion.

OpenAI API and ChatGPT Enterprise

If you’re a developer or a business using OpenAI’s API or ChatGPT Enterprise, there’s no need to opt out: by default, OpenAI does not use API or Enterprise data to train its models. You can choose to share data to help improve the models, but only if you want to.

Data Residency

OpenAI’s servers are mostly based in the United States, and currently, if you’re using the API directly, you can’t choose where your data is stored. That means your data is processed within OpenAI’s own infrastructure – protected by strong security, but not necessarily hosted in your country.

However, there’s some progress for enterprise users. OpenAI recently introduced an option for eligible enterprise API customers that allows data to be stored in Europe, provided there’s a specific agreement in place.

If regional data residency is important for your business – say, for GDPR or internal compliance – you might want to consider using Azure OpenAI, which hosts OpenAI’s models on Microsoft’s cloud. With Azure, you can choose a region like Western Europe or Asia, and all data processing and storage will stay within that geography.

We’ll dive into Azure more in the next section – but in short: OpenAI handles your data securely, but for strict control over where it lives, a partner cloud service like Azure may be a better fit.

Google (Gemini) – Google’s Approach to Your Data

Google’s foray into generative AI includes Gemini, a next-generation model that powers products like Google Gemini (the chatbot) and various enterprise AI offerings on Google Cloud. Here’s how they handle your data:

Gemini App

By default, Google does save your Gemini chat history to your account (much like search history) and may use it to improve their service. However, Google provides a “Gemini Activity” setting to control this.

To manage this:

- Visit Gemini Activity settings.

- Pause Gemini Activity to stop saving chats and prevent them from being used in AI model training data sources.

- You can also delete existing conversation history.

Turning off Gemini Activity means your new chats won’t be used to improve their machine learning services, nor will they be seen by human reviewers, unless you explicitly submit them as feedback. This gives regular users a way to opt out, similar to ChatGPT’s opt-out toggle.

To stop saving your conversations, go to the Activity tab and toggle Gemini Apps Activity. You can also delete your past conversations.

API and Vertex AI

If you’re using Google Cloud’s Vertex AI platform:

- Your prompts and outputs are not used to train AI models without explicit permission.

- Data may be cached briefly (up to 24 hours) for performance but remains within your selected geographic region.

- Businesses can opt for a zero-retention policy for maximum privacy.

Data Residency

Data residency is a strong point for Google: you can choose which geographic region your AI service runs in (e.g. EU or US data centers), and Google will process and store data in that region to meet any data localization requirements.

Microsoft Azure OpenAI – Enterprise Data Protection by Design

Training Policy

Microsoft’s Azure OpenAI Service lets companies use OpenAI’s models through the trusted Azure cloud platform. Privacy is a major selling point here. Microsoft is very explicit: any data you send into Azure OpenAI is not used to train the underlying models or improve Microsoft’s or OpenAI’s services.

Microsoft’s Azure OpenAI Service essentially hosts OpenAI’s models (GPT-4, GPT-3.5, etc.) within the Microsoft Azure cloud. Microsoft has specifically designed this service for enterprises that require strong privacy protections. Key aspects are:

- Any data you input into Azure OpenAI – prompts, completions (model outputs), embeddings, fine-tuning data – is not used to train the AI models.

- Your inputs and outputs “are NOT available to other customers, are NOT available to OpenAI, and are NOT used to improve OpenAI models”.

- Microsoft only retains data as needed to provide the service and monitor for misuse. In fact, prompts and outputs on Azure are stored only temporarily (up to 30 days) by default, and solely for abuse detection purposes. After 30 days, those prompts are deleted. If even this temporary storage is a concern (say, for ultra-sensitive data), Microsoft offers a process called “modified abuse monitoring” where you can request that even the 30-day storage be bypassed, meaning no prompts are retained at all. Typically, you’d need approval for this exception, but it’s an option for high-security scenarios.

Data Residency

Because it’s on Azure, you also benefit from easily choosing the region and complying with data residency requirements. When setting up Azure OpenAI, you deploy the service to an Azure region (for example, East US, West Europe, Southeast Asia, etc.). All processing and data storage for inference will occur within that region or its geographical boundary. So, if you deploy in Western Europe, your data isn’t leaving Europe – crucial for GDPR compliance. Azure itself meets numerous compliance standards (SOC 2, ISO 27001, etc.), and these extend to Azure OpenAI as an Azure service.

Anthropic (Claude) – A Privacy-First AI Assistant

Training Policy

Anthropic, the company behind the Claude AI assistant (Claude 2 and newer versions), has emphasized a privacy-conscious approach from the outset. Anthropic adopts an opt-in approach:

- By default, Anthropic does not use your conversations or data to train its models. This applies to both their commercial offerings (Claude for Work, Anthropic API) and consumer products (Claude Free, Claude Pro) – your prompts and Claude’s responses aren’t automatically used for model training.

- They only use data if you deliberately opt in, such as by providing explicit feedback. For instance, if you click a thumbs-up/down in a Claude interface or send data to their feedback channels, you’re essentially saying “you can learn from this”.

For enterprise clients, Anthropic offers Claude Team/Enterprise, which not only guarantees no training on your data but also provides admin controls. One such feature is custom data retention settings. By default, Anthropic’s systems might retain your inputs/outputs indefinitely for your account (though not for training). However, Claude Enterprise admins can set a retention policy – for example, you might set it to delete all conversation data after 30 days, 60 days, etc., with 30 days being the current minimum. These controls aim to support compliance with regulations like GDPR.

Data Residency

Anthropic is a newer player, and currently, when you use their API directly, you don’t explicitly choose a data region – it’s likely hosted in the US by Anthropic (or possibly through cloud providers like AWS in the US region). However, Anthropic models are also available through partners, which can help with data residency. For example, Anthropic’s Claude is offered via Amazon Bedrock (AWS’s AI service) and via Google Cloud Vertex AI. If you use Claude through one of these platforms, you can take advantage of AWS’s or Google’s region controls.

Conclusion

Understanding how LLM providers collect and use data is crucial for regulatory compliance, customer trust, and internal data governance, and it helps you make informed decisions. Choose providers that align with your privacy values, and always review your settings.

Here’s a comparison of major platforms:

Provider	Trains on API Data by Default	Consumer Web App	Data Residency Options	GDPR/CCPA Compliance	Privacy Policy
OpenAI	No	Used for training unless you opt out	Limited (EU storage for eligible enterprise API customers; otherwise via Microsoft Azure)	Yes	Consumer privacy
Google	No (Vertex AI)	May be used unless you pause Gemini Apps Activity	Broad region control	Yes	Enterprise privacy, Gemini privacy, Vertex AI
Azure	No	N/A	Full regional control	Yes	Azure, OpenAI privacy
Anthropic	No	Not used for training by default	No (unless used via partners such as Amazon Bedrock or Google Cloud Vertex AI)	Yes	API users, Claude.ai users

For maximum privacy and control, local deployment (on-premises models) is always an alternative. This avoids cloud storage concerns entirely. You can read more in our post “Top Lightweight LLMs for Local Deployment”.

Let's talk about AI agents

Ready to bring AI into your business? Let us help you get started.

The post AI User Privacy: An Analysis of Platform Policies appeared first on Inero Software - Software Consulting.

Top Lightweight LLMs for Local Deployment

Martyna Mul — Thu, 17 Apr 2025 09:50:46 +0000

Running large language models (LLMs) on your own hardware has become increasingly feasible thanks to lightweight LLMs—models with relatively small parameter counts that deliver strong performance without requiring server-grade GPUs. In this post, we’ll explore several top open-source lightweight LLMs and how to run them on a local Windows PC—whether CPU-only or with a limited GPU—for document processing tasks. We also include a benchmark comparing the models in terms of accuracy and inference speed, helping you choose the right model for your local environment and use case.

What Are Lightweight LLMs (and Why Run Them Locally)?

“Lightweight” LLMs are models typically in the range of ~1–8 billion parameters – far smaller than GPT-3 class models – often optimized to run on a single GPU or even CPU. They are usually released as open models with freely available weights. These models trade some raw power for efficiency, but recent research and clever engineering (better data, distilled training, efficient attention mechanisms, etc.) have dramatically improved their capabilities. Many can now match or beat much larger models on specific benchmarks.

Local deployment of such models is valuable for several reasons:

- Privacy & Security: All data stays on your machine, which is crucial for confidential documents like insurance contracts. You’re not sending sensitive text to a third-party API.

- Cost Savings: Once downloaded, local models run for free – no API usage fees or cloud compute bills. This can make a big difference if you process large volumes of documents regularly.

- Latency & Offline Access: Local inference eliminates network latency. Responses can be near-instant on a GPU, and you can operate entirely offline. This is useful for on-site workflows or when internet access is restricted.

- Customization: With local models you have full control – you can adjust parameters, prompts, or fine-tune models to better fit your domain (e.g. insurance data) without vendor limits.

In short, lightweight LLMs put AI capabilities directly in your hands, on hardware you own. Next, we’ll compare some of the leading open models that are well-suited for local document processing.

Comparing Top Lightweight LLMs

Lightweight open-source large language models (LLMs) are becoming a practical choice for organizations looking to run AI workloads locally. They offer a strong balance between performance, speed, and resource requirements—making them ideal for document summarization, extraction, and classification without relying on cloud infrastructure.

We’ll focus on the following open-source models (each with downloadable checkpoints) that have a good reputation for quality relative to their size:

- Llama 3.1 – 8B parameters (Meta AI)

- StableLM Zephyr – 3B parameters (Stability AI)

- Llama 3.2 – 1B/3B parameters (Meta AI)

- Mistral – 7B parameters (Mistral AI)

- Gemma 3 – 1B and 4B variants (Google DeepMind)

- DeepSeek R1 – 1.5B and 7B distilled variants (DeepSeek AI)

- Phi-4 Mini – 3.8B parameters (Microsoft)

- TinyLlama – 1.1B parameters (community project)

These models range from very small (under 1 GB on disk) to mid-sized (~5 GB). All of them can run inference on a 16 GB GPU in 4-bit quantized form, the smaller ones even in half precision, and many are workable on a CPU with enough RAM and patience. Table 1 summarizes their characteristics:

Model	Size on Disk (quantized)	Max Context	Licence
Llama 3.1 (8B)	4.9GB	128k tokens	Open weights (Llama 3.1 Community License)
StableLM Zephyr (3B)	1.6GB	4k tokens	Only non-commercial use
Llama 3.2 (3B)	2.0GB	128k tokens	Open weights (Llama 3.2 Community License)
Mistral (7B)	4.1GB	32k tokens	Open-source (Apache 2.0)
Gemma 3 (4B)	3.3GB	128k tokens	Open weights (Gemma Terms of Use)
Gemma 3 (1B)	0.8GB	32k tokens	Open weights (Gemma Terms of Use)
DeepSeek R1 (7B)	4.7GB	128k tokens	Open-source (MIT licence)
DeepSeek R1 (1.5B)	1.1GB	128k tokens	Open-source (MIT licence)
Phi-4 Mini (3.8B)	2.5GB	128k tokens	Open-source (MIT licence)
TinyLlama (1.1B)	0.6GB	2k tokens	Open-source (Apache 2.0)

Table 1: Lightweight LLMs for local use – model sizes and maximum context window.

Notes: “Max Context” is the maximum sequence length (tokens) the model can process in one go.
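The on-disk sizes in Table 1 follow roughly from the parameter count multiplied by the bits stored per weight. The sketch below is only a back-of-the-envelope estimate. The 4.5 bits-per-weight average for 4-bit quantization and the fixed 1 GB runtime overhead are our assumptions. Real memory use also depends on context length (KV cache) and on the runtime.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 10**9 bytes).

    params_billion * 10**9 weights * bits / 8 (bits per byte) / 10**9 (bytes per GB)
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.0) -> bool:
    """Rough check: weights plus an assumed fixed overhead (KV cache, buffers) must fit."""
    return weight_size_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # 4.5 bits/weight is an assumed average for common 4-bit quantization schemes.
    for name, params in [("Llama 3.1 (8B)", 8.0), ("Phi-4 Mini (3.8B)", 3.8), ("TinyLlama (1.1B)", 1.1)]:
        print(f"{name}: fp16 ~{weight_size_gb(params, 16):.1f} GB, "
              f"4-bit ~{weight_size_gb(params, 4.5):.1f} GB, "
              f"fits a 6 GB GPU at 4-bit: {fits_in_vram(params, 4.5, 6.0)}")
```

For an 8B model, this gives about 16 GB at half precision and about 4.5 GB at 4-bit. That is in the same range as the 4.9 GB listed for Llama 3.1, and it shows why quantization matters on consumer GPUs.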

Next, let’s look at each model’s pros and cons, especially in the context of document tasks:

- Llama 3.1 (8B): Powerful general-purpose model; moderate size and strong multilingual capabilities. Heavy for CPU-only systems; requires chunking for long documents.

- StableLM Zephyr (3B): Ultra-lightweight, good for basic QA/extraction. Limited by small parameter count and commercial license restrictions.

- Llama 3.2 (3B): Excellent summarization and retrieval; long context support (128k tokens). Smaller size affects complex reasoning accuracy.

- Mistral (7B): Best overall performer for its size; highly efficient inference. Ideal for detailed summarization tasks.

- Gemma 3 (4B/1B): Offers multimodal capabilities and extensive multilingual support. The 4B model balances capability and speed; the 1B model is best suited for simple tasks.

- DeepSeek R1 (7B/1.5B): Balanced efficiency and comprehension for general NLP tasks; limited complex reasoning compared to Mistral.

- Phi-4 Mini (3.8B): Exceptional reasoning, math, and logical capabilities; perfect for analytical document processing. English-focused.

- TinyLlama (1.1B): Extremely lightweight; suitable for basic text extraction/classification tasks. Limited contextual understanding.

The models reviewed above cover a wide range of sizes and capabilities. Larger variants like Llama 3.1 and Mistral perform well on complex summarization and multilingual tasks but are less suited for CPU-only setups. Mid-sized models such as Llama 3.2 and Gemma 3 (4B) handle long inputs efficiently with reasonable performance. Smaller models, including TinyLlama and StableLM Zephyr, are lightweight and fast, making them practical for basic extraction or classification tasks.

Models Benchmarking: Document Extraction and Summarization

Here we outline a simple model benchmarking plan covering two common document-processing tasks:

- Information Extraction: We evaluated how well each model can extract specific fields from a policy or certificate. Specifically, we prompted each model to find the policy number, insured name, VAT ID (NIP), address, and insurance period in the document text and return them as structured output: a clean JSON response containing all the required values.
- Summarization: Each model generated a concise summary of an insurance policy, covering key points such as coverage, exclusions, and conditions. We rated the summaries on clarity, correctness, factual accuracy, and readability, and heavily penalized fabricated information.

We used 11 documents and ran all tests using Ollama (we have covered running models with Ollama in a separate post). The benchmarks were performed on a PC equipped with an NVIDIA GeForce RTX 2060 with 6 GB of VRAM. To ensure consistent results, each model was run with the temperature set to 0 for the extraction task (to make outputs as deterministic as possible) and with a fixed temperature of 0.7 for summarization. For the extraction task, we also used Ollama's structured outputs, passing a JSON schema in the format field of the request:


{
  "model": "deepseek-r1:7b",
  "prompt": "You are an assistant that extracts insurance-related information from a given input text. You must extract and return only the following fields: policy_number, insurance_period_start, insurance_period_end, insured (company or person name), insured_nip (tax identification number), insured_address (address of the insured). Return the output as a **clean JSON object**, not as a string, not inside quotes, and without any commentary. If a field is missing, use 'Not found'. Document text: ",
  "stream": false,
  "options": {
    "temperature": 0
  },
  "format": {
    "type": "object",
    "properties": {
      "policy_number": { "type": "string" },
      "insurance_period_start": { "type": "string" },
      "insurance_period_end": { "type": "string" },
      "insured": { "type": "string" },
      "insured_nip": { "type": "string" },
      "insured_address": { "type": "string" }
    },
    "required": [
      "policy_number",
      "insurance_period_start",
      "insurance_period_end",
      "insured",
      "insured_nip",
      "insured_address"
    ]
  }
}

Examples of insurance certificates.
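If you want to reproduce a setup like this, the following minimal Python sketch sends such a request to a local Ollama server. It assumes Ollama is running on its default port (11434) and uses the /api/generate endpoint with "stream": false. In that mode the generated text arrives as a single string in the "response" field. The helper names (build_payload, parse_response, extract) are hypothetical, and this is not the exact script we used for the benchmark.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

FIELDS = ["policy_number", "insurance_period_start", "insurance_period_end",
          "insured", "insured_nip", "insured_address"]


def build_payload(model: str, instructions: str, document_text: str) -> dict:
    """Build a non-streaming request whose output is constrained by a JSON schema."""
    schema = {
        "type": "object",
        "properties": {name: {"type": "string"} for name in FIELDS},
        "required": FIELDS,
    }
    return {
        "model": model,
        "prompt": instructions + document_text,  # the document is appended to the instructions
        "stream": False,
        "options": {"temperature": 0},  # as deterministic as possible for extraction
        "format": schema,
    }


def parse_response(body: dict) -> dict:
    """With "stream": false, the model output is a JSON string in body["response"]."""
    return json.loads(body["response"])


def extract(model: str, instructions: str, document_text: str, url: str = OLLAMA_URL) -> dict:
    data = json.dumps(build_payload(model, instructions, document_text)).encode("utf-8")
    request = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return parse_response(json.load(response))
```

A call such as extract("deepseek-r1:7b", instructions, text) would return a dictionary keyed by the schema fields. Keeping payload construction and parsing in separate functions makes both testable without a running server.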

The table below presents the benchmark results. Extraction accuracy refers to the number of documents (out of 11) where the model successfully extracted all key fields. Tokens/sec indicates the model’s inference speed, i.e. how quickly it generates output tokens.

Model	Summarization Quality	Extraction Accuracy (out of 11)	Tokens/sec
Llama 3.1 (8B)	High-quality, no hallucinations	10/11	13.49
StableLM 3B	Average quality, typos/hallucinations	4/11	56.51
Llama 3.2 (3B)	Concise yet comprehensive summary, no hallucinations	8/11	49.49
Mistral 7B	Extensive summary, factually correct	8/11	29.01
Gemma 3 4B	Concise yet comprehensive summary, no hallucinations	10/11	13.37
Gemma 3 1B	Concise yet comprehensive summary, no hallucinations	4/11	73.46
DeepSeek 7B	Concise yet comprehensive summary, no hallucinations	6/11	16.39
DeepSeek 1.5B	Very poor, frequent hallucinations/errors	0/11	66.45
Phi-4 Mini 3.8B	Very concise summaries, factually correct	9/11	39.31
TinyLlama 1.1B	Poor quality, severe hallucinations	2/11	107.34

Table 2: Benchmarking results.
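Both metrics in Table 2 are easy to compute. Here is a small Python sketch of one way to do it. We assume a document counts as correct only if every expected field matches after trimming whitespace and ignoring case. That matching rule is our own choice, not necessarily the criterion applied in the benchmark. The speed calculation uses the eval_count and eval_duration fields (in nanoseconds) that Ollama returns with each non-streaming response.

```python
def all_fields_correct(predicted: dict, expected: dict) -> bool:
    """True only if every expected field matches (whitespace- and case-insensitive)."""
    return all(
        str(predicted.get(name, "")).strip().lower() == str(value).strip().lower()
        for name, value in expected.items()
    )


def extraction_accuracy(predictions: list, ground_truth: list) -> tuple:
    """Return (documents with all fields correct, total documents)."""
    hits = sum(all_fields_correct(p, g) for p, g in zip(predictions, ground_truth))
    return hits, len(ground_truth)


def tokens_per_second(ollama_response: dict) -> float:
    """Generation speed from Ollama metadata; eval_duration is reported in nanoseconds."""
    return ollama_response["eval_count"] * 1e9 / ollama_response["eval_duration"]
```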

This scatterplot visualizes the trade-off between extraction accuracy and inference speed (measured in tokens per second).

The benchmarking results reveal significant variations among the tested models.

- Bottom-right models – Llama 3.1 (8B), Gemma 3 (4B), and Phi-4 Mini (3.8B) – excel in summarization quality and extraction accuracy, consistently providing concise and accurate outputs. Phi-4 Mini seems to offer a good trade-off between speed and accuracy.

- Mistral 7B, DeepSeek 7B, and Llama 3.2 (3B) generate detailed and informative summaries, though their extraction performance is more moderate.

- On the other hand, smaller models (on the top-left side of the chart) like StableLM Zephyr (3B), Gemma 3 (1B) and TinyLlama (1.1B) show significantly weaker extraction accuracy and are prone to frequent hallucinations. However, they benefit from faster inference times. Their limited context windows (e.g., 4k tokens) may contribute to these shortcomings. Overall, they may be suitable for only very basic tasks.
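To make the accuracy/speed trade-off in the scatterplot more concrete, one option is to compute the Pareto front. These are the models that no other model beats on both extraction accuracy and tokens/sec at once. The following Python sketch uses the numbers from Table 2. Note that it ignores summarization quality entirely.

```python
# (extraction accuracy out of 11, tokens/sec), copied from Table 2
RESULTS = {
    "Llama 3.1 (8B)": (10, 13.49),
    "StableLM 3B": (4, 56.51),
    "Llama 3.2 (3B)": (8, 49.49),
    "Mistral 7B": (8, 29.01),
    "Gemma 3 4B": (10, 13.37),
    "Gemma 3 1B": (4, 73.46),
    "DeepSeek 7B": (6, 16.39),
    "DeepSeek 1.5B": (0, 66.45),
    "Phi-4 Mini 3.8B": (9, 39.31),
    "TinyLlama 1.1B": (2, 107.34),
}


def dominates(a: tuple, b: tuple) -> bool:
    """a dominates b if it is at least as good on both axes and not identical."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def pareto_front(results: dict) -> list:
    """Models not dominated by any other model, in table order."""
    return [name for name, score in results.items()
            if not any(dominates(other, score) for other in results.values())]


print(pareto_front(RESULTS))
```

This prints Llama 3.1 (8B), Llama 3.2 (3B), Gemma 3 1B, Phi-4 Mini 3.8B, and TinyLlama 1.1B. Gemma 3 4B drops off the front only because it is 0.12 tokens/sec slower than Llama 3.1 at the same accuracy. A gap that small is likely within measurement noise. Treat the front as a shortlist, not a ranking.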

Choosing the Right Model for Your Needs

When selecting a language model for document extraction or summarization, it’s all about balancing accuracy, speed, and hardware constraints. Below is a quick breakdown to help you pick the best fit—whether you need high precision, fast inference, or something lightweight for basic tasks.

- High Accuracy & Reasonable Speed: Choose Phi-4 Mini (3.8B), Gemma 3 (4B), or Llama 3.1 (8B) for robust extraction and summarization accuracy.

- Fast Inference & Moderate Accuracy: Opt for Llama 3.2 (3B) or StableLM Zephyr (3B) for simpler tasks on limited hardware.

- Balanced Performance (Accuracy-Speed Tradeoff): Mistral (7B) provides strong general-purpose capability suitable for detailed document summarization tasks.

- Low Resource Environments (Basic Tasks): Consider TinyLlama (1.1B) for quick extraction or classification on minimal hardware if accuracy isn’t critical.

Conclusion

Lightweight LLMs are increasingly viable solutions for local deployment, particularly in document-intensive industries such as insurance. Models such as Phi-4 Mini, Gemma 3 (4B), and Mistral 7B provide strong performance in summarization, extraction, and classification tasks. Carefully balancing model size, inference speed, and accuracy ensures optimal outcomes, empowering organizations with affordable, private, and responsive AI solutions directly on owned hardware.


The post Top Lightweight LLMs for Local Deployment appeared first on Inero Software - Software Consulting.

How to Prepare Your Company for AI Agent Implementation

Marta Kuprasz — Tue, 08 Apr 2025 08:45:46 +0000

Implementing an AI agent in a company is not only a technological challenge but also a strategic one. As more businesses consider using artificial intelligence in their daily operations—from customer service to document analysis—successful implementation requires careful planning. This article explains what to focus on before deploying an AI agent, which areas of the business need to be well-prepared, and how to avoid common mistakes.

There are many areas where AI can help, from automating routine tasks and supporting customer service and data analysis to streamlining decision-making and creating intelligent assistants that support team workflows. The potential is enormous, but the key lies in properly preparing the organization for this change.

Stages of AI Assistant Implementation

The process of implementing an AI assistant in an organization can be divided into several stages, each requiring specific actions. From analyzing business needs, selecting the right language model, and preparing the infrastructure, to integrating with existing systems and testing—each step impacts the overall effectiveness of the solution.

The key stages are:

- Needs analysis and readiness assessment
- Data and content preparation
- Solution design
- Assistant development and configuration
- Testing and pilot phase
- Deployment and maintenance

Needs analysis and readiness assessment

To ensure the best results from implementing an AI agent, start by asking yourself: which tasks and areas have the most potential for optimization through the use of artificial intelligence?

When looking for an answer to this question, it’s worth carefully analyzing your company’s current structure, processes, and employee responsibilities. This will help identify so-called “bottlenecks” that may affect the quality of services provided. These might include, for example:

- long response times to quote requests
- teams overloaded with routine tasks
- lack of consistency in customer communication
- manual processing of documents and data
- difficulties in quickly accessing internal company knowledge

Based on this analysis, you’ll be able to identify areas for improvement as well as the people who will directly benefit from the support of AI assistants.

The second area that should be reviewed is the existing infrastructure. Implementing an AI assistant doesn’t require a large amount of hardware. If the company doesn’t want to invest in new machines, it can opt to use cloud services such as Azure, AWS, or Google Cloud.

Data is a crucial part of the preparation process. To fully leverage dedicated AI solutions, keep in mind that the assistant can only work with company knowledge that is available in digital form, whether it searches that knowledge at query time (RAG) or the underlying model is fine-tuned on it. The data should be well organized and kept in a central repository or database. The less structured the data, the higher the cost of implementing the assistant, and the greater the risk that the solution won’t meet expectations.

Data and content preparation

At this stage, it’s essential to gather all materials that contain important company knowledge—this may include PDF, Word, and Excel documents, website content, FAQ sections, emails, or data from databases.

Next, the collected information needs to be properly prepared—organized, cleaned of unnecessary content (e.g., unreadable PDFs), standardized where possible, and exported to CSV or JSON files (e.g., emails).

In some cases, such as when planning further model customization (fine-tuning), it will also be necessary to label the data or prepare a dedicated training set in the form of instructions and expected responses, for example:

{"prompt": "What documents are required to sign an OCS agreement?", "response": "The following documents are required to sign an OCS agreement: ..."}
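Such a training set is commonly stored as JSON Lines, with one example per line. The following Python sketch shows one way to produce it from prompt/response pairs. The "prompt" and "response" keys follow the example above. Fine-tuning platforms differ in the format they expect (for example, OpenAI's chat fine-tuning uses a "messages" list), so check the target platform before exporting. The function names are hypothetical.

```python
import json


def to_jsonl_lines(pairs):
    """Convert (prompt, response) pairs into JSON Lines, skipping incomplete examples."""
    lines = []
    for prompt, response in pairs:
        if not prompt.strip() or not response.strip():
            continue  # an example without a question or an expected answer cannot be used
        record = {"prompt": prompt.strip(), "response": response.strip()}
        # ensure_ascii=False keeps non-English characters (e.g. Polish diacritics) readable
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


def write_jsonl(pairs, path):
    """Write the training set to a UTF-8 .jsonl file and return the number of examples."""
    lines = to_jsonl_lines(pairs)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```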

Solution design

At this stage, decisions are made about the technical design of the solution. It’s important to define what type of assistant will best meet the company’s needs—whether it’s a simple chatbot answering questions, a more advanced assistant with access to company knowledge (so-called RAG – Retrieval-Augmented Generation), or an agent capable of independently performing specific tasks such as making bookings, generating reports, or sending emails.

The next step is selecting the appropriate technologies, including the large language model (LLM) that will power the assistant—such as GPT-4, Claude, Mistral, LLaMA, or Gemini—depending on specific needs and requirements related to privacy, cost, and integration capabilities.

Finally, it’s worth preparing a list of functions the assistant should perform and planning integration with other systems used in the company—such as the CRM, knowledge base, or email.

Assistant development and configuration

At this stage, both the technical backend and the user-facing part of the assistant (frontend) are developed. This could be, for example, a chat interface on the website, a button that launches the assistant in an application, or a widget integrated with tools like Slack. We have also written separately about what integrating an AI agent with the Slack communication platform can look like.

In parallel, the selected language model is deployed—via services such as Azure OpenAI, OpenAI API, Anthropic (Claude), Google Vertex AI (Gemini), or locally using open-source models like LLaMA, Mistral, or Mixtral.

If the assistant is meant to use internal company knowledge, a RAG (Retrieval-Augmented Generation) mechanism needs to be configured—enabling it to search and match relevant documents to user queries.
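The retrieval step of RAG can be illustrated with a deliberately simplified Python sketch. It scores each document chunk against the user's question and places the best matches into the prompt sent to the model. Production systems typically use embedding models and a vector database. Here, a word-count cosine similarity stands in for embeddings so the example has no dependencies. The function names and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k document chunks most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def build_prompt(query: str, documents: list, k: int = 2) -> str:
    """Ground the model in retrieved context and allow it to admit it doesn't know."""
    context = "\n\n".join(retrieve(query, documents, k))
    return ("Answer using only the context below. If the answer is not in the context, "
            "say that you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

Swapping the word-count vectors for embeddings changes only the scoring function. The overall flow stays the same: retrieve relevant chunks, then build a grounded prompt.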

Finally, integrations with other systems—such as CRM, ticketing systems, or email—are implemented, allowing the assistant to meaningfully support the team’s day-to-day work.

Testing and pilot phase

After implementation, thorough testing of the solution is essential. The first step is functional testing—checking whether the assistant correctly understands user intent, responds in line with company documentation, and handles different types of queries appropriately.

The next phase is testing with end users (UAT – User Acceptance Testing), which helps assess how well the assistant performs in real-world scenarios and whether it meets employees’ expectations.

Based on feedback and observations, iterative improvements are made—such as adjusting responses, adding new documents to the knowledge base, or refining prompts and the agent’s logic. This phase is often repeated several times until a satisfactory level of quality is achieved.

Deployment and maintenance

After completing the testing phase, the assistant is deployed to the target infrastructure—this may be a public cloud (e.g., Azure, AWS, GCP), on-premise servers, or a hybrid solution, depending on security and availability requirements. More about this is covered later in the article.

It’s also necessary to set up monitoring, which allows you to track things like token usage, query frequency, error rates, and the quality of generated responses. This enables quick issue resolution and cost optimization.
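The following Python sketch shows the kind of counters such monitoring needs: request volume, token usage, error rate, and a simple quality signal taken from optional user ratings. This is an in-memory illustration with hypothetical names. In a real deployment, these values would usually be exported to an existing metrics or logging system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageMonitor:
    """Minimal in-memory counters for an AI assistant."""
    requests: int = 0
    errors: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    rated: int = 0
    helpful: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int, ok: bool,
               helpful: Optional[bool] = None) -> None:
        """Log one request; 'helpful' is an optional user rating (e.g. thumbs up/down)."""
        self.requests += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        if not ok:
            self.errors += 1
        if helpful is not None:
            self.rated += 1
            self.helpful += int(helpful)

    def summary(self) -> dict:
        n = self.requests
        return {
            "requests": n,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "error_rate": self.errors / n if n else 0.0,
            "helpful_rate": self.helpful / self.rated if self.rated else None,
        }
```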

In daily use, it’s important to keep the data up to date—adding new documents, removing outdated information, and updating the knowledge base the assistant relies on.

Over time, as business needs evolve, it may be worth considering retraining or fine-tuning the model—e.g., every few months—to better align it with the organization’s specific context.

Finally, it’s important to provide technical support and user assistance to ensure the solution is not only technically reliable but also convenient and intuitive for everyday use.

Data privacy

In the “Deployment and maintenance” section, we discussed the available options for choosing the infrastructure on which the AI agent will be deployed.

Each solution has its pros and cons. Choosing an on-premise setup gives you full control over the data, but it requires a dedicated machine with specific parameters.

Another option is using a public cloud service, such as Azure. Microsoft clearly states that data submitted to the Azure OpenAI service is not used to train or improve OpenAI or Microsoft models (source).

According to Microsoft, prompts and responses are not shared with other customers or with OpenAI. The models are hosted within Microsoft's Azure environment, so when you use GPT-4 on Azure, no information from your conversations is passed to OpenAI LLC. Microsoft has confirmed this in its Data Processing Addendum (DPA).

AI decision accountability

It’s important to remember that formal and legal responsibility for the outcomes of an AI agent’s actions, and for the data it processes, lies with those who implemented and oversee the solution. Most often, this means:

- the organization (e.g., the company that deployed the assistant),
- the system administrator,
- the individual making decisions based on AI suggestions (e.g., a customer service representative, recruiter, or doctor).

How to reduce risk?

- Human-in-the-loop (HITL) – A human must approve important decisions, while the AI only supports the process (e.g., the assistant drafts a response, but a person approves it).
- Clear disclaimers and warnings – The AI should inform users: “I am an AI assistant – please verify my responses before making a decision.”
- Source verification – Where possible, the AI assistant should cite sources for its answers, or state that it doesn’t know rather than guess. Retrieval-augmented generation (RAG) makes this practical, because answers are drawn from a knowledge base you control.
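These three safeguards can be enforced in code rather than left to convention. The sketch below is one possible shape under stated assumptions. The `Draft` type, the `approved_by` field and the `send` gate are hypothetical names, and the `sources` list is assumed to come from the RAG retrieval step. A draft with no cited source is replaced by an explicit "I don't know", every reply carries the disclaimer, and nothing can be sent until a named person approves it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Disclaimer wording taken from the guideline above.
DISCLAIMER = "I am an AI assistant – please verify my responses before making a decision."


@dataclass
class Draft:
    """A hypothetical model-generated reply awaiting human review."""

    answer: str | None                                # None: no grounded answer found
    sources: list[str] = field(default_factory=list)  # documents retrieved via RAG
    approved_by: str | None = None                    # set only by a human reviewer


def render(draft: Draft) -> str:
    """Format the reply with its sources and the disclaimer."""
    if draft.answer is None or not draft.sources:
        # Source verification: without a supporting document, say so instead of guessing.
        body = "I don't know: no supporting document was found in the knowledge base."
    else:
        body = f"{draft.answer}\nSources: {', '.join(draft.sources)}"
    return f"{body}\n\n{DISCLAIMER}"


def send(draft: Draft) -> str:
    """Human-in-the-loop gate: nothing leaves the system without a named approver."""
    if draft.approved_by is None:
        raise PermissionError("A human must approve this draft before it is sent.")
    return render(draft)


if __name__ == "__main__":
    reply = Draft("Refunds are processed within 14 days.", ["refund-policy.pdf"])
    reply.approved_by = "customer.service.rep"  # the reviewer signs off
    print(send(reply))
```

Recording *who* approved each reply, not just *that* it was approved, also supports the accountability question above. It shows which person made the final decision.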

Summary

The process of implementing an AI agent must be well-planned and carefully considered. It may seem challenging at first, but with proper preparation, it can deliver long-term benefits. If you need support, feel free to contact us.

The article “How to Prepare Your Company for AI Agent Implementation” comes from the Inero Software - Software Consulting website.

Keycloak Deployment Auditing – General Scope and Guidelines

Practical lessons from auditing multi-realm, multi-client Keycloak environments in medium and large organizations

1. Introduction

Keycloak-side audit – known patterns, real-world consequences

2. Client-side audit – where the highest risks emerge

Missing token validation in client applications

Insecure token storage and handling

Token transmission via URLs

Incomplete PKCE or nonce support

Summary

Implementing an AI-Powered Telephony Service Center with ElevenLabs & LiveAPI

1. What Makes LiveAPI and ElevenLabs a Powerful Combination?

2. Why GDPR Compliance Shapes the Choice of API in Europe

3. Our Practical Experience Integrating Telephony with LiveAPI and ElevenLabs

3.1 Project Context

3.2 Technology Stack and Constraints

3.3 Key Engineering Challenges

3.4 What We Built Ourselves

3.5 What We Learned

4. GDPR Considerations in AI Telephony

Conclusion

Secure Email Delivery in Keycloak 26.2 Using XOAUTH2

1. What is XOAUTH2, and Why It Matters

2. How XOAUTH2 is Implemented in Keycloak 26.2

Retirement of Basic Authentication for SMTP AUTH (Client Submission) in Exchange Online

3. Why This Matters for Microsoft Azure / Office 365 Users

4. Beyond XOAUTH2?

Conclusion

Keycloak or SaaS IdP? A Tech Leader’s Guide to Making the Right IAM Choice

Introduction

Where Keycloak Lives in Your Stack

Keycloak in a Nutshell (and Two Misconceptions)

Seven Questions to Frame the Decision

1. Compliance & Risk: Do You Need Full Control Over IAM?

2. Integration Map: How Many Apps and Protocols Today – and in Two Years?

3. Team & Operations Capacity: Can You Secure and Run It 24/7 (or Outsource)?

4. Customization Needs: Themes, Extensions and Advanced Authorization

5. Scalability & High Availability: What Are Your Peak Loads and DR Needs?

6. Budget & TCO: What Does Three Years Really Cost vs SaaS?

7. Vendor Lock‑In & Roadmap Control: How Much Flexibility Do You Need?

A Visual Decision Flow

Quantify It: The Scorecard

Question | Score (1–5) | Notes | Leaning

From Decision to Deployment: A Pragmatic Pipeline

Next Steps

Ready to Validate Your Choice?

FAQ

Is Your Company Ready for New Technology? How to Evaluate Technological Readiness

How to Assess an Enterprise’s Technological Readiness?

IT Infrastructure

Managing and Accessing Data

Team Preparation and Training

Measuring Business Readiness

How We Work

LLM Implementation and Maintenance Costs for Businesses: A Detailed Breakdown

Types of Costs When Using LLMs

Infrastructure

Licensing and Model Fees

Model Adaptation and Customization

Operational Costs

Team Expertise

Cost of a Single Prompt in Popular LLM Models

Example Costs of Implementing an LLM Assistant (100 Queries per Day)

Cost Example: AI Assistant for Analyzing Emails and PDF Documents

Summary

Looking to Bring AI Tools into Your Company?

Chatbot, Agent or AI Assistant? Find Out Which Solution Is Best for Your Business

Chatbot – answers questions, provides explanations, and handles requests

AI Agent – a tool designed to carry out specific tasks

AI Assistant – supports users in daily work by operating contextually and “in the background”

Other Business Applications of Large Language Models

AI Models and Data Security