<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>Lightweight LLMs - Inero Software - Software Consulting</title>
	<atom:link href="https://inero-software.com/tag/lightweight-llms/feed/" rel="self" type="application/rss+xml" />
	<link>https://inero-software.com/tag/lightweight-llms/</link>
	<description>We unleash innovations using cutting-edge technologies, modern design and AI</description>
	<lastBuildDate>Thu, 17 Apr 2025 11:54:21 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>

<image>
	<url>https://inero-software.com/wp-content/uploads/2018/11/inero-logo-favicon.png</url>
	<title>Lightweight LLMs - Inero Software - Software Consulting</title>
	<link>https://inero-software.com/tag/lightweight-llms/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">153509928</site>	<item>
		<title>Top Lightweight LLMs for Local Deployment</title>
		<link>https://inero-software.com/top-lightweight-llms-for-local-deployment/</link>
		
		<dc:creator><![CDATA[Martyna Mul]]></dc:creator>
		<pubDate>Thu, 17 Apr 2025 09:50:46 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Blog]]></category>
		<category><![CDATA[Company]]></category>
		<category><![CDATA[AI development]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Large Language Model]]></category>
		<category><![CDATA[Lightweight LLMs]]></category>
		<category><![CDATA[LLM]]></category>
		<guid isPermaLink="false">https://inero-software.com/?p=7843</guid>

					<description><![CDATA[<p>In this post, we’ll explore several top open-source lightweight LLMs and how to run them on a local Windows PC—whether CPU-only or with a limited GPU—for document processing tasks. </p>
<p>The post <a href="https://inero-software.com/top-lightweight-llms-for-local-deployment/">Top Lightweight LLMs for Local Deployment</a> appeared first on <a href="https://inero-software.com">Inero Software - Software Consulting</a>.</p>
]]></description>
										<content:encoded><![CDATA[		<div data-elementor-type="wp-post" data-elementor-id="7843" class="elementor elementor-7843" data-elementor-post-type="post">
				<div class="elementor-element elementor-element-cc31ada e-flex e-con-boxed e-con e-parent" data-id="cc31ada" data-element_type="container">
					<div class="e-con-inner">
				<div class="elementor-element elementor-element-2485c29 elementor-widget elementor-widget-html" data-id="2485c29" data-element_type="widget" data-widget_type="html.default">
				<div class="elementor-widget-container">
					</div>
				</div>
				<div class="elementor-element elementor-element-d3520b4 elementor-widget elementor-widget-text-editor" data-id="d3520b4" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<h5><strong>Running large language models (LLMs) on your own hardware has become increasingly feasible thanks to lightweight LLMs—models with relatively small parameter counts that deliver strong performance without requiring server-grade GPUs. In this post, we’ll explore several top open-source lightweight LLMs and how to run them on a local Windows PC—whether CPU-only or with a limited GPU—for document processing tasks. We also include a benchmark comparing the models in terms of accuracy and inference speed, helping you choose the right model for your local environment and use case.</strong></h5>						</div>
				</div>
				<div class="elementor-element elementor-element-10359f9 elementor-widget elementor-widget-heading" data-id="10359f9" data-element_type="widget" data-widget_type="heading.default">
				<div class="elementor-widget-container">
			<h3 class="elementor-heading-title elementor-size-default">What Are Lightweight LLMs (and Why Run Them Locally)? </h3>		</div>
				</div>
				<div class="elementor-element elementor-element-621d40f elementor-widget elementor-widget-text-editor" data-id="621d40f" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p>“Lightweight” LLMs are models typically in the range of ~1–8 billion parameters – far smaller than GPT-3-class models – often optimized to run on a single GPU or even a CPU. They are usually released as open models with freely available weights. These models trade some raw power for efficiency, but recent research and clever engineering (better data, distilled training, efficient attention mechanisms, etc.) have dramatically improved their capabilities. Many can now match or beat much larger models on specific benchmarks.</p>						</div>
				</div>
				<div class="elementor-element elementor-element-81497fe elementor-widget elementor-widget-text-editor" data-id="81497fe" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p>Local deployment of such models is valuable for several reasons:</p><ul><li><strong>Privacy &amp; Security:</strong> All data stays on your machine, which is crucial for confidential documents like insurance contracts. You’re not sending sensitive text to a third-party API.</li><li><strong>Cost Savings:</strong> Once downloaded, local models run <strong>for free</strong> – no API usage fees or cloud compute bills. This can make a big difference if you process large volumes of documents regularly.</li><li><strong>Latency &amp; Offline Access:</strong> Local inference eliminates network latency. Responses can be near-instant on a GPU, and you can operate entirely offline. This is useful for on-site workflows or when internet access is restricted.</li><li><strong>Customization:</strong> With local models you have full control – you can adjust parameters, prompts, or fine-tune models to better fit your domain (e.g. insurance data) without vendor limits.</li></ul><p>In short, lightweight LLMs put AI capabilities directly in your hands, on hardware you own. Next, we’ll compare some of the leading open models that are well-suited for local document processing.</p>						</div>
				</div>
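<p>One practical route for getting such a model running locally (among several, and the one sketched here) is Ollama, which handles download and quantized inference on Windows, macOS, and Linux. The commands below are a sketch; model tags are those published in the Ollama library and may change:</p>

```shell
# One-time download of a quantized lightweight model
ollama pull mistral

# Summarize a document interactively from the command line
ollama run mistral "Summarize the key obligations in the following contract: ..."
```

<p>Other common options include llama.cpp and LM Studio; the workflow (pull a quantized checkpoint, then prompt it locally) is the same.</p>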
				<div class="elementor-element elementor-element-6e958d1 elementor-widget elementor-widget-heading" data-id="6e958d1" data-element_type="widget" data-widget_type="heading.default">
				<div class="elementor-widget-container">
			<h3 class="elementor-heading-title elementor-size-default">Comparing Top Lightweight LLMs </h3>		</div>
				</div>
				<div class="elementor-element elementor-element-adbf2c8 elementor-widget elementor-widget-text-editor" data-id="adbf2c8" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p>Lightweight open-source large language models (LLMs) are becoming a practical choice for organizations looking to run AI workloads locally. They offer a strong balance between performance, speed, and resource requirements, making them ideal for document summarization, extraction, and classification without relying on cloud infrastructure.</p>						</div>
				</div>
					</div>
				</div>
		<div class="elementor-element elementor-element-330c9fe e-flex e-con-boxed e-con e-parent" data-id="330c9fe" data-element_type="container">
					<div class="e-con-inner">
				<div class="elementor-element elementor-element-73949bc elementor-widget elementor-widget-text-editor" data-id="73949bc" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p>We’ll focus on the following open-source models (each with downloadable checkpoints) that have a good reputation for quality relative to their size:</p>						</div>
				</div>
				<div class="elementor-element elementor-element-6703794 elementor-widget elementor-widget-text-editor" data-id="6703794" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<ul><li><strong>Llama 3.1</strong> – 8B parameters (Meta AI)</li><li><strong>StableLM Zephyr</strong> – 3B parameters (Stability AI)</li><li><strong>Llama 3.2</strong> – 1B/3B parameters (Meta AI)</li><li><strong>Mistral</strong> – 7B parameters (Mistral AI)</li><li><strong>Gemma 3</strong> – 1B and 4B variants (Google DeepMind)</li><li><strong>DeepSeek R1</strong> – 1.5B and 7B variants (DeepSeek AI)</li><li><strong>Phi-4 Mini</strong> – 3.8B parameters (Microsoft)</li><li><strong>TinyLlama</strong> – 1.1B parameters (community project)</li></ul>						</div>
				</div>
				<div class="elementor-element elementor-element-f98ca55 elementor-widget elementor-widget-text-editor" data-id="f98ca55" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p>These models range from very small (under 1 GB on disk) to mid-sized (~5 GB). All can be run in inference mode on a 16 GB GPU (often in half-precision or 4-bit quantized form), and many are workable on CPU with enough RAM and patience. Table 1 summarizes their characteristics:</p>						</div>
				</div>
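<p>As a rough sanity check before downloading anything, you can estimate a model’s memory footprint from its parameter count and quantization level. The sketch below uses the common approximation of parameters × bits-per-weight ÷ 8, plus an overhead allowance for activations and KV cache; the 20% overhead figure is an assumption for illustration, not a measured value:</p>

```python
def estimated_footprint_gb(params_billions: float, bits_per_weight: int,
                           overhead_fraction: float = 0.2) -> float:
    """Rough memory estimate: weight bytes plus a fudge factor for
    activations and KV cache. Not a substitute for real measurement."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_fraction) / 1e9

# A 7B model quantized to 4 bits needs roughly 4.2 GB,
# comfortably inside a 16 GB GPU or typical desktop RAM.
print(round(estimated_footprint_gb(7, 4), 1))  # → 4.2
```

<p>The same arithmetic explains why the 16-bit original of an 8B model (~19 GB with overhead) is awkward on consumer hardware while its 4-bit quantization is not.</p>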
				<div class="elementor-element elementor-element-71cd074 elementor-widget elementor-widget-html" data-id="71cd074" data-element_type="widget" data-widget_type="html.default">
				<div class="elementor-widget-container">
			<style>
  @import url('https://fonts.googleapis.com/css2?family=Roboto:wght@300&display=swap');

  .model-table {
    font-family: 'Roboto', sans-serif;
    font-weight: 300;
    font-size: 14px;
    color: #1C244B;
    border-collapse: collapse;
    width: 100%;
  }

  .model-table th, .model-table td {
    border: 1px solid #ccc;
    padding: 8px;
    text-align: left;
    color: #1C244B;
  }

  .model-table th {
    background-color: #f2f2f2;
  }
</style>

<table class="model-table">
  <thead>
    <tr>
      <th>Model</th>
      <th>Size on Disk (quantized)</th>
      <th>Max Context</th>
      <th>Licence</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Llama 3.1 (8B)</td>
      <td>4.9 GB</td>
      <td>128k tokens</td>
      <td>Open-source</td>
    </tr>
    <tr>
      <td>StableLM Zephyr (3B)</td>
      <td>1.6 GB</td>
      <td>4k tokens</td>
      <td>Non-commercial use only</td>
    </tr>
    <tr>
      <td>Llama 3.2 (3B)</td>
      <td>2.0 GB</td>
      <td>128k tokens</td>
      <td>Open-source</td>
    </tr>
    <tr>
      <td>Mistral (7B)</td>
      <td>4.1 GB</td>
      <td>32k tokens</td>
      <td>Open-source (Apache 2.0)</td>
    </tr>
    <tr>
      <td>Gemma 3 (4B)</td>
      <td>3.3 GB</td>
      <td>128k tokens</td>
      <td>Open-source</td>
    </tr>
    <tr>
      <td>Gemma 3 (1B)</td>
      <td>0.8 GB</td>
      <td>32k tokens</td>
      <td>Open-source</td>
    </tr>
    <tr>
      <td>DeepSeek R1 (7B)</td>
      <td>4.7 GB</td>
      <td>128k tokens</td>
      <td>Open-source (MIT licence)</td>
    </tr>
    <tr>
      <td>DeepSeek R1 (1.5B)</td>
      <td>1.1 GB</td>
      <td>128k tokens</td>
      <td>Open-source (MIT licence)</td>
    </tr>
    <tr>
      <td>Phi-4 Mini (3.8B)</td>
      <td>2.5 GB</td>
      <td>128k tokens</td>
      <td>Open-source</td>
    </tr>
    <tr>
      <td>TinyLlama (1.1B)</td>
      <td>0.6 GB</td>
      <td>2k tokens</td>
      <td>Open-source</td>
    </tr>
  </tbody>
</table>
		</div>
				</div>
				<div class="elementor-element elementor-element-55c06b4 elementor-widget elementor-widget-text-editor" data-id="55c06b4" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<h6>Table 1: Lightweight LLMs for local use – model sizes and maximum context window.</h6>						</div>
				</div>
				<div class="elementor-element elementor-element-58e51e9 elementor-widget elementor-widget-text-editor" data-id="58e51e9" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><strong>Notes:</strong> “Max Context” is the maximum sequence length (in tokens) the model can process in one go.</p>						</div>
				</div>
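<p>When a document exceeds a model’s context window, the standard workaround is to split it into overlapping chunks and process each chunk separately. The sketch below uses a crude whitespace word count as a stand-in for real tokenization; actual token counts are typically higher, so in practice you would size chunks with the model’s own tokenizer:</p>

```python
def chunk_words(text: str, max_words: int = 3000, overlap: int = 200):
    """Split text into overlapping word-window chunks so each fits
    within a model's context limit. The overlap preserves continuity
    across chunk boundaries."""
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = "word " * 7000
parts = chunk_words(doc, max_words=3000, overlap=200)
print(len(parts))  # → 3 chunks: words 0–3000, 2800–5800, 5600–7000
```

<p>Each chunk can then be summarized independently and the partial summaries merged in a final pass (a simple map-reduce pattern).</p>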
				<div class="elementor-element elementor-element-223eda5 elementor-widget elementor-widget-text-editor" data-id="223eda5" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p>Next, let’s look at each model’s pros and cons, especially in the context of document tasks:</p>						</div>
				</div>
				<div class="elementor-element elementor-element-7192f01 elementor-widget elementor-widget-text-editor" data-id="7192f01" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<ul><li><strong>Llama 3.1 (8B):</strong> Powerful general-purpose model; moderate size and strong multilingual capabilities. Heavy for CPU-only systems; requires chunking for long documents.</li><li><strong>StableLM Zephyr (3B):</strong> Ultra-lightweight and good for basic QA/extraction. Limited by its small parameter count and non-commercial licence restrictions.</li><li><strong>Llama 3.2 (3B):</strong> Excellent summarization and retrieval; long context support (128k tokens). Smaller size affects complex reasoning accuracy.</li><li><strong>Mistral (7B):</strong> Best overall performer for its size; highly efficient inference. Ideal for detailed summarization tasks.</li><li><strong>Gemma 3 (4B/1B):</strong> Offers multimodal capabilities and extensive multilingual support. The 4B model balances capability and speed; the 1B model is best suited for simple tasks.</li><li><strong>DeepSeek R1 (7B/1.5B):</strong> Balanced efficiency and comprehension for general NLP tasks; limited complex reasoning compared to Mistral.</li><li><strong>Phi-4 Mini (3.8B):</strong> Exceptional reasoning, math, and logical capabilities; well suited to analytical document processing. English-focused.</li><li><strong>TinyLlama (1.1B):</strong> Extremely lightweight; suitable for basic text extraction/classification tasks. Limited contextual understanding.</li></ul>						</div>
				</div>
				<div class="elementor-element elementor-element-906c9d8 elementor-widget elementor-widget-text-editor" data-id="906c9d8" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><span class="TextRun SCXW259074413 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW259074413 BCX0">The models reviewed above cover a wide range of sizes and capabilities. Larger variants like Llama 3.1 and Mistral perform well on complex summarization and multilingual tasks but are less suited for CPU-only setups. Mid-sized models such as Llama 3.2 and Gemma 3 (4B) handle long inputs efficiently with reasonable performance. Smaller models, including </span><span class="NormalTextRun SpellingErrorV2Themed SCXW259074413 BCX0">TinyLlama</span><span class="NormalTextRun SCXW259074413 BCX0"> and </span><span class="NormalTextRun SpellingErrorV2Themed SCXW259074413 BCX0">StableLM</span><span class="NormalTextRun SCXW259074413 BCX0"> Zephyr, are lightweight and fast, making them practical for basic extraction or classification tasks.</span></span><span class="EOP SCXW259074413 BCX0" data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></p>						</div>
				</div>
				<div class="elementor-element elementor-element-013ecbc elementor-widget elementor-widget-heading" data-id="013ecbc" data-element_type="widget" data-widget_type="heading.default">
				<div class="elementor-widget-container">
			<h3 class="elementor-heading-title elementor-size-default">Model Benchmarking: Document Extraction and Summarization</h3>		</div>
				</div>
				<div class="elementor-element elementor-element-f583b4c elementor-widget elementor-widget-text-editor" data-id="f583b4c" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><span class="TextRun SCXW65580225 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW65580225 BCX0">Here we outline a simple </span><span class="NormalTextRun SCXW65580225 BCX0">model </span><span class="NormalTextRun SCXW65580225 BCX0">benchmarking plan covering t</span><span class="NormalTextRun SCXW65580225 BCX0">wo</span><span class="NormalTextRun SCXW65580225 BCX0"> common document-processing tasks:</span></span><span class="EOP SCXW65580225 BCX0" data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></p>						</div>
				</div>
				<div class="elementor-element elementor-element-236155a elementor-widget elementor-widget-text-editor" data-id="236155a" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<ol><li><strong> Information Extraction:</strong><span data-contrast="auto"> We evaluated how well each model can extract specific fields from a policy or certificate. Specifically, we prompted each model to find the </span><strong>policy number, insured name,</strong><span data-contrast="auto"> VAT ID, address and insurance period in the document text and return structured output &#8211; a clean JSON response with all the required values.</span></li><li><strong> Summarization: </strong><span data-contrast="auto">Each model generated a concise summary of an insurance policy, covering key points such as coverage, exclusions, and conditions. We rated the summaries on clarity, correctness, factual accuracy and readability, and heavily penalized fabricated information.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></li></ol>						</div>
				</div>
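To make the extraction criterion concrete, the per-document pass/fail check can be sketched as follows. This is a minimal illustration in Python; the helper names and the exact matching rule (case-insensitive, whitespace-trimmed string equality on every field) are our assumptions, not the benchmark's published code.

```python
def all_fields_correct(predicted: dict, expected: dict) -> bool:
    """A document counts as a success only if every key field matches."""
    return all(
        predicted.get(field, "").strip().lower() == value.strip().lower()
        for field, value in expected.items()
    )

def extraction_score(predictions: list[dict], ground_truth: list[dict]) -> str:
    """Score a model across the whole document set, e.g. '10/11'."""
    hits = sum(
        all_fields_correct(p, g) for p, g in zip(predictions, ground_truth)
    )
    return f"{hits}/{len(ground_truth)}"
```

With this all-or-nothing rule, a single wrong or missing field fails the whole document, which matches how the x/11 accuracy figures below are counted.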
				<div class="elementor-element elementor-element-02421da elementor-widget elementor-widget-text-editor" data-id="02421da" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><span data-contrast="auto">We used 11 documents and ran all tests using Ollama (<a href="https://inero-software.com/deploying-llms-locally-a-guide-to-ollama-and-lm-studio/">you can read about running models with Ollama here</a>). The benchmarks were performed on a PC equipped with an NVIDIA GeForce RTX 2060 with 6&nbsp;GB of VRAM. To ensure consistent results, each model was run with the temperature set to 0 for the extraction task (to produce deterministic outputs) and with a fixed temperature of 0.7 for summarization. For the extraction task, we also used <a class="Hyperlink" href="https://ollama.com/blog/structured-outputs" target="_blank" rel="noreferrer noopener">structured outputs</a>:</span></p>						</div>
				</div>
				<div class="elementor-element elementor-element-f1a279a elementor-widget elementor-widget-text-editor" data-id="f1a279a" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<pre>
{
  "model": "deepseek-r1:7b",
  "prompt": "You are an assistant that extracts insurance-related information from a given input text. You must extract and return only the following fields: - policy_number, - insurance_period, - insured (company or person name), - nip (tax identification number), - address (of the insured). Return the output as a **clean JSON object** — not as a string, not inside quotes, and without any commentary. If a field is missing, use 'Not found'. Document text: ",
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "policy_number": { "type": "string" },
      "insurance_period_start": { "type": "string" },
      "insurance_period_end": { "type": "string" },
      "insured": { "type": "string" },
      "insured_nip": { "type": "string" },
      "insured_address": { "type": "string" }
    },
    "required": [
      "policy_number",
      "insurance_period_start",
      "insurance_period_end",
      "insured",
      "insured_nip",
      "insured_address"
    ]
  }
}</pre>						</div>
				</div>
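A request like the one above can be sent to a locally running Ollama instance over its HTTP API. The sketch below assumes Ollama's default endpoint at <code>http://localhost:11434/api/generate</code>; the helper function names are ours, and the prompt is a condensed paraphrase of the one used in the benchmark.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

FIELDS = [
    "policy_number", "insurance_period_start", "insurance_period_end",
    "insured", "insured_nip", "insured_address",
]

def build_extraction_request(model: str, document_text: str) -> dict:
    """Assemble an /api/generate payload with a JSON-schema 'format' field."""
    return {
        "model": model,
        "prompt": (
            "You are an assistant that extracts insurance-related information "
            "from a given input text. Return only these fields as a clean JSON "
            "object, using 'Not found' for missing values: "
            + ", ".join(FIELDS) + ". Document text: " + document_text
        ),
        "stream": False,
        "options": {"temperature": 0},  # deterministic output for extraction
        "format": {  # JSON schema enforced via Ollama structured outputs
            "type": "object",
            "properties": {field: {"type": "string"} for field in FIELDS},
            "required": FIELDS,
        },
    }

def extract(model: str, document_text: str) -> dict:
    """POST the request to the local Ollama server and parse the answer."""
    data = json.dumps(build_extraction_request(model, document_text)).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # With "stream": false, the generated text arrives in the "response" field.
    return json.loads(body["response"])
```

Because the schema is passed in the request, the server constrains generation to valid JSON, so the caller can parse the reply directly instead of scraping it out of free text.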
				<div class="elementor-element elementor-element-a6fbca0 elementor-widget elementor-widget-image" data-id="a6fbca0" data-element_type="widget" data-widget_type="image.default">
				<div class="elementor-widget-container">
													<img fetchpriority="high" decoding="async" data-attachment-id="7846" data-permalink="https://inero-software.com/top-lightweight-llms-for-local-deployment/attachment/111553/" data-orig-file="https://inero-software.com/wp-content/uploads/2025/04/111553.png" data-orig-size="1154,649" data-comments-opened="0" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="111553" data-image-description="" data-image-caption="" data-medium-file="https://inero-software.com/wp-content/uploads/2025/04/111553-300x169.png" data-large-file="https://inero-software.com/wp-content/uploads/2025/04/111553-1030x579.png" tabindex="0" role="button" width="1030" height="579" src="https://inero-software.com/wp-content/uploads/2025/04/111553-1030x579.png" class="attachment-large size-large wp-image-7846" alt="" srcset="https://inero-software.com/wp-content/uploads/2025/04/111553-1030x579.png 1030w, https://inero-software.com/wp-content/uploads/2025/04/111553-300x169.png 300w, https://inero-software.com/wp-content/uploads/2025/04/111553-768x432.png 768w, https://inero-software.com/wp-content/uploads/2025/04/111553-533x300.png 533w, https://inero-software.com/wp-content/uploads/2025/04/111553.png 1154w" sizes="(max-width: 1030px) 100vw, 1030px" />													</div>
				</div>
				<div class="elementor-element elementor-element-c923f73 elementor-widget elementor-widget-text-editor" data-id="c923f73" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<h6><span class="TextRun SCXW85460195 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW85460195 BCX0">Examples of insurance certificates.</span></span><span class="EOP SCXW85460195 BCX0" data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></h6>						</div>
				</div>
				<div class="elementor-element elementor-element-e9e7e62 elementor-widget elementor-widget-text-editor" data-id="e9e7e62" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><strong><span data-contrast="auto">The table below presents the benchmark results.</span></strong> <span data-contrast="auto">Extraction accuracy refers to the number of documents (out of 11) where the model successfully extracted all key fields. Tokens/sec indicates the model’s inference speed — how quickly it generates responses.</span></p>						</div>
				</div>
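The tokens/sec figures can be read straight from Ollama's response metadata: each non-streaming reply includes <code>eval_count</code> (the number of generated tokens) and <code>eval_duration</code> (the generation time in nanoseconds). A small helper, shown here with a hypothetical sample response for illustration:

```python
def tokens_per_second(response_metadata: dict) -> float:
    """Inference speed from Ollama response metadata.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return response_metadata["eval_count"] / (response_metadata["eval_duration"] / 1e9)

# Hypothetical metadata for illustration: 200 tokens generated in 4 seconds.
sample = {"eval_count": 200, "eval_duration": 4_000_000_000}
print(round(tokens_per_second(sample), 2))  # 50.0
```

Averaging this value over all 11 documents gives the per-model speed reported in the table.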
				<div class="elementor-element elementor-element-e5f35c8 elementor-widget elementor-widget-html" data-id="e5f35c8" data-element_type="widget" data-widget_type="html.default">
				<div class="elementor-widget-container">
			<style>
  @import url('https://fonts.googleapis.com/css2?family=Roboto:wght@300&display=swap');

  .model-table {
    font-family: 'Roboto', sans-serif;
    font-weight: 300;
    font-size: 14px;
    color: #1C244B;
    border-collapse: collapse;
    width: 100%;
  }

  .model-table th, .model-table td {
    border: 1px solid #ccc;
    padding: 8px;
    text-align: left;
    color: #1C244B;
  }

  .model-table th {
    background-color: #f2f2f2;
  }

  .green-bg {
    background-color: #DFF0D8;
  }

  .red-bg {
    background-color: #F2DEDE;
  }
</style>

<table class="model-table">
  <thead>
    <tr>
      <th>Model</th>
      <th>Summarization</th>
      <th>Extraction Accuracy</th>
      <th>Tokens/sec</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Llama 3.1 (8B)</td>
      <td class="green-bg">High-quality, no hallucinations</td>
      <td>10/11</td>
      <td>13.49</td>
    </tr>
    <tr>
      <td>StableLM 3B</td>
      <td class="red-bg">Average quality, typos/hallucinations</td>
      <td>4/11</td>
      <td>56.51</td>
    </tr>
    <tr>
      <td>Llama 3.2 (3B)</td>
      <td class="green-bg">Concise yet comprehensive summary, no hallucinations</td>
      <td>8/11</td>
      <td>49.49</td>
    </tr>
    <tr>
      <td>Mistral 7B</td>
      <td>Extensive summary, factually correct</td>
      <td>8/11</td>
      <td>29.01</td>
    </tr>
    <tr>
      <td>Gemma 3 4B</td>
      <td class="green-bg">Concise yet comprehensive summary, no hallucinations</td>
      <td>10/11</td>
      <td>13.37</td>
    </tr>
    <tr>
      <td>Gemma 3 1B</td>
      <td class="green-bg">Concise yet comprehensive summary, no hallucinations</td>
      <td>4/11</td>
      <td>73.46</td>
    </tr>
    <tr>
      <td>DeepSeek 7B</td>
      <td class="green-bg">Concise yet comprehensive summary, no hallucinations</td>
      <td>6/11</td>
      <td>16.39</td>
    </tr>
    <tr>
      <td>DeepSeek 1.5B</td>
      <td class="red-bg">Very poor, frequent hallucinations/errors</td>
      <td>0/11</td>
      <td>66.45</td>
    </tr>
    <tr>
      <td>Phi-4 Mini 3.8B</td>
      <td>Very concise summaries, factually correct</td>
      <td>9/11</td>
      <td>39.31</td>
    </tr>
    <tr>
      <td>TinyLlama 1.1B</td>
      <td class="red-bg">Poor quality, severe hallucinations</td>
      <td>2/11</td>
      <td>107.34</td>
    </tr>
  </tbody>
</table>
		</div>
				</div>
				<div class="elementor-element elementor-element-4f30579 elementor-widget elementor-widget-text-editor" data-id="4f30579" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<h6><span class="TextRun SCXW220458249 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW220458249 BCX0">Table 2: </span><span class="NormalTextRun SCXW220458249 BCX0">B</span><span class="NormalTextRun SCXW220458249 BCX0">enchmarking results.</span></span><span class="TextRun SCXW220458249 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW220458249 BCX0"> </span></span><span class="EOP SCXW220458249 BCX0" data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></h6>						</div>
				</div>
				<div class="elementor-element elementor-element-1046393 elementor-widget elementor-widget-image" data-id="1046393" data-element_type="widget" data-widget_type="image.default">
				<div class="elementor-widget-container">
													<img decoding="async" data-attachment-id="7847" data-permalink="https://inero-software.com/top-lightweight-llms-for-local-deployment/lightweight-llm-scatterplot/" data-orig-file="https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot.png" data-orig-size="1968,1180" data-comments-opened="0" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lightweight-llm-scatterplot" data-image-description="" data-image-caption="" data-medium-file="https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-300x180.png" data-large-file="https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-1030x618.png" tabindex="0" role="button" width="1030" height="618" src="https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-1030x618.png" class="attachment-large size-large wp-image-7847" alt="" srcset="https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-1030x618.png 1030w, https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-300x180.png 300w, https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-768x460.png 768w, https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-1536x921.png 1536w, https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot-500x300.png 500w, https://inero-software.com/wp-content/uploads/2025/04/lightweight-llm-scatterplot.png 1968w" sizes="(max-width: 1030px) 100vw, 1030px" />													</div>
				</div>
				<div class="elementor-element elementor-element-704e9c5 elementor-widget elementor-widget-text-editor" data-id="704e9c5" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><span data-contrast="auto">This scatterplot visualizes the trade-off between extraction accuracy and inference speed (measured in tokens per second).</span></p>						</div>
				</div>
				<div class="elementor-element elementor-element-5527166 elementor-widget elementor-widget-text-editor" data-id="5527166" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><span data-contrast="auto">The benchmarking results reveal significant variations among the tested models. </span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559685&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></p><ul><li style="list-style-type: none;"><ul><li data-leveltext="-" data-font="Aptos" data-listid="6" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Aptos&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;-&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="1" data-aria-level="1"><strong>Bottom-right</strong><span data-contrast="auto"> models &#8211; </span><strong>Llama 3.1 (8B), Gemma 3 (4B)</strong><span data-contrast="auto">, and </span><strong>Phi-4 Mini (3.8B)</strong> <span data-contrast="auto">&#8211; </span><span data-contrast="auto">excel in summarization quality and extraction accuracy, consistently providing concise and accurate outputs. 
Phi-4 Mini seems to offer a good trade-off between speed and accuracy.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></li></ul></li></ul><ul><li style="list-style-type: none;"><ul><li data-leveltext="-" data-font="Aptos" data-listid="6" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Aptos&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;-&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="2" data-aria-level="1"><strong>Mistral 7B, DeepSeek 7B, Llama 3.2</strong><span data-contrast="auto"> generate detailed and informative summaries, though their extraction performance is more moderate.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></li></ul></li></ul><ul><li style="list-style-type: none;"><ul><li data-leveltext="-" data-font="Aptos" data-listid="6" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559683&quot;:0,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Aptos&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;-&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" aria-setsize="-1" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">On the other hand, </span><strong>smaller models</strong> <span data-contrast="auto">(on the top-left side of the chart) like </span><strong><i>StableLM Zephyr (3B), Gemma 3 (1B)</i> and <i>TinyLlama</i></strong><i><span data-contrast="auto"> (1.1B)</span></i><span data-contrast="auto"> show significantly weaker extraction accuracy and are prone to frequent hallucinations. 
However, they benefit from faster inference times. Their limited context windows (e.g., 4k tokens) may contribute to these shortcomings. Overall, they may be suitable for only very basic tasks.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}"> </span></li></ul></li></ul>						</div>
				</div>
				<div class="elementor-element elementor-element-1ac20ae elementor-widget elementor-widget-heading" data-id="1ac20ae" data-element_type="widget" data-widget_type="heading.default">
				<div class="elementor-widget-container">
			<h3 class="elementor-heading-title elementor-size-default">Choosing the Right Model for Your Needs </h3>		</div>
				</div>
				<div class="elementor-element elementor-element-11e1bfc elementor-widget elementor-widget-text-editor" data-id="11e1bfc" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p><span class="TextRun SCXW204701935 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW204701935 BCX0">When selecting a language model for document extraction or summarization, </span><span class="NormalTextRun SCXW204701935 BCX0">it’s</span><span class="NormalTextRun SCXW204701935 BCX0"> all about balancing </span></span><span class="TextRun SCXW204701935 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW204701935 BCX0">accuracy</span></span><span class="TextRun SCXW204701935 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW204701935 BCX0">, </span></span><span class="TextRun SCXW204701935 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW204701935 BCX0">speed</span></span><span class="TextRun SCXW204701935 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW204701935 BCX0">, and </span></span><span class="TextRun SCXW204701935 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW204701935 BCX0">hardware constraints</span></span><span class="TextRun SCXW204701935 BCX0" lang="EN-GB" xml:lang="EN-GB" data-contrast="auto"><span class="NormalTextRun SCXW204701935 BCX0">. Below is a quick breakdown to help you pick the best fit—whether you need high precision, fast inference, or something lightweight for basic tasks.</span></span><span class="EOP SCXW204701935 BCX0" data-ccp-props="{}"> </span></p>						</div>
				</div>
				<div class="elementor-element elementor-element-689718c elementor-widget elementor-widget-text-editor" data-id="689718c" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<ul><li><strong>High Accuracy &amp; Reasonable Speed:</strong> Choose <strong>Phi-4 Mini (3.8B)</strong>, <strong>Gemma 3 (4B)</strong>, or <strong>Llama 3.1 (8B)</strong> for robust extraction and summarization accuracy.</li><li><strong>Fast Inference &amp; Moderate Accuracy:</strong> Opt for <strong>Llama 3.2 (3B)</strong> or <strong>StableLM Zephyr (3B)</strong> for simpler tasks on limited hardware.</li><li><strong>Balanced Performance (Accuracy&#8211;Speed Tradeoff):</strong> <strong>Mistral (7B)</strong> provides strong general-purpose capability suitable for detailed document summarization tasks.</li><li><strong>Low Resource Environments (Basic Tasks):</strong> Consider <strong>TinyLlama (1.1B)</strong> for quick extraction or classification on minimal hardware if accuracy isn&#8217;t critical.</li></ul>						</div>
				</div>
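				<p>As a rough sketch, the decision rules above can be expressed as a small helper. This is a hypothetical illustration: the RAM thresholds are assumptions based on the common rule of thumb that quantized weights need very roughly 0.5&#8211;1 GB of memory per billion parameters, not measured benchmarks.</p>

```python
def pick_model(ram_gb: float, need_high_accuracy: bool) -> str:
    """Pick a lightweight LLM from the recommendations above.

    Thresholds are illustrative assumptions, not benchmarks: at 4-bit
    quantization, weights need roughly 0.5-1 GB of RAM per billion
    parameters, plus runtime overhead.
    """
    if need_high_accuracy and ram_gb >= 8:
        return "Llama 3.1 (8B)"    # highest accuracy, needs the most memory
    if need_high_accuracy and ram_gb >= 4:
        return "Phi-4 Mini (3.8B)" # strong accuracy at a small size
    if ram_gb >= 6:
        return "Mistral (7B)"      # balanced accuracy-speed tradeoff
    if ram_gb >= 3:
        return "Llama 3.2 (3B)"    # fast inference, moderate accuracy
    return "TinyLlama (1.1B)"      # minimal hardware, basic tasks only
```

				<p>For example, a workstation with 16 GB of RAM that needs high extraction accuracy would land on Llama 3.1 (8B), while a 2 GB edge device falls back to TinyLlama (1.1B).</p>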
				<div class="elementor-element elementor-element-ee4c212 elementor-widget elementor-widget-heading" data-id="ee4c212" data-element_type="widget" data-widget_type="heading.default">
				<div class="elementor-widget-container">
			<h3 class="elementor-heading-title elementor-size-default">Conclusion </h3>		</div>
				</div>
				<div class="elementor-element elementor-element-510ec3a elementor-widget elementor-widget-text-editor" data-id="510ec3a" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
							<p>Lightweight LLMs are increasingly viable solutions for local deployment, particularly in document-intensive industries such as insurance. Models such as Phi-4 Mini, Gemma 3 (4B), and Mistral 7B provide strong performance in summarization, extraction, and classification tasks. Carefully balancing model size, inference speed, and accuracy ensures optimal outcomes, giving organizations affordable, private, and responsive AI solutions directly on their own hardware.</p>						</div>
				</div>
				<div class="elementor-element elementor-element-8874a86 elementor-cta--skin-cover elementor-animated-content elementor-bg-transform elementor-bg-transform-zoom-in elementor-widget elementor-widget-call-to-action" data-id="8874a86" data-element_type="widget" data-widget_type="call-to-action.default">
				<div class="elementor-widget-container">
					<a class="elementor-cta" href="https://inero-software.com/optimization-of-back-office-processes-with-ai-agent-implementation-a-practical-example/">
					<div class="elementor-cta__bg-wrapper">
				<div class="elementor-cta__bg elementor-bg" style="background-image: url(https://inero-software.com/wp-content/uploads/2025/03/cta-1903-1030x579.png);" role="img" aria-label="cta 1903"></div>
				<div class="elementor-cta__bg-overlay"></div>
			</div>
							<div class="elementor-cta__content">
				
									<h2 class="elementor-cta__title elementor-cta__content-item elementor-content-item elementor-animated-item--grow">
						This might interest you					</h2>
				
									<div class="elementor-cta__description elementor-cta__content-item elementor-content-item elementor-animated-item--grow">
						Optimization of Back-Office Processes with AI Agent Implementation: A Practical Example					</div>
				
									<div class="elementor-cta__button-wrapper elementor-cta__content-item elementor-content-item elementor-animated-item--grow">
					<span class="elementor-cta__button elementor-button elementor-size-">
						Read the full text					</span>
					</div>
							</div>
						</a>
				</div>
				</div>
					</div>
				</div>
				</div>
		<p>The article <a href="https://inero-software.com/top-lightweight-llms-for-local-deployment/">Top Lightweight LLMs for Local Deployment</a> originally appeared on <a href="https://inero-software.com">Inero Software - Software Consulting</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7843</post-id>	</item>
	</channel>
</rss>
