Adding webpage crawling to ChatGPT-4o with Semantic Kernel
In this demo, we will use Semantic Kernel to integrate a custom plugin written in Python. The plugin extracts web content from URLs mentioned in the chat prompt.
Semantic Kernel is an open-source SDK that integrates Large Language Models (LLMs) such as OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. It does this by letting you define plugins that can be chained together (see here for more details).
Prerequisites:
- Azure account (5-minute onboarding).
- Basic Python knowledge.
Follow these steps to run the demo:
- Deploy GPT-4o in Microsoft AI Studio (follow my ~5-minute guide here).
- Clone this repo: https://github.com/microsoft/semantic-kernel.
- Install Docker Compose (step-by-step guide).
- Update your AZURE_OPENAI_ENDPOINT in docker-compose.yml.
- Update api-key.txt with your AZURE_OPENAI_API_KEY.
- Run the demo with the terminal command: "docker-compose up"
- Use the chat app at http://localhost:8000/static/webclient.html
This is the end result:
Here is a step-by-step explanation of how it works.
This demo contains four main elements:
- "Semantic Kernel" service implemented in app.py.
- GPT-4o endpoint in Azure.
- Web app, available in the browser at http://localhost:8000/static/webclient.html.
- A publicly available website.
Here is the flow:
- The user enters a query that mentions the URL of a website.
- The web app posts the query to the "Semantic Kernel" service endpoint.
- The "Semantic Kernel" service invokes the GPT-4o model in Azure with the predefined custom "fetchurl" plugin. The answer received from GPT-4o is conveyed back to the web app.
Let’s explain the key parts of this demo.
“Semantic Kernel” service:
The service is implemented in app.py. Let's break down the code in app.py into five sections.
#1 First, let's declare the imported packages:
import os
from typing import Annotated
from dotenv import load_dotenv
from fastapi import FastAPI, Body
from fastapi.middleware.cors import CORSMiddleware
from semantic_kernel.kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatPromptExecutionSettings
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.functions import kernel_function
from fetchurl import fetch_text_content_from_url
- semantic_kernel to enable model invocation with the FetchPlugin.
- fastapi to create an endpoint at localhost:8000/demoprompt. This endpoint receives user prompts from the chat web app, invokes the AI service with these prompts, and returns the model's response back to the user.
- fetchurl for the simple web content fetch method fetch_text_content_from_url, implemented with the beautifulsoup4 library.
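The fetchurl.py implementation isn't reproduced in this post; as a rough illustration, a minimal version of fetch_text_content_from_url could look like this (a sketch using requests and BeautifulSoup; the repo's version may differ):

# A minimal sketch of fetchurl.py; the repo's implementation may differ.
import requests
from bs4 import BeautifulSoup

def fetch_text_content_from_url(url: str) -> str:
    """Download the page at `url` and return its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style tags, then collapse whitespace into plain text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())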
#2 On service startup, we set up the kernel with the setup_kernel method:
- ai_service provides the endpoint of the deployed GPT-4o. We use the "chat completion" AI service (for more details see here).
- The fetchurl custom plugin is implemented in the FetchPlugin class.
async def setup_kernel():
    kernel = Kernel()

    # Check if we're using Azure OpenAI
    if os.getenv('GLOBAL_LLM_SERVICE') != "AzureOpenAI":
        raise ValueError("This script is configured to use Azure OpenAI. Please check your .env file.")

    # Check if the secret key is defined in an environment variable (for dev purposes)
    if 'AZURE_OPENAI_API_KEY' in os.environ:
        azure_api_key = os.getenv('AZURE_OPENAI_API_KEY')
    else:
        # Get the key from the secret file:
        azure_api_key = read_secret('AZURE_OPENAI_API_KEY')

    service_id = "function_calling"

    # Azure OpenAI setup
    ai_service = AzureChatCompletion(
        service_id=service_id,
        deployment_name=os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT_NAME'),
        endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
        api_key=azure_api_key,
    )

    # Adding the completion AI service to the kernel
    kernel.add_service(ai_service)

    # Adding the custom plugin to the kernel
    kernel.add_plugin(FetchPlugin(), plugin_name="fetchurl")

    return kernel


@app.on_event("startup")
async def startup_event():
    global kernel
    kernel = await setup_kernel()
#3 Defining the FetchPlugin class, which uses the fetch_text_content_from_url method from the fetchurl package (implemented in fetchurl.py):
# Definition of the custom fetch plugin:
class FetchPlugin:
    """Plugin provides fetch of content from url."""

    @kernel_function(name="get_content_from_url", description="Get the content from url")
    def get_content_from_url(self, url: Annotated[str, "The input url"]) -> Annotated[str, "The output is a string"]:
        return fetch_text_content_from_url(url)
#4 Handling requests from the web client:
The app.post("/demoprompt") endpoint:
- Invokes the model via the kernel with the custom plugin we registered at service startup. The model fetches the content from the URL provided in the prompt and acts on the prompt query; the fetchurl plugin is called by the LLM while it generates the response.
- Receives the response from the model and conveys it to the web client.
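One detail not shown in these snippets is the PromptRequest model used as the request body. A minimal sketch of what it could look like (the exact definition lives in the repo's app.py):

from pydantic import BaseModel

class PromptRequest(BaseModel):
    """Request body for /demoprompt: the user's chat prompt text."""
    prompt: str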
# Endpoint for chat prompts:
@app.post("/demoprompt")
async def demo_prompt(request: PromptRequest):
    settings: OpenAIChatPromptExecutionSettings = kernel.get_prompt_execution_settings_from_service_id(
        service_id="function_calling"
    )
    settings.function_choice_behavior = FunctionChoiceBehavior.Auto(
        filters={"included_plugins": ["fetchurl"]}
    )
    result = await kernel.invoke_prompt(
        function_name="get_content_from_url",
        plugin_name="fetchurl",
        prompt=request.prompt,
        settings=settings,
    )
    return {"response": str(result)}
#5 Additional utilities used in the demo:
- We use the load_dotenv method to read environment variables from a .env file. This is useful when running the app locally with 'python app.py'. You can store the AI service details, including the API key, in this file. Check the .env.example file to see how to format it correctly.
- CORS support is added with CORSMiddleware, so the browser-based web client can call the API.
- The read_secret method handles secrets injected by Docker Compose. It is used when running the service in a Docker container, as defined in docker-compose.yml, and allows secure access to sensitive information like API keys without storing them in the code.
- webclient.html is served at http://localhost:8000/static/webclient.html through FastAPI's static file serving; the @app.get("/") root endpoint simply returns a welcome message pointing to /static.
load_dotenv()

app = FastAPI()

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allows all origins
    allow_credentials=True,
    allow_methods=["*"],  # Allows all methods
    allow_headers=["*"],  # Allows all headers
)

# Read a secret from the secret file injected by Docker Compose
def read_secret(secret_name):
    try:
        with open(f'/run/secrets/{secret_name}', 'r') as secret_file:
            return secret_file.read().strip()
    except IOError:
        return None

# Serving a simple chat webpage:
@app.get("/")
async def read_root():
    return {"message": "Welcome to the API. Static files are served under /static"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
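Note that the snippets above don't show how the /static path is wired up. In FastAPI this is typically done with a StaticFiles mount, roughly like this (a sketch, assuming webclient.html lives in a local static/ directory):

from fastapi.staticfiles import StaticFiles

# Serve everything under ./static at /static, e.g. /static/webclient.html
app.mount("/static", StaticFiles(directory="static"), name="static")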
A simple web client:
The webchat client is implemented in webclient.html. This browser-based client does the following:
- It sends user queries to the "Semantic Kernel" service (implemented in app.py).
- It formats and displays the service’s responses to the user.
This HTML file handles the front-end interaction, while app.py manages the back-end processing.
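If you want to exercise the service without the browser client, the same request can be sent directly from Python (a sketch; the payload and response shapes follow the /demoprompt handler shown earlier):

import requests

# Ask the service to summarize a page; the prompt mentions the URL so that
# the fetchurl plugin can retrieve it during function calling.
payload = {"prompt": "Summarize the main points of https://example.com"}
resp = requests.post("http://localhost:8000/demoprompt", json=payload, timeout=120)
print(resp.json()["response"])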
Running in Docker:
We use docker-compose.yml to run the Docker image of the "Semantic Kernel" service, named "chatserver".
The docker-compose.yml file contains the LLM details:
GLOBAL_LLM_SERVICE, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
and the API credentials (stored in api-key.txt):
AZURE_OPENAI_API_KEY
services:
  chatserver:
    image: chatserver:latest
    command: uvicorn app:app --host 0.0.0.0 --port 8000 --reload
    environment:
      - GLOBAL_LLM_SERVICE=AzureOpenAI
      - AZURE_OPENAI_ENDPOINT=https://ai-example.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview
      - AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=gpt-4o
    ports:
      - "8000:8000"
    secrets:
      - AZURE_OPENAI_API_KEY

secrets:
  AZURE_OPENAI_API_KEY:
    file: ./api-key.txt
- The chatserver service is based on the chatserver Docker image, which is also available on Docker Hub.
- If you change the app.py code, rebuild the image with "docker build -t chatserver .". The Dockerfile is available in the repo directory and is based on the latest Python Docker image.
- The requirements.txt file lists all the required Python packages.
Design considerations
- This demo uses just one plugin, but Semantic Kernel can support many different custom plugins. For example, you could add plugins to get weather information, run linters, or access various free or paid services (see the sketch after this list).
- Microsoft Semantic Kernel is not the only way to implement RAG (Retrieval-Augmented Generation) apps, which have become popular lately. See LangChain and LlamaIndex for alternative implementations.
- For simplicity, this demo uses an Azure managed service to access the LLM. However, it is also possible to use Semantic Kernel with self-hosted models (more about that here).
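As an illustration of how a second plugin could slot in, here is a hypothetical WeatherPlugin (the class, function, and hard-coded data are made up for illustration; it is registered with the kernel in the same way as FetchPlugin):

from typing import Annotated
from semantic_kernel.functions import kernel_function

class WeatherPlugin:
    """Hypothetical plugin that returns a canned weather string for a city."""

    @kernel_function(name="get_weather", description="Get the current weather for a city")
    def get_weather(self, city: Annotated[str, "The city name"]) -> Annotated[str, "Weather summary"]:
        # A real plugin would call a weather API here; hard-coded for illustration.
        return f"The weather in {city} is sunny, 22 degrees Celsius."

# Registered alongside the fetchurl plugin:
# kernel.add_plugin(WeatherPlugin(), plugin_name="weather")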
Discussing limitations:
- The fetch method used in the demo is very limited and may not work on some publicly available sites. To overcome this limitation, you can implement a more sophisticated fetch method (see the sketch after this list).
- Be aware of your AI API usage costs. See Azure Cost Management to set budgets and monitor usage. For more cost-friendly options, consider using self-hosted models.
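As one example of a more robust fetch, the helper could send a browser-like User-Agent and strip more non-content tags (still a sketch; heavily JavaScript-driven sites would need a headless browser such as Playwright):

import requests
from bs4 import BeautifulSoup

def fetch_text_content_from_url(url: str) -> str:
    """Fetch visible text; a browser-like User-Agent helps with sites that block simple clients."""
    headers = {"User-Agent": "Mozilla/5.0 (compatible; sk-fetch-demo/1.0)"}
    response = requests.get(url, headers=headers, timeout=15, allow_redirects=True)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    for tag in soup(["script", "style", "noscript", "header", "footer", "nav"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())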
Conclusion
This demo has only touched on a small fraction of the vast potential for customizing Large Language Model (LLM) services to use user-provided data, without the need for fine-tuning models or extensive preprocessing. I think Semantic Kernel is really exciting, and I look forward to making more cool stuff with it.