MTA Developer Lightspeed
Using the Migration Toolkit for Applications command-line interface to migrate your applications
Abstract
Making open source more inclusive
Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.
Chapter 1. Configuring large language models for analysis
In an analysis, MTA with Developer Lightspeed provides the large language model (LLM) with the contextual prompt to identify the issues in the current application and generate suggestions to resolve them.
MTA with Developer Lightspeed is designed to be model agnostic. It works with LLMs that run in different environments (in local containers, as local AI, or as a shared service) so that you can analyze Java applications in a wide range of scenarios. You can choose an LLM from well-known providers, local models that you run from Ollama or Podman Desktop, and OpenAI API-compatible models that are configured as model-as-a-service deployments.
The result of an analysis performed by MTA with Developer Lightspeed depends on the parameters of the LLM that you choose.
You can run an LLM from the following generative AI providers:
- OpenAI
- Azure OpenAI
- Google Gemini
- Amazon Bedrock
- Deepseek
- OpenShift AI
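MTA with Developer Lightspeed reads the model configuration from the provider-settings.yaml file of the Visual Studio Code extension, as described later in this guide. The following entry is a minimal, illustrative sketch for an OpenAI-hosted model; the entry name, model name, and placeholder API key are examples, not required values.

openai_example: &active
  provider: "ChatOpenAI"
  environment:
    OPENAI_API_KEY: "<your-openai-api-key>"
  args:
    model: "gpt-4o"

The structure mirrors the provider-settings examples shown later in this document; only the provider arguments and environment variables change between providers.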
1.1. Deploying an LLM as a scalable service
The code suggestions differ based on various parameters of the large language model (LLM) used for an analysis. Model-as-a-service therefore gives you more control over using MTA with Developer Lightspeed with an LLM that is trained for your specific requirements than general-purpose models from public AI providers do.
MTA with Developer Lightspeed produces better analyses when it can access code changes that result from analysis performed at scale across many application teams. In an enterprise, changes at scale become more consistent when the LLMs that generate the code change suggestions are shared across application teams than when each team uses a different LLM. This approach calls for a common enterprise strategy to manage the underlying resources that power the models, which must be exposed to members of different teams.
To cater to an enterprise-wide LLM deployment, MTA with Developer Lightspeed integrates with LLMs that are deployed as a scalable service on Red Hat OpenShift Container Platform clusters. These deployments, called model-as-a-service (MaaS), provide you with granular control over resources such as compute, cluster nodes, and auto-scaling graphics processing units (GPUs) while enabling you to leverage LLMs to perform analysis at a large scale.
The workflow for configuring an LLM on OpenShift Container Platform AI can be broadly divided into the following parts:
- Installing and configuring infrastructure resources
- Configuring OpenShift AI
- Connecting OpenShift AI with the LLM
- Preparing the LLM for analysis
1.1.1. Installing and configuring an OpenShift Container Platform cluster
As a member of the hybrid cloud infrastructure team, your initial set of tasks to deploy a large language model (LLM) through model-as-a-service involves creating OpenShift Container Platform clusters with primary and secondary nodes and configuring an identity provider with role-based access control for users to log in to the clusters.
Next, you configure the GPU operators required to run an LLM, the GPU nodes, and auto scaling for the GPU nodes in your namespace on OpenShift Container Platform AI. The following procedures refer to a Red Hat OpenShift Container Platform cluster hosted on Amazon Web Services (AWS).
1.1.1.1. Installing an OpenShift Container Platform cluster
The following procedure uses m6i.2xlarge Amazon EC2 M6i instances for the primary nodes. See Amazon EC2 M6i Instances to choose instances that suit your requirements.
To create an OpenShift Container Platform cluster with three primary nodes and three secondary nodes on AWS:
Procedure
- Download the OpenShift Container Platform stable client from the mirror site.
- Extract the tar file on your system with the following command:

tar xvzf <file-name>
- Place the oc binary in a directory on your $PATH.
- Create an install-config.yaml file with the following configuration. Replace BASE.DOMAIN, CLUSTER_NAME, REGION, PULL_SECRET, and SSH_PUBLIC_KEY with applicable values.

additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: BASE.DOMAIN
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m6i.2xlarge
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m6i.2xlarge
  replicas: 3
metadata:
  creationTimestamp: null
  name: CLUSTER_NAME
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: REGION
publish: External
pullSecret: 'PULL_SECRET'
sshKey: |
  SSH_PUBLIC_KEY
- Run ./openshift-install create cluster from the oc binary path to install the OpenShift Container Platform cluster.
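After the installation completes, you can optionally confirm that all six nodes are available. A minimal check, assuming the kubeconfig file in the default installation directory layout:

$ export KUBECONFIG=<installation_directory>/auth/kubeconfig
$ oc get nodes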
Next, install the OpenShift AI Operator in your OpenShift Container Platform cluster and configure htpasswd authentication for users to log in to OpenShift AI.
1.1.1.2. Configuring operators for OpenShift Container Platform AI
The OpenShift Container Platform AI operator automatically installs all the required operators. However, since this example model-as-a-service deployment needs NVIDIA GPUs, you must install the following operators in your OpenShift Container Platform AI namespace:
- NVIDIA GPU Operator (provided by NVIDIA Corporation) - The NVIDIA GPU Operator uses the Operator framework within Red Hat OpenShift Container Platform to manage the full lifecycle of NVIDIA software components required to run GPU-accelerated workloads. The components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node tagging using GPU feature discovery (GFD), DCGM-based monitoring, and others. See NVIDIA GPU Architecture for more information.
- Node Feature Discovery Operator (provided by Red Hat) - The Node Feature Discovery Operator (NFD) is a Kubernetes add-on for detecting hardware features and system configuration. It manages the detection of hardware features and configuration in a Red Hat OpenShift Container Platform cluster by labeling the nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, operating system version, and so on.
Procedure
- Log in to the OCP cluster web interface with cluster-admin privileges.
- Select Node Feature Discovery Operator on the Operators page.
- In the NodeFeatureDiscovery tab, click Create a NodeFeatureDiscovery to create an instance with default values.
- Select Nvidia GPU Operator on the Operators page.
- In the ClusterPolicy tab, click Create a ClusterPolicy to create an instance with default values.
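Optionally, verify that the operator workloads are running before you create the GPU machine set. The following commands assume the default namespaces for the two operators; adjust them if you installed the operators elsewhere.

$ oc get pods -n openshift-nfd
$ oc get pods -n nvidia-gpu-operator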
1.1.1.3. Creating a GPU machine set
The machineset custom resource contains details about the underlying infrastructure, such as the compute instance from a public cloud provider, block devices, the region and zone of the compute instance, and the accelerator attached to your OpenShift Container Platform cluster nodes.
Procedure
- Log in to the OpenShift Container Platform CLI and enter the following command to get the CLUSTER_NAME, CLUSTER_ID, AVAILABILITY_ZONE, and REGION values and to list the machine sets currently available in your cluster:

$ oc get machineset -n openshift-machine-api
- Enter the following command to view values of a specific compute machine set custom resource (CR):

$ oc get machineset <machineset_name> \
    -n openshift-machine-api -o yaml
- Enter the values for CLUSTER_NAME, CLUSTER_ID, AVAILABILITY_ZONE, REGION, instanceType, and cluster-api/accelerator in the sample machineset YAML configuration.
- Type the following command to create the machineset resource for nodes in your OpenShift Container Platform cluster:

oc create -f machineset.yml
- Enter the following command to get the status of the machineset, machine, and node CRs:

watch 'oc get machineset -n openshift-machine-api && oc get machines -n openshift-machine-api && oc get nodes'
Sample machineset YAML file:

cat << EOF > machineset.yml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
    machine.openshift.io/GPU: "8"
    machine.openshift.io/memoryMb: "786432"
    machine.openshift.io/vCPU: "192"
  labels:
    machine.openshift.io/cluster-api-cluster: CLUSTER_NAME-CLUSTER_ID
  name: CLUSTER_NAME-CLUSTER_ID-gpu-AVAILABILITY_ZONE
  namespace: openshift-machine-api
spec:
  replicas: 0
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: CLUSTER_NAME-CLUSTER_ID
      machine.openshift.io/cluster-api-machineset: CLUSTER_NAME-CLUSTER_ID-gpu-AVAILABILITY_ZONE
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: CLUSTER_NAME-CLUSTER_ID
        machine.openshift.io/cluster-api-machine-role: gpu
        machine.openshift.io/cluster-api-machine-type: gpu
        machine.openshift.io/cluster-api-machineset: CLUSTER_NAME-CLUSTER_ID-gpu-AVAILABILITY_ZONE
    spec:
      lifecycleHooks: {}
      metadata:
        labels:
          cluster-api/accelerator: A10G
          node-role.kubernetes.io/gpu: ""
      providerSpec:
        value:
          ami:
            id: ami-0c65d71e89d43aa90 1
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
          - ebs:
              iops: 0
              kmsKey: {}
              volumeSize: 240
              volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: CLUSTER_NAME-CLUSTER_ID-worker-profile
          instanceType: g5.48xlarge
          kind: AWSMachineProviderConfig
          metadata:
            creationTimestamp: null
          metadataServiceOptions: {}
          placement:
            availabilityZone: AVAILABILITY_ZONE
            region: REGION
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - CLUSTER_NAME-CLUSTER_ID-node
          - filters:
            - name: tag:Name
              values:
              - CLUSTER_NAME-CLUSTER_ID-lb
          subnet:
            filters:
            - name: tag:Name
              values:
              - CLUSTER_NAME-CLUSTER_ID-subnet-private-AVAILABILITY_ZONE
          tags:
          - name: kubernetes.io/cluster/CLUSTER_NAME-CLUSTER_ID
            value: owned
          userDataSecret:
            name: worker-user-data
EOF
1 Specify a valid Red Hat Enterprise Linux CoreOS (RHCOS) Amazon Machine Image (AMI) for your AWS zone for your OpenShift Container Platform nodes. If you want to use an AWS Marketplace image, you must complete the OpenShift Container Platform subscription from the AWS Marketplace to obtain an AMI ID for your region.
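On recent OpenShift Container Platform versions, one way to look up the RHCOS AMI ID for your region is to query the coreos-bootimages config map. This sketch assumes that the jq utility is installed and that your cluster exposes the config map:

$ oc -n openshift-machine-config-operator get configmap/coreos-bootimages \
    -o jsonpath='{.data.stream}' | jq -r '.architectures.x86_64.images.aws.regions."REGION".image'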
1.1.1.4. Configuring GPU node auto scaling
The cluster autoscaler increases and decreases the size of the cluster based on deployment needs.
To autoscale your cluster, you must deploy a ClusterAutoscaler custom resource (CR), and then deploy a MachineAutoscaler CR for each compute machine set.
Procedure
- Modify the parameters for the ClusterAutoscaler custom resource (CR) by using the sample resource definition file:

cat << EOF > <filename>.yml
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  generation: 1
  name: default
spec:
  logVerbosity: 4
  maxNodeProvisionTime: 15m
  podPriorityThreshold: -10
  resourceLimits:
    gpus:
    - max: 16
      min: 0
      type: A10G
  scaleDown:
    delayAfterAdd: 20m
    delayAfterDelete: 5m
    delayAfterFailure: 30s
    enabled: true
    unneededTime: 5m
EOF
See Cluster autoscaler resource definition for descriptions of the CR parameters.
- Enter the following command to deploy the ClusterAutoscaler CR:

$ oc create -f <filename>.yml
After you deploy the ClusterAutoscaler CR, you must deploy at least one MachineAutoscaler CR.
1.1.1.5. Configuring machine auto scaling
The machine autoscaler adjusts the number of machines in the compute machine sets that you deploy in a Red Hat OpenShift Container Platform cluster. The machine autoscaler provisions more machines when the cluster runs out of resources to support more deployments. Any changes to the values in MachineAutoscaler resources, such as the minimum or maximum number of instances, are immediately applied to the compute machine set that they target.
To deploy a MachineAutoscaler CR for each compute machine set:
Procedure
- Modify the parameters for the MachineAutoscaler custom resource (CR) by using the sample resource definition file:

cat << EOF > <filename>.yml
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: CLUSTER_NAME-CLUSTER_ID-gpu-AVAILABILITY_ZONE
  namespace: "openshift-machine-api"
spec:
  minReplicas: 0
  maxReplicas: 2
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: CLUSTER_NAME-CLUSTER_ID-gpu-AVAILABILITY_ZONE
EOF
See Machine autoscaler resource definition for descriptions of the CR parameters.
- Enter the following command to deploy the MachineAutoscaler CR:

$ oc create -f <filename>.yml
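Optionally, confirm that both autoscaler resources exist before moving on. The ClusterAutoscaler CR is cluster-scoped and named default, while MachineAutoscaler CRs reside in the openshift-machine-api namespace:

$ oc get clusterautoscaler default
$ oc get machineautoscaler -n openshift-machine-api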
After you deploy the MachineAutoscaler CR, you must configure the OpenShift Container Platform AI operator.
1.1.2. Configuring OpenShift Container Platform AI
The configurations that you must complete for OpenShift Container Platform AI include creating a Data Science Cluster instance in the OpenShift Container Platform AI operator. Next, you complete model-specific configurations in the Red Hat OpenShift Container Platform AI console.
1.1.2.1. Creating a DataScience project cluster
To create a Data Science Cluster instance:
Prerequisites
- You have admin rights to access the Red Hat OpenShift Service on AWS (ROSA) cluster.
Procedure
- Install the OpenShift AI Operator in the Red Hat OpenShift Service on AWS web console.
- From the Data Science Cluster tab of the operator, click Create DataScienceCluster to create an instance with default values.
- After you create the Data Science Cluster instance, select Red Hat OpenShift AI from the application launcher icon at the top to launch the OCP AI web console.
1.1.2.2. Configuring the LLM serving runtime
Scaling nodes and pulling the image for serving a large language model with vLLM can take several minutes. However, the default timeout for deploying a model is 10 minutes, and a deployment that takes longer fails on the OpenShift Container Platform AI cluster.
To mitigate this issue, you must enter a custom serving runtime configuration.
Procedure
- On the OpenShift Container Platform AI dashboard, click Settings > Serving runtimes. The Serving runtimes page lists the vLLM ServingRuntime for KServe custom resource (CR). KServe orchestrates model serving for all types of models and includes model-serving runtimes that implement the loading of given types of model servers. KServe also handles the lifecycle of the deployment object, storage access, and networking setup.
- Click the kebab menu for vLLM ServingRuntime for KServe and select Duplicate serving runtime.
- Enter a different display name for the serving runtime and increase the value of serving.knative.dev/progress-deadline to 60m, as shown in the sketch after this procedure.
- To support multiple GPU nodes and scaling, add --distributed-executor-backend and --tensor-parallel-size to containers.args as follows:

spec:
  containers:
  - args:
    - --port=8080
    - --model=/mnt/models
    - --served-model-name={{.Name}}
    - --distributed-executor-backend=mp
    - --tensor-parallel-size=8
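The serving.knative.dev/progress-deadline annotation typically sits under metadata.annotations of the duplicated serving runtime. The exact layout can differ between OpenShift AI versions, so treat the following fragment as a sketch with an illustrative runtime name rather than a complete definition.

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-runtime-custom
  annotations:
    serving.knative.dev/progress-deadline: 60m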
Next, you must create an accelerator profile if you want to run a GPU node for the first time.
1.1.2.3. Creating an accelerator profile
Taints and tolerations allow the NVIDIA GPU nodes to control which pods are (or are not) scheduled on them. A taint allows a node to refuse a pod unless that pod has a matching toleration.
You can use accelerator profiles to configure taints and the associated tolerations for the GPU nodes.
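For reference, a taint that reserves a GPU node and the matching toleration that the accelerator profile adds to serving pods look similar to the following sketch; the node name is a placeholder.

$ oc adm taint nodes <gpu-node-name> nvidia.com/gpu=:NoSchedule

The corresponding toleration on a pod takes this form:

tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule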
Procedure
- On the OpenShift Container Platform AI dashboard, click Settings > Accelerator profiles.
- Click Create accelerator profile.
- On the Create accelerator profile dialog, type NVIDIA GPU as the Name and nvidia.com/gpu as the Identifier.
- To enable or disable the accelerator profile immediately after creation, click the toggle in the Enable column.
- Click Add toleration to open the Add toleration dialog. A toleration schedules pods onto nodes with matching taints.
- From the Operator list, select Exists. The key and effect parameters in the taint must match the same parameters configured under the toleration in the pod. Leave the value parameter blank so that it matches any value.
- Enter nvidia.com/gpu as the key.
- From the Effect list, select NoSchedule. New pods that do not tolerate the taint are not scheduled onto that node. Existing pods on the node remain.
- Click Add to add the toleration configuration for the node.
- Click Create accelerator profile to complete the accelerator configuration.
1.1.3. Deploying the large language model
To connect the OpenShift Container Platform AI platform to a large language model (LLM), you must first upload your LLM to a data source.
OpenShift Container Platform AI, which runs on pods in a Red Hat OpenShift Service on AWS (ROSA) cluster, can access the LLM from a data source such as an Amazon Web Services (AWS) S3 bucket. You must create an AWS S3 bucket and configure access permissions so that the pods running in the ROSA cluster can access it. See how to enable a service account to assume an AWS IAM role in the ROSA pods.
Next, you must configure a data connection to the bucket and deploy the LLM from the OpenShift Container Platform AI platform.
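If you have not yet uploaded the model, a typical approach is to copy the model files to the bucket with the AWS CLI. The bucket name and local directory in this sketch are placeholders, and the exact steps depend on how you obtained the model files.

$ aws s3 mb s3://<model-bucket> --region <region>
$ aws s3 sync ./Llama-3.1-8B-Instruct s3://<model-bucket>/Llama-3.1-8B-Instruct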
1.1.3.1. Adding a data connection
In OpenShift Container Platform, a project is a Kubernetes namespace with additional annotations, and is the main way that you can manage user access to resources. A project organizes your data science work in one place and also allows you to collaborate with other developers in your organization.
In your data science project, you must create a data connection to your existing S3-compatible storage bucket to which you uploaded a large language model.
Prerequisites
You need the following credential information for the storage buckets:
- Endpoint URL
- Access key
- Secret key
- Region
- Bucket name
If you do not have this information, contact your storage administrator.
Procedure
- In the OpenShift Container Platform AI web console, select Data science projects. The Data science projects page shows a list of projects that you can access. For each user-requested project in the list, the Name column shows the project display name, the user who requested the project, and the project description.
- Click Create project. In the Create project dialog, update the Name field to enter a unique display name for your project.
- Optional: In the Description field, provide a project description.
- Click Create. Your project is listed on the Data science projects page.
- Click the name of your project, select the Connections tab, and click Create connection.
- In the Connection type drop-down list, select S3 compatible object storage - v1.
- In the Connection details section, enter the connection name, the access key, the secret key, the endpoint of your storage bucket, and the region.
- Click Create.
1.1.3.2. Deploying the LLM
After you configure the data connection, you must deploy the model in the OpenShift Container Platform AI web console.
Prerequisites
- A user with admin privileges has enabled the single-model serving platform on your OpenShift Container Platform cluster.
Procedure
- In the OpenShift Container Platform AI dashboard, navigate to the project details page and click the Models tab.
- In the Single-model serving platform tile, click Select single-model to open the Deploy model dialog.
Complete the following configurations:
- Model name: enter the model name in RFC compliant format.
- Serving runtime: select the serving runtime you configured.
- Model framework: select vLLM.
- Model server size: select Large.
- Accelerator: select the accelerator you configured.
- Number of accelerators: enter 8
- Make deployed models available through an external route: enable the option.
- Require token authentication: enable the option.
- Existing connection: enable existing connection under Source model location.
- Connection: select the data connection that you created.
- Path: Enter the path to your model. For example, Llama-3.1-8B-Instruct.
- Click Deploy. On the Models tab, an endpoint and token will be provided for your model after it is provisioned.
- Export variables to access the model after the model status becomes Ready. Replace <values> in the command with applicable values for the variables.

$ export SERVING_NAME=<serving-name>
$ export NAMESPACE=<project-name>
$ export TOKEN=$(oc get secret -n $NAMESPACE default-name-$SERVING_NAME-sa -o go-template='{{ .data.token }}' | base64 -d)
$ export ENDPOINT=https://$(oc get route -n istio-system ${SERVING_NAME}-${NAMESPACE} -o go-template='{{ .spec.host }}')
$ curl -k -w %{certs} $ENDPOINT > ca-cert.pem
$ export SSL_CERT_FILE=ca-cert.pem
$ export REQUESTS_CA_BUNDLE=ca-cert.pem
$ export OPENAI_API_BASE="$ENDPOINT/v1"
Note: Make a note of the OPENAI_API_BASE endpoint URL.

Verification
You can verify that the model deployed successfully by using the following curl command to ask the model to write a short Python program. Replace PROVIDED_ENDPOINT and PROVIDED_TOKEN with the values on the Models tab. The following command uses an example model name, which you must replace with the name of the model that you deployed.

$ export ENDPOINT=PROVIDED_ENDPOINT
$ export TOKEN=PROVIDED_TOKEN
$ curl -k -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8B-instruct", "prompt": "Write a hello world program in python", "max_tokens": 100, "temperature": 0.01 }' \
  ${ENDPOINT}/v1/completions
After you deploy the LLM, scale down the service by using the following command:
$ oc process deploy-model -p SERVING_NAME=<serving-name> -p MODEL_PATH=<model-path> | oc delete -f -
1.1.4. Preparing the large language model for analysis
To access the large language model (LLM), you must create an API key for the model and update settings in MTA with Developer Lightspeed to enable the extension to use the LLM.
1.1.4.1. Configuring the OpenAI API key
After you deploy a model and export the self-signed SSL certificate, you must configure the OpenAI API-compatible key to use the large language model (LLM). Update the API key and base URL in the provider-settings YAML configuration to use the LLM for an MTA with Developer Lightspeed analysis.
Procedure
- Enter the following command to generate the OPENAI_API_KEY:

$ export OPENAI_API_KEY=$(oc create token --duration=87600h -n ${NAMESPACE} ${SERVING_NAME}-sa)

- Enter the following configuration in the provider-settings.yaml file in the MTA with Developer Lightspeed Visual Studio (VS) Code extension:

openshift-kai-test-generation: &active
  environment:
    SSL_CERT_FILE: "<name-of-SSL_CERT_FILE>"
    REQUESTS_CA_BUNDLE: "<name-of-REQUESTS_CA_BUNDLE>"
    OPENAI_API_KEY: "<OPENAI_API_KEY>"
  provider: "ChatOpenAI"
  args:
    model: "<serving-name>"
    base_url: "https://<serving-name>-<data-science-project-name>.apps.konveyor-ai.migration.redhat.com/v1"
You can now use the model for application analysis by using the MTA with Developer Lightspeed extension.
1.2. Configuring the LLM in Podman Desktop
The Podman AI Lab extension enables you to select an open source model from a curated list of models and run it locally on your system.
Prerequisites
- You installed Podman Desktop in your system.
- You completed initial configurations in MTA with Developer Lightspeed required for the analysis.
Procedure
- Go to the Podman AI Lab extension and click Catalog under Models.
- Download one or more models.
- Go to Services and click New Model Service.
- Select a model that you downloaded from the Model drop-down list and click Create Service.
- Click the deployed model service to open the Service Details page.
- Note the server URL and the model name. You must configure these specifications in the MTA with Developer Lightspeed extension.
- Export the inference server URL as follows:

export OPENAI_API_BASE=<server-url>
- In VS Code, click Configure GenAI Settings to open the provider-settings.yaml file.
- Enter the model details from Podman Desktop. For example, use the following configuration for a Mistral model:

podman_mistral:
  provider: "ChatOpenAI"
  environment:
    OPENAI_API_KEY: "unused value"
  args:
    model: "mistral-7b-instruct-v0-2"
    base_url: "http://localhost:35841/v1"
Note: The Podman Desktop service endpoint does not need a password, but the OpenAI library expects the OPENAI_API_KEY to be set. In this case, the value of the OPENAI_API_KEY variable does not matter.
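To confirm that the local model service responds before you run an analysis, you can query its OpenAI-compatible endpoint. The port in this sketch comes from the example configuration above and is likely to differ on your system.

$ curl http://localhost:35841/v1/models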