We are seeking a highly skilled Lead AI/ML Engineer with 8 to 10 years of experience to spearhead the development, optimization, and deployment of cutting-edge AI models. In this role, you will bridge the gap between state-of-the-art Generative AI (LLMs, RAG) and resource-constrained environments (Edge AI, Embedded Systems). You will design robust NLP pipelines, optimize complex deep learning models for real-time execution, and deploy them onto NVIDIA Jetson platforms.
If you thrive on making massive models run efficiently on tiny hardware, this role is for you.
Key Responsibilities :
– LLM & RAG Systems : Design, architect, and implement production-grade Retrieval-Augmented Generation (RAG) pipelines utilizing advanced Vector Databases (e.g., Pinecone, Milvus, Qdrant, Chroma) and orchestration frameworks like LangChain or LlamaIndex.
– Model Fine-Tuning : Fine-tune open-source LLMs and Vision-Language Models (VLMs) using techniques like LoRA, QLoRA, and P-tuning for domain-specific applications.
– Core NLP Tasks : Develop and maintain core NLP pipelines including Named Entity Recognition (NER), text classification, sentiment analysis, and semantic search using Hugging Face Transformers.
– Prompt Engineering : Architect complex, system-level prompts and implement guardrails to ensure deterministic, safe, and context-aware model outputs.
– Model Compression : Apply advanced optimization techniques, including Quantization (INT8/FP16 calibration, PTQ, QAT), pruning, and knowledge distillation, to reduce model footprint while minimizing accuracy loss.
– Compilers & Runtimes : Convert and optimize deep learning models from PyTorch to deployment-ready formats using ONNX Runtime and NVIDIA TensorRT.
– Hardware Deployment : Deploy, benchmark, and profile models directly on NVIDIA Jetson platforms (Jetson Nano, Orin Nano, Orin NX/AGX), ensuring optimal utilization of GPU and DLA (Deep Learning Accelerator) cores.
– Embedded Linux : Develop within Embedded Linux environments, including writing efficient C++/Python wrapper code, managing dependencies, and flashing/configuring Jetson boards.
– Real-time Applications : Architect end-to-end, low-latency, real-time AI applications capable of processing streaming data (text, audio, or video) at the edge.
Required Skills & Qualifications :
– Core Languages : Expert-level proficiency in Python and standard data science libraries (NumPy, Pandas, Scikit-learn). Familiarity with C++ for edge deployment is a strong plus.
– Deep Learning Frameworks : Advanced hands-on experience with PyTorch and the Hugging Face ecosystem.
– Edge Optimization Tools : Deep technical understanding of TensorRT, ONNX, OpenVINO, and quantization frameworks (e.g., BitsAndBytes, AWQ, GPTQ).
– Frameworks & Infrastructure : Proven experience with LangChain, Docker, and interacting with specialized Vector Databases.
– Hardware Ecosystem : Direct experience working with NVIDIA Jetson Nano or higher-end Jetson Orin hardware modules, JetPack SDK, and DeepStream SDK.
Experience :
– 8-10 years of professional experience in AI/ML engineering, with at least 3 years specifically focused on Edge AI deployment or LLM engineering.
Are you interested in this position?
Apply by clicking on the “Apply Now” button below!
