- Use efficient fine-tuning (PEFT, LoRA) and on-device stacks such as LiteRT to adapt LLMs cost-effectively.
- Combine model-level and system-level, online and offline evaluation, using diverse metrics and human review.
- Build full observability with Prometheus, OpenTelemetry and GPU metrics to monitor latency, token usage and safety.
- Combine LLMOps, benchmarking loops and strict privacy controls to run LLMs reliably in production.
Large Language Models (LLMs) have moved from flashy demos to mission-critical infrastructure, and that changes everything about how we design, evaluate and operate them. If your chatbot helps doctors, lawyers or logistics teams make real decisions, you can no longer treat the model as a black box that "seems smart enough" without examining how it works, its limitations and its biases. You need a systematic way to trace every request, measure quality, control costs and prove that the system keeps behaving safely over time.
This guide brings together three pillars that usually live in separate documents (fine-tuning strategies, evaluation frameworks and production observability) and combines them into a single engineering playbook. We will walk through how to choose between full fine-tuning and parameter-efficient fine-tuning, how to design rigorous LLM evaluation (online and offline, model-level and system-level), how to instrument traces and metrics with OpenTelemetry and Prometheus, and how to tie all of this into continuous, business-aware operations.
LLM fine-tuning strategies: full fine-tuning vs PEFT and LoRA
When you adapt a pre-trained LLM to your use case, the first design choice is how many parameters you will actually touch, because that decision drives hardware requirements, training time, cost and even how you deploy the model in production.
Full fine-tuning means updating the entire parameter set of the base LLM during training, which is practical only if you have a large, high-quality, task-specific dataset and serious compute. This approach pays off when your domain data differs substantially from the original pre-training corpus: for example, a legal assistant trained on case law from a specific jurisdiction, or a clinical support tool for specialized medical areas.
Parameter-Efficient Fine-Tuning (PEFT) takes a far more surgical approach: it freezes the original weights and adds small, trainable components such as low-rank adaptation modules. Instead of rewriting every page of a 1,000-page textbook, you are effectively attaching a stack of sticky notes with domain knowledge. Training focuses on these added layers, which keeps GPU memory usage and wall-clock time much lower.
LoRA (Low‑Rank Adaptation) and QLoRA are the most widely used PEFT methods today: they insert low-rank matrices into key attention projections so you can adapt the model's behavior with a small number of extra parameters. QLoRA adds quantization of the frozen base model on top to push memory usage down even further, enabling fine-tuning of surprisingly large models on a single GPU or even prosumer hardware while still reaching competitive quality.
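To make the savings concrete, here is a back-of-the-envelope count for a single weight matrix of shape d × k. It assumes a LoRA update of the form ΔW = B·A, where B is d × r and A is r × k. The 4096 × 4096 shape and the rank of 8 are illustrative assumptions and are not taken from any specific model.

```python
def full_ft_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a LoRA update B (d x r) @ A (r x k)."""
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 4096, 4096, 8  # illustrative shape and rank (assumptions)
    full = full_ft_params(d, k)
    lora = lora_params(d, k, r)
    print(f"full fine-tuning: {full:,} trainable params")
    print(f"LoRA (r={r}):      {lora:,} trainable params")
    print(f"ratio:            {lora / full:.4%}")
```

With these numbers, LoRA trains 65,536 parameters instead of 16,777,216, which is about 0.39% of that matrix. The saving repeats for every projection you adapt.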
Deploying and optimizing LLMs on-device with LiteRT and MediaPipe
Not every LLM deployment needs a cluster of cloud GPUs. Sometimes you want the model to run entirely on-device, whether for latency, privacy, offline use or cost reasons. That is where LiteRT and the MediaPipe LLM Inference stack come in.
The MediaPipe LLM Inference API lets you run text-to-text LLMs directly in browsers and mobile apps to generate text, summarize documents or answer questions without sending prompts to a remote server. Models published in the LiteRT Community already come in a compatible format, so you avoid lengthy custom conversion steps, and you can load them from your app bundle or from local storage.
When you configure the LLM Inference task, you control its behavior through a few key options: modelPath (where the LiteRT model lives in your project), maxTokens (the total number of input and output tokens in a single call), topK (the number of tokens considered at each generation step), temperature (randomness versus determinism), randomSeed (for reproducible generations), and optional resultListener and errorListener callbacks for asynchronous use.
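The topK, temperature and randomSeed options correspond to a standard sampling loop. The sketch below is plain Python and is not the MediaPipe implementation. It scales hypothetical logits by temperature, keeps the topK highest-scoring tokens, and draws from them with a seeded generator, which is why a fixed seed reproduces the same tokens in this sketch. Treating a temperature of 0 as greedy decoding is an assumption made for illustration.

```python
import math
import random


def sample_next_token(logits, top_k, temperature, rng):
    """Pick one token index using top-k filtering and temperature scaling."""
    if temperature <= 0:
        # Assumption: treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Keep the indices of the top_k highest logits.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in candidates]
    # Softmax weights over the surviving candidates (max subtracted for stability).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(logits_per_step, top_k=40, temperature=0.8, random_seed=0):
    """Sample one token per step; the same seed reproduces the same sequence."""
    rng = random.Random(random_seed)
    return [sample_next_token(step, top_k, temperature, rng) for step in logits_per_step]


if __name__ == "__main__":
    # Hypothetical logits for a 4-token vocabulary over three generation steps.
    steps = [[2.0, 1.0, 0.5, -1.0], [0.1, 0.3, 2.5, 0.0], [1.2, 1.1, 1.0, 0.9]]
    print(generate(steps, top_k=2, temperature=0.8, random_seed=42))
```

Setting topK to 1 reduces this to greedy decoding, whatever the temperature. A larger topK combined with a higher temperature widens the set of plausible continuations.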
Beyond plain generation, the API lets you choose among multiple models and apply LoRA adapters for customization. You can ship one solid base model plus several domain-specific LoRAs (for example customer support, summarization or code review) and switch between them dynamically at runtime on GPU-enabled devices.
Choosing and deploying open LLM families (Gemma and friends)
For on-device and lightweight deployments, small open models such as the Gemma family and the smaller Gemma‑2 variants are very attractive, because they strike a good balance between capability and resource requirements.
Gemma‑3n E2B and E4B are designed specifically for constrained hardware. They use selective parameter activation so that only a small subset of parameters is active for each token. In practice this gives you the quality of models with a larger raw parameter count while exposing an "effective" parameter count close to 2B or 4B, which is far more manageable on mobile GPUs and in browser environments.
Gemma‑3 1B is a lean, simple option: an open-weights model of roughly one billion parameters, packaged in LiteRT-ready formats (such as .task and .litertlm) for Android and the web. When you use it with the LLM Inference API, you typically choose between the CPU and GPU backends, make sure maxTokens matches the context length baked into the model, and keep numResponses at 1 on the web side.
Gemma‑2 2B raises reasoning quality for its size class while staying small enough to run widely. It serves as a strong base for on-device assistants or specialized domain agents, especially when combined with LoRA adapters and careful evaluation.
Converting PyTorch LLMs to LiteRT and packaging them
If you start from a PyTorch generative model, you can convert it into a MediaPipe-compatible artifact with the LiteRT Torch Generative tooling. This tooling handles graph translation, quantization and the signature export needed for efficient on-device inference.
The high-level workflow looks like this: download your PyTorch checkpoints, run the LiteRT Torch Generative conversion to produce a .tflite file, then create a task bundle that packages that model file together with tokenizer parameters and metadata. The bundler script (via mediapipe.tasks.python.genai.bundler) takes a configuration object that includes the TFLite path, the SentencePiece tokenizer, the start and stop tokens, and the desired output filename.
Because this conversion performs CPU-targeted optimizations and can be memory-intensive, you typically need a Linux machine with at least 64 GB of RAM. You will also want to install the appropriate MediaPipe version from PyPI to get the bundling script. The output is a self-contained task bundle that your Android or web app can use with the LLM Inference API without extra glue code.
Inside the bundling config you specify every key runtime element, such as the tokenizer model, control tokens and output paths. The final artifact therefore contains everything needed for end-to-end inference, which keeps deployments reproducible and makes it easy to test different versions in CI/CD.
LoRA customization: from training to on-device inference
LoRA is not just a training trick. You also have to think about how those low-rank adapters are represented and loaded in your inference stack, especially if you want to apply them selectively on GPU-backed devices.
During training, you typically rely on libraries such as PEFT to define a LoRA configuration for supported architectures like Gemma or Phi‑2, targeting the adapter at attention-related modules only. For Gemma, that usually means wrapping q_proj, k_proj, v_proj and o_proj. For Phi‑2, a common pattern is to adapt the attention projections plus the main dense layer. The rank r in LoraConfig controls how many new dimensions you add and therefore the expressive capacity of the adapter.
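To show what targeting q_proj, k_proj, v_proj and o_proj means in practice, the sketch below mimics the name matching PEFT applies when target_modules is a list. A module is adapted if its full name equals a target or ends with "." followed by that target. The module names are hypothetical Gemma-style examples. In real code you would pass target_modules to LoraConfig and let PEFT perform this matching.

```python
def matches_target(module_name, target_modules):
    """Return True if module_name should receive a LoRA adapter."""
    return module_name in target_modules or any(
        module_name.endswith("." + target) for target in target_modules
    )


def select_lora_modules(module_names, target_modules):
    """Filter a model's module names down to the ones LoRA would wrap."""
    return [name for name in module_names if matches_target(name, target_modules)]


if __name__ == "__main__":
    # Hypothetical module names for one decoder layer of a Gemma-style model.
    names = [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.self_attn.k_proj",
        "model.layers.0.self_attn.v_proj",
        "model.layers.0.self_attn.o_proj",
        "model.layers.0.mlp.gate_proj",
        "model.layers.0.mlp.up_proj",
        "model.layers.0.mlp.down_proj",
    ]
    targets = ["q_proj", "k_proj", "v_proj", "o_proj"]
    for name in select_lora_modules(names, targets):
        print(name)
```

Matching on the "." boundary avoids accidentally adapting a module whose name merely ends in the same letters, such as a hypothetical xq_proj.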
After fine-tuning on your dataset, the resulting checkpoint is saved as an adapter_model.safetensors file that contains only the LoRA weights. To bring it into your MediaPipe pipeline, you convert the adapter into a LoRA-specific TFLite file with the MediaPipe converter. You pass it a ConversionConfig that includes the base model options, the GPU backend (LoRA support is GPU-only here), the LoRA checkpoint path, the chosen rank and the output TFLite filename.
The conversion step produces two flatbuffers, one for the frozen LLM and one for the LoRA overlay, and both are required at inference time. On Android, for example, you initialize the LLM inference task by pointing modelPath to the base model artifact and loraPath to the LoRA TFLite file. You also set the usual generation parameters such as maxTokens, topK, temperature and randomSeed.
From the app developer's point of view, using a LoRA-enhanced model is transparent. You still call generateResponse() or its async variant, but under the hood the LoRA weights modulate the attention layers. This gives you domain-specific behavior without shipping a large, fully fine-tuned model.
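A frozen base model plus a small overlay is enough because of linear algebra. Applying the adapter at runtime, W·x + s·B·(A·x), gives the same result as running the merged matrix (W + s·B·A)·x, where s is the LoRA scaling factor, commonly alpha / r. The pure-Python sketch below checks this on tiny hand-picked matrices. It illustrates the math only and does not describe how MediaPipe executes adapters internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]


def lora_forward(w, a, b, x, scale=1.0):
    """Runtime adapter: frozen path W x plus low-rank path scale * B (A x)."""
    base = matvec(w, x)
    low_rank = matvec(b, matvec(a, x))
    return [u + scale * v for u, v in zip(base, low_rank)]


def merged_forward(w, a, b, x, scale=1.0):
    """Merged weights: fold scale * B A into W first, then apply to x."""
    ba = matmul(b, a)
    merged = [[w[i][j] + scale * ba[i][j] for j in range(len(w[0]))] for i in range(len(w))]
    return matvec(merged, x)


if __name__ == "__main__":
    W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # frozen 3x3 base weights
    A = [[1, 2, 3]]                        # rank-1 down-projection (1x3)
    B = [[1], [0], [2]]                    # rank-1 up-projection (3x1)
    x = [1, 1, 1]
    print(lora_forward(W, A, B, x), merged_forward(W, A, B, x))
```

Keeping the adapter separate, as MediaPipe's two-flatbuffer layout does, lets one base model serve several adapters. Merging is the alternative when you only ever need one.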
LLM temperature and decoding behavior in practice
Among the decoding hyperparameters, temperature most directly shapes how "creative" or conservative your LLM feels. It rescales the probability distribution over the next token during generation by dividing the logits by the temperature before the softmax. A value of 1.0 uses the raw distribution. Values below 1 sharpen it so that the most likely tokens dominate even more, while values above 1 flatten it and give lower-probability tokens a better chance.
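Concretely, with logits z and temperature T, the probabilities are p_i = exp(z_i / T) / Σ_j exp(z_j / T). The short sketch below applies this formula to an illustrative three-token logits vector so you can watch the distribution sharpen below 1.0 and flatten above it. The logits are made-up values.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # illustrative values
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```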
At low temperatures (for example 0.1 to 0.2) the model behaves almost deterministically. It returns very similar outputs for the same prompt and favors safe, unsurprising completions. This is desirable in tightly controlled scenarios such as legal summarization, medical reporting or financial explanations, where consistency, clarity and factual grounding matter far more than stylistic flair.
Mid-range temperatures of around 0.7 to 0.9 tend to work well for chatbots and assistants that should sound human yet stay on track. They add enough variation to avoid repetitive answers while generally preserving coherence. Many conversational products operate in this range and combine temperature with limits such as maximum output tokens and safety filters.
Very high temperatures near 2.0 make the model much more prone to incoherent or off-topic generations, which may be fun for toy brainstorming but is rarely acceptable for critical tasks. As a rule, you tune temperature together with the other sampling settings (top‑k, top‑p, repetition penalties) and validate the impact with systematic evaluation, not intuition alone.
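Top-p (nucleus) filtering and repetition penalties are two of the other knobs mentioned above. The sketch below shows one common formulation of each, operating on plain lists. Top-p keeps the smallest set of most-likely tokens whose cumulative probability reaches p. The repetition penalty follows the style of Hugging Face Transformers' RepetitionPenaltyLogitsProcessor: it divides positive logits of already-generated tokens by the penalty and multiplies their negative logits by it. Exact behavior differs between serving stacks, so treat this as illustrative.

```python
def top_p_filter(probs, top_p):
    """Return the token indices kept by nucleus (top-p) filtering, most likely first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Make already-generated tokens less likely; penalty > 1 discourages repeats."""
    adjusted = list(logits)
    for i in set(previous_tokens):
        adjusted[i] = adjusted[i] / penalty if adjusted[i] > 0 else adjusted[i] * penalty
    return adjusted


if __name__ == "__main__":
    print(top_p_filter([0.5, 0.3, 0.15, 0.05], 0.75))
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 0], 2.0))
```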
Why rigorous LLM evaluation is non-negotiable
As organizations embed LLMs in operational work, from healthcare scheduling to legal review and supply-chain planning, the cost of bad outputs rises quickly: think of misleading diagnoses, biased recommendations or toxic answers delivered at scale. That is why evaluation cannot be an afterthought or a one-off benchmark. It must be part of the culture and the lifecycle of your AI systems.
At its core, LLM evaluation is about systematically measuring how a model behaves along four dimensions (accuracy, efficiency, reliability and safety) using a mix of quantitative metrics and human judgment. Done well, it gives developers and stakeholders a clear picture of strengths, weaknesses, failure modes and fitness for purpose across different domains and user segments.
The benefits span multiple layers of the stack. You improve raw model performance, uncover and mitigate harmful bias, ensure answers stay grounded in fact, and verify that multilingual and domain-specific behavior meets expectations. You do all of this while tracking how these properties shift as you fine-tune, update prompts or roll out new model versions.
Because the same LLM may be used for anything from playful chat to high-stakes decision support, your evaluation strategy must align closely with business goals and risk tolerance, rather than relying solely on generic leaderboards or crowd-sourced scores.
Key applications of LLM performance evaluation
The most obvious use of evaluation is to monitor and improve core performance. This means measuring how well the model follows instructions, interprets context, and retrieves or writes relevant information, given the kinds of prompts your users actually send. Here you combine task-specific metrics with curated domain datasets to track progress over time.
Another important area is bias detection and mitigation. Training data can surface societal biases in generated outputs, producing inappropriate, one-sided or discriminatory content. Regular evaluation with curated prompts and labeled examples helps you expose these issues and reduce harmful behavior through data curation, fine-tuning and safety policies.
In ground-truth comparison, you compare model outputs against verified facts or expected answers and score each generation for accuracy, completeness and relevance. Whether you use human annotators or automated fact-checking and retrieval-based verification, this process reveals how often the model hallucinates, omits critical details or overstates its confidence.
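Automated ground-truth scoring often starts with two simple metrics popularized by extractive QA benchmarks such as SQuAD. The first is exact match after light normalization. The second is token-level F1, which gives partial credit for overlapping words. The sketch below uses a simplified normalization (lowercasing, removing punctuation and English articles). That normalization is an assumption you would adapt to your own domain and language.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (multiset overlap)."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(token_f1("Paris, France", "Paris"))
```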
Model comparison is another useful approach. When choosing between LLM families or variants, you run the same evaluation battery across all candidates to see which offers the best trade-off between accuracy, latency, cost and safety for your workload and domain, instead of relying on generic benchmark rankings.
Evaluation frameworks and metrics for LLMs
Enterprise-grade evaluation rarely relies on a single number. Instead, you combine a toolkit of frameworks and metrics tailored to your tasks, blending context-aware tests, human feedback, UX signals and, where appropriate, standard benchmarks.
Context-specific evaluation asks whether outputs fit your domain, tone and risk profile. For example, a model used in schools should be checked for avoiding toxic content, misinformation and discriminatory language, while a sales chatbot is judged more on resolution rate, brand voice and product relevance. Common metrics here include relevance, question-answering accuracy, BLEU and ROUGE scores, toxicity scores and hallucination rate.
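ROUGE-L, one of the ROUGE variants mentioned above, scores a candidate summary by the longest common subsequence (LCS) of tokens it shares with a reference. The minimal sketch below uses lowercase whitespace tokenization and the balanced F-measure. Reference implementations such as Google's rouge-score package add stemming and more careful tokenization, so treat these scores as illustrative rather than comparable to published numbers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 from LCS-based precision and recall."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```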
User-driven evaluation, often considered the gold standard, puts human reviewers in the loop to rate responses for coherence, helpfulness, politeness and safety. This matters most for subtle issues that automated scorers miss. The downside is cost and time, especially at scale, so you usually combine human reviews with automated evaluation.
UI/UX metrics complete the picture by focusing on how users perceive the system rather than how it scores on benchmarks. They track user satisfaction, frustration signals, perceived response time and how the model handles errors or misunderstandings. These signals map directly to business KPIs such as retention and task success.
Standard comparative benchmarks such as MT‑Bench, AlpacaEval, MMMU or GAIA provide standardized question-and-answer sets for measuring broad capabilities, but by design they lack deep domain knowledge. They are great for high-level capability checks and cross-model comparisons. However, they must be complemented by evaluations that reflect your own use cases and data.
Model-level vs system-level LLM evaluation
It is useful to distinguish between evaluating the bare model and evaluating the complete system built on top of it. Many real-world problems come from prompt orchestration, retrieval pipelines or safety layers rather than from the underlying LLM weights alone.
Model-level evaluation focuses on general capabilities such as reasoning, coherence, multilingual handling or knowledge coverage. It usually relies on broad benchmarks like MMLU or on custom test sets designed to stretch the model across many scenarios. These scores inform which base models you select and where to invest in fine-tuning.
System-level evaluation, by contrast, measures how the whole application performs in its real environment and use case, including retrieval components, tool calls, multi-agent patterns, guardrails, caching and business logic. Metrics here can include retrieval accuracy, end-to-end task success, domain-specific accuracy and user satisfaction, which together give you a realistic view of production behavior.
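For the retrieval component, a common system-level metric is recall@k: the fraction of relevant documents that appear among the top k retrieved results, averaged over queries. The sketch below assumes that for each query you have a ranked list of retrieved document IDs and a set of IDs labeled as relevant. The IDs are made up. Skipping queries with no relevant labels is a design choice of this sketch.

```python
def recall_at_k(retrieved, relevant, k):
    """Share of relevant document IDs found among the first k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall_at_k(results, k):
    """Average recall@k over (retrieved, relevant) pairs that have at least one label."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in results if relevant]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    results = [
        (["d1", "d7", "d3"], {"d3", "d9"}),  # one of two relevant docs retrieved
        (["d4", "d2", "d8"], {"d4"}),        # the only relevant doc ranked first
    ]
    print(mean_recall_at_k(results, k=3))
```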
In practice you need both. Model-centric tests drive core R&D and architecture decisions, while system-centric tests support rapid iteration, UX tuning, and alignment with user expectations and regulatory requirements.
Online vs offline LLM evaluation
Another key dimension is whether evaluation happens offline in controlled environments or online against real production traffic. Each mode offers distinct strengths and trade-offs.
Offline evaluation uses curated datasets, synthetic prompts or shadow traffic to test models before they reach live users. It confirms that core performance meets a minimum bar, that safety filters catch obvious problems and that regressions are detected before release. This is your pre-launch gate, and it is often automated in CI pipelines.
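An offline gate in CI can be as simple as comparing a candidate's evaluation scores with the current baseline and flagging any metric that regresses beyond a tolerance. The metric names, scores and the 0.01 tolerance below are hypothetical. Real pipelines usually also account for statistical significance and minimum sample sizes.

```python
def find_regressions(baseline, candidate, tolerance=0.01, lower_is_better=()):
    """Return messages for metrics that got worse than the baseline by more than tolerance."""
    problems = []
    for metric, base in baseline.items():
        if metric not in candidate:
            problems.append(f"{metric}: missing from candidate results")
            continue
        new = candidate[metric]
        # Positive delta always means "worse", whichever direction is good for this metric.
        delta = (new - base) if metric in lower_is_better else (base - new)
        if delta > tolerance:
            problems.append(f"{metric}: {base:.3f} -> {new:.3f}")
    return problems


if __name__ == "__main__":
    baseline = {"exact_match": 0.81, "rouge_l": 0.44, "toxicity_rate": 0.020}
    candidate = {"exact_match": 0.83, "rouge_l": 0.41, "toxicity_rate": 0.018}
    regressions = find_regressions(baseline, candidate, lower_is_better={"toxicity_rate"})
    print("gate:", "FAIL" if regressions else "PASS", regressions)
```

In a CI job you would typically turn a non-empty regression list into a non-zero exit code so the pipeline blocks the release.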
Online evaluation captures how the model behaves with real user inputs, constraints, load patterns and edge cases. It tracks live metrics such as user satisfaction, escalation rates, incident reports and performance under different traffic profiles. It is most powerful when combined with A/B testing to compare prompts, hyperparameters or model versions based on real business outcomes.
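Suppose an online A/B test compares the share of conversations that users rate as satisfactory under two prompt variants. A two-proportion z-test is one simple way to check whether the observed difference is larger than noise. The counts below are invented for illustration. The test assumes independent sessions and reasonably large samples, and many teams use sequential or Bayesian methods instead.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (difference B - A, z statistic, two-sided p-value) using a pooled proportion."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return p_b - p_a, 0.0, 1.0  # all successes or all failures: nothing to test
    z = (p_b - p_a) / se
    # Standard normal CDF via erf: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: variant A 420/1000 satisfied, variant B 465/1000.
    diff, z, p = two_proportion_z_test(420, 1000, 465, 1000)
    print(f"diff={diff:.3f} z={z:.2f} p={p:.3f}")
```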
A mature setup combines both. Offline tests act as a safety net and early-warning system, while online tests steer fine-grained tuning and confirm that improvements really translate into a better user experience and lower operational risk.
Best practices: LLMOps, real-world testing and rich metric suites
To manage LLMs responsibly at scale you need LLMOps practices modeled on DevOps. These emphasize automation, collaboration and continuous delivery, but they center on data, models and evaluation. This usually brings data scientists, ML engineers and operations teams together around shared tools and processes, much as teams building agentic systems do.
LLMOps platforms automate model training and deployment, monitor quality and drift, and wire evaluation steps directly into CI/CD pipelines, so that every change to data, prompts or code triggers a standard battery of evaluations. The result is faster iteration with fewer surprises in production.
Real-world testing, which means putting models in front of real users or realistic simulators, is essential for uncovering unexpected and unusual scenarios, especially in open-ended language interactions. Controlled lab tests can confirm stability and baseline performance. Messy, human-generated feedback, however, reveals jailbreak attempts, ambiguous phrasing and outlier cases that no curated dataset would have anticipated.
A diverse set of metrics is essential to avoid tunnel vision on a single score such as BLEU or perplexity. Your dashboards should therefore track coherence, fluency, factuality, relevance, contextual understanding, latency, throughput and safety indicators. The wider you look, the sooner you spot problems.
Consultants and engineering partners specializing in custom AI solutions can help organizations embed these practices end to end. This work ranges from building evaluation pipelines and wiring them into CI/CD to hardening cloud deployments, running safety reviews, and building dashboards that connect model behavior directly to business metrics.
Benchmarking LLMs: a practical five-step workflow
A structured benchmarking process helps you move from ad-hoc experiments to repeatable, data-driven decisions, especially when comparing multiple models, configurations or fine-tuning strategies.
A robust five-step flow usually starts by selecting a set of evaluation tasks that covers both simple and complex use cases. This ensures you test the model across the full range of difficulty and the domain coverage relevant to your application.
Next, you curate or build datasets that are as unbiased and representative as possible, capturing real user queries, domain-specific language, edge cases and even adversarial inputs. This is the foundation on which every other evaluation layer rests.
Then you configure the model gateway and any tuning or adaptation methods, such as LoRA adapters, so that your benchmark reflects how the model will actually be used. This includes aligning context length, sampling parameters and safety middleware with production settings.
Once the environment is ready, you run evaluations with the right mix of metrics for each task. These range from perplexity for language-modeling quality and ROUGE for summarization to diversity scores for creative tasks and human judgments of relevance and coherence.
Finally, you perform a detailed analysis and start an iterative feedback loop. Insights flow back into prompt engineering, data cleaning, fine-tuning strategies and guardrail tuning, so that benchmarking becomes a continuous improvement loop rather than a one-off report.
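The five steps can be wired into a small harness. The sketch below is deliberately generic and uses hypothetical names throughout. Tasks map to a metric name and a dataset of (prompt, reference) pairs. Each candidate configuration is any callable that returns text, standing in for a real model gateway with its sampling settings and adapters. The per-task averages it returns are the raw material for the analysis in the final step.

```python
from statistics import mean


def exact_match(prediction, reference):
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def run_benchmark(tasks, candidates, metrics):
    """Score every candidate on every task and return {candidate: {task: mean score}}."""
    report = {}
    for cand_name, generate in candidates.items():  # step 3: configured candidates
        report[cand_name] = {}
        for task_name, (metric_name, dataset) in tasks.items():  # steps 1-2: tasks and data
            metric = metrics[metric_name]
            scores = [metric(generate(prompt), ref) for prompt, ref in dataset]  # step 4
            report[cand_name][task_name] = mean(scores)
    return report


if __name__ == "__main__":
    # Steps 1-2: one hypothetical task with a tiny labeled dataset.
    tasks = {"capital_qa": ("exact_match", [("Capital of France?", "Paris"),
                                            ("Capital of Japan?", "Tokyo")])}
    # Step 3: hypothetical stand-ins for real model endpoints.
    lookup = {"Capital of France?": "Paris", "Capital of Japan?": "Tokyo"}
    candidates = {"lookup_model": lambda p: lookup[p], "always_paris": lambda p: "Paris"}
    report = run_benchmark(tasks, candidates, {"exact_match": exact_match})
    for name, per_task in report.items():  # step 5: inspect, then iterate
        print(name, per_task)
```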
Observability for LLM systems: beyond HTTP latency
Traditional API monitoring (counting errors and measuring average HTTP latency) is not enough for LLM workloads. Many of the most damaging failure modes happen in queues, in GPU memory or in token-streaming behavior long before your web layer raises an alarm.
LLM observability relies on a multi-signal pipeline that combines metrics, traces, logs, profiles, synthetic tests and SLOs. Together these give you a detailed, causal view of where time is spent, what saturates first and how the user experience changes as load patterns shift.
At the metrics level, you care not only about requests per second and p99 latency but also about time to first token (TTFT), inter-token latency, queue length, batch size, tokens per second, GPU utilization and KV-cache pressure. These are leading indicators of product degradation and of user-visible slowdowns in streaming interfaces.
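These streaming metrics are easy to derive from client-side timestamps. The sketch below assumes you record when the request was sent and when each output token arrived, in seconds from a monotonic clock. The timestamps in the example are made up. Its throughput figure is end-to-end, so it includes the time to first token and is not a pure decode rate.

```python
from statistics import mean


def streaming_latency_metrics(request_start, token_times):
    """Derive TTFT, mean inter-token latency and end-to-end output tokens per second."""
    if not token_times:
        raise ValueError("no tokens were received")
    ttft = token_times[0] - request_start
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    inter_token = mean(gaps) if gaps else 0.0
    elapsed = token_times[-1] - request_start
    # This throughput includes the TTFT (prefill) time, not just decoding.
    tokens_per_s = len(token_times) / elapsed if elapsed > 0 else float("inf")
    return {"ttft_s": ttft, "inter_token_s": inter_token, "tokens_per_s": tokens_per_s}


if __name__ == "__main__":
    # Illustrative timestamps in seconds, e.g. captured with time.monotonic().
    print(streaming_latency_metrics(0.0, [0.5, 0.625, 0.75, 0.875, 1.0]))
```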
Tracing, implemented with OpenTelemetry, ties together every stage of a single request (routing, retrieval, tool calls, safety filters, model execution and post-processing). When latency spikes or timeouts occur, you can then pinpoint whether the cause is a slow vector store, an overloaded GPU or a misbehaving middleware component.
Logs remain essential for debugging and human review, but at LLM scale you must design them carefully. Avoid unbounded, high-cardinality attributes (such as raw prompts, session IDs or full tool arguments) and focus instead on structured, low-cardinality metadata such as model family, endpoint, region, status code and coarse outcome types.
Metric schemas and semantic conventions for LLMs
Different LLM serving frameworks expose slightly different metric names, but the underlying concepts agree, and the OpenTelemetry semantic conventions for GenAI are starting to consolidate them into a portable schema.
Systems such as Hugging Face TGI, vLLM and NVIDIA Triton typically expose Prometheus endpoints with several kinds of metrics: histograms of end-to-end request duration, counters for requests, successes and generated tokens, gauges for queue size and batch size, and dedicated per-token timing and TTFT metrics that map directly to user experience.
GPU telemetry is equally important. Exporters such as NVIDIA's DCGM Exporter expose Prometheus metrics for utilization, memory usage and other low-level signals. You can use these to anticipate out-of-memory events, decide when to scale and understand how different workloads stress your accelerators.
The OpenTelemetry semantic conventions for GenAI define standard names for key metrics such as gen_ai.server.request.duration, gen_ai.server.time_to_first_token, gen_ai.server.time_per_output_token and gen_ai.client.token.usage. They let you instrument once and route the telemetry to different backends (Prometheus, Mimir, commercial APMs) without rewiring your code each time.
On top of these raw metrics you build dashboards and PromQL queries that compute percentiles, error rates, saturation indicators and cost proxies. The result is a live control panel for your LLM service that operations teams can actually use to make capacity and reliability decisions.
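PromQL's histogram_quantile() estimates a percentile from cumulative bucket counts by assuming observations are spread linearly within the bucket that contains the target rank. The pure-Python sketch below reproduces that core interpolation for a single histogram snapshot. It shows why coarse bucket boundaries produce coarse p99 estimates. It skips some edge cases Prometheus handles, such as negative bucket bounds and non-monotonic counts, and the bucket boundaries and counts are illustrative.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q in (0, 1] from cumulative (upper_bound, count) buckets ending in +Inf."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: return the last finite upper bound, as Prometheus does.
        return buckets[i - 1][0] if i > 0 else float("nan")
    lower = buckets[i - 1][0] if i > 0 else 0.0
    prev_count = buckets[i - 1][1] if i > 0 else 0
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - prev_count) / (cumulative - prev_count)


if __name__ == "__main__":
    # Cumulative request-latency buckets in seconds (le="0.1", "0.5", "1", "2", "+Inf").
    latency = [(0.1, 50), (0.5, 90), (1.0, 98), (2.0, 100), (math.inf, 100)]
    for q in (0.5, 0.95, 0.99):
        print(q, round(histogram_quantile(q, latency), 4))
```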
Designing the telemetry pipeline: pull, push and collectors
A robust LLM observability stack usually combines pull-based metrics scraping with push-based OTLP telemetry, using tools such as Prometheus for metrics and OpenTelemetry Collectors for traces and logs.
Prometheus remains pull-first: servers and exporters expose a /metrics endpoint, and Prometheus scrapes it at regular intervals. This works well for inference servers (TGI, vLLM, Triton), GPU exporters, node exporters and k6 load tests, giving you a consistent workflow for capacity metrics.
For traces, logs and sometimes metrics produced by instrumented applications, you typically use OTLP push. Spans and structured events are sent to one or more OpenTelemetry Collectors, which handle batching, sampling, redaction and export to backends such as Tempo, Jaeger, Loki, Elastic APM or commercial platforms.
Deployment patterns usually combine node-level DaemonSets, sidecar collectors and central gateways. DaemonSets handle host-level enrichment and shared processing, sidecars provide isolation for workloads that handle sensitive responses, and gateway collectors enforce sampling and routing policies across the whole fleet.
Along this whole path you must watch sampling strategies and label cardinality. Use tail-based sampling to keep the interesting traces (slow or error-prone ones) while discarding noise, and design metric labels so that you do not accidentally blow up memory and CPU usage in your observability infrastructure.
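Tail-based sampling decides whether to keep a trace only after the trace has finished, so it can inspect the whole trace. The sketch below expresses the kind of policy that the OpenTelemetry Collector's tail sampling processor lets you configure declaratively: always keep traces that contain an error or exceed a latency threshold, and keep a small random share of the rest. The 2-second threshold, the 5% baseline rate and the trace dictionary shape are all assumptions for illustration.

```python
import random


def keep_trace(trace, latency_threshold_s=2.0, baseline_rate=0.05, rng=random):
    """Decide whether to keep a completed trace (a dict with 'duration_s' and 'spans')."""
    if any(span.get("status") == "ERROR" for span in trace["spans"]):
        return True  # always keep traces containing an error
    if trace["duration_s"] >= latency_threshold_s:
        return True  # always keep slow traces
    return rng.random() < baseline_rate  # keep a small random sample of normal traffic


if __name__ == "__main__":
    rng = random.Random(0)
    fast_ok = {"duration_s": 0.2, "spans": [{"status": "OK"}]}
    kept = sum(keep_trace(fast_ok, rng=rng) for _ in range(10_000))
    print(f"kept {kept} of 10000 fast, successful traces")
```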
The tooling landscape for LLM observability
The open observability ecosystem is broad, and LLM workloads sit where several tools intersect. Each tool brings strengths for a particular signal type: Prometheus for metrics, Tempo or Jaeger for traces, Loki or Elastic for logs, and Pyroscope for continuous profiling.
Grafana often serves as the unifying UI layer on top of this stack. It provides dashboards that can query multiple data sources in one place, visualize SLOs and correlate metrics with traces and logs, and it supports on-call workflows for SRE teams running heavy LLM services.
For organizations that prefer managed solutions, services such as Grafana Cloud, Datadog, New Relic or Amazon Managed Service for Prometheus provide hosted backends. They accept OTLP or Prometheus remote-write traffic and handle scaling, retention and high availability, at the cost of vendor lock-in and their pricing models.
Whichever combination you choose, consistency is what matters. Standardize on OpenTelemetry wherever possible, adopt the GenAI semantic conventions for metrics and spans, and treat your observability setup as a core part of your LLM architecture rather than an afterthought.
Deployment, scaling, security and troubleshooting
Deploying LLM observability on Kubernetes usually starts with opinionated packages such as kube-prometheus-stack and OpenTelemetry Collectors, while lightweight experiments can run on Docker Compose or a basic VM setup. The key is to think about discovery, retention and dashboard organization from day one, not in the middle of an escalated incident.
As traffic grows, you move from Prometheus's default local retention (about 15 days) to long-term storage with systems such as Mimir, Thanos, Cortex or managed Prometheus services. You also adopt trace backends such as Tempo, which can generate metrics from spans when needed. Log stores such as Loki or Elastic require careful label design to stay cost-effective.
Security and privacy are critical for LLM applications, because prompts and outputs may contain personal or confidential data. Both the OpenTelemetry and Prometheus documentation explicitly warn about leaking sensitive information through telemetry. You mitigate these risks by redacting prompts and responses by default, filtering attributes in the collector, enforcing RBAC and strict network boundaries, and setting retention policies that reflect your legal obligations.
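A simple first line of defense is to scrub obvious identifiers from prompt and response text before it is attached to spans or logs. The regex-based sketch below is an illustration built on loose assumptions and is not a complete PII detector. It masks only email addresses and phone-number-like digit runs, so it will miss many formats and can over-match things such as dates. In production you would typically drop or redact these attributes in the collector and rely on more robust detection.

```python
import re

# Deliberately simple patterns; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com or +1 (555) 123-4567 today"))
```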
When dashboards look wrong or signals go missing, you debug everything from ingestion and schema mismatches to sampling and cardinality issues. Check scrape success, OTLP endpoints, label names, histogram usage, sampling rules and GPU exporter status until the root cause is clear and resolved.
Bringing all these practices together (fine-tuning strategies, rigorous evaluation, on-device deployment and deep observability) is what turns LLMs from experimental prototypes into reliable, auditable systems that organizations can trust in sensitive domains. These systems can still evolve fast enough to keep pace with AI research and changing business needs.