This is a list I put together from what I've picked up from other communities and what our local community has built. The local community focuses mostly on LLMs and is slow to explore new things on its own.
Here's what I did before, based on what other communities were doing:
- vectordb 14/3/23 - https://twitter.com/khursani8/status/1635490473312919553
- voice conversion 6/2/23 - https://x.com/khursani8/status/1622548019706216448?s=46&t=KgPtmqLaM0tK9HeoMI1fZg
- stable diffusion 1/3/23 - https://twitter.com/khursani8/status/1630776772907388930
- llama qlora finetune 21/7/23 - https://twitter.com/khursani8/status/1682397593756917761
- clean llm data 25/7/23 - https://twitter.com/khursani8/status/1683724087741542400
- lora as git diff - https://twitter.com/khursani8/status/1698862103703196153
- model merging 24/10/23 - https://twitter.com/khursani8/status/1716780770214515097
- deploy gguf on cpu 5/11/23 - https://twitter.com/khursani8/status/1721181105175539943
- gaussian splat 1/12/23 - https://twitter.com/khursani8/status/1730361293599810023 https://twitter.com/khursani8/status/1758733589443916178
- image to 3d model 28/3/24 - https://twitter.com/khursani8/status/1773274180445733099
- photogrammetry - https://twitter.com/khursani8/status/1776923107506417782
- Instruct vector - https://twitter.com/khursani8/status/1781507074167603420
- cloudflare llm - https://twitter.com/khursani8/status/1785642956697018682
- llm ops - https://twitter.com/khursani8/status/1787083932649034155
- lyrics to music+sing - https://twitter.com/khursani8/status/1788213947424030810
- lm eval - eval
- Merging evo - I've done this on a single GPU (more GPUs would probably reach the best model a bit faster). It beat the benchmark, but when I used it for chat the model became useless, maybe because the benchmark isn't general enough. Usually a healing step is used to make the model useful again after merging.
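To make the benchmark-overfitting point concrete, here is a minimal sketch of evolutionary merging, assuming two models with identical parameter names and shapes (toy dicts of float lists stand in for real state_dicts). It only searches one global merge coefficient with a simple (1+1) evolution strategy; real evolutionary merges usually search per-layer coefficients and score full models on actual benchmarks. `linear_merge`, `evolve_merge` and `score_fn` are hypothetical names for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

# Toy stand-in for a model's state_dict: parameter name -> flat list of weights.
Model = Dict[str, List[float]]


def linear_merge(model_a: Model, model_b: Model, alpha: float) -> Model:
    """Weighted average alpha * A + (1 - alpha) * B, parameter by parameter."""
    return {
        name: [alpha * wa + (1.0 - alpha) * wb for wa, wb in zip(model_a[name], model_b[name])]
        for name in model_a
    }


def evolve_merge(
    model_a: Model,
    model_b: Model,
    score_fn: Callable[[Model], float],
    generations: int = 100,
    sigma: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float]:
    """(1+1) evolution strategy over a single merge coefficient alpha in [0, 1].

    score_fn is a hypothetical benchmark (higher is better). Whatever it rewards
    is all the search optimizes, so a narrow benchmark yields a narrow model.
    """
    rng = random.Random(seed)
    best_alpha = 0.5
    best_score = score_fn(linear_merge(model_a, model_b, best_alpha))
    for _ in range(generations):
        # Mutate the current best coefficient and clamp it back into [0, 1].
        child = min(1.0, max(0.0, best_alpha + rng.gauss(0.0, sigma)))
        child_score = score_fn(linear_merge(model_a, model_b, child))
        # Keep the child only if it beats the parent on the benchmark.
        if child_score > best_score:
            best_alpha, best_score = child, child_score
    return best_alpha, best_score
```

The search keeps whatever `score_fn` rewards and nothing else, which is why a merge can win the benchmark and still be useless in chat until it gets some healing.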
There are plenty of other items, but those I did as part of my job at a company, so it wouldn't be fair to compare them with what the community does on its own initiative.
What the community can do on its own, without depending on a company:
- None
What the community does that depends on government initiative:
- Forever None
What the community does that depends on a company sponsoring resources:
- Malaysia AI
- teaching vectordb, no idea when, I've only heard rumours
- stable diffusion - None
- Full finetune - https://huggingface.co/mesolitica/llama-7b-hf-2048-fpf
- qlora finetune - None
- clean llm data - https://huggingface.co/datasets/malaysia-ai/dedup-text-dataset
- model merging - None
- deploy gguf or its sibling formats - https://twitter.com/huseinzol05/status/1787406054646804536
- gsplat - None
- image to 3d model - None
- photogrammetry - None
- Instruct vector - None
- cf llm - None
- llm ops - None
- lyrics to music+sing - None
- Improve RAG system - https://twitter.com/huseinzol05/status/1788044531910180901
- vlm(not for ocr, but for image understanding) - https://twitter.com/huseinzol05/status/1787009284729102645
- Maybe there's something Malaysia AI did that I missed, since I haven't followed them for a while
What the community has moved on to now:
Mostly a focus on production, like improving speed, improving evaluation metrics, continuous improvement, etc.
- LLMOPS - Dify * hot right now
In Japan they already have a bot called GOVBOT, which can help answer questions about taxes and benefits. The funny part is that their community showed it could be built in 3 hours with Dify, without their government needing to waste 85 million yen https://note-com.translate.goog/sangmin/n/n3cb256cc22cc?_x_tr_sl=ja&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp&_x_tr_hist=true
So yep, governments in other countries waste money too, because you can't spell "kerajaan" (government) without "kera" (monkey).
Honestly, with Dify I can work like a WordPress developer, but for chatbots, since most companies just want a chatbot that can answer questions and Dify makes that easy. It has a website scraper, LLMOps, an IDE like Azure Prompt Flow, multiple chat backends (local and non-local), and a lot more. So if you want your own chatbot, just use Dify. You don't need some hardcore ML expert, a software engineer is enough.
Tutorials:
https://www.youtube.com/watch?v=UT3CR5t-6EU
https://www.youtube.com/watch?v=lEnf1YjFj_4
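For the software-engineer route, here is a sketch of calling a published Dify chatbot app from your own code using only the standard library. It assumes Dify's chat app API (`POST /chat-messages` under the app's API base URL, e.g. `https://api.dify.ai/v1` for the cloud version, authenticated with a Bearer app key) and a blocking response that carries the reply in an `answer` field. Double-check this against the API page of your own Dify app, since versions differ. The helper names and the key below are hypothetical.

```python
import json
import urllib.request


def build_dify_chat_request(base_url: str, api_key: str, query: str, user: str,
                            conversation_id: str = "") -> urllib.request.Request:
    """Build (but do not send) a blocking chat request for a Dify chatbot app."""
    payload = {
        "inputs": {},                        # values for app variables; none here
        "query": query,                      # the end user's question
        "response_mode": "blocking",         # wait for the full answer, no streaming
        "conversation_id": conversation_id,  # empty string starts a new conversation
        "user": user,                        # your own identifier for the end user
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def ask(req: urllib.request.Request) -> str:
    """Send the request and return the answer text (needs a reachable Dify app)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["answer"]
```

Keeping request building separate from sending makes it easy to test without a live Dify instance and to swap in `"streaming"` mode later.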
In simple terms, they call it a compiler for NLP instructions: like writing C, which then gets converted into a binary to execute. So far, the uses I've found are generating proposal documents and generating code skeletons (like when I was an intern: the senior dev would write the skeleton, and I'd fill in the code inside the functions/classes they had declared).
Generating songs from lyrics: I saw one where they listed some keywords, then asked ChatGPT to generate song lyrics based on those keywords, and then generated the song with Suno or Udio.
- AI crawler
Can't write a scraping script? You can use an AI crawler
https://github.com/unclecode/crawl4ai
https://github.com/VinciGit00/Scrapegraph-ai
- Generate synthetic data for evaluation
- Extract a language vector for Llama, then apply that language vector to LLaVA, so there's no need to train a VLM from scratch to understand a particular language
- Control vector - to control model behaviour without training
- Benchmarks for fluency in their own country's language
- Generate math problems for model data augmentation
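The language vector and the control vector are both vector arithmetic, just in different places. A rough sketch, assuming toy models as dicts of float lists: the language vector here is the weight difference between a language-tuned Llama and its base, added onto a model built on the same base (such as LLaVA's language model), which only makes sense when parameter names and shapes match. The control vector is the mean difference of hidden states between contrasting prompts, added to hidden states at inference. Real control-vector tooling often uses PCA over many activation pairs instead of a plain mean difference, and all function names here are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Model = Dict[str, Vector]  # toy state_dict: parameter name -> flat weights


# --- Language vector: a difference in WEIGHT space ---
def weight_diff(base: Model, tuned: Model) -> Model:
    """tuned - base: what language-specific training added on top of the base."""
    return {name: [t - b for b, t in zip(base[name], tuned[name])] for name in base}


def add_weight_diff(target: Model, diff: Model, scale: float = 1.0) -> Model:
    """Graft the diff onto a model sharing the same base; other params are copied as-is."""
    return {
        name: ([w + scale * d for w, d in zip(weights, diff[name])] if name in diff else list(weights))
        for name, weights in target.items()
    }


# --- Control vector: a difference in ACTIVATION space ---
def mean(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def control_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Mean hidden state on 'positive' prompts minus mean on contrasting 'negative' prompts."""
    return [p - n for p, n in zip(mean(pos_acts), mean(neg_acts))]


def steer(hidden: Vector, vec: Vector, strength: float) -> Vector:
    """Add the scaled control vector to a hidden state at inference; no training involved."""
    return [h + strength * v for h, v in zip(hidden, vec)]
```

Parameters that only exist in the target (for example a vision encoder) are left untouched by `add_weight_diff`, and the sign and size of `strength` in `steer` decide how hard, and in which direction, the behaviour is pushed.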