Sani/Yapping

Created Fri, 10 May 2024 07:54:52 +0800 Modified Thu, 26 Dec 2024 08:45:18 +0000

This is a list I put together from what I've picked up from other communities, alongside a list of what our local community has built. The local community focuses mostly on LLMs and is slow to explore new things on its own.

Things I've built before, based on what other communities were doing:

  1. vectordb 14/3/23 - https://twitter.com/khursani8/status/1635490473312919553
  2. voice conversion 6/2/23 - https://x.com/khursani8/status/1622548019706216448?s=46&t=KgPtmqLaM0tK9HeoMI1fZg
  3. stable diffusion 1/3/23 - https://twitter.com/khursani8/status/1630776772907388930
  4. llama qlora finetune 21/7/23 - https://twitter.com/khursani8/status/1682397593756917761
  5. clean llm data 25/7/23 - https://twitter.com/khursani8/status/1683724087741542400
  6. lora as git diff - https://twitter.com/khursani8/status/1698862103703196153
  7. model merging 24/10/23 - https://twitter.com/khursani8/status/1716780770214515097
  8. deploy GGUF on CPU 5/11/23 - https://twitter.com/khursani8/status/1721181105175539943
  9. gaussian splat 1/12/23 - https://twitter.com/khursani8/status/1730361293599810023 https://twitter.com/khursani8/status/1758733589443916178
  10. image to 3d model 28/3/24 - https://twitter.com/khursani8/status/1773274180445733099
  11. photogrammetry - https://twitter.com/khursani8/status/1776923107506417782
  12. Instruct vector - https://twitter.com/khursani8/status/1781507074167603420
  13. cloudflare llm - https://twitter.com/khursani8/status/1785642956697018682
  14. llm ops - https://twitter.com/khursani8/status/1787083932649034155
  15. lyrics to music+sing - https://twitter.com/khursani8/status/1788213947424030810
  16. lm eval - eval
  17. Merging evo - I've done this on a single GPU (more GPUs would probably find the best model a bit faster). The merged model beat the benchmark, but when I used it for chat it was useless, maybe because the benchmark isn't general enough. A healing step is usually applied after merging to make the model useful again.
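A minimal sketch of the loop behind evolutionary merging, assuming two fine-tunes of the same base model with identical parameter names and shapes. `linear_merge`, `evolve_alpha` and the `fitness` callback are hypothetical names, and plain floats stand in for weight tensors (NumPy arrays work the same way). Real evolutionary merge tools search many more coefficients, often per layer, but the shape is the same: merge, score on a benchmark, keep the best.

```python
import random


def linear_merge(state_a, state_b, alpha):
    """Interpolate two state dicts parameter by parameter: alpha * A + (1 - alpha) * B.

    Values can be floats or NumPy arrays; both models must share parameter names.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("models must have the same parameter names")
    return {name: alpha * state_a[name] + (1.0 - alpha) * state_b[name] for name in state_a}


def evolve_alpha(state_a, state_b, fitness, generations=20, population=8, sigma=0.1, seed=0):
    """Greedy evolutionary search over a single merge coefficient alpha in [0, 1].

    `fitness` takes a merged state dict and returns a score to maximise
    (in practice: a benchmark run on the merged model).
    """
    rng = random.Random(seed)
    best_alpha = 0.5
    best_score = fitness(linear_merge(state_a, state_b, best_alpha))
    for _ in range(generations):
        for _ in range(population):
            # Mutate the current best coefficient and clamp it to [0, 1].
            candidate = min(1.0, max(0.0, best_alpha + rng.gauss(0.0, sigma)))
            score = fitness(linear_merge(state_a, state_b, candidate))
            if score > best_score:
                best_alpha, best_score = candidate, score
    return best_alpha, best_score


if __name__ == "__main__":
    model_a = {"w": 1.0}
    model_b = {"w": 0.0}

    # Toy "benchmark": rewards merged weights close to 0.7.
    def toy_fitness(state):
        return -abs(state["w"] - 0.7)

    alpha, score = evolve_alpha(model_a, model_b, toy_fitness)
    print(f"best alpha={alpha:.3f} score={score:.4f}")
```

Because `fitness` is only as good as the benchmark behind it, a search like this will happily overfit a narrow benchmark, which fits the chat-quality drop described in the Merging evo item.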

There's a lot more on the list, but those are things I did for work at a company, so it wouldn't be fair to compare them with what the community builds. Everything above was my own initiative.


What the community can do on its own, without depending on a company

  • None

Community relying on government initiatives

  • Forever None

Community relying on companies that sponsor resources

  1. Malaysia AI
    1. teaching vectordb - not sure when, I only heard a rumour
    2. stable diffusion - None
    3. Full finetune - https://huggingface.co/mesolitica/llama-7b-hf-2048-fpf
    4. qlora finetune - None
    5. clean llm data - https://huggingface.co/datasets/malaysia-ai/dedup-text-dataset
    6. model merging - None
    7. deploy GGUF or its siblings - https://twitter.com/huseinzol05/status/1787406054646804536
    8. gsplat - None
    9. image to 3d model - None
    10. photogrammetry - None
    11. Instruct vector - None
    12. cf llm - None
    13. llm ops - None
    14. lyrics to music+sing - None
    15. Improve RAG system - https://twitter.com/huseinzol05/status/1788044531910180901
    16. vlm (not for OCR, but for image understanding) - https://twitter.com/huseinzol05/status/1787009284729102645
  • I may have missed some things Malaysia AI has built, since I haven't followed them for a while

What the community has moved on to now:

Mostly a focus on production: improving speed, improving evaluation metrics, continuous improvement, etc.

  1. LLMOps - Dify (hot right now)

In Japan there is already a bot called GOVBOT that helps answer questions about taxes and benefits. The funny part is that their community showed it can be built in 3 hours with Dify, without the government needing to waste 85 million yen: https://note-com.translate.goog/sangmin/n/n3cb256cc22cc?_x_tr_sl=ja&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp&_x_tr_hist=true

So yes, other countries' governments waste money too, because you can't spell kerajaan (government) without kera (monkey).

Honestly, with Dify I could be a kind of WordPress developer for chatbots, since most companies just want a chatbot that can answer questions, and Dify makes that easy. It has a website scraper, LLMOps, an IDE similar to Azure Prompt Flow, multiple chat backends (local and non-local), and a lot more. So if you want your own chatbot, just use Dify. You don't need an ML expert with heavy math and all that; a software engineer is enough.
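Once a chatbot app is published in Dify, other code can call it over HTTP. This sketch assumes Dify's `chat-messages` endpoint with a blocking response (a JSON body containing `answer` and `conversation_id`); the base URL, API key and the `build_dify_chat_request`/`ask_dify` helpers are placeholders, so check the API page of your own Dify app and version before relying on the field names.

```python
import json
import urllib.request


def build_dify_chat_request(base_url, api_key, query, user, conversation_id=""):
    """Build (but do not send) a POST to the assumed /chat-messages endpoint."""
    payload = {
        "inputs": {},                  # app-defined variables, none here
        "query": query,                # the user's question
        "response_mode": "blocking",   # wait for the full answer instead of streaming
        "conversation_id": conversation_id,  # empty string starts a new conversation
        "user": user,                  # your own identifier for the end user
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def ask_dify(base_url, api_key, query, user, conversation_id=""):
    """Send the request and return (answer, conversation_id) from the JSON reply."""
    req = build_dify_chat_request(base_url, api_key, query, user, conversation_id)
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body.get("answer", ""), body.get("conversation_id", "")
```

Splitting request building from sending keeps the payload easy to inspect; pass the returned `conversation_id` back in to continue the same chat.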

Tutorials:

https://www.youtube.com/watch?v=UT3CR5t-6EU

https://www.youtube.com/watch?v=lEnf1YjFj_4

  2. Zoltraak

In simple terms, they call it a compiler for natural-language instructions: like writing C, which then gets compiled into a binary to execute. So far, the uses I've found are generating proposal documents and generating code skeletons (like when I was an intern: the senior dev wrote the skeleton, and I filled in the code inside the functions/classes they had declared).

  3. Udio

Generate songs from lyrics. In one workflow I saw, they first listed keywords, then asked ChatGPT to write song lyrics based on those keywords, and finally generated the song with Suno or Udio.

  4. AI crawler

Can't write a scraping script? Use an AI crawler:

https://github.com/unclecode/crawl4ai

https://github.com/VinciGit00/Scrapegraph-ai

  5. Generate synthetic data for evaluation
  6. Get a language vector for Llama, then apply it to LLaVA, so you don't need to train a VLM from scratch for it to understand a particular language
  7. Control vector - to control model behaviour without training
  8. Benchmarks for fluency in their own country's language
  9. Generate math problems for model data augmentation
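The language-vector and control-vector items are both vector arithmetic. A minimal sketch with NumPy, assuming the language fine-tune and LLaVA's language model share the same Llama base (same parameter names and shapes). Here the language vector is taken as the task-arithmetic difference between a language fine-tune and its base, and the control vector as the mean difference of hidden states between two contrastive prompt sets. All function names are hypothetical, and toy arrays stand in for real weights and activations.

```python
import numpy as np


def task_vector(finetuned, base):
    """Language vector as a parameter-wise difference: finetuned - base."""
    return {name: finetuned[name] - base[name] for name in base}


def apply_task_vector(target, vector, scale=1.0):
    """Add the vector to matching parameters of another model (e.g. LLaVA's LLM).

    Parameters with no counterpart in the vector (e.g. the vision tower) are left as-is.
    """
    return {
        name: (weight + scale * vector[name]) if name in vector else weight
        for name, weight in target.items()
    }


def control_vector(positive_acts, negative_acts):
    """Mean difference of hidden states (rows = prompts) at one layer."""
    return positive_acts.mean(axis=0) - negative_acts.mean(axis=0)


def steer(hidden, vector, coeff):
    """Shift a hidden state along the control vector at inference time; no training."""
    return hidden + coeff * vector


if __name__ == "__main__":
    llama_base = {"layers.0.mlp": np.array([1.0, 2.0])}
    llama_malay = {"layers.0.mlp": np.array([1.5, 2.5])}  # hypothetical language fine-tune
    llava = {"layers.0.mlp": np.array([1.1, 2.1]), "vision.proj": np.array([9.0])}

    lang_vec = task_vector(llama_malay, llama_base)
    print(apply_task_vector(llava, lang_vec))  # LLM weights shifted, vision weights untouched

    happy = np.array([[1.0, 0.0], [3.0, 0.0]])  # toy activations for "positive" prompts
    sad = np.array([[0.0, 1.0], [0.0, 3.0]])    # toy activations for "negative" prompts
    cv = control_vector(happy, sad)
    print(steer(np.zeros(2), cv, coeff=0.5))
```

In a real model, `steer` would be applied to the hidden states at a chosen layer during the forward pass (for example from a forward hook), and `scale`/`coeff` are usually tuned by hand.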