Commit Graph

181 Commits

Author SHA1 Message Date
Blade He dd6701f18c 1. optimize investment mapping algorithm
2. implement investment mapping metrics
2024-09-25 15:15:38 -05:00
Blade He 0f14bf4a7a 1. get document/provider mapping data
2. optimize metrics algorithm
3. Expand max token length after switching ChatGPT4o to the 2024-08-06 version.
2024-09-23 17:21:02 -05:00
Blade He 8496c7b5ed optimize instructions
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He 91530d6089 add more description for Performance Fees calculation rules 2024-09-20 11:58:48 -05:00
Blade He 40bcce4404 instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, or NONE 2024-09-20 10:26:18 -05:00
Blade He c4985ac75f optimize data extract, metrics calculation algorithm 2024-09-19 22:45:08 -05:00
Blade He 48dc8690c3 support extracting data from PDF page images 2024-09-19 16:29:26 -05:00
Blade He 67371e534e only calculate metrics for intersection document list 2024-09-19 11:54:51 -05:00
Blade He 27b3540c63 optimize metrics calculation algorithm 2024-09-19 11:44:17 -05:00
Blade He 98e86a6cfd implement data extraction metrics calculation. 2024-09-18 17:10:54 -05:00
Blade He 50e6c3c19d a little change 2024-09-16 16:43:03 -05:00
Blade He 932870f406 support splitting text when outputs exceed 4K tokens. 2024-09-16 12:03:13 -05:00
Blade He 0f6dbd27eb optimize instructions for performance fees. 2024-09-13 16:10:44 -05:00
Blade He e17414173a update to get more precise results 2024-09-12 16:00:49 -05:00
Blade He d56ac9482e Adjust for output example format 2024-09-11 09:24:36 -05:00
Blade He 0887608719 support auto-mapping fund/share by raw names. 2024-09-09 17:34:53 -05:00
Blade He 878383a72c support extracting continuation page(s) so that next-page data without a table header is not missed. 2024-09-06 16:29:35 -05:00
Blade He 1caf552065 support extracting data with ChatGPT4o.
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He 7c83f9152a try to improve page filter precision 2024-09-04 17:01:12 -05:00
Blade He 7198450e53 support calculating page filter metrics. 2024-09-03 17:07:53 -05:00
Blade He f81e2862f3 update prompts to extract TOR, OGC, TER, Performance fees data. 2024-08-30 16:37:00 -05:00
Blade He 63da030fe1 update general prompts 2024-08-29 17:05:58 -05:00
Blade He 134b365b68 Try to generate general prompts for LUX English AR
- Support output of fund name, share name, TER, performance fees, OGC
- Only output data point and value which can be found in page text.
- Output fund level data and share level data separately.
- List some special cases to cover as many cases as possible.
2024-08-28 16:44:19 -05:00
Blade He 32676728f6 optimize prompts 2024-08-28 10:21:26 -05:00
Blade He 15720d8bfd 1. Text-and-image all in one chat function by ChatGPT4o
2. many experiments extracting data in two ways:
from page text or from page image.
2024-08-26 17:17:39 -05:00
Blade He 843f588015 support chat with image by ChatGPT4o 2024-08-26 11:19:07 -05:00
Blade He 6519dc23d4 support filtering pages by data point keywords 2024-08-23 16:38:11 -05:00
Blade He 993664cf78 add many functions to prepare data. 2024-08-22 10:37:56 -05:00
Blade He f91e0cf1a8 auto-fix json data format 2024-08-19 17:59:32 -05:00
Blade He fa46b45ad5 support outputting tables in Markdown format from PDF documents 2024-08-19 15:49:45 -05:00
Blade He 424c30853c initial 2024-08-19 09:52:13 -05:00