Commit Graph

84 Commits

Author SHA1 Message Date
Blade He a25991e2bb 1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He 70362b554f Fix issue in the "last fund name of previous PDF page" logic:
If the current page's fund name starts with the last fund name of the previous PDF page and has more content below it, remove that leading previous-page fund name.
2024-12-04 16:57:52 -06:00
Blade He a11a99fdc3 1. Optimize instructions: do not fetch data from "up to" statements.
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He 352886ade2 update instructions for TER, OGC, Performance Fees 2024-12-02 11:45:19 -06:00
Blade He 843bbbd13f dynamically load instructions for multilingual support. 2024-11-20 17:00:22 -06:00
Blade He 067d89e0f9 Add datapoint_reportedname.json for dynamic loading reported names based on document language. 2024-11-19 16:49:15 -06:00
Blade He a42c0b5c2b optimize retrieve fund instructions 2024-11-13 10:25:08 -06:00
Blade He 7a41b03634 1. optimize instructions for fund name
2. optimize drilldown logic
2024-11-12 17:01:10 -06:00
Blade He 2645d528b1 support outputting the data point reported name 2024-10-29 16:47:45 -05:00
Blade He fa763f4f14 1. optimize instructions
2. optimize mapping algorithm
2024-10-24 16:24:21 -05:00
Blade He 53dadf61f4 optimize keywords/instructions for special-case documents. 2024-10-23 16:56:43 -05:00
Blade He 171f3b6d1f optimize for OGC data extraction. 2024-10-23 16:07:54 -05:00
Blade He 03365227b9 optimize instructions 2024-10-21 11:04:53 -05:00
Blade He 8b651f374c optimize instructions 2024-10-14 09:12:05 -05:00
Blade He 92a26cd262 optimize configuration 2024-10-11 12:16:34 -05:00
Blade He aa2c2332ae optimize for more cases 2024-10-08 17:16:01 -05:00
Blade He 8496c7b5ed optimize instructions
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He 91530d6089 add more description for Performance Fees calculation rules 2024-09-20 11:58:48 -05:00
Blade He 40bcce4404 instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, or NONE 2024-09-20 10:26:18 -05:00
Blade He 48dc8690c3 support extracting data from PDF page images 2024-09-19 16:29:26 -05:00
Blade He 98e86a6cfd implement calculation of data extraction metrics. 2024-09-18 17:10:54 -05:00
Blade He 932870f406 support splitting text when the output exceeds 4K tokens. 2024-09-16 12:03:13 -05:00
Blade He 0f6dbd27eb optimize instructions for performance fees. 2024-09-13 16:10:44 -05:00
Blade He e17414173a update to get more precise results 2024-09-12 16:00:49 -05:00
Blade He d56ac9482e Adjust for output example format 2024-09-11 09:24:36 -05:00
Blade He 878383a72c support extracting the following consecutive page(s) so that next-page data without a table header is not missed. 2024-09-06 16:29:35 -05:00
Blade He 1caf552065 support extracting data with ChatGPT4o.
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He f81e2862f3 update prompts to extract TOR, OGC, TER, Performance fees data. 2024-08-30 16:37:00 -05:00
Blade He 63da030fe1 update general prompts 2024-08-29 17:05:58 -05:00
Blade He 134b365b68 Try to generate general prompts for LUX English AR
- Support outputting fund name, share name, TER, performance fees, OGC
- Only output data points and values that can be found in the page text.
- Output fund-level data and share-level data separately.
- List some special cases to cover as many cases as possible.
2024-08-28 16:44:19 -05:00
Blade He 32676728f6 optimize prompts 2024-08-28 10:21:26 -05:00
Blade He 15720d8bfd 1. Text-and-image all-in-one chat function using ChatGPT4o
2. Many experiments on extracting data in two ways: from page text or from page image.
2024-08-26 17:17:39 -05:00
Blade He 843f588015 support chat with image by ChatGPT4o 2024-08-26 11:19:07 -05:00
Blade He fa46b45ad5 support outputting tables from PDF documents in Markdown format 2024-08-19 15:49:45 -05:00
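
Two of the rules described in the messages above (commit 40bcce4404, skipping placeholder values, and commit 70362b554f, dropping a fund name carried over from the previous PDF page) can be illustrated with a minimal sketch. This is not the repository's code: the function names, the case-insensitive matching, and the reading of "more contents below" as "more text after the prefix" are all assumptions made for illustration.

```python
# Hypothetical sketch; names and matching rules are assumptions, not the repository's code.

# Values that commit 40bcce4404 says should not be collected.
PLACEHOLDER_VALUES = {"-", "*", "**", "N/A", "N/A%", "N/A %", "NONE"}


def is_placeholder(value: str) -> bool:
    """Return True when a reported value is only a placeholder.

    Assumption: surrounding whitespace and letter case are ignored.
    """
    return value.strip().upper() in PLACEHOLDER_VALUES


def strip_previous_page_fund(fund_name: str, previous_last_fund: str) -> str:
    """Drop a leading fund name carried over from the previous PDF page.

    Assumption: the prefix is removed only when more text follows it;
    a name that equals the previous fund name is returned unchanged.
    """
    name = fund_name.strip()
    prefix = previous_last_fund.strip()
    if prefix and name.startswith(prefix):
        remainder = name[len(prefix):].strip()
        if remainder:
            return remainder
    return name
```

For example, with made-up fund names, `strip_previous_page_fund("Fund A Fund B", "Fund A")` yields `"Fund B"`, while `strip_previous_page_fund("Fund A", "Fund A")` leaves the name as it is.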