Commit Graph

45 Commits

Author SHA1 Message Date
Blade He 01e2a0e38d add configuration for datapoints data types
update configuration for minimum initial investment
support applying a value to all funds for minimum initial investment
2025-02-05 12:08:12 -06:00
Blade He a8810519f8 optimize instructions configuration
optimize drilldown part logic
2025-02-04 15:29:24 -06:00
Blade He db0827435b supplement EMEA AR configuration files 2025-01-16 11:30:44 -06:00
Blade He 9f0e77a11e support loading configurations by the doc_source parameter 2025-01-16 11:17:48 -06:00
Blade He 91c86bb983 update AUS Prospectus relevant configuration 2025-01-08 17:40:57 -06:00
Blade He 0a867dcf07 complete configuration for AUS Prospectus 2025-01-07 16:25:13 -06:00
Blade He 9348e32caa support more performance fee keywords 2025-01-06 13:14:20 -06:00
Blade He 309bb714f6 fix issue for parsing data via Vision Function. 2024-12-11 16:49:04 -06:00
Blade He d673a99e21 switch back to extracting data from the image stream directly, instead of first getting text from the image stream and then extracting data from that text.
The reason is: the quality of the text obtained from the image stream is not good enough.
2024-12-10 16:17:47 -06:00
Blade He 75ea5e70de 1. support fetching data from garbled-text (messy-code) pages via the ChatGPT4o Vision function.
2. multilingual share features configuration
2024-12-09 17:47:42 -06:00
Blade He d79b05885d optimize prompts for TOR 2024-12-06 14:50:34 -06:00
Blade He a25991e2bb 1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He 70362b554f Fix issue for "The last fund name of previous PDF page" logic:
If current page fund name starts with "The last fund name of previous PDF page" and with more contents below, then remove "The last fund name of previous PDF page".
2024-12-04 16:57:52 -06:00
Blade He a11a99fdc3 1. Optimize instructions: not to fetch the data with "up to" statement.
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He 352886ade2 update instructions for TER, OGC, Performance Fees 2024-12-02 11:45:19 -06:00
Blade He 843bbbd13f dynamic loading instructions for multilingual. 2024-11-20 17:00:22 -06:00
Blade He 067d89e0f9 Add datapoint_reportedname.json for dynamic loading reported names based on document language. 2024-11-19 16:49:15 -06:00
Blade He a42c0b5c2b optimize retrieve fund instructions 2024-11-13 10:25:08 -06:00
Blade He 7a41b03634 1. optimize instructions for fund name
2. optimize drilldown logic
2024-11-12 17:01:10 -06:00
Blade He 2645d528b1 support outputting the data point reported name 2024-10-29 16:47:45 -05:00
Blade He fa763f4f14 1. optimize instructions
2. optimize mapping algorithm
2024-10-24 16:24:21 -05:00
Blade He 53dadf61f4 optimize keywords/instructions for special-case documents. 2024-10-23 16:56:43 -05:00
Blade He 171f3b6d1f optimize for OGC data extraction. 2024-10-23 16:07:54 -05:00
Blade He 03365227b9 optimize instructions 2024-10-21 11:04:53 -05:00
Blade He 8b651f374c optimize instructions 2024-10-14 09:12:05 -05:00
Blade He 92a26cd262 optimize configuration 2024-10-11 12:16:34 -05:00
Blade He aa2c2332ae optimize for more cases 2024-10-08 17:16:01 -05:00
Blade He 8496c7b5ed optimize instructions
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He 91530d6089 add more description for Performance Fees calculation rules 2024-09-20 11:58:48 -05:00
Blade He 40bcce4404 instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, or NONE 2024-09-20 10:26:18 -05:00
Blade He 48dc8690c3 support extracting data from PDF page images 2024-09-19 16:29:26 -05:00
Blade He 98e86a6cfd implement calculation of data extraction metrics. 2024-09-18 17:10:54 -05:00
Blade He 932870f406 support splitting text when the output exceeds 4K tokens. 2024-09-16 12:03:13 -05:00
Blade He 0f6dbd27eb optimize instructions for performance fees. 2024-09-13 16:10:44 -05:00
Blade He e17414173a update to get more precise results 2024-09-12 16:00:49 -05:00
Blade He d56ac9482e Adjust for output example format 2024-09-11 09:24:36 -05:00
Blade He 878383a72c support extracting consecutive page(s) so that data on a following page without a table header is not missed. 2024-09-06 16:29:35 -05:00
Blade He 1caf552065 support extracting data with ChatGPT4o.
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He f81e2862f3 update prompts to extract TOR, OGC, TER, Performance fees data. 2024-08-30 16:37:00 -05:00
Blade He 63da030fe1 update general prompts 2024-08-29 17:05:58 -05:00
Blade He 134b365b68 Try to generate general prompts for LUX English AR
- Support outputting fund name, share name, TER, performance fees, OGC
- Only output data point and value which can be found in page text.
- Output fund level data and share level data separately.
- List some special cases to cover as many cases as possible.
2024-08-28 16:44:19 -05:00
Blade He 32676728f6 optimize prompts 2024-08-28 10:21:26 -05:00
Blade He 15720d8bfd 1. Text-and-image all in one chat function by ChatGPT4o
2. many experiments on extracting data in two ways:
from page text or from page image.
2024-08-26 17:17:39 -05:00
Blade He 843f588015 support chatting with images via ChatGPT4o 2024-08-26 11:19:07 -05:00
Blade He fa46b45ad5 support outputting tables from PDF documents in Markdown format 2024-08-19 15:49:45 -05:00