Commit Graph

97 Commits

Author SHA1 Message Date
Blade He b18c48efeb A little change 2024-10-03 16:31:16 -05:00
Blade He f0dd7f9e89 Handle cases with multiple share short names. 2024-10-02 17:25:25 -05:00
Blade He edb90c718e Optimize mapping algorithm
Some share class names contain multiple short names, e.g.
CPR Invest Global Disruptive Opportunities Class I sw EUR - Acc
Here the short names are I and sw.
The goal is to support extracting all short names from a share class name.
2024-10-02 15:08:26 -05:00
Blade He 3bb13947af Optimize mapping algorithm:
For multiple currencies in a fund/share name, remove USD if it is present.
Fix the issue with splitting words that are not separated by a space.
If the share class name contains no currency, try to take the currency from a document mapping with the same fund name and the same share class short name.
2024-10-02 13:25:08 -05:00
Blade He f06355e0c8 optimize mapping algorithm: check whether a "-" is used to connect share names 2024-10-02 11:38:11 -05:00
Blade He 035f028155 optimize mapping algorithm 2024-10-01 16:46:59 -05:00
Blade He 3adbd7631a optimize mapping algorithm 2024-10-01 15:31:15 -05:00
Blade He d92053a16e optimize mapping metrics algorithm 2024-10-01 12:19:45 -05:00
Blade He 18174bf1cf optimize mapping: choose proper candidates mapping list. 2024-10-01 11:35:29 -05:00
Blade He 60a26377e5 optimize investment mapping algorithm 2024-09-30 16:32:56 -05:00
Blade He 3aa596ea33 optimize mapping logic 2024-09-27 16:39:56 -05:00
Blade He 39cd53dc33 support calculating mapping metrics based on the document investment mapping in the database 2024-09-27 13:20:50 -05:00
Blade He 0c4c541319 optimize mapping algorithm, this is the fixed version to confirm mapping metrics 2024-09-27 09:25:11 -05:00
Blade He 7eba9a52ae restore the algorithm to the better version 2024-09-26 19:25:17 -05:00
Blade He d25bae936c Optimize investment mapping algorithm. 2024-09-26 12:18:37 -05:00
Blade He 598e2ab820 investment mapping: optimize for currency logic 2024-09-25 17:28:22 -05:00
Blade He dd6701f18c 1. optimize investment mapping algorithm
2. implement investment mapping metrics
2024-09-25 15:15:38 -05:00
Blade He 0f14bf4a7a 1. get document/ provider mapping data
2. optimize metrics algorithm
3. Increase the max token length after switching ChatGPT4o to the 2024-08-06 version.
2024-09-23 17:21:02 -05:00
Blade He 8496c7b5ed optimize instructions
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He 91530d6089 add more description for Performance Fees calculation rules 2024-09-20 11:58:48 -05:00
Blade He 40bcce4404 instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, or NONE 2024-09-20 10:26:18 -05:00
Blade He c4985ac75f optimize data extract, metrics calculation algorithm 2024-09-19 22:45:08 -05:00
Blade He 48dc8690c3 support extracting data from PDF page images 2024-09-19 16:29:26 -05:00
Blade He 67371e534e only calculate metrics for intersection document list 2024-09-19 11:54:51 -05:00
Blade He 27b3540c63 optimize metrics calculation algorithm 2024-09-19 11:44:17 -05:00
Blade He 98e86a6cfd implement calculation of data extraction metrics. 2024-09-18 17:10:54 -05:00
Blade He 50e6c3c19d a little change 2024-09-16 16:43:03 -05:00
Blade He 932870f406 support splitting text when the output exceeds 4K tokens. 2024-09-16 12:03:13 -05:00
Blade He 0f6dbd27eb optimize instructions for performance fees. 2024-09-13 16:10:44 -05:00
Blade He e17414173a update to get more precise results 2024-09-12 16:00:49 -05:00
Blade He d56ac9482e Adjust for output example format 2024-09-11 09:24:36 -05:00
Blade He 0887608719 support auto-mapping fund/ share by raw names. 2024-09-09 17:34:53 -05:00
Blade He 878383a72c support extracting consecutive page(s) so that data on a following page without a table header is not missed. 2024-09-06 16:29:35 -05:00
Blade He 1caf552065 support extracting data with ChatGPT4o.
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He 7c83f9152a try to improve page filter precision 2024-09-04 17:01:12 -05:00
Blade He 7198450e53 support calculating page filter metrics. 2024-09-03 17:07:53 -05:00
Blade He f81e2862f3 update prompts to extract TOR, OGC, TER, Performance fees data. 2024-08-30 16:37:00 -05:00
Blade He 63da030fe1 update general prompts 2024-08-29 17:05:58 -05:00
Blade He 134b365b68 Try to generate general prompts for LUX English AR
- Support outputting fund name, share name, TER, performance fees, OGC
- Only output data points and values that can be found in the page text.
- Output fund level data and share level data separately.
- List some special cases to cover as many cases as possible.
2024-08-28 16:44:19 -05:00
Blade He 32676728f6 optimize prompts 2024-08-28 10:21:26 -05:00
Blade He 15720d8bfd 1. Text-and-image all in one chat function by ChatGPT4o
2. many experiments for extracting data by two ways:
page text or page image.
2024-08-26 17:17:39 -05:00
Blade He 843f588015 support chat with image by ChatGPT4o 2024-08-26 11:19:07 -05:00
Blade He 6519dc23d4 support filter pages by data point keywords 2024-08-23 16:38:11 -05:00
Blade He 993664cf78 a lot of functions to prepare data. 2024-08-22 10:37:56 -05:00
Blade He f91e0cf1a8 auto-fix json data format 2024-08-19 17:59:32 -05:00
Blade He fa46b45ad5 support outputting tables from PDF documents in Markdown format 2024-08-19 15:49:45 -05:00
Blade He 424c30853c initial 2024-08-19 09:52:13 -05:00
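
Note on commit 40bcce4404: that commit adds the rule to the prompt instructions, so the model is told not to collect values such as -, *, **, N/A, N/A%, N/A % or NONE. The same rule could also be applied as a post-processing check on extracted records. The following is only a minimal sketch under assumptions, not the repository's code: the names PLACEHOLDER_VALUES, is_placeholder and drop_placeholders, the "value" record key, and the case-insensitive, whitespace-trimmed comparison are all hypothetical.

```python
# Placeholder strings copied from the commit message of 40bcce4404.
# Comparing case-insensitively and ignoring surrounding whitespace is an
# assumption; the commit does not say how strict the match should be.
PLACEHOLDER_VALUES = {"-", "*", "**", "N/A", "N/A%", "N/A %", "NONE"}


def is_placeholder(value: str) -> bool:
    """Return True if value is one of the listed placeholder strings."""
    return value.strip().upper() in PLACEHOLDER_VALUES


def drop_placeholders(records: list[dict]) -> list[dict]:
    """Keep only records whose "value" field is not a placeholder.

    The "value" key is a hypothetical record layout used for illustration.
    """
    return [r for r in records if not is_placeholder(str(r.get("value", "")))]
```

For example, given records with the values "1.25%", "N/A %" and " - ", only the record holding "1.25%" would be kept.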