Blade He
04a2409c58
optimize investment mapping algorithm
2024-10-08 23:53:55 -05:00
Blade He
aa2c2332ae
optimize for more cases
2024-10-08 17:16:01 -05:00
Blade He
8bd6008425
refactor code
2024-10-07 10:34:13 -05:00
Blade He
b18c48efeb
A little change
2024-10-03 16:31:16 -05:00
Blade He
f0dd7f9e89
Handle share classes with multiple short names.
2024-10-02 17:25:25 -05:00
Blade He
edb90c718e
Optimize mapping algorithm
Some share class names contain multiple short names, e.g.
CPR Invest Global Disruptive Opportunities Class I sw EUR - Acc
The short names are I and sw.
The goal is to extract all short names from a share class name.
2024-10-02 15:08:26 -05:00
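A minimal sketch of the idea in the commit above, not the code it actually contains. It assumes short names are the tokens that follow the word "Class" up to the first currency code or "-"; the function name `get_short_names` and the currency list are hypothetical.

```python
# Hypothetical currency list; the real project may recognise more codes.
CURRENCIES = {"USD", "EUR", "GBP", "CHF", "JPY"}

def get_short_names(share_class_name: str) -> list[str]:
    """Collect the tokens between 'Class' and the first currency code or '-'."""
    short_names = []
    collecting = False
    for token in share_class_name.split():
        if token.lower() == "class":
            collecting = True
            continue
        if collecting:
            # Stop at the first token that is clearly not a short name.
            if token.upper() in CURRENCIES or token == "-":
                break
            short_names.append(token)
    return short_names

print(get_short_names(
    "CPR Invest Global Disruptive Opportunities Class I sw EUR - Acc"
))  # ['I', 'sw']
```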
Blade He
3bb13947af
Optimize mapping algorithm:
If a fund/share name contains multiple currencies and USD is one of them, remove USD.
Fix the issue with splitting words that are not separated by spaces.
If the share class name has no currency, try to take the currency from a document mapping entry with the same fund name and the same share class short name.
2024-10-02 13:25:08 -05:00
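The currency rules in the commit above could look roughly like this. It is a sketch under assumptions, not the committed code: the names `pick_currencies` and `fallback_currency`, the currency list, and the shape of the mapping rows (`fund_name`, `short_name`, `currency` keys) are all hypothetical.

```python
# Hypothetical currency list; the real project may recognise more codes.
CURRENCIES = {"USD", "EUR", "GBP", "CHF", "JPY"}

def pick_currencies(name: str) -> list[str]:
    """Return currency codes found in a name, dropping USD when others exist."""
    found = [token for token in name.upper().split() if token in CURRENCIES]
    if len(found) > 1 and "USD" in found:
        found = [code for code in found if code != "USD"]
    return found

def fallback_currency(fund_name, short_name, document_mapping):
    """Look up a currency from mapping rows with the same fund and short name."""
    for row in document_mapping:
        if (row["fund_name"] == fund_name
                and row["short_name"] == short_name
                and row.get("currency")):
            return row["currency"]
    return None  # no matching row carries a currency
```

USD is only dropped when another currency is present, so a name whose sole currency is USD keeps it.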
Blade He
f06355e0c8
optimize mapping algorithm: check whether "-" is used to connect share names
2024-10-02 11:38:11 -05:00
Blade He
035f028155
optimize mapping algorithm
2024-10-01 16:46:59 -05:00
Blade He
3adbd7631a
optimize mapping algorithm
2024-10-01 15:31:15 -05:00
Blade He
d92053a16e
optimize mapping metrics algorithm
2024-10-01 12:19:45 -05:00
Blade He
18174bf1cf
optimize mapping: choose the proper candidate mapping list.
2024-10-01 11:35:29 -05:00
Blade He
60a26377e5
optimize investment mapping algorithm
2024-09-30 16:32:56 -05:00
Blade He
3aa596ea33
optimize mapping logic
2024-09-27 16:39:56 -05:00
Blade He
39cd53dc33
support calculating mapping metrics based on the document investment mapping in the database
2024-09-27 13:20:50 -05:00
Blade He
0c4c541319
optimize mapping algorithm; this is the fixed version used to confirm mapping metrics
2024-09-27 09:25:11 -05:00
Blade He
7eba9a52ae
revert algorithm to the better version
2024-09-26 19:25:17 -05:00
Blade He
d25bae936c
Optimize investment mapping algorithm.
2024-09-26 12:18:37 -05:00
Blade He
598e2ab820
investment mapping: optimize currency logic
2024-09-25 17:28:22 -05:00
Blade He
dd6701f18c
1. optimize investment mapping algorithm
2. implement investment mapping metrics
2024-09-25 15:15:38 -05:00
Blade He
0f14bf4a7a
1. get document/provider mapping data
2. optimize metrics algorithm
3. Increase the max token length after switching ChatGPT4o to the 2024-08-06 version.
2024-09-23 17:21:02 -05:00
Blade He
8496c7b5ed
optimize instructions
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He
91530d6089
add more description for Performance Fees calculation rules
2024-09-20 11:58:48 -05:00
Blade He
40bcce4404
instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, or NONE
2024-09-20 10:26:18 -05:00
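The commit above changes the prompt instructions. As a complementary illustration only, the same rule could be checked after extraction with a small filter. This is an assumed sketch: `is_placeholder` is a hypothetical helper, and case-insensitive matching is an added assumption not stated in the commit.

```python
# Placeholder values listed in the commit message.
PLACEHOLDERS = {"-", "*", "**", "N/A", "N/A%", "N/A %", "NONE"}

def is_placeholder(value: str) -> bool:
    """Return True when an extracted value is one of the placeholder markers."""
    # Assumption: surrounding whitespace and letter case are ignored.
    return value.strip().upper() in PLACEHOLDERS

# Keep only rows that carry a real value.
rows = [("TER", "1.25%"), ("OGC", "N/A"), ("Performance fees", "-")]
kept = [(name, value) for name, value in rows if not is_placeholder(value)]
print(kept)  # [('TER', '1.25%')]
```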
Blade He
c4985ac75f
optimize data extraction and metrics calculation algorithms
2024-09-19 22:45:08 -05:00
Blade He
48dc8690c3
support extracting data from PDF page images
2024-09-19 16:29:26 -05:00
Blade He
67371e534e
only calculate metrics for the intersection of the document lists
2024-09-19 11:54:51 -05:00
Blade He
27b3540c63
optimize metrics calculation algorithm
2024-09-19 11:44:17 -05:00
Blade He
98e86a6cfd
implement data extraction metrics calculation.
2024-09-18 17:10:54 -05:00
Blade He
50e6c3c19d
a little change
2024-09-16 16:43:03 -05:00
Blade He
932870f406
support splitting text when the output exceeds 4K tokens.
2024-09-16 12:03:13 -05:00
Blade He
0f6dbd27eb
optimize instructions for performance fees.
2024-09-13 16:10:44 -05:00
Blade He
e17414173a
update to get more precise results
2024-09-12 16:00:49 -05:00
Blade He
d56ac9482e
Adjust the output example format
2024-09-11 09:24:36 -05:00
Blade He
0887608719
support auto-mapping fund/share by raw names.
2024-09-09 17:34:53 -05:00
Blade He
878383a72c
support extracting continuous page(s) so that data on a following page without a table header is not missed.
2024-09-06 16:29:35 -05:00
Blade He
1caf552065
support extracting data with ChatGPT4o.
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He
7c83f9152a
try to improve page filter precision
2024-09-04 17:01:12 -05:00
Blade He
7198450e53
support calculating page filter metrics.
2024-09-03 17:07:53 -05:00
Blade He
f81e2862f3
update prompts to extract TOR, OGC, TER, Performance fees data.
2024-08-30 16:37:00 -05:00
Blade He
63da030fe1
update general prompts
2024-08-29 17:05:58 -05:00
Blade He
134b365b68
Try to generate general prompts for LUX English AR
- Support outputting fund name, share name, TER, performance fees, OGC
- Only output data points and values that can be found in the page text.
- Output fund level data and share level data separately.
- List some special cases to cover as many cases as possible.
2024-08-28 16:44:19 -05:00
Blade He
32676728f6
optimize prompts
2024-08-28 10:21:26 -05:00
Blade He
15720d8bfd
1. Text-and-image all-in-one chat function using ChatGPT4o
2. many experiments on extracting data in two ways: page text or page image.
2024-08-26 17:17:39 -05:00
Blade He
843f588015
support chatting with images via ChatGPT4o
2024-08-26 11:19:07 -05:00
Blade He
6519dc23d4
support filtering pages by data point keywords
2024-08-23 16:38:11 -05:00
Blade He
993664cf78
add many functions to prepare data.
2024-08-22 10:37:56 -05:00
Blade He
f91e0cf1a8
auto-fix JSON data format
2024-08-19 17:59:32 -05:00
Blade He
fa46b45ad5
support outputting tables from PDF documents in Markdown format
2024-08-19 15:49:45 -05:00
Blade He
424c30853c
initial
2024-08-19 09:52:13 -05:00