Blade He
a8810519f8
optimize instructions configuration
...
optimize drilldown part logic
2025-02-04 15:29:24 -06:00
Blade He
db0827435b
supplement EMEA AR configuration files
2025-01-16 11:30:44 -06:00
Blade He
9f0e77a11e
support loading configurations by doc_source parameter
2025-01-16 11:17:48 -06:00
Blade He
91c86bb983
update AUS Prospectus relevant configuration
2025-01-08 17:40:57 -06:00
Blade He
0a867dcf07
complete configuration for AUS Prospectus
2025-01-07 16:25:13 -06:00
Blade He
9348e32caa
support more performance fee keywords
2025-01-06 13:14:20 -06:00
Blade He
309bb714f6
fix issue for parsing data via Vision Function.
2024-12-11 16:49:04 -06:00
Blade He
d673a99e21
switch back to extracting data directly from the image stream, instead of first getting text from the image stream and then extracting data from that text.
...
The reason is: the quality of the text obtained from the image stream is not good enough.
2024-12-10 16:17:47 -06:00
Blade He
75ea5e70de
1. support fetching data from garbled-text pages via the ChatGPT4o Vision function.
...
2. multilingual share features configuration
2024-12-09 17:47:42 -06:00
Blade He
d79b05885d
optimize prompts for TOR
2024-12-06 14:50:34 -06:00
Blade He
a25991e2bb
1. Set TOR reported name priority
...
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He
70362b554f
Fix issue for "The last fund name of previous PDF page" logic:
...
If the current page's fund name starts with "The last fund name of previous PDF page" and has more content below it, then remove "The last fund name of previous PDF page".
2024-12-04 16:57:52 -06:00
Blade He
a11a99fdc3
1. Optimize instructions: do not fetch data that comes with an "up to" statement.
...
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He
352886ade2
update instructions for TER, OGC, Performance Fees
2024-12-02 11:45:19 -06:00
Blade He
843bbbd13f
dynamically load instructions for multilingual support.
2024-11-20 17:00:22 -06:00
Blade He
067d89e0f9
Add datapoint_reportedname.json for dynamically loading reported names based on document language.
2024-11-19 16:49:15 -06:00
Blade He
a42c0b5c2b
optimize retrieve fund instructions
2024-11-13 10:25:08 -06:00
Blade He
7a41b03634
1. optimize instructions for fund name
...
2. optimize drilldown logic
2024-11-12 17:01:10 -06:00
Blade He
2645d528b1
support outputting the data point reported name
2024-10-29 16:47:45 -05:00
Blade He
fa763f4f14
1. optimize instructions
...
2. optimize mapping algorithm
2024-10-24 16:24:21 -05:00
Blade He
53dadf61f4
optimize keywords/instructions for special-case documents.
2024-10-23 16:56:43 -05:00
Blade He
171f3b6d1f
optimize for OGC data extraction.
2024-10-23 16:07:54 -05:00
Blade He
03365227b9
optimize instructions
2024-10-21 11:04:53 -05:00
Blade He
8b651f374c
optimize instructions
2024-10-14 09:12:05 -05:00
Blade He
92a26cd262
optimize configuration
2024-10-11 12:16:34 -05:00
Blade He
aa2c2332ae
optimize for more cases
2024-10-08 17:16:01 -05:00
Blade He
8496c7b5ed
optimize instructions
...
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He
91530d6089
add more description for Performance Fees calculation rules
2024-09-20 11:58:48 -05:00
Blade He
40bcce4404
instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, or NONE
2024-09-20 10:26:18 -05:00
Blade He
48dc8690c3
support extracting data from PDF page images
2024-09-19 16:29:26 -05:00
Blade He
98e86a6cfd
implement calculation of data extraction metrics.
2024-09-18 17:10:54 -05:00
Blade He
932870f406
support splitting text when the output exceeds 4K tokens.
2024-09-16 12:03:13 -05:00
Blade He
0f6dbd27eb
optimize instructions for performance fees.
2024-09-13 16:10:44 -05:00
Blade He
e17414173a
update to get more precise results
2024-09-12 16:00:49 -05:00
Blade He
d56ac9482e
Adjust for output example format
2024-09-11 09:24:36 -05:00
Blade He
878383a72c
support extracting consecutive page(s) so that data on a following page without a table header is not missed.
2024-09-06 16:29:35 -05:00
Blade He
1caf552065
support extracting data with ChatGPT4o.
...
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He
f81e2862f3
update prompts to extract TOR, OGC, TER, Performance fees data.
2024-08-30 16:37:00 -05:00
Blade He
63da030fe1
update general prompts
2024-08-29 17:05:58 -05:00
Blade He
134b365b68
Try to generate general prompts for LUX English AR
...
- Support output of fund name, share name, TER, performance fees, OGC
- Only output data point and value which can be found in page text.
- Output fund level data and share level data separately.
- List some special cases to cover as many cases as possible.
2024-08-28 16:44:19 -05:00
Blade He
32676728f6
optimize prompts
2024-08-28 10:21:26 -05:00
Blade He
15720d8bfd
1. Text-and-image all in one chat function by ChatGPT4o
...
2. many experiments for extracting data in two ways:
page text or page image.
2024-08-26 17:17:39 -05:00
Blade He
843f588015
support chat with image by ChatGPT4o
2024-08-26 11:19:07 -05:00
Blade He
fa46b45ad5
support outputting tables from PDF documents in Markdown format
2024-08-19 15:49:45 -05:00